jaxdem.rl.environments.multi_roller#

Multi-agent 3-D rolling environment with LiDAR sensing.

Functions

frictional_wall_force(pos, state, system)

Normal, frictional, and restitution forces for spheres on a \(z = 0\) plane.

Classes

MultiRoller(state, system, env_params, ...)

Multi-agent 3-D rolling environment with cooperative rewards.

class jaxdem.rl.environments.multi_roller.MultiRoller(state: State, system: System, env_params: dict[str, Any], n_lidar_rays: int)#

Bases: Environment

Multi-agent 3-D rolling environment with cooperative rewards.

Each agent is a sphere resting on a \(z = 0\) floor under gravity. Actions are 3-D torque vectors; translational motion arises from frictional contact with the floor (see frictional_wall_force()). Viscous drag -friction * vel and angular damping -ang_damping * ang_vel are applied every step. Objectives are assigned one-to-one via a random permutation. Each agent receives a random priority scalar at reset for symmetry breaking.

Reward

\[R_i = w_s\,(e^{-2d_i} - e^{-2d_i^{\mathrm{prev}}}) + w_g\,\mathbf{1}[d_i < f \cdot r_i] - w_c\,\left\|\sum_j l_j\,\hat{r}_j\right\| - w_w\,\|a_i\|^2 - \bar{r}_i\]

where \(l_j\) and \(\hat{r}_j\) are the LiDAR readings and ray directions respectively, and \(\bar{r}_i\) is an EMA baseline updated with factor \(\alpha\). All weights (\(w_s, w_g, w_c, w_w, \alpha, f\)) are constructor parameters stored in env_params.
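As a concrete illustration, the reward above can be sketched in plain NumPy. All names here (`roller_reward`, `lidar_hits`, `ray_dirs`, `update_baseline`) are illustrative stand-ins for the quantities stored in env_params, not the library's internals, and the exact EMA update order is an assumption:

```python
import numpy as np

def roller_reward(d, d_prev, r, lidar_hits, ray_dirs, action, r_bar,
                  w_s=1.5, w_g=0.001, w_c=0.005, w_w=0.0005, f=1.0):
    """Sketch of the cooperative reward for one agent (names are illustrative)."""
    shaping = w_s * (np.exp(-2.0 * d) - np.exp(-2.0 * d_prev))  # potential-based term
    goal = w_g * float(d < f * r)                               # on-target bonus
    crowding = w_c * np.linalg.norm(lidar_hits @ ray_dirs)      # || sum_j l_j r_hat_j ||
    work = w_w * float(np.dot(action, action))                  # quadratic action penalty
    return shaping + goal - crowding - work - r_bar

def update_baseline(r_bar, reward, alpha=0.07):
    """Assumed EMA form for the differential baseline r_bar."""
    return (1.0 - alpha) * r_bar + alpha * reward
```

Note that moving closer to the objective (`d < d_prev`) makes the shaping term positive, so the potential-based part rewards progress rather than absolute position.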

Notes

The observation vector per agent is:

| Feature                            | Size         |
|------------------------------------|--------------|
| Unit direction to objective (x, y) | 2            |
| Clamped displacement (x, y)        | 2            |
| Velocity (x, y)                    | 2            |
| Angular velocity                   | 3            |
| Own priority                       | 1            |
| LiDAR proximity (normalised)       | n_lidar_rays |
| Radial relative velocity           | n_lidar_rays |
| LiDAR neighbour priority           | n_lidar_rays |
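Summing the rows of the table gives the per-agent observation width. A quick sanity check (the formula is derived from the table above, not read from the source):

```python
def obs_dim(n_lidar_rays: int) -> int:
    # 2 (direction) + 2 (displacement) + 2 (velocity) + 3 (angular velocity)
    # + 1 (priority) + 3 LiDAR channels of n_lidar_rays each
    return 2 + 2 + 2 + 3 + 1 + 3 * n_lidar_rays

print(obs_dim(8))  # → 34 with the default of 8 rays
```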

n_lidar_rays: int#

Number of angular bins for each LiDAR sensor.
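Per the Create() documentation, the rays span \([-\pi, \pi)\) in equal angular bins. A sketch of how such ray directions could be laid out; whether the library places rays at bin edges or bin centres is an assumption:

```python
import numpy as np

def lidar_directions(n_rays: int) -> np.ndarray:
    """Unit ray directions for n_rays equal bins over [-pi, pi) (edge convention assumed)."""
    angles = -np.pi + 2.0 * np.pi * np.arange(n_rays) / n_rays
    return np.stack([np.cos(angles), np.sin(angles)], axis=-1)  # shape (n_rays, 2)
```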

classmethod Create(N: int = 64, min_box_size: float = 1.0, max_box_size: float = 1.0, box_padding: float = 5.0, max_steps: int = 5760, friction: float = 0.2, ang_damping: float = 0.07, shaping_weight: float = 1.5, goal_weight: float = 0.001, crowding_weight: float = 0.005, work_weight: float = 0.0005, goal_radius_factor: float = 1.0, alpha_r_bar: float = 0.07, lidar_range: float = 0.3, n_lidar_rays: int = 8) MultiRoller[source]#

Create a multi-agent roller environment.

Parameters:
  • N (int) – Number of agents.

  • min_box_size (float) – Lower bound for the random square domain side length sampled at each reset().

  • max_box_size (float) – Upper bound for the random square domain side length sampled at each reset().

  • box_padding (float) – Extra padding around the domain in multiples of the particle radius.

  • max_steps (int) – Episode length in physics steps.

  • friction (float) – Viscous drag coefficient applied as -friction * vel.

  • ang_damping (float) – Angular damping coefficient applied as -ang_damping * ang_vel.

  • shaping_weight (float) – Multiplier \(w_s\) on the potential-based shaping signal.

  • goal_weight (float) – Bonus \(w_g\) for being on target.

  • crowding_weight (float) – Penalty \(w_c\) per unit of LiDAR crowding vector norm.

  • work_weight (float) – Weight \(w_w\) of the quadratic action penalty \(\|a\|^2\).

  • goal_radius_factor (float) – Multiplicative factor \(f\) applied to the particle radius to define the goal activation threshold \(d < f \cdot r\).

  • alpha_r_bar (float) – EMA smoothing factor \(\alpha\) for the differential reward baseline \(\bar{r}\).

  • lidar_range (float) – Maximum detection range for the LiDAR sensor.

  • n_lidar_rays (int) – Number of angular LiDAR bins spanning \([-\pi, \pi)\).

Returns:

A freshly constructed environment (call reset() before use).

Return type:

MultiRoller
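The friction and ang_damping parameters above act as linear velocity decay applied every step. A minimal explicit-Euler sketch of that decay; the time step, inertia, and integration scheme are assumptions, not the library's actual integrator:

```python
import numpy as np

def damp_step(vel, ang_vel, torque, friction=0.2, ang_damping=0.07,
              dt=0.01, inertia=1.0):
    """One assumed integration step: viscous drag on vel, damping plus torque on ang_vel."""
    vel = vel + dt * (-friction * vel)                          # -friction * vel drag
    ang_vel = ang_vel + dt * (torque / inertia - ang_damping * ang_vel)
    return vel, ang_vel
```

With zero torque, both velocities decay geometrically toward rest, which is what keeps undriven agents from drifting indefinitely.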

static reset(env: MultiRoller, key: ArrayLike) Environment[source]#

Reset the environment to a random initial configuration.

Parameters:
  • env (Environment) – The environment instance to reset.

  • key (ArrayLike) – PRNG key used to sample the domain, positions, objectives, and initial velocities.

Returns:

The environment with a fresh episode state.

Return type:

Environment

static step(env: MultiRoller, action: Array) Environment[source]#

Advance the environment by one physics step.

Applies torque actions with angular damping and viscous drag. After integration the method updates LiDAR sensors, displacement caches, and computes the reward with a differential baseline.

Parameters:
  • env (Environment) – Current environment.

  • action (jax.Array) – Torque actions for every agent, shape (N * 3,).

Returns:

Updated environment after physics integration, sensor updates, and reward computation.

Return type:

Environment
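step() takes the flattened action vector of shape (N * 3,), so recovering per-agent torques is a plain reshape. A small NumPy illustration; the row-major flattening order is an assumption:

```python
import numpy as np

N = 4
flat_action = np.arange(N * 3, dtype=float)  # shape (12,), as passed to step()
torques = flat_action.reshape(N, 3)          # one 3-D torque row per agent
print(torques[1])  # → [3. 4. 5.]
```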

static observation(env: MultiRoller) Array[source]#

Build the per-agent observation vector from cached sensors.

All state-dependent components are pre-computed in step() and reset(). This method only concatenates cached arrays.

Returns:

Observation matrix of shape (N, obs_dim). See the class docstring for the feature layout.

Return type:

jax.Array

static reward(env: MultiRoller) Array[source]#

Return the reward cached by step().

Returns:

Reward vector of shape (N,).

Return type:

jax.Array

static done(env: MultiRoller) Array[source]#

Return True when the episode has exceeded max_steps.

property action_space_size: int[source]#

Number of scalar actions per agent (3-D torque).

property action_space_shape: tuple[int][source]#

Shape of a single agent’s action, i.e. (3,).

property observation_space_size: int[source]#

Dimensionality of a single agent’s observation vector.