jaxdem.rl.environments.multi_roller#
Multi-agent 3-D rolling environment with LiDAR sensing.
Functions

frictional_wall_force — Normal, frictional, and restitution forces for spheres on a \(z = 0\) plane.

Classes

MultiRoller — Multi-agent 3-D rolling environment with cooperative rewards.
- class jaxdem.rl.environments.multi_roller.MultiRoller(state: State, system: System, env_params: dict[str, Any], n_lidar_rays: int)#
Bases: Environment

Multi-agent 3-D rolling environment with cooperative rewards.

Each agent is a sphere resting on a \(z = 0\) floor under gravity. Actions are 3-D torque vectors; translational motion arises from frictional contact with the floor (see frictional_wall_force()). Viscous drag -friction * vel and angular damping -ang_damping * ang_vel are applied every step. Objectives are assigned one-to-one via a random permutation, and each agent receives a random priority scalar at reset for symmetry breaking.

Reward

\[R_i = w_s\,(e^{-2d_i} - e^{-2d_i^{\mathrm{prev}}}) + w_g\,\mathbf{1}[d_i < f \cdot r_i] - w_c\,\left\|\sum_j l_j\,\hat{r}_j\right\| - w_w\,\|a_i\|^2 - \bar{r}_i\]

where \(l_j\) and \(\hat{r}_j\) are the LiDAR readings and ray directions respectively, and \(\bar{r}_i\) is an EMA baseline updated with factor \(\alpha\). All weights (\(w_s, w_g, w_c, w_w, \alpha, f\)) are constructor parameters stored in env_params.

Notes
The observation vector per agent is:

Feature | Size
Unit direction to objective (x, y) | 2
Clamped displacement (x, y) | 2
Velocity (x, y) | 2
Angular velocity | 3
Own priority | 1
LiDAR proximity (normalised) | n_lidar_rays
Radial relative velocity | n_lidar_rays
LiDAR neighbour priority | n_lidar_rays

- n_lidar_rays: int#
Number of angular bins for each LiDAR sensor.
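The reward formula above can be sketched in plain JAX. This is an illustrative reconstruction from the equation, not the library's implementation: the function name `reward_sketch`, the argument layout, and the EMA update `r_bar ← (1 - α)·r_bar + α·r` are all assumptions; the default weights mirror the Create() defaults.

```python
import jax.numpy as jnp

def reward_sketch(d, d_prev, radius, lidar, ray_dirs, action, r_bar,
                  w_s=1.5, w_g=0.001, w_c=0.005, w_w=0.0005,
                  f=1.0, alpha=0.07):
    """Illustrative per-agent reward. Shapes: d, d_prev, radius, r_bar -> (N,),
    lidar -> (N, K), ray_dirs -> (K, 2), action -> (N, 3)."""
    # Potential-based shaping: positive when the agent moved closer to its goal.
    shaping = w_s * (jnp.exp(-2.0 * d) - jnp.exp(-2.0 * d_prev))
    # Goal bonus: indicator that the agent is within f * radius of its objective.
    goal = w_g * (d < f * radius)
    # Crowding: norm of the LiDAR-weighted sum of ray directions.
    crowding = w_c * jnp.linalg.norm(lidar @ ray_dirs, axis=-1)
    # Quadratic action (work) penalty.
    work = w_w * jnp.sum(action ** 2, axis=-1)
    r = shaping + goal - crowding - work - r_bar
    new_r_bar = (1.0 - alpha) * r_bar + alpha * r  # assumed EMA baseline update
    return r, new_r_bar
```

The differential baseline \(\bar{r}_i\) centres each agent's reward around its own running average, which reduces variance for policy-gradient training.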
- classmethod Create(N: int = 64, min_box_size: float = 1.0, max_box_size: float = 1.0, box_padding: float = 5.0, max_steps: int = 5760, friction: float = 0.2, ang_damping: float = 0.07, shaping_weight: float = 1.5, goal_weight: float = 0.001, crowding_weight: float = 0.005, work_weight: float = 0.0005, goal_radius_factor: float = 1.0, alpha_r_bar: float = 0.07, lidar_range: float = 0.3, n_lidar_rays: int = 8) MultiRoller[source]#
Create a multi-agent roller environment.
- Parameters:
N (int) – Number of agents.
min_box_size (float) – Lower bound for the random square domain side length sampled at each reset().
max_box_size (float) – Upper bound for the random square domain side length sampled at each reset().
box_padding (float) – Extra padding around the domain in multiples of the particle radius.
max_steps (int) – Episode length in physics steps.
friction (float) – Viscous drag coefficient applied as -friction * vel.
ang_damping (float) – Angular damping coefficient applied as -ang_damping * ang_vel.
shaping_weight (float) – Multiplier \(w_s\) on the potential-based shaping signal.
goal_weight (float) – Bonus \(w_g\) for being on target.
crowding_weight (float) – Penalty \(w_c\) per unit of LiDAR crowding vector norm.
work_weight (float) – Weight \(w_w\) of the quadratic action penalty \(\|a\|^2\).
goal_radius_factor (float) – Multiplicative factor \(f\) applied to the particle radius to define the goal activation threshold \(d < f \cdot r\).
alpha_r_bar (float) – EMA smoothing factor \(\alpha\) for the differential reward baseline \(\bar{r}\).
lidar_range (float) – Maximum detection range for the LiDAR sensor.
n_lidar_rays (int) – Number of angular LiDAR bins spanning \([-\pi, \pi)\).
- Returns:
A freshly constructed environment (call reset() before use).
- Return type:
MultiRoller
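Two of the parameters above define randomised quantities: a sketch of how they could be realised, assuming the box side is drawn uniformly from [min_box_size, max_box_size] and the LiDAR bin centres are evenly spaced over \([-\pi, \pi)\) (both function names are hypothetical, not part of the library API):

```python
import jax
import jax.numpy as jnp

def sample_box_size(key, min_box_size=1.0, max_box_size=1.0):
    # Assumed: the square domain side length is drawn uniformly
    # from [min_box_size, max_box_size] at each reset.
    return jax.random.uniform(key, minval=min_box_size, maxval=max_box_size)

def lidar_ray_directions(n_lidar_rays=8):
    # Assumed: bin centres evenly spaced over [-pi, pi), one unit
    # direction vector per angular LiDAR bin.
    angles = -jnp.pi + 2.0 * jnp.pi * jnp.arange(n_lidar_rays) / n_lidar_rays
    return jnp.stack([jnp.cos(angles), jnp.sin(angles)], axis=-1)  # (K, 2)
```

Note that with the defaults min_box_size == max_box_size == 1.0, the domain size is effectively fixed.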
- static reset(env: MultiRoller, key: Array | ndarray | bool | number | bool | int | float | complex | TypedNdArray) Environment[source]#
Reset the environment to a random initial configuration.
- Parameters:
env (Environment) – The environment instance to reset.
key (ArrayLike) – PRNG key used to sample the domain, positions, objectives, and initial velocities.
- Returns:
The environment with a fresh episode state.
- Return type:
Environment
- static step(env: MultiRoller, action: Array) Environment[source]#
Advance the environment by one physics step.
Applies torque actions with angular damping and viscous drag. After integration the method updates LiDAR sensors, displacement caches, and computes the reward with a differential baseline.
- Parameters:
env (Environment) – Current environment.
action (jax.Array) – Torque actions for every agent, shape (N * 3,).
- Returns:
Updated environment after physics integration, sensor updates, and reward computation.
- Return type:
Environment
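The flat (N * 3,) action layout can be produced from a per-agent torque array; a minimal sketch, assuming row-major ordering (three consecutive entries per agent):

```python
import jax.numpy as jnp

N = 4
# One 3-D torque vector per agent; here only agent 0 gets a torque about z.
torques = jnp.zeros((N, 3)).at[0].set(jnp.array([0.0, 0.0, 0.1]))
# Flatten to the (N * 3,) shape expected by step().
action = torques.reshape(-1)
```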
- static observation(env: MultiRoller) Array[source]#
Build the per-agent observation vector from cached sensors.
All state-dependent components are pre-computed in step() and reset(). This method only concatenates cached arrays.
- Returns:
Observation matrix of shape (N, obs_dim). See the class docstring for the feature layout.
- Return type:
jax.Array
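Summing the sizes in the class docstring's feature table gives the observation width: ten fixed entries plus three LiDAR channels of n_lidar_rays bins each (the helper name is illustrative, not part of the API):

```python
def obs_dim(n_lidar_rays: int) -> int:
    # 2 (goal direction) + 2 (clamped displacement) + 2 (velocity)
    # + 3 (angular velocity) + 1 (priority)
    # + 3 LiDAR channels (proximity, radial velocity, neighbour priority)
    return 10 + 3 * n_lidar_rays
```

With the default n_lidar_rays = 8, each agent observes a 34-dimensional vector.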
- static reward(env: MultiRoller) Array[source]#
Return the reward cached by step().
- Returns:
Reward vector of shape (N,).
- Return type:
jax.Array
- static done(env: MultiRoller) Array[source]#
Return True when the episode has exceeded max_steps.
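The termination test reduces to a scalar comparison on the cached step counter; a sketch assuming "exceeded" means strictly greater-than (the function name and the comparison direction are assumptions from the docstring wording):

```python
import jax.numpy as jnp

def done_sketch(step_count, max_steps):
    # Assumed: the docstring says "exceeded", so strictly greater-than.
    return jnp.asarray(step_count > max_steps)
```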