jaxdem.rl.environments.swarm_roller_3d#

Multi-agent 3-D swarm rolling environment with magnetic interaction and pyramid objectives.

Classes

SwarmRoller3D(state, system, env_params, ...)

Multi-agent 3-D rolling environment with magnetic interaction and pyramid objectives.

class jaxdem.rl.environments.swarm_roller_3d.SwarmRoller3D(state: State, system: System, env_params: dict[str, Any], n_lidar_rays: int, n_lidar_elevation: int, k_objectives: int, n_objectives: int)#

Bases: Environment

Multi-agent 3-D rolling environment with magnetic interaction and pyramid objectives. Extends the swarm roller with two additions:

  1. Each agent has an extra binary magnet action. When two nearby agents both activate their magnets, the mutual attraction is twice as strong:

    \[\mathbf{F}_{ij}^{\text{mag}} = -w_{\text{mag}} \, (m_i + m_j) \, \max\!\bigl(0,\; 1 - d/r_{\text{mag}}\bigr) \, \hat{n}_{ij}\]

    where \(m_i \in \{0, 1\}\) is the magnet flag for agent i, \(d = \|r_{ij}\|\), and \(r_{\text{mag}}\) is magnet_range.

  2. Pyramid objectives. Objectives are arranged in a pyramid: base layer on the floor and elevated apex targets. Agents must stack on top of one another to reach elevated targets. Occupancy uses full 3-D distance to prevent false apex claims.
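The magnet force above can be sketched in JAX. This is an illustrative re-implementation of the formula, not the jaxdem kernel; the names `magnetic_forces`, `pos`, and `m` are assumptions:

```python
import jax.numpy as jnp

def magnetic_forces(pos, m, w_mag=40.0, r_mag=0.12):
    """Net magnetic force per agent: sum of F_ij^mag over j.

    pos : (N, 3) agent positions; m : (N,) binary magnet flags.
    """
    rij = pos[:, None, :] - pos[None, :, :]              # r_i - r_j, shape (N, N, 3)
    d = jnp.linalg.norm(rij, axis=-1)                    # pairwise distances
    n_hat = rij / jnp.where(d > 0, d, 1.0)[..., None]    # unit vectors, safe at i == j
    # (m_i + m_j) doubles the strength when both magnets are active
    strength = w_mag * (m[:, None] + m[None, :]) * jnp.maximum(0.0, 1.0 - d / r_mag)
    f = -strength[..., None] * n_hat                     # attraction along -n_hat
    return f.sum(axis=1)                                 # (N, 3) net force per agent
```

Note the force is pairwise antisymmetric, so the swarm's total magnetic force sums to zero.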

Reward

\[R_i = w_s\,\sum_{j \in \text{top-}k} (e^{-2d_{ij}} - e^{-2d_{ij}^{\mathrm{prev}}}) + w_{th}\,\frac{1}{N}\sum_{m=1}^{N} z_m + w_g\,\mathbf{1}[\text{on target}] - w_w\,\|a_i\|^2 - w_{\mathrm{vel}}\,\|v_i\|^2 - \bar{r}_i\]

where \(\bar{r}_i\) is an EMA baseline updated with factor \(\alpha\), \(w_{th}\) scales the reward for the average team height, \(w_g\) is the bonus for being on a target, and \(w_{\mathrm{vel}}\) penalises high agent velocity. All weights are constructor parameters stored in env_params.
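The EMA baseline subtraction can be sketched as below. The update order (baseline updated first, then subtracted) and the function name `differential_reward` are assumptions for illustration:

```python
import jax.numpy as jnp

def differential_reward(raw_r, r_bar, alpha=0.07):
    """Centre the per-agent raw reward by an exponential moving average baseline."""
    r_bar_new = (1.0 - alpha) * r_bar + alpha * raw_r   # EMA update with factor alpha
    return raw_r - r_bar_new, r_bar_new                 # centred reward, new baseline
```

A constant raw reward therefore decays toward zero centred reward, so agents are paid for improvement rather than for a steady state.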

Notes

The observation vector per agent is:

Feature                               Size
-------                               ----
Velocity (x, y, z)                    3
Angular velocity                      3
Magnet flag                           1
LiDAR proximity (normalised)          n_lidar_rays * n_lidar_elevation
Radial relative velocity              n_lidar_rays * n_lidar_elevation
Objective LiDAR proximity             n_lidar_rays * n_lidar_elevation
Unit direction to top k objectives    k_objectives * 3
Clamped displacement to top k         k_objectives * 3
Occupancy status of top k             k_objectives
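Summing the table gives the per-agent observation size; a small helper (illustrative, not part of the jaxdem API) using the Create() defaults:

```python
def obs_dim(n_rays=6, n_elev=6, k=4):
    """Per-agent observation size implied by the feature table above."""
    lidar = n_rays * n_elev          # bins per LiDAR channel
    return (3 + 3 + 1               # velocity + angular velocity + magnet flag
            + 3 * lidar             # proximity, radial velocity, objective LiDAR
            + 2 * (k * 3)           # unit directions + clamped displacements
            + k)                    # occupancy flags
```

With the defaults (6 rays, 6 elevation bins, k = 4) this gives 143 scalars per agent.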

n_lidar_rays: int#

Number of azimuthal bins for the 3-D LiDAR sensor.

n_lidar_elevation: int#

Number of elevation bins for the 3-D LiDAR sensor.
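One plausible construction of the ray grid implied by these two attributes, crossing azimuth bins over \([-\pi, \pi)\) with elevation bins over \([-\pi/2, \pi/2]\); the function name and exact binning are assumptions, not the jaxdem implementation:

```python
import jax.numpy as jnp

def lidar_directions(n_rays=6, n_elev=6):
    """Unit ray directions for a 3-D LiDAR with azimuth x elevation bins."""
    az = jnp.linspace(-jnp.pi, jnp.pi, n_rays, endpoint=False)   # [-pi, pi)
    el = jnp.linspace(-jnp.pi / 2, jnp.pi / 2, n_elev)           # [-pi/2, pi/2]
    az, el = jnp.meshgrid(az, el, indexing="ij")
    dirs = jnp.stack([jnp.cos(el) * jnp.cos(az),
                      jnp.cos(el) * jnp.sin(az),
                      jnp.sin(el)], axis=-1)
    return dirs.reshape(-1, 3)   # (n_rays * n_elev, 3) unit vectors
```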

k_objectives: int#

Number of closest objectives tracked per agent.

n_objectives: int#

Number of shared objectives.

classmethod Create(N: int = 5, n_objectives: int = 5, min_box_size: float = 1.0, max_box_size: float = 1.0, box_padding: float = 0.0, max_steps: int = 5760, friction: float = 0.2, ang_damping: float = 0.07, shaping_weight: float = 2.0, team_height_weight: float = 1.0, goal_weight: float = 0.0, work_weight: float = 0.0, velocity_weight: float = 0.018, goal_radius_factor: float = 1.0, alpha_r_bar: float = 0.07, lidar_range: float = 0.4, n_lidar_rays: int = 6, n_lidar_elevation: int = 6, k_objectives: int = 4, magnet_strength: float = 40.0, magnet_range: float = 0.12) SwarmRoller3D[source]#

Create a swarm roller 3-D environment.

Parameters:
  • N (int) – Number of agents.

  • n_objectives (int) – Number of shared objectives.

  • min_box_size (float) – Lower bound of the random domain side length sampled at each reset().

  • max_box_size (float) – Upper bound of the random domain side length sampled at each reset().

  • box_padding (float) – Extra padding around the domain in multiples of the particle radius.

  • max_steps (int) – Episode length in physics steps.

  • friction (float) – Viscous drag coefficient applied as -friction * vel.

  • ang_damping (float) – Angular damping coefficient applied as -ang_damping * ang_vel.

  • shaping_weight (float) – Multiplier \(w_s\) on the potential-based shaping signal summed over the k nearest objectives.

  • team_height_weight (float) – Weight \(w_{th}\) scaling the average z-height of the swarm as a global reward.

  • goal_weight (float) – Bonus \(w_g\) for being positioned on a target.

  • work_weight (float) – Weight \(w_w\) of the quadratic action penalty \(\|a\|^2\).

  • velocity_weight (float) – Penalty \(w_{\mathrm{vel}}\) on the squared velocity magnitude \(\|v_i\|^2\).

  • goal_radius_factor (float) – Multiplicative factor \(f\) applied to the particle radius to define the goal activation threshold \(d < f \cdot r\).

  • alpha_r_bar (float) – EMA smoothing factor \(\alpha\) for the differential reward baseline \(\bar{r}\).

  • lidar_range (float) – Maximum detection range for the LiDAR sensor.

  • n_lidar_rays (int) – Number of azimuthal LiDAR bins spanning \([-\pi, \pi)\).

  • n_lidar_elevation (int) – Number of elevation LiDAR bins spanning \([-\pi/2, \pi/2]\).

  • k_objectives (int) – Number of closest objectives tracked per agent.

  • magnet_strength (float) – Magnitude of the magnetic attraction force.

  • magnet_range (float) – Maximum range for magnetic interaction (beyond this the force is zero).

Returns:

A freshly constructed environment (call reset() before use).

Return type:

SwarmRoller3D

static reset(env: SwarmRoller3D, key: Array | ndarray | bool | number | int | float | complex | TypedNdArray) Environment[source]#

Reset the environment to a random initial configuration.

Parameters:
  • env (Environment) – The environment instance to reset.

  • key (ArrayLike) – PRNG key used to sample the domain, positions, objectives, and initial velocities.

Returns:

The environment with a fresh episode state.

Return type:

Environment

static step(env: SwarmRoller3D, action: Array) Environment[source]#

Advance the environment by one physics step.

Applies torque actions with angular damping, viscous drag, and pairwise magnetic attraction. After integration the method updates all sensor caches and computes the reward with a differential baseline. The shaping signal is summed over the k nearest objectives.

Parameters:
  • env (Environment) – Current environment.

  • action (jax.Array) – Actions for every agent, shape (N * 4,) (3-D torque + magnet flag).

Returns:

Updated environment after physics integration, sensor updates, and reward computation.

Return type:

Environment
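A sketch of packing per-agent actions into the flat (N * 4,) layout that step() expects. The per-agent ordering (three torque components followed by the magnet flag) matches the signature above; the helper name and row-major flattening are assumptions:

```python
import jax.numpy as jnp

def pack_actions(torques, magnet_flags):
    """torques: (N, 3); magnet_flags: (N,) -> flat (N * 4,) action vector."""
    per_agent = jnp.concatenate([torques, magnet_flags[:, None]], axis=1)  # (N, 4)
    return per_agent.reshape(-1)                                           # (N * 4,)
```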

static observation(env: SwarmRoller3D) Array[source]#

Build the per-agent observation vector from cached sensors. All state-dependent components are pre-computed in step() and reset(). This method only concatenates cached arrays.

Returns:

Observation matrix of shape (N, obs_dim). See the class docstring for the feature layout.

Return type:

jax.Array

static reward(env: SwarmRoller3D) Array[source]#

Return the reward cached by step().

Returns:

Reward vector of shape (N,).

Return type:

jax.Array

static done(env: SwarmRoller3D) Array[source]#

Return True when the episode has exceeded max_steps.

property action_space_size: int[source]#

Number of scalar actions per agent (3-D torque + magnet).

property action_space_shape: tuple[int][source]#

Shape of a single agent’s action ((4,)).

property observation_space_size: int[source]#

Dimensionality of a single agent’s observation vector.