jaxdem.rl.environments.swarm_roller_3d#
Multi-agent 3-D swarm rolling environment with magnetic interaction and pyramid objectives.
Classes

SwarmRoller3D    Multi-agent 3-D rolling environment with magnetic interaction and pyramid objectives.
- class jaxdem.rl.environments.swarm_roller_3d.SwarmRoller3D(state: State, system: System, env_params: dict[str, Any], n_lidar_rays: int, n_lidar_elevation: int, k_objectives: int, n_objectives: int)#
Bases: Environment

Multi-agent 3-D rolling environment with magnetic interaction and pyramid objectives. Extends the swarm roller with two additions:
Magnet action. Each agent has an extra binary magnet action. When two nearby agents both activate their magnets, the mutual attraction is twice as strong:
\[\mathbf{F}_{ij}^{\text{mag}} = -w_{\text{mag}} \, (m_i + m_j) \, \max\!\bigl(0,\; 1 - d/r_{\text{mag}}\bigr) \, \hat{n}_{ij}\]
where \(m_i \in \{0, 1\}\) is the magnet flag for agent \(i\), \(d = \|r_{ij}\|\), and \(r_{\text{mag}}\) is magnet_range.
Pyramid objectives. Objectives are arranged in a pyramid: a base layer on the floor and elevated apex targets. Agents must stack on top of one another to reach the elevated targets. Occupancy uses the full 3-D distance to prevent false apex claims.
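The magnet force above can be sketched directly. This is an illustrative re-implementation (not the environment's internal code), using the default magnet_strength = 40.0 and magnet_range = 0.12 from Create():

```python
import numpy as np

def magnet_force(r_i, r_j, m_i, m_j, w_mag=40.0, r_mag=0.12):
    """Pairwise magnetic attraction acting on agent i from agent j.

    The force is attractive (points from i toward j), doubles when
    both magnet flags are active, and falls off linearly to zero
    at distance r_mag.
    """
    r_ij = np.asarray(r_i) - np.asarray(r_j)  # displacement from j to i
    d = np.linalg.norm(r_ij)
    n_hat = r_ij / d                          # unit vector from j to i
    strength = w_mag * (m_i + m_j) * max(0.0, 1.0 - d / r_mag)
    return -strength * n_hat                  # minus sign: attraction
```

With both magnets active at d = 0.1 the force magnitude is 40 · 2 · (1 − 0.1/0.12) ≈ 13.3; with both flags off, or beyond magnet_range, it is exactly zero.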
Reward
\[R_i = w_s\,\sum_{j \in \text{top-}k} (e^{-2d_{ij}} - e^{-2d_{ij}^{\mathrm{prev}}}) + w_{th}\,\frac{1}{N}\sum_{m=1}^{N} z_m + w_g\,\mathbf{1}[\text{on target}] - w_w\,\|a_i\|^2 - w_{\mathrm{vel}}\,\|v_i\|^2 - \bar{r}_i\]
where \(\bar{r}_i\) is an EMA baseline updated with factor \(\alpha\), \(w_{th}\) scales the reward for the average team height, \(w_g\) is the bonus for being on a target, and \(w_{\mathrm{vel}}\) penalises high agent velocity. All weights are constructor parameters stored in env_params.
Notes
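The potential-based shaping term of the reward can be sketched as follows. Selecting the top-k objectives by the current distances (and indexing the previous distances the same way) is an assumption about the internal implementation:

```python
import numpy as np

def shaping_term(d, d_prev, k=4, w_s=2.0):
    """Potential-based shaping summed over the k nearest objectives.

    d, d_prev: distances from one agent to all objectives at the
    current and previous physics step. Moving toward an objective
    yields a positive contribution, moving away a negative one.
    """
    d = np.asarray(d)
    d_prev = np.asarray(d_prev)
    idx = np.argsort(d)[:k]  # indices of the k nearest objectives
    return w_s * np.sum(np.exp(-2.0 * d[idx]) - np.exp(-2.0 * d_prev[idx]))
```

Because the term is a difference of potentials, it telescopes over an episode and does not change the optimal policy.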
The observation vector per agent is:
Feature                              Size
Velocity (x, y, z)                   3
Angular velocity                     3
Magnet flag                          1
LiDAR proximity (normalised)         n_lidar_rays * n_lidar_elevation
Radial relative velocity             n_lidar_rays * n_lidar_elevation
Objective LiDAR proximity            n_lidar_rays * n_lidar_elevation
Unit direction to top k objectives   k_objectives * 3
Clamped displacement to top k        k_objectives * 3
Occupancy status of top k            k_objectives
- n_lidar_rays: int#
Number of azimuthal bins for the 3-D LiDAR sensor.
- n_lidar_elevation: int#
Number of elevation bins for the 3-D LiDAR sensor.
- k_objectives: int#
Number of closest objectives tracked per agent.
- n_objectives: int#
Number of shared objectives.
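Summing the feature table gives the per-agent observation size. A small sketch (using the defaults from Create()):

```python
def obs_dim(n_lidar_rays=6, n_lidar_elevation=6, k_objectives=4):
    """Per-agent observation size implied by the feature table above."""
    lidar = n_lidar_rays * n_lidar_elevation
    return (
        3 + 3 + 1           # velocity, angular velocity, magnet flag
        + 3 * lidar         # agent LiDAR, radial velocity, objective LiDAR
        + 3 * k_objectives  # unit directions to the top-k objectives
        + 3 * k_objectives  # clamped displacements to the top-k
        + k_objectives      # occupancy flags
    )
```

With the defaults (6 azimuthal bins, 6 elevation bins, k = 4) this yields 7 + 108 + 12 + 12 + 4 = 143 features per agent.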
- classmethod Create(N: int = 5, n_objectives: int = 5, min_box_size: float = 1.0, max_box_size: float = 1.0, box_padding: float = 0.0, max_steps: int = 5760, friction: float = 0.2, ang_damping: float = 0.07, shaping_weight: float = 2.0, team_height_weight: float = 1.0, goal_weight: float = 0.0, work_weight: float = 0.0, velocity_weight: float = 0.018, goal_radius_factor: float = 1.0, alpha_r_bar: float = 0.07, lidar_range: float = 0.4, n_lidar_rays: int = 6, n_lidar_elevation: int = 6, k_objectives: int = 4, magnet_strength: float = 40.0, magnet_range: float = 0.12) → SwarmRoller3D[source]#
Create a swarm roller 3-D environment.
- Parameters:
N (int) – Number of agents.
n_objectives (int) – Number of shared objectives.
min_box_size (float) – Lower bound of the random square domain side length sampled at each reset().
max_box_size (float) – Upper bound of the random square domain side length sampled at each reset().
box_padding (float) – Extra padding around the domain in multiples of the particle radius.
max_steps (int) – Episode length in physics steps.
friction (float) – Viscous drag coefficient applied as -friction * vel.
ang_damping (float) – Angular damping coefficient applied as -ang_damping * ang_vel.
shaping_weight (float) – Multiplier \(w_s\) on the potential-based shaping signal summed over the k nearest objectives.
team_height_weight (float) – Weight \(w_{th}\) scaling the average z-height of the swarm as a global reward.
goal_weight (float) – Bonus \(w_g\) for being positioned on a target.
work_weight (float) – Weight \(w_w\) of the quadratic action penalty \(\|a\|^2\).
velocity_weight (float) – Penalty \(w_{\mathrm{vel}}\) on the squared velocity magnitude \(\|v_i\|^2\).
goal_radius_factor (float) – Multiplicative factor \(f\) applied to the particle radius to define the goal activation threshold \(d < f \cdot r\).
alpha_r_bar (float) – EMA smoothing factor \(\alpha\) for the differential reward baseline \(\bar{r}\).
lidar_range (float) – Maximum detection range for the LiDAR sensor.
n_lidar_rays (int) – Number of azimuthal LiDAR bins spanning \([-\pi, \pi)\).
n_lidar_elevation (int) – Number of elevation LiDAR bins spanning \([-\pi/2, \pi/2]\).
k_objectives (int) – Number of closest objectives tracked per agent.
magnet_strength (float) – Magnitude of the magnetic attraction force.
magnet_range (float) – Maximum range for magnetic interaction (beyond this the force is zero).
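A hypothetical sketch of how a relative 3-D position maps to LiDAR bins, consistent with the ranges stated above (azimuth in \([-\pi, \pi)\) split into n_lidar_rays bins, elevation in \([-\pi/2, \pi/2]\) split into n_lidar_elevation bins). The environment's internal indexing convention may differ:

```python
import numpy as np

def lidar_bin(rel, n_rays=6, n_elev=6):
    """Map a relative position (x, y, z) to an (azimuth, elevation) bin pair.

    Azimuth bins partition [-pi, pi); elevation bins partition
    [-pi/2, pi/2], with the top edge folded into the last bin.
    """
    x, y, z = rel
    az = np.arctan2(y, x)               # azimuth in [-pi, pi]
    el = np.arctan2(z, np.hypot(x, y))  # elevation in [-pi/2, pi/2]
    i = int((az + np.pi) / (2 * np.pi) * n_rays) % n_rays
    j = min(int((el + np.pi / 2) / np.pi * n_elev), n_elev - 1)
    return i, j
```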
- Returns:
A freshly constructed environment (call reset() before use).
- Return type:
SwarmRoller3D
- static reset(env: SwarmRoller3D, key: Array | ndarray | bool | number | bool | int | float | complex | TypedNdArray) → Environment[source]#
Reset the environment to a random initial configuration.
- Parameters:
env (Environment) – The environment instance to reset.
key (ArrayLike) – PRNG key used to sample the domain, positions, objectives, and initial velocities.
- Returns:
The environment with a fresh episode state.
- Return type:
Environment
- static step(env: SwarmRoller3D, action: Array) → Environment[source]#
Advance the environment by one physics step.
Applies torque actions with angular damping, viscous drag, and pairwise magnetic attraction. After integration the method updates all sensor caches and computes the reward with a differential baseline. The shaping signal is summed over the k nearest objectives.
- Parameters:
env (Environment) – Current environment.
action (jax.Array) – Actions for every agent, shape (N * 4,) (3-D torque + magnet flag).
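The flat action vector can be split into per-agent components. Only the (N * 4,) shape is documented; the [torque_x, torque_y, torque_z, magnet] ordering within each group of four is an assumption for illustration:

```python
import numpy as np

N = 3
flat_action = np.zeros(N * 4)          # flat vector as passed to step()
per_agent = flat_action.reshape(N, 4)  # one row of four values per agent
torque = per_agent[:, :3]              # assumed: 3-D torque command
magnet = per_agent[:, 3]               # assumed: binary magnet flag
```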
- Returns:
Updated environment after physics integration, sensor updates, and reward computation.
- Return type:
Environment
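The differential baseline mentioned above is a standard exponential moving average: the agent receives \(r - \bar{r}\) while \(\bar{r}\) tracks recent rewards with factor alpha_r_bar. A minimal sketch (the exact update order inside step() is an assumption):

```python
def ema_update(r_bar, r, alpha=0.07):
    """One EMA step of the reward baseline.

    With alpha = alpha_r_bar (default 0.07), the baseline moves a
    small fraction of the way toward the latest reward each step,
    so persistent rewards are gradually absorbed into the baseline.
    """
    return (1.0 - alpha) * r_bar + alpha * r
```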
- static observation(env: SwarmRoller3D) → Array[source]#
Build the per-agent observation vector from cached sensors. All state-dependent components are pre-computed in step() and reset(); this method only concatenates cached arrays.
- Returns:
Observation matrix of shape (N, obs_dim). See the class docstring for the feature layout.
- Return type:
jax.Array
- static reward(env: SwarmRoller3D) → Array[source]#
Return the reward cached by step().
- Returns:
Reward vector of shape (N,).
- Return type:
jax.Array
- static done(env: SwarmRoller3D) → Array[source]#
Return True when the episode has exceeded max_steps.