jaxdem.rl.environments.swarm_navigator

Environment where multiple agents navigate towards nearby shared targets.

Classes

SwarmNavigator(state, system, env_params, ...)

Multi-agent navigation environment toward nearby shared targets.

class jaxdem.rl.environments.swarm_navigator.SwarmNavigator(state: State, system: System, env_params: dict[str, Any], n_lidar_rays: int)

Bases: Environment

Multi-agent navigation environment toward nearby shared targets.

Each agent controls a force vector that is applied directly to its sphere inside a reflective box. A viscous drag force, -friction * vel, is added at every step. Objectives are sampled globally and shared by all agents. Each agent observes its own velocity together with two LiDAR sensors: one that detects objectives and one that detects other agents.

At reset, a small subset of agents is spawned in the central objective region while the rest are spawned in the outer padding ring.
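The spawn layout can be illustrated with a small 2D sketch. It assumes that the padding width is box_padding times the particle radius, that the padding surrounds the central square on every side, and that positions are drawn uniformly; the helper name sample_spawn_positions and its n_center argument are hypothetical, and sphere overlaps and wall contacts are ignored. It is a sketch under these assumptions, not the jaxdem implementation.

```python
import random


def sample_spawn_positions(n_agents, box_size, box_padding, radius, n_center, seed=0):
    """Hypothetical 2D spawn sketch (not the jaxdem code).

    The first n_center agents are placed in the central objective square;
    the remaining agents are placed in the padding ring around it.
    Sphere overlaps are ignored for simplicity.
    """
    rng = random.Random(seed)
    pad = box_padding * radius      # assumed: padding width in length units
    outer = box_size + 2.0 * pad    # assumed: padding added on every side
    lo, hi = pad, pad + box_size    # bounds of the central square

    def in_center(x, y):
        return lo <= x <= hi and lo <= y <= hi

    positions = []
    for i in range(n_agents):
        if i < n_center:
            # Central objective region.
            x = rng.uniform(lo, hi)
            y = rng.uniform(lo, hi)
        else:
            # Rejection sampling: redraw in the full box until the point lies in the ring.
            x, y = rng.uniform(0.0, outer), rng.uniform(0.0, outer)
            while in_center(x, y):
                x, y = rng.uniform(0.0, outer), rng.uniform(0.0, outer)
        positions.append((x, y))
    return positions
```

With the default box size of 20, box_padding of 20, and an assumed radius of 0.5, the padding is 10 units wide and the full box is 40 units across.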

The reward uses exponential potential-based shaping:

\[R_i = (S_i - S_i^{\mathrm{prev}}) - w_{\mathrm{ke}}(K_i - K_i^{\mathrm{prev}}) + w_{\mathrm{coop}} \cdot \frac{1}{N}\sum_m (S_m - S_m^{\mathrm{prev}}) + w_{\mathrm{near}}\,\mathbf{1}[d_i \le r_i]\]

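where \(N\) is the number of agents, \(d_i\) is the distance from agent \(i\) to its closest objective, \(r_i\) is the radius of agent \(i\), \(K_i\) is the translational kinetic energy of agent \(i\), and \(S_i = \sum_{k \in \text{obj-LiDAR}} e^{-2 d_{ik}}\) is the exponential shaping potential, summed over the objectives detected by the objective LiDAR rays (\(d_{ik}\) is the distance reported by ray \(k\)). The superscript \(\mathrm{prev}\) denotes the value at the previous step.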
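The shaping potential \(S_i\) can be sketched in plain Python. This assumes each objective-LiDAR ray reports either the distance to the objective it detected or None when nothing is detected; that encoding and the function names are hypothetical. Because each term is \(e^{-2d}\), an objective at distance 0 contributes 1 and the contribution decays quickly with distance, so the progress term \(S_i - S_i^{\mathrm{prev}}\) tends to be positive when an agent approaches the objectives it detects.

```python
import math
from typing import Optional, Sequence


def shaping_potential(ray_distances: Sequence[Optional[float]]) -> float:
    """S_i = sum over detecting objective-LiDAR rays of exp(-2 * d).

    ray_distances[k] is the distance reported by ray k, or None if the
    ray detected no objective (an assumed encoding for this sketch).
    """
    return sum(math.exp(-2.0 * d) for d in ray_distances if d is not None)


def shaping_progress(prev_distances, curr_distances) -> float:
    """Potential-based progress term S_i - S_i^prev for one agent."""
    return shaping_potential(curr_distances) - shaping_potential(prev_distances)
```

For example, moving from distance 2.0 to 1.0 along a single ray yields a progress of \(e^{-2} - e^{-4} \approx 0.117\).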

Notes

The observation vector per agent is:

Feature                      Size
Velocity                     dim
Objective LiDAR proximity    n_lidar_rays
Agent LiDAR proximity        n_lidar_rays
Total                        dim + 2 * n_lidar_rays
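A minimal sketch of splitting one agent's observation row into its feature blocks, assuming the features are concatenated in the order listed in the table (velocity, objective LiDAR, agent LiDAR). The function names are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def observation_layout(dim: int, n_lidar_rays: int) -> Dict[str, Tuple[int, int]]:
    """(start, stop) index ranges of each feature block in one observation row."""
    return {
        "velocity": (0, dim),
        "objective_lidar": (dim, dim + n_lidar_rays),
        "agent_lidar": (dim + n_lidar_rays, dim + 2 * n_lidar_rays),
    }


def split_observation(obs_row: Sequence[float], dim: int, n_lidar_rays: int):
    """Slice a flat observation row into named feature blocks."""
    expected = dim + 2 * n_lidar_rays
    if len(obs_row) != expected:
        raise ValueError(f"expected {expected} features, got {len(obs_row)}")
    layout = observation_layout(dim, n_lidar_rays)
    return {name: list(obs_row[a:b]) for name, (a, b) in layout.items()}
```

In 2D with the default 24 rays, each row has 2 + 2 * 24 = 50 entries.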

n_lidar_rays: int

Number of angular bins for each LiDAR sensor.

classmethod Create(N: int = 64, min_box_size: float = 20.0, max_box_size: float = 20.0, box_padding: float = 20.0, max_steps: int = 100000, friction: float = 0.2, ke_weight: float = 0.1, coop_weight: float = 0.2, near_goal_bonus: float = 0.1, lidar_range: float = 10.0, n_lidar_rays: int = 24) → SwarmNavigator

Create a swarm navigator environment.

Parameters:
  • N (int) – Number of agents and number of sampled objectives.

  • min_box_size (float) – Lower bound of the range from which the square domain side length is sampled at each reset().

  • max_box_size (float) – Upper bound of the range from which the square domain side length is sampled at each reset().

  • box_padding (float) – Extra padding around the domain in multiples of the particle radius. The padding region is used as the outer spawn ring.

  • max_steps (int) – Episode length in physics steps.

  • friction (float) – Viscous drag coefficient applied as -friction * vel.

  • ke_weight (float) – Weight of the penalty on the change in kinetic energy relative to the previous step.

  • coop_weight (float) – Weight for the shared team-progress bonus.

  • near_goal_bonus (float) – Reward bonus applied when an agent's distance to its closest objective is at most its own radius.

  • lidar_range (float) – Maximum detection range of the LiDAR sensors (objective and agent).

  • n_lidar_rays (int) – Number of angular LiDAR bins spanning \([-\pi, \pi)\).

Returns:

A freshly constructed environment (call reset() before use).

Return type:

SwarmNavigator
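How the n_lidar_rays and lidar_range parameters might combine into normalized proximity readings can be sketched for the 2D case. Here each bin covers 2π / n_lidar_rays radians of [-π, π), keeps its nearest point, and reports (lidar_range - d) / lidar_range, so values near 1 mean very close and 0 means nothing in range. The proximity formula, the bin assignment, and the function name are assumptions, not the jaxdem implementation.

```python
import math
from typing import List, Sequence, Tuple


def lidar_proximity(
    offsets: Sequence[Tuple[float, float]],
    n_lidar_rays: int,
    lidar_range: float,
) -> List[float]:
    """Hypothetical 2D LiDAR reading for one agent.

    offsets are (dx, dy) vectors from the agent to each detectable point.
    Each angular bin over [-pi, pi) keeps its nearest point and reports
    (lidar_range - d) / lidar_range; empty bins read 0.0.
    """
    prox = [0.0] * n_lidar_rays
    bin_width = 2.0 * math.pi / n_lidar_rays
    for dx, dy in offsets:
        d = math.hypot(dx, dy)
        if d > lidar_range:
            continue  # beyond the sensor range: not detected
        angle = math.atan2(dy, dx)  # in [-pi, pi]
        # atan2 can return exactly +pi, which wraps to bin 0 (same direction as -pi).
        k = int((angle + math.pi) // bin_width) % n_lidar_rays
        # Keep the largest proximity, i.e. the nearest point in this bin.
        prox[k] = max(prox[k], (lidar_range - d) / lidar_range)
    return prox
```

With 4 bins, a point 1 unit straight ahead (angle 0) lands in bin 2, which covers [0, π/2), and reads 0.9 for lidar_range = 10.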

static reset(env: SwarmNavigator, key: ArrayLike) → Environment

Initialize the environment with random positions and objectives.

Parameters:
  • env (Environment) – Current environment instance.

  • key (ArrayLike) – JAX random number generator key.

Returns:

Freshly initialized environment.

Return type:

Environment

static step(env: SwarmNavigator, action: Array) → Environment

Advance the environment by one step. Each agent's action is applied as a force on its sphere, and a viscous drag force -friction * vel is added.

Parameters:
  • env (Environment) – The current environment.

  • action (jax.Array) – Per-agent actions (forces) with shape (N, action_space_size), one row per agent.

Returns:

The updated environment state.

Return type:

Environment
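The force-plus-drag dynamics described for step() can be illustrated with one semi-implicit Euler update for a single agent. The integrator, the time step dt, and the unit mass are assumptions of this sketch, and wall reflections and sphere-sphere contacts are omitted; the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def drag_euler_step(
    pos: Sequence[float],
    vel: Sequence[float],
    force: Sequence[float],
    friction: float,
    dt: float,
    mass: float = 1.0,
) -> Tuple[List[float], List[float]]:
    """One semi-implicit Euler step with an applied force and viscous drag.

    Net force = action force - friction * vel (the drag described above).
    Velocity is updated first, then position uses the new velocity.
    """
    net = [f - friction * v for f, v in zip(force, vel)]
    new_vel = [v + dt * a / mass for v, a in zip(vel, net)]
    new_pos = [x + dt * v for x, v in zip(pos, new_vel)]
    return new_pos, new_vel
```

Under a constant force F (and a stable dt), repeated steps drive the velocity toward the terminal value F / friction, where the drag exactly cancels the applied force; with the default friction = 0.2 that is 5 F.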

static observation(env: SwarmNavigator) → Array

Build per-agent observations.

Contents per agent

  • Velocity (shape (dim,)).

  • Objective LiDAR proximity, normalized by lidar_range (shape (n_lidar_rays,)).

  • Agent LiDAR proximity, normalized by lidar_range (shape (n_lidar_rays,)).

Returns:

Array of shape (N, dim + 2 * n_lidar_rays)

Return type:

jax.Array

static reward(env: SwarmNavigator) → Array

Returns a vector of per-agent rewards.

\[\mathrm{rew}_i = (S_i - S_i^{\mathrm{prev}}) - w_{\mathrm{ke}}(K_i - K_i^{\mathrm{prev}}) + w_{\mathrm{coop}} \cdot \frac{1}{N}\sum_m (S_m - S_m^{\mathrm{prev}}) + w_{\mathrm{near}}\,\mathbf{1}[d_i \le r_i]\]

where \(d_i\) is the distance from agent \(i\) to its closest objective, \(r_i\) is the radius of agent \(i\), \(K_i\) is its kinetic energy, and \(S_i\) is its sum of \(e^{-2d}\) over the objectives detected by the objective LiDAR rays; the superscript \(\mathrm{prev}\) marks the value at the previous step. The weights are \(w_{\mathrm{ke}}\) (kinetic-energy penalty, ke_weight), \(w_{\mathrm{coop}}\) (shared team-progress bonus, coop_weight), and \(w_{\mathrm{near}}\) (near-goal bonus, near_goal_bonus).

Parameters:

env (Environment) – Current environment.

Returns:

Per-agent rewards with shape (N,).

Return type:

jax.Array
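The reward formula can be evaluated for all agents with a short plain-Python sketch. It takes precomputed potentials, kinetic energies, closest-objective distances, and radii as lists (a hypothetical interface, not the jaxdem signature) and uses the default weights from Create(). The team-progress term is identical for every agent, so each agent is also rewarded for the swarm's average progress.

```python
from typing import List, Sequence


def swarm_rewards(
    S: Sequence[float],
    S_prev: Sequence[float],
    K: Sequence[float],
    K_prev: Sequence[float],
    d_closest: Sequence[float],
    radius: Sequence[float],
    ke_weight: float = 0.1,
    coop_weight: float = 0.2,
    near_goal_bonus: float = 0.1,
) -> List[float]:
    """Per-agent rewards following the reward formula (plain-Python sketch)."""
    n = len(S)
    progress = [S[i] - S_prev[i] for i in range(n)]   # S_i - S_i^prev
    team_progress = sum(progress) / n                 # (1/N) sum_m (S_m - S_m^prev)
    rewards = []
    for i in range(n):
        near = 1.0 if d_closest[i] <= radius[i] else 0.0  # 1[d_i <= r_i]
        rewards.append(
            progress[i]
            - ke_weight * (K[i] - K_prev[i])
            + coop_weight * team_progress
            + near_goal_bonus * near
        )
    return rewards
```

With S = [0.6, 0.5], S_prev = [0.5, 0.5], K = [1.2, 1.0], K_prev = [1.0, 1.0], d_closest = [0.4, 3.0], and radius = [0.5, 0.5], the rewards are approximately [0.19, 0.01]: the first agent gains 0.1 from progress, loses 0.02 to the kinetic-energy penalty, gains 0.01 from the team term, and gains 0.1 from the near-goal bonus.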

static done(env: SwarmNavigator) → Array

Returns a boolean array indicating whether the episode has ended. The episode terminates when the maximum number of steps (max_steps) is reached.

Parameters:

env (Environment) – The current environment.

Returns:

Boolean array indicating whether the environment has ended.

Return type:

jax.Array

property action_space_size: int

Flattened action size per agent. Actions passed to step() have shape (N, action_space_size), where N is the number of agents.

property action_space_shape: tuple[int]

Original per-agent action shape (useful for reshaping inside the environment).

property observation_space_size: int

Flattened observation size per agent. observation() returns shape (N, observation_space_size), where N is the number of agents.