jaxdem.rl.environments.multi_navigator#

Multi-agent 2-D navigation with collision avoidance and cooperative rewards.

Classes

MultiNavigator(state, system, env_params, ...)

Multi-agent 2-D navigation with cooperative rewards.

class jaxdem.rl.environments.multi_navigator.MultiNavigator(state: State, system: System, env_params: dict[str, Any], n_lidar_rays: int)#

Bases: Environment

Multi-agent 2-D navigation with cooperative rewards.

Each agent controls a force vector applied directly to a sphere inside a reflective box. A viscous drag force -friction * vel is added every step. Objectives are assigned one-to-one via a random permutation, and each agent receives a random priority scalar at reset for symmetry breaking.

Reward

\[R_i = w_s\,(e^{-2d_i} - e^{-2d_i^{\mathrm{prev}}}) + w_g\,\mathbf{1}[d_i < f \cdot r_i] - w_c\,\left\|\sum_j l_j\,\hat{r}_j\right\| - w_w\,\|a_i\|^2 - \bar{r}_i\]

where \(l_j\) and \(\hat{r}_j\) are the LiDAR readings and ray directions respectively, and \(\bar{r}_i\) is an EMA baseline updated with factor \(\alpha\). All weights (\(w_s, w_g, w_c, w_w, \alpha, f\)) are constructor parameters stored in env_params.
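As a concrete illustration, the per-agent reward can be sketched in plain NumPy. The weights below are the constructor defaults from Create(); the distances, LiDAR readings, ray directions, and baseline are illustrative placeholders, not values produced by the environment:

```python
import numpy as np

# Constructor defaults for the weights; everything else is a made-up example.
w_s, w_g, w_c, w_w, f = 1.5, 0.001, 0.005, 0.0005, 1.0
d, d_prev, radius = 0.4, 0.5, 0.05            # current/previous distance to goal
action = np.array([0.3, -0.1])                # force action a_i
lidar = np.array([0.0, 0.6, 0.2, 0.0])        # normalised proximities l_j
angles = np.array([0.0, np.pi / 2, np.pi, -np.pi / 2])
ray_dirs = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # unit vectors
r_bar = 0.01                                  # EMA baseline \bar{r}_i

shaping = w_s * (np.exp(-2 * d) - np.exp(-2 * d_prev))   # potential-based term
goal_bonus = w_g * float(d < f * radius)                 # on-target indicator
crowding = w_c * np.linalg.norm(lidar @ ray_dirs)        # ||sum_j l_j r_hat_j||
work = w_w * np.sum(action ** 2)                         # quadratic action cost
reward = shaping + goal_bonus - crowding - work - r_bar
```

Because the shaping term compares exp(-2d) against its previous value, the reward is positive when the agent moves closer to its objective and negative when it drifts away, independent of the absolute distance.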

Notes

The observation vector per agent is:

Feature                        Size
-----------------------------  ------------
Unit direction to objective    dim
Clamped displacement           dim
Velocity                       dim
Own priority                   1
LiDAR proximity (normalised)   n_lidar_rays
Radial relative velocity       n_lidar_rays
LiDAR neighbour priority       n_lidar_rays

n_lidar_rays: int#

Number of angular bins for each LiDAR sensor.
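The exact angular layout of the bins is not spelled out beyond spanning \([-\pi, \pi)\); assuming evenly spaced rays with the endpoint excluded, the directions could be generated as:

```python
import numpy as np

n_lidar_rays = 8
# Evenly spaced ray angles covering [-pi, pi), endpoint excluded.
angles = -np.pi + 2 * np.pi * np.arange(n_lidar_rays) / n_lidar_rays
ray_dirs = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # shape (8, 2)
```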

classmethod Create(N: int = 64, min_box_size: float = 1.0, max_box_size: float = 1.0, box_padding: float = 5.0, max_steps: int = 5760, friction: float = 0.2, shaping_weight: float = 1.5, goal_weight: float = 0.001, crowding_weight: float = 0.005, work_weight: float = 0.0005, goal_radius_factor: float = 1.0, alpha_r_bar: float = 0.07, lidar_range: float = 0.3, n_lidar_rays: int = 8) MultiNavigator[source]#

Create a multi-agent navigator environment.

Parameters:
  • N (int) – Number of agents.

  • min_box_size (float) – Lower bound on the side length of the random square domain sampled at each reset().

  • max_box_size (float) – Upper bound on the side length of the random square domain sampled at each reset().

  • box_padding (float) – Extra padding around the domain in multiples of the particle radius.

  • max_steps (int) – Episode length in physics steps.

  • friction (float) – Viscous drag coefficient applied as -friction * vel.

  • shaping_weight (float) – Multiplier \(w_s\) on the potential-based shaping signal.

  • goal_weight (float) – Bonus \(w_g\) for being on target.

  • crowding_weight (float) – Penalty \(w_c\) per unit of LiDAR proximity sum.

  • work_weight (float) – Weight \(w_w\) of the quadratic action penalty \(\|a\|^2\).

  • goal_radius_factor (float) – Multiplicative factor \(f\) applied to the particle radius to define the goal activation threshold \(d < f \cdot r\).

  • alpha_r_bar (float) – EMA smoothing factor \(\alpha\) for the differential reward baseline \(\bar{r}\).

  • lidar_range (float) – Maximum detection range for the LiDAR sensor.

  • n_lidar_rays (int) – Number of angular LiDAR bins spanning \([-\pi, \pi)\).

Returns:

A freshly constructed environment (call reset() before use).

Return type:

MultiNavigator

static reset(env: MultiNavigator, key: Array | ndarray | bool | number | int | float | complex | TypedNdArray) Environment[source]#

Reset the environment to a random initial configuration.

Parameters:
  • env (Environment) – The environment instance to reset.

  • key (ArrayLike) – PRNG key used to sample the domain, positions, objectives, and initial velocities.

Returns:

The environment with a fresh episode state.

Return type:

Environment

static step(env: MultiNavigator, action: Array) Environment[source]#

Advance the environment by one physics step.

Applies force actions with viscous drag. After integration the method updates LiDAR sensors, displacement caches, and computes the reward with a differential baseline.

Parameters:
  • env (Environment) – Current environment.

  • action (jax.Array) – Force actions for every agent, shape (N * dim,).

Returns:

Updated environment after physics integration, sensor updates, and reward computation.

Return type:

Environment
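The force-plus-drag update described above can be sketched as follows. This assumes a simple semi-implicit Euler step for illustration; the actual integration is delegated to the System object:

```python
import numpy as np

friction, dt = 0.2, 0.01
vel = np.array([[1.0, 0.0]])        # (N, dim) agent velocities
action = np.array([[0.0, 2.0]])     # (N, dim) force actions

# Total force: commanded action plus viscous drag -friction * vel.
force = action - friction * vel
vel = vel + dt * force              # semi-implicit Euler velocity update
pos_delta = dt * vel                # positions advance with the new velocity
```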

static observation(env: MultiNavigator) Array[source]#

Build the per-agent observation vector from cached sensors.

All state-dependent components are pre-computed in step() and reset(). This method only concatenates cached arrays.

Returns:

Observation matrix of shape (N, obs_dim). See the class docstring for the feature layout.

Return type:

jax.Array

static reward(env: MultiNavigator) Array[source]#

Return the reward cached by step().

Returns:

Reward vector of shape (N,).

Return type:

jax.Array

static done(env: MultiNavigator) Array[source]#

Return True when the episode has exceeded max_steps.

property action_space_size: int[source]#

Number of scalar actions per agent (equal to dim).

property action_space_shape: tuple[int][source]#

Shape of a single agent’s action ((dim,)).

property observation_space_size: int[source]#

Dimensionality of a single agent’s observation vector.
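From the feature table in the class docstring, the observation size presumably works out to 3 * dim + 1 + 3 * n_lidar_rays. This is an inference from the listed features, not a formula quoted from the source:

```python
def obs_dim(dim: int, n_lidar_rays: int) -> int:
    # direction-to-goal, clamped displacement, and velocity (each of length dim),
    # one priority scalar, and three LiDAR channels of n_lidar_rays bins each.
    return 3 * dim + 1 + 3 * n_lidar_rays

obs_dim(2, 8)  # 3*2 + 1 + 3*8 = 31
```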