jaxdem.rl.environments.multi_navigator

Multi-agent navigation task with collision penalties.

Functions

PoissonDisk(N, dim, rad, l_bounds, u_bounds, key)

Sample N well-separated points in dim dimensions inside [l_bounds, u_bounds] via Poisson-disk sampling.

Classes

MultiNavigator(state, system, env_params[, ...])

Multi-agent navigation environment with collision penalties.

class jaxdem.rl.environments.multi_navigator.MultiNavigator(state: State, system: System, env_params: Dict[str, Any], max_num_agents: int = 0, action_space_size: int = 0, action_space_shape: Tuple[int, ...] = (), observation_space_size: int = 0)

Bases: Environment

Multi-agent navigation environment with collision penalties.

classmethod Create(N: int = 2, min_box_size: float = 1.0, max_box_size: float = 2.0, max_steps: int = 5000, final_reward: float = 0.05, shaping_factor: float = 1.0, collision_penalty: float = -2.0, lidar_range: float = 0.35, n_lidar_rays: int = 12) → MultiNavigator
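
Example

A minimal construction sketch, assuming the class is imported from this module as documented above; the non-default agent count is purely illustrative:

from jaxdem.rl.environments.multi_navigator import MultiNavigator

# Build a 4-agent navigation task; all keyword arguments mirror Create() above.
env = MultiNavigator.Create(
    N=4,
    max_steps=5000,
    final_reward=0.05,
    shaping_factor=1.0,
    collision_penalty=-2.0,
    lidar_range=0.35,
    n_lidar_rays=12,
)
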
static reset(env: Environment, key: Array | ndarray | bool | number | bool | int | float | complex) → Environment

Initialize the environment with randomly placed particles and velocities.

Parameters:
  • env (Environment) – Current environment instance.

  • key (jax.random.PRNGKey) – JAX random number generator key.

Returns:

Freshly initialized environment.

Return type:

Environment
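
Example

A short sketch, assuming env was built with Create() as in the example above:

import jax

# Draw a fresh key and re-initialize positions, velocities, and objectives.
key = jax.random.PRNGKey(0)
env = MultiNavigator.reset(env, key)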

static step(env: Environment, action: Array) → Environment

Advance the simulation by one step. Actions are interpreted as accelerations.

Parameters:
  • env (Environment) – The current environment.

  • action (jax.Array) – Per-agent actions to apply, interpreted as accelerations.

Returns:

The updated environment state.

Return type:

Environment
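
Example

A rollout sketch; the action shape (one acceleration vector per agent, assumed 2-D here) and attribute-style access to max_num_agents are assumptions based on the signature and parameter descriptions above:

import jax.numpy as jnp

# Apply zero acceleration to every agent for a handful of steps.
action = jnp.zeros((env.max_num_agents, 2))  # assumed shape: (agents, spatial dim)
for _ in range(10):
    env = MultiNavigator.step(env, action)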

static observation(env: Environment) → Array

Returns the observation vector for each agent.

LiDAR bins store proximity values as \(\max(0,\, R - d_{\min})\), where \(R\) is the LiDAR range and \(d_{\min}\) is the distance to the nearest object in that bin; a value of 0 means no detection, i.e. the nearest object lies beyond the LiDAR range. The observation concatenates the displacement to the objective, the particle velocity, and the LiDAR readings normalized by \(R\).
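
Example

A sketch of splitting the observation into the documented components; the per-agent layout order and the 2-D spatial dimension are assumptions:

obs = MultiNavigator.observation(env)  # assumed shape: (max_num_agents, observation_space_size)
dim = 2                                # assumed spatial dimension
goal_displacement = obs[..., :dim]     # displacement to the objective
velocity = obs[..., dim:2 * dim]       # particle velocity
lidar = obs[..., 2 * dim:]             # proximities normalized by R; 0 means no detection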

static reward(env: Environment) → Array

Returns a vector of per-agent rewards.

Equation

Let \(\delta_i=\operatorname{displacement}(\mathbf{x}_i,\mathbf{objective})\), \(d_i=\lVert\delta_i\rVert_2\), and \(\mathbf{1}[\cdot]\) the indicator. With shaping factor \(\alpha\), final reward \(R_f\), radius \(r_i\), previous reward \(\mathrm{rew}^{\text{prev}}_i\), collision-penalty coefficient \(C_\mathrm{col}\le 0\), LiDAR range \(R\), measured proximities \(\mathrm{prox}_{i,j}\), and safety factor \(\kappa=2.05\):

\[\mathrm{rew}^{\text{shape}}_i \;=\; \mathrm{rew}^{\text{prev}}_i \;-\; \alpha\, d_i\]

Define per-beam “too close” hits using a distance threshold \(\tau_i = \max(0,\, R - \kappa\, r_i)\):

\[\mathrm{hit}_{i,j} \;=\; \mathbf{1}\!\left[\,\mathrm{prox}_{i,j} > \tau_i\,\right],\qquad n^{\text{hits}}_i \;=\; \sum_j \mathrm{hit}_{i,j}\]

Total reward:

\[\mathrm{rew}_i \;=\; \mathrm{rew}^{\text{shape}}_i \;+\; R_f\,\mathbf{1}[\,d_i < r_i\,] \;+\; C_\mathrm{col}\, n^{\text{hits}}_i\]

The function updates \(\mathrm{rew}^{\text{prev}}_i \leftarrow \mathrm{rew}^{\text{shape}}_i\) and returns \((\mathrm{rew}_i)_{i=1}^N\) reshaped to (env.max_num_agents,).
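
Example

A standalone JAX sketch of the formula above, written for illustration rather than as the library's internal implementation; all names and shapes are illustrative:

import jax.numpy as jnp

def per_agent_reward(displacement, prev_reward, prox, radius,
                     alpha, R_f, C_col, R, kappa=2.05):
    # displacement: (N, dim) displacement to the objective, delta_i
    # prev_reward:  (N,)     previous shaping reward, rew_prev_i
    # prox:         (N, B)   LiDAR proximities prox_{i,j}
    # radius:       (N,)     particle radii r_i
    d = jnp.linalg.norm(displacement, axis=-1)              # d_i
    rew_shape = prev_reward - alpha * d                     # shaping term
    tau = jnp.maximum(0.0, R - kappa * radius)              # "too close" threshold tau_i
    n_hits = jnp.sum(prox > tau[:, None], axis=-1)          # n_hits_i
    rew = rew_shape + R_f * (d < radius) + C_col * n_hits   # total reward
    return rew, rew_shape                                    # rew_shape becomes the new rew_prev_i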

static done(env: Environment) → Array

Returns a boolean array indicating whether the episode has ended. The episode terminates when the maximum number of steps is reached.

Parameters:

env (Environment) – The current environment.

Returns:

Boolean array indicating whether the episode has ended.

Return type:

jax.Array
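
Example

A termination-check sketch, reusing the assumed zero-acceleration action from the step() example above:

import jax
import jax.numpy as jnp

env = MultiNavigator.reset(env, jax.random.PRNGKey(1))
action = jnp.zeros((env.max_num_agents, 2))  # assumed 2-D action shape

# Roll the episode forward until the step limit is reached.
while not bool(jnp.all(MultiNavigator.done(env))):
    env = MultiNavigator.step(env, action)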

classmethod registry_name() → str
property type_name: str