jaxdem.rl.environments.multi_navigator
Multi-agent navigation task with collision penalties.
Classes
MultiNavigator – Multi-agent navigation environment with collision penalties.
- class jaxdem.rl.environments.multi_navigator.MultiNavigator(state: State, system: System, env_params: Dict[str, Any], max_num_agents: int = 0, action_space_size: int = 0, action_space_shape: Tuple[int, ...] = (), observation_space_size: int = 0)
Bases: Environment
Multi-agent navigation environment with collision penalties.
- classmethod Create(N: int = 2, min_box_size: float = 1.0, max_box_size: float = 2.0, max_steps: int = 5000, final_reward: float = 0.05, shaping_factor: float = 1.0, collision_penalty: float = -2.0, lidar_range: float = 0.35, n_lidar_rays: int = 12) → MultiNavigator
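Create carries no rendered docstring; the following is a minimal construction sketch based only on the signature above (every keyword shown is the documented default, so this call is equivalent to MultiNavigator.Create()):

```python
from jaxdem.rl.environments.multi_navigator import MultiNavigator

# Build an environment with two agents, using the documented defaults.
env = MultiNavigator.Create(
    N=2,
    max_steps=5000,
    shaping_factor=1.0,
    collision_penalty=-2.0,
    lidar_range=0.35,
    n_lidar_rays=12,
)
```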
- static reset(env: Environment, key: Array | ndarray | bool | number | bool | int | float | complex) → Environment
Initialize the environment with randomly placed particles and velocities.
- Parameters:
env (Environment) – Current environment instance.
key (jax.random.PRNGKey) – JAX random number generator key.
- Returns:
Freshly initialized environment.
- Return type:
Environment
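A short usage sketch, assuming reset is called as the static method documented above on an environment built with Create:

```python
import jax

# reset takes the current environment and a PRNG key and returns a freshly
# initialized environment with randomly placed particles and velocities.
key = jax.random.PRNGKey(0)
env = MultiNavigator.reset(env, key)
```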
- static step(env: Environment, action: Array) → Environment
Advance the simulation by one step. Actions are interpreted as accelerations.
- Parameters:
env (Environment) – The current environment.
action (jax.Array) – The vector of actions each agent in the environment should take.
- Returns:
The updated environment state.
- Return type:
Environment
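A sketch of advancing one step with zero acceleration for every agent; the action layout is assumed to match env.action_space_shape (an __init__ field above), which is an illustrative assumption rather than something stated in the source:

```python
import jax.numpy as jnp

# Actions are interpreted as accelerations; here every agent coasts.
action = jnp.zeros(env.action_space_shape)  # assumed action layout
env = MultiNavigator.step(env, action)
```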
- static observation(env: Environment) → Array
Returns the observation vector for each agent.
LiDAR bins store proximity values as max(0, R - d_min); a value of 0 means either no detection or that the nearest object lies beyond the LiDAR range R. The observation concatenates the displacement to the objective, the particle velocity, and the LiDAR readings normalized by R.
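A tiny numeric illustration of the proximity convention described above; R = 0.35 is the default lidar_range from Create, and the per-beam distances are made up:

```python
import jax.numpy as jnp

R = 0.35                                  # default lidar_range
d_min = jnp.array([0.10, 0.30, 0.50])     # hypothetical nearest-hit distance per beam
proximity = jnp.maximum(0.0, R - d_min)   # ~[0.25, 0.05, 0.0]; 0.0 = nothing within range
normalized = proximity / R                # readings concatenated into the observation
```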
- static reward(env: Environment) → Array
Returns a vector of per-agent rewards.
Equation
Let \(\delta_i=\operatorname{displacement}(\mathbf{x}_i,\mathbf{objective})\), \(d_i=\lVert\delta_i\rVert_2\), and \(\mathbf{1}[\cdot]\) the indicator. With shaping factor \(\alpha\), final reward \(R_f\), radius \(r_i\), previous reward \(\mathrm{rew}^{\text{prev}}_i\), collision-penalty coefficient \(C_\mathrm{col}\le 0\), LiDAR range \(R\), measured proximities \(\mathrm{prox}_{i,j}\), and safety factor \(\kappa=2.05\):
\[\mathrm{rew}^{\text{shape}}_i \;=\; \mathrm{rew}^{\text{prev}}_i \;-\; \alpha\, d_i\]
Define per-beam “too close” hits using a distance threshold \(\tau_i = \max(0,\, R - \kappa\, r_i)\):
\[\mathrm{hit}_{i,j} \;=\; \mathbf{1}\!\left[\,\mathrm{prox}_{i,j} > \tau_i\,\right],\qquad n^{\text{hits}}_i \;=\; \sum_j \mathrm{hit}_{i,j}\]
Total reward:
\[\mathrm{rew}_i \;=\; \mathrm{rew}^{\text{shape}}_i \;+\; R_f\,\mathbf{1}[\,d_i < r_i\,] \;+\; C_\mathrm{col}\, n^{\text{hits}}_i\]
The function updates \(\mathrm{rew}^{\text{prev}}_i \leftarrow \mathrm{rew}^{\text{shape}}_i\) and returns \((\mathrm{rew}_i)_{i=1}^N\) reshaped to (env.max_num_agents,).
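The equations above translate directly into a few lines of JAX. The sketch below is not the package's implementation (the variable names and batching layout are assumptions), but it reproduces the shaping, arrival-bonus, and collision-penalty terms:

```python
import jax.numpy as jnp

def reward_sketch(dist, prev_rew, prox, radius,
                  alpha=1.0, final_reward=0.05, collision_penalty=-2.0,
                  lidar_range=0.35, kappa=2.05):
    """dist, prev_rew, radius: (N,) arrays; prox: (N, n_lidar_rays) proximities."""
    shaped = prev_rew - alpha * dist                       # rew^shape_i
    tau = jnp.maximum(0.0, lidar_range - kappa * radius)   # per-agent "too close" threshold
    n_hits = jnp.sum(prox > tau[:, None], axis=-1)         # beams exceeding the threshold
    reward = shaped + final_reward * (dist < radius) + collision_penalty * n_hits
    return reward, shaped                                  # shaped becomes the next prev_rew
```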
- static done(env: Environment) → Array
Returns a boolean indicating whether the environment has ended. The episode terminates when the maximum number of steps is reached.
- Parameters:
env (Environment) – The current environment.
- Returns:
Boolean array indicating whether the episode has ended.
- Return type:
jax.Array
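Putting the pieces together, a minimal episode loop might look like the following. The functional env-in/env-out threading follows the signatures on this page, while the zero action and fixed step count are illustrative:

```python
import jax
import jax.numpy as jnp

env = MultiNavigator.Create(N=2)
env = MultiNavigator.reset(env, jax.random.PRNGKey(42))

for _ in range(100):
    obs = MultiNavigator.observation(env)        # per-agent observation vectors
    action = jnp.zeros(env.action_space_shape)   # assumed action layout, as above
    env = MultiNavigator.step(env, action)
    rewards = MultiNavigator.reward(env)         # shape (env.max_num_agents,)
    if bool(MultiNavigator.done(env)):           # episode ends at max_steps
        break
```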