jaxdem.rl.environments.multi_navigator#
Multi-agent navigation task with collision penalties.
Classes
| MultiNavigator(state, system, env_params, n_lidar_rays) | Multi-agent navigation environment with collision penalties. |
- class jaxdem.rl.environments.multi_navigator.MultiNavigator(state: State, system: System, env_params: Dict[str, Any], n_lidar_rays: int)[source]#
Bases: Environment
Multi-agent navigation environment with collision penalties.
Agents seek fixed objectives in a 2D reflective box. Each step applies a force-like action, advances simple dynamics, updates LiDAR, and returns shaped rewards with an optional final bonus on goal.
- n_lidar_rays: int#
Number of LiDAR rays for the vision system.
- classmethod Create(N: int = 64, min_box_size: float = 1.0, max_box_size: float = 1.0, box_padding: float = 5.0, max_steps: int = 5760, final_reward: float = 1.0, shaping_factor: float = 0.005, prev_shaping_factor: float = 0.0, global_shaping_factor: float = 0.0, collision_penalty: float = -0.005, goal_threshold: float = 0.6666666666666666, lidar_range: float = 0.45, n_lidar_rays: int = 16) → MultiNavigator[source][source]#
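A minimal construction sketch, assuming the module path shown on this page; the keyword values are the documented defaults:

    from jaxdem.rl.environments.multi_navigator import MultiNavigator

    # 64 agents, 16 LiDAR rays, 5760-step episodes (documented defaults).
    env = MultiNavigator.Create(N=64, n_lidar_rays=16, max_steps=5760)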
- static reset(env: Environment, key: Array | ndarray | bool | number | bool | int | float | complex | TypedNdArray) → Environment[source][source]#
Initialize the environment with randomly placed particles and velocities.
- Parameters:
env (Environment) – Current environment instance.
key (jax.random.PRNGKey) – JAX random number generator key.
- Returns:
Freshly initialized environment.
- Return type:
Environment
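An illustrative reset, continuing from the Create example above and assuming a standard JAX PRNG key:

    import jax

    key = jax.random.PRNGKey(0)
    # reset is a static method: pass the current environment and a key,
    # get back a freshly initialized environment.
    env = MultiNavigator.reset(env, key)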
- static step(env: Environment, action: Array) → Environment[source][source]#
Advance one step. Actions are forces; simple drag is applied.
- Parameters:
env (Environment) – The current environment.
action (jax.Array) – The vector of actions each agent in the environment should take.
- Returns:
The updated environment state.
- Return type:
Environment
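A sketch of a single step, continuing from the examples above; actions are per-agent force vectors of shape (N, action_space_size):

    import jax.numpy as jnp

    # Zero force for each of the 64 agents.
    action = jnp.zeros((64, env.action_space_size))
    env = MultiNavigator.step(env, action)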
- static observation(env: Environment) → Array[source][source]#
Build per-agent observations.
Contents per agent#
- Wrapped displacement to objective Δx (shape (2,)).
- Velocity v (shape (2,)).
- LiDAR proximities (shape (n_lidar_rays,)).
- Returns:
Array of shape (N, 2 * dim + n_lidar_rays), scaled by the maximum box size for normalization.
- Return type:
jax.Array
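Continuing the example above, the observation width follows from the documented layout (with dim = 2 and 16 LiDAR rays, each agent sees 2 + 2 + 16 = 20 values):

    obs = MultiNavigator.observation(env)
    # Displacement (2) + velocity (2) + LiDAR proximities (16) per agent.
    assert obs.shape == (64, 20)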
- static reward(env: Environment) → Array[source][source]#
Per-agent reward with distance shaping, goal bonus, LiDAR collision penalty, and a global shaping term.
Equations
Let \(\delta_i=\operatorname{displacement}(\mathbf{x}_i,\mathbf{objective})\), \(d_i=\lVert\delta_i\rVert_2\), \(d^{\text{prev}}_i\) the distance at the previous step, and \(\mathbf{1}[\cdot]\) the indicator function. With shaping factors \(\alpha_{\text{prev}},\alpha\), final reward \(R_f\), collision penalty \(C\), global shaping factor \(\beta\), and agent radius \(r_i\), let \(\ell_{i,k}\) be the LiDAR proximity for agent \(i\) and ray \(k\), and \(h_i = \sum_k \mathbf{1}[\ell_{i,k} > (\text{lidar\_range} - 2r_i)]\) the collision count. The reward is:
\[\mathrm{rew}^{\text{shape}}_i = \alpha_{\text{prev}}\,d^{\text{prev}}_i - \alpha\, d_i\]
\[\mathrm{rew}_i = \mathrm{rew}^{\text{shape}}_i + R_f\,\mathbf{1}[\,d_i < \text{goal\_threshold}\times r_i\,] + C\, h_i - \beta\, \overline{d}\]
\[\overline{d} = \tfrac{1}{N}\sum_j d_j\]
- Parameters:
env (Environment) – Current environment.
- Returns:
Shape (N,). The normalized per-agent reward vector.
- Return type:
jax.Array
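A standalone numerical sketch of the equations above, using the documented default factors; it mirrors the formula only and is not the library's internal implementation (the normalization mentioned in the return description is omitted):

    import jax.numpy as jnp

    def shaped_reward(d, d_prev, lidar, radius,
                      alpha=0.005, alpha_prev=0.0, beta=0.0,
                      final_reward=1.0, collision_penalty=-0.005,
                      goal_threshold=2.0 / 3.0, lidar_range=0.45):
        # d, d_prev: (N,) current/previous distance to each agent's objective.
        # lidar:     (N, K) LiDAR proximities; radius: (N,) agent radii.
        shaping = alpha_prev * d_prev - alpha * d                  # distance shaping
        goal_bonus = final_reward * (d < goal_threshold * radius)  # goal indicator
        hits = jnp.sum(lidar > (lidar_range - 2.0 * radius)[:, None], axis=1)
        return shaping + goal_bonus + collision_penalty * hits - beta * jnp.mean(d)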
- static done(env: Environment) → Array[source][source]#
Returns a boolean indicating whether the environment has ended. The episode terminates when the maximum number of steps is reached.
- Parameters:
env (Environment) – The current environment.
- Returns:
Boolean array indicating whether the episode has ended.
- Return type:
jax.Array
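Putting the pieces together, an illustrative rollout loop continuing from the examples above (plain Python for clarity; policy(obs) is a hypothetical mapping from observations to forces, and a real training loop would more likely use jax.lax.scan):

    import jax.numpy as jnp

    obs = MultiNavigator.observation(env)
    while not bool(jnp.all(MultiNavigator.done(env))):
        action = policy(obs)                    # hypothetical policy
        env = MultiNavigator.step(env, action)
        obs = MultiNavigator.observation(env)
        rew = MultiNavigator.reward(env)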
- property action_space_size: int[source]#
Flattened action size per agent. Actions passed to step() have shape (A, action_space_size).
- property action_space_shape: Tuple[int][source]#
Original per-agent action shape (useful for reshaping inside the environment).
- property observation_space_size: int[source]#
Flattened observation size per agent. observation() returns shape (A, observation_space_size).
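A small sketch of how the space properties might be used, continuing from the Create example; the flat per-agent sizes are convenient for sizing a policy network, and action_space_shape recovers the original per-agent layout:

    import jax.numpy as jnp

    obs_dim = env.observation_space_size   # 2 * dim + n_lidar_rays
    act_dim = env.action_space_size

    flat_actions = jnp.zeros((64, act_dim))
    per_agent = flat_actions.reshape((64, *env.action_space_shape))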