jaxdem.rl.environments.swarm_navigator
Environment where multiple agents navigate towards nearby shared targets.
Classes
SwarmNavigator | Multi-agent navigation environment toward nearby shared targets.
- class jaxdem.rl.environments.swarm_navigator.SwarmNavigator(state: State, system: System, env_params: dict[str, Any], n_lidar_rays: int)
Bases: Environment
Multi-agent navigation environment toward nearby shared targets.
Each agent controls a force vector that is applied directly to a sphere inside a reflective box. Viscous drag -friction * vel is added each step. Objectives are sampled globally, and each agent observes objective LiDAR and agent LiDAR.
At reset, a small subset of agents is spawned in the central objective region while the rest are spawned in the outer padding ring.
The reward uses exponential potential-based shaping:
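\[R_i = (S_i - S_i^{\mathrm{prev}}) - w_{\mathrm{ke}}(K_i - K_i^{\mathrm{prev}}) + w_{\mathrm{coop}} \cdot \frac{1}{N}\sum_m (S_m - S_m^{\mathrm{prev}}) + w_{\mathrm{near}}\,\mathbf{1}[d_i \le r_i]\]
where \(d_i\) is the distance to the closest objective, \(K_i\) is the translational kinetic energy of agent \(i\), and \(S_i = \sum_{r \in \text{obj-LiDAR}} e^{-2 d_{ir}}\) sums exponential shaping over objectives detected by objective LiDAR rays. The indicator term applies when agent \(i\) is within one radius \(r_i\) of its closest objective. The weights \(w_{\mathrm{ke}}\), \(w_{\mathrm{coop}}\), and \(w_{\mathrm{near}}\) correspond to the ke_weight, coop_weight, and near_goal_bonus parameters of Create().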
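The shaping potential \(S_i\) can be illustrated with a minimal sketch in plain Python. The function name and the sample distances are hypothetical and are not part of the jaxdem API; only the \(e^{-2d}\) form is taken from the formula above.

```python
import math


def shaping_potential(detected_distances):
    """Return S_i: the sum of exp(-2 * d) over detected objective distances."""
    return sum(math.exp(-2.0 * d) for d in detected_distances)


# Hypothetical distances reported by one agent's objective-LiDAR rays
# before and after a step. Rays that detect nothing contribute no term.
s_prev = shaping_potential([1.5, 3.0])
s_now = shaping_potential([1.0, 3.0])

# Moving closer to a detected objective raises the potential, so the
# progress term S_i - S_i^prev is positive.
progress = s_now - s_prev
```

Because each term decays exponentially with distance, nearby objectives dominate the sum and distant ones contribute almost nothing.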
Notes
The observation vector per agent is:
Feature | Size
Velocity | dim
Objective LiDAR proximity | n_lidar_rays
Agent LiDAR proximity | n_lidar_rays
- n_lidar_rays: int
Number of angular bins for each LiDAR sensor.
- classmethod Create(N: int = 64, min_box_size: float = 20.0, max_box_size: float = 20.0, box_padding: float = 20.0, max_steps: int = 100000, friction: float = 0.2, ke_weight: float = 0.1, coop_weight: float = 0.2, near_goal_bonus: float = 0.1, lidar_range: float = 10.0, n_lidar_rays: int = 24) → SwarmNavigator
Create a swarm navigator environment.
- Parameters:
N (int) – Number of agents and number of sampled objectives.
min_box_size (float) – Lower bound of the range from which the square domain side length is sampled at each reset().
max_box_size (float) – Upper bound of the range from which the square domain side length is sampled at each reset().
box_padding (float) – Extra padding around the domain in multiples of the particle radius. The padding region is used as the outer spawn ring.
max_steps (int) – Episode length in physics steps.
friction (float) – Viscous drag coefficient applied as -friction * vel.
ke_weight (float) – Weight for the differential kinetic energy penalty.
coop_weight (float) – Weight for the shared team-progress bonus.
near_goal_bonus (float) – Reward bonus applied when an agent is within one radius of its closest objective.
lidar_range (float) – Maximum detection range for the LiDAR sensor.
n_lidar_rays (int) – Number of angular LiDAR bins spanning \([-\pi, \pi)\).
- Returns:
A freshly constructed environment (call reset() before use).
- Return type:
SwarmNavigator
- static reset(env: SwarmNavigator, key: Array | ndarray | bool | number | bool | int | float | complex) → Environment
Initialize the environment with random positions and objectives.
- Parameters:
env (Environment) – Current environment instance.
key (ArrayLike) – JAX random number generator key.
- Returns:
Freshly initialized environment.
- Return type:
Environment
- static step(env: SwarmNavigator, action: Array) → Environment
Advance one step. Actions are forces; simple drag is applied (-friction * vel).
- Parameters:
env (Environment) – The current environment.
action (jax.Array) – The vector of actions each agent in the environment should take.
- Returns:
The updated environment state.
- Return type:
Environment
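To make the force-plus-drag update concrete, here is a minimal single-agent sketch in plain Python. It assumes unit mass, a fixed time step dt, and a semi-implicit Euler integrator, and it ignores the reflective box walls. None of these details are confirmed by this page, and drag_step is a hypothetical helper, not a jaxdem function.

```python
def drag_step(pos, vel, action, friction=0.2, mass=1.0, dt=0.01):
    """Advance one agent by one step with net force = action - friction * vel.

    Assumed integrator: semi-implicit Euler (velocity first, then position).
    """
    force = [a - friction * v for a, v in zip(action, vel)]
    new_vel = [v + dt * f / mass for v, f in zip(vel, force)]
    new_pos = [p + dt * v for p, v in zip(pos, new_vel)]
    return new_pos, new_vel


# An agent coasting along x with zero applied force slows down through drag.
pos, vel = [0.0, 0.0], [1.0, 0.0]
pos, vel = drag_step(pos, vel, action=[0.0, 0.0])
```

Under a constant action, this model drives the velocity toward action / friction, so the drag coefficient bounds how fast an agent can travel for a given force.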
- static observation(env: SwarmNavigator) → Array
Build per-agent observations.
Contents per agent
Velocity (shape (dim,)).
Objective LiDAR proximity, normalized by lidar_range (shape (n_lidar_rays,)).
Agent LiDAR proximity, normalized by lidar_range (shape (n_lidar_rays,)).
- Returns:
Array of shape (N, dim + 2 * n_lidar_rays)
- Return type:
jax.Array
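The layout of one agent's observation can be sketched in plain Python for a 2D domain. The proximity definition used here, (lidar_range - distance) / lidar_range with 0 for an empty bin, is an assumption chosen for illustration, as is the rule of keeping the closest hit per bin. lidar_proximity is a hypothetical helper, not the library's implementation.

```python
import math


def lidar_proximity(origin, targets, n_rays=24, lidar_range=10.0):
    """Per-bin proximity in [0, 1]; 0 means nothing detected in that bin."""
    proximity = [0.0] * n_rays
    for tx, ty in targets:
        dx, dy = tx - origin[0], ty - origin[1]
        dist = math.hypot(dx, dy)
        if dist >= lidar_range:
            continue  # out of range: the bin keeps its current value
        angle = math.atan2(dy, dx)  # lies in [-pi, pi]
        # Map [-pi, pi) onto bins 0 .. n_rays - 1; the modulo wraps the
        # endpoint angle == pi back into bin 0.
        idx = int((angle + math.pi) / (2.0 * math.pi) * n_rays) % n_rays
        value = (lidar_range - dist) / lidar_range  # assumed normalization
        proximity[idx] = max(proximity[idx], value)  # keep the closest hit
    return proximity


# One agent at the origin with one objective and one neighbouring agent.
velocity = [0.3, -0.1]
objective_lidar = lidar_proximity((0.0, 0.0), [(5.0, 0.0)])
agent_lidar = lidar_proximity((0.0, 0.0), [(-1.0, 0.0)])

# Concatenate in the documented order: velocity, objective LiDAR, agent LiDAR.
observation = velocity + objective_lidar + agent_lidar
```

With dim = 2 and n_lidar_rays = 24, the concatenated vector has 2 + 2 * 24 = 50 entries, matching the dim + 2 * n_lidar_rays size given above.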
- static reward(env: SwarmNavigator) → Array
Returns a vector of per-agent rewards.
\[\mathrm{rew}_t = (S_t - S_t^{\mathrm{prev}}) - w_{\text{ke}} (K_t - K_t^{\mathrm{prev}}) + w_{\text{coop}} \cdot \mathrm{mean}\left(S_t - S_t^{\mathrm{prev}}\right) + w_{\text{near}} \cdot \mathbf{1}[d_t \le r]\]
where \(d_t\) is the distance to the closest objective at step \(t\), \(K_t\) is the kinetic energy at step \(t\), and \(S_t\) is the per-agent sum of \(e^{-2d}\) over objectives detected by objective LiDAR rays. \(w_{\text{ke}}\) is the kinetic-energy penalty weight, \(w_{\text{coop}}\) weights a shared team-progress bonus (the mean is taken over all agents), and \(w_{\text{near}}\) weights a near-goal bonus.
- Parameters:
env (Environment) – Current environment.
- Returns:
Shape (N,).
- Return type:
jax.Array
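As a worked instance of the reward formula, the following plain-Python sketch evaluates it for two agents. The input values are made up for illustration, the default weights mirror the Create() defaults, and rewards is a hypothetical helper rather than the library's implementation.

```python
def rewards(s, s_prev, ke, ke_prev, dist, radius,
            ke_weight=0.1, coop_weight=0.2, near_goal_bonus=0.1):
    """Evaluate the per-agent reward formula on plain Python lists."""
    progress = [a - b for a, b in zip(s, s_prev)]
    team_progress = sum(progress) / len(progress)  # mean over all agents
    out = []
    for i in range(len(s)):
        near = 1.0 if dist[i] <= radius[i] else 0.0  # indicator 1[d <= r]
        out.append(
            progress[i]
            - ke_weight * (ke[i] - ke_prev[i])
            + coop_weight * team_progress
            + near_goal_bonus * near
        )
    return out


# Agent 0 approaches an objective and speeds up; agent 1 does not change.
r = rewards(
    s=[0.5, 0.2], s_prev=[0.3, 0.2],
    ke=[1.0, 0.5], ke_prev=[0.5, 0.5],
    dist=[0.4, 3.0], radius=[0.5, 0.5],
)
# Agent 0: 0.2 - 0.1 * 0.5 + 0.2 * 0.1 + 0.1 = 0.27
# Agent 1: 0.0 - 0.0       + 0.2 * 0.1 + 0.0 = 0.02
```

Agent 1 still receives a small positive reward through the cooperative term even though its own progress is zero.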
- static done(env: SwarmNavigator) → Array
Returns a boolean indicating whether the environment has ended. The episode terminates when the maximum number of steps is reached.
- Parameters:
env (Environment) – The current environment.
- Returns:
Boolean array indicating whether the environment has ended.
- Return type:
jax.Array
- property action_space_size: int
Flattened action size per agent. Actions passed to step() have shape (A, action_space_size).
- property action_space_shape: tuple[int]
Original per-agent action shape (useful for reshaping inside the environment).
- property observation_space_size: int
Flattened observation size per agent. observation() returns shape (A, observation_space_size).