jaxdem.rl.environments.single_navigator#
Environment where a single agent navigates towards a target.
Classes

SingleNavigator
Single-agent navigation environment toward a fixed target.
- class jaxdem.rl.environments.single_navigator.SingleNavigator(state: State, system: System, env_params: dict[str, Any])#
Bases: Environment

Single-agent navigation environment toward a fixed target.
The agent controls a force vector that is applied directly to a sphere inside a reflective box. Viscous drag -friction * vel is added each step. The reward uses exponential potential-based shaping:

\[\mathrm{rew} = e^{-2\,d} - e^{-2\,d^{\mathrm{prev}}}\]

Notes
The observation vector per agent is:
Feature | Size
Unit direction to objective | dim
Clamped displacement | dim
Velocity | dim

- classmethod Create(dim: int = 2, min_box_size: float = 2.0, max_box_size: float = 2.0, max_steps: int = 1000, friction: float = 0.2, work_weight: float = 0.0001) → SingleNavigator[source]#
Create a single-agent navigator environment.
- Parameters:
dim (int) – Spatial dimensionality (2 or 3).
min_box_size (float) – Lower bound for the random square domain side length.
max_box_size (float) – Upper bound for the random square domain side length.
max_steps (int) – Episode length in physics steps.
friction (float) – Viscous drag coefficient applied as -friction * vel.
work_weight (float) – Penalty coefficient for large actions.
- Returns:
A freshly constructed environment (call reset() before use).
- Return type:
SingleNavigator
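The work_weight parameter penalizes large actions. As an illustrative sketch only: a quadratic cost on the force magnitude is a common choice, but the exact functional form is an assumption here, since the docs only state that work_weight penalizes large actions.

```python
# Hypothetical sketch of a work/action penalty weighted by work_weight.
# The quadratic form is an assumption for illustration.
def action_penalty(action, work_weight=1e-4):
    """Quadratic penalty on the force vector, scaled by work_weight."""
    return work_weight * sum(a * a for a in action)

p = action_penalty([3.0, 4.0])  # 1e-4 * (9 + 16) = 0.0025
```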
- static reset(env: SingleNavigator, key: Array | ndarray | bool | number | bool | int | float | complex | TypedNdArray) → Environment[source]#
Initialize the environment with a randomly placed particle and velocity.
- Parameters:
env ('SingleNavigator') – Current environment instance.
key (jax.random.PRNGKey) – JAX random number generator key.
- Returns:
Freshly initialized environment.
- Return type:
Environment
- static step(env: SingleNavigator, action: Array) → Environment[source]#
Advance one step. Actions are forces; simple drag is applied (-friction * vel).
- Parameters:
env (Environment) – The current environment.
action (jax.Array) – The vector of actions each agent in the environment should take.
- Returns:
The updated environment state.
- Return type:
Environment
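The dynamics described above (force action plus -friction * vel drag inside a reflective box) can be sketched in one dimension. The time step, unit mass, explicit Euler integration, and mirror-reflection wall handling are assumptions for illustration, not jaxdem's actual integrator.

```python
# Illustrative 1-D sketch of the step dynamics: the action is a force,
# viscous drag -friction * vel opposes motion, and walls reflect.
def step_1d(pos, vel, force, friction=0.2, dt=0.01, mass=1.0, box=(0.0, 2.0)):
    accel = (force - friction * vel) / mass  # drag opposes velocity
    vel = vel + dt * accel
    pos = pos + dt * vel
    # Reflective box: mirror the position and flip the velocity at a wall.
    lo, hi = box
    if pos < lo:
        pos, vel = 2 * lo - pos, -vel
    elif pos > hi:
        pos, vel = 2 * hi - pos, -vel
    return pos, vel

pos, vel = step_1d(1.0, 0.5, force=2.0)  # free motion, no wall contact
```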
- static observation(env: SingleNavigator) → Array[source]#
Build per-agent observations.
Contents per agent#
Unit vector to objective (shape (dim,)) → Direction
Clamped delta to objective (shape (dim,)) → Local precision
Velocity (shape (dim,))
- Returns:
Array of shape (N, 3 * dim).
- Return type:
jax.Array
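The three observation components above can be assembled as follows. This is a minimal plain-Python sketch: the clamp radius of 1.0 and the zero-vector fallback when the agent sits exactly on the objective are assumptions for illustration.

```python
# Sketch of the per-agent observation: unit direction to the objective,
# displacement clamped elementwise, and velocity, concatenated to 3*dim.
import math

def observe(pos, objective, vel, clamp=1.0):
    delta = [o - p for o, p in zip(objective, pos)]
    dist = math.sqrt(sum(d * d for d in delta))
    unit = [d / dist if dist > 0 else 0.0 for d in delta]     # direction
    clamped = [max(-clamp, min(clamp, d)) for d in delta]     # local precision
    return unit + clamped + list(vel)                          # length 3 * dim

obs = observe(pos=[0.0, 0.0], objective=[3.0, 4.0], vel=[0.1, -0.2])
```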
- static reward(env: SingleNavigator) → Array[source]#
Returns a vector of per-agent rewards.
Reward:
\[\mathrm{rew}_i = e^{-2 \cdot d_i} - e^{-2 \cdot d_i^{\mathrm{prev}}}\]

where \(d_i\) is the distance from agent \(i\) to the objective.
- Parameters:
env (Environment) – Current environment.
- Returns:
Shape (N,).
- Return type:
jax.Array
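The shaping reward above is straightforward to check numerically: \(e^{-2d}\) acts as a potential, so the per-step reward is positive exactly when the agent moved closer to the objective, and per-step rewards telescope over an episode to \(e^{-2 d_T} - e^{-2 d_0}\).

```python
# The potential-based shaping reward from the formula above.
import math

def reward(d, d_prev):
    """rew = exp(-2*d) - exp(-2*d_prev); positive iff d < d_prev."""
    return math.exp(-2.0 * d) - math.exp(-2.0 * d_prev)

closer = reward(d=0.5, d_prev=1.0)   # moved closer  -> positive
farther = reward(d=1.0, d_prev=0.5)  # moved away    -> negative
```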
- static done(env: SingleNavigator) → Array[source]#
Returns a boolean indicating whether the environment has ended. The episode terminates when the maximum number of steps is reached.
- Parameters:
env (Environment) – The current environment.
- Returns:
Boolean array indicating whether the episode has ended.
- Return type:
jax.Array
- property action_space_size: int[source]#
Flattened action size per agent. Actions passed to step() have shape (A, action_space_size).
- property action_space_shape: tuple[int][source]#
Original per-agent action shape (useful for reshaping inside the environment).
- property observation_space_size: int[source]#
Flattened observation size per agent. observation() returns shape (A, observation_space_size).