jaxdem.rl.environments.single_roller
Environment where a single agent rolls towards a target on the floor.
Functions
frictional_wall_force() – Normal, frictional, and restitution forces for a sphere on a \(z = 0\) plane.
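The summary names the contact model but not its form. As a rough illustration only, the sketch below assumes a linear spring-dashpot normal force (the dashpot stands in for restitution) and a viscous tangential force capped at a Coulomb limit. The function name floor_contact_force and the constants k, gamma and mu are hypothetical; this is not the signature or implementation of frictional_wall_force().

```python
import jax.numpy as jnp


def floor_contact_force(pos, vel, ang_vel, rad, k=1e4, gamma=10.0, mu=0.5):
    """Hypothetical sphere/floor contact (not jaxdem's frictional_wall_force).

    pos, vel, ang_vel: shape (N, 3); rad: shape (N,).
    Returns (force, torque), each of shape (N, 3).
    """
    normal = jnp.array([0.0, 0.0, 1.0])
    # Overlap with the z = 0 plane; zero when the sphere is airborne.
    overlap = jnp.maximum(rad - pos[:, 2], 0.0)
    in_contact = overlap > 0.0

    # Normal force: linear spring plus a dashpot on the normal velocity.
    # The dashpot dissipates energy, which is what sets the restitution here.
    fn = k * overlap - gamma * vel[:, 2]
    fn = jnp.where(in_contact, jnp.maximum(fn, 0.0), 0.0)

    # Slip velocity of the contact point: v + w x r, with r = -rad * z_hat.
    r_c = -rad[:, None] * normal
    v_c = vel + jnp.cross(ang_vel, r_c)
    v_t = v_c.at[:, 2].set(0.0)

    # Viscous tangential friction opposing slip, capped at the Coulomb limit mu * fn.
    ft = -gamma * v_t
    ft_norm = jnp.linalg.norm(ft, axis=1, keepdims=True)
    scale = jnp.minimum(1.0, mu * fn[:, None] / jnp.maximum(ft_norm, 1e-12))
    ft = jnp.where(in_contact[:, None], ft * scale, 0.0)

    force = fn[:, None] * normal + ft
    torque = jnp.cross(r_c, ft)
    return force, torque


# Sphere 0 slightly overlaps the floor and spins about +y; sphere 1 is airborne.
pos = jnp.array([[0.0, 0.0, 0.99], [0.0, 0.0, 5.0]])
vel = jnp.zeros((2, 3))
ang_vel = jnp.array([[0.0, 2.0, 0.0], [0.0, 2.0, 0.0]])
rad = jnp.array([1.0, 1.0])
force, torque = floor_contact_force(pos, vel, ang_vel, rad)
```

A spin about +y makes the contact point slip in -x, so friction pushes the sphere in +x while the reaction torque opposes the spin. This is how a pure torque action turns into translation.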
Classes
SingleRoller3D – Single-agent 3D navigation via torque-controlled rolling.
- class jaxdem.rl.environments.single_roller.SingleRoller3D(state: State, system: System, env_params: dict[str, Any])
Bases: Environment
Single-agent 3D navigation via torque-controlled rolling.
The agent is a sphere resting on a \(z = 0\) floor under gravity. Actions are 3-D torque vectors; translational motion arises from frictional contact with the floor (see frictional_wall_force()). Each step also applies a viscous drag -friction * vel and an angular damping -friction * ang_vel. The reward uses exponential potential-based shaping:
\[\mathrm{rew}_t = (e^{-2 \cdot d_t} - e^{-2 \cdot d_t^{\mathrm{prev}}}) - w_{\text{ke}} (K_t - K_{t-1})\]
where \(d_t\) is the distance to the objective at step \(t\), \(d_t^{\mathrm{prev}}\) is the distance at the previous step, \(K_t\) is the kinetic energy at step \(t\), and \(w_{\text{ke}}\) is the weight of the kinetic-energy penalty (ke_weight).
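The reward formula can be written out directly. This minimal sketch takes distances and kinetic energies as plain arrays (the name shaped_reward is illustrative, and d_prev is assumed to be the previous step's d) and checks one consequence of the difference form on a toy trajectory:

```python
import jax.numpy as jnp


def shaped_reward(d, d_prev, ke, ke_prev, ke_weight=0.1):
    """Exponential potential-based shaping minus a differential KE penalty."""
    # Potential exp(-2 d) grows as the agent approaches the objective.
    shaping = jnp.exp(-2.0 * d) - jnp.exp(-2.0 * d_prev)
    # Increases in kinetic energy are penalised, decreases are rewarded.
    return shaping - ke_weight * (ke - ke_prev)


# Toy trajectory of distances d_0..d_3 and kinetic energies K_0..K_3.
ds = jnp.array([3.0, 2.0, 1.5, 0.5])
kes = jnp.array([0.0, 0.4, 0.3, 0.1])
rews = shaped_reward(ds[1:], ds[:-1], kes[1:], kes[:-1])

# The per-step differences telescope to a difference between endpoints.
total = rews.sum()
expected = (jnp.exp(-2.0 * ds[-1]) - jnp.exp(-2.0 * ds[0])) - 0.1 * (kes[-1] - kes[0])
```

Because both terms are step-to-step differences, the undiscounted episode sum collapses to \((e^{-2 d_T} - e^{-2 d_0}) - w_{\text{ke}}(K_T - K_0)\), which the last two lines compare numerically.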
Notes
The observation vector per agent is:
| Feature | Size |
|---|---|
| Unit direction to objective (x, y) | 2 |
| Clamped displacement (x, y) | 2 |
| Velocity (x, y) | 2 |
| Angular velocity | 3 |
| Total | 9 |
For realistic training parameters, skip_frames = 50 gives a response rate of 200 Hz, so num_steps_epoch = 100 gives a horizon of 0.5 seconds (100 steps / 200 Hz).
- classmethod Create(min_box_size: float = 40.0, max_box_size: float = 40.0, max_steps: int = 20000, friction: float = 0.2, ke_weight: float = 0.1) → SingleRoller3D
Create a single-agent roller environment.
- Parameters:
min_box_size (float) – Lower bound of the range for the random square domain side length.
max_box_size (float) – Upper bound of the range for the random square domain side length.
max_steps (int) – Episode length in physics steps.
friction (float) – Viscous drag coefficient applied as -friction * vel.
ke_weight (float) – Weight for the differential kinetic energy penalty.
- Returns:
A freshly constructed environment (call reset() before use).
- Return type:
SingleRoller3D
- static reset(env: SingleRoller3D, key: Array | ndarray | bool | number | bool | int | float | complex) → Environment
Randomly place the agent and objective on the floor.
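The docs do not say how positions are drawn. One plausible scheme, sketched here under explicit assumptions, samples the box side uniformly between min_box_size and max_box_size and places the agent and objective uniformly in the x-y square, one radius from the walls. The helper name sample_positions and the radius rad are hypothetical, not jaxdem API.

```python
import jax
import jax.numpy as jnp


def sample_positions(key, min_box_size=40.0, max_box_size=40.0, rad=0.5):
    """Hypothetical reset sampling: box side, agent position, objective position."""
    k_box, k_agent, k_obj = jax.random.split(key, 3)
    # Square domain side length drawn uniformly from [min_box_size, max_box_size].
    box = jax.random.uniform(k_box, (), minval=min_box_size, maxval=max_box_size)
    # x-y positions kept at least one radius away from the walls (assumption).
    agent_xy = jax.random.uniform(k_agent, (2,), minval=rad, maxval=box - rad)
    obj_xy = jax.random.uniform(k_obj, (2,), minval=rad, maxval=box - rad)
    # The agent rests on the z = 0 floor, so its centre sits at z = rad.
    agent = jnp.concatenate([agent_xy, jnp.array([rad])])
    objective = jnp.concatenate([obj_xy, jnp.array([0.0])])
    return box, agent, objective


box, agent, objective = sample_positions(jax.random.PRNGKey(0))
```

Because every draw comes from split keys, the same key always reproduces the same layout, which keeps resets deterministic under jit and vmap.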
- Parameters:
env (Environment) – Current environment instance.
key (ArrayLike) – JAX PRNG key.
- Returns:
Freshly initialised environment.
- Return type:
Environment
- static step(env: SingleRoller3D, action: Array) → Environment
Apply a torque action and advance the physics by one step.
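As an illustration of how the forces from the class description could combine in one physics step, here is a hypothetical semi-implicit Euler update. It assumes a solid-sphere moment of inertia, takes the floor-contact force and torque as inputs, and uses dt = 1e-4 s, the value implied by the Notes if each skipped frame is one physics step (50 steps per action at 200 Hz). The names step_sketch, mass and rad are illustrative, not jaxdem API.

```python
import jax.numpy as jnp


def step_sketch(pos, vel, ang_vel, action, contact_force, contact_torque,
                mass=1.0, rad=0.5, friction=0.2, dt=1e-4, g=9.81):
    """Hypothetical semi-implicit Euler update for one torque-driven sphere.

    All vector arguments have shape (3,). contact_force and contact_torque
    come from whatever floor-contact model is in use.
    """
    # Linear forces: gravity, floor contact, and viscous drag -friction * vel.
    force = jnp.array([0.0, 0.0, -g * mass]) + contact_force - friction * vel
    # Torques: the agent's action, contact torque, and damping -friction * ang_vel.
    torque = action + contact_torque - friction * ang_vel
    inertia = 0.4 * mass * rad**2  # solid sphere: I = 2/5 m r^2

    # Update velocities first, then positions with the new velocity.
    vel = vel + dt * force / mass
    ang_vel = ang_vel + dt * torque / inertia
    pos = pos + dt * vel
    return pos, vel, ang_vel


# Airborne sphere (no contact) receiving a unit torque about +y.
zero = jnp.zeros(3)
pos, vel, ang_vel = step_sketch(
    jnp.array([0.0, 0.0, 2.0]), zero, zero,
    action=jnp.array([0.0, 1.0, 0.0]),
    contact_force=zero, contact_torque=zero,
)
```

Without floor contact the torque only spins the sphere in place; translation in x-y appears only once a contact model supplies a tangential force.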
- Parameters:
env (Environment) – Current environment.
action (jax.Array) – 3-D torque vector per agent.
- Returns:
Updated environment after one physics step.
- Return type:
Environment
- static observation(env: SingleRoller3D) → Array
Per-agent observation vector.
Contents per agent:
- Unit displacement to objective projected to x-y (shape (2,)).
- Clamped displacement to objective projected to x-y (shape (2,)).
- Velocity projected to x-y (shape (2,)).
- Angular velocity (shape (3,)).
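The four feature groups listed above can be assembled as follows. This is a sketch, not jaxdem's code: the helper name observation_sketch is hypothetical, and the clamping rule (each component clipped to [-clamp, clamp], with clamp = 1.0) is an assumption, since the docs do not state the clamp value or whether it acts per component.

```python
import jax.numpy as jnp


def observation_sketch(pos, vel, ang_vel, objective, clamp=1.0):
    """Hypothetical 9-feature observation for N agents.

    pos, vel, ang_vel, objective: arrays of shape (N, 3).
    """
    disp = (objective - pos)[:, :2]  # displacement to objective, projected to x-y
    dist = jnp.linalg.norm(disp, axis=-1, keepdims=True)
    unit = disp / jnp.maximum(dist, 1e-12)  # unit direction, guarded at dist = 0
    clamped = jnp.clip(disp, -clamp, clamp)  # per-component clamp (assumption)
    return jnp.concatenate([unit, clamped, vel[:, :2], ang_vel], axis=-1)


obs = observation_sketch(
    pos=jnp.array([[0.0, 0.0, 0.5]]),
    vel=jnp.array([[1.0, 2.0, 3.0]]),
    ang_vel=jnp.array([[0.0, 1.0, 0.0]]),
    objective=jnp.array([[3.0, 4.0, 0.0]]),
)
```

The vertical velocity component is dropped while all three angular-velocity components are kept, which gives 2 + 2 + 2 + 3 = 9 features per agent.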
- Returns:
Shape (N, 9).
- Return type:
jax.Array
- static reward(env: SingleRoller3D) → Array
Returns a vector of per-agent rewards.
Exponential potential-based shaping:
\[\mathrm{rew}_t = (e^{-2 \cdot d_t} - e^{-2 \cdot d_t^{\mathrm{prev}}}) - w_{\text{ke}} (K_t - K_{t-1})\]
- Returns:
Shape (N,).
- Return type:
jax.Array
- static done(env: SingleRoller3D) → Array
True when step_count exceeds max_steps.