jaxdem.rl.environments.two_gears
Two-dimensional environment with two gears for RL training.
- jaxdem.rl.environments.two_gears.frictional_floor_force(pos: Array, state: State, system: System) -> Tuple[Array, Array]
- class jaxdem.rl.environments.two_gears.TwoGears(state: State, system: System, env_params: dict[str, Any])
Bases: Environment
Two-dimensional environment with two gears.
The environment consists of two gears composed of spheres. One gear is frozen on the floor, and the other is an active agent that can apply torque to itself. The objective is to navigate the active gear to a specified target position above the frozen gear. The active gear is attracted to the frozen gear by a magnetic force.
Note
After experimentation, the max torque needs to be at least 4.0 * mgr for the gear to climb correctly, and the attraction at least 1 * mg. For realistic training parameters, skip_frames = 50 gives a response rate of 200 Hz, so num_steps_epoch = 100 gives a horizon of 0.5 seconds.
- classmethod Create(box_size: float = 10.0, max_steps: int = 100000, friction: float = 0.2, ke_weight: float = 0.1, attraction_mag: float = 4.0) -> TwoGears
Create a two-gears 2-D environment.
- Parameters:
box_size (float) – Size of the square bounding box.
max_steps (int) – Episode length in physics steps.
friction (float) – Viscous drag coefficient applied as -friction * vel.
ke_weight (float) – Weight for the differential kinetic energy penalty.
attraction_mag (float) – Magnitude of the attraction force between the two gears.
- Returns:
A freshly constructed environment (call reset() before use).
- Return type:
TwoGears
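The timing and threshold figures in the note on the class above follow from simple arithmetic. The sketch below assumes a physics time step of 1e-4 s, a value inferred from the quoted 200 Hz at skip_frames = 50 rather than stated on this page. It also reads mgr as mass × gravitational acceleration × gear radius. All helper names are hypothetical and not part of jaxdem.

```python
def response_rate_hz(dt: float, skip_frames: int) -> float:
    """Agent decisions per second when one action spans skip_frames physics steps."""
    return 1.0 / (dt * skip_frames)


def horizon_seconds(dt: float, skip_frames: int, num_steps_epoch: int) -> float:
    """Simulated time covered by num_steps_epoch agent decisions."""
    return dt * skip_frames * num_steps_epoch


def min_max_torque(mass: float, g: float, radius: float) -> float:
    """Lower bound on the max torque quoted in the note: 4.0 * m * g * r."""
    return 4.0 * mass * g * radius


def min_attraction(mass: float, g: float) -> float:
    """Lower bound on the attraction quoted in the note: 1 * m * g."""
    return 1.0 * mass * g


ASSUMED_DT = 1e-4  # assumption: inferred from 200 Hz at skip_frames = 50

rate = response_rate_hz(ASSUMED_DT, skip_frames=50)
horizon = horizon_seconds(ASSUMED_DT, skip_frames=50, num_steps_epoch=100)
print(f"response rate: {rate:.1f} Hz, horizon: {horizon:.2f} s")
```

If the actual physics time step differs, the same two functions give the skip_frames and num_steps_epoch needed to reach a desired response rate and horizon.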
- static reset(env: TwoGears, key: Array) -> Environment
Reset the environment to a random initial configuration.
- Parameters:
env (Environment) – The environment instance to reset.
key (jax.Array) – PRNG key used to sample the initial positions and objective.
- Returns:
The environment with a fresh episode state.
- Return type:
Environment
- static step(env: TwoGears, action: Array) -> Environment
Advance the environment by one step.
Applies torque to the active agent, computes the attraction force between the gears, and applies viscous drag.
The attraction force is defined as:
\[\mathbf{F}_{\text{attraction}} = - \frac{C}{d^3} \hat{n},\]
when \(d < 3 r\), where \(d\) is the distance between the centers, \(\hat{n}\) is the unit vector from the frozen gear to the active gear, and \(C\) is determined by attraction_mag as \(C = m_{\text{attr}} (2r)^3\). Here \(r\) is the gear radius.
- Parameters:
env (Environment) – Current environment.
action (jax.Array) – Actions for the active gear.
- Returns:
Updated environment after physics integration and sensor updates.
- Return type:
Environment
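The attraction force formula in step() can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the library's implementation: it takes \(m_{\text{attr}}\) to be the attraction_mag parameter, returns zero force outside the \(d < 3r\) cutoff (the page does not say what happens there), and guards against \(d = 0\). The function name is hypothetical.

```python
import math


def attraction_force(frozen_pos, active_pos, radius, attraction_mag):
    """Force on the active gear: F = -(C / d**3) * n_hat for d < 3 r.

    n_hat points from the frozen gear to the active gear, so the minus
    sign pulls the active gear toward the frozen one.
    """
    dx = active_pos[0] - frozen_pos[0]
    dy = active_pos[1] - frozen_pos[1]
    d = math.hypot(dx, dy)

    # Assumption: no force at or beyond the cutoff, and none at d == 0.
    if d == 0.0 or d >= 3.0 * radius:
        return (0.0, 0.0)

    c = attraction_mag * (2.0 * radius) ** 3  # C = m_attr * (2 r)^3
    scale = -c / d**3
    return (scale * dx / d, scale * dy / d)


# Gears in contact (d = 2 r): the magnitude equals attraction_mag.
fx, fy = attraction_force((0.0, 0.0), (0.0, 1.0), radius=0.5, attraction_mag=4.0)
print(fx, fy)
```

With this choice of \(C\), the force magnitude at contact (\(d = 2r\)) is exactly \(m_{\text{attr}}\), which makes the parameter easy to compare against the gear's weight.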
- static observation(env: TwoGears) -> Array
Build the observation vector.
The observation vector contains 16 features:
Feature | Size
Distance to floor | 1
Distance to left/right walls | 2
Unit vector to target | 2
Clamped displacement to target | 2
Unit vector to frozen gear | 2
Clamped displacement to frozen gear | 2
\(\sin(\Delta\theta)\) | 1
\(\cos(\Delta\theta)\) | 1
Velocity (x, y) | 2
Angular velocity | 1
- Returns:
Observation vector of size 16.
- Return type:
jax.Array
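As a quick check of the feature table, the sizes can be accumulated into index ranges. The sketch below assumes the features are laid out in the vector in the same order as the table, which this page does not state explicitly. The feature keys and the helper name are hypothetical.

```python
# Feature names (hypothetical keys) and sizes, in table order.
FEATURE_SIZES = [
    ("distance_to_floor", 1),
    ("distance_to_walls", 2),
    ("unit_vector_to_target", 2),
    ("clamped_displacement_to_target", 2),
    ("unit_vector_to_frozen_gear", 2),
    ("clamped_displacement_to_frozen_gear", 2),
    ("sin_delta_theta", 1),
    ("cos_delta_theta", 1),
    ("velocity", 2),
    ("angular_velocity", 1),
]


def feature_slices(sizes):
    """Map each feature name to its slice in the flat observation vector."""
    slices = {}
    start = 0
    for name, size in sizes:
        slices[name] = slice(start, start + size)
        start += size
    return slices, start


SLICES, OBS_SIZE = feature_slices(FEATURE_SIZES)
print(OBS_SIZE)            # total length of the observation vector
print(SLICES["velocity"])  # index range of the velocity components
```

Under the ordering assumption, `obs[SLICES["velocity"]]` would pull the two velocity components out of a single agent's observation.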
- static reward(env: TwoGears) -> Array
Compute the reward.
The reward is based on the differential distance to the objective minus a penalty for the change in kinetic energy:
\[R_t = (d_{t-1} - d_t) - w_{\text{ke}} (K_t - K_{t-1}),\]
where \(d_t\) is the distance to the objective at step \(t\), \(K_t\) is the kinetic energy at step \(t\), and \(w_{\text{ke}}\) is the weight for the kinetic energy penalty.
- Returns:
Reward value for the active agent.
- Return type:
jax.Array
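The reward formula can be illustrated with scalar values. The function below is a minimal sketch of the expression for \(R_t\), not the library's code. Its name is hypothetical, and the default weight mirrors the ke_weight default of Create.

```python
def differential_reward(d_prev, d_curr, ke_prev, ke_curr, ke_weight=0.1):
    """R_t = (d_{t-1} - d_t) - w_ke * (K_t - K_{t-1})."""
    return (d_prev - d_curr) - ke_weight * (ke_curr - ke_prev)


# Illustrative (made-up) trajectory: distances and kinetic energies per step.
distances = [3.0, 2.0, 1.5]
energies = [0.0, 2.0, 0.5]

episode_return = sum(
    differential_reward(distances[t - 1], distances[t], energies[t - 1], energies[t])
    for t in range(1, len(distances))
)

# Because both terms are differences, the undiscounted sum telescopes to
# (d_0 - d_T) - w_ke * (K_T - K_0).
telescoped = (distances[0] - distances[-1]) - 0.1 * (energies[-1] - energies[0])
print(episode_return, telescoped)
```

The telescoping means the undiscounted return depends only on the first and last states: net progress toward the objective minus the weighted net gain in kinetic energy.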
- property action_space_size: int
Flattened action size per agent. Actions passed to step() have shape (A, action_space_size).
- property action_space_shape: tuple[int]
Original per-agent action shape (useful for reshaping inside the environment).
- property observation_space_size: int
Flattened observation size per agent. observation() returns shape (A, observation_space_size).