jaxdem.rl.environments.two_gears

Two-dimensional gear environment for RL training. Despite the module name, the TwoGears class supports num_gears gears.

Functions

frictional_floor_force(pos, state, system)

Classes

TwoGears(state, system, env_params, num_gears)

Two-dimensional environment with N dynamic gears building a tower.

jaxdem.rl.environments.two_gears.frictional_floor_force(pos: Array, state: State, system: System) → Tuple[Array, Array]

class jaxdem.rl.environments.two_gears.TwoGears(state: State, system: System, env_params: dict[str, Any], num_gears: int)

Bases: Environment

Two-dimensional environment with N dynamic gears building a tower.

All num_gears gears are dynamic agents that each apply torque to themselves. Each episode samples a random target x and stacks num_gears objectives vertically into a tower (gear i must reach level i, bottom to top). The gears spawn at random, non-overlapping floor positions — not necessarily under the tower — and must navigate to assemble the stack. Gears attract each other pairwise via a magnetic force, and each gear observes its nearest neighbour.

Note

Empirically, the maximum torque must be at least 4.0 * mgr for a gear to climb reliably, and the attraction magnitude at least 1 * mg. For realistic training parameters, skip_frames = 50 gives a response rate of 200 Hz, so num_steps_epoch = 100 corresponds to a horizon of 100 / 200 Hz = 0.5 seconds. box_size must fit num_gears gears of radius rr side by side on the floor (box_size >= 2*rr*(num_gears+1)) and must also fit the tower height 2*rr*num_gears vertically.
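The constraints in this note can be checked before constructing an environment. The sketch below is illustrative only: the helper names `fits_in_box` and `horizon_seconds` are hypothetical and not part of jaxdem, and the radius value 1.0 is an assumed example, since the gear radius is not stated here.

```python
def fits_in_box(num_gears: int, rr: float, box_size: float) -> bool:
    """Check the two geometric constraints quoted in the note (hypothetical helper)."""
    # Gears side by side on the floor: box_size >= 2*rr*(num_gears+1)
    floor_ok = box_size >= 2 * rr * (num_gears + 1)
    # Tower of num_gears stacked gears: height 2*rr*num_gears
    tower_ok = box_size >= 2 * rr * num_gears
    return floor_ok and tower_ok


def horizon_seconds(num_steps_epoch: int, response_rate_hz: float) -> float:
    """Time covered by num_steps_epoch agent steps at the given response rate."""
    return num_steps_epoch / response_rate_hz


# Assumed example values: radius 1.0 with the Create() defaults for the rest.
print(fits_in_box(num_gears=3, rr=1.0, box_size=20.0))  # True
print(horizon_seconds(100, 200.0))                       # 0.5
```

Because the box is square, the floor condition is the stricter of the two, so the tower-height check can never fail on its own.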

num_gears: int – Number of gears (agents) that must form the tower.

classmethod Create(num_gears: int = 3, box_size: float = 20.0, max_steps: int = 100000, friction: float = 0.2, ke_weight: float = 0.1, attraction_mag: float = 4.0) → TwoGears

Create an N-gear tower environment.

Parameters:

num_gears (int) – Number of dynamic gears (agents) that must form the tower.
box_size (float) – Size of the square bounding box.
max_steps (int) – Episode length in physics steps.
friction (float) – Viscous drag coefficient applied as -friction * vel.
ke_weight (float) – Weight for the differential kinetic energy penalty.
attraction_mag (float) – Magnitude of the pairwise attraction force between gears.

Returns:

A freshly constructed environment (call reset() before use).

Return type:

TwoGears

static reset(env: TwoGears, key: Array) → Environment

Reset the environment to a random initial configuration.

Parameters:

env (Environment) – The environment instance to reset.
key (jax.Array) – PRNG key used to sample the initial positions and objective.

Returns:

The environment with a fresh episode state.

Return type:

Environment

static step(env: TwoGears, action: Array) → Environment

Advance the environment by one step.

Applies each gear’s torque, computes the pairwise attraction force between all gears, and applies viscous drag.

The attraction on gear \(i\) from gear \(j\) is:

\[\mathbf{F}_{ij} = - \frac{C}{d_{ij}^3} \hat{n}_{ij},\]

when \(d_{ij} < 3 r\), where \(d_{ij}\) is the center-to-center distance, \(\hat{n}_{ij} = \mathrm{unit}(\mathbf{r}_i - \mathbf{r}_j)\) (so the force points from \(i\) toward \(j\)), and \(C = m_{\text{attr}} (2r)^3\) with \(r\) the gear radius. The net force on gear \(i\) is \(\sum_{j \ne i} \mathbf{F}_{ij}\).
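The formula above can be sketched as a plain-Python reference loop. This is not the library's vectorized JAX implementation: the function name `attraction_forces` is hypothetical, the force is taken to be zero for \(d_{ij} \ge 3r\), and skipping coincident centers (\(d_{ij} = 0\)) is an added assumption to avoid division by zero.

```python
import math


def attraction_forces(positions, r, m_attr):
    """Net attraction on each gear (hypothetical reference helper).

    positions: list of (x, y) gear centers; r: gear radius;
    m_attr: attraction magnitude, so that C = m_attr * (2r)^3.
    """
    c = m_attr * (2.0 * r) ** 3
    cutoff = 3.0 * r
    forces = []
    for i, (xi, yi) in enumerate(positions):
        fx, fy = 0.0, 0.0
        for j, (xj, yj) in enumerate(positions):
            if i == j:
                continue
            dx, dy = xi - xj, yi - yj  # r_i - r_j
            d = math.hypot(dx, dy)     # center-to-center distance
            # Assumption: no force at or beyond the cutoff, or for coincident centers.
            if d == 0.0 or d >= cutoff:
                continue
            scale = -c / d**3          # F_ij = -(C / d^3) * n_hat_ij
            fx += scale * dx / d
            fy += scale * dy / d
        forces.append((fx, fy))
    return forces


# Two gears in contact (d = 2r) plus one far outside the 3r cutoff.
forces = attraction_forces([(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)], r=1.0, m_attr=4.0)
```

With this choice of \(C\), two gears in contact (\(d_{ij} = 2r\)) attract each other with magnitude exactly \(m_{\text{attr}}\), and the distant third gear in the example feels no force.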

Parameters:

env (Environment) – Current environment.
action (jax.Array) – Torque action for each gear, shape (num_gears, 1).

Returns:

Updated environment after physics integration and sensor updates.

Return type:

Environment

static observation(env: TwoGears) → Array

Build the per-gear observation vector.

Each gear receives a 16-feature observation; the “other gear” slot is filled by its nearest neighbour:

Feature	Size
Distance to floor	`1`
Distance to left/right walls	`2`
Unit vector to target	`2`
Clamped displacement to target	`2`
Unit vector to nearest gear	`2`
Clamped displacement to nearest gear	`2`
\(\sin(\Delta\theta)\)	`1`
\(\cos(\Delta\theta)\)	`1`
Velocity (x, y)	`2`
Angular velocity	`1`

Returns:

Observation of shape (num_gears, 16), one row per gear.

Return type:

jax.Array
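The feature sizes in the table can be turned into column ranges for slicing a single observation row. The sketch below assumes the columns appear in the same order as the table rows, which this page does not state explicitly; the feature names and the helper `feature_slices` are hypothetical.

```python
# (name, size) pairs in table order. ASSUMPTION: column order matches the table.
FEATURES = [
    ("floor_distance", 1),
    ("wall_distances", 2),
    ("unit_to_target", 2),
    ("disp_to_target", 2),
    ("unit_to_nearest", 2),
    ("disp_to_nearest", 2),
    ("sin_dtheta", 1),
    ("cos_dtheta", 1),
    ("velocity", 2),
    ("angular_velocity", 1),
]


def feature_slices(features):
    """Map each feature name to its (start, stop) column range."""
    slices, start = {}, 0
    for name, size in features:
        slices[name] = (start, start + size)
        start += size
    return slices, start


SLICES, TOTAL = feature_slices(FEATURES)
print(TOTAL)  # 16, matching observation_space_size in the table
```

If the assumed ordering holds, `SLICES["velocity"]` gives the column range of the velocity components within one row of the (num_gears, 16) observation.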

static reward(env: TwoGears) → Array

Compute the reward.

The reward is based on the differential distance to the objective minus a penalty for the change in kinetic energy:

\[R_t = (d_{t-1} - d_t) - w_{\text{ke}} (K_t - K_{t-1})\]

where \(d_t\) is the distance from gear \(i\) to its objective at step \(t\), \(K_t\) is that gear’s kinetic energy at step \(t\), and \(w_{\text{ke}}\) is the weight for the kinetic energy penalty.

Returns:

Per-gear reward of shape (num_gears,).

Return type:

jax.Array
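As a worked instance of the reward formula, the following plain-Python sketch evaluates \(R_t\) per gear from the previous and current distances and kinetic energies. The function name `differential_reward` and the sample numbers are hypothetical; only the formula and the default ke_weight = 0.1 come from this page.

```python
def differential_reward(d_prev, d_curr, ke_prev, ke_curr, ke_weight=0.1):
    """Per-gear R_t = (d_{t-1} - d_t) - w_ke * (K_t - K_{t-1}) (hypothetical helper)."""
    return [
        (dp - dc) - ke_weight * (kc - kp)
        for dp, dc, kp, kc in zip(d_prev, d_curr, ke_prev, ke_curr)
    ]


# Gear 0 moves 0.5 closer to its objective while gaining 1.0 of kinetic energy;
# gear 1 neither moves nor changes its kinetic energy.
rewards = differential_reward(
    d_prev=[2.0, 3.0],
    d_curr=[1.5, 3.0],
    ke_prev=[0.0, 1.0],
    ke_curr=[1.0, 1.0],
)
```

Gear 0 earns the 0.5 progress term minus a 0.1 kinetic energy penalty, while the stationary gear 1 receives zero reward. A gear that slows down gets a positive contribution from the second term, since \(K_t - K_{t-1}\) is then negative.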

static done(env: TwoGears) → Array

property action_space_size: int – Flattened action size per agent. Actions passed to step() have shape (A, action_space_size).

property action_space_shape: tuple[int] – Original per-agent action shape (useful for reshaping inside the environment).

property observation_space_size: int – Flattened observation size per agent. observation() returns shape (A, observation_space_size).

property max_num_agents: int – Maximum number of active agents in the environment.