Environments

Multi Cartpole

Ver. 1.3.6 (2025-08-03)

This module provides an environment with multivariate state and action spaces based on the OpenAI Gym environment ‘CartPole-v1’.

class mlpro_int_gymnasium.envs.multicartpole.MultiCartPole(p_num_envs=2, p_reward_type=0, p_visualize: bool = True, p_logging=True)

Bases: Environment

This environment provides multivariate state and action spaces by duplicating the OpenAI Gym environment ‘CartPole-v1’. The number of internal CartPole sub-environments can be parameterized.
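The duplication idea can be sketched without the library. The following minimal example assumes that each sub-environment contributes four state values (as CartPole-v1 does) and one action value. The helper names split_action and merge_states are hypothetical and are not part of MLPro.

```python
from typing import List, Sequence

STATE_DIMS_PER_ENV = 4   # CartPole-v1: cart position, cart velocity, pole angle, pole angular velocity
ACTION_DIMS_PER_ENV = 1  # assumption: one action value per sub-environment


def split_action(p_action: Sequence[float], p_num_envs: int) -> List[List[float]]:
    """Splits a multivariate action into one slice per sub-environment."""
    if len(p_action) != p_num_envs * ACTION_DIMS_PER_ENV:
        raise ValueError("action length does not match the number of sub-environments")
    return [
        list(p_action[i * ACTION_DIMS_PER_ENV:(i + 1) * ACTION_DIMS_PER_ENV])
        for i in range(p_num_envs)
    ]


def merge_states(p_substates: Sequence[Sequence[float]]) -> List[float]:
    """Concatenates the sub-environment states into one multivariate state."""
    merged: List[float] = []
    for substate in p_substates:
        if len(substate) != STATE_DIMS_PER_ENV:
            raise ValueError("unexpected sub-state length")
        merged.extend(substate)
    return merged


# Two sub-environments: a 2-dimensional action and an 8-dimensional state.
sub_actions = split_action([0, 1], p_num_envs=2)
full_state = merge_states([[0.0, 0.1, 0.2, 0.3], [1.0, 1.1, 1.2, 1.3]])
```

With p_num_envs=2 this yields one action slice per CartPole and a single state vector of length 8.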

C_NAME = 'MultiCartPole'

C_LATENCY = datetime.timedelta(seconds=1)

C_INFINITY = np.float32(3.4028235e+38)

C_PLOT_ACTIVE: bool = True

_init_cartpoles()

static load(p_path, p_filename)

Static method to load an object of the current class from a file using pickle/dill. While the given file is unpickled, the standard method __setstate__() is called. It has a specific implementation that calls the MLPro custom method _complete_state(), which allows the unpickled object to be completed from further externally stored data.

Parameters:

p_path (str) – Path where the file is stored
p_filename (str = None) – File name (if None an internal filename will be used)

Returns:

Object of the given class that was unpickled from the given file.

Return type:

Object
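The interplay of __setstate__() and a completion hook can be illustrated with the standard pickle module alone. This is a minimal sketch, not the MLPro implementation: the class Persistent and its attribute runtime_handle are hypothetical, and the hook's real signature in MLPro may differ.

```python
import pickle


class Persistent:
    """Hypothetical stand-in for a persistent object; not the library class."""

    def __init__(self, p_name: str):
        self.name = p_name
        self.runtime_handle = object()   # stands for data that is kept out of the pickle

    def __getstate__(self) -> dict:
        state = self.__dict__.copy()
        del state["runtime_handle"]      # leave runtime-only data out of the file
        return state

    def __setstate__(self, p_state: dict):
        self.__dict__.update(p_state)
        self._complete_state()           # hook called while unpickling

    def _complete_state(self):
        # Rebuild whatever was left out of the pickled state.
        self.runtime_handle = object()


original = Persistent("demo")
restored = pickle.loads(pickle.dumps(original))
```

After the round trip, restored carries the pickled attribute name and a freshly rebuilt runtime_handle.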

_save(p_path, p_filename) → bool

switch_logging(p_logging)

static setup_spaces()

Static template method to set up and return state and action space of environment.

Returns:

state_space (MSpace) – State space object
action_space (MSpace) – Action space object

_setup_spaces()

collect_substates() → State

get_cycle_limit()

Returns the limit of cycles per training episode.

_reset(p_seed=None)

Custom method to reset the system to an initial/defined state. Use method _set_status() to set the state.

Parameters:

p_seed (int) – Seed parameter for an internal random generator

simulate_reaction(p_state: State, p_action: Action) → State

Simulates a state transition based on a state and an action. The simulation step itself is carried out either by an internal custom implementation in method _simulate_reaction() or by an embedded external function.

Parameters:

p_state (State) – Current state.
p_action (Action) – Action.

Returns:

Subsequent state after transition

Return type:

State

_compute_reward(p_state_old: State, p_state_new: State) → Reward

Custom reward method. See method compute_reward() for further details.

compute_success(p_state: State) → bool

Assesses whether the given state is a ‘success’ state. The assessment is carried out either by a custom implementation in method _compute_success() or by an embedded external function.

Parameters:

p_state (State) – State to be assessed.

Returns:

success – True, if the given state is a ‘success’ state. False otherwise.

Return type:

bool

compute_broken(p_state: State) → bool

Assesses whether the given state is a ‘broken’ state. The assessment is carried out either by a custom implementation in method _compute_broken() or by an embedded external function.

Parameters:

p_state (State) – State to be assessed.

Returns:

broken – True, if the given state is a ‘broken’ state. False otherwise.

Return type:

bool

init_plot(p_figure: Figure = None, p_plot_settings: list = Ellipsis, p_plot_depth: int = 0, p_detail_level: int = 0, p_step_rate: int = 0, **p_kwargs)

Initializes the plot functionalities of the class.

Parameters:

p_figure (matplotlib.figure.Figure, optional) – Optional Matplotlib host figure in which the plot is embedded. The default is None.
p_plot_settings (PlotSettings) – Optional plot settings. If None, the default view is plotted (see attribute C_PLOT_DEFAULT_VIEW).

update_plot(**p_kwargs)

Updates the plot.

Parameters:

**p_kwargs – Implementation-specific plot data and/or parameters.

Rubiks Cube 2x2x2

Ver. 1.0.3 (2026-03-10)

This module provides a standardized MLPro environment for a 2x2x2 Rubik’s Cube based on the rubiks-cube-gym package (https://github.com/DoubleGremlin181/RubiksCubeGym).

The environment wraps the registered Gymnasium environment ‘rubiks-cube-222-v0’ using WrEnvGYM2MLPro and exposes it as a single-agent MLPro RL environment.

Observation space : Discrete(3674160) – index into the full state dictionary
Action space : Discrete(3) – moves F=0, R=1, U=2
Reward : Defined by reward strategy. Default: -1 per step, +100 on solve.

class mlpro_int_gymnasium.envs.rubikscube2x2x2.RubiksCube222(p_shaped_reward: bool = False, p_visualize: bool = True, p_logging=True)

Bases: Environment

MLPro RL environment for the 2x2x2 Rubik’s Cube.

Wraps the Gymnasium environment ‘rubiks-cube-222-v0’ from the rubiks-cube-gym package using WrEnvGYM2MLPro.

C_NAME = 'RubiksCube222'

C_LATENCY = datetime.timedelta(seconds=1)

C_PLOT_ACTIVE: bool = True

_init_env()

Instantiates the gym env, optionally wraps it with ShapedRewardCubeWrapper, then wraps the result with WrEnvGYM2MLPro. Called once during __init__ and again after loading from file (see load()). Guarded against premature calls before spaces are initialised.

static load(p_path, p_filename)

Static method to load an object of the current class from a file using pickle/dill. While the given file is unpickled, the standard method __setstate__() is called. It has a specific implementation that calls the MLPro custom method _complete_state(), which allows the unpickled object to be completed from further externally stored data.

Parameters:

p_path (str) – Path where the file is stored
p_filename (str = None) – File name (if None an internal filename will be used)

Returns:

Object of the given class that was unpickled from the given file.

Return type:

Object

_save(p_path, p_filename) → bool

switch_logging(p_logging)

static setup_spaces()

Static template method to set up and return state and action space of environment.

Returns:

state_space (MSpace) – State space object
action_space (MSpace) – Action space object

_setup_spaces()

Defines the MLPro state and action spaces:

State : one integer dimension, index in [0, 3674159]
Action : one integer dimension, index in [0, 2] (F=0, R=1, U=2)
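The upper bound of the state index follows from counting cube positions. When only the F, R and U faces are turned, one corner never moves, so the seven remaining corners can be permuted in 7! ways and six of them can be twisted independently in 3 ways each. A quick check of that arithmetic:

```python
import math

# Seven movable corners can be permuted freely; the twist of the last corner
# is determined by the other six, giving 3**6 orientation combinations.
num_states = math.factorial(7) * 3 ** 6

print(num_states)       # 3674160
print(num_states - 1)   # 3674159, the largest valid state index
```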

_reset(p_seed=None)

Custom method to reset the system to an initial/defined state. Use method _set_status() to set the state.

Parameters:

p_seed (int) – Seed parameter for an internal random generator

simulate_reaction(p_state: State, p_action: Action) → State

Clips the continuous action value produced by SB3 to a valid discrete index {0, 1, 2} before passing it to the wrapped gym environment.
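One way to map a continuous value to a valid move index is to round and then clamp. The sketch below is an assumption about how such a clipping step could look; the function name clip_to_action_index is hypothetical, and the source does not state whether the environment rounds or truncates.

```python
def clip_to_action_index(p_value: float, p_num_actions: int = 3) -> int:
    """Maps a continuous value to a discrete index in [0, p_num_actions - 1]."""
    index = int(round(p_value))                    # assumption: round to the nearest integer
    return max(0, min(p_num_actions - 1, index))   # clamp into the valid range


# Values outside [0, 2] are pulled back to the nearest valid move index.
indices = [clip_to_action_index(v) for v in (-0.7, 0.2, 1.2, 2.6, 5.0)]
```

Here the list indices evaluates to [0, 0, 1, 2, 2], i.e. F, F, R, U, U in the move encoding given above.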

_compute_reward(p_state_old: State, p_state_new: State) → Reward

Default reward: +100 on solve, -1 on every other step. When p_shaped_reward=True, ShapedRewardCubeWrapper overrides the gym env’s reward signal before it reaches this method.
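The default reward scheme is simple enough to state as a function. The following sketch only restates the rule given above and does not reproduce the shaped reward; the names default_reward and episode_return are hypothetical.

```python
def default_reward(p_solved: bool) -> float:
    """+100 when the step solves the cube, -1 on every other step."""
    return 100.0 if p_solved else -1.0


def episode_return(p_steps_to_solve: int) -> float:
    """Undiscounted sum of rewards for an episode solved on its last step."""
    return sum(
        default_reward(step == p_steps_to_solve)
        for step in range(1, p_steps_to_solve + 1)
    )


print(episode_return(5))   # 96.0: four steps at -1, then +100 on the solving step
```

Under this scheme, shorter solutions yield a higher undiscounted return, since every additional step costs 1.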

compute_success(p_state: State) → bool

Assesses whether the given state is a ‘success’ state. The assessment is carried out either by a custom implementation in method _compute_success() or by an embedded external function.

Parameters:

p_state (State) – State to be assessed.

Returns:

success – True, if the given state is a ‘success’ state. False otherwise.

Return type:

bool

compute_broken(p_state: State) → bool

Assesses whether the given state is a ‘broken’ state. The assessment is carried out either by a custom implementation in method _compute_broken() or by an embedded external function.

Parameters:

p_state (State) – State to be assessed.

Returns:

broken – True, if the given state is a ‘broken’ state. False otherwise.

Return type:

bool

init_plot(p_figure=None, p_plot_settings=Ellipsis, p_plot_depth: int = 0, p_detail_level: int = 0, p_step_rate: int = 0, **p_kwargs)

Initializes the plot functionalities of the class.

Parameters:

p_figure (matplotlib.figure.Figure, optional) – Optional Matplotlib host figure in which the plot is embedded. The default is None.
p_plot_settings (PlotSettings) – Optional plot settings. If None, the default view is plotted (see attribute C_PLOT_DEFAULT_VIEW).

update_plot(**p_kwargs)

Updates the plot.

Parameters:

**p_kwargs – Implementation-specific plot data and/or parameters.