NS-Gym Wrappers Module
- class ns_gym.wrappers.NSClassicControlWrapper(env, tunable_params, change_notification=False, delta_change_notification=False, in_sim_change=False, **kwargs)
Bases: NSWrapper
A non-stationary wrapper for Gymnasium's Classic Control environments. A usage sketch follows this class entry.
- Parameters:
env (gym.Env) – Base gym environment.
tunable_params (dict[str,base.UpdateFn]) – Dictionary of parameter names and their associated update functions.
change_notification (bool, optional) – Flag to indicate whether to notify the agent of changes in the environment. Defaults to False.
delta_change_notification (bool, optional) – Flag to indicate whether to notify the agent of changes in the transition function. Defaults to False.
in_sim_change (bool, optional) – Flag to allow environmental changes to occur in the ‘planning’ environment. Defaults to False.
- get_planning_env()
Return a copy of the environment.
Note:
- If the environment is a simulation environment, the function returns a deepcopy of the simulation environment.
- If change notification is enabled, the function returns a deepcopy of the current environment, because the decision-making agent needs to be aware of the changes in the environment.
- If change notification is disabled, the function returns a deepcopy of the environment with the initial parameters.
- step(action)
Step through the environment and update the environmental parameters.
- Parameters:
action (Union[float,int]) – Action to take in environment
- Returns:
NS-Gym Observation dictionary, reward, done flag, truncated flag, info dictionary
- Return type:
tuple[dict[str, Any], base.Reward, bool, bool, dict[str, Any]]
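A minimal usage sketch for NSClassicControlWrapper. Two assumptions are marked in the code: that a plain callable with signature (current_value, t) can stand in for a base.UpdateFn, and that "gravity" is among CartPole's tunable parameter names; consult ns_gym's update-function classes and the wrapper itself for the concrete interface.

```python
import gymnasium as gym
import ns_gym

# Assumption: a plain callable with signature (current_value, t) is accepted
# where base.UpdateFn is expected; in practice, use ns_gym's update-function classes.
def increase_gravity(current_value, t):
    return current_value + 0.1  # drift gravity upward each time step (illustrative)

env = gym.make("CartPole-v1")

ns_env = ns_gym.wrappers.NSClassicControlWrapper(
    env,
    tunable_params={"gravity": increase_gravity},  # "gravity" is an assumed parameter name
    change_notification=True,        # notify the agent that *something* changed
    delta_change_notification=False  # but not by how much
)

obs, info = ns_env.reset(seed=0)
obs, reward, terminated, truncated, info = ns_env.step(0)
# obs is the NS-Gym observation dictionary and reward is a base.Reward value.
```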
- class ns_gym.wrappers.NSFrozenLakeWrapper(env, tunable_params, change_notification=False, delta_change_notification=False, in_sim_change=False, initial_prob_dist=[1, 0, 0], modified_rewards=None, **kwargs)
Bases: NSWrapper
A wrapper for the FrozenLake environment that allows for non-stationary transitions. A usage sketch follows this class entry.
- Parameters:
env (gym.Env) – The base FrozenLake environment to be wrapped.
tunable_params (dict[str,base.UpdateFn]) – Dictionary of tunable parameters and their update functions. Currently only supports "P" for transition probabilities.
change_notification (bool, optional) – Whether to notify the agent of a change in the MDP. Defaults to False.
delta_change_notification (bool, optional) – Whether to notify the agent of the amount of change in the MDP. Defaults to False.
initial_prob_dist (list[float], optional) – The initial probability distribution over the action space. Defaults to [1, 0, 0].
is_slippery (bool, optional) – Whether the environment is slippery. Defaults to True.
- Keyword Arguments:
modified_rewards (dict[str,int], optional) – Override the instantaneous reward per tile type, e.g. {"H": -1, "G": 1, "F": 0, "S": 0}, where "H" is a hole, "G" is the goal, "F" is the frozen lake, "S" is the start, and the values are the rewards.
- get_planning_env()
Get the planning environment.
Returns a copy of the current environment in its current state, but with the "transition function" set to the initial transition function. (Every NSWrapper subclass must implement this method.)
- reset(*, seed=None, options=None)
- Parameters:
seed (int | None, optional) – The random seed for initialization. Defaults to None.
options (dict[str, Any] | None, optional) – Additional options for resetting the environment. Defaults to None.
- Returns:
The initial observation and additional information.
- Return type:
tuple[Any, dict[str, Any]]
- step(action)
- Parameters:
action (int) – The action to take in the environment.
- Returns:
The observation, reward, termination signal, truncation signal, and additional information.
- Return type:
tuple[dict, base.Reward, bool, bool, dict[str, Any]]
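A comparable sketch for NSFrozenLakeWrapper, using the only supported tunable parameter "P" and the modified_rewards keyword. The callable update function and its (prob_dist, t) signature are assumptions, and the decay schedule is purely illustrative.

```python
import gymnasium as gym
import ns_gym

# Assumption: a callable taking (prob_dist, t) can stand in for base.UpdateFn.
# It gradually shifts probability mass from the intended move to the two slip directions.
def decay_intended_prob(prob_dist, t):
    p = max(0.4, prob_dist[0] - 0.05)
    slip = (1.0 - p) / 2.0
    return [p, slip, slip]

env = gym.make("FrozenLake-v1", is_slippery=True)

ns_env = ns_gym.wrappers.NSFrozenLakeWrapper(
    env,
    tunable_params={"P": decay_intended_prob},          # only "P" is supported
    change_notification=True,
    initial_prob_dist=[1, 0, 0],                        # start deterministic
    modified_rewards={"H": -1, "G": 1, "F": 0, "S": 0}  # reshape tile rewards
)

obs, info = ns_env.reset(seed=0)
obs, reward, terminated, truncated, info = ns_env.step(2)  # action 2 = move right
```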
- class ns_gym.wrappers.NSCliffWalkingWrapper(env, tunable_params, change_notification=False, delta_change_notification=False, in_sim_change=False, initial_prob_dist=[1, 0, 0, 0], modified_rewards=None, terminal_cliff=False, **kwargs)
Bases: NSWrapper
Wrapper for gridworld environments that allows for non-stationary transitions. A usage sketch follows this class entry.
- Parameters:
env (gym.Env) – The base CliffWalking environment to be wrapped.
tunable_params (dict[str,base.UpdateFn]) – Dictionary of tunable parameters and their update functions. Currently only supports "P" for transition probabilities.
change_notification (bool, optional) – Whether to notify the agent of a change in the MDP. Defaults to False.
delta_change_notification (bool, optional) – Whether to notify the agent of the amount of change in the MDP. Defaults to False.
initial_prob_dist (list[float], optional) – The initial probability distribution over the action space. Defaults to [1, 0, 0, 0].
modified_rewards (dict[str,int], optional) – Override the instantaneous reward per tile type, e.g. {"H": -100, "G": 0, "F": -1, "S": -1}, where "H" is the hole, "G" is the goal, "F" is the frozen lake, "S" is the start, and the values are the rewards.
terminal_cliff (bool, optional) – Whether stepping on the cliff terminates the episode. Defaults to False.
- get_planning_env()
Get the planning environment.
Returns a copy of the current environment in its current state, but with the "transition function" set to the initial transition function. (Every NSWrapper subclass must implement this method.)
- reset(*, seed=None, options=None)
Reset function for the environment. Resets the environment to its initial state and resets the time step counter.
- Parameters:
seed (int | None) – Seed for the environment. Defaults to None.
options (dict[str, Any] | None) – Additional options for the environment. Defaults to None.
- Returns:
observation and additional information.
- Return type:
tuple[Any, dict[str, Any]]
- step(action)
- Parameters:
action (int) – The action to take in the environment.
- Returns:
The observation, reward, termination signal, truncation signal, and additional information.
- Return type:
tuple[dict, base.Reward, bool, bool, dict[str, Any]]
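Construction of NSCliffWalkingWrapper mirrors the FrozenLake sketch above; the main differences are the four-element initial_prob_dist (one entry per noise direction) and the terminal_cliff flag. The update function is again an assumed callable stand-in for base.UpdateFn.

```python
import gymnasium as gym
import ns_gym

# Assumption: a callable with signature (prob_dist, t) stands in for base.UpdateFn on "P".
def add_action_noise(prob_dist, t):
    p = max(0.7, prob_dist[0] - 0.02)
    noise = (1.0 - p) / 3.0
    return [p, noise, noise, noise]

env = gym.make("CliffWalking-v0")

ns_env = ns_gym.wrappers.NSCliffWalkingWrapper(
    env,
    tunable_params={"P": add_action_noise},
    initial_prob_dist=[1, 0, 0, 0],
    terminal_cliff=True,  # end the episode when the agent falls off the cliff
)
```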
- class ns_gym.wrappers.NSBridgeWrapper(env, tunable_params, change_notification=False, delta_change_notification=False, in_sim_change=False, initial_prob_dist=[1, 0, 0], modified_rewards=None)
Bases: NSWrapper
Bridge environment wrapper that allows for non-stationary transitions. A sketch of the shared planning-environment pattern follows this class entry.
- get_planning_env()
Get the planning environment.
Returns a copy of the current environment in its current state, but with the "transition function" set to the initial transition function. (Every NSWrapper subclass must implement this method.)
- reset(*, seed=None, options=None)
Reset function for the environment. Resets the environment to its initial state and resets the time step counter.
- Parameters:
seed (int | None) – Seed for the environment. Defaults to None.
options (dict[str, Any] | None) – Additional options for the environment. Defaults to None.
- Returns:
observation and additional information.
- Return type:
tuple[Any, dict[str, Any]]
- step(action)
Step function for the environment. Augments observations and rewards with additional information about changes in the environment and transition function.
Subclasses of this class handle the actual environment dynamics and parameter updates. The base class handles the notification mechanism that emulates the run-time monitor and model-updater components of the decision-making infrastructure. The subclass must call this function via super().step(action, env_change, delta_change).
- Parameters:
action (int) – Action taken by the agent
env_change (dict[str,bool]) – Environment change flags. Keys are parameter names and values are boolean flags indicating whether the parameter has changed.
delta_change (dict[str,bool]) – The amount of change a parameter has undergone. Keys are parameter names and values are the amount of change.
- Returns:
observation, reward, termination flag, truncation flag, and additional information.
- Return type:
tuple[observation, Type[Reward], bool, bool, dict[str, Any]]
- property transition_matrix
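All of the wrappers above share the same planning-environment pattern: the acting environment drifts over time, while get_planning_env() hands the decision-making agent a deepcopy whose transition function reflects only what the agent is permitted to know (current parameters if change notification is enabled, initial parameters otherwise). A generic sketch follows; run_episode and my_planner are hypothetical names, and any planner (e.g. MCTS) could be substituted.

```python
def run_episode(ns_env, my_planner, max_steps=100):
    """Hypothetical decision loop over any NS-Gym wrapper."""
    obs, info = ns_env.reset()
    for _ in range(max_steps):
        # Snapshot for planning: a deepcopy whose transition function is the
        # current one (if change notification is on) or the initial one (if not).
        planning_env = ns_env.get_planning_env()
        action = my_planner(planning_env, obs)
        obs, reward, terminated, truncated, info = ns_env.step(action)
        if terminated or truncated:
            break
    return obs, info
```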
- class ns_gym.wrappers.MujocoWrapper(env, tunable_params, change_notification=False, delta_change_notification=False, in_sim_change=False, **kwargs)
Bases: NSWrapper
- class ns_gym.wrappers.PursuitEvasionWrapper(env, tunable_params, change_notification=False, delta_change_notification=False, in_sim_change=False, **kwargs)
Bases: NSWrapper
Wrapper to adapt CityEnvGym's Pursuit-Evasion environment to the ns_gym interface.
- get_planning_env()
Get the planning environment.
Returns a copy of the current environment in its current state, but with the "transition function" set to the initial transition function. (Every NSWrapper subclass must implement this method.)
- reset(**kwargs)
Reset function for the environment. Resets the environment to its initial state and resets the time step counter.
- Parameters:
seed (int | None) – Seed for the environment. Defaults to None.
options (dict[str, Any] | None) – Additional options for the environment. Defaults to None.
- Returns:
observation and additional information.
- Return type:
tuple[Any, dict[str, Any]]
- step(action, env_change, delta_change)
Step function for the environment. Augments observations and rewards with additional information about changes in the environment and transition function.
Subclasses of this class handle the actual environment dynamics and parameter updates. The base class handles the notification mechanism that emulates the run-time monitor and model-updater components of the decision-making infrastructure. The subclass must call this function via super().step(action, env_change, delta_change); a sketch of this override pattern follows at the end of this section.
- Parameters:
action (int) – Action taken by the agent
env_change (dict[str,bool]) – Environment change flags. Keys are parameter names and values are boolean flags indicating whether the parameter has changed.
delta_change (dict[str,bool]) – The amount of change a parameter has undergone. Keys are parameter names and values are the amount of change.
- Returns:
observation, reward, termination flag, truncation flag, and additional information.
- Return type:
tuple[observation, Type[Reward], bool, bool, dict[str, Any]]
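The override pattern described in step() above, sketched for a hypothetical subclass. The import path, the attribute names (self.t, self.tunable_params, a "wind" parameter), and the internals of NSWrapper are assumptions; only the contract of calling super().step(action, env_change, delta_change) after applying parameter updates is taken from the documentation.

```python
from ns_gym.base import NSWrapper  # assumed import path


class MyNSWrapper(NSWrapper):
    """Hypothetical subclass illustrating the required super().step(...) call."""

    def step(self, action):
        # 1. Apply the scheduled parameter update (subclass responsibility).
        #    "wind", self.tunable_params, and self.t are placeholder names.
        old_wind = self.unwrapped.wind
        new_wind = self.tunable_params["wind"](old_wind, self.t)
        self.unwrapped.wind = new_wind

        # 2. Report what changed so the base class can build the notifications.
        env_change = {"wind": new_wind != old_wind}
        delta_change = {"wind": new_wind - old_wind}

        # 3. Let NSWrapper augment the observation and reward with notifications.
        return super().step(action, env_change, delta_change)
```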