NS-Gym Base Module

class ns_gym.base.Reward(reward, env_change, delta_change, relative_time)[source]

Bases: object

Reward dataclass type. This is the output of the step function in the environment.

reward: Union[int, float]

The reward received from the environment

env_change: dict[str, bool]

A dictionary of boolean flags indicating which parameters of the environment have changed.

delta_change: Optional[float]

The change in the reward function of the environment.

relative_time: Union[int, float]

The relative time of the observation since the start of the environment episode.

class ns_gym.base.Scheduler(start=0, end=inf)[source]

Bases: ABC

Base class for scheduler functions. This class is used to determine when to update a parameter in the environment.

Start and end times are inclusive.

__call__(t)[source]

Call method that determines whether to update the parameter at the current timestep. Subclasses must implement this method.

Parameters:

t (int) – MDP timestep

Returns:

Boolean flag indicating whether to update the parameter or not.

Return type:

bool
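For example, a subclass might fire at a fixed period. The PeriodicScheduler below is a hypothetical sketch, not part of the library, and it assumes the base class stores start and end as attributes:

```python
from ns_gym.base import Scheduler

class PeriodicScheduler(Scheduler):
    """Hypothetical scheduler: fire every `period` timesteps within [start, end]."""

    def __init__(self, period, start=0, end=float("inf")):
        super().__init__(start=start, end=end)
        self.period = period

    def __call__(self, t):
        # Assumes the base class exposes `start` and `end` as attributes.
        return self.start <= t <= self.end and t % self.period == 0
```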

class ns_gym.base.UpdateFn(scheduler)[source]

Bases: ABC

Base class for update functions that update a single scalar parameter.

Overview:

Instances of this class (and all subclasses) are callable and should be used to apply an update to a parameter. When an instance is called it executes the update logic defined in the subclass’s _update method. The __call__ method checks with the provided Scheduler to determine if an update should occur at the current time step. If an update is warranted, it invokes the _update method to modify the parameter and calculates the change in value.

Parameters:
  • scheduler (Scheduler) – Scheduler object that determines when to update the parameter

prev_param

The previous parameter value

prev_time

The previous time the parameter was updated

__call__(param, t)[source]

Update the parameter if the scheduler returns True

Parameters:
  • param (Any) – The parameter to be updated

  • t (Union[int,float]) – The current time step

Returns:

  • Any – The updated parameter

  • int – Binary flag indicating whether the parameter was updated (1 means updated, 0 means not updated)

  • float – The amount of change in the parameter

Return type:

tuple[Any, int, float]
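A minimal subclass only needs to supply the update logic. The sketch below is hypothetical, and the _update(param, t) signature is an assumption inferred from the overview above:

```python
from ns_gym.base import UpdateFn

class IncrementUpdate(UpdateFn):
    """Hypothetical update function that nudges a scalar parameter upward."""

    def __init__(self, scheduler, step_size=0.1):
        super().__init__(scheduler)
        self.step_size = step_size

    def _update(self, param, t):
        # Invoked by __call__ only when the scheduler fires at timestep t.
        # Signature is an assumption; the library may pass different arguments.
        return param + self.step_size
```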

class ns_gym.base.UpdateDistributionFn(scheduler)[source]

Bases: UpdateFn

Base class for all update functions that update a distribution represented as a list
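As a hedged sketch, a subclass might perturb a categorical distribution and renormalize it. The class name and _update signature below are assumptions:

```python
import random

from ns_gym.base import UpdateDistributionFn

class NoisyDistributionUpdate(UpdateDistributionFn):
    """Hypothetical update that perturbs and renormalizes a categorical distribution."""

    def _update(self, dist, t):
        # Jitter each probability, clamp at zero, then renormalize to sum to 1.
        noisy = [max(0.0, p + random.uniform(-0.05, 0.05)) for p in dist]
        total = sum(noisy)
        return [p / total for p in noisy]
```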

class ns_gym.base.NSWrapper(env, tunable_params, change_notification=False, delta_change_notification=False, in_sim_change=False, **kwargs)[source]

Bases: Wrapper

Base class for non-stationary wrappers

Parameters:
  • env (Env) – Gym environment

  • tunable_params (dict[str, Union[Type[UpdateFn], Type[UpdateDistributionFn]]]) – Dictionary of parameter names and their associated update functions.

  • change_notification (bool) – Sets a basic notification level. Returns a boolean flag to indicate whether to notify the agent of changes in the environment. Defaults to False.

  • delta_change_notification (bool) – Sets a detailed notification level. Returns a boolean flag indicating whether to notify the agent of changes in the transition function. Defaults to False.

  • in_sim_change (bool) – Flag to indicate whether to allow changes in the environment during simulation (e.g., MCTS rollouts). Defaults to False.

frozen

Flag to indicate whether the environment is frozen or not.

Type:

bool

is_sim_env

Flag to indicate whether the environment is a simulation environment or not.

Type:

bool
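Putting the pieces together, construction might look like the sketch below. It reuses the hypothetical PeriodicScheduler and NoisyDistributionUpdate from the earlier sketches; the parameter key "P" is likewise illustrative, and NSFrozenLakeWrapper is the subclass shown in the __deepcopy__ example further down:

```python
import gymnasium as gym  # assuming a Gymnasium-based environment

env = gym.make("FrozenLake-v1")
scheduler = PeriodicScheduler(period=10)                    # hypothetical, see above
tunable_params = {"P": NoisyDistributionUpdate(scheduler)}  # "P" is an illustrative key
ns_env = NSFrozenLakeWrapper(env, tunable_params, change_notification=True)
```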

step(action, env_change, delta_change)[source]

Step function for the environment. Augments observations and rewards with additional information about changes in the environment and transition function.

Subclasses of this class will handle the actual environment dynamics and updating of parameters. This base class handles the notification mechanism that emulates the run-time monitor and model updater components of the decision-making infrastructure. The subclass must call this function via super().step(action, env_change, delta_change).

Parameters:
  • action (int) – Action taken by the agent

  • env_change (dict[str,bool]) – Environment change flags. Keys are parameter names and values are boolean flags indicating whether the parameter has changed.

  • delta_change (dict[str,float]) – The amount of change a parameter has undergone. Keys are parameter names and values are the amount of change.

Returns:

observation, reward, termination flag, truncation flag, and additional information.

Return type:

tuple[Any, Reward, bool, bool, dict[str, Any]]
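Continuing the construction sketch above, a concrete subclass typically exposes the standard step(action) interface; note that the reward slot of the returned tuple is a Reward dataclass rather than a bare float:

```python
obs, info = ns_env.reset()
action = ns_env.action_space.sample()
obs, reward, terminated, truncated, info = ns_env.step(action)

# `reward` is an ns_gym.base.Reward instance (see the dataclass above).
scalar = reward.reward           # scalar reward from the underlying environment
changed = reward.env_change      # e.g. {"P": True} when notifications are enabled
t = reward.relative_time         # timesteps since the start of the episode
```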

reset(*, seed=None, options=None)[source]

Reset function for the environment. Resets the environment to its initial state and resets the time step counter.

Parameters:
  • seed (int | None) – Seed for the environment. Defaults to None.

  • options (dict[str, Any] | None) – Additional options for the environment. Defaults to None.

Returns:

observation and additional information.

Return type:

tuple[Any, dict[str, Any]]

freeze(mode=True)[source]

“Freezes” the current MDP so that the environment dynamics do not change.

Parameters:

mode (bool) – Boolean flag indicating whether to freeze the environment. Defaults to True.

unfreeze()[source]

Unfreeze the environment dynamics for simulation.

This function “unfreezes” the current MDP so that the environment dynamics can change.
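One plausible usage pattern, continuing the same sketch: freeze before copying so that planning rollouts see fixed dynamics, then unfreeze afterwards:

```python
from copy import deepcopy

ns_env.freeze()              # pin the current MDP dynamics
sim_env = deepcopy(ns_env)   # sim_env.is_sim_env is set to True (see __deepcopy__)
# ... run planning rollouts (e.g., MCTS) against sim_env ...
ns_env.unfreeze()            # allow the dynamics to evolve again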

__deepcopy__(memo)[source]

Keeps track of deepcopying for the environment.

If a deepcopy of this environment is made, a flag is set to indicate that the copy is the simulation environment.

This is the intended behavior for the deepcopy function:

```python
env = gym.make("FrozenLake-v1")
env = NSFrozenLakeWrapper(env, updatefn, is_slippery=False)
sim_env = deepcopy(env)
```

Then sim_env.is_sim_env will be set to True.

Subclasses must implement this method.

get_planning_env()[source]

Get the planning environment.

Returns a copy of the current environment in its current state but the “transition function” is set to the initial transition function. Subclasses must implement this method.

get_default_params()[source]

Get dictionary of default parameters and their initial values

Returns:

Dictionary of parameter names and their initial values.

Return type:

dict[str,SupportsFloat]

__str__()[source]

Change the string representation of the environment so that the user can see what parameters are being updated and how.

class ns_gym.base.Agent[source]

Bases: ABC

Base class for agents.

act(obs, *args, **kwargs)[source]

Agent decision-making function. Subclasses must implement this method.

Parameters:

obs – Observation from the environment

Returns:

Action to be taken by the agent

Return type:

Any
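A minimal concrete agent might look like the following sketch; the RandomAgent class is illustrative, not part of the library:

```python
from ns_gym.base import Agent

class RandomAgent(Agent):
    """Hypothetical agent that samples uniformly from the action space."""

    def __init__(self, action_space):
        self.action_space = action_space

    def act(self, obs, *args, **kwargs):
        # Ignores the observation and picks a random action.
        return self.action_space.sample()
```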

class ns_gym.base.StableBaselineWrapper(model)[source]

Bases: object

Interface between Stable-Baselines3 models and NS-Gym environments. Allows a Stable-Baselines3 model to be called like any other NS-Gym agent.

act(obs, *args, **kwargs)[source]

Agent decision-making function. Calls the predict function of the Stable-Baselines3 model.

Parameters:

obs – Observation from the environment

Return type:

Any
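A hedged usage sketch, assuming Stable-Baselines3 is installed (the PPO model here is untrained and purely illustrative):

```python
from stable_baselines3 import PPO

from ns_gym.base import StableBaselineWrapper

model = PPO("MlpPolicy", "CartPole-v1")  # untrained model, purely illustrative
agent = StableBaselineWrapper(model)
# agent.act(obs) now delegates to model.predict(obs), so the wrapped model
# can stand in wherever an NS-Gym agent is expected.
```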

class ns_gym.base.Evaluator(*args, **kwargs)[source]

Bases: ABC

Evaluator base class. This class is used to evaluate the difficulty of a transition between two environments.

evaluate(env_1, env_2, *args, **kwargs)[source]

Evaluate the difficulty of transitioning from env_1 to env_2. Subclasses must implement this method.

Parameters:
  • env_1 (Type[Env]) – The initial environment

  • env_2 (Type[Env]) – The target environment

Return type:

float
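A hypothetical subclass sketch, assuming both environments are NS-Gym wrappers that expose get_default_params():

```python
from ns_gym.base import Evaluator

class ParamDistanceEvaluator(Evaluator):
    """Hypothetical evaluator: difficulty as L1 distance between parameter sets."""

    def evaluate(self, env_1, env_2, *args, **kwargs):
        p1 = env_1.get_default_params()
        p2 = env_2.get_default_params()
        # Sum absolute differences over the parameters both environments share.
        return float(sum(abs(p1[k] - p2[k]) for k in p1.keys() & p2.keys()))
```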

ns_gym.base.SUPPORTED_GRID_WORLD_ENV_IDS = ['CliffWalking-v1', 'FrozenLake-v1']

List of grid-world environment IDs supported by NS-Gym.