NS-Gym Evaluation Module

class ns_gym.evaluate.EnsembleMetric(agents={})

Bases: Evaluator

Evaluates the difficulty of an NS-MDP by comparing mean reward over an ensemble of agents.

Parameters: agents (dict) – A dictionary of agents to evaluate. The keys are the agent names and the values are the agent objects. Defaults to an empty dictionary.

evaluate(env, M=100, include_MCTS=False, include_RL=True, include_AlphaZero=False, verbose=True)

Evaluate the difficulty of a particular NS-MDP by comparing the mean reward over an ensemble of agents. NS-Gym uses the following procedure:

For a given NS-MDP, NS-Gym looks to see whether there are saved agents in the directory. By default, it evaluates using Stable-Baselines3 RL agents. If there are no saved agents (for example, for custom environments), you will be prompted to train the agents.

Parameters:

env (gym.Env) – The non-stationary environment to evaluate.
M (int) – The number of episodes to run. Defaults to 100.
include_MCTS (bool) – Whether to include the MCTS agent in the ensemble. Defaults to False.
include_RL (bool) – Whether to include the RL agents in the ensemble. Defaults to True.
include_AlphaZero (bool) – Whether to include the AlphaZero agent in the ensemble. Defaults to False.
verbose (bool) – Whether to print the results of the evaluation. Defaults to True.

Returns:

ensemble_performance (float) – The mean reward over the ensemble of agents.
performance (dict) – A dictionary of the performance of each agent in the ensemble.

Return type:

(float, dict)
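The aggregation described for this method can be illustrated with a minimal, standalone sketch. It does not call NS-Gym and is not the library's implementation: the helper `ensemble_mean_reward`, the `episode_fns` mapping, and the toy agents are hypothetical names introduced only for this example. The sketch assumes that each agent's score is its mean return over `M` episodes, and that the ensemble score is the unweighted mean of those per-agent scores.

```python
from statistics import mean
from typing import Callable, Dict, Tuple


def ensemble_mean_reward(
    episode_fns: Dict[str, Callable[[int], float]],
    M: int = 100,
) -> Tuple[float, Dict[str, float]]:
    """Average each agent's return over M episodes, then average across agents.

    episode_fns (hypothetical) maps an agent name to a callable that runs one
    episode, given the episode index, and returns that episode's total reward.
    """
    if not episode_fns:
        raise ValueError("the ensemble must contain at least one agent")
    if M < 1:
        raise ValueError("M must be at least 1")

    # Per-agent mean return over M episodes.
    performance = {
        name: mean([run_episode(i) for i in range(M)])
        for name, run_episode in episode_fns.items()
    }
    # Unweighted mean across agents (an assumption of this sketch).
    ensemble_performance = mean(list(performance.values()))
    return ensemble_performance, performance


# Toy stand-ins for trained policies, with deterministic returns.
toy_agents = {
    "always_one": lambda episode: 1.0,
    "alternating": lambda episode: float(episode % 2),
}
ensemble_performance, performance = ensemble_mean_reward(toy_agents, M=100)
print(ensemble_performance, performance)
```

A lower ensemble mean on one NS-MDP than on another, under the same agents and episode count, suggests the first is harder for that ensemble.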

class ns_gym.evaluate.PAMCTS_Bound

Bases: ComparativeEvaluator

Evaluates the difficulty of a transition between two environments using the PAMCTS-Bound metric.

\[\forall a \in A: \lVert P_t(s' \mid s, a) - P_0(s' \mid s, a) \rVert_{\infty}\]

evaluate(env_1, env_2, verbose=True)

Evaluate the difficulty of a transition between two environments.

Parameters:

env_1 (gym.Env) – The original environment
env_2 (gym.Env) – The new environment
verbose (bool) – Whether to print the results of the evaluation. Defaults to True.

Returns:

The maximum difference between the transition probabilities of the two environments

Return type:

float
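To make the returned quantity concrete, here is a minimal sketch of a maximum difference between two tabular transition models. It is not NS-Gym's implementation and does not read anything from a `gym.Env`. The `TransitionTable` layout and the helper `max_transition_difference` are hypothetical, and the sketch assumes that any transition missing from a table has probability zero.

```python
from typing import Dict, Hashable, Tuple

# Hypothetical layout: table[(s, a)][s_next] = P(s_next | s, a)
TransitionTable = Dict[Tuple[Hashable, Hashable], Dict[Hashable, float]]


def max_transition_difference(p_0: TransitionTable, p_t: TransitionTable) -> float:
    """Largest absolute gap between two transition tables over all (s, a, s')."""
    worst = 0.0
    # Visit every (state, action) pair that appears in either table.
    for key in set(p_0) | set(p_t):
        row_0 = p_0.get(key, {})
        row_t = p_t.get(key, {})
        # Missing next states are treated as probability 0 (sketch assumption).
        for s_next in set(row_0) | set(row_t):
            diff = abs(row_t.get(s_next, 0.0) - row_0.get(s_next, 0.0))
            worst = max(worst, diff)
    return worst


# Original dynamics: "right" always moves from s0 to s1.
p_0 = {("s0", "right"): {"s1": 1.0}, ("s0", "left"): {"s0": 1.0}}
# Shifted dynamics: "right" now fails a quarter of the time.
p_t = {("s0", "right"): {"s1": 0.75, "s0": 0.25}, ("s0", "left"): {"s0": 1.0}}

bound = max_transition_difference(p_0, p_t)
print(bound)  # 0.25
```

Identical tables give a difference of 0.0, and the value grows as the new environment's dynamics move further from the original's.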