fishyrl.dreamer module#

Main functions for training and running the Dreamer agent.

fishyrl.dreamer.compute_actions(obs: Tensor, world_model: ContainerModule, actor_critic_model: ContainerModule, actions: Tensor | None = None, posteriors: Tensor | None = None, hidden_states: Tensor | None = None, initializations: Tensor | None = None) → dict[str, ndarray | Tensor]#

Compute actions given the current observations.

Parameters:
  • obs (torch.Tensor) – The current observation from the environment, as a tensor of shape (batch_dim, obs_dim).

  • world_model (frl_utilities.ContainerModule) – The world model for the Dreamer agent.

  • actor_critic_model (frl_utilities.ContainerModule) – The actor-critic model for the Dreamer agent.

  • actions (torch.Tensor) – The actions from the previous step. If not provided, a default is used. (Default: None)

  • posteriors (torch.Tensor) – The posterior states from the previous step. If not provided, a default is used. (Default: None)

  • hidden_states (torch.Tensor) – The hidden states from the previous step. If not provided, a default is used. (Default: None)

  • initializations (torch.Tensor) – The initialization flags (terminations | truncations) from the previous step. If not provided, a default is used. (Default: None)

Returns:

A dictionary containing the environment actions, the model actions, the posteriors, the hidden states, and the world model output.

Return type:

dict[str, np.ndarray | torch.Tensor]
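Example (a minimal sketch; the environment id, the dimensions, and the keys of the returned dictionary are assumptions, since only the key contents are documented above):

    import torch
    from fishyrl import dreamer

    # Build models for an illustrative environment (see construct_models below).
    envs, env_actions = dreamer.get_environments_and_actions("CartPole-v1", env_num=4)
    world_model, actor_critic_model, _ = dreamer.construct_models(
        env_obs_dim=4, env_actions=env_actions, env_num=4)

    # First step: no previous state is available, so defaults are used.
    obs = torch.zeros(4, 4)  # (batch_dim, obs_dim)
    out = dreamer.compute_actions(obs, world_model, actor_critic_model)

    # Later steps: feed the previous outputs back in so the recurrent state
    # is carried forward.  The key names below are assumed to mirror the
    # parameter names; check the actual return dictionary.
    out = dreamer.compute_actions(
        obs,
        world_model,
        actor_critic_model,
        actions=out["actions"],
        posteriors=out["posteriors"],
        hidden_states=out["hidden_states"],
    )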

fishyrl.dreamer.construct_models(env_obs_dim: int, env_actions: list[Action], env_num: int, model_global_embedded: int = 1024, model_global_blocks: int = 5, model_global_dense: int = 1024, model_global_categorical_bins: int = 32, model_global_reward_bins: int = 255, model_global_stochastic_dim: int = 32, model_global_deterministic_dim: int = 4096, model_world_lr: float = 0.0001, model_world_eps: float = 1e-08, model_world_learnable_initial_state: bool = True, model_actor_lr: float = 8e-05, model_actor_eps: float = 1e-05, model_critic_lr: float = 8e-05, model_critic_eps: float = 1e-05, buffer_capacity: int = 1000000, scaler_eps: float = 1.0, replay_ratio: float = 0.5, device: device | str = 'cpu', **kwargs: dict[str, Any]) → tuple[ContainerModule, ContainerModule, Container]#

Construct the models for the Dreamer agent.

Can optionally take a configuration dictionary via the cfg argument.

Parameters:
  • env_obs_dim (int) – The dimension of the environment observations.

  • env_actions (list[frl_actions.Action]) – A list of actions for the environment.

  • env_num (int) – The number of parallel environments.

  • model_global_embedded (int) – The dimension of the embedded observation space. (Default: 1024)

  • model_global_blocks (int) – The number of blocks in the MLP models. (Default: 5)

  • model_global_dense (int) – The dimension of the dense layers in the MLP models. (Default: 1024)

  • model_global_categorical_bins (int) – The number of categorical bins for the stochastic state. (Default: 32)

  • model_global_reward_bins (int) – The number of categorical bins for the reward prediction. (Default: 255)

  • model_global_stochastic_dim (int) – The dimension of the stochastic state. (Default: 32)

  • model_global_deterministic_dim (int) – The dimension of the deterministic state. (Default: 4096)

  • model_world_lr (float) – The learning rate for the world model. (Default: 1e-4)

  • model_world_eps (float) – The epsilon for the world model optimizer. (Default: 1e-8)

  • model_world_learnable_initial_state (bool) – Whether the initial state of the RSSM model is learnable. (Default: True)

  • model_actor_lr (float) – The learning rate for the actor model. (Default: 8e-5)

  • model_actor_eps (float) – The epsilon for the actor model optimizer. (Default: 1e-5)

  • model_critic_lr (float) – The learning rate for the critic model. (Default: 8e-5)

  • model_critic_eps (float) – The epsilon for the critic model optimizer. (Default: 1e-5)

  • buffer_capacity (int) – The capacity of the replay buffer. (Default: 10**6)

  • scaler_eps (float) – The epsilon for the lambda value normalizer. (Default: 1.0)

  • replay_ratio (float) – The ratio of model updates to environment steps. (Default: 0.5)

  • device (torch.device | str) – The device to use for the models. (Default: 'cpu')

  • kwargs (dict[str, Any]) – Catches extra keyword arguments for compatibility with utilities.optional_flatten_cfg.

Returns:

A tuple containing the world model, the actor-critic model, and the utility modules.

Return type:

tuple[frl_utilities.ContainerModule, frl_utilities.ContainerModule, frl_utilities.Container]
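Example (a minimal sketch; "CartPole-v1" and the dimensions are illustrative):

    import torch
    from fishyrl import dreamer

    device = "cuda" if torch.cuda.is_available() else "cpu"

    # The action list is usually derived from the environment first.
    envs, env_actions = dreamer.get_environments_and_actions("CartPole-v1", env_num=4)

    world_model, actor_critic_model, utility_modules = dreamer.construct_models(
        env_obs_dim=4,            # CartPole observations are 4-dimensional
        env_actions=env_actions,
        env_num=4,
        device=device,
    )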

fishyrl.dreamer.evaluate(env_name: str, world_model: ContainerModule, actor_critic_model: ContainerModule, seed: int | None = None) → tuple[ndarray, float]#

Run a single evaluation episode for the Dreamer agent.

Parameters:
  • env_name (str) – The name of the environment to evaluate in.

  • world_model (frl_utilities.ContainerModule) – The world model for the Dreamer agent.

  • actor_critic_model (frl_utilities.ContainerModule) – The actor-critic model for the Dreamer agent.

  • seed (int) – The random seed for the environment. (Default: None)

Returns:

A tuple containing the video frames of the episode, as an array of shape (frames, height, width, channels), and the frames per second (FPS) of the environment rendering.

Return type:

tuple[np.ndarray, float]
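Example (a minimal sketch; the environment id is illustrative, and imageio is just one way to write the returned frames):

    import imageio
    from fishyrl import dreamer

    # Assumes world_model and actor_critic_model were built with
    # construct_models and trained (or restored via load_models).
    frames, fps = dreamer.evaluate(
        "CartPole-v1", world_model, actor_critic_model, seed=0)
    imageio.mimsave("eval_episode.mp4", frames, fps=fps)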

fishyrl.dreamer.get_environments_and_actions(env_name: str, env_num: int = 1, env_actions: list[dict[str, Any]] = [], **kwargs: dict[str, Any]) → tuple[AsyncVectorEnv, list[Action]]#

Get the environments and actions for the Dreamer agent, given a configuration.

Can optionally take a configuration dictionary via the cfg argument.

Parameters:
  • env_name (str) – The identifier of the environment to create.

  • env_num (int) – The number of parallel environments to create. (Default: 1)

  • env_actions (list[dict[str, Any]]) – A list of action configurations for the environment. (Default: [])

  • kwargs (dict[str, Any]) – Catches extra keyword arguments for compatibility with utilities.optional_flatten_cfg.

Returns:

A tuple containing the vectorized environments and a list of actions.

Return type:

tuple[gym.vector.AsyncVectorEnv, list[frl_actions.Action]]
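Example (a minimal sketch; the environment id is illustrative, and the schema of the action-configuration dictionaries depends on frl_actions):

    from fishyrl import dreamer

    envs, actions = dreamer.get_environments_and_actions(
        env_name="CartPole-v1",
        env_num=8,
    )
    print(envs.num_envs, len(actions))  # 8 parallel environments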

fishyrl.dreamer.learning_step(batch: dict[str, Tensor], world_model: ContainerModule, actor_critic_model: ContainerModule, utility_modules: Container, tensorboard_writer: SummaryWriter | None = None, environment_step: int = -1, training_imagination_horizon: int = 15, training_free_nats: float = 1.0, training_kl_dyn: float = 0.5, training_kl_rep: float = 0.1, training_kl_reg: float = 1.0, training_continue_reg: float = 1.0, training_gamma: float = 0.997, training_lmbda: float = 0.95, training_world_model_grad_clip: float = 1000.0, training_actor_model_grad_clip: float = 100.0, training_critic_model_grad_clip: float = 100.0, **kwargs: dict[str, Any]) → None#

Perform a learning step for the Dreamer agent.

Parameters:
  • batch (dict[str, torch.Tensor]) – A batch of data from the replay buffer, containing tensors for observations, actions, rewards, terminations, and truncations.

  • world_model (frl_utilities.ContainerModule) – The world model for the Dreamer agent.

  • actor_critic_model (frl_utilities.ContainerModule) – The actor-critic model for the Dreamer agent.

  • utility_modules (frl_utilities.Container) – The utility modules for the Dreamer agent.

  • tensorboard_writer (torch.utils.tensorboard.SummaryWriter) – The TensorBoard writer for logging, if desired. (Default: None)

  • environment_step (int) – The current environment step. Used for logging purposes. (Default: -1)

  • training_imagination_horizon (int) – The number of steps to imagine into the future for actor-critic updates. (Default: 15)

  • training_free_nats (float) – The free-nats threshold below which the KL terms are not penalized. (Default: 1.0)

  • training_kl_dyn (float) – The weight for the dynamics KL loss. (Default: 0.5)

  • training_kl_rep (float) – The weight for the representation KL loss. (Default: 0.1)

  • training_kl_reg (float) – The overall weight for the KL regularization loss. (Default: 1.0)

  • training_continue_reg (float) – The weight for the continue loss. (Default: 1.0)

  • training_gamma (float) – The discount factor for future rewards. (Default: 0.997)

  • training_lmbda (float) – The lambda parameter for computing lambda returns. (Default: 0.95)

  • training_world_model_grad_clip (float) – The gradient clipping norm for the world model. (Default: 1000.0)

  • training_actor_model_grad_clip (float) – The gradient clipping norm for the actor model. (Default: 100.0)

  • training_critic_model_grad_clip (float) – The gradient clipping norm for the critic model. (Default: 100.0)

  • kwargs (dict[str, Any]) – Catches extra keyword arguments for compatibility with utilities.optional_flatten_cfg.
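Example (a minimal sketch with a dummy batch; the key names and the (batch, sequence, dim) layout are assumptions based on the parameter documentation above; in practice the batch comes from the replay buffer):

    import torch
    from fishyrl import dreamer

    B, T, obs_dim, act_dim = 16, 64, 4, 2  # illustrative sizes
    batch = {
        "observations": torch.zeros(B, T, obs_dim),
        "actions": torch.zeros(B, T, act_dim),
        "rewards": torch.zeros(B, T),
        "terminations": torch.zeros(B, T, dtype=torch.bool),
        "truncations": torch.zeros(B, T, dtype=torch.bool),
    }

    # world_model, actor_critic_model, utility_modules from construct_models.
    dreamer.learning_step(batch, world_model, actor_critic_model, utility_modules)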

fishyrl.dreamer.load_models(path: str = '.', world_model: ContainerModule | None = None, actor_critic_model: ContainerModule | None = None, utility_modules: Container | None = None, include_optimizers: bool = True) → None#

Load the models and utilities from a file.

Uses weights_only=False so that optimizer states can be loaded. This allows arbitrary code execution during unpickling, so only load weight files from trusted sources.

Parameters:
  • path (str) – The path to the weights file. (Default: '.')

  • world_model (frl_utilities.ContainerModule) – The world model to load the weights into. Skipped if not provided.

  • actor_critic_model (frl_utilities.ContainerModule) – The actor-critic model to load the weights into. Skipped if not provided.

  • utility_modules (frl_utilities.Container) – The utility modules to load the weights into. Skipped if not provided.

  • include_optimizers (bool) – Whether to load optimizer states or not. (Default: True)
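Example (a minimal sketch; the checkpoint path is illustrative):

    from fishyrl import dreamer

    # Restore a checkpoint in place; only load files you trust (see above).
    dreamer.load_models(
        path="checkpoints/dreamer.pt",
        world_model=world_model,
        actor_critic_model=actor_critic_model,
        utility_modules=utility_modules,
        include_optimizers=True,
    )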

fishyrl.dreamer.save_models(path: str = '.', world_model: ContainerModule | None = None, actor_critic_model: ContainerModule | None = None, utility_modules: Container | None = None, include_optimizers: bool = True) → None#

Save the models and utilities to a file.

Can optionally take a configuration dictionary via the cfg argument.

Parameters:
  • path (str) – The file path for saving the models. (Default: '.')

  • world_model (frl_utilities.ContainerModule) – The world model to save. Excluded if not provided.

  • actor_critic_model (frl_utilities.ContainerModule) – The actor-critic model to save. Excluded if not provided.

  • utility_modules (frl_utilities.Container) – The utility modules to save. Excluded if not provided.

  • include_optimizers (bool) – Whether to include the optimizer states in the saved file. (Default: True)
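Example (a minimal sketch; the path is illustrative):

    from fishyrl import dreamer

    # Save everything needed to resume training; pass include_optimizers=False
    # for a smaller, inference-only checkpoint.
    dreamer.save_models(
        path="checkpoints/dreamer.pt",
        world_model=world_model,
        actor_critic_model=actor_critic_model,
        utility_modules=utility_modules,
    )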

fishyrl.dreamer.train_loop(envs: AsyncVectorEnv, world_model: ContainerModule, actor_critic_model: ContainerModule, utility_modules: Container, tensorboard_writer: SummaryWriter | None = None, env_name: str | None = None, eval_frequency: int = 1000, training_steps: int = 1000000, training_pretrain_steps: int = 1024, training_critic_target_update_freq: int = 1, training_batch_size: int = 16, training_sequence_length: int = 64, training_tau: float = 0.02, **kwargs: dict[str, Any]) → None#

Train the Dreamer agent in the given environment.

Parameters:
  • envs (gym.vector.AsyncVectorEnv) – The vectorized environments to train in.

  • env_name (str) – The name of the environment, used for running periodic evaluation episodes. (Default: None)

  • eval_frequency (int) – The number of environment steps between evaluation episodes. (Default: 1000)

  • world_model (frl_utilities.ContainerModule) – The world model for the Dreamer agent.

  • actor_critic_model (frl_utilities.ContainerModule) – The actor-critic model for the Dreamer agent.

  • utility_modules (frl_utilities.Container) – The utility modules for the Dreamer agent.

  • tensorboard_writer (torch.utils.tensorboard.SummaryWriter) – The TensorBoard writer for logging, if desired. (Default: None)

  • training_steps (int) – The total number of environment steps to train for. (Default: 10**6)

  • training_pretrain_steps (int) – The number of environment steps to pretrain the world model before starting training. (Default: 1024)

  • training_critic_target_update_freq (int) – The interval for soft updates of the target critic. (Default: 1)

  • training_batch_size (int) – The batch size for training. (Default: 16)

  • training_sequence_length (int) – The batch sample sequence length for training. (Default: 64)

  • training_tau (float) – The tau parameter for soft updates of the target critic. (Default: 0.02)

  • kwargs (dict[str, Any]) – Catches extra keyword arguments for compatibility with utilities.optional_flatten_cfg.
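Example (an end-to-end sketch; "CartPole-v1", the dimensions, and the hyperparameters are illustrative):

    from torch.utils.tensorboard import SummaryWriter
    from fishyrl import dreamer

    env_name = "CartPole-v1"
    envs, env_actions = dreamer.get_environments_and_actions(env_name, env_num=4)
    world_model, actor_critic_model, utility_modules = dreamer.construct_models(
        env_obs_dim=4, env_actions=env_actions, env_num=4)

    dreamer.train_loop(
        envs,
        world_model,
        actor_critic_model,
        utility_modules,
        tensorboard_writer=SummaryWriter("runs/dreamer"),
        env_name=env_name,        # enables periodic evaluation episodes
        eval_frequency=1000,
        training_steps=100_000,
    )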