fishyrl.dreamer module#
Main functions for training and running the Dreamer agent.
- fishyrl.dreamer.compute_actions(obs: Tensor, world_model: ContainerModule, actor_critic_model: ContainerModule, actions: Tensor = None, posteriors: Tensor = None, hidden_states: Tensor = None, initializations: Tensor = None) dict[str, ndarray | Tensor]#
Compute an action given the current observation.
- Parameters:
obs (torch.Tensor) – The current observation from the environment, as a tensor of shape (batch_dim, obs_dim).
world_model (frl_utilities.ContainerModule) – The world model for the Dreamer agent.
actor_critic_model (frl_utilities.ContainerModule) – The actor-critic model for the Dreamer agent.
actions (torch.Tensor) – The actions from the previous step. If not provided, will use a default. (Default: None)
posteriors (torch.Tensor) – The posterior states from the previous step. If not provided, will use a default. (Default: None)
hidden_states (torch.Tensor) – The hidden states from the previous step. If not provided, will use a default. (Default: None)
initializations (torch.Tensor) – The initializations (terminations | truncations) from the previous step. If not provided, will use a default. (Default: None)
- Returns:
A dictionary containing the environment actions, actions, posteriors, hidden states, and world model output.
- Return type:
dict[str, Union[np.ndarray, torch.Tensor]]
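The recurrent state returned by compute_actions is meant to be threaded back into the next call. The sketch below shows that call pattern with a pure-Python stub standing in for the real function (since running the actual models requires constructed fishyrl modules); the returned key names follow the documentation above, everything else is a placeholder.

```python
# Sketch of threading recurrent state between successive compute_actions
# calls. `step` is a stub standing in for fishyrl.dreamer.compute_actions:
# the returned keys ("actions", "posteriors", "hidden_states") follow the
# docs above; the values here are placeholders.

def step(obs, actions=None, posteriors=None, hidden_states=None):
    # A real call would run the world model and actor-critic; this stub
    # just fills defaults and echoes state so the threading is visible.
    n = len(obs)
    return {
        "env_actions": [0] * n,
        "actions": actions if actions is not None else [0.0] * n,
        "posteriors": posteriors if posteriors is not None else [0.0] * n,
        "hidden_states": hidden_states if hidden_states is not None else [0.0] * n,
    }

state = {"actions": None, "posteriors": None, "hidden_states": None}
obs = [0.1, 0.2]  # placeholder batch of observations
for _ in range(3):
    out = step(obs, state["actions"], state["posteriors"], state["hidden_states"])
    # Feed the returned recurrent state back into the next call.
    state = {k: out[k] for k in ("actions", "posteriors", "hidden_states")}
```

On the first step all three states are None, so the function falls back to its defaults; afterwards the caller only ever forwards what the previous call returned.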
- fishyrl.dreamer.construct_models(env_obs_dim: int, env_actions: list[Action], env_num: int, model_global_embedded: int = 1024, model_global_blocks: int = 5, model_global_dense: int = 1024, model_global_categorical_bins: int = 32, model_global_reward_bins: int = 255, model_global_stochastic_dim: int = 32, model_global_deterministic_dim: int = 4096, model_world_lr: float = 0.0001, model_world_eps: float = 1e-08, model_world_learnable_initial_state: bool = True, model_actor_lr: float = 8e-05, model_actor_eps: float = 1e-05, model_critic_lr: float = 8e-05, model_critic_eps: float = 1e-05, buffer_capacity: int = 1000000, scaler_eps: float = 1.0, replay_ratio: float = 0.5, device: device | str = 'cpu', **kwargs: dict[str, Any]) tuple[ContainerModule, ContainerModule, Container]#
Construct the models for the Dreamer agent.
Can optionally take a configuration dictionary with argument cfg.
- Parameters:
env_obs_dim (int) – The dimension of the environment observations.
env_actions (list[frl_actions.Action]) – A list of actions for the environment.
env_num (int) – The number of parallel environments.
model_global_embedded (int) – The dimension of the embedded observation space. (Default: 1024)
model_global_blocks (int) – The number of blocks in the MLP models. (Default: 5)
model_global_dense (int) – The dimension of the dense layers in the MLP models. (Default: 1024)
model_global_categorical_bins (int) – The number of categorical bins for the stochastic state. (Default: 32)
model_global_reward_bins (int) – The number of categorical bins for the reward prediction. (Default: 255)
model_global_stochastic_dim (int) – The dimension of the stochastic state. (Default: 32)
model_global_deterministic_dim (int) – The dimension of the deterministic state. (Default: 4096)
model_world_lr (float) – The learning rate for the world model. (Default: 1e-4)
model_world_eps (float) – The epsilon for the world model optimizer. (Default: 1e-8)
model_world_learnable_initial_state (bool) – Whether the initial state of the RSSM model is learnable. (Default: True)
model_actor_lr (float) – The learning rate for the actor model. (Default: 8e-5)
model_actor_eps (float) – The epsilon for the actor model optimizer. (Default: 1e-5)
model_critic_lr (float) – The learning rate for the critic model. (Default: 8e-5)
model_critic_eps (float) – The epsilon for the critic model optimizer. (Default: 1e-5)
buffer_capacity (int) – The capacity of the replay buffer. (Default: 10**6)
scaler_eps (float) – The epsilon for the lambda value normalizer. (Default: 1.0)
replay_ratio (float) – The ratio of model updates to environment steps. (Default: 0.5)
device (torch.device | str) – The device to use for the models. (Default: 'cpu')
kwargs (dict[str, Any]) – Catches extra keyword arguments for compatibility with utilities.optional_flatten_cfg.
- Returns:
A tuple containing the world model, agent model, and utility modules.
- Return type:
tuple[frl_utilities.ContainerModule, frl_utilities.ContainerModule, frl_utilities.Container]
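A replay_ratio of 0.5 means roughly one gradient update per two environment steps. One common way to realize a fractional ratio is to accumulate "update credit" per environment step, sketched below; this scheduling is an illustration of the concept, not necessarily fishyrl's exact implementation.

```python
# Illustration of a replay ratio: the fraction of gradient updates taken
# per environment step. With replay_ratio = 0.5, two env steps buy one
# update. This credit-accumulation scheduling is an assumption for
# illustration, not fishyrl's verified internals.

replay_ratio = 0.5
update_credit = 0.0
updates_done = 0

for env_step in range(10):
    update_credit += replay_ratio
    while update_credit >= 1.0:   # spend accumulated credit on updates
        updates_done += 1         # a real loop would call learning_step here
        update_credit -= 1.0
```

With the defaults above, ten environment steps yield five updates; a replay_ratio above 1.0 would instead trigger multiple updates per step.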
- fishyrl.dreamer.evaluate(env_name: str, world_model: ContainerModule, actor_critic_model: ContainerModule, seed: int = None) tuple[ndarray, float]#
Run a single evaluation episode for the Dreamer agent.
- Parameters:
env_name (str) – The name of the environment to evaluate in.
world_model (frl_utilities.ContainerModule) – The world model for the Dreamer agent.
actor_critic_model (frl_utilities.ContainerModule) – The actor-critic model for the Dreamer agent.
seed (int) – The random seed for the environment. (Default: None)
- Returns:
A tuple containing the video frames of the episode (frames, height, width, channels) and the frames per second (FPS) of the environment rendering.
- Return type:
tuple[np.ndarray, float]
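The returned frame array and FPS together determine the rendered length of the evaluation episode. A small worked example with placeholder values:

```python
# evaluate() returns video frames of shape (frames, height, width, channels)
# plus the rendering FPS; the rendered duration follows directly.
# The shape and FPS below are placeholder values, not from a real run.

num_frames, height, width, channels = 300, 64, 64, 3   # placeholder shape
fps = 30.0
duration_seconds = num_frames / fps                    # 300 frames at 30 FPS
```

This is handy when writing the frames out as a video, where the writer needs the FPS to preserve real-time playback speed.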
- fishyrl.dreamer.get_environments_and_actions(env_name: str, env_num: int = 1, env_actions: list[dict[str, Any]] = [], **kwargs: dict[str, Any]) tuple[AsyncVectorEnv, list[Action]]#
Get the environments and actions for the Dreamer agent, given a configuration.
Can optionally take a configuration dictionary with argument cfg.
- Parameters:
env_name (str) – The identifier of the environment to create.
env_num (int) – The number of parallel environments to create. (Default: 1)
env_actions (list[dict[str, Any]]) – A list of action configurations for the environment. (Default: [])
kwargs (dict[str, Any]) – Catches extra keyword arguments for compatibility with utilities.optional_flatten_cfg.
- Returns:
A tuple containing the vectorized environments and a list of actions.
- Return type:
tuple[gym.vector.AsyncVectorEnv, list[frl_actions.Action]]
- fishyrl.dreamer.learning_step(batch: dict[str, Tensor], world_model: ContainerModule, actor_critic_model: ContainerModule, utility_modules: Container, tensorboard_writer: SummaryWriter = None, environment_step: int = -1, training_imagination_horizon: int = 15, training_free_nats: float = 1.0, training_kl_dyn: float = 0.5, training_kl_rep: float = 0.1, training_kl_reg: float = 1.0, training_continue_reg: float = 1.0, training_gamma: float = 0.997, training_lmbda: float = 0.95, training_world_model_grad_clip: float = 1000.0, training_actor_model_grad_clip: float = 100.0, training_critic_model_grad_clip: float = 100.0, **kwargs: dict[str, Any]) None#
Perform a learning step for the Dreamer agent.
- Parameters:
batch (dict[str, torch.Tensor]) – A batch of data from the replay buffer, containing tensors for observations, actions, rewards, terminations, and truncations.
world_model (frl_utilities.ContainerModule) – The world model for the Dreamer agent.
actor_critic_model (frl_utilities.ContainerModule) – The actor-critic model for the Dreamer agent.
utility_modules (frl_utilities.Container) – The utility modules for the Dreamer agent.
tensorboard_writer (torch.utils.tensorboard.SummaryWriter) – The TensorBoard writer for logging, if desired. (Default: None)
environment_step (int) – The current environment step, used for logging purposes. (Default: -1)
training_imagination_horizon (int) – The number of steps to imagine into the future for actor-critic updates. (Default: 15)
training_free_nats (float) – The number of free nats for KL balancing. (Default: 1.0)
training_kl_dyn (float) – The weight for the dynamics KL loss. (Default: 0.5)
training_kl_rep (float) – The weight for the representation KL loss. (Default: 0.1)
training_kl_reg (float) – The overall weight for the KL regularization loss. (Default: 1.0)
training_continue_reg (float) – The weight for the continue loss. (Default: 1.0)
training_gamma (float) – The discount factor for future rewards. (Default: 0.997)
training_lmbda (float) – The lambda parameter for computing lambda returns. (Default: 0.95)
training_world_model_grad_clip (float) – The gradient clipping norm for the world model. (Default: 1000.0)
training_actor_model_grad_clip (float) – The gradient clipping norm for the actor model. (Default: 100.0)
training_critic_model_grad_clip (float) – The gradient clipping norm for the critic model. (Default: 100.0)
kwargs (dict[str, Any]) – Catches extra keyword arguments for compatibility with utilities.optional_flatten_cfg.
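The training_gamma and training_lmbda parameters enter through the lambda-return recursion used to build critic targets. Below is a pure-Python version of the standard recursion R_t = r_t + gamma * c_t * ((1 - lmbda) * v_{t+1} + lmbda * R_{t+1}); the recursion itself is the textbook one, but whether fishyrl bootstraps exactly this way is an assumption.

```python
# Standard lambda-return recursion, computed backwards over a trajectory:
#   R_t = r_t + gamma * c_t * ((1 - lmbda) * v_{t+1} + lmbda * R_{t+1})
# with R_T bootstrapped from the final value estimate. The exact
# bootstrapping in fishyrl is an assumption; the recursion behind
# training_gamma / training_lmbda is the standard one.

def lambda_returns(rewards, values, continues, gamma=0.997, lmbda=0.95):
    # `values` has one more entry than `rewards` (bootstrap value at the end);
    # `continues` is 0.0 where the episode terminates, 1.0 otherwise.
    returns = [0.0] * len(rewards)
    last = values[-1]
    for t in reversed(range(len(rewards))):
        last = rewards[t] + gamma * continues[t] * (
            (1 - lmbda) * values[t + 1] + lmbda * last
        )
        returns[t] = last
    return returns

rets = lambda_returns(
    rewards=[1.0, 0.0, 1.0],
    values=[0.5, 0.5, 0.5, 0.5],
    continues=[1.0, 1.0, 0.0],  # episode terminates after the last step
)
```

With continues[t] = 0.0 at termination, the discounted tail is cut off, so the final return reduces to the final reward.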
- fishyrl.dreamer.load_models(path: str = '.', world_model: ContainerModule = None, actor_critic_model: ContainerModule = None, utility_modules: Container = None, include_optimizers: bool = True) None#
Load the models and utilities from a file.
Uses weights_only=False to allow loading optimizer states, which is somewhat unsafe, so make sure to only load from trusted weight files.
- Parameters:
path (str) – The path to the weights file. (Default: '.')
world_model (frl_utilities.ContainerModule) – The world model for applying the loaded weights.
actor_critic_model (frl_utilities.ContainerModule) – The actor-critic model for applying the loaded weights.
utility_modules (frl_utilities.Container) – The utility modules for applying the loaded weights.
include_optimizers (bool) – Whether to load optimizer states or not. (Default: True)
- fishyrl.dreamer.save_models(path: str = '.', world_model: ContainerModule | None = None, actor_critic_model: ContainerModule | None = None, utility_modules: Container | None = None, include_optimizers: bool = True) None#
Save the models and utilities to a file.
Can optionally take a configuration dictionary with argument cfg.
- Parameters:
path (str) – The file path for saving the models. (Default: '.')
world_model (frl_utilities.ContainerModule) – The world model to save. Excluded if not provided.
actor_critic_model (frl_utilities.ContainerModule) – The actor-critic model to save. Excluded if not provided.
utility_modules (frl_utilities.Container) – The utility modules to save. Excluded if not provided.
include_optimizers (bool) – Whether to include the optimizer states in the saved file. (Default: True)
- fishyrl.dreamer.train_loop(envs: AsyncVectorEnv, world_model: ContainerModule, actor_critic_model: ContainerModule, utility_modules: Container, tensorboard_writer: SummaryWriter = None, env_name: str = None, eval_frequency: int = 1000, training_steps: int = 1000000, training_pretrain_steps: int = 1024, training_critic_target_update_freq: int = 1, training_batch_size: int = 16, training_sequence_length: int = 64, training_tau: float = 0.02, **kwargs: dict[str, Any]) None#
Train the Dreamer agent in the given environment.
- Parameters:
envs (gym.vector.AsyncVectorEnv) – The vectorized environments to train in.
env_name (str) – The identifier of the environment, used for running evaluation episodes. (Default: None)
eval_frequency (int) – The number of environment steps between evaluation episodes. (Default: 1000)
tensorboard_writer (torch.utils.tensorboard.SummaryWriter) – The TensorBoard writer for logging, if desired. (Default: None)
training_steps (int) – The total number of environment steps to train for. (Default: 10**6)
training_pretrain_steps (int) – The number of environment steps to pretrain the world model before starting training. (Default: 1024)
training_critic_target_update_freq (int) – The interval for soft updates of the target critic. (Default: 1)
training_batch_size (int) – The batch size for training. (Default: 16)
training_sequence_length (int) – The batch sample sequence length for training. (Default: 64)
training_tau (float) – The tau parameter for soft updates of the target critic. (Default: 0.02)
kwargs (dict[str, Any]) – Catches extra keyword arguments for compatibility with utilities.optional_flatten_cfg.
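The training_tau parameter controls Polyak averaging of the target critic, applied at the interval set by training_critic_target_update_freq. A minimal pure-Python sketch of the standard rule theta_target <- (1 - tau) * theta_target + tau * theta follows; the rule is the conventional one, while its exact placement in fishyrl's loop is an assumption.

```python
# Standard Polyak (soft) target update: each target-critic parameter moves
# a fraction tau toward the online critic. The rule itself is standard;
# how often fishyrl applies it is governed by
# training_critic_target_update_freq.

def soft_update(target, online, tau=0.02):
    return [(1 - tau) * t + tau * o for t, o in zip(target, online)]

target = [0.0, 0.0]
online = [1.0, -1.0]
for _ in range(5):                      # five soft updates with a large tau
    target = soft_update(target, online, tau=0.5)
# each update halves the remaining gap, so 1 - 0.5**5 of it is closed
```

A small tau like the default 0.02 makes the target trail the online critic slowly, which stabilizes the bootstrapped value targets.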