fishyrl.actions module#
Utility action definitions for reinforcement learning agents.
- class fishyrl.actions.ACTION_IDENTIFIERS(*values)#
Bases: Enum
String identifiers for action definitions, mapped to their corresponding classes.
- CONTINUOUS = <class 'fishyrl.actions.ContinuousActions'>#
- DISCRETE = <class 'fishyrl.actions.DiscreteAction'>#
- DISCRETIZED_CONTINUOUS = <class 'fishyrl.actions.DiscretizedContinuousAction'>#
- TWO_HOT_DISCRETIZED_CONTINUOUS = <class 'fishyrl.actions.TwoHotDiscretizedContinuousAction'>#
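The name-to-class mapping can be illustrated with a minimal sketch that resolves a string, such as one read from a config file, to a class. It uses a stand-in enum with placeholder classes: `ActionKinds`, `FakeContinuous`, `FakeDiscrete` and `resolve` are hypothetical names, not part of fishyrl, and whether fishyrl resolves identifiers this way is an assumption.

```python
from enum import Enum


class FakeContinuous:
    """Placeholder standing in for an action class (hypothetical)."""


class FakeDiscrete:
    """Placeholder standing in for an action class (hypothetical)."""


class ActionKinds(Enum):
    """Stand-in enum mapping identifier names to classes."""

    CONTINUOUS = FakeContinuous
    DISCRETE = FakeDiscrete


def resolve(name: str) -> type:
    """Look up a member by its name and return the mapped class."""
    return ActionKinds[name.upper()].value


action_cls = resolve("discrete")
print(action_cls.__name__)
```

Looking a member up by name and reading `.value` is standard `enum.Enum` behaviour; an unknown name raises `KeyError`.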
- class fishyrl.actions.Action(*args: Any, **kwargs: Any)#
Bases: Module
Base class for actions.
- abstractmethod construct(action: Tensor) Tensor#
Construct the full action from the simplified action.
- Parameters:
action (torch.Tensor) – The simplified action of shape (batch_dim).
- Returns:
The full action of shape (batch_dim, output_dim).
- Return type:
torch.Tensor
- abstractmethod sample(logits: Tensor) tuple[Tensor, Distribution]#
Sample an action from the logits.
- Parameters:
logits (torch.Tensor) – The base logits of shape (batch_dim, input_dim).
- Returns:
The sampled action of shape (batch_dim, output_dim) and the corresponding distribution.
- Return type:
tuple[torch.Tensor, torch.distributions.Distribution]
- abstractmethod simplify(x: Tensor) Tensor#
Simplify each action to a single value.
- Parameters:
x (torch.Tensor) – The action(s) of shape (batch_dim, output_dim).
- Returns:
The simplified action of shape (batch_dim).
- Return type:
torch.Tensor
- abstract property input_dim: int#
The number of input features for the action.
- Type:
int
- abstract property num_actions: int#
The number of actions.
- Type:
int
- abstract property output_dim: int#
The number of output features for the action.
- Type:
int
- class fishyrl.actions.ContinuousActions(num_actions: int = 1, std_init: float = 2, std_min: float = 0.1, std_max: float = 1, clip: float = 0)#
Bases: Action
Continuous action definition using torch.distributions.Normal.
The distribution is computed using mean tanh(mean) and standard deviation (std_max - std_min) * sigmoid(std + std_init) + std_min.
- __init__(num_actions: int = 1, std_init: float = 2, std_min: float = 0.1, std_max: float = 1, clip: float = 0) None#
Initialize the action definition.
- Parameters:
num_actions (int) – The number of actions to initialize.
std_init (float) – The initial standard deviation.
std_min (float) – The minimum standard deviation.
std_max (float) – The maximum standard deviation.
clip (float) – The maximum absolute value of the action.
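The mean and standard deviation formulas given for this class can be checked numerically with a small plain-Python sketch. The helper names `sigmoid` and `normal_params` are hypothetical, and scalars stand in for tensors; this is not the library's implementation.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def normal_params(raw_mean, raw_std, std_init=2.0, std_min=0.1, std_max=1.0):
    """Map raw network outputs to the (mean, std) of the Normal distribution."""
    mean = math.tanh(raw_mean)  # squashed into (-1, 1)
    # The sigmoid term lies between 0 and 1, so std lies between std_min and std_max.
    std = (std_max - std_min) * sigmoid(raw_std + std_init) + std_min
    return mean, std


mean, std = normal_params(raw_mean=0.5, raw_std=0.0)
print(round(mean, 4), round(std, 4))
```

Because the sigmoid is bounded, the resulting standard deviation stays between `std_min` and `std_max` whatever the raw value is, and `std_init` only shifts where in that range a raw value of zero lands.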
- construct(action: Tensor) Tensor#
Construct the full action from the simplified action.
This is a no-op for continuous actions.
- Parameters:
action (torch.Tensor) – The simplified action of shape (batch_dim, output_dim).
- Returns:
The same action(s) of shape (batch_dim, output_dim).
- Return type:
torch.Tensor
- sample(logits: Tensor) tuple[Tensor, Distribution]#
Sample logits using Normal, clipped to [-clip, clip].
- Parameters:
logits (torch.Tensor) – The base logits of shape (batch_dim, input_dim).
- Returns:
Tuple containing the sampled action of shape (batch_dim, output_dim) and the distribution.
- Return type:
tuple[torch.Tensor, torch.distributions.Distribution]
- simplify(x: Tensor) Tensor#
Simplify each action to a single value.
This is a no-op for continuous actions.
- Parameters:
x (torch.Tensor) – The action(s) of shape (batch_dim, output_dim).
- Returns:
The same action(s) of shape (batch_dim, output_dim).
- Return type:
torch.Tensor
- property input_dim: int#
The number of input features for the continuous action, equal to 2 * num_actions.
- Type:
int
- property num_actions: int#
The number of actions.
- Type:
int
- property output_dim: int#
The number of output features for the continuous action, equal to num_actions.
- Type:
int
- class fishyrl.actions.DiscreteAction(num_options: int)#
Bases: Action
Discrete action definition using OneHotCategoricalStraightThrough.
- __init__(num_options: int) None#
Initialize the action definition.
- Parameters:
num_options (int) – The number of options for the discrete action.
- construct(action: Tensor) Tensor#
Construct the full action from the simplified action.
Converts the provided action index to a one-hot vector.
- Parameters:
action (torch.Tensor) – The simplified action of shape (batch_dim, 1).
- Returns:
One-hot action of shape (batch_dim, output_dim).
- Return type:
torch.Tensor
- sample(logits: Tensor) tuple[Tensor, Distribution]#
Sample logits using OneHotCategoricalStraightThrough.
- Parameters:
logits (torch.Tensor) – The base logits of shape (batch_dim, input_dim).
- Returns:
Tuple containing the sampled action of shape (batch_dim, output_dim) and the distribution.
- Return type:
tuple[torch.Tensor, torch.distributions.Distribution]
- simplify(x: Tensor) Tensor#
Simplify each action to a single value.
Takes the argmax of the provided one-hot vector.
- Parameters:
x (torch.Tensor) – One-hot action of shape (batch_dim, output_dim).
- Returns:
The action index of shape (batch_dim, 1).
- Return type:
torch.Tensor
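The relationship between construct and simplify for a discrete action (index to one-hot vector, and argmax back to index) can be sketched in plain Python. Lists stand in for tensors and the batch is a flat list of indices rather than shape (batch_dim, 1); the two free functions below are illustrative, not the library's methods.

```python
def construct(indices: list[int], num_options: int) -> list[list[float]]:
    """Turn each action index into a one-hot row of length num_options."""
    return [[1.0 if i == idx else 0.0 for i in range(num_options)] for idx in indices]


def simplify(one_hot: list[list[float]]) -> list[int]:
    """Recover each index as the argmax of its one-hot row."""
    return [max(range(len(row)), key=row.__getitem__) for row in one_hot]


full = construct([2, 0], num_options=4)
print(full)            # [[0.0, 0.0, 1.0, 0.0], [1.0, 0.0, 0.0, 0.0]]
print(simplify(full))  # [2, 0]
```

The two operations are inverses of each other on valid indices, which is what lets a compact index be stored and the full one-hot vector be rebuilt later.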
- property input_dim: int#
The number of input features for the discrete action, equal to the number of options.
- Type:
int
- property num_actions: int#
The number of actions.
- Type:
int
- property output_dim: int#
The number of output features for the discrete action, equal to the number of options.
- Type:
int
- class fishyrl.actions.DiscretizedContinuousAction(bins: int = 32, low: float = -1.0, high: float = 1.0, pre_func: callable = <function identity>, post_func: callable = <function identity>, eps: float = 1e-08)#
Bases: Action
Discretized continuous action definition using a one-hot encoding.
- __init__(bins: int = 32, low: float = -1.0, high: float = 1.0, pre_func: callable = <function identity>, post_func: callable = <function identity>, eps: float = 1e-08) None#
Initialize the action definition.
- Parameters:
bins (int) – The number of bins to use for discretization.
low (float) – The lower bound of the action values.
high (float) – The upper bound of the action values.
pre_func (callable) – A function to apply to the input logits before creating the distribution. (Default: identity)
post_func (callable) – A function to apply to the output of the distribution. (Default: identity)
eps (float) – A small value to add when computing entropy to avoid numerical issues. (Default: 1e-8)
- construct(action: Tensor) Tensor#
Construct the full action from the simplified action.
Takes the simplified action value and returns a one-hot encoding corresponding to the proper bin.
- Parameters:
action (torch.Tensor) – The simplified action of shape (batch_dim, 1).
- Returns:
The full action of shape (batch_dim, output_dim).
- Return type:
torch.Tensor
- sample(logits: Tensor) tuple[Tensor, Distribution]#
Sample logits using a one-hot encoding.
- Parameters:
logits (torch.Tensor) – The base logits of shape (batch_dim, input_dim).
- Returns:
Tuple containing the sampled action of shape (batch_dim, output_dim) and the distribution.
- Return type:
tuple[torch.Tensor, torch.distributions.Distribution]
- simplify(x: Tensor) Tensor#
Simplify each action to a single value.
Takes the index of the one-hot encoded action and returns the corresponding bin value.
- Parameters:
x (torch.Tensor) – The action(s) of shape (batch_dim, output_dim).
- Returns:
The simplified action of shape (batch_dim, 1).
- Return type:
torch.Tensor
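The mapping between a continuous value and its one-hot bin can be sketched as follows. This assumes the bin values are evenly spaced from low to high inclusive and that construct selects the nearest bin; neither detail is stated in this reference, and pre_func and post_func are treated as identity. The function names are illustrative only.

```python
def bin_values(bins: int = 32, low: float = -1.0, high: float = 1.0) -> list[float]:
    """Evenly spaced bin values from low to high inclusive (an assumption)."""
    step = (high - low) / (bins - 1)
    return [low + i * step for i in range(bins)]


def construct(value: float, values: list[float]) -> list[float]:
    """One-hot encode the bin whose value is nearest to `value` (an assumption)."""
    nearest = min(range(len(values)), key=lambda i: abs(values[i] - value))
    return [1.0 if i == nearest else 0.0 for i in range(len(values))]


def simplify(one_hot: list[float], values: list[float]) -> float:
    """Return the bin value at the index of the hot entry."""
    return values[one_hot.index(max(one_hot))]


values = bin_values(bins=5)       # [-1.0, -0.5, 0.0, 0.5, 1.0]
encoded = construct(0.4, values)  # nearest bin value is 0.5
print(encoded, simplify(encoded, values))
```

Under these assumptions a round trip through construct and simplify snaps a value to its nearest bin, so the precision of the action is limited by the bin spacing.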
- property input_dim: int#
The number of input features for the discretized continuous action, equal to the number of bins.
- Type:
int
- property num_actions: int#
The number of actions.
- Type:
int
- property output_dim: int#
The number of output features for the discretized continuous action, equal to the number of bins.
- Type:
int
- class fishyrl.actions.TwoHotDiscretizedContinuousAction(bins: int = 32, low: float = -1.0, high: float = 1.0, pre_func: callable = <function identity>, post_func: callable = <function identity>, eps: float = 1e-08)#
Bases: Action
Discretized continuous action definition using a two-hot encoding.
- __init__(bins: int = 32, low: float = -1.0, high: float = 1.0, pre_func: callable = <function identity>, post_func: callable = <function identity>, eps: float = 1e-08) None#
Initialize the action definition.
- Parameters:
bins (int) – The number of bins to use for discretization.
low (float) – The lower bound of the action values.
high (float) – The upper bound of the action values.
pre_func (callable) – A function to apply to the input logits before creating the distribution. (Default: identity)
post_func (callable) – A function to apply to the output of the distribution. (Default: identity)
eps (float) – A small value to add when computing entropy to avoid numerical issues. (Default: 1e-8)
- construct(action: Tensor) Tensor#
Construct the full action from the simplified action.
This is a no-op for two-hot discretized continuous actions.
- Parameters:
action (torch.Tensor) – The simplified action of shape (batch_dim, output_dim).
- Returns:
The input tensor action.
- Return type:
torch.Tensor
- sample(logits: Tensor) tuple[Tensor, TwoHot]#
Sample logits using a two-hot encoding.
- Parameters:
logits (torch.Tensor) – The base logits of shape (batch_dim, input_dim).
- Returns:
Tuple containing the sampled action of shape (batch_dim, output_dim) and the distribution.
- Return type:
tuple[torch.Tensor, frl_distributions.TwoHot]
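For readers unfamiliar with two-hot encoding, the general technique can be sketched in plain Python: a scalar is spread over its two neighbouring bins with linear weights, and decoded again as the weighted sum of bin values. This illustrates the encoding in general; how the `TwoHot` distribution referenced above is implemented is not shown in this reference, so the functions and the evenly spaced bins below are assumptions.

```python
def two_hot(value: float, values: list[float]) -> list[float]:
    """Spread `value` over its two neighbouring bins with linear weights."""
    value = min(max(value, values[0]), values[-1])  # clamp into the bin range
    weights = [0.0] * len(values)
    # Index of the last bin whose value is <= the clamped value.
    lower = max(i for i, v in enumerate(values) if v <= value)
    if lower == len(values) - 1:  # exactly on the top bin
        weights[lower] = 1.0
        return weights
    span = values[lower + 1] - values[lower]
    upper_weight = (value - values[lower]) / span
    weights[lower] = 1.0 - upper_weight
    weights[lower + 1] = upper_weight
    return weights


def decode(weights: list[float], values: list[float]) -> float:
    """Recover the scalar as the weighted sum of bin values."""
    return sum(w * v for w, v in zip(weights, values))


values = [-1.0, -0.5, 0.0, 0.5, 1.0]
encoded = two_hot(0.25, values)
print(encoded, decode(encoded, values))
```

Unlike a one-hot encoding, the weighted sum recovers values that fall between bins, which is consistent with this class reporting an output dimension of 1 rather than the number of bins.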
- simplify(x: Tensor) Tensor#
Simplify each action to a single value.
This is a no-op for two-hot discretized continuous actions.
- Parameters:
x (torch.Tensor) – The action(s) of shape (batch_dim, output_dim).
- Returns:
The input tensor x.
- Return type:
torch.Tensor
- property input_dim: int#
The number of input features for the discretized continuous action, equal to the number of bins.
- Type:
int
- property num_actions: int#
The number of actions.
- Type:
int
- property output_dim: int#
The number of output features for the discretized continuous action, always 1.
- Type:
int
- fishyrl.actions.construct_actions(actions: Tensor, model_actions: list[Action]) Tensor#
Construct actions using the action definitions provided.
- Parameters:
actions (torch.Tensor) – The simplified actions of shape (batch_dim, sum(num_actions)).
model_actions (list[Action]) – A list of action definitions.
- Returns:
The full actions of shape (batch_dim, sum(output_dim)).
- Return type:
torch.Tensor
- fishyrl.actions.simplify_actions(actions: Tensor, model_actions: list[Action]) Tensor#
Simplify actions using the action definitions provided.
- Parameters:
actions (torch.Tensor) – The actions of shape (batch_dim, sum(output_dim)).
model_actions (list[Action]) – A list of action definitions.
- Returns:
The simplified actions of shape (batch_dim, sum(num_actions)).
- Return type:
torch.Tensor
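The shape bookkeeping described for construct_actions and simplify_actions (sum(num_actions) columns on the simplified side, sum(output_dim) columns on the full side) can be sketched for a single batch row. `PassThroughDef`, `OneHotDef`, `construct_row` and `simplify_row` are hypothetical stand-ins written for this example; in particular, a discrete definition counting as one action is an assumption, and the real functions operate on tensors.

```python
class PassThroughDef:
    """Stand-in for a continuous definition: construct/simplify are no-ops."""

    def __init__(self, num_actions: int):
        self.num_actions = num_actions
        self.output_dim = num_actions

    def construct(self, simple: list[float]) -> list[float]:
        return list(simple)

    def simplify(self, full: list[float]) -> list[float]:
        return list(full)


class OneHotDef:
    """Stand-in for a discrete definition: one index <-> one-hot vector."""

    def __init__(self, num_options: int):
        self.num_actions = 1  # assumption: one discrete choice per definition
        self.output_dim = num_options

    def construct(self, simple: list[float]) -> list[float]:
        return [1.0 if i == int(simple[0]) else 0.0 for i in range(self.output_dim)]

    def simplify(self, full: list[float]) -> list[float]:
        return [float(full.index(max(full)))]


def construct_row(row, defs):
    """Consume num_actions columns per definition, emit output_dim columns."""
    out, start = [], 0
    for d in defs:
        out += d.construct(row[start:start + d.num_actions])
        start += d.num_actions
    return out


def simplify_row(row, defs):
    """Consume output_dim columns per definition, emit num_actions columns."""
    out, start = [], 0
    for d in defs:
        out += d.simplify(row[start:start + d.output_dim])
        start += d.output_dim
    return out


defs = [PassThroughDef(2), OneHotDef(3)]
full = construct_row([0.3, -0.7, 1.0], defs)
print(full)                      # [0.3, -0.7, 0.0, 1.0, 0.0]
print(simplify_row(full, defs))  # [0.3, -0.7, 1.0]
```

Here the simplified row has 2 + 1 = 3 columns and the full row has 2 + 3 = 5 columns, matching the sums over the definitions in the list.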