fishyrl.actions module#

Utility action definitions for reinforcement learning agents.

class fishyrl.actions.ACTION_IDENTIFIERS(*values)#

Bases: Enum

String identifiers for action definitions, mapped to their corresponding classes.

CONTINUOUS = <class 'fishyrl.actions.ContinuousActions'>#

DISCRETE = <class 'fishyrl.actions.DiscreteAction'>#

DISCRETIZED_CONTINUOUS = <class 'fishyrl.actions.DiscretizedContinuousAction'>#

TWO_HOT_DISCRETIZED_CONTINUOUS = <class 'fishyrl.actions.TwoHotDiscretizedContinuousAction'>#
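As a rough illustration of how an enum whose values are classes can resolve a string identifier to an action definition, the sketch below uses hypothetical stand-in classes (`DiscreteActionStub`, `ContinuousActionsStub`) and a hypothetical enum (`ActionIdentifiersSketch`); none of these names are part of fishyrl.

```python
from enum import Enum


# Hypothetical stand-ins for the real action classes; only the pattern matters.
class DiscreteActionStub:
    def __init__(self, num_options: int) -> None:
        self.num_options = num_options


class ContinuousActionsStub:
    def __init__(self, num_actions: int = 1) -> None:
        self.num_actions = num_actions


class ActionIdentifiersSketch(Enum):
    # Each member's value is a class, mirroring the mapping documented above.
    CONTINUOUS = ContinuousActionsStub
    DISCRETE = DiscreteActionStub


# Resolve a string identifier (for example, read from a config file) to a class,
# then instantiate it with that class's own constructor arguments.
action_cls = ActionIdentifiersSketch["DISCRETE"].value
action = action_cls(num_options=4)
```

Looking members up by name keeps configuration files free of import paths, at the cost of having to pass constructor arguments that match the selected class.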

class fishyrl.actions.Action(*args: Any, **kwargs: Any)#

Bases: Module

Base class for actions.

abstractmethod construct(action: Tensor) → Tensor#

Construct the full action from the simplified action.

Parameters:: action (torch.Tensor) – The simplified action of shape (batch_dim).
Returns:: The full action of shape (batch_dim, output_dim).
Return type:: torch.Tensor

abstractmethod sample(logits: Tensor) → tuple[Tensor, Distribution]#

Sample an action from the logits.

Parameters:: logits (torch.Tensor) – The base logits of shape (batch_dim, input_dim).
Returns:: The sampled action of shape (batch_dim, output_dim) and the corresponding distribution.
Return type:: tuple[torch.Tensor, torch.distributions.Distribution]

abstractmethod simplify(x: Tensor) → Tensor#

Simplify each action to a single value.

Parameters:: x (torch.Tensor) – The action(s) of shape (batch_dim, output_dim).
Returns:: The simplified action of shape (batch_dim).
Return type:: torch.Tensor

abstract property input_dim: int#

The number of input features for the action.

Type:: int

abstract property num_actions: int#

The number of actions.

Type:: int

abstract property output_dim: int#

The number of output features for the action.

Type:: int

class fishyrl.actions.ContinuousActions(num_actions: int = 1, std_init: float = 2, std_min: float = 0.1, std_max: float = 1, clip: float = 0)#

Bases: Action

Continuous action definition using torch.distributions.Normal.

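The distribution is parameterized by mean tanh(mean) and standard deviation (std_max - std_min) * sigmoid(std + std_init) + std_min, where mean and std on the right-hand side are the raw values taken from the logits.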
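The parameterization above can be sketched in plain PyTorch as follows. This is a minimal sketch, not the library's implementation: it assumes the logits are split in half along the last dimension into a raw mean and a raw standard deviation, which the documentation does not state explicitly.

```python
import torch

# Hyperparameters copied from the documented defaults.
std_init, std_min, std_max = 2.0, 0.1, 1.0

# Raw network output for a batch of 3 with num_actions = 2, so input_dim = 4.
logits = torch.zeros(3, 4)

# Assumption: the first half of the last dimension is the raw mean and the
# second half is the raw standard deviation.
raw_mean, raw_std = logits.chunk(2, dim=-1)

mean = torch.tanh(raw_mean)  # squashed into (-1, 1)
std = (std_max - std_min) * torch.sigmoid(raw_std + std_init) + std_min

dist = torch.distributions.Normal(mean, std)
action = dist.rsample()  # shape (3, 2), matching output_dim = num_actions
```

Because the sigmoid is bounded, the standard deviation always stays strictly between std_min and std_max, and std_init only shifts where in that range the untrained network starts.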

__init__(num_actions: int = 1, std_init: float = 2, std_min: float = 0.1, std_max: float = 1, clip: float = 0) → None#

Initialize the action definition.

Parameters:

num_actions (int) – The number of actions to initialize.
std_init (float) – The offset added to the raw standard deviation before the sigmoid, which sets the initial standard deviation.
std_min (float) – The minimum standard deviation.
std_max (float) – The maximum standard deviation.
clip (float) – The maximum absolute value of the action.

construct(action: Tensor) → Tensor#

Construct the full action from the simplified action.

This is a no-op for continuous actions.

Parameters:: action (torch.Tensor) – The simplified action of shape (batch_dim, output_dim).
Returns:: The same action(s) of shape (batch_dim, output_dim).
Return type:: torch.Tensor

sample(logits: Tensor) → tuple[Tensor, Distribution]#

Sample logits using Normal, clipped to [-clip, clip].

Parameters:: logits (torch.Tensor) – The base logits of shape (batch_dim, input_dim).
Returns:: Tuple containing the sampled action of shape (batch_dim, output_dim) and the distribution.
Return type:: tuple[torch.Tensor, torch.distributions.Distribution]

simplify(x: Tensor) → Tensor#

Simplify each action to a single value.

This is a no-op for continuous actions.

Parameters:: x (torch.Tensor) – The action(s) of shape (batch_dim, output_dim).
Returns:: The same action(s) of shape (batch_dim, output_dim).
Return type:: torch.Tensor

property input_dim: int#

The number of input features for the continuous action, equal to 2 * num_actions.

Type:: int

property num_actions: int#

The number of actions.

Type:: int

property output_dim: int#

The number of output features for the continuous action, equal to num_actions.

Type:: int

class fishyrl.actions.DiscreteAction(num_options: int)#

Bases: Action

Discrete action definition using OneHotCategoricalStraightThrough.

__init__(num_options: int) → None#

Initialize the action definition.

Parameters:: num_options (int) – The number of options for the discrete action.

construct(action: Tensor) → Tensor#

Construct the full action from the simplified action.

Converts the provided action index to a one-hot vector.

Parameters:: action (torch.Tensor) – The simplified action of shape (batch_dim, 1).
Returns:: One-hot action of shape (batch_dim, output_dim).
Return type:: torch.Tensor

sample(logits: Tensor) → tuple[Tensor, Distribution]#

Sample logits using OneHotCategoricalStraightThrough.

Parameters:: logits (torch.Tensor) – The base logits of shape (batch_dim, input_dim).
Returns:: Tuple containing the sampled action of shape (batch_dim, output_dim) and the distribution.
Return type:: tuple[torch.Tensor, torch.distributions.Distribution]
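For reference, sampling a one-hot action with OneHotCategoricalStraightThrough looks roughly like the sketch below. Whether the library draws with sample() or rsample() is not stated here; the example uses rsample() to show the straight-through gradient.

```python
import torch
from torch.distributions import OneHotCategoricalStraightThrough

torch.manual_seed(0)

# Batch of 5 logits over 4 options, so input_dim = output_dim = 4.
logits = torch.randn(5, 4, requires_grad=True)

dist = OneHotCategoricalStraightThrough(logits=logits)

# rsample() returns a one-hot sample whose gradient flows through the
# class probabilities (the straight-through estimator).
action = dist.rsample()
```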

simplify(x: Tensor) → Tensor#

Simplify each action to a single value.

Takes the argmax of the provided one-hot vector.

Parameters:: x (torch.Tensor) – One-hot action of shape (batch_dim, output_dim).
Returns:: The action index of shape (batch_dim, 1).
Return type:: torch.Tensor
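The relationship between construct and simplify for a discrete action can be illustrated with a one-hot round trip. This is a sketch of the documented behavior using standard PyTorch calls, not the library's own code.

```python
import torch
import torch.nn.functional as F

num_options = 4

# Simplified actions: one index per batch element, shape (batch_dim, 1).
indices = torch.tensor([[2], [0], [3]])

# "construct": index -> one-hot vector of shape (batch_dim, num_options).
one_hot = F.one_hot(indices.squeeze(-1), num_classes=num_options).float()

# "simplify": one-hot vector -> index, keeping the trailing dimension.
recovered = one_hot.argmax(dim=-1, keepdim=True)
```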

property input_dim: int#

The number of input features for the discrete action, equal to the number of options.

Type:: int

property num_actions: int#

The number of actions.

Type:: int

property output_dim: int#

The number of output features for the discrete action, equal to the number of options.

Type:: int

class fishyrl.actions.DiscretizedContinuousAction(bins: int = 32, low: float = -1.0, high: float = 1.0, pre_func: callable = <function identity>, post_func: callable = <function identity>, eps: float = 1e-08)#

Bases: Action

Discretized continuous action definition using a one-hot encoding.

__init__(bins: int = 32, low: float = -1.0, high: float = 1.0, pre_func: callable = <function identity>, post_func: callable = <function identity>, eps: float = 1e-08) → None#

Initialize the action definition.

Parameters:

bins (int) – The number of bins to use for discretization.
low (float) – The lower bound of the action values.
high (float) – The upper bound of the action values.
pre_func (callable) – A function to apply to the input logits before creating the distribution. (Default: identity)
post_func (callable) – A function to apply to the output of the distribution. (Default: identity)
eps (float) – A small value to add when computing entropy to avoid numerical issues. (Default: 1e-8)

construct(action: Tensor) → Tensor#

Construct the full action from the simplified action.

Takes the simplified action value and returns a one-hot encoding corresponding to the proper bin.

Parameters:: action (torch.Tensor) – The simplified action of shape (batch_dim, 1).
Returns:: The full action of shape (batch_dim, output_dim).
Return type:: torch.Tensor

sample(logits: Tensor) → tuple[Tensor, Distribution]#

Sample logits using a one-hot encoding.

Parameters:: logits (torch.Tensor) – The base logits of shape (batch_dim, input_dim).
Returns:: Tuple containing the sampled action of shape (batch_dim, output_dim) and the distribution.
Return type:: tuple[torch.Tensor, torch.distributions.Distribution]

simplify(x: Tensor) → Tensor#

Simplify each action to a single value.

Takes the index of the one-hot encoded action and returns the corresponding bin value.

Parameters:: x (torch.Tensor) – The action(s) of shape (batch_dim, output_dim).
Returns:: The simplified action of shape (batch_dim, 1).
Return type:: torch.Tensor
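The mapping between a continuous value and its one-hot bin can be sketched as below. The bin layout is an assumption: the example uses evenly spaced bin values over [low, high] and nearest-bin assignment, with pre_func and post_func left as the identity.

```python
import torch
import torch.nn.functional as F

bins, low, high = 5, -1.0, 1.0

# Assumption: bin values are evenly spaced over [low, high].
bin_values = torch.linspace(low, high, bins)  # -1.0, -0.5, 0.0, 0.5, 1.0

# Simplified actions: one continuous value per batch element, shape (batch_dim, 1).
values = torch.tensor([[-1.0], [0.4], [0.9]])

# "construct": pick the nearest bin and one-hot encode its index.
nearest = (values - bin_values).abs().argmin(dim=-1)    # shape (batch_dim,)
one_hot = F.one_hot(nearest, num_classes=bins).float()  # shape (batch_dim, bins)

# "simplify": look up the bin value for the active index.
recovered = bin_values[one_hot.argmax(dim=-1)].unsqueeze(-1)  # shape (batch_dim, 1)
```

Note that the round trip is lossy: 0.4 comes back as 0.5, the value of the nearest bin.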

property input_dim: int#

The number of input features for the discretized continuous action, equal to the number of bins.

Type:: int

property num_actions: int#

The number of actions.

Type:: int

property output_dim: int#

The number of output features for the discretized continuous action, equal to the number of bins.

Type:: int

class fishyrl.actions.TwoHotDiscretizedContinuousAction(bins: int = 32, low: float = -1.0, high: float = 1.0, pre_func: callable = <function identity>, post_func: callable = <function identity>, eps: float = 1e-08)#

Bases: Action

Discretized continuous action definition using a two-hot encoding.

__init__(bins: int = 32, low: float = -1.0, high: float = 1.0, pre_func: callable = <function identity>, post_func: callable = <function identity>, eps: float = 1e-08) → None#

Initialize the action definition.

Parameters:

bins (int) – The number of bins to use for discretization.
low (float) – The lower bound of the action values.
high (float) – The upper bound of the action values.
pre_func (callable) – A function to apply to the input logits before creating the distribution. (Default: identity)
post_func (callable) – A function to apply to the output of the distribution. (Default: identity)
eps (float) – A small value to add when computing entropy to avoid numerical issues. (Default: 1e-8)

construct(action: Tensor) → Tensor#

Construct the full action from the simplified action. This is a no-op for two-hot discretized continuous actions.

Parameters:: action (torch.Tensor) – The simplified action of shape (batch_dim, output_dim).
Returns:: The input tensor action.
Return type:: torch.Tensor

sample(logits: Tensor) → tuple[Tensor, TwoHot]#

Sample logits using a two-hot encoding.

Parameters:: logits (torch.Tensor) – The base logits of shape (batch_dim, input_dim).
Returns:: Tuple containing the sampled action of shape (batch_dim, output_dim) and the distribution.
Return type:: tuple[torch.Tensor, frl_distributions.TwoHot]
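For readers unfamiliar with two-hot encoding, the generic technique can be sketched as follows. This is not the frl_distributions.TwoHot implementation, whose internals are not documented here; the helper `two_hot` and the evenly spaced bins are assumptions made for illustration.

```python
import torch

bins, low, high = 5, -1.0, 1.0
bin_values = torch.linspace(low, high, bins)  # assumed evenly spaced bins
step = (high - low) / (bins - 1)


def two_hot(values: torch.Tensor) -> torch.Tensor:
    """Encode values of shape (batch_dim,) as two-hot vectors of shape (batch_dim, bins)."""
    values = values.clamp(low, high)
    # Index of the bin at or below each value, capped so that lower + 1 is valid.
    lower = ((values - low) / step).floor().long().clamp(0, bins - 2)
    upper = lower + 1
    # Linear interpolation weight assigned to the upper bin.
    w_upper = (values - bin_values[lower]) / (bin_values[upper] - bin_values[lower])
    encoding = torch.zeros(values.shape[0], bins)
    encoding.scatter_(1, lower.unsqueeze(1), (1 - w_upper).unsqueeze(1))
    encoding.scatter_(1, upper.unsqueeze(1), w_upper.unsqueeze(1))
    return encoding


encoded = two_hot(torch.tensor([0.25, -1.0, 1.0]))
decoded = encoded @ bin_values  # the expectation over bins recovers the values
```

Unlike the one-hot case, the weighted sum over bin values reproduces any value inside [low, high] exactly, which is consistent with the two-hot action exposing a single scalar output.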

simplify(x: Tensor) → Tensor#

Simplify each action to a single value. This is a no-op for two-hot discretized continuous actions.

Parameters:: x (torch.Tensor) – The action(s) of shape (batch_dim, output_dim).
Returns:: The input tensor x.
Return type:: torch.Tensor

property input_dim: int#

The number of input features for the discretized continuous action, equal to the number of bins.

Type:: int

property num_actions: int#

The number of actions.

Type:: int

property output_dim: int#

The number of output features for the discretized continuous action, always 1.

Type:: int

fishyrl.actions.construct_actions(actions: Tensor, model_actions: list[Action]) → Tensor#

Construct actions using the action definitions provided.

Parameters:

actions (torch.Tensor) – The simplified actions of shape (batch_dim, sum(num_actions)).
model_actions (list[Action]) – A list of action definitions.

Returns:

The full actions of shape (batch_dim, sum(output_dim)).

Return type:

torch.Tensor

fishyrl.actions.simplify_actions(actions: Tensor, model_actions: list[Action]) → Tensor#

Simplify actions using the action definitions provided.

Parameters:

actions (torch.Tensor) – The actions of shape (batch_dim, sum(output_dim)).
model_actions (list[Action]) – A list of action definitions.

Returns:

The simplified actions of shape (batch_dim, sum(num_actions)).

Return type:

torch.Tensor
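One plausible way to implement construct_actions and simplify_actions is to split the tensor into per-definition column groups, apply each definition, and concatenate the results. The sketch below follows that idea with a hypothetical stand-in class (`OneHotSketch`) and hypothetical helpers (`construct_all`, `simplify_all`); it is an assumption about the approach, not the library's code.

```python
import torch
import torch.nn.functional as F


class OneHotSketch:
    """Hypothetical stand-in for a discrete action definition."""

    def __init__(self, num_options: int) -> None:
        self.num_actions = 1  # one simplified column per definition (assumed)
        self.output_dim = num_options

    def construct(self, action: torch.Tensor) -> torch.Tensor:
        return F.one_hot(action.squeeze(-1).long(), self.output_dim).float()

    def simplify(self, x: torch.Tensor) -> torch.Tensor:
        return x.argmax(dim=-1, keepdim=True).float()


def construct_all(actions: torch.Tensor, defs: list) -> torch.Tensor:
    # Split the simplified columns per definition, expand each, and concatenate.
    parts = torch.split(actions, [d.num_actions for d in defs], dim=-1)
    return torch.cat([d.construct(p) for d, p in zip(defs, parts)], dim=-1)


def simplify_all(actions: torch.Tensor, defs: list) -> torch.Tensor:
    # Split the full columns per definition, reduce each, and concatenate.
    parts = torch.split(actions, [d.output_dim for d in defs], dim=-1)
    return torch.cat([d.simplify(p) for d, p in zip(defs, parts)], dim=-1)


defs = [OneHotSketch(3), OneHotSketch(2)]
simple = torch.tensor([[2.0, 1.0], [0.0, 0.0]])
full = construct_all(simple, defs)     # shape (2, 5)
round_trip = simplify_all(full, defs)  # shape (2, 2)
```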
