fishyrl.actions module

Utility action definitions for reinforcement learning agents.

class fishyrl.actions.ACTION_IDENTIFIERS(*values)

Bases: Enum

String identifiers for action definitions, mapped to their corresponding classes.

CONTINUOUS = <class 'fishyrl.actions.ContinuousActions'>
DISCRETE = <class 'fishyrl.actions.DiscreteAction'>
DISCRETIZED_CONTINUOUS = <class 'fishyrl.actions.DiscretizedContinuousAction'>
TWO_HOT_DISCRETIZED_CONTINUOUS = <class 'fishyrl.actions.TwoHotDiscretizedContinuousAction'>

class fishyrl.actions.Action(*args: Any, **kwargs: Any)

Bases: Module

Base class for actions.

abstractmethod construct(action: Tensor) → Tensor

Construct the full action from the simplified action.

Parameters:

action (torch.Tensor) – The simplified action of shape (batch_dim).

Returns:

The full action of shape (batch_dim, output_dim).

Return type:

torch.Tensor

abstractmethod sample(logits: Tensor) → tuple[Tensor, Distribution]

Sample an action from the logits.

Parameters:

logits (torch.Tensor) – The base logits of shape (batch_dim, input_dim).

Returns:

The sampled action of shape (batch_dim, output_dim) and the corresponding distribution.

Return type:

tuple[torch.Tensor, torch.distributions.Distribution]

abstractmethod simplify(x: Tensor) → Tensor

Simplify each action to a single value.

Parameters:

x (torch.Tensor) – The action(s) of shape (batch_dim, output_dim).

Returns:

The simplified action of shape (batch_dim).

Return type:

torch.Tensor

abstract property input_dim: int

The number of input features for the action.

Type:

int

abstract property num_actions: int

The number of actions.

Type:

int

abstract property output_dim: int

The number of output features for the action.

Type:

int

class fishyrl.actions.ContinuousActions(num_actions: int = 1, std_init: float = 2, std_min: float = 0.1, std_max: float = 1, clip: float = 0)

Bases: Action

Continuous action definition using torch.distributions.Normal.

The distribution has mean tanh(mean) and standard deviation (std_max - std_min) * sigmoid(std + std_init) + std_min, where mean and std are the raw values read from the input logits.
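The parameterization above can be checked by hand for a single action. The following is a minimal sketch in plain Python, not fishyrl's implementation: the helper name normal_params is hypothetical, and it works on scalars rather than tensors.

```python
import math


def normal_params(raw_mean, raw_std, std_init=2.0, std_min=0.1, std_max=1.0):
    """Map two raw network outputs to the (mean, std) of a Normal distribution."""
    # tanh squashes the mean into the open interval (-1, 1).
    mean = math.tanh(raw_mean)
    # The sigmoid is shifted by std_init, then rescaled into (std_min, std_max).
    sigmoid = 1.0 / (1.0 + math.exp(-(raw_std + std_init)))
    std = (std_max - std_min) * sigmoid + std_min
    return mean, std


mean, std = normal_params(raw_mean=0.0, raw_std=0.0)
print(mean, round(std, 4))  # 0.0 0.8927
```

With the default arguments, zero-valued raw outputs give a standard deviation near the upper bound, because sigmoid(2) is roughly 0.88.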

__init__(num_actions: int = 1, std_init: float = 2, std_min: float = 0.1, std_max: float = 1, clip: float = 0) → None

Initialize the action definition.

Parameters:
  • num_actions (int) – The number of actions to initialize.

  • std_init (float) – The initial standard deviation.

  • std_min (float) – The minimum standard deviation.

  • std_max (float) – The maximum standard deviation.

  • clip (float) – The maximum absolute value of the action.

construct(action: Tensor) → Tensor

Construct the full action from the simplified action.

This is a no-op for continuous actions.

Parameters:

action (torch.Tensor) – The simplified action of shape (batch_dim, output_dim).

Returns:

The same action(s) of shape (batch_dim, output_dim).

Return type:

torch.Tensor

sample(logits: Tensor) → tuple[Tensor, Distribution]

Sample an action from a Normal distribution built from the logits, clipped to [-clip, clip].

Parameters:

logits (torch.Tensor) – The base logits of shape (batch_dim, input_dim).

Returns:

Tuple containing the sampled action of shape (batch_dim, output_dim) and the distribution.

Return type:

tuple[torch.Tensor, torch.distributions.Distribution]
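As a rough illustration of sampling followed by clipping, the sketch below draws scalar values with the standard library instead of torch. The helper sample_clipped is hypothetical, and the sketch assumes a positive clip; how the default clip = 0 is treated is not covered here.

```python
import random


def sample_clipped(mean, std, clip, rng):
    """Draw one value from Normal(mean, std) and clamp it to [-clip, clip]."""
    value = rng.gauss(mean, std)
    return max(-clip, min(clip, value))


rng = random.Random(0)  # seeded so the run is repeatable
samples = [sample_clipped(mean=0.0, std=0.9, clip=1.0, rng=rng) for _ in range(1000)]
print(min(samples), max(samples))
```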

simplify(x: Tensor) → Tensor

Simplify each action to a single value.

This is a no-op for continuous actions.

Parameters:

x (torch.Tensor) – The action(s) of shape (batch_dim, output_dim).

Returns:

The same action(s) of shape (batch_dim, output_dim).

Return type:

torch.Tensor

property input_dim: int

The number of input features for the continuous action, equal to 2 * num_actions.

Type:

int

property num_actions: int

The number of actions.

Type:

int

property output_dim: int

The number of output features for the continuous action, equal to num_actions.

Type:

int

class fishyrl.actions.DiscreteAction(num_options: int)

Bases: Action

Discrete action definition using OneHotCategoricalStraightThrough.

__init__(num_options: int) → None

Initialize the action definition.

Parameters:

num_options (int) – The number of options for the discrete action.

construct(action: Tensor) → Tensor

Construct the full action from the simplified action.

Converts the provided action index to a one-hot vector.

Parameters:

action (torch.Tensor) – The simplified action of shape (batch_dim, 1).

Returns:

One-hot action of shape (batch_dim, output_dim).

Return type:

torch.Tensor

sample(logits: Tensor) → tuple[Tensor, Distribution]

Sample an action from the logits using OneHotCategoricalStraightThrough.

Parameters:

logits (torch.Tensor) – The base logits of shape (batch_dim, input_dim).

Returns:

Tuple containing the sampled action of shape (batch_dim, output_dim) and the distribution.

Return type:

tuple[torch.Tensor, torch.distributions.Distribution]

simplify(x: Tensor) → Tensor

Simplify each action to a single value.

Takes the argmax of the provided one-hot vector.

Parameters:

x (torch.Tensor) – One-hot action of shape (batch_dim, output_dim).

Returns:

The action index of shape (batch_dim, 1).

Return type:

torch.Tensor
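The relationship between construct and simplify for a discrete action is an index to one-hot round trip. The sketch below shows that round trip with plain Python lists; the two functions are illustrative stand-ins, not the DiscreteAction methods, which operate on batched tensors.

```python
def construct(index, num_options):
    """Turn an action index into a one-hot vector of length num_options."""
    one_hot = [0.0] * num_options
    one_hot[index] = 1.0
    return one_hot


def simplify(one_hot):
    """Recover the action index as the argmax of a one-hot vector."""
    return max(range(len(one_hot)), key=one_hot.__getitem__)


batch = [2, 0, 3]
full = [construct(i, num_options=4) for i in batch]
recovered = [simplify(v) for v in full]
print(full[0], recovered)  # [0.0, 0.0, 1.0, 0.0] [2, 0, 3]
```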

property input_dim: int

The number of input features for the discrete action, equal to the number of options.

Type:

int

property num_actions: int

The number of actions.

Type:

int

property output_dim: int

The number of output features for the discrete action, equal to the number of options.

Type:

int

class fishyrl.actions.DiscretizedContinuousAction(bins: int = 32, low: float = -1.0, high: float = 1.0, pre_func: callable = <function identity>, post_func: callable = <function identity>, eps: float = 1e-08)

Bases: Action

Discretized continuous action definition using a one-hot encoding.

__init__(bins: int = 32, low: float = -1.0, high: float = 1.0, pre_func: callable = <function identity>, post_func: callable = <function identity>, eps: float = 1e-08) → None

Initialize the action definition.

Parameters:
  • bins (int) – The number of bins to use for discretization.

  • low (float) – The lower bound of the action values.

  • high (float) – The upper bound of the action values.

  • pre_func (callable) – A function to apply to the input logits before creating the distribution, such as symlog. (Default: identity)

  • post_func (callable) – A function to apply to the output of the distribution, such as symexp. (Default: identity)

  • eps (float) – A small value to add when computing entropy to avoid numerical issues. (Default: 1e-8)
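The pre_func and post_func descriptions mention symlog and symexp. For readers unfamiliar with the pair, here is a scalar sketch of the usual definitions, symlog(x) = sign(x) * ln(|x| + 1) and its inverse symexp(x) = sign(x) * (exp(|x|) - 1). These are standalone helpers written for illustration; fishyrl's own versions are not shown in this reference and may differ in signature.

```python
import math


def symlog(x):
    """Compress large magnitudes symmetrically around zero."""
    return math.copysign(math.log1p(abs(x)), x)


def symexp(x):
    """Inverse of symlog."""
    return math.copysign(math.expm1(abs(x)), x)


for value in (-100.0, -1.0, 0.0, 0.5, 1000.0):
    # symexp undoes symlog up to floating-point error.
    print(value, symlog(value), symexp(symlog(value)))
```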

construct(action: Tensor) → Tensor

Construct the full action from the simplified action.

Takes the simplified action value and returns a one-hot encoding corresponding to the proper bin.

Parameters:

action (torch.Tensor) – The simplified action of shape (batch_dim, 1).

Returns:

The full action of shape (batch_dim, output_dim).

Return type:

torch.Tensor

sample(logits: Tensor) → tuple[Tensor, Distribution]

Sample an action from the logits using a one-hot encoding.

Parameters:

logits (torch.Tensor) – The base logits of shape (batch_dim, input_dim).

Returns:

Tuple containing the sampled action of shape (batch_dim, output_dim) and the distribution.

Return type:

tuple[torch.Tensor, torch.distributions.Distribution]

simplify(x: Tensor) → Tensor

Simplify each action to a single value.

Takes the index of the one-hot encoded action and returns the corresponding bin value.

Parameters:

x (torch.Tensor) – The action(s) of shape (batch_dim, output_dim).

Returns:

The simplified action of shape (batch_dim, 1).

Return type:

torch.Tensor
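The mapping between a bin value and its one-hot encoding can be sketched as follows. This is an illustration under two assumptions that this reference does not confirm: the bins are evenly spaced from low to high inclusive, and construct picks the nearest bin. The helper names are hypothetical, and pre_func and post_func are left out.

```python
def bin_values(bins=32, low=-1.0, high=1.0):
    """Evenly spaced bin values from low to high (assumed layout)."""
    step = (high - low) / (bins - 1)
    return [low + i * step for i in range(bins)]


def construct(value, values):
    """One-hot vector marking the bin closest to value."""
    index = min(range(len(values)), key=lambda i: abs(values[i] - value))
    return [1.0 if i == index else 0.0 for i in range(len(values))]


def simplify(one_hot, values):
    """Bin value at the position of the single 1.0 entry."""
    return values[one_hot.index(1.0)]


values = bin_values(bins=5)        # [-1.0, -0.5, 0.0, 0.5, 1.0]
encoded = construct(0.4, values)   # nearest bin is 0.5
decoded = simplify(encoded, values)
print(encoded, decoded)            # [0.0, 0.0, 0.0, 1.0, 0.0] 0.5
```

Note that the round trip is lossy in this sketch: 0.4 comes back as 0.5, the value of the bin it was assigned to.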

property input_dim: int

The number of input features for the discretized continuous action, equal to the number of bins.

Type:

int

property num_actions: int

The number of actions.

Type:

int

property output_dim: int

The number of output features for the discretized continuous action, equal to the number of bins.

Type:

int

class fishyrl.actions.TwoHotDiscretizedContinuousAction(bins: int = 32, low: float = -1.0, high: float = 1.0, pre_func: callable = <function identity>, post_func: callable = <function identity>, eps: float = 1e-08)

Bases: Action

Discretized continuous action definition using a two-hot encoding.
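For context, a two-hot encoding in its common form spreads a scalar over the two neighbouring bins that bracket it, with weights proportional to proximity, so the weighted sum of bin values reproduces the scalar. The sketch below illustrates that general technique with plain Python lists; it is not the TwoHot distribution used by this class, and the function names and bin layout are assumptions.

```python
def two_hot(value, values):
    """Spread weight over the two bins that bracket value."""
    value = min(max(value, values[0]), values[-1])  # clamp into the bin range
    weights = [0.0] * len(values)
    # Index of the last bin whose value does not exceed the target.
    lower = max(i for i, v in enumerate(values) if v <= value)
    if lower == len(values) - 1:
        weights[lower] = 1.0  # target sits on the top bin
        return weights
    upper = lower + 1
    span = values[upper] - values[lower]
    weights[upper] = (value - values[lower]) / span
    weights[lower] = 1.0 - weights[upper]
    return weights


def decode(weights, values):
    """Weighted sum of bin values, the inverse of two_hot within the range."""
    return sum(w * v for w, v in zip(weights, values))


values = [-1.0, -0.5, 0.0, 0.5, 1.0]
weights = two_hot(0.25, values)
print(weights, decode(weights, values))  # [0.0, 0.0, 0.5, 0.5, 0.0] 0.25
```

Unlike a one-hot encoding over the same bins, this representation recovers values that fall between bins exactly.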

__init__(bins: int = 32, low: float = -1.0, high: float = 1.0, pre_func: callable = <function identity>, post_func: callable = <function identity>, eps: float = 1e-08) → None

Initialize the action definition.

Parameters:
  • bins (int) – The number of bins to use for discretization.

  • low (float) – The lower bound of the action values.

  • high (float) – The upper bound of the action values.

  • pre_func (callable) – A function to apply to the input logits before creating the distribution, such as symlog. (Default: identity)

  • post_func (callable) – A function to apply to the output of the distribution, such as symexp. (Default: identity)

  • eps (float) – A small value to add when computing entropy to avoid numerical issues. (Default: 1e-8)

construct(action: Tensor) → Tensor

Construct the full action from the simplified action. This is a no-op for two-hot discretized continuous actions.

Parameters:

action (torch.Tensor) – The simplified action of shape (batch_dim, output_dim).

Returns:

The input tensor action.

Return type:

torch.Tensor

sample(logits: Tensor) → tuple[Tensor, TwoHot]

Sample an action from the logits using a two-hot encoding.

Parameters:

logits (torch.Tensor) – The base logits of shape (batch_dim, input_dim).

Returns:

Tuple containing the sampled action of shape (batch_dim, output_dim) and the distribution.

Return type:

tuple[torch.Tensor, frl_distributions.TwoHot]

simplify(x: Tensor) → Tensor

Simplify each action to a single value. This is a no-op for two-hot discretized continuous actions.

Parameters:

x (torch.Tensor) – The action(s) of shape (batch_dim, output_dim).

Returns:

The input tensor x.

Return type:

torch.Tensor

property input_dim: int

The number of input features for the discretized continuous action, equal to the number of bins.

Type:

int

property num_actions: int

The number of actions.

Type:

int

property output_dim: int

The number of output features for the discretized continuous action, always 1.

Type:

int

fishyrl.actions.construct_actions(actions: Tensor, model_actions: list[Action]) → Tensor

Construct actions using the action definitions provided.

Parameters:
  • actions (torch.Tensor) – The simplified actions of shape (batch_dim, sum(num_actions)).

  • model_actions (list[Action]) – A list of action definitions.

Returns:

The full actions of shape (batch_dim, sum(output_dim)).

Return type:

torch.Tensor

fishyrl.actions.simplify_actions(actions: Tensor, model_actions: list[Action]) → Tensor

Simplify actions using the action definitions provided.

Parameters:
  • actions (torch.Tensor) – The actions of shape (batch_dim, sum(output_dim)).

  • model_actions (list[Action]) – A list of action definitions.

Returns:

The simplified actions of shape (batch_dim, sum(num_actions)).

Return type:

torch.Tensor
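The shapes documented for construct_actions and simplify_actions suggest that each action definition consumes its own slice of columns and that the per-definition results are concatenated. The sketch below illustrates that reading with plain Python lists. Everything in it is hypothetical: the stub classes, the helper names construct_rows and simplify_rows, and the assumption that slices follow the order of model_actions.

```python
class OneHotStub:
    """Hypothetical stand-in for a discrete action definition."""

    def __init__(self, num_options):
        self.num_actions = 1
        self.output_dim = num_options

    def construct(self, row):
        return [1.0 if i == int(row[0]) else 0.0 for i in range(self.output_dim)]

    def simplify(self, row):
        return [float(row.index(1.0))]


class PassThroughStub:
    """Hypothetical stand-in for a definition whose methods are no-ops."""

    def __init__(self, num_actions):
        self.num_actions = num_actions
        self.output_dim = num_actions

    def construct(self, row):
        return list(row)

    def simplify(self, row):
        return list(row)


def _apply(rows, model_actions, method, width):
    """Slice each row per definition, transform each slice, and concatenate."""
    result = []
    for row in rows:
        out, start = [], 0
        for action in model_actions:
            size = getattr(action, width)
            out += getattr(action, method)(row[start:start + size])
            start += size
        result.append(out)
    return result


def construct_rows(rows, model_actions):
    return _apply(rows, model_actions, "construct", "num_actions")


def simplify_rows(rows, model_actions):
    return _apply(rows, model_actions, "simplify", "output_dim")


model_actions = [OneHotStub(3), PassThroughStub(2)]
simplified = [[2.0, 0.1, -0.4], [0.0, 0.7, 0.2]]
full = construct_rows(simplified, model_actions)
print(full[0])                             # [0.0, 0.0, 1.0, 0.1, -0.4]
print(simplify_rows(full, model_actions))  # [[2.0, 0.1, -0.4], [0.0, 0.7, 0.2]]
```

Here each simplified row has sum(num_actions) = 3 columns and each full row has sum(output_dim) = 5 columns, matching the shapes in the parameter descriptions.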