Description#
Currently, libraries across the ecosystem provide various APIs for seeding pseudo-random number generation. This SPEC suggests a unified, pragmatic API, taking into account technical and historical factors. Adopting such a uniform API will simplify the user experience, especially for those who rely on multiple projects.
We recommend:
- standardizing the usage and interpretation of an `rng` keyword for seeding, and
- avoiding the use of global state and legacy bitstream generators.
We suggest implementing these principles by:
- deprecating uses of an existing seed argument (commonly `random_state` or `seed`) in favor of a consistent `rng` argument,
- using `numpy.random.default_rng` to validate the `rng` argument and instantiate a `Generator` [1], and
- deprecating the use of `numpy.random.seed` to control the random state.
We are primarily concerned with API uniformity, but also encourage libraries to move towards using NumPy pseudo-random `Generator`s because:
- `Generator`s avoid problems associated with naïve seeding (e.g., using successive integers), via their `SeedSequence` mechanism;
- their use avoids relying on global state, which can make code execution harder to track and may cause problems in parallel processing scenarios.
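As a rough illustration of both points, the sketch below derives several independent streams from a single root seed instead of seeding workers with successive integers. The root seed value and the variable names are arbitrary choices made for this example.

```python
import numpy as np

# One root seed; SeedSequence mixes it into well-distributed initial state.
root = np.random.SeedSequence(12345)

# Spawn independent child sequences, e.g. one per parallel worker,
# rather than seeding the workers with 0, 1, 2, ...
children = root.spawn(3)
worker_rngs = [np.random.default_rng(child) for child in children]

# Each worker draws from its own stream; no global state is touched.
draws = [rng.random(4) for rng in worker_rngs]

# Rebuilding the generators from the same root seed reproduces the draws.
replay = [
    np.random.default_rng(child).random(4)
    for child in np.random.SeedSequence(12345).spawn(3)
]
```

Because every stream is derived from `root`, recording that single seed is enough to reproduce all of the workers' output.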
Scope#
This is intended as a recommendation to all libraries that allow users to control the state of a NumPy random number generator.
It is specifically targeted toward functions that currently accept `RandomState` instances via an argument other than `rng`, or allow `numpy.random.seed` to control the random state, but the ideas are more broadly applicable.
Random number generators other than those provided by NumPy could also be accommodated by an `rng` keyword, but that is beyond the scope of this SPEC.
Concepts#
- `BitGenerator`: Generates a stream of pseudo-random bits. The default generator in NumPy (`numpy.random.default_rng`) uses PCG64.
- `Generator`: Derives pseudo-random numbers from the bits produced by a `BitGenerator`.
- `RandomState`: A legacy object in NumPy, similar to `Generator`, that produces random numbers based on the Mersenne Twister.
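The relationship between these three objects can be sketched in a few lines. The seed value `2024` is arbitrary and only serves to make the example reproducible.

```python
import numpy as np

# A BitGenerator produces raw pseudo-random bits; PCG64 is one such class.
bit_gen = np.random.PCG64(2024)

# A Generator turns those bits into numbers drawn from distributions.
rng = np.random.Generator(bit_gen)
sample = rng.normal(size=3)

# default_rng is a convenience constructor; the Generator it returns
# wraps a PCG64 bit generator.
default = np.random.default_rng(2024)
uses_pcg64 = isinstance(default.bit_generator, np.random.PCG64)
default_sample = default.normal(size=3)

# RandomState is the legacy interface, based on the Mersenne Twister.
legacy = np.random.RandomState(2024)
legacy_sample = legacy.normal(size=3)
```

Here `sample` and `default_sample` come from identically seeded PCG64 streams, whereas `legacy_sample` comes from a different bit generator and therefore follows a different stream for the same seed.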
Constraints#
NumPy, SciPy, scikit-learn, scikit-image, and NetworkX all implement pseudo-random seeding in slightly different ways.
Common keyword arguments include `random_state` and `seed`.
In practice, the seed is also often controllable using `numpy.random.seed`.
Core Project Endorsement#
Endorsement of this SPEC means that a project considers the standardization and interpretation of the `rng` keyword, as well as avoiding use of global state and legacy bitstream generators, good ideas that are worth implementing widely.
Ecosystem Adoption#
To adopt this SPEC, a project should:
- deprecate the use of `random_state`/`seed` arguments in favor of an `rng` argument in all functions where users need to control pseudo-random number generation,
- use `numpy.random.default_rng` to validate the `rng` argument and instantiate a `Generator`, and
- deprecate the use of `numpy.random.seed` to control the random state.
Badges#
Projects can highlight their adoption of this SPEC by including a SPEC badge.
[![SPEC 7 — Seeding pseudo-random number generation](https://img.shields.io/badge/SPEC-7-green?labelColor=%23004811&color=%235CA038)](https://scientific-python.org/specs/spec-0007/)
|SPEC 7 — Seeding pseudo-random number generation|
.. |SPEC 7 — Seeding pseudo-random number generation| image:: https://img.shields.io/badge/SPEC-7-green?labelColor=%23004811&color=%235CA038
   :target: https://scientific-python.org/specs/spec-0007/
Implementation#
Legacy behavior in packages such as scikit-learn (`sklearn.utils.check_random_state`) typically handles `None` (use the global seed state), an int (convert to `RandomState`), or a `RandomState` object.
Our recommendation here is a deprecation strategy which does not in all cases adhere to the Hinsen principle [2],
although it could very nearly do so by enforcing the use of `rng` as a keyword argument.
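For orientation, the legacy validation described above can be sketched roughly as follows. The helper name `legacy_check_random_state` is hypothetical, and the body is a simplified approximation of such helpers, not a copy of any library's implementation.

```python
import numbers

import numpy as np


def legacy_check_random_state(seed):
    """Hypothetical helper approximating legacy `random_state` validation."""
    if seed is None:
        # Fall back to the global RandomState that np.random.seed controls.
        return np.random.mtrand._rand
    if isinstance(seed, numbers.Integral):
        # An int seeds a fresh, private RandomState.
        return np.random.RandomState(seed)
    if isinstance(seed, np.random.RandomState):
        # An existing RandomState is used as-is.
        return seed
    raise ValueError(f"{seed!r} cannot be used to seed a RandomState")


# With None, results depend on hidden global state:
np.random.seed(0)
a = legacy_check_random_state(None).random_sample()
np.random.seed(0)
b = legacy_check_random_state(None).random_sample()
```

The `None` branch is the one that makes results depend on whether some other code has called `np.random.seed`, which is the behavior the transition is meant to retire.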
The deprecation strategy is as follows.
Initially, accept both `rng` and the existing `random_state`/`seed`/`...` keyword arguments.
- If both are specified by the user, raise an error.
- If `rng` is passed by keyword, validate it with `np.random.default_rng()` and use it to generate random numbers as needed.
- If `random_state`/`seed`/`...` is specified (by keyword or position, if allowed), preserve existing behavior.
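The three rules of this initial phase might look as follows when written out by hand for a single function. The function `draw_sample` is hypothetical, and this simplified sketch treats an explicit `rng=None` the same as omitting `rng`.

```python
import numpy as np


def draw_sample(n, random_state=None, *, rng=None):
    """Hypothetical library function during the initial transition phase."""
    # Rule 1: the PRNG may be specified in only one way.
    if random_state is not None and rng is not None:
        raise TypeError("draw_sample() got both `random_state` and `rng`")

    # Rule 2: `rng` is validated by default_rng and used directly.
    if rng is not None:
        return np.random.default_rng(rng).random(n)

    # Rule 3: otherwise, preserve the existing (legacy) behavior.
    if random_state is None:
        legacy = np.random.mtrand._rand  # global state
    elif isinstance(random_state, np.random.RandomState):
        legacy = random_state
    else:
        legacy = np.random.RandomState(random_state)
    return legacy.random_sample(n)
```

No warnings are emitted at this stage; users can adopt `rng` at their own pace while existing code keeps producing the same numbers.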
After `rng` becomes available in all releases within the support window suggested by SPEC 0, emit warnings as follows:
- If neither `rng` nor `random_state`/`seed`/`...` is specified and `np.random.seed` has been used to set the seed, emit a `FutureWarning` about the upcoming change in behavior.
- If `random_state`/`seed`/`...` is passed by keyword or by position, treat it as before, but:
  - Emit a `DeprecationWarning` if passed by keyword, warning about the deprecation of keyword `random_state` in favor of `rng`.
  - Emit a `FutureWarning` if passed by position, warning about the change in behavior of the positional argument.
After the deprecation period, accept only `rng`, raising an error if `random_state`/`seed`/`...` is provided.
By now, the function signature, with type annotations, could look like this:
from collections.abc import Sequence

import numpy as np

SeedLike = int | np.integer | Sequence[int] | np.random.SeedSequence
RNGLike = np.random.Generator | np.random.BitGenerator


def my_func(*, rng: RNGLike | SeedLike | None = None):
    """My function summary.

    Parameters
    ----------
    rng : `numpy.random.Generator`, optional
        Pseudorandom number generator state. When `rng` is None, a new
        `numpy.random.Generator` is created using entropy from the
        operating system. Types other than `numpy.random.Generator` are
        passed to `numpy.random.default_rng` to instantiate a `Generator`.
    """
    rng = np.random.default_rng(rng)
    ...
Also note the suggested language for the `rng` parameter docstring, which encourages the user to pass a `Generator` or `None`, but allows for other types accepted by `numpy.random.default_rng` (captured by the type annotation).
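To make the annotated types concrete, the following sketch passes each kind of value named in the type annotation to `numpy.random.default_rng`. The seed value `7` and all variable names are arbitrary.

```python
import numpy as np

existing = np.random.default_rng(7)

# A Generator is returned unchanged, so its state is shared with the caller.
same = np.random.default_rng(existing)
passes_through = same is existing

# An int (or a SeedSequence) seeds a new Generator reproducibly.
from_int = np.random.default_rng(7).integers(0, 100, size=5)
from_int_again = np.random.default_rng(7).integers(0, 100, size=5)
from_seq = np.random.default_rng(np.random.SeedSequence(7))

# A BitGenerator is wrapped in a new Generator.
bit_gen = np.random.PCG64(7)
wrapped = np.random.default_rng(bit_gen)

# None draws fresh entropy from the operating system on every call.
unseeded_a = np.random.default_rng(None).random()
unseeded_b = np.random.default_rng(None).random()
```

The pass-through of an existing `Generator` is what lets a caller thread one generator through several library functions and obtain a single reproducible stream.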
Impact#
There are three classes of users, who will be affected to varying degrees.
- Those who do not attempt to control the random state. Their code will switch from using the unseeded global `RandomState` to using an unseeded `Generator`. Since the underlying distributions of pseudo-random numbers will not change, these users should be largely unaffected. While technically this change does not adhere to the Hinsen principle, its impact should be minimal.
- Users of `random_state`/`seed` arguments. Support for these arguments will be dropped eventually, but during the deprecation period, we can provide clear guidance, via warnings and documentation, on how to migrate to the new `rng` keyword.
- Those who use `numpy.random.seed`. The proposal will do away with that global seeding mechanism, meaning that code that relies on it would, after the deprecation period, go from being seeded to being unseeded. To ensure that this does not go unnoticed, libraries that allowed for control of the random state via `numpy.random.seed` should raise a `FutureWarning` if `np.random.seed` has been called. (See Code below for an example.) To fully adhere to the Hinsen principle, these warnings should instead be raised as errors. In response, users will have to switch from using `numpy.random.seed` to passing the `rng` argument explicitly to all functions that accept it.
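From the user's side, the migration for the third group could look like the sketch below. The function `jitter` is a hypothetical stand-in for any library function that has completed the transition, and the seed `1234` is arbitrary.

```python
import numpy as np


def jitter(values, *, rng=None):
    """Hypothetical library function that accepts only `rng`."""
    rng = np.random.default_rng(rng)
    return np.asarray(values) + rng.normal(scale=0.1, size=len(values))


# Before: np.random.seed(1234), then jitter([0.0, 1.0, 2.0])
# After: create one Generator and pass it explicitly wherever it is needed.
rng = np.random.default_rng(1234)
first = jitter([0.0, 1.0, 2.0], rng=rng)
second = jitter([0.0, 1.0, 2.0], rng=rng)  # continues the same stream

# Re-creating the Generator with the same seed reproduces the calls.
rng = np.random.default_rng(1234)
first_again = jitter([0.0, 1.0, 2.0], rng=rng)
```

Passing the same `Generator` object to successive calls keeps the whole workflow on one seeded stream, which is the role the global seed used to play.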
Code#
As an example, consider how a SciPy function would transition from a `random_state` parameter to an `rng` parameter using a decorator.
import numpy as np
import functools
import warnings


def _transition_to_rng(old_name, *, position_num=None, end_version=None):
    """Example decorator to transition from old PRNG usage to new `rng` behavior

    Suppose the decorator is applied to a function that used to accept parameter
    `old_name='random_state'` either by keyword or as a positional argument at
    `position_num=1`. At the time of application, the name of the argument in the
    function signature is manually changed to the new name, `rng`. If positional
    use was allowed before, this is not changed.*

    - If the function is called with both `random_state` and `rng`, the decorator
      raises an error.
    - If `random_state` is provided as a keyword argument, the decorator passes
      `random_state` to the function's `rng` argument as a keyword. If `end_version`
      is specified, the decorator will emit a `DeprecationWarning` about the
      deprecation of keyword `random_state`.
    - If `random_state` is provided as a positional argument, the decorator passes
      `random_state` to the function's `rng` argument by position. If `end_version`
      is specified, the decorator will emit a `FutureWarning` about the changing
      interpretation of the argument.
    - If `rng` is provided as a keyword argument, the decorator validates `rng` using
      `numpy.random.default_rng` before passing it to the function.
    - If `end_version` is specified and neither `random_state` nor `rng` is provided
      by the user, the decorator checks whether `np.random.seed` has been used to set
      the global seed. If so, it emits a `FutureWarning`, noting that usage of
      `numpy.random.seed` will eventually have no effect. Either way, the decorator
      calls the function without explicitly passing the `rng` argument.

    If `end_version` is specified, a user must pass `rng` as a keyword to avoid warnings.

    After the deprecation period, the decorator can be removed, and the function
    can simply validate the `rng` argument by calling `np.random.default_rng(rng)`.

    * A `FutureWarning` is emitted when the PRNG argument is used by
      position. It indicates that the "Hinsen principle" (same
      code yielding different results in two versions of the software)
      will be violated, unless positional use is deprecated. Specifically:

      - If `None` is passed by position and `np.random.seed` has been used,
        the function will change from being seeded to being unseeded.
      - If an integer is passed by position, the random stream will change.
      - If `np.random` or an instance of `RandomState` is passed by position,
        an error will be raised.

      We suggest that projects consider deprecating positional use of
      `random_state`/`rng` (i.e., change their function signatures to
      ``def my_func(..., *, rng=None)``); that might not make sense
      for all projects, so this SPEC does not make that
      recommendation, nor does this decorator enforce it.

    Parameters
    ----------
    old_name : str
        The old name of the PRNG argument (e.g. `seed` or `random_state`).
    position_num : int, optional
        The (0-indexed) position of the old PRNG argument (if accepted by position).
        Maintainers are welcome to eliminate this argument and use, for example,
        `inspect`, if preferred.
    end_version : str, optional
        The full version number of the library when the behavior described in
        `DeprecationWarning`s and `FutureWarning`s will take effect. If left
        unspecified, no warnings will be emitted by the decorator.

    """
    NEW_NAME = "rng"

    # Shared tail of every warning message; each caller's message ends with a
    # space so the two parts do not run together.
    cmn_msg = (
        "To silence this warning and ensure consistent behavior in SciPy "
        f"{end_version}, control the RNG using argument `{NEW_NAME}`. Arguments passed "
        f"to keyword `{NEW_NAME}` will be validated by `np.random.default_rng`, so the "
        "behavior corresponding with a given value may change compared to use of "
        f"`{old_name}`. For example, "
        "1) `None` will result in unpredictable random numbers, "
        "2) an integer will result in a different stream of random numbers (with the "
        "same distribution), and "
        "3) `np.random` or `RandomState` instances will result in an error. "
        "See the documentation of `default_rng` for more information."
    )

    def decorator(fun):
        @functools.wraps(fun)
        def wrapper(*args, **kwargs):
            # Determine how PRNG was passed
            as_old_kwarg = old_name in kwargs
            as_new_kwarg = NEW_NAME in kwargs
            as_pos_arg = position_num is not None and len(args) >= position_num + 1
            emit_warning = end_version is not None

            # Can only specify PRNG one of the three ways
            if int(as_old_kwarg) + int(as_new_kwarg) + int(as_pos_arg) > 1:
                message = (
                    f"{fun.__name__}() got multiple values for "
                    f"argument now known as `{NEW_NAME}`"
                )
                raise TypeError(message)

            # Check whether global random state has been set
            global_seed_set = np.random.mtrand._rand._bit_generator._seed_seq is None

            if as_old_kwarg:  # warn about deprecated use of old kwarg
                kwargs[NEW_NAME] = kwargs.pop(old_name)
                if emit_warning:
                    message = (
                        f"Use of keyword argument `{old_name}` is "
                        f"deprecated and replaced by `{NEW_NAME}`. "
                        f"Support for `{old_name}` will be removed "
                        f"in SciPy {end_version}. "
                    ) + cmn_msg
                    warnings.warn(message, DeprecationWarning, stacklevel=2)

            elif as_pos_arg:
                # Warn about changing meaning of positional arg

                # Note that this decorator does not deprecate positional use of the
                # argument; it only warns that the behavior will change in the future.
                # Simultaneously transitioning to keyword-only use is another option.

                arg = args[position_num]
                # If the argument is None and the global seed wasn't set, or if the
                # argument is one of a few new classes, the user will not notice change
                # in behavior.
                ok_classes = (
                    np.random.Generator,
                    np.random.SeedSequence,
                    np.random.BitGenerator,
                )
                if (arg is None and not global_seed_set) or isinstance(arg, ok_classes):
                    pass
                elif emit_warning:
                    message = (
                        f"Positional use of `{NEW_NAME}` (formerly known as "
                        f"`{old_name}`) is still allowed, but the behavior is "
                        "changing: the argument will be validated using "
                        f"`np.random.default_rng` beginning in SciPy {end_version}, "
                        "and the resulting `Generator` will be used to generate "
                        "random numbers. "
                    ) + cmn_msg
                    warnings.warn(message, FutureWarning, stacklevel=2)

            elif as_new_kwarg:  # no warnings; this is the preferred use
                # After the removal of the decorator, validation with
                # np.random.default_rng will be done inside the decorated function
                kwargs[NEW_NAME] = np.random.default_rng(kwargs[NEW_NAME])

            elif global_seed_set and emit_warning:
                # Emit FutureWarning if `np.random.seed` was used and no PRNG was passed
                message = (
                    "The NumPy global RNG was seeded by calling "
                    f"`np.random.seed`. Beginning in {end_version}, this "
                    "function will no longer use the global RNG. "
                ) + cmn_msg
                warnings.warn(message, FutureWarning, stacklevel=2)

            return fun(*args, **kwargs)

        return wrapper

    return decorator
# Example usage of the _transition_to_rng decorator.

# Suppose a library uses a custom random state validation function, such as
from scipy._lib._util import check_random_state
# https://github.com/scipy/scipy/blob/94532e74b902b569bfad504866cb53720c5f4f31/scipy/_lib/_util.py#L253


# Suppose a function `library_function` is defined as:
def library_function(arg1, random_state=None, arg2=0):
    random_state = check_random_state(random_state)
    return random_state.random() * arg1 + arg2


# We apply the decorator and change the function signature at the same time.
# The use of `random_state` throughout the function may be replaced with `rng`,
# or the variable may be defined as `random_state = rng`.
@_transition_to_rng("random_state", position_num=1)
def library_function(arg1, rng=None, arg2=0):
    rng = check_random_state(rng)
    return rng.random() * arg1 + arg2


# After `rng` is available in all releases within the support window suggested by
# SPEC 0, we pass the `end_version` param to the decorator to emit warnings.
@_transition_to_rng("random_state", position_num=1, end_version="1.17.0")
def library_function(arg1, rng=None, arg2=0):
    rng = check_random_state(rng)
    return rng.random() * arg1 + arg2


# At the end of the deprecation period, remove the decorator, and validate
# `rng` with `np.random.default_rng`.
def library_function(arg1, rng=None, arg2=0):
    rng = np.random.default_rng(rng)
    return rng.random() * arg1 + arg2
Notes#
1. Note that `numpy.random.default_rng` does not accept instances of `RandomState`, so use of `RandomState` to control the seed is effectively deprecated, too. That said, neither `np.random.seed` nor `np.random.RandomState` is itself deprecated, so they may still be used in some contexts (e.g. by developers for generating unit test data).
2. The Hinsen principle states, loosely, that code should, whether executed now or in the future, return the same result, or raise an error.