# Uncertainty-First Statistics

## Overview

py3plex implements an uncertainty-first statistics system in which every statistic is represented as (value + uncertainty + provenance). This design makes uncertainty a core part of statistics computation, not an afterthought.

## Key Principles

- **Every statistic is a StatValue**: No plain floats/ints escape internal computation paths. Even deterministic values carry `Delta(0)` uncertainty.
- **Uncertainty is first-class**: Uncertainty can be summarized, sampled, propagated through arithmetic, and used in queries.
- **Backward compatible**: Existing code expecting floats works via `float(statvalue)`.
- **Registry discipline**: Statistics must have an uncertainty model to be registered.
- **Reproducible**: Random processes (bootstrap/Monte Carlo) support explicit seeds tracked in provenance.
## Core Components

### StatValue

`StatValue` is the fundamental container for statistics:

```python
from py3plex.stats import StatValue, Delta, Provenance

# Create a deterministic statistic
sv = StatValue(
    value=0.42,
    uncertainty=Delta(0.0),
    provenance=Provenance("degree", "delta", {})
)

# Access value
print(float(sv))  # 0.42

# Query uncertainty
print(sv.std())         # 0.0
print(sv.ci(0.95))      # (0.42, 0.42)
print(sv.robustness())  # 1.0
```

Key methods:

- `float(sv)`: Convert to point estimate (backward compatibility)
- `sv.mean()`: Alias for value
- `sv.std()`: Standard deviation
- `sv.ci(level=0.95)`: Confidence interval
- `sv.robustness()`: Robustness score in [0, 1]
- `sv.to_json_dict()`: Serialize to JSON
### Uncertainty Models

Five concrete uncertainty models are provided.

#### Delta

Deterministic or known-precision uncertainty.

```python
from py3plex.stats import Delta

# Perfect certainty
d = Delta(0.0)

# Small known error
d = Delta(0.01)
```

Properties:

- `std()`: Returns sigma
- `ci(level)`: Returns a symmetric interval based on sigma
- `sample(n, seed)`: Returns constant samples
- Propagation: Analytic error propagation when both operands are `Delta`
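As a minimal sketch of the analytic rule (illustrative only, not the py3plex internals), combining two independent known-precision errors under addition sums them in quadrature:

```python
# Illustrative sketch: analytic error propagation for two Delta
# uncertainties combined by addition: sigma = sqrt(sigma1^2 + sigma2^2).
import math

def propagate_delta_add(sigma1: float, sigma2: float) -> float:
    """Combine two independent known-precision errors under addition."""
    return math.sqrt(sigma1**2 + sigma2**2)

# Two perfectly certain values stay perfectly certain:
print(propagate_delta_add(0.0, 0.0))    # 0.0
# 3-4-5 triangle: sigma=0.03 and sigma=0.04 combine to sigma=0.05.
print(propagate_delta_add(0.03, 0.04))  # 0.05 (up to float rounding)
```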
#### Gaussian

Normal distribution uncertainty.

```python
from py3plex.stats import Gaussian

g = Gaussian(mean=0.0, std_dev=0.1)

# Exact CI computation
low, high = g.ci(0.95)  # ≈ (-0.196, 0.196)
```

Properties:

- `std()`: Returns std_dev
- `ci(level)`: Exact Gaussian CI using z-scores
- `sample(n, seed)`: Generates Gaussian samples
- Propagation: Analytic for addition/subtraction, Monte Carlo for complex operations
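The z-score CI above can be reproduced with only the standard library; this sketch (names are illustrative, not the py3plex implementation) shows where the ≈ (-0.196, 0.196) bounds come from:

```python
# Illustrative sketch: a two-sided Gaussian CI from the z-score,
# using the stdlib NormalDist inverse CDF.
from statistics import NormalDist

def gaussian_ci(mean: float, std_dev: float, level: float = 0.95):
    # For level=0.95, inv_cdf(0.975) gives z ≈ 1.96
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * std_dev, mean + z * std_dev

low, high = gaussian_ci(0.0, 0.1)
print(round(low, 3), round(high, 3))  # -0.196 0.196
```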
#### Bootstrap

Empirical uncertainty from bootstrap resampling.

```python
from py3plex.stats import Bootstrap
import numpy as np

# Store bootstrap samples (relative to the point estimate)
samples = np.array([0.1, -0.05, 0.15, 0.0, 0.08])
b = Bootstrap(samples)

# Compute CI from percentiles
low, high = b.ci(0.95)
```

Properties:

- `std()`: Sample standard deviation
- `ci(level)`: Percentile-based CI
- `sample(n, seed)`: Resamples from the stored samples
- Propagation: Always Monte Carlo
- Serialization: Stores a summary (n, std, CI), not the full samples
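A percentile-based CI is simply a pair of quantiles of the stored samples. This sketch (illustrative, assuming samples are stored relative to the point estimate as in the example above) shows the idea:

```python
# Illustrative sketch of a percentile bootstrap CI over stored samples.
import numpy as np

def percentile_ci(samples: np.ndarray, level: float = 0.95):
    alpha = (1.0 - level) / 2.0  # mass left in each tail
    return (float(np.quantile(samples, alpha)),
            float(np.quantile(samples, 1.0 - alpha)))

samples = np.array([0.1, -0.05, 0.15, 0.0, 0.08])
low, high = percentile_ci(samples)
print(low, high)  # bounds lie within the sample range
```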
#### Empirical

Similar to Bootstrap, but intended for any empirical distribution.

```python
from py3plex.stats import Empirical
import numpy as np

samples = np.array([0.1, 0.2, 0.15, 0.18, 0.12])
e = Empirical(samples)
```

Properties:

- Same as `Bootstrap`
- Kept as a separate class for conceptual clarity
#### Interval

Interval-based uncertainty without assuming a distribution.

```python
from py3plex.stats import Interval

i = Interval(-0.1, 0.15)

# Uniform sampling by default
samples = i.sample(100, seed=42)
```

Properties:

- `std()`: Estimates the std assuming a uniform distribution: `(high - low) / sqrt(12)`
- `ci(level)`: Returns the interval bounds
- `sample(n, seed)`: Uniform sampling
- Propagation: Monte Carlo
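The `(high - low) / sqrt(12)` estimate is the exact standard deviation of a uniform distribution on the interval; a quick sketch (illustrative, not py3plex code) checks it against actual uniform samples:

```python
# Illustrative check: the uniform-distribution std estimate
# std = (high - low) / sqrt(12), compared against sampled data.
import math
import numpy as np

low, high = -0.1, 0.15
analytic_std = (high - low) / math.sqrt(12)

rng = np.random.default_rng(42)
samples = rng.uniform(low, high, size=100_000)
print(analytic_std, samples.std())  # the two agree closely
```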
### Provenance

Tracks how a statistic was computed:

```python
from py3plex.stats import Provenance

prov = Provenance(
    algorithm="brandes",
    uncertainty_method="bootstrap",
    parameters={"n_samples": 100},
    seed=42,
    timestamp="2024-12-12T17:00:00",
    library_version="1.0.0"
)

# Serialize
json_dict = prov.to_json_dict()
```

Fields:

- `algorithm`: Algorithm name (e.g., "degree", "betweenness")
- `uncertainty_method`: Uncertainty method (e.g., "delta", "bootstrap")
- `parameters`: Dict of parameters
- `seed`: Random seed (optional)
- `timestamp`: Computation time (optional)
- `library_version`: Version string (optional)
## Statistics Registry

The `StatisticsRegistry` enforces that every registered statistic has an uncertainty model.

### Registration

```python
from py3plex.stats import StatisticSpec, register_statistic, Delta

def compute_degree(network, node):
    return network.core_network.degree(node)

def degree_uncertainty(network, node, **kwargs):
    return Delta(0.0)  # Deterministic

spec = StatisticSpec(
    name="degree",
    estimator=compute_degree,
    uncertainty_model=degree_uncertainty,
    assumptions=["deterministic"],
    supports={"directed": True, "weighted": True}
)
register_statistic(spec)
```

Note: Registration fails if `uncertainty_model` is missing.
### Usage

```python
from py3plex.stats import compute_statistic

# Compute with uncertainty; returns a StatValue
result = compute_statistic("degree", network, node, with_uncertainty=True)

# Compute without uncertainty (raw value)
value = compute_statistic("degree", network, node, with_uncertainty=False)
```
## Arithmetic with Uncertainty

`StatValue` supports arithmetic operations with automatic uncertainty propagation.

### Basic Operations

```python
from py3plex.stats import StatValue, Gaussian, Provenance

sv1 = StatValue(1.0, Gaussian(0.0, 0.1), Provenance("a", "gaussian", {}))
sv2 = StatValue(2.0, Gaussian(0.0, 0.15), Provenance("b", "gaussian", {}))

# Addition
result = sv1 + sv2
print(float(result))  # 3.0
print(result.std())   # ~0.180 (sqrt(0.1² + 0.15²))

# Subtraction
result = sv1 - sv2

# Multiplication
result = sv1 * sv2

# Division
result = sv1 / sv2

# Power
result = sv1 ** 2

# Negation
result = -sv1
```
### Scalar Operations

`StatValue` also supports operations with scalars:

```python
sv = StatValue(2.0, Gaussian(0.0, 0.1), Provenance("a", "gaussian", {}))

# Scalar addition
result = sv + 3  # 5.0 (uncertainty unchanged)

# Scalar multiplication
result = sv * 2  # 4.0 (uncertainty scaled)

# Scalar division
result = sv / 2  # 1.0 (uncertainty scaled)
```

### Propagation Rules

- `Delta` + `Delta`: Analytic error propagation (`σ_sum = sqrt(σ1² + σ2²)`)
- `Gaussian` + `Gaussian`: Exact propagation for addition/subtraction
- Complex operations: Monte Carlo propagation (4096 samples by default)
- Scalar operations: Direct computation with the uncertainty scaled appropriately
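For an operation with no simple closed form, Monte Carlo propagation means: sample both operands, apply the operation elementwise, and summarize the result. A sketch under those assumptions (illustrative, using the 4096-sample default mentioned above; multiplication of the two Gaussians from the earlier example):

```python
# Illustrative Monte Carlo propagation for multiplication of two
# Gaussian-uncertain values: sample, multiply, summarize.
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(1.0, 0.1, size=4096)   # value 1.0, std 0.1
b = rng.normal(2.0, 0.15, size=4096)  # value 2.0, std 0.15

product = a * b
# First-order theory predicts std ≈ sqrt((2*0.1)² + (1*0.15)²) = 0.25
print(product.mean(), product.std())  # ~2.0, ~0.25
```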
## Filtering and Queries

Statistics with uncertainty can be filtered using selectors.

### Selector Syntax

Format: `attribute__component__operator=value`

Components:

- `mean`: Point estimate (default if omitted)
- `std`: Standard deviation
- `ci95__width`: Width of the 95% CI
- `robustness`: Robustness score

Operators:

- `gt`: Greater than
- `gte`: Greater than or equal
- `lt`: Less than
- `lte`: Less than or equal
- `eq`: Equal
- `ne`: Not equal
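To make the syntax concrete, here is a hypothetical parser for these selector keys (illustrative only; `parse_selector` is not a py3plex function). It splits off the trailing operator, treats the rest after the attribute as the component, and defaults the component to `mean`:

```python
# Illustrative parser for attribute__component__operator selector keys.
OPERATORS = {"gt", "gte", "lt", "lte", "eq", "ne"}

def parse_selector(key: str):
    parts = key.split("__")
    attribute, op = parts[0], parts[-1]
    if op not in OPERATORS:
        raise ValueError(f"unknown operator {op!r}")
    # Everything between attribute and operator is the component;
    # multi-part components like ci95__width are rejoined.
    component = "__".join(parts[1:-1]) or "mean"  # mean is the default
    return attribute, component, op

print(parse_selector("degree__mean__gt"))         # ('degree', 'mean', 'gt')
print(parse_selector("degree__ci95__width__lt"))  # ('degree', 'ci95__width', 'lt')
print(parse_selector("degree__gt"))               # ('degree', 'mean', 'gt')
```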
### Examples

```python
# Filter by mean value
result = Q.nodes().where(degree__mean__gt=3).execute(network)

# Filter by uncertainty
result = Q.nodes().where(betweenness__std__lt=0.05).execute(network)

# Filter by CI width
result = Q.nodes().where(degree__ci95__width__lt=0.1).execute(network)

# Filter by robustness
result = Q.nodes().where(centrality__robustness__gt=0.9).execute(network)
```
## Serialization

`StatValue` and the uncertainty models support JSON serialization.

### StatValue Serialization

```python
sv = StatValue(
    value=0.42,
    uncertainty=Gaussian(0.0, 0.05),
    provenance=Provenance("betweenness", "analytic", {})
)

json_dict = sv.to_json_dict()
# {
#     "value": 0.42,
#     "uncertainty": {
#         "type": "gaussian",
#         "mean": 0.0,
#         "std": 0.05
#     },
#     "provenance": {
#         "algorithm": "betweenness",
#         "uncertainty_method": "analytic",
#         "params": {}
#     }
# }
```
### DataFrame Export

`QueryResult` can export to pandas with uncertainty columns:

```python
result = Q.nodes().compute("betweenness").execute(network)
df = result.to_pandas()
# Columns: id, betweenness.value, betweenness.std,
#          betweenness.ci_low, betweenness.ci_high,
#          betweenness.uncertainty_type
```
## Best Practices

- **Always use StatValue internally**: Even deterministic stats should carry `Delta(0)`.
- **Provide uncertainty models**: Every registered statistic must have one.
- **Use seeds for reproducibility**: Pass explicit seeds to bootstrap/Monte Carlo operations.
- **Choose the appropriate model**:
  - Deterministic → `Delta(0)`
  - Known distribution → `Gaussian`
  - Empirical estimation → `Bootstrap` or `Empirical`
  - No distribution assumption → `Interval`
- **Check robustness**: Use `sv.robustness()` to assess reliability.
- **Export uncertainty**: Include uncertainty columns in exports for downstream analysis.
## Examples

See:

- `examples/uncertainty/example_stats_degree_delta.py`
- `examples/uncertainty/example_stats_betweenness_bootstrap.py`
## API Reference

### StatValue

```python
class StatValue:
    """Statistical value with uncertainty and provenance."""
    value: float | int | ndarray
    uncertainty: Uncertainty
    provenance: Provenance

    def __float__(self) -> float: ...
    def mean(self) -> float: ...
    def std(self) -> float: ...
    def ci(self, level: float = 0.95) -> tuple[float, float]: ...
    def robustness(self) -> float: ...
    def to_json_dict(self) -> dict: ...
```
### Uncertainty Models

```python
class Uncertainty(ABC):
    def summary(self, level: float = 0.95) -> dict: ...
    def sample(self, n: int, *, seed: int | None = None) -> ndarray: ...
    def ci(self, level: float = 0.95) -> tuple[float, float]: ...
    def std(self) -> float | None: ...
    def propagate(self, op: str, other: Uncertainty | None, *, seed: int | None = None) -> Uncertainty: ...
    def to_json_dict(self) -> dict: ...

class Delta(Uncertainty):
    sigma: float = 0.0

class Gaussian(Uncertainty):
    mean: float
    std_dev: float

class Bootstrap(Uncertainty):
    samples: ndarray

class Empirical(Uncertainty):
    samples: ndarray

class Interval(Uncertainty):
    low: float
    high: float
```
### Provenance

```python
@dataclass(frozen=True)
class Provenance:
    algorithm: str
    uncertainty_method: str
    parameters: dict = field(default_factory=dict)
    seed: int | None = None
    timestamp: str | None = None
    library_version: str | None = None

    def to_json_dict(self) -> dict: ...

    @classmethod
    def from_json_dict(cls, data: dict) -> Provenance: ...
```
### Registry

```python
@dataclass(frozen=True)
class StatisticSpec:
    name: str
    estimator: Callable
    uncertainty_model: Callable  # Required
    assumptions: list[str] = field(default_factory=list)
    supports: dict = field(default_factory=dict)

def register_statistic(spec: StatisticSpec, force: bool = False) -> None: ...
def get_statistic(name: str) -> StatisticSpec: ...
def list_statistics() -> list[str]: ...
def compute_statistic(name: str, *args, with_uncertainty: bool = True, **kwargs) -> Any: ...
```