Uncertainty-First Statistics

Overview

py3plex implements an uncertainty-first statistics system where every statistic is represented as (value + uncertainty + provenance). This design makes uncertainty a core part of statistics computation, not an afterthought.

Key Principles

  1. Every statistic is a StatValue: No plain floats or ints escape internal computation paths. Even deterministic values carry Delta(0) uncertainty.

  2. Uncertainty is first-class: Uncertainty can be summarized, sampled, propagated through arithmetic, and used in queries.

  3. Backward compatible: Existing code expecting floats works via float(statvalue).

  4. Registry discipline: Statistics must have an uncertainty model to be registered.

  5. Reproducible: Random processes (bootstrap/Monte Carlo) accept explicit seeds, which are tracked in provenance.

Core Components

StatValue

StatValue is the fundamental container for statistics:

from py3plex.stats import StatValue, Delta, Provenance

# Create a deterministic statistic
sv = StatValue(
    value=0.42,
    uncertainty=Delta(0.0),
    provenance=Provenance("degree", "delta", {})
)

# Access value
print(float(sv))  # 0.42

# Query uncertainty
print(sv.std())  # 0.0
print(sv.ci(0.95))  # (0.42, 0.42)
print(sv.robustness())  # 1.0

Key Methods:

  • float(sv): Convert to point estimate (backward compatibility)

  • sv.mean(): Alias for value

  • sv.std(): Standard deviation

  • sv.ci(level=0.95): Confidence interval

  • sv.robustness(): Robustness score in [0, 1]

  • sv.to_json_dict(): Serialize to JSON

Uncertainty Models

Five concrete uncertainty models are provided:

Delta

Deterministic or known-precision uncertainty.

from py3plex.stats import Delta

# Perfect certainty
d = Delta(0.0)

# Small known error
d = Delta(0.01)

Properties:

  • std(): Returns sigma

  • ci(level): Returns symmetric interval based on sigma

  • sample(n, seed): Returns constant samples

  • Propagation: Analytic error propagation when both are Delta
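
For example, adding two Delta-backed statistics combines their errors analytically (a minimal sketch; the algorithm names in the provenance are illustrative):

from py3plex.stats import StatValue, Delta, Provenance

# Two statistics with small known errors
a = StatValue(1.0, Delta(0.01), Provenance("a", "delta", {}))
b = StatValue(2.0, Delta(0.02), Provenance("b", "delta", {}))

# Delta + Delta uses analytic error propagation
total = a + b
print(float(total))  # 3.0
print(total.std())   # ~0.0224 (sqrt(0.01² + 0.02²))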

Gaussian

Normal distribution uncertainty.

from py3plex.stats import Gaussian

g = Gaussian(mean=0.0, std_dev=0.1)

# Exact CI computation
low, high = g.ci(0.95)  # ≈ (-0.196, 0.196)

Properties:

  • std(): Returns std_dev

  • ci(level): Exact Gaussian CI using z-scores

  • sample(n, seed): Generates Gaussian samples

  • Propagation: Analytic for addition/subtraction, Monte Carlo for complex ops
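
A short sampling sketch (sample statistics are approximate and depend on the seed):

import numpy as np
from py3plex.stats import Gaussian

g = Gaussian(mean=0.0, std_dev=0.1)

# Reproducible draws via an explicit seed
s = g.sample(10000, seed=42)
print(np.std(s))   # ≈ 0.1

# The CI width follows the requested level
print(g.ci(0.68))  # ≈ (-0.099, 0.099), roughly ±1σ
print(g.ci(0.99))  # ≈ (-0.258, 0.258), roughly ±2.58σ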

Bootstrap

Empirical uncertainty from bootstrap resampling.

from py3plex.stats import Bootstrap
import numpy as np

# Store bootstrap samples (relative to point estimate)
samples = np.array([0.1, -0.05, 0.15, 0.0, 0.08])
b = Bootstrap(samples)

# Compute CI from percentiles
low, high = b.ci(0.95)

Properties:

  • std(): Sample standard deviation

  • ci(level): Percentile-based CI

  • sample(n, seed): Resample from stored samples

  • Propagation: Always Monte Carlo

  • Serialization: Stores a summary (n, std, CI) rather than the full samples
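
A sketch of the resampling behaviour (the exact resampling scheme is an implementation detail; what matters is that an explicit seed makes it reproducible):

import numpy as np
from py3plex.stats import Bootstrap

samples = np.array([0.1, -0.05, 0.15, 0.0, 0.08])
b = Bootstrap(samples)

print(b.std())      # sample standard deviation of the stored samples
print(b.ci(0.90))   # percentile-based 90% interval

# sample() redraws from the stored samples; a fixed seed gives identical draws
draw1 = b.sample(5, seed=7)
draw2 = b.sample(5, seed=7)
print(np.array_equal(draw1, draw2))  # True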

Empirical

Behaves like Bootstrap, but is intended for any empirical sample distribution, not just bootstrap resamples.

from py3plex.stats import Empirical
import numpy as np

samples = np.array([0.1, 0.2, 0.15, 0.18, 0.12])
e = Empirical(samples)

Properties:

  • Same as Bootstrap

  • Conceptually separate for clarity

Interval

Interval-based uncertainty without assuming a particular distribution.

from py3plex.stats import Interval

i = Interval(-0.1, 0.15)

# Uniform sampling by default
samples = i.sample(100, seed=42)

Properties:

  • std(): Estimates std assuming uniform distribution: (high - low) / sqrt(12)

  • ci(level): Returns the interval bounds

  • sample(n, seed): Uniform sampling

  • Propagation: Monte Carlo
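
A quick numeric check of the uniform-distribution std estimate for the interval above:

from py3plex.stats import Interval

i = Interval(-0.1, 0.15)

# std under the uniform assumption: (high - low) / sqrt(12)
print(i.std())     # ≈ 0.0722  (0.25 / sqrt(12))

# ci() returns the interval bounds
print(i.ci(0.95))  # (-0.1, 0.15)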

Provenance

Tracks how a statistic was computed:

from py3plex.stats import Provenance

prov = Provenance(
    algorithm="brandes",
    uncertainty_method="bootstrap",
    parameters={"n_samples": 100},
    seed=42,
    timestamp="2024-12-12T17:00:00",
    library_version="1.0.0"
)

# Serialize
json_dict = prov.to_json_dict()

Fields:

  • algorithm: Algorithm name (e.g., “degree”, “betweenness”)

  • uncertainty_method: Uncertainty method (e.g., “delta”, “bootstrap”)

  • parameters: Dict of parameters

  • seed: Random seed (optional)

  • timestamp: Computation time (optional)

  • library_version: Version string (optional)

Statistics Registry

The StatisticsRegistry enforces that every registered statistic has an uncertainty model.

Registration

from py3plex.stats import StatisticSpec, register_statistic, Delta

def compute_degree(network, node):
    return network.core_network.degree(node)

def degree_uncertainty(network, node, **kwargs):
    return Delta(0.0)  # Deterministic

spec = StatisticSpec(
    name="degree",
    estimator=compute_degree,
    uncertainty_model=degree_uncertainty,
    assumptions=["deterministic"],
    supports={"directed": True, "weighted": True}
)

register_statistic(spec)

Note: Registration fails if uncertainty_model is missing.

Usage

from py3plex.stats import compute_statistic

# Compute with uncertainty
result = compute_statistic("degree", network, node, with_uncertainty=True)
# Returns StatValue

# Compute without uncertainty (raw value)
value = compute_statistic("degree", network, node, with_uncertainty=False)
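
Registered statistics can also be discovered and inspected at runtime. A sketch assuming the "degree" spec from the registration example above (the functions are those listed in the API reference below; the import path is assumed):

from py3plex.stats import list_statistics, get_statistic

# Names of all registered statistics
print(list_statistics())   # e.g. ['degree', ...]

# Inspect a registered spec
spec = get_statistic("degree")
print(spec.assumptions)    # ['deterministic']
print(spec.supports)       # {'directed': True, 'weighted': True}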

Arithmetic with Uncertainty

StatValue supports arithmetic operations with automatic uncertainty propagation.

Basic Operations

from py3plex.stats import StatValue, Gaussian, Provenance

sv1 = StatValue(1.0, Gaussian(0.0, 0.1), Provenance("a", "gaussian", {}))
sv2 = StatValue(2.0, Gaussian(0.0, 0.15), Provenance("b", "gaussian", {}))

# Addition
result = sv1 + sv2
print(float(result))  # 3.0
print(result.std())  # ~0.180 (sqrt(0.1² + 0.15²))

# Subtraction
result = sv1 - sv2

# Multiplication
result = sv1 * sv2

# Division
result = sv1 / sv2

# Power
result = sv1 ** 2

# Negation
result = -sv1

Scalar Operations

StatValue supports operations with scalars:

sv = StatValue(2.0, Gaussian(0.0, 0.1), Provenance("a", "gaussian", {}))

# Scalar addition
result = sv + 3  # 5.0 (uncertainty unchanged)

# Scalar multiplication
result = sv * 2  # 4.0 (uncertainty scaled)

# Scalar division
result = sv / 2  # 1.0 (uncertainty scaled)

Propagation Rules

  1. Delta + Delta: Analytic error propagation (σ_sum = sqrt(σ1² + σ2²))

  2. Gaussian + Gaussian: Exact propagation for addition/subtraction

  3. Complex operations: Monte Carlo propagation (4096 samples by default)

  4. Scalar operations: Direct computation, uncertainty scaled appropriately
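
As a rough sanity check of rule 3, multiplying two Gaussian-backed statistics falls back to Monte Carlo, and the result should approximately match the first-order error-propagation formula (a sketch; the exact value is stochastic):

from py3plex.stats import StatValue, Gaussian, Provenance

x = StatValue(1.0, Gaussian(0.0, 0.1), Provenance("x", "gaussian", {}))
y = StatValue(2.0, Gaussian(0.0, 0.15), Provenance("y", "gaussian", {}))

# No exact Gaussian rule for multiplication, so uncertainty is propagated
# by Monte Carlo sampling (4096 samples by default)
prod = x * y
print(float(prod))  # 2.0
# First-order approximation: |x*y| * sqrt((0.1/1)² + (0.15/2)²) ≈ 0.25
print(prod.std())   # ≈ 0.25, varies slightly from run to run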

Filtering and Queries

Statistics with uncertainty can be filtered using selectors:

Selector Syntax

Format: attribute__component__operator=value

Components:

  • mean: Point estimate (default if omitted)

  • std: Standard deviation

  • ci95__width: Width of 95% CI

  • robustness: Robustness score

Operators:

  • gt: Greater than

  • gte: Greater than or equal

  • lt: Less than

  • lte: Less than or equal

  • eq: Equal

  • ne: Not equal

Examples

# Filter by mean value
result = Q.nodes().where(degree__mean__gt=3).execute(network)

# Filter by uncertainty
result = Q.nodes().where(betweenness__std__lt=0.05).execute(network)

# Filter by CI width
result = Q.nodes().where(degree__ci95__width__lt=0.1).execute(network)

# Filter by robustness
result = Q.nodes().where(centrality__robustness__gt=0.9).execute(network)

Serialization

StatValue and Uncertainty models support JSON serialization.

StatValue Serialization

sv = StatValue(
    value=0.42,
    uncertainty=Gaussian(0.0, 0.05),
    provenance=Provenance("betweenness", "analytic", {})
)

json_dict = sv.to_json_dict()
# {
#   "value": 0.42,
#   "uncertainty": {
#     "type": "gaussian",
#     "mean": 0.0,
#     "std": 0.05
#   },
#   "provenance": {
#     "algorithm": "betweenness",
#     "uncertainty_method": "analytic",
#     "params": {}
#   }
# }

DataFrame Export

QueryResult can be exported to a pandas DataFrame with uncertainty columns:

result = Q.nodes().compute("betweenness").execute(network)

df = result.to_pandas()
# Columns: id, betweenness.value, betweenness.std,
#          betweenness.ci_low, betweenness.ci_high,
#          betweenness.uncertainty_type
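
The uncertainty columns can then be used directly in pandas, for example to keep only nodes whose estimate is well determined (continuing the block above; column names as listed):

# Keep nodes whose betweenness has low estimated uncertainty
stable = df[df["betweenness.std"] < 0.05]
print(stable[["id", "betweenness.value", "betweenness.ci_low", "betweenness.ci_high"]])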

Best Practices

  1. Always use StatValue internally: Even for deterministic stats, use Delta(0)

  2. Provide uncertainty models: Every registered statistic must have one

  3. Use seeds for reproducibility: Pass explicit seeds to bootstrap/MC operations

  4. Choose appropriate models:

    • Deterministic → Delta(0)

    • Known distribution → Gaussian

    • Empirical estimation → Bootstrap or Empirical

    • No distribution assumption → Interval

  5. Check robustness: Use sv.robustness() to assess reliability

  6. Export uncertainty: Include uncertainty columns in exports for downstream analysis
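
A small end-to-end sketch of practices 5 and 6, assuming a network, a node, and the "degree" statistic registered earlier:

from py3plex.stats import compute_statistic

sv = compute_statistic("degree", network, node, with_uncertainty=True)

# Check reliability before relying on the point estimate
if sv.robustness() > 0.9:
    print("stable:", float(sv))

# Persist the value together with its uncertainty and provenance
record = sv.to_json_dict()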

Examples

See:

  • examples/uncertainty/example_stats_degree_delta.py

  • examples/uncertainty/example_stats_betweenness_bootstrap.py

API Reference

StatValue

class StatValue:
    """Statistical value with uncertainty and provenance."""

    value: float | int | ndarray
    uncertainty: Uncertainty
    provenance: Provenance

    def __float__(self) -> float: ...
    def mean(self) -> float: ...
    def std(self) -> float: ...
    def ci(self, level: float = 0.95) -> tuple[float, float]: ...
    def robustness(self) -> float: ...
    def to_json_dict(self) -> dict: ...

Uncertainty Models

class Uncertainty(ABC):
    def summary(self, level: float = 0.95) -> dict: ...
    def sample(self, n: int, *, seed: int | None = None) -> ndarray: ...
    def ci(self, level: float = 0.95) -> tuple[float, float]: ...
    def std(self) -> float | None: ...
    def propagate(self, op: str, other: Uncertainty | None, *, seed: int | None = None) -> Uncertainty: ...
    def to_json_dict(self) -> dict: ...

class Delta(Uncertainty):
    sigma: float = 0.0

class Gaussian(Uncertainty):
    mean: float
    std_dev: float

class Bootstrap(Uncertainty):
    samples: ndarray

class Empirical(Uncertainty):
    samples: ndarray

class Interval(Uncertainty):
    low: float
    high: float

Provenance

@dataclass(frozen=True)
class Provenance:
    algorithm: str
    uncertainty_method: str
    parameters: dict = field(default_factory=dict)
    seed: int | None = None
    timestamp: str | None = None
    library_version: str | None = None

    def to_json_dict(self) -> dict: ...
    @classmethod
    def from_json_dict(cls, data: dict) -> Provenance: ...

Registry

@dataclass(frozen=True)
class StatisticSpec:
    name: str
    estimator: Callable
    uncertainty_model: Callable  # Required
    assumptions: list[str] = field(default_factory=list)
    supports: dict = field(default_factory=dict)

def register_statistic(spec: StatisticSpec, force: bool = False) -> None: ...
def get_statistic(name: str) -> StatisticSpec: ...
def list_statistics() -> list[str]: ...
def compute_statistic(name: str, *args, with_uncertainty: bool = True, **kwargs) -> Any: ...