===================================
Uncertainty-First Statistics
===================================

Overview
========

py3plex implements an **uncertainty-first statistics system** where every
statistic is represented as ``(value + uncertainty + provenance)``. This design
makes uncertainty a core part of statistics computation, not an afterthought.

Key Principles
--------------

1. **Every statistic is a StatValue**: No plain floats/ints escaping internal
   computation paths. Even deterministic values carry ``Delta(0)`` uncertainty.
2. **Uncertainty is first-class**: Uncertainty can be summarized, sampled,
   propagated through arithmetic, and used in queries.
3. **Backward compatible**: Existing code expecting floats works via
   ``float(statvalue)``.
4. **Registry discipline**: Statistics must have an uncertainty model to be
   registered.
5. **Reproducible**: Random processes (bootstrap/MC) support explicit seeds
   tracked in provenance.

Core Components
===============

StatValue
---------

``StatValue`` is the fundamental container for statistics:

.. code-block:: python

   from py3plex.stats import StatValue, Delta, Provenance

   # Create a deterministic statistic
   sv = StatValue(
       value=0.42,
       uncertainty=Delta(0.0),
       provenance=Provenance("degree", "delta", {})
   )

   # Access value
   print(float(sv))        # 0.42

   # Query uncertainty
   print(sv.std())         # 0.0
   print(sv.ci(0.95))      # (0.42, 0.42)
   print(sv.robustness())  # 1.0

**Key Methods:**

- ``float(sv)``: Convert to point estimate (backward compatibility)
- ``sv.mean()``: Alias for value
- ``sv.std()``: Standard deviation
- ``sv.ci(level=0.95)``: Confidence interval
- ``sv.robustness()``: Robustness score in [0, 1]
- ``sv.to_json_dict()``: Serialize to JSON

Uncertainty Models
------------------

Five concrete uncertainty models are provided:

Delta
~~~~~

Deterministic or known-precision uncertainty.

.. code-block:: python

   from py3plex.stats import Delta

   # Perfect certainty
   d = Delta(0.0)

   # Small known error
   d = Delta(0.01)

**Properties:**

- ``std()``: Returns sigma
- ``ci(level)``: Returns symmetric interval based on sigma
- ``sample(n, seed)``: Returns constant samples
- Propagation: Analytic error propagation when both are Delta

Gaussian
~~~~~~~~

Normal distribution uncertainty.

.. code-block:: python

   from py3plex.stats import Gaussian

   g = Gaussian(mean=0.0, std_dev=0.1)

   # Exact CI computation
   low, high = g.ci(0.95)  # ≈ (-0.196, 0.196)

**Properties:**

- ``std()``: Returns std_dev
- ``ci(level)``: Exact Gaussian CI using z-scores
- ``sample(n, seed)``: Generates Gaussian samples
- Propagation: Analytic for addition/subtraction, Monte Carlo for complex ops

Bootstrap
~~~~~~~~~

Empirical uncertainty from bootstrap resampling.

.. code-block:: python

   from py3plex.stats import Bootstrap
   import numpy as np

   # Store bootstrap samples (relative to point estimate)
   samples = np.array([0.1, -0.05, 0.15, 0.0, 0.08])
   b = Bootstrap(samples)

   # Compute CI from percentiles
   low, high = b.ci(0.95)

**Properties:**

- ``std()``: Sample standard deviation
- ``ci(level)``: Percentile-based CI
- ``sample(n, seed)``: Resamples from the stored samples
- Propagation: Always Monte Carlo
- Serialization: Stores a summary (n, std, CI), not the full samples
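The samples passed to ``Bootstrap`` above would typically come from resampling
the underlying data with replacement and recomputing the statistic on each
resample. A minimal sketch of that workflow, using made-up observations and
following the convention above of storing samples as deviations from the point
estimate:

.. code-block:: python

   import numpy as np

   from py3plex.stats import Bootstrap

   rng = np.random.default_rng(42)

   # Hypothetical observations (e.g. per-node degrees) and their point estimate
   data = np.array([3.0, 5.0, 2.0, 8.0, 4.0, 6.0, 3.0, 7.0])
   point_estimate = data.mean()

   # Resample with replacement and recompute the statistic on each resample
   boot_stats = np.array([
       rng.choice(data, size=data.size, replace=True).mean()
       for _ in range(1000)
   ])

   # Store deviations from the point estimate, as in the example above
   b = Bootstrap(boot_stats - point_estimate)
   low, high = b.ci(0.95)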
Empirical
~~~~~~~~~

Similar to Bootstrap, but conceptually intended for any empirical distribution.

.. code-block:: python

   from py3plex.stats import Empirical
   import numpy as np

   samples = np.array([0.1, 0.2, 0.15, 0.18, 0.12])
   e = Empirical(samples)

**Properties:**

- Same as Bootstrap
- Kept as a separate type for conceptual clarity

Interval
~~~~~~~~

Interval-based uncertainty without assuming a distribution.

.. code-block:: python

   from py3plex.stats import Interval

   i = Interval(-0.1, 0.15)

   # Uniform sampling by default
   samples = i.sample(100, seed=42)

**Properties:**

- ``std()``: Estimates std assuming a uniform distribution: ``(high - low) / sqrt(12)``
- ``ci(level)``: Returns the interval bounds
- ``sample(n, seed)``: Uniform sampling
- Propagation: Monte Carlo

Provenance
----------

Tracks how a statistic was computed:

.. code-block:: python

   from py3plex.stats import Provenance

   prov = Provenance(
       algorithm="brandes",
       uncertainty_method="bootstrap",
       parameters={"n_samples": 100},
       seed=42,
       timestamp="2024-12-12T17:00:00",
       library_version="1.0.0"
   )

   # Serialize
   json_dict = prov.to_json_dict()

**Fields:**

- ``algorithm``: Algorithm name (e.g., "degree", "betweenness")
- ``uncertainty_method``: Uncertainty method (e.g., "delta", "bootstrap")
- ``parameters``: Dict of parameters
- ``seed``: Random seed (optional)
- ``timestamp``: Computation time (optional)
- ``library_version``: Version string (optional)

Statistics Registry
===================

The ``StatisticsRegistry`` enforces that every registered statistic has an
uncertainty model.

Registration
------------

.. code-block:: python

   from py3plex.stats import StatisticSpec, register_statistic, Delta

   def compute_degree(network, node):
       return network.core_network.degree(node)

   def degree_uncertainty(network, node, **kwargs):
       return Delta(0.0)  # Deterministic

   spec = StatisticSpec(
       name="degree",
       estimator=compute_degree,
       uncertainty_model=degree_uncertainty,
       assumptions=["deterministic"],
       supports={"directed": True, "weighted": True}
   )

   register_statistic(spec)

**Note:** Registration fails if ``uncertainty_model`` is missing.

Usage
-----

.. code-block:: python

   from py3plex.stats import compute_statistic

   # Compute with uncertainty
   result = compute_statistic("degree", network, node, with_uncertainty=True)  # Returns StatValue

   # Compute without uncertainty (raw value)
   value = compute_statistic("degree", network, node, with_uncertainty=False)

Arithmetic with Uncertainty
===========================

StatValue supports arithmetic operations with automatic uncertainty propagation.

Basic Operations
----------------

.. code-block:: python

   from py3plex.stats import StatValue, Gaussian, Provenance

   sv1 = StatValue(1.0, Gaussian(0.0, 0.1), Provenance("a", "gaussian", {}))
   sv2 = StatValue(2.0, Gaussian(0.0, 0.15), Provenance("b", "gaussian", {}))

   # Addition
   result = sv1 + sv2
   print(float(result))  # 3.0
   print(result.std())   # ~0.180 (sqrt(0.1² + 0.15²))

   # Subtraction
   result = sv1 - sv2

   # Multiplication
   result = sv1 * sv2

   # Division
   result = sv1 / sv2

   # Power
   result = sv1 ** 2

   # Negation
   result = -sv1

Scalar Operations
-----------------

StatValue supports operations with scalars:

.. code-block:: python

   sv = StatValue(2.0, Gaussian(0.0, 0.1), Provenance("a", "gaussian", {}))

   # Scalar addition
   result = sv + 3  # 5.0 (uncertainty unchanged)

   # Scalar multiplication
   result = sv * 2  # 4.0 (uncertainty scaled)

   # Scalar division
   result = sv / 2  # 1.0 (uncertainty scaled)

Propagation Rules
-----------------

1. **Delta + Delta**: Analytic error propagation (``σ_sum = sqrt(σ1² + σ2²)``)
2. **Gaussian + Gaussian**: Exact propagation for addition/subtraction
3. **Complex operations**: Monte Carlo propagation (4096 samples by default; see
   the sketch after this list)
4. **Scalar operations**: Direct computation, uncertainty scaled appropriately
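To make rule 3 concrete, the sketch below reproduces Monte Carlo propagation by
hand for a product of two Gaussian-uncertain statistics: draw samples from each
uncertainty, apply the operation elementwise, and summarize the empirical
result. It follows the convention used above where each uncertainty is centered
at zero around the stored value; the library performs the equivalent step
internally when you write ``sv1 * sv2``, so this is purely illustrative.

.. code-block:: python

   import numpy as np

   from py3plex.stats import Gaussian

   n = 4096  # default sample count mentioned in rule 3
   u1 = Gaussian(0.0, 0.1)
   u2 = Gaussian(0.0, 0.15)

   # Samples of each statistic: point estimate plus zero-centered uncertainty draws
   x1 = 1.0 + u1.sample(n, seed=1)
   x2 = 2.0 + u2.sample(n, seed=2)

   # Apply the operation elementwise, then summarize the empirical distribution
   product = x1 * x2
   print(product.mean())                       # ≈ 2.0
   print(product.std(ddof=1))                  # ≈ 0.25 for these inputs
   print(np.percentile(product, [2.5, 97.5]))  # ≈ 95% interval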
Filtering and Queries
=====================

Statistics with uncertainty can be filtered using selectors:

Selector Syntax
---------------

Format: ``attribute__component__operator=value``

**Components:**

- ``mean``: Point estimate (default if omitted)
- ``std``: Standard deviation
- ``ci95__width``: Width of 95% CI
- ``robustness``: Robustness score

**Operators:**

- ``gt``: Greater than
- ``gte``: Greater than or equal
- ``lt``: Less than
- ``lte``: Less than or equal
- ``eq``: Equal
- ``ne``: Not equal

Examples
--------

.. code-block:: python

   # Filter by mean value
   result = Q.nodes().where(degree__mean__gt=3).execute(network)

   # Filter by uncertainty
   result = Q.nodes().where(betweenness__std__lt=0.05).execute(network)

   # Filter by CI width
   result = Q.nodes().where(degree__ci95__width__lt=0.1).execute(network)

   # Filter by robustness
   result = Q.nodes().where(centrality__robustness__gt=0.9).execute(network)

Serialization
=============

StatValue and Uncertainty models support JSON serialization.

StatValue Serialization
-----------------------

.. code-block:: python

   sv = StatValue(
       value=0.42,
       uncertainty=Gaussian(0.0, 0.05),
       provenance=Provenance("betweenness", "analytic", {})
   )

   json_dict = sv.to_json_dict()
   # {
   #     "value": 0.42,
   #     "uncertainty": {
   #         "type": "gaussian",
   #         "mean": 0.0,
   #         "std": 0.05
   #     },
   #     "provenance": {
   #         "algorithm": "betweenness",
   #         "uncertainty_method": "analytic",
   #         "params": {}
   #     }
   # }

DataFrame Export
----------------

QueryResult can export to pandas with uncertainty columns:

.. code-block:: python

   result = Q.nodes().compute("betweenness").execute(network)
   df = result.to_pandas()
   # Columns: id, betweenness.value, betweenness.std,
   #          betweenness.ci_low, betweenness.ci_high,
   #          betweenness.uncertainty_type

Best Practices
==============

1. **Always use StatValue internally**: Even for deterministic stats, use ``Delta(0)``
2. **Provide uncertainty models**: Every registered statistic must have one
3. **Use seeds for reproducibility**: Pass explicit seeds to bootstrap/MC operations
4. **Choose appropriate models**:

   - Deterministic → ``Delta(0)``
   - Known distribution → ``Gaussian``
   - Empirical estimation → ``Bootstrap`` or ``Empirical``
   - No distribution assumption → ``Interval``

5. **Check robustness**: Use ``sv.robustness()`` to assess reliability
6. **Export uncertainty**: Include uncertainty columns in exports for downstream analysis

Examples
========

See:

- ``examples/uncertainty/example_stats_degree_delta.py``
- ``examples/uncertainty/example_stats_betweenness_bootstrap.py``

API Reference
=============

StatValue
---------

.. code-block:: python

   class StatValue:
       """Statistical value with uncertainty and provenance."""

       value: float | int | ndarray
       uncertainty: Uncertainty
       provenance: Provenance

       def __float__(self) -> float: ...
       def mean(self) -> float: ...
       def std(self) -> float: ...
       def ci(self, level: float = 0.95) -> tuple[float, float]: ...
       def robustness(self) -> float: ...
       def to_json_dict(self) -> dict: ...

Uncertainty Models
------------------

.. code-block:: python

   class Uncertainty(ABC):
       def summary(self, level: float = 0.95) -> dict: ...
       def sample(self, n: int, *, seed: int | None = None) -> ndarray: ...
       def ci(self, level: float = 0.95) -> tuple[float, float]: ...
       def std(self) -> float | None: ...
       def propagate(self, op: str, other: Uncertainty | None, *, seed: int | None = None) -> Uncertainty: ...
       def to_json_dict(self) -> dict: ...

   class Delta(Uncertainty):
       sigma: float = 0.0

   class Gaussian(Uncertainty):
       mean: float
       std_dev: float

   class Bootstrap(Uncertainty):
       samples: ndarray

   class Empirical(Uncertainty):
       samples: ndarray

   class Interval(Uncertainty):
       low: float
       high: float
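Every concrete model implements this interface, so downstream code can stay
model-agnostic. A short sketch of querying an uncertainty through the shared
interface (using ``Gaussian`` here); it assumes, as the seed support above
implies, that identical seeds produce identical draws:

.. code-block:: python

   import numpy as np

   from py3plex.stats import Gaussian

   g = Gaussian(0.0, 0.1)

   # Model-agnostic queries defined on the Uncertainty base class
   print(g.std())          # 0.1
   print(g.ci(0.95))       # ≈ (-0.196, 0.196)
   print(g.summary(0.95))  # summary dict for the chosen level

   # Explicit seeds make sampling reproducible
   a = g.sample(1000, seed=7)
   b = g.sample(1000, seed=7)
   assert np.array_equal(a, b)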
Provenance
----------

.. code-block:: python

   @dataclass(frozen=True)
   class Provenance:
       algorithm: str
       uncertainty_method: str
       parameters: dict = field(default_factory=dict)
       seed: int | None = None
       timestamp: str | None = None
       library_version: str | None = None

       def to_json_dict(self) -> dict: ...

       @classmethod
       def from_json_dict(cls, data: dict) -> Provenance: ...

Registry
--------

.. code-block:: python

   @dataclass(frozen=True)
   class StatisticSpec:
       name: str
       estimator: Callable
       uncertainty_model: Callable  # Required
       assumptions: list[str] = field(default_factory=list)
       supports: dict = field(default_factory=dict)

   def register_statistic(spec: StatisticSpec, force: bool = False) -> None: ...
   def get_statistic(name: str) -> StatisticSpec: ...
   def list_statistics() -> list[str]: ...
   def compute_statistic(name: str, *args, with_uncertainty: bool = True, **kwargs) -> Any: ...
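For orientation, a small sketch that exercises these registry functions end to
end; it assumes ``list_statistics`` and ``get_statistic`` are importable from
``py3plex.stats`` alongside the registration helpers used earlier, and the final
call assumes an existing py3plex network object ``network``:

.. code-block:: python

   from py3plex.stats import (
       Delta,
       StatisticSpec,
       compute_statistic,
       get_statistic,
       list_statistics,
       register_statistic,
   )

   def node_count(network):
       # Deterministic estimator over the underlying networkx graph
       return network.core_network.number_of_nodes()

   def node_count_uncertainty(network, **kwargs):
       return Delta(0.0)  # counting nodes carries no uncertainty

   register_statistic(StatisticSpec(
       name="node_count",
       estimator=node_count,
       uncertainty_model=node_count_uncertainty,
       assumptions=["deterministic"],
   ))

   print("node_count" in list_statistics())        # True
   print(get_statistic("node_count").assumptions)  # ['deterministic']

   # With an existing network object, compute the statistic as a StatValue
   result = compute_statistic("node_count", network, with_uncertainty=True)
   print(float(result), result.std())  # node count with Delta(0) uncertainty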