Architecture and Design

This document describes the system architecture, design patterns, and extension points of py3plex. It explains how the layers fit together, what each layer owns, and where to plug in new capabilities without leaking responsibilities across layers.

System Overview

py3plex uses a modular, layered architecture with explicit boundaries. Higher layers depend on lower ones, never the reverse; data flows downward for computation and back upward for presentation:

┌─────────────────────────────────────────────────────┐
│          High-Level Interfaces (Wrappers)           │
│  node2vec_embedding, benchmark_nodes                │
└─────────────────────────────────────────────────────┘
                        ↓
┌─────────────────────────────────────────────────────┐
│              Algorithms Layer                        │
│  ┌──────────────┬───────────────┬─────────────────┐ │
│  │  Community   │  Statistics   │  Multilayer     │ │
│  │  Detection   │               │  Algorithms     │ │
│  └──────────────┴───────────────┴─────────────────┘ │
└─────────────────────────────────────────────────────┘
                        ↓
┌─────────────────────────────────────────────────────┐
│            Visualization Layer                       │
│  ┌──────────────┬───────────────┬─────────────────┐ │
│  │  Multilayer  │  Drawing      │  Layout         │ │
│  │  Plots       │  Machinery    │  Algorithms     │ │
│  └──────────────┴───────────────┴─────────────────┘ │
└─────────────────────────────────────────────────────┘
                        ↓
┌─────────────────────────────────────────────────────┐
│                Core Layer                            │
│  ┌──────────────┬───────────────┬─────────────────┐ │
│  │  multinet    │  Parsers      │  Converters     │ │
│  │  (MultiLayer │               │                 │ │
│  │   Network)   │               │                 │ │
│  └──────────────┴───────────────┴─────────────────┘ │
└─────────────────────────────────────────────────────┘
                        ↓
┌─────────────────────────────────────────────────────┐
│           NetworkX Foundation                        │
│  MultiDiGraph, MultiGraph, Algorithms                │
└─────────────────────────────────────────────────────┘

Reading the diagram: NetworkX primitives sit at the base. The core layer wraps them in a multilayer-aware data model. Algorithms, visualization, and wrappers build on that model and never mutate it implicitly; wrappers orchestrate complete tasks while delegating computation to the lower layers.

Architectural Layers

Each layer has a narrow contract: the core encodes data, algorithms consume that encoding, visualization renders results, and wrappers orchestrate complete tasks. When extending py3plex, start from the lowest layer that can host the new capability, and depend only on layers below it, never on layers above.
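The dependency rule can be checked mechanically. The sketch below is illustrative only: the layer names follow the diagram, while LAYER_ORDER, find_violations, and the example dependency map are hypothetical and not part of py3plex.

```python
# Layers listed from lowest to highest, following the diagram above.
# These names and this helper are illustrative, not part of py3plex.
LAYER_ORDER = ["networkx", "core", "visualization", "algorithms", "wrappers"]


def find_violations(dependencies):
    """Return (layer, dependency) pairs where a layer depends on a higher one."""
    rank = {name: i for i, name in enumerate(LAYER_ORDER)}
    violations = []
    for layer, deps in dependencies.items():
        for dep in deps:
            if rank[dep] > rank[layer]:
                violations.append((layer, dep))
    return violations


# Hypothetical dependency map: core reaching up into algorithms breaks the rule.
example = {
    "core": ["networkx", "algorithms"],
    "algorithms": ["core"],
    "wrappers": ["algorithms", "visualization", "core"],
}
violations = find_violations(example)
print(violations)
```

A check like this could run in a test suite against a dependency map extracted from import statements; how that map is produced is left open here.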

Core Layer

Purpose: Fundamental data structures and I/O operations. Everything else depends on this layer, so invariants and encoding live here.

Key Components:

multinet.py - The multi_layer_network class (core facade)
parsers.py - Input/output for various formats
converters.py - Format conversion utilities
random_generators.py - Random network generators
HINMINE/ - Heterogeneous network decomposition helpers

Responsibilities:

Network construction and mutation through a single API
File I/O (GraphML, GML, GEXF, edge lists, etc.)
Layer management (string ↔ integer IDs, delimiter handling)
Matrix representations (adjacency, supra-adjacency)
NetworkX integration and type selection (directed vs. undirected)
Cache ownership (e.g., supra adjacency, embeddings) and invalidation hooks triggered by mutations

Design Pattern: Facade Pattern — multi_layer_network exposes one consistent interface while hiding NetworkX wiring and encoding details

Algorithms Layer

Purpose: Network analysis algorithms optimized for multilayer networks. Algorithms assume core-layer encoding and never adjust it themselves.

Key Components:

community_detection/ - Community detection algorithms
statistics/ - Network statistics and metrics
multilayer_algorithms/ - Multilayer-specific methods
node_ranking/ - Centrality and ranking measures
general/ - General-purpose algorithms (random walks, etc.)

Responsibilities:

Community detection (Louvain, Infomap, Label Propagation)
Statistical analysis (multilayer density, inter-layer correlation, etc.)
Centrality computation (degree, betweenness, PageRank, and variants)
Random walks and embeddings
Network decomposition primitives
Result caching where appropriate (never mutating the underlying graph)

Design Pattern: Strategy Pattern — interchangeable algorithms share a common interface

Visualization Layer

Purpose: Network plotting and rendering for multilayer structures. Visualization consumes algorithm outputs but does not compute new graph state.

Key Components:

multilayer.py - High-level multilayer plotting
drawing_machinery.py - Core drawing primitives
layout_algorithms.py - Layout computation
colors.py - Color scheme generators
fa2/ - ForceAtlas2 layout

Responsibilities:

Diagonal projection plots and multilayer-specific layouts
Force-directed and ForceAtlas2 layouts
Matrix visualizations
Color mapping and legend helpers
Interactive plots (via Plotly)

Design Pattern: Template Method Pattern — layout algorithms follow a shared skeleton with overridable steps

Wrappers Layer

Purpose: High-level interfaces for common workflows so users can run end-to-end tasks without touching internals. Wrappers compose algorithms and visualization, but defer state changes to the core.

Key Components:

node2vec_embedding.py - Node2Vec embedding generation
benchmark_nodes.py - Node classification benchmarking

Responsibilities:

Simplified interfaces for multi-step workflows
Integration with external tools
Benchmarking and evaluation shortcuts

Design Pattern: Facade Pattern — hides orchestration and sensible defaults behind a small API

Core Data Structure

The multi_layer_network Class

Central to py3plex, this class wraps the underlying NetworkX graph and enforces multilayer encoding invariants:

import networkx as nx

class multi_layer_network:
    def __init__(self, directed=True, label_delimiter="---", coupling_weight=1.0):
        self.core_network = nx.MultiDiGraph() if directed else nx.MultiGraph()
        self.layer_name_map = {}  # Bidirectional mapping
        self.label_delimiter = label_delimiter
        self.coupling_weight = coupling_weight
        self.embedding = None
        self.labels = None

Key Attributes:

core_network - Underlying NetworkX graph
layer_name_map - Maps layer names to integer IDs
label_delimiter - Separator for node-layer encoding (default: "---")
coupling_weight - Default weight for inter-layer edges
embedding - Cached node embedding matrix
labels - Node classification labels

Encoding Scheme and Invariants:

Nodes are represented as (node_id, layer) tuples in Python.
When serialized to flat text (files, labels), tuples are joined with the delimiter: "{node_id}{delimiter}{layer}". Avoid using the delimiter inside raw IDs.
Layer names are mapped to integers in layer_name_map for stable ordering.
core_network stays a NetworkX MultiGraph/MultiDiGraph; avoid injecting derived attributes that are not part of the graph definition.
Inter-layer edges use coupling_weight unless explicitly weighted; modifying coupling should clear related caches.
Any mutation (adding/removing nodes, relabeling layers) should invalidate cached matrices or embeddings.
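The encoding rules above can be sketched in a few lines. This is a minimal illustration, assuming the documented default delimiter; encode_node, decode_node, and build_layer_map are hypothetical helper names, not py3plex functions.

```python
DEFAULT_DELIMITER = "---"  # matches the documented default label_delimiter


def decode_node(label, delimiter=DEFAULT_DELIMITER):
    """Split a flat-text label back into a (node_id, layer) tuple of strings."""
    node_id, sep, layer = label.partition(delimiter)
    if not sep:
        raise ValueError(f"label {label!r} does not contain {delimiter!r}")
    return node_id, layer


def encode_node(node_id, layer, delimiter=DEFAULT_DELIMITER):
    """Join a (node_id, layer) pair into its flat-text form.

    Rejects any pair that would not decode back to itself, which covers
    raw IDs or layer names that contain (or overlap with) the delimiter.
    """
    node_id, layer = str(node_id), str(layer)
    label = f"{node_id}{delimiter}{layer}"
    if decode_node(label, delimiter) != (node_id, layer):
        raise ValueError(f"{node_id!r}/{layer!r} is ambiguous with {delimiter!r}")
    return label


def build_layer_map(layers):
    """Assign integer IDs to layer names in first-seen order."""
    layer_map = {}
    for name in layers:
        layer_map.setdefault(name, len(layer_map))
    return layer_map


label = encode_node("alice", "social")
print(label, decode_node(label), build_layer_map(["social", "work", "social"]))
```

Checking the round trip at encoding time is one way to enforce the "no delimiter inside raw IDs" rule early, before an ambiguous label reaches a file.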

Design Patterns

Facade Pattern

Used in: multi_layer_network, wrappers

Purpose: Provide a simplified interface to complex subsystems and enforce a single entry point for mutations.

from py3plex.core import multinet

# Complex underlying operations hidden behind a simple interface
# (edges is a list of edges prepared by the caller)
network = multinet.multi_layer_network()
network.add_edges(edges, input_type='list')  # Handles parsing, encoding, validation
stats = network.basic_stats()  # Aggregates multiple NetworkX calls behind one call

Strategy Pattern

Used in: Algorithms, layout computation

Purpose: Interchangeable algorithms following a common interface; callers pick by name without changing call sites.

# Different community detection strategies
def detect_communities(network, method='louvain'):
    strategies = {
        'louvain': community_louvain.best_partition,
        'infomap': community_wrapper.infomap_communities,
        'label_prop': label_propagation.propagate
    }
    return strategies[method](network.core_network)  # Adding a new strategy means adding one entry to the map
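One possible refinement of the dispatch map is a registry that strategies join via a decorator, with a clear error for unknown names. The sketch below is self-contained and uses two toy strategies as placeholders; STRATEGIES, register, and both strategy functions are hypothetical and do not perform real community detection.

```python
STRATEGIES = {}


def register(name):
    """Decorator that adds a callable to the strategy registry under `name`."""
    def decorator(func):
        STRATEGIES[name] = func
        return func
    return decorator


@register("singletons")
def singletons(nodes, edges):
    """Toy strategy: every node forms its own community."""
    return {node: i for i, node in enumerate(nodes)}


@register("single_block")
def single_block(nodes, edges):
    """Toy strategy: all nodes share community 0."""
    return {node: 0 for node in nodes}


def detect_communities(nodes, edges, method="singletons"):
    """Look up a strategy by name and apply it; the call site never changes."""
    try:
        strategy = STRATEGIES[method]
    except KeyError:
        known = ", ".join(sorted(STRATEGIES))
        raise ValueError(f"Unknown method {method!r}; expected one of: {known}") from None
    return strategy(nodes, edges)


result = detect_communities(["a", "b", "c"], [("a", "b")], method="single_block")
print(result)
```

Raising ValueError with the list of known names gives callers a more useful message than the bare KeyError a plain dictionary lookup would produce.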

Template Method Pattern

Used in: Visualization, layout algorithms

Purpose: Define algorithm skeleton, allow customization in subclasses; shared steps live in the base class.

class LayoutAlgorithm:
    def compute(self, graph):
        self.initialize(graph)
        self.iterate()
        return self.finalize()

    def initialize(self, graph):
        raise NotImplementedError

    def iterate(self):
        raise NotImplementedError

    def finalize(self):
        raise NotImplementedError
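To show how a subclass fills in the skeleton, here is a minimal sketch with a hypothetical CircularLayout. The base class is repeated so the example runs on its own; the input is any iterable of nodes (iterating over a NetworkX graph also yields its nodes).

```python
import math


class LayoutAlgorithm:
    """Base class: compute() fixes the order of steps; subclasses supply them."""

    def compute(self, graph):
        self.initialize(graph)
        self.iterate()
        return self.finalize()

    def initialize(self, graph):
        raise NotImplementedError

    def iterate(self):
        raise NotImplementedError

    def finalize(self):
        raise NotImplementedError


class CircularLayout(LayoutAlgorithm):
    """Hypothetical subclass: places nodes evenly on the unit circle."""

    def initialize(self, graph):
        self.nodes = list(graph)
        self.positions = {}

    def iterate(self):
        n = len(self.nodes)
        for i, node in enumerate(self.nodes):
            angle = 2 * math.pi * i / n
            self.positions[node] = (math.cos(angle), math.sin(angle))

    def finalize(self):
        return self.positions


pos = CircularLayout().compute(["a", "b", "c", "d"])
print(pos["a"])
```

Callers only ever invoke compute(), so the order of the three steps stays fixed no matter which subclass is used.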

Dependency Injection

Used in: Configuration, algorithm parameters

Purpose: Inject dependencies rather than hard-coding so testing and theming stay configurable.

# Configuration injected rather than hard-coded
from py3plex.config import DEFAULT_COLORS, LAYOUT_PARAMS

def draw_network(network, colors=None, layout_params=None):
    colors = colors or DEFAULT_COLORS
    layout_params = layout_params or LAYOUT_PARAMS
    # Use injected configuration without touching global state

Data Flow

Typical Workflow

py3plex workflows follow a predictable sequence: load → compute → analyze → visualize → export. Each step uses the layer beneath it and should avoid mutating lower layers unless explicitly intended.

Input: Load or create network, keeping encoding consistent

network = multinet.multi_layer_network()
network.load_network("data.graphml", input_type="graphml")

Processing: Apply algorithms against the core graph without re-encoding

communities = community_louvain.best_partition(network.core_network)
centrality = calc.multilayer_degree_centrality(network)

Analysis: Compute statistics on the derived results

density = mls.layer_density(network, 'layer1')
correlation = mls.inter_layer_degree_correlation(network, 'layer1', 'layer2')

Visualization: Render results; visualization functions treat their inputs as read-only

draw_multilayer_default([network], display=True)

Output: Export results using the same delimiter and layer naming

network.save_network("output.graphml", output_type="graphml")

State Management

Read-only Operations: Most algorithms do not modify the network or its caches

# These don't modify the network
centrality = calc.multilayer_degree_centrality(network)
communities = community_louvain.best_partition(network.core_network)

Mutating Operations: Some operations modify network state (nodes, edges, layer mappings, caches)

# These modify the network
network.add_edges(new_edges, input_type='list')
network.aggregate_layers(['L1', 'L2'], 'combined')

After running mutable operations, clear or recompute cached matrices/embeddings before downstream analysis. If you need isolation, copy the network or work on a subgraph before running destructive operations.
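The isolation advice can be illustrated with a deep copy. ToyNetwork below is a deliberately tiny stand-in, not the py3plex class; whether copy.deepcopy is the most efficient choice for a real network depends on its size.

```python
import copy


class ToyNetwork:
    """Minimal stand-in for a network object; not the py3plex class."""

    def __init__(self):
        self.edges = set()

    def add_edge(self, source, target):
        self.edges.add((source, target))


original = ToyNetwork()
original.add_edge(("a", "L1"), ("b", "L1"))

# Work on a deep copy so destructive steps cannot affect the original.
scratch = copy.deepcopy(original)
scratch.add_edge(("a", "L1"), ("a", "L2"))

print(len(original.edges), len(scratch.edges))
```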

Extension Points

Custom Algorithms

Add new algorithms by following existing patterns (pure functions returning dictionaries or NetworkX objects). Keep inputs typed as multi_layer_network to reuse encoding and caching, and avoid mutating the passed network unless the function is explicitly transformative. Document input assumptions (directed vs. undirected, weighted vs. unweighted) and expose new callables in the relevant package __init__ when you want them importable by name.

# py3plex/algorithms/my_module/my_algorithm.py
from py3plex.core.multinet import multi_layer_network

def my_centrality(network: multi_layer_network) -> dict:
    """
    Custom centrality measure.

    Parameters
    ----------
    network : multi_layer_network
        Input network

    Returns
    -------
    dict
        Node centrality scores
    """
    G = network.core_network
    centrality = {}

    for node in G.nodes():
        # Implement custom logic
        centrality[node] = compute_score(node, G)

    return centrality
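The template above leaves compute_score undefined. As one concrete possibility, the sketch below scores each node ID by the fraction of layers it appears in. It works on plain (node_id, layer) tuples so it runs without py3plex; layer_participation is a hypothetical name, and with a real network the tuples would presumably come from network.core_network.nodes().

```python
from collections import defaultdict


def layer_participation(nodes):
    """Map each node_id to the fraction of all layers in which it appears."""
    layers_by_node = defaultdict(set)
    all_layers = set()
    for node_id, layer in nodes:
        layers_by_node[node_id].add(layer)
        all_layers.add(layer)
    if not all_layers:
        return {}
    total = len(all_layers)
    return {node_id: len(layers) / total for node_id, layers in layers_by_node.items()}


nodes = [("a", "L1"), ("a", "L2"), ("b", "L1"), ("c", "L2")]
scores = layer_participation(nodes)
print(scores)
```

The function is pure: it reads the node list and returns a new dictionary, which matches the guidance to avoid mutating the passed network.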

Custom Visualizations

Create custom plots using drawing machinery. Treat the network as read-only and reuse shared layout helpers to keep visuals consistent with built-in plots:

from py3plex.visualization import drawing_machinery as dm
import matplotlib.pyplot as plt

def my_custom_plot(network):
    """Custom visualization."""
    fig, ax = plt.subplots(figsize=(10, 8))

    # Compute layout
    pos = dm.compute_layout(network.core_network, 'force')

    # Draw elements
    dm.draw_nodes(ax, network.core_network, pos, node_size=50)
    dm.draw_edges(ax, network.core_network, pos, edge_width=1)
    dm.draw_labels(ax, pos, labels=network.get_node_labels())

    plt.show()

Custom Parsers

Add support for new file formats. Normalize layer names, respect label_delimiter, and raise the appropriate domain exceptions so callers can distinguish parsing failures from missing data:

# py3plex/core/parsers.py
def parse_my_format(input_file, **kwargs):
    """
    Parse custom file format.

    Parameters
    ----------
    input_file : str
        Path to input file

    Returns
    -------
    multi_layer_network
        Parsed network
    """
    network = multi_layer_network()

    with open(input_file, 'r') as f:
        for line in f:
            # Parse line and add to network
            pass

    return network
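The parsing loop in the template is left empty. The sketch below fills it in for a hypothetical whitespace-separated format ("src src_layer dst dst_layer [weight]"), returning edge tuples instead of a network so it runs standalone. InvalidFormatError is defined locally as a stand-in for the domain exception, and parse_edge_lines is an assumed name.

```python
import io


class InvalidFormatError(Exception):
    """Local stand-in for the domain exception described in this document."""


def parse_edge_lines(lines):
    """Parse 'src src_layer dst dst_layer [weight]' lines into edge tuples.

    The format is hypothetical and chosen for illustration only.
    """
    edges = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) not in (4, 5):
            raise InvalidFormatError(
                f"line {lineno}: expected 4 or 5 fields, got {len(fields)}"
            )
        src, src_layer, dst, dst_layer = fields[:4]
        try:
            weight = float(fields[4]) if len(fields) == 5 else 1.0
        except ValueError:
            raise InvalidFormatError(
                f"line {lineno}: invalid weight {fields[4]!r}"
            ) from None
        edges.append(((src, src_layer), (dst, dst_layer), weight))
    return edges


sample = io.StringIO("# comment\na L1 b L1\na L1 a L2 0.5\n")
edges = parse_edge_lines(sample)
print(edges)
```

Including the line number in the error message lets callers report exactly where a malformed file went wrong.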

Configuration System

Centralized Configuration

py3plex/config.py provides centralized configuration:

# Default color palettes (8 options including colorblind-safe)
DEFAULT_COLORS = 'Set1'
COLORBLIND_SAFE = 'colorblind'

# Visualization defaults
DEFAULT_NODE_SIZE = 20
DEFAULT_EDGE_WIDTH = 1.0
DEFAULT_ALPHA = 0.7

# Layout parameters
LAYOUT_PARAMS = {
    'force': {'iterations': 500, 'optimal_distance': 1.0},
    'fa2': {'iterations': 1000, 'gravity': 1.0}
}

# Performance settings
SPARSE_THRESHOLD = 1000  # Use sparse matrices above this node count
MEMORY_WARNING_THRESHOLD = 10000  # Warn for large dense matrices

Usage:

from py3plex.config import DEFAULT_COLORS, LAYOUT_PARAMS

colors = DEFAULT_COLORS
iterations = LAYOUT_PARAMS['force']['iterations']

Prefer importing needed values rather than mutating module globals. For per-run overrides, pass parameters explicitly into drawing or layout helpers.
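A per-run override can be applied without touching module globals by merging into a copy. In the sketch below, LAYOUT_PARAMS is redeclared locally so the example is self-contained, and resolve_layout_params is a hypothetical helper, not part of py3plex.

```python
# Local stand-in for the module-level defaults shown above.
LAYOUT_PARAMS = {
    "force": {"iterations": 500, "optimal_distance": 1.0},
    "fa2": {"iterations": 1000, "gravity": 1.0},
}


def resolve_layout_params(layout, overrides=None, defaults=LAYOUT_PARAMS):
    """Return a new dict: defaults for `layout` with per-run overrides applied."""
    if layout not in defaults:
        raise KeyError(f"Unknown layout {layout!r}")
    params = dict(defaults[layout])  # shallow copy; the values are plain numbers
    params.update(overrides or {})
    return params


params = resolve_layout_params("force", {"iterations": 50})
print(params)
print(LAYOUT_PARAMS["force"]["iterations"])  # defaults are left untouched
```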

Testing Architecture

Test Organization

Tests mirror the architecture: core primitives first, then algorithms and workflows.

tests/
├── test_core_functionality.py      # Core data structure tests
├── test_multilayer_*.py            # Multilayer algorithm tests
├── test_random_walks.py            # Random walk tests
├── test_io_*.py                    # I/O and parsing tests
├── test_config_api.py              # Configuration tests
└── test_utils.py                   # Utility function tests

Test Patterns

Unit Tests: Test individual functions in isolation

def test_layer_density():
    network = create_test_network()
    density = mls.layer_density(network, 'layer1')
    assert 0 <= density <= 1

Integration Tests: Test workflows across modules

def test_community_detection_workflow():
    network = load_network("test_data.graphml")
    communities = community_louvain.best_partition(network.core_network)
    assert len(communities) > 0

Property-Based Tests: Test invariants

def test_centrality_normalization():
    network = create_random_network()
    centrality = calc.multilayer_degree_centrality(network)
    # Centrality values should not be negative; normalized variants stay within [0, 1]
    assert all(v >= 0 for v in centrality.values())

Performance Considerations

Lazy Evaluation

Expensive operations are computed on-demand and cached on the instance:

class multi_layer_network:
    @property
    def supra_adjacency(self):
        if self._supra_adj_cache is None:
            self._supra_adj_cache = self._compute_supra_adjacency()
        return self._supra_adj_cache
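Lazy caching only stays correct if mutations reset the cache. The sketch below pairs the two in a toy class; CachedNetwork, its dictionary-based adjacency, and the compute_calls counter are illustrative assumptions rather than py3plex internals.

```python
class CachedNetwork:
    """Toy stand-in showing lazy computation plus invalidation on mutation."""

    def __init__(self):
        self._edges = []
        self._adjacency_cache = None
        self.compute_calls = 0  # instrumentation for this example only

    def add_edge(self, source, target):
        self._edges.append((source, target))
        self._adjacency_cache = None  # any mutation invalidates the cache

    @property
    def adjacency(self):
        if self._adjacency_cache is None:
            self._adjacency_cache = self._compute_adjacency()
        return self._adjacency_cache

    def _compute_adjacency(self):
        self.compute_calls += 1
        adjacency = {}
        for source, target in self._edges:
            adjacency.setdefault(source, set()).add(target)
            adjacency.setdefault(target, set()).add(source)
        return adjacency


net = CachedNetwork()
net.add_edge("a", "b")
first = net.adjacency
second = net.adjacency  # served from the cache, no recomputation
net.add_edge("b", "c")
third = net.adjacency  # recomputed because the mutation cleared the cache
print(net.compute_calls)
```

Keeping the reset inside the mutating method means callers cannot forget it, which is the reason to route all mutations through a single API.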

Sparse Matrices

Use sparse representations for large networks; the threshold is configurable in config.py:

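def get_supra_adjacency_matrix(self, sparse=True):
    adj = self.supra_adjacency  # assumed source: the cached property shown above
    # Networks above the threshold get a sparse matrix even when sparse=False
    if sparse or len(self.get_nodes()) > SPARSE_THRESHOLD:
        return scipy.sparse.csr_matrix(adj)
    return np.array(adj)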

Keep the chosen representation consistent downstream to avoid repeated dense↔sparse conversions.
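A rough memory estimate shows why the threshold matters. The arithmetic below assumes 8-byte float values and 4-byte integer indices for the CSR layout (data array, column indices, row pointers); both function names are hypothetical, and real memory use will vary with dtype and library overhead.

```python
def dense_bytes(n_nodes, itemsize=8):
    """Bytes for a dense n x n matrix with `itemsize`-byte entries."""
    return n_nodes * n_nodes * itemsize


def csr_bytes(n_nodes, n_nonzero, itemsize=8, index_size=4):
    """Approximate bytes for CSR storage: data + column indices + row pointers."""
    return n_nonzero * (itemsize + index_size) + (n_nodes + 1) * index_size


n, nnz = 10_000, 50_000
print(dense_bytes(n))     # 800000000 bytes, about 800 MB
print(csr_bytes(n, nnz))  # 640004 bytes, under 1 MB
```

For a supra-adjacency matrix with 10,000 node-layer pairs and 50,000 stored entries, the dense form is over a thousand times larger under these assumptions.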

Vectorization

Prefer NumPy vectorized operations to avoid Python loops:

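# Slower: nested Python iteration, one generator per node
degrees = [sum(1 for _ in G.neighbors(node)) for node in nodes]

# Faster: one degree query, then an array for vectorized downstream math.
# Note: on multigraphs G.degree() counts parallel edges, while the loop
# above counts distinct neighbors, so the two results can differ.
degrees = np.array(list(dict(G.degree()).values()))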

Logging Infrastructure

Centralized Logging

py3plex/logging_config.py provides structured logging:

import logging
from py3plex.logging_config import get_logger

logger = get_logger(__name__)

logger.info("Processing network with %d nodes", num_nodes)
logger.warning("Large network detected, using sparse matrices")
logger.error("Invalid layer: %s", layer_name)

Call get_logger once per module; configure logging once in the application entrypoint to avoid duplicate handlers.

Log Levels

DEBUG: Detailed diagnostic information
INFO: General informational messages
WARNING: Warning messages (e.g., performance concerns)
ERROR: Error messages
CRITICAL: Critical errors

Error Handling

Custom Exceptions

py3plex/exceptions.py defines domain-specific exceptions:

class NetworkError(Exception):
    """Base exception for network errors."""
    pass

class LayerNotFoundError(NetworkError):
    """Raised when layer doesn't exist."""
    pass

class InvalidFormatError(NetworkError):
    """Raised when file format is invalid."""
    pass

Usage:

from py3plex.exceptions import LayerNotFoundError

def get_layer(self, layer_name):
    if layer_name not in self.layer_name_map:
        raise LayerNotFoundError(f"Layer '{layer_name}' not found")
    return self.layer_name_map[layer_name]

Prefer raising domain-specific exceptions for recoverable errors so callers can distinguish parsing failures, missing layers, or invalid formats.
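From the caller's side, the exception hierarchy allows targeted recovery. The sketch below redefines two of the exceptions locally so it runs on its own; get_layer_id and layer_id_or_default are hypothetical helpers.

```python
class NetworkError(Exception):
    """Base exception for network errors (local stand-in)."""


class LayerNotFoundError(NetworkError):
    """Raised when a layer does not exist (local stand-in)."""


def get_layer_id(layer_name_map, layer_name):
    """Look up a layer's integer ID, raising a domain exception if it is missing."""
    if layer_name not in layer_name_map:
        raise LayerNotFoundError(f"Layer '{layer_name}' not found")
    return layer_name_map[layer_name]


def layer_id_or_default(layer_name_map, layer_name, default=-1):
    """Recover from a missing layer only; other errors still propagate."""
    try:
        return get_layer_id(layer_name_map, layer_name)
    except LayerNotFoundError:
        return default


layer_map = {"social": 0, "work": 1}
print(layer_id_or_default(layer_map, "work"))
print(layer_id_or_default(layer_map, "family"))
```

Catching the specific subclass rather than NetworkError or Exception keeps unrelated failures, such as an invalid file format, visible to the caller.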

Future Architecture

Planned Improvements

Backend Registry: Support for igraph, cugraph backends
Streaming API: Process networks larger than memory
Distributed Computing: Dask/Ray integration for large-scale analysis
Plugin System: Easy addition of third-party algorithms
Type System: Full type-hint coverage (currently 65%)

These roadmap items are exploratory; expect details and APIs to evolve.
