Benchmarking & Performance

Performance characteristics and optimization strategies for py3plex.

Network Scale Guidelines

py3plex is optimized for research-scale networks:

Network Scale Performance

| Network Size            | Performance | Visualization   | Recommendations               |
|-------------------------|-------------|-----------------|-------------------------------|
| Small (<100 nodes)      | Excellent   | Fast, detailed  | Use dense visualization mode  |
| Medium (100-1k nodes)   | Good        | Fast, balanced  | Default settings work well    |
| Large (1k-10k nodes)    | Good        | Slower, minimal | Use sparse matrices, sampling |
| Very Large (>10k nodes) | Variable    | Very slow       | Sampling required             |

Performance Tips

Use Sparse Matrices

For large networks, use sparse matrix representations:

from py3plex.core import multinet

network = multinet.multi_layer_network(sparse=True)

For typical networks this reduces memory usage by one to three orders of magnitude (see Memory Usage Profiles below).

Batch Operations

Process multiple operations together:

from py3plex.dsl import Q

# Compute multiple metrics at once
result = (
    Q.nodes()
     .compute("degree", "betweenness_centrality", "clustering")
     .execute(network)
)

Avoid repeated single-metric computations.

Use Arrow/Parquet for I/O

For large datasets:

import pyarrow.parquet as pq

# Save (write_table writes to disk and returns None)
# edges_table: a pyarrow.Table holding the edge list (see sketch below)
pq.write_table(edges_table, 'network.parquet')

# Load (much faster than CSV for large edge lists)
table = pq.read_table('network.parquet')
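edges_table in the snippet above is assumed to already exist. A minimal way to build one with pyarrow (the column names here are illustrative, not a py3plex schema):

import pyarrow as pa

# Toy multilayer edge list as columnar data
edges_table = pa.table({
    "source": ["a", "b", "c"],
    "target": ["b", "c", "a"],
    "layer":  ["ppi", "ppi", "coexpression"],
})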

Parallel Processing

For Node2Vec and other CPU-intensive algorithms:

from py3plex.wrappers import train_node2vec

embeddings = train_node2vec(
    network,
    workers=8  # Use multiple CPU cores
)
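As a rule of thumb, set workers close to the number of physical cores: both the walk-generation and gensim Word2Vec training phases scale well up to that point, with diminishing returns beyond it.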

Benchmark Results

Performance benchmarks for common operations on synthetic multilayer networks. These results provide guidance for planning analyses and optimizing workflows.

Test Environment:

  • CPU: Intel Core i7-9700K @ 3.6GHz (8 cores)

  • RAM: 32 GB DDR4

  • Python: 3.10

  • py3plex: v1.0.0

Algorithm Runtimes vs. Network Size

Community Detection (Louvain)

Louvain Algorithm Runtime

| Nodes   | Edges   | Layers | Runtime |
|---------|---------|--------|---------|
| 100     | 500     | 3      | 0.05s   |
| 1,000   | 5,000   | 3      | 0.3s    |
| 10,000  | 50,000  | 3      | 4.2s    |
| 100,000 | 500,000 | 3      | 58s     |
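To sanity-check a row of this table on your own hardware, you can time Louvain on a comparably sized synthetic graph. The sketch below uses NetworkX's built-in louvain_communities (NetworkX >= 2.8) on a single layer, not py3plex's multilayer pipeline, so expect the same order of magnitude rather than identical numbers:

import time
import networkx as nx

# Synthetic single-layer graph sized like the 1,000-node row above
G = nx.gnm_random_graph(1_000, 5_000, seed=42)

start = time.perf_counter()
communities = nx.community.louvain_communities(G, seed=42)
print(f"Louvain: {time.perf_counter() - start:.2f}s, {len(communities)} communities")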

Centrality Computation (Betweenness)

Betweenness Centrality Runtime

| Nodes   | Edges   | Layers | Runtime         |
|---------|---------|--------|-----------------|
| 100     | 500     | 3      | 0.12s           |
| 1,000   | 5,000   | 3      | 8.5s            |
| 10,000  | 50,000  | 3      | 1,240s (21 min) |
| 100,000 | 500,000 | 3      | N/A (too slow)  |

Note: exact betweenness via Brandes' algorithm costs O(nm) on unweighted graphs (effectively cubic on dense ones) - use approximation methods for large networks.
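In practice that means sampling. NetworkX, which py3plex builds on, can estimate betweenness from k randomly chosen source nodes instead of all n, for roughly an n/k speedup:

import networkx as nx

G = nx.gnm_random_graph(10_000, 50_000, seed=1)

# Exact betweenness runs Brandes' algorithm from all 10,000 nodes;
# k=256 pivots cut the work by ~40x at a modest accuracy cost.
approx_bc = nx.betweenness_centrality(G, k=256, seed=1)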

Node2Vec Embeddings

Node2Vec Runtime (128-dim, 10 walks/node)

| Nodes   | Edges   | Layers | Runtime         |
|---------|---------|--------|-----------------|
| 100     | 500     | 3      | 2.3s            |
| 1,000   | 5,000   | 3      | 18s             |
| 10,000  | 50,000  | 3      | 245s (4 min)    |
| 100,000 | 500,000 | 3      | 3,200s (53 min) |

Dynamics Simulation (SIR)

SIR Simulation Runtime (100 steps)

| Nodes   | Edges   | Layers | Runtime       |
|---------|---------|--------|---------------|
| 100     | 500     | 3      | 0.8s          |
| 1,000   | 5,000   | 3      | 4.5s          |
| 10,000  | 50,000  | 3      | 52s           |
| 100,000 | 500,000 | 3      | 680s (11 min) |
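Runtime grows roughly linearly with edge count because each step only touches the edges around currently infected nodes. A generic discrete-time SIR step (a didactic sketch, not py3plex's implementation) makes that per-step cost visible:

import random
import networkx as nx

def sir_step(G: nx.Graph, state: dict, beta: float = 0.1, gamma: float = 0.05) -> dict:
    """One synchronous SIR update; work is proportional to edges at infected nodes."""
    new_state = dict(state)
    for u in G:
        if state[u] != "I":
            continue
        for v in G[u]:                       # infection attempts along incident edges
            if state[v] == "S" and random.random() < beta:
                new_state[v] = "I"
        if random.random() < gamma:          # recovery
            new_state[u] = "R"
    return new_state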

Memory Usage Profiles

Peak Memory Usage

| Nodes   | Edges   | Dense Storage | Sparse Storage |
|---------|---------|---------------|----------------|
| 100     | 500     | 2 MB          | 0.5 MB         |
| 1,000   | 5,000   | 24 MB         | 2 MB           |
| 10,000  | 50,000  | 2.4 GB        | 18 MB          |
| 100,000 | 500,000 | 240 GB        | 180 MB         |

Key Insight: Sparse storage reduces memory by 10-1000x for typical networks.
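The dense figures follow directly from the storage formats: a dense adjacency matrix costs n² values per layer, while CSR-style sparse storage scales with the number of edges. A back-of-the-envelope check for the largest row (8-byte values, 3 layers; the table's sparse figures are higher because they include Python object and metadata overhead):

n, m, layers = 100_000, 500_000, 3

dense_bytes = layers * n * n * 8                     # full adjacency matrix per layer
sparse_bytes = layers * (m * (8 + 4) + (n + 1) * 4)  # CSR: values + column indices + row pointers

print(f"dense:  {dense_bytes / 1e9:.0f} GB")   # -> 240 GB, matching the table
print(f"sparse: {sparse_bytes / 1e6:.0f} MB")  # -> ~19 MB raw lower bound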

Comparison with Other Tools

Community Detection: py3plex vs. NetworkX

Louvain Performance Comparison (1k nodes, 5k edges, single layer)

| Tool                      | Runtime | Notes               |
|---------------------------|---------|---------------------|
| py3plex                   | 0.3s    | Multilayer-aware    |
| NetworkX + python-louvain | 0.2s    | Single-layer only   |
| graph-tool                | 0.08s   | C++ backend, faster |

Verdict: py3plex is competitive on single-layer networks and adds multilayer capability the other tools lack.

Node Embeddings: py3plex vs. node2vec

Node2Vec Performance (1k nodes, 128-dim, 10 walks)

| Tool                | Runtime | Notes              |
|---------------------|---------|--------------------|
| py3plex             | 18s     | Python wrapper     |
| node2vec (original) | 15s     | C++ implementation |
| Gensim              | 12s     | Optimized Word2Vec |

Verdict: py3plex delegates to established libraries (gensim), so performance is comparable.

Benchmarking Notes:

  • Results vary based on network structure (density, clustering, layer coupling)

  • Runtimes scale differently for different algorithms (linear, quadratic, cubic)

  • Use these benchmarks as rough guidelines, not exact predictions

  • For the most accurate estimates, run benchmarks on your specific hardware and data


Running Benchmarks

Run benchmarks yourself:

cd benchmarks
python run_benchmarks.py

See the benchmarks/ directory in the repository for benchmark scripts.

Profiling Your Code

Use Python profiling tools:

import cProfile
import pstats

# Profile your analysis
cProfile.run('your_analysis_function(network)', 'profile_stats')

# View results
stats = pstats.Stats('profile_stats')
stats.sort_stats('cumulative')
stats.print_stats(20)

Memory Profiling

pip install memory_profiler
python -m memory_profiler your_script.py
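memory_profiler only reports on functions decorated with @profile. A minimal sketch (the script and function names are illustrative):

# profile_me.py
from memory_profiler import profile

@profile
def build_edge_list(n: int):
    # Allocate a large list so the line-by-line report shows memory growth
    return [(i, i + 1) for i in range(n)]

if __name__ == "__main__":
    build_edge_list(1_000_000)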

Optimization Strategies

For Large Networks

  1. Sample the network for exploratory analysis (see the sampling sketch after this list)

  2. Use layer-specific analysis instead of full multilayer

  3. Compute metrics incrementally rather than all at once

  4. Cache intermediate results
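A minimal node-sampling sketch, assuming you are working with the underlying NetworkX graph (adapt the selection step if you need to keep layers or node types balanced):

import random
import networkx as nx

def sample_subgraph(G: nx.Graph, k: int, seed: int = 42) -> nx.Graph:
    """Induced subgraph on k uniformly sampled nodes, for quick exploratory runs."""
    rng = random.Random(seed)
    nodes = rng.sample(list(G.nodes()), k)
    return G.subgraph(nodes).copy()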

For Repeated Analysis

  1. Precompute and save expensive metrics (see the caching sketch after this list)

  2. Use config-driven workflows for reproducibility

  3. Batch process multiple networks
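For the first point, a small disk cache avoids recomputing expensive metrics across runs (a generic sketch; compute_degrees is a placeholder for your own metric function):

import json
from pathlib import Path

def cached(path: str, compute):
    """Return the result stored at path, computing and saving it on a cache miss."""
    p = Path(path)
    if p.exists():
        return json.loads(p.read_text())
    result = compute()
    p.parent.mkdir(parents=True, exist_ok=True)
    p.write_text(json.dumps(result))
    return result

# degrees = cached("cache/degrees.json", lambda: compute_degrees(network))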

For Production

  1. Use Docker containers for consistent environments

  2. Implement monitoring for long-running jobs

  3. Add checkpointing for crash recovery (see the sketch below)
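For checkpointing, writing to a temporary file and renaming it over the target keeps a crash mid-write from corrupting the last good checkpoint (a generic sketch using pickle):

import pickle
from pathlib import Path

def save_checkpoint(obj, path: str) -> None:
    """Dump to a temp file first, then rename over the target on success."""
    tmp = Path(path).with_suffix(".tmp")
    with open(tmp, "wb") as f:
        pickle.dump(obj, f)
    tmp.replace(path)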

See Docker Usage Guide for deployment best practices.

Hardware Recommendations

Minimum:

  • 4 GB RAM

  • 2 CPU cores

  • Small networks (<1k nodes)

Recommended:

  • 16 GB RAM

  • 8 CPU cores

  • Networks up to 10k nodes

High-Performance:

  • 64+ GB RAM

  • 16+ CPU cores

  • Large networks (>10k nodes)

Next Steps