Community Detection
Finding groups that span multiple layers of interaction.
DSL Tip: Filter by Communities
After detecting communities, use DSL to analyze them:
from py3plex.core import multinet
from py3plex.dsl import Q, L
from py3plex.algorithms.community_detection import community_louvain
# Step 1: Detect communities
communities = community_louvain.best_partition(network.core_network)
# Step 2: Store community IDs as node attributes
for node, comm_id in communities.items():
network.core_network.nodes[node]['community'] = comm_id
# Step 3: Find high-degree nodes in specific community
result = (
Q.nodes()
.where(community=0) # Filter by community ID
.compute("degree", "betweenness_centrality")
.order_by("-degree")
.limit(10)
.execute(network)
)
# Step 4: Export for further analysis
df = result.to_pandas()
df.to_csv("community_hubs.csv", index=False)
# Example output:
# id degree betweenness_centrality
# (Alice, social) 3 0.667
# (Bob, social) 2 0.000
# (Charlie, social) 2 0.000
Combine traditional algorithms with DSL queries for powerful workflows!
Networks are rarely homogeneous. People cluster into social groups. Proteins form functional modules. Cities organize into regional hubs. Community detection finds these natural groupings—but in multilayer networks, the question is more subtle: do communities exist within layers, across layers, or both?
This chapter shows you how to detect communities in multilayer networks, how to tune algorithm parameters for your specific domain, and how to interpret results that may differ from single-layer community detection.
Overview
Community detection identifies groups of nodes that are more densely connected to each other than to the rest of the network. For multilayer networks, communities can span multiple layers, accounting for both intra-layer and inter-layer structure.
Key insight: A person who is moderately connected across multiple social platforms (layers) may be more central to a cross-platform community than someone who is highly connected on just one platform. Multilayer community detection captures this.
Supported Algorithms
py3plex provides several community detection algorithms:
Louvain — Fast modularity optimization (recommended for most use cases)
Leiden — Improved modularity optimization with better guarantees
Infomap — Flow-based community detection (requires external binary)
Label Propagation — Semi-supervised approach with known seed communities
Multilayer Modularity — True multilayer community detection (Mucha et al. 2010)
SBM (Stochastic Block Model) — Model-based approach with automatic model selection
DC-SBM (Degree-Corrected SBM) — Default SBM variant that accounts for degree heterogeneity
Model-Based vs. Objective-Based Algorithms
Objective-based algorithms (Louvain, Leiden, Infomap) optimize a quality function like modularity or code length:
Pros: Fast, scalable, work well on most networks
Cons: Number of communities not specified a priori, can be sensitive to resolution
Model-based algorithms (SBM, DC-SBM) fit a generative model to the network:
Pros: Principled model selection (automatic K), uncertainty quantification, handles degree heterogeneity (DC-SBM)
Cons: Computationally more expensive, requires careful parameter tuning
When to use: When you need to select the number of communities automatically, when degree distributions vary widely, or when you need rigorous statistical inference
Rule of thumb: Use Louvain/Leiden for exploratory analysis and large networks. Use DC-SBM when you need automatic model selection or degree-corrected communities.
Louvain Algorithm
Basic Usage
Fastest algorithm for large networks:
from py3plex.core import multinet
from py3plex.algorithms.community_detection import community_louvain
# Create or load network
network = multinet.multi_layer_network()
network.load_network("data.graphml", input_type="graphml")
# Detect communities using Louvain
communities = community_louvain.best_partition(network.core_network)
# Print results
for node, community_id in communities.items():
print(f"Node {node} -> Community {community_id}")
Parameters
# With custom resolution parameter
communities = community_louvain.best_partition(
network.core_network,
resolution=1.0 # Higher = more communities, Lower = fewer communities
)
Resolution parameter:
resolution=1.0- Standard modularityresolution>1.0- More, smaller communitiesresolution<1.0- Fewer, larger communities
Advantages
Very fast: \(O(n \log n)\)
Scales to millions of nodes
BSD license (commercial-friendly)
Well-established and widely used
Disadvantages
Non-deterministic (random initialization)
Cannot find overlapping communities
Resolution limit issues
Infomap Algorithm
Basic Usage
Flow-based approach for detecting communities:
from py3plex.algorithms.community_detection import community_wrapper
# Detect communities using Infomap
communities = community_wrapper.infomap_communities(
network.core_network,
binary_path="/path/to/infomap" # Path to Infomap binary
)
With Hierarchical Structure
# Get hierarchical community structure
hierarchical_communities = community_wrapper.infomap_communities(
network.core_network,
binary_path="/path/to/infomap",
hierarchical=True
)
Advantages
Can detect overlapping communities
Flow-based (natural for many applications)
Hierarchical structure
Information-theoretic foundation
Disadvantages
Requires external binary
AGPLv3 license (viral copyleft - problematic for commercial use)
Slower than Louvain
Label Propagation
Semi-Supervised Detection
Use when you have some known community memberships:
from py3plex.algorithms.community_detection import label_propagation
# Provide seed labels for some nodes
seed_labels = {
'node1': 0,
'node2': 0,
'node3': 1,
'node4': 1
}
# Propagate labels to unlabeled nodes
communities = label_propagation.propagate(
network.core_network,
seed_labels=seed_labels,
max_iter=100
)
Fully Unsupervised
# Without seed labels (random initialization)
communities = label_propagation.propagate(
network.core_network,
max_iter=100
)
Advantages
Very fast: \(O(m)\) linear in edges
Can incorporate prior knowledge
MIT license
Simple and interpretable
Disadvantages
Non-deterministic
Sensitive to initialization
May not converge
Lower quality than Louvain/Infomap
Stochastic Block Model (SBM)
Model-Based Community Detection
Stochastic Block Model (SBM) is a generative model for community structure. Unlike objective-based algorithms (Louvain, Leiden), SBM explicitly models the probability of edges based on community memberships.
Standard SBM:
Degree-Corrected SBM (DC-SBM) — Recommended default:
DC-SBM accounts for degree heterogeneity (nodes with widely varying degrees), making it more realistic for many real-world networks.
Why Use SBM?
Use SBM when you need:
Automatic model selection — Selects number of communities (K) via MDL/BIC
Degree-corrected communities — Handles hubs and low-degree nodes appropriately
Statistical inference — Principled likelihood-based approach
Uncertainty quantification — Natural support for bootstrap and perturbation UQ
Don’t use SBM when:
Network is very large (>10K nodes) — Use Louvain/Leiden instead
You already know the number of communities
Speed is critical — SBM is slower than modularity-based methods
Basic Usage (with AutoCommunity)
Recommended: Use via AutoCommunity runner for automatic budget management:
from py3plex.algorithms.community_detection import AutoCommunity
from py3plex.algorithms.community_detection.budget import BudgetSpec
from py3plex.algorithms.community_detection.runner import run_community_algorithm
from py3plex.core import multinet
# Load network
net = multinet.multi_layer_network(directed=False)
net.load_network("network.csv", input_type="edgelist")
# Run DC-SBM with AutoCommunity
result = (
AutoCommunity()
.candidates("louvain", "dc_sbm")
.metrics("modularity", "sbm_log_likelihood")
.seed(42)
.execute(net)
)
print(f"Selected: {result.selected}")
print(f"Communities: {result.community_stats.n_communities}")
Direct Usage
For advanced users: Call runner directly with explicit budget:
from py3plex.algorithms.community_detection.runner import run_community_algorithm
from py3plex.algorithms.community_detection.budget import BudgetSpec
# Define budget
budget = BudgetSpec(
max_iter=100, # EM iterations
n_restarts=5, # Random initializations
uq_samples=None # No UQ (single run)
)
# Run DC-SBM
result = run_community_algorithm(
algorithm_id="dc_sbm",
network=net,
budget=budget,
seed=42,
K_range=[2, 3, 4, 5, 6, 7, 8] # Model selection range
)
# Access results
partition = result.partition
K_selected = result.meta["K_selected"]
log_likelihood = result.meta["log_likelihood"]
mdl = result.meta["mdl"] # Lower is better
print(f"Selected K={K_selected} with MDL={mdl:.2f}")
Model Selection
SBM automatically selects the number of communities (K) using Minimum Description Length (MDL):
# Specify K range for model selection
result = run_community_algorithm(
algorithm_id="dc_sbm",
network=net,
budget=BudgetSpec(max_iter=100, n_restarts=3),
seed=42,
K_range=[2, 3, 4, 5, 6, 7, 8] # Try these K values
)
# Best K is selected automatically
print(f"Selected K: {result.meta['K_selected']}")
print(f"MDL score: {result.meta['mdl']:.2f}") # Lower = better
Model selection criteria:
ELBO (Evidence Lower Bound) — Maximize (higher is better)
MDL/BIC (Minimum Description Length) — Minimize (lower is better)
ICL (Integrated Classification Likelihood) — Minimize, penalizes fuzzy assignments
Uncertainty Quantification
Enable UQ via uq_samples in budget:
# Run with uncertainty quantification
budget_uq = BudgetSpec(
max_iter=50,
n_restarts=2,
uq_samples=20 # 20 bootstrap samples
)
result = run_community_algorithm(
algorithm_id="dc_sbm",
network=net,
budget=budget_uq,
seed=42,
K_range=[2, 3, 4, 5]
)
# Access uncertainty metrics
ll_mean = result.meta["log_likelihood"]
ll_std = result.meta["log_likelihood_std"]
print(f"Log-likelihood: {ll_mean:.2f} ± {ll_std:.2f}")
Multilayer Networks
DC-SBM supports shared-membership multilayer networks where nodes have one community assignment across all layers:
# Create multilayer network
net = multinet.multi_layer_network(directed=False)
# Add edges in multiple layers
net.add_edges([
{'source': 'A', 'target': 'B', 'source_type': 'social', 'target_type': 'social'},
{'source': 'A', 'target': 'C', 'source_type': 'work', 'target_type': 'work'},
])
# Run DC-SBM (automatically handles multilayer)
result = run_community_algorithm(
algorithm_id="dc_sbm",
network=net,
budget=BudgetSpec(max_iter=100, n_restarts=3),
seed=42,
K_range=[2, 3, 4]
)
# Partition is across all layers
print(result.partition)
Note: All layers must be node-aligned (same set of nodes across layers).
Successive Halving with SBM
SBM works with Successive Halving for efficient model selection:
result = (
AutoCommunity()
.candidates("louvain", "leiden", "dc_sbm")
.metrics("modularity", "sbm_log_likelihood")
.strategy(
"successive_halving",
eta=3,
budget0={"max_iter": 10, "n_restarts": 1},
utility_method="mean_minus_std"
)
.seed(42)
.execute(net)
)
Budget scaling for SBM across Successive Halving rounds:
Round 0:
max_iter=10, n_restarts=1, K_range=[2,3,4]Round 1:
max_iter=30, n_restarts=3, K_range=[2,3,4,5,6]Round 2:
max_iter=90, n_restarts=9, K_range=[2,3,4,5,6,7,8]
Advantages
Automatic K selection via MDL
Degree-corrected (handles heterogeneous degrees)
Principled statistical model
Uncertainty quantification
Multilayer support
Deterministic (given seed)
Disadvantages
Slower than modularity-based methods
Requires node-aligned multilayer networks
More parameters to tune (max_iter, n_restarts)
Not suitable for very large networks (>10K nodes)
Performance Recommendations
For small networks (<100 nodes):
BudgetSpec(max_iter=200, n_restarts=10)
For medium networks (100-1000 nodes):
BudgetSpec(max_iter=100, n_restarts=5)
For large networks (1000-10000 nodes):
BudgetSpec(max_iter=50, n_restarts=2)
For very large networks (>10000 nodes):
Use Louvain/Leiden instead, or use SBM with small K_range and conservative budgets.
SBM Metrics
SBM provides specialized metrics for AutoCommunity:
sbm_log_likelihood— Higher is better (maximize)sbm_mdl— Lower is better (minimize), MDL/BIC criterionsbm_n_blocks— Number of blocks selected
Use these in AutoCommunity for model comparison:
result = (
AutoCommunity()
.candidates("dc_sbm", "louvain")
.metrics("modularity", "sbm_log_likelihood", "sbm_mdl")
.seed(42)
.execute(net)
)
Note: SBM metrics only apply to SBM/DC-SBM algorithms. Other algorithms will have None values for these metrics.
Multilayer Modularity
True Multilayer Detection
Accounts for multilayer structure following Mucha et al. (2010):
from py3plex.algorithms.community_detection import multilayer_modularity as mlm
# Get supra-adjacency matrix
supra_adj = network.get_supra_adjacency_matrix(sparse=True)
# Detect communities with multilayer modularity
communities = mlm.multilayer_louvain(
supra_adj,
gamma=1.0, # Resolution parameter
omega=1.0 # Inter-layer coupling strength
)
Parameter Tuning
# Emphasize layer-specific structure
communities = mlm.multilayer_louvain(
supra_adj,
gamma=1.0,
omega=0.1 # Low coupling = layer-specific communities
)
# Emphasize cross-layer structure
communities = mlm.multilayer_louvain(
supra_adj,
gamma=1.0,
omega=10.0 # High coupling = cross-layer communities
)
Mathematical Formulation
Multilayer modularity is defined as:
Where:
\(A_{ij}^{[\alpha]}\) is the adjacency matrix of layer \(\alpha\)
\(P_{ij}^{[\alpha]}\) is the null model (e.g., configuration model)
\(\gamma_{\alpha}\) is the resolution parameter for layer \(\alpha\)
\(\omega_{\alpha\beta}\) is the coupling strength between layers
\(\delta(g_{i}^{[\alpha]}, g_{j}^{[\beta]})\) is 1 if nodes are in the same community, 0 otherwise
Advantages
Accounts for multilayer structure
Implements state-of-the-art algorithm
Configurable inter-layer coupling
Published in Science (Mucha et al. 2010)
Disadvantages
More computationally expensive
Requires parameter tuning
May not scale to very large networks (>100k nodes)
Evaluating Community Quality
Modularity Score
import networkx as nx
# Compute modularity
modularity = nx.community.modularity(network.core_network, communities)
print(f"Modularity: {modularity:.3f}")
Interpretation:
\(Q > 0.3\): Good community structure
\(Q > 0.5\): Strong community structure
\(Q < 0.3\): Weak or no community structure
Coverage and Performance
# Coverage: fraction of edges within communities
coverage = nx.community.coverage(network.core_network, communities)
# Performance: fraction of correctly classified node pairs
performance = nx.community.performance(network.core_network, communities)
print(f"Coverage: {coverage:.3f}")
print(f"Performance: {performance:.3f}")
Visualizing Communities
Color by Community
from py3plex.visualization.multilayer import hairball_plot
import matplotlib.pyplot as plt
# Map communities to colors
node_colors = [communities.get(node, 0) for node in network.core_network.nodes()]
# Visualize with community colors
hairball_plot(
network.core_network,
node_color=node_colors,
layout_algorithm='force',
cmap='tab10'
)
plt.show()
Community Size Distribution
from collections import Counter
import matplotlib.pyplot as plt
# Count community sizes
community_sizes = Counter(communities.values())
sizes = list(community_sizes.values())
# Plot distribution
plt.hist(sizes, bins=20)
plt.xlabel('Community Size')
plt.ylabel('Frequency')
plt.title('Community Size Distribution')
plt.show()
Understanding Single-Layer vs. Multilayer Community Detection
Before diving into algorithm specifics, it’s important to understand the conceptual differences between approaches.
Single-Layer Community Detection
Traditional community detection finds groups in a single graph. Applied to a multilayer network, you have two options:
Flatten and detect: Aggregate all layers into one graph, then find communities. This loses layer information.
Detect per layer: Find communities independently in each layer. This ignores cross-layer structure.
Neither captures the full multilayer picture.
Multilayer Community Detection
Multilayer algorithms find communities that are consistent across layers while respecting layer-specific structure. They ask: “Which nodes cluster together across multiple contexts?”
Key insight: A node that is moderately connected in many layers may be more “community-central” than a node highly connected in just one layer.
Overlapping vs. Non-Overlapping Communities
Non-overlapping: Each node belongs to exactly one community. Algorithms like Louvain and Leiden produce non-overlapping partitions.
Overlapping: Nodes can belong to multiple communities. Algorithms like NoRC and clique percolation find overlapping structure.
When to use overlapping: When nodes naturally belong to multiple groups (e.g., a person in both a work community and a hobby community).
Flow-Based vs. Modularity-Based Views
Modularity-based (Louvain, Leiden): Optimize a quality function that compares edge density within communities to expected density. Fast, widely used, but has resolution limit issues.
Flow-based (Infomap): Model random walks on the network and find community structure that minimizes description length of those walks. Theoretically grounded, finds hierarchical structure, but slower.
When to use which:
Use modularity-based for speed and when you don’t need hierarchical structure
Use flow-based when you care about information flow or want to find nested communities
Parameter Tuning Cookbook
Tuning Resolution (Gamma)
The resolution parameter γ controls community size:
from py3plex.algorithms.community_detection.multilayer_modularity import (
louvain_multilayer
)
# Experiment with different resolution values
for gamma in [0.5, 1.0, 1.5, 2.0]:
partition = louvain_multilayer(network, gamma=gamma, omega=1.0, random_state=42)
num_comms = len(set(partition.values()))
print(f"gamma={gamma}: {num_comms} communities")
Interpretation guide:
Very few communities (2-5) when you expect more: γ is too low → increase γ
Many singleton communities: γ is too high → decrease γ
One giant community + many tiny ones: Resolution limit problem → try γ > 1 or use Leiden
Recommended starting procedure:
Start with γ=1.0 (standard modularity)
Look at community size distribution
If too coarse, try γ=1.5, 2.0
If too fine, try γ=0.5, 0.25
Tuning Inter-Layer Coupling (Omega)
The coupling parameter ω controls how much layers influence each other:
# Experiment with different coupling values
for omega in [0.1, 0.5, 1.0, 2.0, 5.0]:
partition = louvain_multilayer(network, gamma=1.0, omega=omega, random_state=42)
num_comms = len(set(partition.values()))
# Check cross-layer consistency
# (how often does the same node get same community across layers?)
# ... (compute consistency metric)
print(f"omega={omega}: {num_comms} communities")
Interpretation guide:
ω = 0: Layers are independent (equivalent to detecting per-layer, then combining)
ω = 1: Balanced coupling (default, usually good)
ω > 1: Strong coupling (forces cross-layer consistency)
ω → ∞: All layers must have identical community structure
Domain-specific guidance:
Multiplex social networks: Start with ω=1.0 (people are the same across platforms)
Temporal networks: ω=0.5 to 1.0 (communities can evolve but not too fast)
Heterogeneous networks: ω=0.1 to 0.5 (different node types may have different community structure)
Diagnosing Bad Partitions
Problem: All nodes in one community
if len(set(partition.values())) == 1:
print("All nodes in single community - try increasing gamma")
Causes: Network is too dense, γ too low, or network genuinely has no community structure.
Problem: Each node is its own community
if len(set(partition.values())) == len(partition):
print("All singletons - try decreasing gamma or increasing omega")
Causes: Network is too sparse, γ too high, ω too low.
Problem: Communities don’t match domain expectations
Actions:
Visualize communities and examine specific nodes
Check if high-degree nodes are correctly assigned
Verify that known groups (e.g., departments) are recovered
Consider using ground-truth labels for NMI comparison
Mini Case Studies
Case Study 1: Biological Network Communities
Scenario: A protein-protein interaction network with 3 layers representing different experimental evidence types (yeast two-hybrid, co-immunoprecipitation, affinity purification).
Goal: Find functional modules (groups of proteins with shared biological function).
Approach:
from py3plex.core import multinet
from py3plex.algorithms.community_detection.multilayer_modularity import (
louvain_multilayer
)
# Load network
network = multinet.multi_layer_network().load_network(
"ppi_multilayer.txt", input_type="multiedgelist", directed=False
)
# Use moderate coupling - different evidence types should
# contribute to same modules, but we don't require perfect consistency
partition = louvain_multilayer(
network,
gamma=1.0, # Standard resolution
omega=0.5, # Moderate coupling
random_state=42
)
# Validate: Do communities correspond to GO biological process terms?
# Compare community assignments to known functional annotations
Expected outcome: Communities should correspond to functional modules like “cell cycle,” “DNA repair,” “metabolic pathways.” Proteins appearing in multiple layers with high connectivity should be community hubs.
Case Study 2: Transportation Network Communities
Scenario: A multi-modal transportation network with layers for metro, bus, and bike-share in a city.
Goal: Find “travel basins”—regions where people travel together within a mode and switch between modes at hubs.
Approach:
# Load network
network = multinet.multi_layer_network().load_network(
"transport_network.txt", input_type="multiedgelist", directed=False
)
# Higher coupling - the same station serves multiple modes
partition = louvain_multilayer(
network,
gamma=1.2, # Slightly higher to find smaller regions
omega=1.5, # Strong coupling at multimodal hubs
random_state=42
)
# Validate: Do communities correspond to geographic regions?
# Are major transfer stations correctly identified as community boundaries?
Expected outcome: Communities should correspond to neighborhoods or districts. Multimodal hubs (stations serving metro + bus + bike) should appear at community boundaries or as bridges between communities.
Comparing Algorithms
Run Multiple Algorithms
# Run different algorithms
louvain_comms = community_louvain.best_partition(network.core_network)
label_prop_comms = label_propagation.propagate(network.core_network)
# Compare number of communities
print(f"Louvain: {len(set(louvain_comms.values()))} communities")
print(f"Label Prop: {len(set(label_prop_comms.values()))} communities")
# Compare modularity
louvain_mod = nx.community.modularity(network.core_network,
[set(n for n, c in louvain_comms.items() if c == i)
for i in set(louvain_comms.values())])
label_mod = nx.community.modularity(network.core_network,
[set(n for n, c in label_prop_comms.items() if c == i)
for i in set(label_prop_comms.values())])
print(f"Louvain modularity: {louvain_mod:.3f}")
print(f"Label Prop modularity: {label_mod:.3f}")
Normalized Mutual Information
Compare similarity between community structures:
from sklearn.metrics import normalized_mutual_info_score
# Convert to lists
louvain_list = [louvain_comms[node] for node in network.core_network.nodes()]
label_list = [label_prop_comms[node] for node in network.core_network.nodes()]
# Compute NMI
nmi = normalized_mutual_info_score(louvain_list, label_list)
print(f"NMI between Louvain and Label Prop: {nmi:.3f}")
Interpretation:
NMI = 1.0: Identical community structures
NMI = 0.0: Completely different structures
NMI > 0.5: Similar structures
Best Practices
Algorithm Selection
Network Size |
Speed Priority |
Quality Priority |
Recommendation |
|---|---|---|---|
Small (<1K) |
Any |
Any |
Try all algorithms |
Medium (1K-10K) |
Louvain |
Louvain/Infomap |
Louvain (good balance) |
Large (10K-100K) |
Louvain/Label Prop |
Louvain |
Louvain |
Very Large (>100K) |
Label Prop |
Louvain |
Label Prop or sample |
Parameter Guidelines
Louvain resolution:
Start with
resolution=1.0Increase if communities are too large
Decrease if communities are too fragmented
Multilayer coupling (omega):
omega=1.0- Default, balancedomega<1.0- Emphasize layer-specific structureomega>1.0- Emphasize cross-layer structure
Validation
Always validate community detection results:
Visual inspection — Plot and examine communities
Modularity — Check modularity score (>0.3 is good)
Size distribution — Check for giant communities or singletons
Domain knowledge — Do communities make sense for your application?
Ground truth comparison — If you have labels, compute NMI or Adjusted Rand Index
Common Failure Modes
Trivial partitions: All-in-one or all-singletons → tune γ and ω
Unstable results: Different runs give very different partitions → use
random_stateand run multiple timesOver-fragmentation: Too many small communities → decrease γ or try Leiden
Resolution limit: Can’t find small communities in large networks → increase γ or use hierarchical methods
What You Learned
This chapter covered community detection in multilayer networks:
Algorithms:
Louvain — Fast, O(n log n), BSD license, good for most use cases
Infomap — Flow-based, finds hierarchical structure, AGPLv3 license
Label Propagation — Very fast, linear in edges, supports semi-supervised detection
Multilayer Modularity — True multilayer detection with inter-layer coupling
Parameter tuning:
Resolution γ — Higher = more, smaller communities; lower = fewer, larger
Coupling ω — Higher = cross-layer consistency; lower = layer-specific structure
Start with γ=1.0, ω=1.0 and adjust based on results
Interpretation:
Trivial partitions (all-in-one or all-singletons) indicate parameter tuning needed
High modularity (>0.3) suggests good community structure
Validate with visualization, domain knowledge, and ground truth if available
Conceptual differences:
Single-layer detection treats node-layer pairs independently
Multilayer detection finds communities consistent across layers
Overlapping vs. non-overlapping communities serve different use cases
References
Louvain:
Blondel, V. D., et al. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics.
Infomap:
Rosvall, M., & Bergstrom, C. T. (2008). Maps of random walks on complex networks reveal community structure. PNAS, 105(4), 1118-1123.
Multilayer Modularity:
Mucha, P. J., et al. (2010). Community structure in time-dependent, multiscale, and multiplex networks. Science, 328(5980), 876-878.
See Citation and References for complete citations with DOIs.
What’s Next?
Random Walk Algorithms — Generate embeddings for ML tasks
Visualization — Visualize communities with color-coding
Algorithm Landscape — Overview of all algorithms
Related Examples:
examples/communities/example_community_detection.py— Complete workflowexamples/communities/example_multilayer_louvain.py— Parameter tuning
Distributional Community Detection (New in v1.1)
Uncertainty-aware community detection via distributional partitions.
Traditional community detection returns a single partition. But what if the structure is uncertain? Distributional community detection runs the algorithm multiple times (with different seeds or on perturbed networks) to produce a distribution over partitions.
This enables:
Co-association matrices: P(node_i and node_j in same community)
Consensus partitions: Representative partition (medoid)
Node confidence scores: Per-node stability measures
Boundary node identification: Nodes with uncertain assignments
Basic Usage
from py3plex.core import multinet
from py3plex.algorithms.community_detection import multilayer_louvain_distribution
# Load network
net = multinet.multi_layer_network(directed=False)
net.load_network("data.edgelist", directed=False, input_type="edgelist_hash")
# Run distributional community detection
dist = multilayer_louvain_distribution(
net,
n_runs=100,
resampling='perturbation',
perturbation_params={'edge_drop_p': 0.05},
seed=42,
n_jobs=4
)
# Get consensus partition
consensus = dist.consensus_partition()
# Get per-node confidence scores
confidence = dist.node_confidence()
# Identify stable vs boundary nodes
stable_mask = confidence >= 0.8
stable_nodes = [dist.nodes[i] for i in range(dist.n_nodes) if stable_mask[i]]
print(f"Communities: {len(set(consensus))}")
print(f"Stable core: {len(stable_nodes)}/{dist.n_nodes} nodes")
Resampling Strategies
- Seed-only (
resampling='seed'): Run algorithm multiple times with different random seeds. Fastest, captures algorithm stochasticity.
- Perturbation (
resampling='perturbation'): Randomly drop edges before each run (e.g.,
edge_drop_p=0.05). Assesses structural uncertainty.- Bootstrap (
resampling='bootstrap'): Resample edges with replacement. Statistical bootstrap for formal inference.
Best Practices
Use
coassoc_mode='auto'for large networks (sparse for n_nodes > 2000)Always set seed for reproducibility
Use
resampling='seed'as default (fastest)Use
weight_by='modularity'to weight better partitions more heavily
See examples/communities/example_distributional_community_detection.py for complete example.
Successive Halving for Algorithm Selection
When you don’t know which algorithm will work best, Successive Halving efficiently races multiple algorithms with increasing computational budgets, eliminating poor performers early.
Overview
Successive Halving is a racing strategy that:
Starts with all candidate algorithms
Evaluates on small budget (e.g., max_iter=5, uq_samples=10)
Eliminates worst performers (keeps top 1/η)
Increases budget and repeats with survivors
Returns winner when one algorithm remains
Key benefits: - Efficient: Quickly eliminates poor algorithms - UQ-aware: Uses distribution-based utilities - Deterministic: Fully reproducible with seed - Provenance-rich: Complete racing history
Basic Usage
from py3plex.algorithms.community_detection import AutoCommunity
from py3plex.core import multinet
# Load network
network = multinet.multi_layer_network(directed=False)
network.load_network("network.csv", input_type="edgelist")
# Run Successive Halving
result = (
AutoCommunity()
.candidates("louvain", "leiden", "label_propagation")
.metrics("modularity", "coverage")
.strategy("successive_halving", eta=3, rounds=2)
.seed(42)
.execute(network)
)
# Winner
print(f"Winner: {result.selected}")
print(f"Communities: {result.community_stats.n_communities}")
Configuration
# Custom budget and utility function
result = (
AutoCommunity()
.candidates("louvain", "leiden")
.metrics("modularity", "coverage")
.strategy(
"successive_halving",
eta=3, # Keep top 1/3 each round
rounds=3, # Number of rounds
budget0={ # Initial budget
"max_iter": 10,
"n_restarts": 1,
"uq_samples": 15,
},
budget_growth=3.0, # Budget scales 3x per round
utility_method="mean_minus_std", # Utility function
metric_weights={ # Custom weights
"modularity": 0.6,
"coverage": 0.4,
},
)
.seed(42)
.execute(network)
)
Racing History
Access complete provenance:
# Racing summary
history = result.provenance["racing_history"]
print(f"Rounds: {len(history['rounds'])}")
print(f"Winner: {history['winner_algo_id']}")
print(f"Runtime: {history['total_runtime_ms']:.2f} ms")
# Per-round details
for i, round_rec in enumerate(history["rounds"]):
print(f"Round {i}:")
print(f" Algorithms: {len(round_rec['algorithms'])}")
print(f" Survivors: {round_rec['survivors']}")
print(f" Eliminated: {round_rec['eliminated']}")
# Show utilities
utilities = round_rec["utilities"]
for algo, utility in utilities.items():
print(f" {algo}: {utility:.4f}")
Utility Methods
- mean_minus_std (default):
U = mean(score) - λ * std(score)
Balances expected performance with risk. Higher λ = more conservative.
- expected_regret:
U = -E[max(scores) - score]
Minimizes expected loss relative to best.
- prob_near_best:
U = P(score ≥ max - ε)
Probability of being near-optimal.
# Use expected regret utility
result = (
AutoCommunity()
.candidates("louvain", "leiden")
.metrics("modularity")
.strategy("successive_halving", utility_method="expected_regret")
.seed(42)
.execute(network)
)
With Uncertainty Quantification
Enable UQ for stability-aware selection:
result = (
AutoCommunity()
.candidates("louvain", "leiden")
.metrics("modularity", "coverage", "stability")
.uq(method="seed", n_samples=20) # Enable UQ
.strategy("successive_halving", eta=3)
.seed(42)
.execute(network)
)
# Check stability
stats = result.community_stats
print(f"Stability: {stats.stability_score:.3f}")
When to Use
Use Successive Halving when:
Testing many algorithms (>5 candidates)
Computational budget is limited
Want efficient early elimination
Clear metric preferences (weighted)
Use default Pareto strategy when:
Few candidates (<5)
Multi-objective without clear weights
Want consensus from non-dominated set
Null model calibration is critical
Best Practices
Start small: budget0 with max_iter=5, uq_samples=10
Elimination: eta=3 for aggressive, eta=2 for conservative
UQ: Enable with uq_samples in budget for stability
Seed: Always set for reproducibility
Weights: Encode domain knowledge via metric_weights
History: Inspect racing_history to understand selection
Complete Example
from py3plex.algorithms.community_detection import AutoCommunity
from py3plex.core import multinet
import json
# 1. Load network
net = multinet.multi_layer_network(directed=False)
net.load_network("network.csv", input_type="edgelist")
# 2. Run Successive Halving with UQ
result = (
AutoCommunity()
.candidates("louvain", "leiden", "label_propagation")
.metrics("modularity", "coverage", "stability")
.uq(method="seed", n_samples=20)
.strategy(
"successive_halving",
eta=3,
rounds=2,
budget0={"max_iter": 10, "uq_samples": 15},
utility_method="mean_minus_std",
metric_weights={"modularity": 0.5, "coverage": 0.3, "stability": 0.2},
)
.seed(42)
.execute(net)
)
# 3. Results
print(f"Winner: {result.selected}")
print(f"Communities: {result.community_stats.n_communities}")
print(f"Coverage: {result.community_stats.coverage:.3f}")
# 4. Export
df = result.to_pandas()
df.to_csv("communities.csv", index=False)
# 5. Save provenance
with open("provenance.json", "w") as f:
json.dump(result.provenance, f, indent=2)