Community Detection Tutorial
This tutorial demonstrates community detection algorithms for multilayer networks using py3plex.
Overview
Community detection identifies groups of nodes that are more densely connected to each other than to the rest of the network. For multilayer networks, communities can span multiple layers, accounting for both intra-layer and inter-layer structure.
Supported Algorithms
py3plex provides several community detection algorithms:
Louvain - Fast modularity optimization (recommended for most use cases)
Infomap - Flow-based community detection (requires external binary)
Label Propagation - Semi-supervised approach with known seed communities
Multilayer Modularity - True multilayer community detection (Mucha et al. 2010)
Louvain Algorithm
Basic Usage
Fastest algorithm for large networks:
from py3plex.core import multinet
from py3plex.algorithms.community_detection import community_louvain
# Create or load network
network = multinet.multi_layer_network()
network.load_network("data.graphml", input_type="graphml")
# Detect communities using Louvain
communities = community_louvain.best_partition(network.core_network)
# Print results
for node, community_id in communities.items():
print(f"Node {node} -> Community {community_id}")
Parameters
# With custom resolution parameter
communities = community_louvain.best_partition(
network.core_network,
resolution=1.0 # Higher = more communities, Lower = fewer communities
)
Resolution parameter:
resolution=1.0
- Standard modularityresolution>1.0
- More, smaller communitiesresolution<1.0
- Fewer, larger communities
Advantages
Very fast: \(O(n \\log n)\)
Scales to millions of nodes
BSD license (commercial-friendly)
Well-established and widely used
Disadvantages
Non-deterministic (random initialization)
Cannot find overlapping communities
Resolution limit issues
Infomap Algorithm
Basic Usage
Flow-based approach for detecting communities:
from py3plex.algorithms.community_detection import community_wrapper
# Detect communities using Infomap
communities = community_wrapper.infomap_communities(
network.core_network,
binary_path="/path/to/infomap" # Path to Infomap binary
)
With Hierarchical Structure
# Get hierarchical community structure
hierarchical_communities = community_wrapper.infomap_communities(
network.core_network,
binary_path="/path/to/infomap",
hierarchical=True
)
Advantages
Can detect overlapping communities
Flow-based (natural for many applications)
Hierarchical structure
Information-theoretic foundation
Disadvantages
Requires external binary
AGPLv3 license (viral copyleft - problematic for commercial use)
Slower than Louvain
Label Propagation
Semi-Supervised Detection
Use when you have some known community memberships:
from py3plex.algorithms.community_detection import label_propagation
# Provide seed labels for some nodes
seed_labels = {
'node1': 0,
'node2': 0,
'node3': 1,
'node4': 1
}
# Propagate labels to unlabeled nodes
communities = label_propagation.propagate(
network.core_network,
seed_labels=seed_labels,
max_iter=100
)
Fully Unsupervised
# Without seed labels (random initialization)
communities = label_propagation.propagate(
network.core_network,
max_iter=100
)
Advantages
Very fast: \(O(m)\) linear in edges
Can incorporate prior knowledge
MIT license
Simple and interpretable
Disadvantages
Non-deterministic
Sensitive to initialization
May not converge
Lower quality than Louvain/Infomap
Multilayer Modularity
True Multilayer Detection
Accounts for multilayer structure following Mucha et al. (2010):
from py3plex.algorithms.community_detection import multilayer_modularity as mlm
# Get supra-adjacency matrix
supra_adj = network.get_supra_adjacency_matrix(sparse=True)
# Detect communities with multilayer modularity
communities = mlm.multilayer_louvain(
supra_adj,
gamma=1.0, # Resolution parameter
omega=1.0 # Inter-layer coupling strength
)
Parameter Tuning
# Emphasize layer-specific structure
communities = mlm.multilayer_louvain(
supra_adj,
gamma=1.0,
omega=0.1 # Low coupling = layer-specific communities
)
# Emphasize cross-layer structure
communities = mlm.multilayer_louvain(
supra_adj,
gamma=1.0,
omega=10.0 # High coupling = cross-layer communities
)
Mathematical Formulation
Multilayer modularity is defined as:
Where:
\(A_{ij}^{[\\alpha]}\) is the adjacency matrix of layer \(\\alpha\)
\(P_{ij}^{[\\alpha]}\) is the null model (e.g., configuration model)
\(\\gamma_{\\alpha}\) is the resolution parameter for layer \(\\alpha\)
\(\\omega_{\\alpha\\beta}\) is the coupling strength between layers
\(\\delta(g_{i}^{[\\alpha]}, g_{j}^{[\\beta]})\) is 1 if nodes are in the same community, 0 otherwise
Advantages
Accounts for multilayer structure
Implements state-of-the-art algorithm
Configurable inter-layer coupling
Published in Science (Mucha et al. 2010)
Disadvantages
More computationally expensive
Requires parameter tuning
May not scale to very large networks (>100k nodes)
Evaluating Community Quality
Modularity Score
import networkx as nx
# Compute modularity
modularity = nx.community.modularity(network.core_network, communities)
print(f"Modularity: {modularity:.3f}")
Interpretation:
\(Q > 0.3\): Good community structure
\(Q > 0.5\): Strong community structure
\(Q < 0.3\): Weak or no community structure
Coverage and Performance
# Coverage: fraction of edges within communities
coverage = nx.community.coverage(network.core_network, communities)
# Performance: fraction of correctly classified node pairs
performance = nx.community.performance(network.core_network, communities)
print(f"Coverage: {coverage:.3f}")
print(f"Performance: {performance:.3f}")
Visualizing Communities
Color by Community
from py3plex.visualization.multilayer import hairball_plot
import matplotlib.pyplot as plt
# Map communities to colors
node_colors = [communities.get(node, 0) for node in network.core_network.nodes()]
# Visualize with community colors
hairball_plot(
network.core_network,
node_color=node_colors,
layout_algorithm='force',
cmap='tab10'
)
plt.show()
Community Size Distribution
from collections import Counter
import matplotlib.pyplot as plt
# Count community sizes
community_sizes = Counter(communities.values())
sizes = list(community_sizes.values())
# Plot distribution
plt.hist(sizes, bins=20)
plt.xlabel('Community Size')
plt.ylabel('Frequency')
plt.title('Community Size Distribution')
plt.show()
Comparing Algorithms
Run Multiple Algorithms
# Run different algorithms
louvain_comms = community_louvain.best_partition(network.core_network)
label_prop_comms = label_propagation.propagate(network.core_network)
# Compare number of communities
print(f"Louvain: {len(set(louvain_comms.values()))} communities")
print(f"Label Prop: {len(set(label_prop_comms.values()))} communities")
# Compare modularity
louvain_mod = nx.community.modularity(network.core_network,
[set(n for n, c in louvain_comms.items() if c == i)
for i in set(louvain_comms.values())])
label_mod = nx.community.modularity(network.core_network,
[set(n for n, c in label_prop_comms.items() if c == i)
for i in set(label_prop_comms.values())])
print(f"Louvain modularity: {louvain_mod:.3f}")
print(f"Label Prop modularity: {label_mod:.3f}")
Normalized Mutual Information
Compare similarity between community structures:
from sklearn.metrics import normalized_mutual_info_score
# Convert to lists
louvain_list = [louvain_comms[node] for node in network.core_network.nodes()]
label_list = [label_prop_comms[node] for node in network.core_network.nodes()]
# Compute NMI
nmi = normalized_mutual_info_score(louvain_list, label_list)
print(f"NMI between Louvain and Label Prop: {nmi:.3f}")
Interpretation:
NMI = 1.0: Identical community structures
NMI = 0.0: Completely different structures
NMI > 0.5: Similar structures
Best Practices
Algorithm Selection
Network Size |
Speed Priority |
Quality Priority |
Recommendation |
---|---|---|---|
Small (<1K) |
Any |
Any |
Try all algorithms |
Medium (1K-10K) |
Louvain |
Louvain/Infomap |
Louvain (good balance) |
Large (10K-100K) |
Louvain/Label Prop |
Louvain |
Louvain |
Very Large (>100K) |
Label Prop |
Louvain |
Label Prop or sample |
Parameter Guidelines
Louvain resolution:
Start with
resolution=1.0
Increase if communities are too large
Decrease if communities are too fragmented
Multilayer coupling (omega):
omega=1.0
- Default, balancedomega<1.0
- Emphasize layer-specific structureomega>1.0
- Emphasize cross-layer structure
Validation
Always validate community detection results:
Visual inspection - Plot and examine communities
Modularity - Check modularity score (>0.3 is good)
Size distribution - Check for giant communities or singletons
Domain knowledge - Do communities make sense for your application?
References
Louvain:
Blondel, V. D., et al. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics.
Infomap:
Rosvall, M., & Bergstrom, C. T. (2008). Maps of random walks on complex networks reveal community structure. PNAS, 105(4), 1118-1123.
Multilayer Modularity:
Mucha, P. J., et al. (2010). Community structure in time-dependent, multiscale, and multiplex networks. Science, 328(5980), 876-878.
See Citation and References for complete citations with DOIs.
Next Steps
Multilayer Network Centrality Measures - Centrality measures tutorial
Multilayer Modularity and Community Detection - Multilayer modularity details
Algorithm Selection Guide - Algorithm selection guide
Examples in
examples/example_community_detection.py