Community Detection Tutorial

This tutorial demonstrates community detection algorithms for multilayer networks using py3plex.

Overview

Community detection identifies groups of nodes that are more densely connected to each other than to the rest of the network. For multilayer networks, communities can span multiple layers, accounting for both intra-layer and inter-layer structure.

Supported Algorithms

py3plex provides several community detection algorithms:

  • Louvain - Fast modularity optimization (recommended for most use cases)

  • Infomap - Flow-based community detection (requires external binary)

  • Label Propagation - Semi-supervised approach with known seed communities

  • Multilayer Modularity - True multilayer community detection (Mucha et al. 2010)

Louvain Algorithm

Basic Usage

Fastest algorithm for large networks:

from py3plex.core import multinet
from py3plex.algorithms.community_detection import community_louvain

# Create or load network
network = multinet.multi_layer_network()
network.load_network("data.graphml", input_type="graphml")

# Detect communities using Louvain
communities = community_louvain.best_partition(network.core_network)

# Print results
for node, community_id in communities.items():
    print(f"Node {node} -> Community {community_id}")

Parameters

# With custom resolution parameter
communities = community_louvain.best_partition(
    network.core_network,
    resolution=1.0  # Higher = more communities, Lower = fewer communities
)

Resolution parameter:

  • resolution=1.0 - Standard modularity

  • resolution>1.0 - More, smaller communities

  • resolution<1.0 - Fewer, larger communities

Advantages

  • Very fast: \(O(n \\log n)\)

  • Scales to millions of nodes

  • BSD license (commercial-friendly)

  • Well-established and widely used

Disadvantages

  • Non-deterministic (random initialization)

  • Cannot find overlapping communities

  • Resolution limit issues

Infomap Algorithm

Basic Usage

Flow-based approach for detecting communities:

from py3plex.algorithms.community_detection import community_wrapper

# Detect communities using Infomap
communities = community_wrapper.infomap_communities(
    network.core_network,
    binary_path="/path/to/infomap"  # Path to Infomap binary
)

With Hierarchical Structure

# Get hierarchical community structure
hierarchical_communities = community_wrapper.infomap_communities(
    network.core_network,
    binary_path="/path/to/infomap",
    hierarchical=True
)

Advantages

  • Can detect overlapping communities

  • Flow-based (natural for many applications)

  • Hierarchical structure

  • Information-theoretic foundation

Disadvantages

  • Requires external binary

  • AGPLv3 license (viral copyleft - problematic for commercial use)

  • Slower than Louvain

Label Propagation

Semi-Supervised Detection

Use when you have some known community memberships:

from py3plex.algorithms.community_detection import label_propagation

# Provide seed labels for some nodes
seed_labels = {
    'node1': 0,
    'node2': 0,
    'node3': 1,
    'node4': 1
}

# Propagate labels to unlabeled nodes
communities = label_propagation.propagate(
    network.core_network,
    seed_labels=seed_labels,
    max_iter=100
)

Fully Unsupervised

# Without seed labels (random initialization)
communities = label_propagation.propagate(
    network.core_network,
    max_iter=100
)

Advantages

  • Very fast: \(O(m)\) linear in edges

  • Can incorporate prior knowledge

  • MIT license

  • Simple and interpretable

Disadvantages

  • Non-deterministic

  • Sensitive to initialization

  • May not converge

  • Lower quality than Louvain/Infomap

Multilayer Modularity

True Multilayer Detection

Accounts for multilayer structure following Mucha et al. (2010):

from py3plex.algorithms.community_detection import multilayer_modularity as mlm

# Get supra-adjacency matrix
supra_adj = network.get_supra_adjacency_matrix(sparse=True)

# Detect communities with multilayer modularity
communities = mlm.multilayer_louvain(
    supra_adj,
    gamma=1.0,      # Resolution parameter
    omega=1.0       # Inter-layer coupling strength
)

Parameter Tuning

# Emphasize layer-specific structure
communities = mlm.multilayer_louvain(
    supra_adj,
    gamma=1.0,
    omega=0.1  # Low coupling = layer-specific communities
)

# Emphasize cross-layer structure
communities = mlm.multilayer_louvain(
    supra_adj,
    gamma=1.0,
    omega=10.0  # High coupling = cross-layer communities
)

Mathematical Formulation

Multilayer modularity is defined as:

\[\begin{split}Q^{ML} = \\frac{1}{2\\mu} \\sum_{ij\\alpha\\beta} \\left[ (A_{ij}^{[\\alpha]} - \\gamma_{\\alpha} P_{ij}^{[\\alpha]})\\delta_{\\alpha\\beta} + \\omega_{\\alpha\\beta}\\delta_{ij} \\right] \\delta(g_{i}^{[\\alpha]}, g_{j}^{[\\beta]})\end{split}\]

Where:

  • \(A_{ij}^{[\\alpha]}\) is the adjacency matrix of layer \(\\alpha\)

  • \(P_{ij}^{[\\alpha]}\) is the null model (e.g., configuration model)

  • \(\\gamma_{\\alpha}\) is the resolution parameter for layer \(\\alpha\)

  • \(\\omega_{\\alpha\\beta}\) is the coupling strength between layers

  • \(\\delta(g_{i}^{[\\alpha]}, g_{j}^{[\\beta]})\) is 1 if nodes are in the same community, 0 otherwise

Advantages

  • Accounts for multilayer structure

  • Implements state-of-the-art algorithm

  • Configurable inter-layer coupling

  • Published in Science (Mucha et al. 2010)

Disadvantages

  • More computationally expensive

  • Requires parameter tuning

  • May not scale to very large networks (>100k nodes)

Evaluating Community Quality

Modularity Score

import networkx as nx

# Compute modularity
modularity = nx.community.modularity(network.core_network, communities)
print(f"Modularity: {modularity:.3f}")

Interpretation:

  • \(Q > 0.3\): Good community structure

  • \(Q > 0.5\): Strong community structure

  • \(Q < 0.3\): Weak or no community structure

Coverage and Performance

# Coverage: fraction of edges within communities
coverage = nx.community.coverage(network.core_network, communities)

# Performance: fraction of correctly classified node pairs
performance = nx.community.performance(network.core_network, communities)

print(f"Coverage: {coverage:.3f}")
print(f"Performance: {performance:.3f}")

Visualizing Communities

Color by Community

from py3plex.visualization.multilayer import hairball_plot
import matplotlib.pyplot as plt

# Map communities to colors
node_colors = [communities.get(node, 0) for node in network.core_network.nodes()]

# Visualize with community colors
hairball_plot(
    network.core_network,
    node_color=node_colors,
    layout_algorithm='force',
    cmap='tab10'
)
plt.show()

Community Size Distribution

from collections import Counter
import matplotlib.pyplot as plt

# Count community sizes
community_sizes = Counter(communities.values())
sizes = list(community_sizes.values())

# Plot distribution
plt.hist(sizes, bins=20)
plt.xlabel('Community Size')
plt.ylabel('Frequency')
plt.title('Community Size Distribution')
plt.show()

Comparing Algorithms

Run Multiple Algorithms

# Run different algorithms
louvain_comms = community_louvain.best_partition(network.core_network)
label_prop_comms = label_propagation.propagate(network.core_network)

# Compare number of communities
print(f"Louvain: {len(set(louvain_comms.values()))} communities")
print(f"Label Prop: {len(set(label_prop_comms.values()))} communities")

# Compare modularity
louvain_mod = nx.community.modularity(network.core_network,
                                      [set(n for n, c in louvain_comms.items() if c == i)
                                       for i in set(louvain_comms.values())])
label_mod = nx.community.modularity(network.core_network,
                                    [set(n for n, c in label_prop_comms.items() if c == i)
                                     for i in set(label_prop_comms.values())])

print(f"Louvain modularity: {louvain_mod:.3f}")
print(f"Label Prop modularity: {label_mod:.3f}")

Normalized Mutual Information

Compare similarity between community structures:

from sklearn.metrics import normalized_mutual_info_score

# Convert to lists
louvain_list = [louvain_comms[node] for node in network.core_network.nodes()]
label_list = [label_prop_comms[node] for node in network.core_network.nodes()]

# Compute NMI
nmi = normalized_mutual_info_score(louvain_list, label_list)
print(f"NMI between Louvain and Label Prop: {nmi:.3f}")

Interpretation:

  • NMI = 1.0: Identical community structures

  • NMI = 0.0: Completely different structures

  • NMI > 0.5: Similar structures

Best Practices

Algorithm Selection

Network Size

Speed Priority

Quality Priority

Recommendation

Small (<1K)

Any

Any

Try all algorithms

Medium (1K-10K)

Louvain

Louvain/Infomap

Louvain (good balance)

Large (10K-100K)

Louvain/Label Prop

Louvain

Louvain

Very Large (>100K)

Label Prop

Louvain

Label Prop or sample

Parameter Guidelines

Louvain resolution:

  • Start with resolution=1.0

  • Increase if communities are too large

  • Decrease if communities are too fragmented

Multilayer coupling (omega):

  • omega=1.0 - Default, balanced

  • omega<1.0 - Emphasize layer-specific structure

  • omega>1.0 - Emphasize cross-layer structure

Validation

Always validate community detection results:

  1. Visual inspection - Plot and examine communities

  2. Modularity - Check modularity score (>0.3 is good)

  3. Size distribution - Check for giant communities or singletons

  4. Domain knowledge - Do communities make sense for your application?

References

Louvain:

Blondel, V. D., et al. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics.

Infomap:

Rosvall, M., & Bergstrom, C. T. (2008). Maps of random walks on complex networks reveal community structure. PNAS, 105(4), 1118-1123.

Multilayer Modularity:

Mucha, P. J., et al. (2010). Community structure in time-dependent, multiscale, and multiplex networks. Science, 328(5980), 876-878.

See Citation and References for complete citations with DOIs.

Next Steps