How to Run Community Detection on Multilayer Networks
Goal: This guide demonstrates how to apply community detection algorithms to multilayer networks and interpret their results. Community detection identifies mesoscale structure—groups of nodes that are more densely connected internally than to the rest of the network. In multilayer networks, communities can exist within single layers, span multiple layers, or emerge from inter-layer coupling patterns. This analysis is essential for understanding functional modules, organizational structure, and hierarchical clustering in complex systems.
📓 Run this guide online
You can run this tutorial in your browser without any local installation. Alternatively, see the full executable example: example_community_detection.py
Prerequisites:
A loaded multilayer network (see How to Load and Build Networks)
Basic familiarity with network terminology (nodes, edges, layers)
Understanding of modularity as a quality metric (covered in this guide)
When to use community detection:
Identifying functional modules in biological networks
Detecting organizational units in social networks
Finding coherent topics in multi-relational knowledge graphs
Analyzing temporal evolution of communities across time-sliced networks
Discovering cross-layer relationships in multiplex systems
Quick Start: Louvain Algorithm
What is Louvain?
The Louvain algorithm (Blondel et al., 2008) is a fast, greedy method that optimizes modularity, defined as:
\[ Q = \frac{1}{2m} \sum_{ij} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j) \]
where \(A_{ij}\) is the adjacency matrix, \(k_i\) is the degree of node \(i\), \(m\) is the total number of edges, and \(\delta(c_i, c_j)=1\) if nodes \(i,j\) are in the same community (0 otherwise). Higher \(Q\) indicates stronger community structure.
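To make the formula concrete, here is a quick sanity check on a toy graph using NetworkX's built-in modularity function (a minimal sketch, not a py3plex call):
import networkx as nx
from networkx.algorithms.community import modularity as nx_modularity

# Two triangles joined by a single bridge edge
G = nx.Graph([(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)])
partition = [{0, 1, 2}, {3, 4, 5}]
print(f"Q = {nx_modularity(G, partition):.3f}")  # ≈ 0.357 for this split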
How it works:
Initialize: each node starts in its own community
For each node, compute \(\Delta Q\) from moving to each neighbor’s community
Move the node to the community with maximum positive \(\Delta Q\)
Aggregate: collapse communities into super-nodes and repeat
Stop when no further improvement is possible
Time complexity: \(O(n \log n)\) for sparse networks
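The greedy step hinges on evaluating \(\Delta Q\) cheaply. Below is a minimal sketch of the gain from moving an isolated node into a community, assuming an unweighted networkx.Graph (py3plex's implementation handles the general weighted case):
def modularity_gain(G, node, community, m):
    """Gain ΔQ = k_in/m - (Σ_tot · k_i) / (2m²) for moving an
    isolated `node` into `community` (m = total number of edges)."""
    k_i = G.degree(node)
    # Edges from `node` into the candidate community
    k_in = sum(1 for nbr in G.neighbors(node) if nbr in community)
    # Total degree of the candidate community
    sigma_tot = sum(G.degree(v) for v in community)
    return k_in / m - (sigma_tot * k_i) / (2 * m * m)
Louvain evaluates this gain for every neighboring community and applies the best positive move.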
Basic example:
from py3plex.core import multinet
from py3plex.algorithms.community_detection.community_wrapper import louvain_communities
# Load multilayer network
network = multinet.multi_layer_network(directed=False)
network.load_network(
"datasets/synthetic_multilayer.txt",
input_type="multiedgelist"
)
# Run Louvain (operates on flattened network by default)
communities = louvain_communities(network)
# Analyze results
from collections import Counter
comm_sizes = Counter(communities.values())
print(f"Number of communities: {len(comm_sizes)}")
print(f"Largest community: {max(comm_sizes.values())} nodes")
print(f"Smallest community: {min(comm_sizes.values())} nodes")
print(f"Average size: {sum(comm_sizes.values())/len(comm_sizes):.1f}")
# Sample assignments
for node, comm_id in list(communities.items())[:5]:
print(f" {node} → Community {comm_id}")
Expected output:
Number of communities: 4
Largest community: 45 nodes
Smallest community: 8 nodes
Average size: 22.8
('A1', 'layer1') → Community 0
('A2', 'layer1') → Community 0
('B1', 'layer1') → Community 1
('B2', 'layer2') → Community 1
('C1', 'layer2') → Community 2
Note: The standard louvain_communities function flattens the multilayer network into a single-layer graph (projecting all nodes across layers into a unified node set). For layer-aware detection, use louvain_multilayer (see next section).
Multilayer-Specific: Multilayer Louvain
What makes multilayer community detection different?
Standard Louvain treats a multilayer network as a single flattened graph, losing layer identity. Multilayer Louvain (Mucha et al., 2010) optimizes the multilayer modularity:
\[ Q = \frac{1}{2\mu} \sum_{ij\alpha\beta} \left[ \left( A^{\alpha}_{ij} - \gamma^{\alpha} \frac{k^{\alpha}_i k^{\alpha}_j}{2 m_{\alpha}} \right) \delta_{\alpha\beta} + \omega_{\alpha\beta}\, \delta_{ij} \right] \delta(g_{i\alpha}, g_{j\beta}) \]
where:
\(A^\alpha_{ij}\): adjacency in layer \(\alpha\); \(k^\alpha_i\) is the degree of node \(i\) and \(m_\alpha\) the number of edges in layer \(\alpha\)
\(\gamma^\alpha\): resolution parameter for layer \(\alpha\) (default 1.0)
\(\omega_{\alpha\beta}\): inter-layer coupling strength (default 1.0)
\(\delta_{\alpha\beta}=1\) if \(\alpha=\beta\) (the adjacency term is intra-layer); \(\delta_{ij}=1\) if \(i=j\) (inter-layer edges connect the same node across layers)
\(\delta(g_{i\alpha}, g_{j\beta})=1\) if node \(i\) in layer \(\alpha\) and node \(j\) in layer \(\beta\) are in the same community
\(\mu\): total edge weight in the supra-network
Key insight: The coupling term \(\omega_{\alpha\beta}\) controls whether communities span layers:
ω = 0: Layers are independent → separate communities per layer
ω → ∞: Strong coupling → communities span all layers
0 < ω < ∞: Partial coupling → communities can span some layers
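The coupling is easiest to picture in the supra-adjacency matrix, where intra-layer adjacencies sit on the diagonal blocks and \(\omega I\) connects copies of each node across layers. A minimal NumPy illustration for two layers of three nodes (py3plex builds this structure internally):
import numpy as np

n = 3
A1 = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])  # layer 1 adjacency
A2 = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])  # layer 2 adjacency
omega = 0.5

# Supra-adjacency: intra-layer blocks on the diagonal, ω·I off-diagonal
supra = np.block([
    [A1, omega * np.eye(n)],
    [omega * np.eye(n), A2],
])
print(supra.shape)  # (6, 6): one row/column per node-layer pair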
Full workflow example:
from py3plex.core import multinet
from py3plex.algorithms.community_detection.multilayer_modularity import (
louvain_multilayer,
multilayer_modularity
)
from collections import Counter, defaultdict
# Load multilayer network
network = multinet.multi_layer_network(directed=False)
network.load_network(
"datasets/synthetic_multilayer.txt",
input_type="multiedgelist"
)
print("Network structure:")
print(f" Layers: {network.get_layers()}")
print(f" Nodes: {len(network.get_nodes())}")
print(f" Edges (total): {network.number_of_edges()}")
# Run multilayer Louvain with different coupling strengths
for omega in [0.0, 0.5, 1.0, 2.0]:
print(f"\n--- Coupling ω={omega} ---")
communities = louvain_multilayer(
network,
gamma=1.0, # Resolution (default)
omega=omega, # Inter-layer coupling
random_state=42 # For reproducibility
)
# Count communities
n_communities = len(set(communities.values()))
# Calculate multilayer modularity
Q = multilayer_modularity(network, communities, gamma=1.0, omega=omega)
# Analyze layer coverage
layer_coverage = defaultdict(set) # community -> set of layers
for (node, layer), comm_id in communities.items():
layer_coverage[comm_id].add(layer)
cross_layer = sum(1 for layers in layer_coverage.values() if len(layers) > 1)
single_layer = len(layer_coverage) - cross_layer
print(f" Communities: {n_communities}")
print(f" Modularity Q: {Q:.4f}")
print(f" Cross-layer communities: {cross_layer}")
print(f" Single-layer communities: {single_layer}")
# Size distribution
comm_sizes = Counter(communities.values())
avg_size = sum(comm_sizes.values()) / len(comm_sizes)
print(f" Average community size: {avg_size:.1f} node-layers")
Expected output:
Network structure:
Layers: ['layer1', 'layer2', 'layer3']
Nodes: 120 (40 nodes × 3 layers)
Edges (total): 284
--- Coupling ω=0.0 ---
Communities: 12
Modularity Q: 0.3456
Cross-layer communities: 0
Single-layer communities: 12
Average community size: 10.0 node-layers
--- Coupling ω=0.5 ---
Communities: 8
Modularity Q: 0.4123
Cross-layer communities: 3
Single-layer communities: 5
Average community size: 15.0 node-layers
--- Coupling ω=1.0 ---
Communities: 5
Modularity Q: 0.4589
Cross-layer communities: 4
Single-layer communities: 1
Average community size: 24.0 node-layers
--- Coupling ω=2.0 ---
Communities: 4
Modularity Q: 0.4234
Cross-layer communities: 4
Single-layer communities: 0
Average community size: 30.0 node-layers
Interpretation:
ω=0.0: Each layer has independent communities (useful for baseline)
ω=0.5-1.0: Balanced trade-off, some communities span layers
ω>1.0: Forces global communities across all layers (may over-integrate)
Choosing ω:
Use domain knowledge: biological function (high ω), temporal snapshots (low ω)
Grid search: try ω ∈ [0.1, 0.5, 1.0, 2.0, 5.0] and pick the maximum Q (see the sketch after this list)
Consensus clustering: aggregate results across multiple ω values
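A minimal grid-search sketch, reusing louvain_multilayer and multilayer_modularity from above (note this is a heuristic: changing ω also changes the objective being maximized, so Q values are only roughly comparable across ω):
best_omega, best_Q = None, float("-inf")
for omega in [0.1, 0.5, 1.0, 2.0, 5.0]:
    comms = louvain_multilayer(network, gamma=1.0, omega=omega, random_state=42)
    Q = multilayer_modularity(network, comms, gamma=1.0, omega=omega)
    if Q > best_Q:
        best_omega, best_Q = omega, Q
print(f"Best ω: {best_omega} (Q = {best_Q:.4f})")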
Infomap Algorithm
What is Infomap?
Infomap (Rosvall & Bergstrom, 2008) uses information theory to find communities by minimizing the map equation:
\[ L(M) = q_{\curvearrowright} H(Q) + \sum_i p^{i}_{\circlearrowright} H(P^i) \]
where:
\(q_\curvearrowright\): probability of switching between modules (inter-module flow)
\(H(Q)\): entropy of module codebook
\(p_{\circlearrowright}^i\): probability of staying within module \(i\) (intra-module flow)
\(H(P^i)\): entropy of nodes within module \(i\)
Key insight: Infomap simulates a random walker and finds communities that compress the description length of the walker’s trajectory. Communities are regions where the walker gets “trapped” for extended periods.
Pros/cons vs. Louvain:
Pros: Often finds better communities for flow-based systems (e.g., citation networks, web graphs)
Cons: Requires external binary (not pure Python), slower than Louvain, harder to interpret parameters
Installation:
Infomap requires the standalone binary from https://www.mapequation.org/infomap/:
# Download and install
wget https://www.mapequation.org/downloads/Infomap.zip
unzip Infomap.zip
cd Infomap
make
sudo cp Infomap /usr/local/bin/infomap
# Or install Python wrapper (alternative)
pip install infomap
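If you installed the pip package, you can also drive Infomap directly through its Python API instead of the standalone binary. A minimal sketch on a toy edge list (this assumes the infomap package's 2.x API and is independent of py3plex):
from infomap import Infomap

im = Infomap("--two-level --silent --seed 42")
for source, target in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    im.add_link(source, target)
im.run()

print(f"Found {im.num_top_modules} modules")
for node in im.tree:
    if node.is_leaf:
        print(f"  node {node.node_id} -> module {node.module_id}")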
Basic usage:
from py3plex.core import multinet
from py3plex.algorithms.community_detection.community_wrapper import infomap_communities
import os
# Load network
network = multinet.multi_layer_network(directed=False)
network.load_network(
"datasets/synthetic_multilayer.txt",
input_type="multiedgelist"
)
# Check if binary exists
binary_path = "/usr/local/bin/infomap" # Adjust to your installation
if not os.path.exists(binary_path):
print(f"Infomap binary not found at {binary_path}")
print("Please install from: https://www.mapequation.org/infomap/")
print("Falling back to Louvain...")
# Use Louvain as fallback
from py3plex.algorithms.community_detection.community_wrapper import louvain_communities
communities = louvain_communities(network)
else:
# Run Infomap
communities = infomap_communities(
network,
binary=binary_path,
multiplex=True, # Use multiplex mode for multilayer networks
iterations=1000, # More iterations = better convergence
seed=42, # For reproducibility
verbose=False # Set True to see Infomap output
)
# Analyze results
from collections import Counter
comm_sizes = Counter(communities.values())
print(f"Number of communities: {len(comm_sizes)}")
print(f"Largest community: {max(comm_sizes.values())} nodes")
print(f"Average size: {sum(comm_sizes.values())/len(comm_sizes):.1f}")
Expected output:
Number of communities: 6
Largest community: 38 nodes
Average size: 20.0
Multiplex mode:
When multiplex=True, Infomap treats layers as separate networks but allows random walkers to switch layers (implicitly modeling inter-layer coupling). This is different from Louvain’s explicit \(\omega\) parameter.
Comparison workflow:
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score
# Run both algorithms
louvain_comms = louvain_communities(network)
infomap_comms = infomap_communities(network, binary=binary_path, seed=42)
# Convert to aligned label vectors
nodes = list(louvain_comms.keys())
louvain_labels = [louvain_comms[n] for n in nodes]
infomap_labels = [infomap_comms[n] for n in nodes]
# Compute similarity
ari = adjusted_rand_score(louvain_labels, infomap_labels)
nmi = normalized_mutual_info_score(louvain_labels, infomap_labels)
print(f"Agreement between Louvain and Infomap:")
print(f" ARI: {ari:.3f} (1.0 = perfect agreement)")
print(f" NMI: {nmi:.3f} (1.0 = perfect agreement)")
Expected output:
Agreement between Louvain and Infomap:
ARI: 0.723 (1.0 = perfect agreement)
NMI: 0.815 (1.0 = perfect agreement)
When to use Infomap:
Citation/web networks with clear flow patterns
Networks where you care about information diffusion
When Louvain gives unsatisfying results (try both and compare)
When you have the binary installed (otherwise, stick with Louvain)
Label Propagation
What is Label Propagation?
Label propagation (Raghavan et al., 2007) is an extremely fast, near-linear time algorithm that works by iteratively assigning each node to the most common community among its neighbors.
Algorithm:
Initialize: each node gets a unique label (community ID)
For t=1 to T iterations:
Randomize node order
For each node \(i\):
Count neighbor labels: \(n_c = |\{j \in N(i) : c_j = c\}|\)
Assign \(c_i = \arg\max_c n_c\) (ties broken randomly)
Stop when labels stabilize or max iterations reached
Time complexity: \(O(m)\) per iteration (linear in edges)
Pros/cons:
Pros: Very fast, scales to millions of nodes, no parameters to tune
Cons: Non-deterministic (order-dependent), lower quality than Louvain/Infomap, may not converge
Implementation note:
py3plex uses NetworkX’s label propagation for single-layer networks:
from py3plex.core import multinet
import networkx as nx
from networkx.algorithms.community import asyn_lpa_communities
from collections import defaultdict
# Load network
network = multinet.multi_layer_network(directed=False)
network.load_network(
"datasets/synthetic_multilayer.txt",
input_type="multiedgelist"
)
# Convert to NetworkX (flattened single-layer graph)
G = nx.Graph()
for edge in network.core_network.edges():
G.add_edge(edge[0], edge[1])
# Run label propagation
communities_list = asyn_lpa_communities(G, seed=42)
# Convert to dict format: node -> community_id
communities = {}
for comm_id, comm_nodes in enumerate(communities_list):
for node in comm_nodes:
communities[node] = comm_id
# Analyze results
from collections import Counter
comm_sizes = Counter(communities.values())
print(f"Number of communities: {len(comm_sizes)}")
print(f"Largest community: {max(comm_sizes.values())} nodes")
print(f"Average size: {sum(comm_sizes.values())/len(comm_sizes):.1f}")
# Run multiple times to check stability (LPA is order-dependent)
print("\nStability check (5 runs with different seeds):")
for run in range(5):
    comms_run = list(asyn_lpa_communities(G, seed=run))
    n_comms = len(comms_run)
    print(f"  Run {run+1}: {n_comms} communities")
Expected output:
Number of communities: 7
Largest community: 34 nodes
Average size: 17.1
Stability check (5 runs with different seeds):
Run 1: 7 communities
Run 2: 7 communities
Run 3: 7 communities
Run 4: 8 communities
Run 5: 7 communities
Layer-aware label propagation (custom implementation):
For multilayer networks, you can implement layer-aware label propagation:
import random
from collections import Counter
def multilayer_label_propagation(network, max_iter=100, seed=42):
"""
Layer-aware label propagation for multilayer networks.
Propagates labels within each layer independently.
"""
random.seed(seed)
# Initialize: each node-layer gets unique label
labels = {nl: i for i, nl in enumerate(network.get_nodes())}
# Get layer-specific edges
layer_edges = {}
for layer in network.get_layers():
layer_edges[layer] = [
(e[0], e[1]) for e in network.core_network.edges()
if e[0][1] == layer and e[1][1] == layer
]
# Iterate
for iteration in range(max_iter):
changed = False
nodes = list(labels.keys())
random.shuffle(nodes)
for node, layer in nodes:
# Get neighbors in same layer
neighbors = [
target for source, target in layer_edges.get(layer, [])
if source == (node, layer)
] + [
source for source, target in layer_edges.get(layer, [])
if target == (node, layer)
]
if not neighbors:
continue
# Count neighbor labels
neighbor_labels = [labels[n] for n in neighbors]
label_counts = Counter(neighbor_labels)
# Assign most common label (ties broken randomly)
most_common = label_counts.most_common()
max_count = most_common[0][1]
candidates = [lbl for lbl, cnt in most_common if cnt == max_count]
new_label = random.choice(candidates)
if new_label != labels[(node, layer)]:
labels[(node, layer)] = new_label
changed = True
if not changed:
print(f"Converged after {iteration+1} iterations")
break
# Renumber communities
unique_labels = sorted(set(labels.values()))
label_map = {old: new for new, old in enumerate(unique_labels)}
return {nl: label_map[lbl] for nl, lbl in labels.items()}
# Run custom implementation
communities = multilayer_label_propagation(network, max_iter=100, seed=42)
comm_sizes = Counter(communities.values())
print(f"\nLayer-aware label propagation:")
print(f" Communities: {len(comm_sizes)}")
print(f" Average size: {sum(comm_sizes.values())/len(comm_sizes):.1f}")
Expected output:
Converged after 23 iterations
Layer-aware label propagation:
Communities: 9
Average size: 13.3
When to use label propagation:
Very large networks (>100k nodes) where Louvain is too slow
Exploratory analysis where you need quick initial results
Streaming settings where you process edges incrementally
Not recommended for publication-quality results (use Louvain or Infomap instead)
Analyzing Community Structure
After detecting communities, you need to analyze and interpret the results. This section shows robust workflows for understanding community properties.
Count Nodes Per Community
Basic counting:
from collections import Counter
import numpy as np
# Assuming 'communities' is a dict: node -> community_id
comm_sizes = Counter(communities.values())
print(f"Total communities: {len(comm_sizes)}")
print(f"\nTop 10 largest communities:")
for comm_id, size in comm_sizes.most_common(10):
print(f" Community {comm_id}: {size} nodes")
# Size statistics
sizes = np.array(list(comm_sizes.values()))
print(f"\nSize distribution:")
print(f" Mean: {np.mean(sizes):.2f}")
print(f" Median: {np.median(sizes):.2f}")
print(f" Std dev: {np.std(sizes):.2f}")
print(f" Min: {np.min(sizes)}")
print(f" Max: {np.max(sizes)}")
print(f" Q1/Q3: {np.percentile(sizes, 25):.0f} / {np.percentile(sizes, 75):.0f}")
Expected output:
Total communities: 5
Top 10 largest communities:
Community 0: 45 nodes
Community 1: 38 nodes
Community 2: 22 nodes
Community 3: 10 nodes
Community 4: 5 nodes
Size distribution:
Mean: 24.00
Median: 22.00
Std dev: 15.48
Min: 5
Max: 45
Q1/Q3: 10 / 38
Layer coverage analysis (for multilayer networks):
from collections import defaultdict
# communities: {(node, layer): comm_id}
layer_coverage = defaultdict(lambda: defaultdict(set)) # comm -> layer -> nodes
for (node, layer), comm_id in communities.items():
layer_coverage[comm_id][layer].add(node)
print("Community layer coverage:")
for comm_id in sorted(layer_coverage.keys()):
layers = layer_coverage[comm_id]
total_size = sum(len(nodes) for nodes in layers.values())
print(f"\nCommunity {comm_id} (total: {total_size} node-layers):")
for layer, nodes in sorted(layers.items()):
print(f" {layer}: {len(nodes)} nodes")
# Cross-layer nodes (nodes appearing in multiple layers within same community)
all_nodes = set()
for nodes in layers.values():
all_nodes.update(nodes)
unique_nodes = len(all_nodes)
redundancy = total_size / unique_nodes if unique_nodes > 0 else 0
print(f" Unique nodes: {unique_nodes}, Redundancy: {redundancy:.2f}x")
Expected output:
Community layer coverage:
Community 0 (total: 45 node-layers):
layer1: 18 nodes
layer2: 15 nodes
layer3: 12 nodes
Unique nodes: 18, Redundancy: 2.50x
Community 1 (total: 38 node-layers):
layer1: 20 nodes
layer2: 18 nodes
Unique nodes: 20, Redundancy: 1.90x
Community 2 (total: 22 node-layers):
layer3: 22 nodes
Unique nodes: 22, Redundancy: 1.00x
Visualize Communities
Hairball plot with community colors:
from py3plex.visualization.multilayer import hairball_plot
import matplotlib.pyplot as plt
from py3plex.visualization.colors import colors_default
# Select top N communities to color
top_n = 8
top_communities = [c for c, _ in comm_sizes.most_common(top_n)]
# Create color mapping
color_map = dict(zip(
top_communities,
colors_default[:top_n]
))
# Assign colors to nodes
node_colors = []
for node in network.get_nodes():
comm_id = communities.get(node, -1)
if comm_id in color_map:
node_colors.append(color_map[comm_id])
else:
node_colors.append('lightgray') # Small communities
# Plot
plt.figure(figsize=(12, 10))
hairball_plot(
network.core_network,
color_list=node_colors,
layout_algorithm='force',
layout_parameters={'iterations': 500},
scale_by_size=True,
legend=False
)
plt.title('Community Structure (Top 8 Communities Colored)', fontsize=16)
plt.tight_layout()
plt.savefig('community_hairball.png', dpi=300, bbox_inches='tight')
plt.show()
print("Visualization saved to: community_hairball.png")
Size distribution histogram:
import matplotlib.pyplot as plt
import numpy as np
sizes = list(comm_sizes.values())
plt.figure(figsize=(10, 6))
plt.hist(sizes, bins=20, edgecolor='black', alpha=0.7)
plt.xlabel('Community Size (number of nodes)', fontsize=12)
plt.ylabel('Frequency', fontsize=12)
plt.title(f'Community Size Distribution (n={len(sizes)} communities)', fontsize=14)
plt.axvline(np.mean(sizes), color='red', linestyle='--', label=f'Mean: {np.mean(sizes):.1f}')
plt.axvline(np.median(sizes), color='blue', linestyle='--', label=f'Median: {np.median(sizes):.1f}')
plt.legend()
plt.grid(alpha=0.3)
plt.tight_layout()
plt.savefig('community_size_distribution.png', dpi=300)
plt.show()
Layer-specific visualization:
For multilayer networks, visualize community composition across layers:
import pandas as pd
import seaborn as sns
# Build matrix: communities × layers
layers = network.get_layers()
comm_ids = sorted(set(communities.values()))
matrix = np.zeros((len(comm_ids), len(layers)))
for (node, layer), comm_id in communities.items():
layer_idx = layers.index(layer)
comm_idx = comm_ids.index(comm_id)
matrix[comm_idx, layer_idx] += 1
# Heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(
matrix,
xticklabels=layers,
yticklabels=[f'C{i}' for i in comm_ids],
cmap='YlOrRd',
annot=True,
fmt='.0f',
cbar_kws={'label': 'Number of nodes'}
)
plt.xlabel('Layer', fontsize=12)
plt.ylabel('Community', fontsize=12)
plt.title('Community × Layer Composition Heatmap', fontsize=14)
plt.tight_layout()
plt.savefig('community_layer_heatmap.png', dpi=300)
plt.show()
Export Communities
CSV export (most common):
import pandas as pd
# Convert to DataFrame
data = []
for (node, layer), comm_id in communities.items():
data.append({
'node': node,
'layer': layer,
'community': comm_id
})
df = pd.DataFrame(data)
# Add community size
size_map = dict(comm_sizes)
df['community_size'] = df['community'].map(size_map)
# Sort by community, then layer, then node
df = df.sort_values(['community', 'layer', 'node'])
# Save
df.to_csv('communities.csv', index=False)
print(f"Exported {len(df)} node-layer assignments to communities.csv")
print(f"\nFirst few rows:")
print(df.head(10))
Expected output:
Exported 120 node-layer assignments to communities.csv
First few rows:
node layer community community_size
0 A1 layer1 0 45
1 A1 layer2 0 45
2 A1 layer3 0 45
3 A2 layer1 0 45
4 A2 layer2 0 45
5 B1 layer1 1 38
6 B1 layer2 1 38
7 B2 layer1 1 38
8 C1 layer3 2 22
9 C2 layer3 2 22
JSON export (for web apps):
import json
# Group by community
community_dict = defaultdict(list)
for (node, layer), comm_id in communities.items():
community_dict[str(comm_id)].append({
'node': node,
'layer': layer
})
# Add metadata
output = {
'num_communities': len(community_dict),
'num_nodes': len(set(node for node, _ in communities.keys())),
'num_layers': len(network.get_layers()),
'communities': dict(community_dict)
}
with open('communities.json', 'w') as f:
json.dump(output, f, indent=2)
print("Exported to communities.json")
Cytoscape format (for visualization):
# Node table
node_df = pd.DataFrame([
{
'node_id': f"{node}_{layer}",
'node': node,
'layer': layer,
'community': communities.get((node, layer), -1)
}
for node, layer in network.get_nodes()
])
node_df.to_csv('cytoscape_nodes.csv', index=False)
# Edge table
edge_data = []
for source, target in network.core_network.edges():
edge_data.append({
'source': f"{source[0]}_{source[1]}",
'target': f"{target[0]}_{target[1]}",
'source_community': communities.get(source, -1),
'target_community': communities.get(target, -1),
'is_intra_community': communities.get(source, -1) == communities.get(target, -1)
})
edge_df = pd.DataFrame(edge_data)
edge_df.to_csv('cytoscape_edges.csv', index=False)
print("Exported to cytoscape_nodes.csv and cytoscape_edges.csv")
print("Import these into Cytoscape for interactive visualization")
Query Communities with DSL
Goal: Use py3plex’s Domain-Specific Language (DSL) to query and analyze community-detected networks efficiently.
The DSL provides a declarative, SQL-like interface for querying multilayer networks. After detecting communities, you can use DSL queries to filter nodes by community membership, compute community-level statistics, and extract subnetworks.
Prerequisites:
Community detection results (e.g., from louvain_communities())
Familiarity with DSL basics (see How to Query Multilayer Graphs with the SQL-like DSL for the full tutorial)
DSL Basics for Communities
String Syntax - SQL-like queries:
from py3plex.core import multinet
from py3plex.algorithms.community_detection.community_wrapper import louvain_communities
from py3plex.dsl import execute_query
# Load network and detect communities
network = multinet.multi_layer_network(directed=False)
network.load_network(
"py3plex/datasets/_data/synthetic_multilayer.edges",
input_type="multiedgelist"
)
communities = louvain_communities(network)
# Attach community labels as node attributes
for (node, layer), comm_id in communities.items():
network.core_network.nodes[(node, layer)]['community'] = comm_id
# DSL Query: Find nodes in community 0
result = execute_query(
network,
'SELECT nodes WHERE community=0'
)
print(f"Nodes in community 0: {len(result)}")
for node in list(result)[:5]:
print(f" {node}")
Expected output:
Nodes in community 0: 18
('node1', 'layer1')
('node1', 'layer2')
('node2', 'layer1')
('node3', 'layer1')
('node3', 'layer3')
Builder API - Chainable operations:
from py3plex.dsl import Q, L
# Find high-degree nodes in a specific community
result = (
Q.nodes()
.where(community=0)
.compute("degree")
.where(degree__gt=5)
.order_by("degree", reverse=True)
.execute(network)
)
# Convert to pandas for analysis
import pandas as pd
df = pd.DataFrame([
{
'node': node[0],
'layer': node[1],
'degree': data['degree'],
'community': data.get('community', -1)
}
for node, data in result.items()
])
print("High-degree nodes in community 0:")
print(df.head(10))
Expected output:
High-degree nodes in community 0:
node layer degree community
0 node1 layer1 12 0
1 node1 layer2 10 0
2 node2 layer1 9 0
3 node5 layer1 8 0
4 node5 layer3 7 0
Community-Level Queries
Count nodes per community:
# Get all communities
community_ids = set(communities.values())
for comm_id in sorted(community_ids):
result = execute_query(
network,
f'SELECT nodes WHERE community={comm_id}'
)
print(f"Community {comm_id}: {len(result)} nodes")
Find inter-community edges:
from py3plex.dsl import Q
# Attach community labels to edges based on endpoint communities
for edge in network.core_network.edges():
source, target = edge
source_comm = communities.get(source, -1)
target_comm = communities.get(target, -1)
network.core_network.edges[edge]['source_community'] = source_comm
network.core_network.edges[edge]['target_community'] = target_comm
network.core_network.edges[edge]['is_intra_community'] = (source_comm == target_comm)
# Query inter-community edges
inter_comm_edges = (
Q.edges()
.where(is_intra_community=False)
.execute(network)
)
intra_comm_edges = (
Q.edges()
.where(is_intra_community=True)
.execute(network)
)
print(f"Intra-community edges: {len(intra_comm_edges)}")
print(f"Inter-community edges: {len(inter_comm_edges)}")
print(f"Ratio: {len(inter_comm_edges)/len(intra_comm_edges):.3f}")
Expected output:
Intra-community edges: 245
Inter-community edges: 39
Ratio: 0.159
Layer-Specific Community Queries
Find nodes in a specific community and layer:
from py3plex.dsl import Q, L
# Community 0 nodes in layer1 only
result = (
Q.nodes()
.from_layers(L["layer1"])
.where(community=0)
.compute("degree")
.execute(network)
)
print(f"Community 0 in layer1: {len(result)} nodes")
print(f"Average degree: {sum(d['degree'] for d in result.values())/len(result):.2f}")
Compare community structure across layers:
layers = network.get_layers()
for layer in layers:
# Count communities present in this layer
layer_nodes = (
Q.nodes()
.from_layers(L[layer])
.execute(network)
)
layer_communities = set(
communities.get(node, -1)
for node in layer_nodes
)
print(f"{layer}: {len(layer_communities)} communities, {len(layer_nodes)} nodes")
Expected output:
layer1: 5 communities, 40 nodes
layer2: 4 communities, 40 nodes
layer3: 3 communities, 40 nodes
Extract Community Subnetworks
Extract a single community as a subnetwork:
from py3plex.dsl import Q
# Extract community 0
comm_0_nodes = execute_query(
network,
'SELECT nodes WHERE community=0'
)
# Get induced subgraph
subgraph = network.core_network.subgraph(comm_0_nodes)
# Convert to new multilayer network
community_network = multinet.multi_layer_network(directed=False)
community_network.core_network = subgraph.copy()
print(f"Community 0 subnetwork:")
print(f" Nodes: {community_network.number_of_nodes()}")
print(f" Edges: {community_network.number_of_edges()}")
print(f" Layers: {community_network.get_layers()}")
Expected output:
Community 0 subnetwork:
Nodes: 18
Edges: 67
Layers: ['layer1', 'layer2', 'layer3']
Compute Community-Level Statistics
Average centrality per community:
from py3plex.dsl import Q
from collections import defaultdict
# Compute centrality for all nodes
result = (
Q.nodes()
.compute("betweenness_centrality", "degree")
.execute(network)
)
# Group by community
comm_stats = defaultdict(list)
for node, data in result.items():
comm_id = data.get('community', -1)
comm_stats[comm_id].append({
'degree': data['degree'],
'betweenness': data['betweenness_centrality']
})
# Calculate averages
print("Community-level statistics:")
print(f"{'Community':<12} {'Nodes':<8} {'Avg Degree':<12} {'Avg Betweenness':<18}")
print("-" * 50)
for comm_id in sorted(comm_stats.keys()):
stats = comm_stats[comm_id]
n_nodes = len(stats)
avg_degree = sum(s['degree'] for s in stats) / n_nodes
avg_betw = sum(s['betweenness'] for s in stats) / n_nodes
print(f"{comm_id:<12} {n_nodes:<8} {avg_degree:<12.2f} {avg_betw:<18.6f}")
Expected output:
Community-level statistics:
Community Nodes Avg Degree Avg Betweenness
--------------------------------------------------
0 18 7.44 0.012345
1 15 6.13 0.008234
2 12 5.25 0.005678
3 8 4.50 0.003456
4 7 3.86 0.002123
Complex DSL Workflows
Multi-step analysis: Find bridge nodes between communities:
from py3plex.dsl import Q
# Bridge nodes: high betweenness + connect multiple communities
# First, compute betweenness
result = (
Q.nodes()
.compute("betweenness_centrality", "degree")
.execute(network)
)
# Identify potential bridges (high betweenness)
bridges = [
(node, data['betweenness_centrality'])
for node, data in result.items()
if data['betweenness_centrality'] > 0.01 # Threshold
]
print(f"Potential bridge nodes (betweenness > 0.01): {len(bridges)}")
# For each bridge, check which communities its neighbors belong to
for node, betw in sorted(bridges, key=lambda x: x[1], reverse=True)[:5]:
# Get neighbors
neighbors = list(network.core_network.neighbors(node))
neighbor_comms = set(communities.get(n, -1) for n in neighbors)
print(f" {node}: betweenness={betw:.6f}, connects {len(neighbor_comms)} communities")
Expected output:
Potential bridge nodes (betweenness > 0.01): 12
('node7', 'layer1'): betweenness=0.045678, connects 3 communities
('node12', 'layer2'): betweenness=0.034567, connects 2 communities
('node3', 'layer1'): betweenness=0.023456, connects 3 communities
('node15', 'layer3'): betweenness=0.019876, connects 2 communities
('node8', 'layer2'): betweenness=0.015432, connects 2 communities
Temporal community analysis (for time-sliced networks):
from py3plex.dsl import Q, L
# Assuming layers represent time slices: t1, t2, t3
time_layers = ['t1', 't2', 't3']
# Track specific nodes across time
tracked_nodes = ['Alice', 'Bob', 'Carol']
print("Community membership over time:")
for node in tracked_nodes:
print(f"\n{node}:")
for t_layer in time_layers:
node_key = (node, t_layer)
comm_id = communities.get(node_key, None)
if comm_id is not None:
print(f" {t_layer}: Community {comm_id}")
else:
print(f" {t_layer}: Not present")
Why use DSL for community analysis?
Declarative: Express what you want, not how to compute it
Composable: Chain operations to build complex queries
Efficient: DSL optimizes query execution internally
Readable: SQL-like syntax is self-documenting
Interoperable: Results integrate seamlessly with pandas, NumPy, and visualization tools
Next steps with DSL:
Full DSL tutorial: How to Query Multilayer Graphs with the SQL-like DSL - Comprehensive guide with advanced patterns
Builder API reference: ../reference/dsl_api - Complete API documentation
Temporal queries: How to Query Multilayer Graphs with the SQL-like DSL (Temporal Queries section) - Time-varying networks
Compare Algorithms
Different algorithms optimize different objective functions and may produce different community structures. Comparing multiple algorithms helps validate findings and understand algorithm-specific biases.
Metrics for comparing partitions:
Adjusted Rand Index (ARI): Measures similarity adjusted for chance
Range: [-1, 1], where 1 = perfect agreement, 0 = random
Adjusted for cluster size imbalance
Normalized Mutual Information (NMI): Information-theoretic similarity
Range: [0, 1], where 1 = perfect agreement
Symmetric, handles different number of communities well
Variation of Information (VI): Distance metric (lower = more similar)
Range: [0, ∞], where 0 = identical partitions
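VI is not computed in the workflow below, but it is easy to add. A minimal sketch, assuming aligned label vectors and using scikit-learn's mutual information (in nats):
import numpy as np
from sklearn.metrics import mutual_info_score

def variation_of_information(labels1, labels2):
    """VI(X, Y) = H(X) + H(Y) - 2·I(X; Y); 0 means identical partitions."""
    def entropy(labels):
        _, counts = np.unique(labels, return_counts=True)
        probs = counts / counts.sum()
        return -np.sum(probs * np.log(probs))
    return entropy(labels1) + entropy(labels2) - 2 * mutual_info_score(labels1, labels2)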
Full comparison workflow:
from py3plex.core import multinet
from py3plex.algorithms.community_detection.community_wrapper import (
louvain_communities,
infomap_communities
)
from py3plex.algorithms.community_detection.multilayer_modularity import (
louvain_multilayer,
multilayer_modularity
)
from py3plex.algorithms.community_detection.community_louvain import modularity
import networkx as nx
from networkx.algorithms.community import asyn_lpa_communities
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score
from scipy.spatial.distance import jensenshannon
import numpy as np
from collections import Counter
# Load network
network = multinet.multi_layer_network(directed=False)
network.load_network(
"datasets/synthetic_multilayer.txt",
input_type="multiedgelist"
)
print("=" * 70)
print("COMMUNITY DETECTION ALGORITHM COMPARISON")
print("=" * 70)
# Run multiple algorithms
print("\n1. Running algorithms...")
# Louvain (flattened)
louvain_comms = louvain_communities(network)
# Multilayer Louvain (ω=1.0)
multilayer_comms = louvain_multilayer(
network, gamma=1.0, omega=1.0, random_state=42
)
# Label propagation (flattened NetworkX graph)
G = nx.Graph()
for edge in network.core_network.edges():
G.add_edge(edge[0], edge[1])
lpa_comms_list = asyn_lpa_communities(G, seed=42)
lpa_comms = {}
for comm_id, nodes in enumerate(lpa_comms_list):
for node in nodes:
lpa_comms[node] = comm_id
# (Optional) Infomap - skip if binary not available
try:
infomap_comms = infomap_communities(
network, binary="/usr/local/bin/infomap",
seed=42, verbose=False
)
has_infomap = True
except Exception:
has_infomap = False
print(" [SKIP] Infomap not available")
# Store results
algorithms = {
'Louvain (flat)': louvain_comms,
'Louvain (multilayer)': multilayer_comms,
'Label Propagation': lpa_comms,
}
if has_infomap:
algorithms['Infomap'] = infomap_comms
# 2. Basic statistics
print("\n2. Basic statistics:")
print(f"{'Algorithm':<25} {'#Comm':<10} {'Largest':<10} {'Avg Size':<10}")
print("-" * 70)
for name, comms in algorithms.items():
sizes = Counter(comms.values())
n_comms = len(sizes)
largest = max(sizes.values())
avg_size = sum(sizes.values()) / n_comms
print(f"{name:<25} {n_comms:<10} {largest:<10} {avg_size:<10.1f}")
# 3. Modularity scores
print("\n3. Modularity scores:")
print(f"{'Algorithm':<25} {'Modularity (Q)':<15}")
print("-" * 70)
for name, comms in algorithms.items():
if name == 'Louvain (multilayer)':
# Use multilayer modularity
Q = multilayer_modularity(network, comms, gamma=1.0, omega=1.0)
else:
# Use single-layer modularity on flattened graph
Q = modularity(comms, G, weight='weight')
print(f"{name:<25} {Q:<15.4f}")
# 4. Pairwise agreement
print("\n4. Pairwise agreement (ARI / NMI):")
# Align all partitions to same node set
alg_names = list(algorithms.keys())
alg_labels = {}
# Get common nodes (for multilayer, use node-layer pairs)
all_nodes = set()
for comms in algorithms.values():
all_nodes.update(comms.keys())
common_nodes = sorted(all_nodes)
# Convert to label vectors
for name, comms in algorithms.items():
alg_labels[name] = [comms.get(node, -1) for node in common_nodes]
# Compute pairwise metrics
print(f"\n{'Pair':<45} {'ARI':<10} {'NMI':<10}")
print("-" * 70)
for i in range(len(alg_names)):
for j in range(i+1, len(alg_names)):
name1, name2 = alg_names[i], alg_names[j]
labels1 = alg_labels[name1]
labels2 = alg_labels[name2]
ari = adjusted_rand_score(labels1, labels2)
nmi = normalized_mutual_info_score(labels1, labels2)
print(f"{name1} vs {name2:<25} {ari:<10.3f} {nmi:<10.3f}")
# 5. Size distribution comparison
print("\n5. Size distribution similarity:")
# Normalize size distributions
def normalize_sizes(comms):
sizes = list(Counter(comms.values()).values())
sizes_array = np.array(sorted(sizes, reverse=True))
# Pad to same length
max_len = max(len(Counter(c.values())) for c in algorithms.values())
padded = np.zeros(max_len)
padded[:len(sizes_array)] = sizes_array
return padded / padded.sum()
size_dists = {name: normalize_sizes(comms) for name, comms in algorithms.items()}
print(f"{'Pair':<45} {'JS Divergence':<15}")
print("-" * 70)
for i in range(len(alg_names)):
for j in range(i+1, len(alg_names)):
name1, name2 = alg_names[i], alg_names[j]
js_div = jensenshannon(size_dists[name1], size_dists[name2])
print(f"{name1} vs {name2:<25} {js_div:<15.4f}")
Expected output:
======================================================================
COMMUNITY DETECTION ALGORITHM COMPARISON
======================================================================
1. Running algorithms...
[SKIP] Infomap not available
2. Basic statistics:
Algorithm #Comm Largest Avg Size
----------------------------------------------------------------------
Louvain (flat) 5 45 24.0
Louvain (multilayer) 4 52 30.0
Label Propagation 7 38 17.1
3. Modularity scores:
Algorithm Modularity (Q)
----------------------------------------------------------------------
Louvain (flat) 0.4234
Louvain (multilayer) 0.4589
Label Propagation 0.3891
4. Pairwise agreement (ARI / NMI):
Pair ARI NMI
----------------------------------------------------------------------
Louvain (flat) vs Louvain (multilayer) 0.812 0.878
Louvain (flat) vs Label Propagation 0.623 0.745
Louvain (multilayer) vs Label Propagation 0.589 0.712
5. Size distribution similarity:
Pair JS Divergence
----------------------------------------------------------------------
Louvain (flat) vs Louvain (multilayer) 0.1234
Louvain (flat) vs Label Propagation 0.2456
Louvain (multilayer) vs Label Propagation 0.2789
Interpretation:
High ARI/NMI (>0.8): Algorithms agree strongly → robust communities
Medium ARI/NMI (0.5-0.8): Partial agreement → sensitive to algorithm choice
Low ARI/NMI (<0.5): Strong disagreement → no clear community structure or algorithm-specific artifacts
Consensus clustering:
When algorithms disagree, use consensus clustering to find stable communities:
from collections import defaultdict
# Build co-occurrence matrix: how often do pairs of nodes appear together?
co_occurrence = defaultdict(int)
n_algorithms = len(algorithms)
for comms in algorithms.values():
# For each community in this partition
comm_groups = defaultdict(list)
for node, comm_id in comms.items():
comm_groups[comm_id].append(node)
# Increment co-occurrence for all pairs in same community
for nodes in comm_groups.values():
for i, node1 in enumerate(nodes):
for node2 in nodes[i+1:]:
pair = tuple(sorted([node1, node2]))
co_occurrence[pair] += 1
# Threshold: keep pairs that co-occur in ≥50% of algorithms
threshold = n_algorithms * 0.5
stable_pairs = {pair for pair, count in co_occurrence.items() if count >= threshold}
print(f"\nConsensus clustering:")
print(f" Total node pairs: {len(co_occurrence)}")
print(f" Stable pairs (≥50% agreement): {len(stable_pairs)}")
print(f" Stability ratio: {len(stable_pairs)/len(co_occurrence):.2%}")
Expected output:
Consensus clustering:
Total node pairs: 1845
Stable pairs (≥50% agreement): 1234
Stability ratio: 66.88%
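To turn the stable pairs into an actual consensus partition, one simple option is to treat each stable pair as an edge and take connected components (a sketch, assuming stable_pairs from the code above; nodes in no stable pair become singletons and are omitted here):
import networkx as nx

G_consensus = nx.Graph()
G_consensus.add_edges_from(stable_pairs)

consensus = {}
for comm_id, component in enumerate(nx.connected_components(G_consensus)):
    for node in component:
        consensus[node] = comm_id

print(f"Consensus communities: {len(set(consensus.values()))}")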
Layer-Specific Communities
Motivation:
In multilayer networks, you may want to detect communities within individual layers and then compare them across layers. This reveals:
Layer-specific structure (e.g., friendship communities vs. work communities)
How community organization changes across contexts
Which communities are stable vs. layer-dependent
Workflow:
from py3plex.core import multinet
from py3plex.algorithms.community_detection.community_wrapper import louvain_communities
from py3plex.algorithms.community_detection.community_louvain import modularity
import networkx as nx
from collections import Counter
# Load multilayer network
network = multinet.multi_layer_network(directed=False)
network.load_network(
"datasets/synthetic_multilayer.txt",
input_type="multiedgelist"
)
print("LAYER-SPECIFIC COMMUNITY DETECTION")
print("=" * 70)
# Extract and analyze each layer separately
layer_communities = {}
layer_stats = {}
for layer in network.get_layers():
print(f"\n--- Layer: {layer} ---")
# Extract layer-specific edges
layer_edges = [
(e[0][0], e[1][0]) # (node, node) without layer info
for e in network.core_network.edges()
if e[0][1] == layer and e[1][1] == layer
]
# Build single-layer graph
G_layer = nx.Graph()
G_layer.add_edges_from(layer_edges)
print(f" Nodes: {G_layer.number_of_nodes()}")
print(f" Edges: {G_layer.number_of_edges()}")
if G_layer.number_of_edges() == 0:
print(f" [SKIP] No edges in this layer")
continue
# Run Louvain on this layer
communities = louvain_communities(G_layer)
layer_communities[layer] = communities
# Statistics
comm_sizes = Counter(communities.values())
n_comms = len(comm_sizes)
Q = modularity(communities, G_layer, weight='weight')
layer_stats[layer] = {
'n_communities': n_comms,
'modularity': Q,
'sizes': comm_sizes
}
print(f" Communities: {n_comms}")
print(f" Modularity: {Q:.4f}")
print(f" Largest community: {max(comm_sizes.values())} nodes")
print(f" Average size: {sum(comm_sizes.values())/n_comms:.1f}")
Expected output:
LAYER-SPECIFIC COMMUNITY DETECTION
======================================================================
--- Layer: layer1 ---
Nodes: 40
Edges: 95
Communities: 4
Modularity: 0.4123
Largest community: 15 nodes
Average size: 10.0
--- Layer: layer2 ---
Nodes: 40
Edges: 102
Communities: 5
Modularity: 0.3876
Largest community: 12 nodes
Average size: 8.0
--- Layer: layer3 ---
Nodes: 40
Edges: 87
Communities: 3
Modularity: 0.4456
Largest community: 18 nodes
Average size: 13.3
Cross-layer stability analysis:
Check how consistently nodes are grouped across layers:
from sklearn.metrics import normalized_mutual_info_score
import pandas as pd
# Build node-level community assignments per layer
node_layer_assignments = {}
all_nodes = set()
for layer, communities in layer_communities.items():
for node, comm_id in communities.items():
if node not in node_layer_assignments:
node_layer_assignments[node] = {}
node_layer_assignments[node][layer] = comm_id
all_nodes.add(node)
# For each node, check consistency across layers
print("\n" + "=" * 70)
print("CROSS-LAYER STABILITY")
print("=" * 70)
layers = list(layer_communities.keys())
# Pairwise NMI between layers
print(f"\nPairwise NMI between layers:")
print(f"{'Layer Pair':<30} {'NMI':<10} {'Interpretation'}")
print("-" * 70)
for i in range(len(layers)):
for j in range(i+1, len(layers)):
layer1, layer2 = layers[i], layers[j]
# Get common nodes
nodes1 = set(layer_communities[layer1].keys())
nodes2 = set(layer_communities[layer2].keys())
common = nodes1 & nodes2
if not common:
continue
# Compute NMI
labels1 = [layer_communities[layer1][n] for n in common]
labels2 = [layer_communities[layer2][n] for n in common]
nmi = normalized_mutual_info_score(labels1, labels2)
# Interpret
if nmi > 0.8:
interp = "Very similar"
elif nmi > 0.5:
interp = "Moderately similar"
else:
interp = "Different"
print(f"{layer1} vs {layer2:<20} {nmi:<10.3f} {interp}")
# Node-level stability score
print(f"\nNode-level stability:")
node_stability = []
for node in sorted(all_nodes):
assignments = node_layer_assignments.get(node, {})
# How many layers does this node appear in?
n_layers = len(assignments)
if n_layers < 2:
continue
# Are the community IDs consistent?
# (This is a simplified measure - in reality, IDs may differ but structure may be same)
comm_ids = list(assignments.values())
is_stable = len(set(comm_ids)) == 1 # All same community ID
node_stability.append({
'node': node,
'n_layers': n_layers,
'is_stable': is_stable,
'assignments': assignments
})
stable_nodes = sum(1 for s in node_stability if s['is_stable'])
print(f" Nodes appearing in ≥2 layers: {len(node_stability)}")
print(f" Stable nodes (same community ID): {stable_nodes}")
print(f" Stability rate: {stable_nodes/len(node_stability)*100:.1f}%")
# Example unstable nodes
print(f"\n Example unstable nodes:")
unstable = [s for s in node_stability if not s['is_stable']][:5]
for item in unstable:
print(f" {item['node']}: {item['assignments']}")
Expected output:
======================================================================
CROSS-LAYER STABILITY
======================================================================
Pairwise NMI between layers:
Layer Pair NMI Interpretation
----------------------------------------------------------------------
layer1 vs layer2 0.723 Moderately similar
layer1 vs layer3 0.456 Different
layer2 vs layer3 0.512 Moderately similar
Node-level stability:
Nodes appearing in ≥2 layers: 40
Stable nodes (same community ID): 18
Stability rate: 45.0%
Example unstable nodes:
A5: {'layer1': 0, 'layer2': 1, 'layer3': 0}
B12: {'layer1': 2, 'layer2': 3}
C3: {'layer1': 1, 'layer2': 0, 'layer3': 2}
D7: {'layer1': 0, 'layer2': 2}
E9: {'layer1': 3, 'layer2': 1, 'layer3': 1}
Visualization - Alluvial diagram:
Show how community membership flows across layers (requires external tools or manual construction):
import pandas as pd
# Export data for alluvial diagram (use R ggalluvial or similar)
alluvial_data = []
for node in all_nodes:
assignments = node_layer_assignments.get(node, {})
if len(assignments) >= 2:
row = {'node': node}
for layer in layers:
row[f'comm_{layer}'] = assignments.get(layer, -1)
alluvial_data.append(row)
df_alluvial = pd.DataFrame(alluvial_data)
df_alluvial.to_csv('alluvial_data.csv', index=False)
print("\nExported alluvial_data.csv for visualization in R/Python")
print("Example R code:")
print(" library(ggalluvial)")
print(" ggplot(data, aes(axis1=comm_layer1, axis2=comm_layer2, axis3=comm_layer3)) +")
print(" geom_alluvium(aes(fill=node)) + geom_stratum()")
When to use layer-specific detection:
Exploratory analysis: Understand layer-specific structure before multilayer methods
Heterogeneous layers: Layers represent fundamentally different relationships (e.g., co-authorship vs. citation)
Baseline comparison: Compare layer-specific vs. multilayer results to quantify benefit of multilayer methods
Dynamic networks: Detect communities in temporal snapshots and track evolution
Cross-Layer Community Analysis
Motivation:
After detecting communities in the full multilayer network, you want to understand:
Do communities span multiple layers?
Which layers contribute most to each community?
Are there inter-layer bridges (nodes connecting different layer-specific communities)?
Community × Layer composition:
from py3plex.core import multinet
from py3plex.algorithms.community_detection.multilayer_modularity import louvain_multilayer
from collections import defaultdict
import numpy as np
import pandas as pd
# Load network and detect communities
network = multinet.multi_layer_network(directed=False)
network.load_network(
"datasets/synthetic_multilayer.txt",
input_type="multiedgelist"
)
communities = louvain_multilayer(network, gamma=1.0, omega=1.0, random_state=42)
print("CROSS-LAYER COMMUNITY ANALYSIS")
print("=" * 70)
# Build composition matrix: community × layer
layers = network.get_layers()
comm_ids = sorted(set(communities.values()))
composition = defaultdict(lambda: defaultdict(int))
for (node, layer), comm_id in communities.items():
composition[comm_id][layer] += 1
# Convert to DataFrame for easier manipulation
data = []
for comm_id in comm_ids:
row = {'community': comm_id}
for layer in layers:
row[layer] = composition[comm_id][layer]
row['total'] = sum(composition[comm_id].values())
data.append(row)
df_comp = pd.DataFrame(data)
print("\nCommunity × Layer composition:")
print(df_comp.to_string(index=False))
# Calculate layer entropy for each community
print("\n" + "-" * 70)
print("Community layer diversity (entropy):")
print(f"{'Community':<12} {'Entropy':<10} {'Interpretation'}")
print("-" * 70)
for comm_id in comm_ids:
# Calculate entropy: H = -Σ p_i log2(p_i)
counts = [composition[comm_id][layer] for layer in layers]
total = sum(counts)
if total == 0:
continue
probs = np.array(counts) / total
probs = probs[probs > 0] # Remove zeros
entropy = -np.sum(probs * np.log2(probs))
max_entropy = np.log2(len(layers)) # Maximum possible entropy
normalized_entropy = entropy / max_entropy if max_entropy > 0 else 0
# Interpret
if normalized_entropy > 0.9:
interp = "Highly dispersed (spans all layers)"
elif normalized_entropy > 0.5:
interp = "Moderately dispersed (multi-layer)"
else:
interp = "Concentrated (layer-specific)"
print(f"C{comm_id:<11} {entropy:<10.3f} {interp}")
Expected output:
CROSS-LAYER COMMUNITY ANALYSIS
======================================================================
Community × Layer composition:
community layer1 layer2 layer3 total
0 15 14 16 45
1 18 20 0 38
2 0 0 22 22
3 7 6 2 15
----------------------------------------------------------------------
Community layer diversity (entropy):
Community Entropy Interpretation
----------------------------------------------------------------------
C0 1.585 Highly dispersed (spans all layers)
C1 0.997 Moderately dispersed (multi-layer)
C2 0.000 Concentrated (layer-specific)
C3 1.429 Highly dispersed (spans all layers)
Inter-layer bridges:
Identify nodes that connect different communities across layers:
print("\n" + "=" * 70)
print("INTER-LAYER BRIDGE ANALYSIS")
print("=" * 70)
# For each node, check if it belongs to different communities in different layers
node_communities = defaultdict(dict) # node -> layer -> comm_id
for (node, layer), comm_id in communities.items():
node_communities[node][layer] = comm_id
# Identify bridge nodes
bridge_nodes = []
for node, layer_comms in node_communities.items():
if len(layer_comms) < 2:
continue
# Check if community IDs differ across layers
comm_ids = set(layer_comms.values())
if len(comm_ids) > 1:
bridge_nodes.append({
'node': node,
'n_layers': len(layer_comms),
'n_communities': len(comm_ids),
'assignments': dict(layer_comms)
})
print(f"\nBridge nodes (spanning multiple communities across layers):")
print(f" Total nodes: {len(node_communities)}")
print(f" Bridge nodes: {len(bridge_nodes)} ({len(bridge_nodes)/len(node_communities)*100:.1f}%)")
# Show examples
print(f"\n Top 10 bridge nodes:")
print(f" {'Node':<15} {'Layers':<10} {'Communities':<15} {'Assignments'}")
print(" " + "-" * 65)
bridge_nodes_sorted = sorted(bridge_nodes, key=lambda x: x['n_communities'], reverse=True)
for item in bridge_nodes_sorted[:10]:
node = item['node']
n_layers = item['n_layers']
n_comms = item['n_communities']
assignments = ', '.join([f"{l}:C{c}" for l, c in sorted(item['assignments'].items())])
print(f" {str(node):<15} {n_layers:<10} {n_comms:<15} {assignments}")
Expected output:
======================================================================
INTER-LAYER BRIDGE ANALYSIS
======================================================================
Bridge nodes (spanning multiple communities across layers):
Total nodes: 40
Bridge nodes: 12 (30.0%)
Top 10 bridge nodes:
Node Layers Communities Assignments
-----------------------------------------------------------------
A5 3 3 layer1:C0, layer2:C1, layer3:C2
B12 3 2 layer1:C0, layer2:C1, layer3:C1
C3 3 2 layer1:C1, layer2:C0, layer3:C0
D7 2 2 layer1:C0, layer2:C3
E9 3 2 layer1:C3, layer2:C1, layer3:C1
F4 2 2 layer1:C1, layer2:C0
G8 3 2 layer1:C0, layer2:C0, layer3:C2
H2 2 2 layer1:C3, layer2:C0
I6 3 2 layer1:C1, layer2:C1, layer3:C2
J11 2 2 layer1:C0, layer2:C1
Community connectivity graph:
Build a meta-graph where nodes are communities and edges represent inter-layer bridges:
import networkx as nx
import matplotlib.pyplot as plt
# Build community connectivity graph
G_meta = nx.Graph()
# Add community nodes
for comm_id in comm_ids:
G_meta.add_node(f"C{comm_id}")
# Add edges for bridge nodes
for item in bridge_nodes:
comms = sorted(set(item['assignments'].values()))  # distinct communities only (avoids self-loops)
# Connect all pairs of communities this node bridges
for i in range(len(comms)):
for j in range(i+1, len(comms)):
c1, c2 = f"C{comms[i]}", f"C{comms[j]}"
if G_meta.has_edge(c1, c2):
G_meta[c1][c2]['weight'] += 1
else:
G_meta.add_edge(c1, c2, weight=1)
print(f"\n" + "=" * 70)
print("COMMUNITY CONNECTIVITY")
print("=" * 70)
print(f"\nCommunity-level connectivity:")
print(f" Communities: {G_meta.number_of_nodes()}")
print(f" Inter-community bridges: {G_meta.number_of_edges()}")
if G_meta.number_of_edges() > 0:
print(f"\n Strongest bridges (top 5):")
edges_sorted = sorted(G_meta.edges(data=True), key=lambda x: x[2]['weight'], reverse=True)
for c1, c2, data in edges_sorted[:5]:
print(f" {c1} ↔ {c2}: {data['weight']} bridge nodes")
# Visualize meta-graph
plt.figure(figsize=(8, 8))
pos = nx.spring_layout(G_meta, seed=42)
# Edge widths proportional to weight
weights = [G_meta[u][v]['weight'] for u, v in G_meta.edges()]
max_weight = max(weights) if weights else 1
edge_widths = [3 * w / max_weight for w in weights]
nx.draw_networkx_nodes(G_meta, pos, node_size=800, node_color='lightblue')
nx.draw_networkx_labels(G_meta, pos, font_size=12, font_weight='bold')
nx.draw_networkx_edges(G_meta, pos, width=edge_widths, alpha=0.6)
# Edge labels
edge_labels = {(u, v): f"{G_meta[u][v]['weight']}" for u, v in G_meta.edges()}
nx.draw_networkx_edge_labels(G_meta, pos, edge_labels, font_size=8)
plt.title('Community Connectivity Meta-Graph\n(Edge width = number of bridge nodes)', fontsize=14)
plt.axis('off')
plt.tight_layout()
plt.savefig('community_connectivity.png', dpi=300, bbox_inches='tight')
plt.show()
print(f"\n Visualization saved to: community_connectivity.png")
Expected output:
======================================================================
COMMUNITY CONNECTIVITY
======================================================================
Community-level connectivity:
Communities: 4
Inter-community bridges: 5
Strongest bridges (top 5):
C0 ↔ C1: 5 bridge nodes
C1 ↔ C2: 3 bridge nodes
C0 ↔ C3: 2 bridge nodes
C1 ↔ C3: 1 bridge nodes
C0 ↔ C2: 1 bridge nodes
Visualization saved to: community_connectivity.png
Use cases:
Biological networks: Proteins bridging functional modules across different interaction types
Social networks: Individuals connecting different social circles across contexts
Transportation: Transfer hubs connecting regional clusters across transport modes
Quality Metrics
Why quality metrics matter:
Quality metrics help you:
Compare algorithms objectively
Tune parameters (e.g., choosing optimal \(\omega\) in multilayer Louvain)
Validate results (high Q suggests real structure, not random fluctuations)
Detect overfitting (too many tiny communities = over-segmentation)
Compute Modularity
Single-layer modularity:
For flattened networks, use the Newman-Girvan modularity:
from py3plex.core import multinet
from py3plex.algorithms.community_detection.community_wrapper import louvain_communities
from py3plex.algorithms.community_detection.community_louvain import modularity
import networkx as nx
# Load network
network = multinet.multi_layer_network(directed=False)
network.load_network(
"datasets/synthetic_multilayer.txt",
input_type="multiedgelist"
)
# Detect communities
communities = louvain_communities(network)
# Convert to NetworkX for modularity calculation
G = nx.Graph()
for edge in network.core_network.edges():
G.add_edge(edge[0], edge[1])
# Calculate modularity
Q = modularity(communities, G, weight='weight')
print(f"Modularity Q: {Q:.4f}")
# Interpretation
if Q > 0.7:
print(" Interpretation: Excellent community structure")
elif Q > 0.4:
print(" Interpretation: Strong community structure")
elif Q > 0.2:
print(" Interpretation: Moderate community structure")
else:
print(" Interpretation: Weak or no community structure")
Expected output:
Modularity Q: 0.4234
Interpretation: Strong community structure
Multilayer modularity:
For multilayer networks, use the generalized modularity that accounts for inter-layer coupling:
from py3plex.algorithms.community_detection.multilayer_modularity import (
louvain_multilayer,
multilayer_modularity
)
# Run multilayer Louvain
communities = louvain_multilayer(
network,
gamma=1.0,
omega=1.0,
random_state=42
)
# Calculate multilayer modularity
Q_multi = multilayer_modularity(
network,
communities,
gamma=1.0,
omega=1.0
)
print(f"Multilayer modularity Q: {Q_multi:.4f}")
Expected output:
Multilayer modularity Q: 0.4589
Modularity resolution:
Modularity has a resolution limit (Fortunato & Barthélemy, 2007): optimization can fail to resolve communities whose internal edge count falls below roughly \(\sqrt{m/2}\), where \(m\) is the total number of edges, merging them into larger ones. The resolution parameter \(\gamma\) can help:
# Test different resolution parameters
print("Modularity vs. resolution:")
print(f"{'γ':<10} {'#Comm':<10} {'Q':<10} {'Avg Size':<10}")
print("-" * 45)
for gamma in [0.5, 1.0, 1.5, 2.0]:
comms = louvain_multilayer(
network, gamma=gamma, omega=1.0, random_state=42
)
n_comms = len(set(comms.values()))
Q = multilayer_modularity(network, comms, gamma=gamma, omega=1.0)
avg_size = len(comms) / n_comms
print(f"{gamma:<10.1f} {n_comms:<10} {Q:<10.4f} {avg_size:<10.1f}")
Expected output:
Modularity vs. resolution:
γ #Comm Q Avg Size
---------------------------------------------
0.5 3 0.3456 40.0
1.0 5 0.4589 24.0
1.5 8 0.4123 15.0
2.0 12 0.3678 10.0
Interpretation:
Lower γ: Fewer, larger communities (under-segmentation)
Higher γ: More, smaller communities (over-segmentation)
Optimal γ: Maximum Q, but check that the communities are meaningful (see the selection sketch below)
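Building on the sweep above, here is a minimal selection sketch (reusing louvain_multilayer and multilayer_modularity from the imports above; the candidate γ grid is an arbitrary choice):
# Pick gamma by maximum Q, then inspect the winning partition manually:
# Q alone can favor partitions that are not substantively meaningful.
q_by_gamma = {}
for gamma in [0.5, 1.0, 1.5, 2.0]:
    comms = louvain_multilayer(network, gamma=gamma, omega=1.0, random_state=42)
    q_by_gamma[gamma] = multilayer_modularity(network, comms, gamma=gamma, omega=1.0)
best_gamma = max(q_by_gamma, key=q_by_gamma.get)
print(f"Best gamma by Q: {best_gamma} (Q = {q_by_gamma[best_gamma]:.4f})")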
Additional Quality Metrics
1. Coverage (fraction of edges within communities):
def calculate_coverage(network, communities):
"""Fraction of edges within communities."""
intra_edges = 0
total_edges = 0
for source, target in network.core_network.edges():
total_edges += 1
if communities.get(source) == communities.get(target):
intra_edges += 1
return intra_edges / total_edges if total_edges > 0 else 0
coverage = calculate_coverage(network, communities)
print(f"Coverage: {coverage:.4f} (fraction of intra-community edges)")
Expected output:
Coverage: 0.8234 (fraction of intra-community edges)
2. Performance (combines intra-community edges and inter-community non-edges):
def calculate_performance(network, communities):
"""Performance metric (Fortunato 2010)."""
nodes = list(communities.keys())
n = len(nodes)
# Count intra-community edges and inter-community non-edges
intra_edges = 0
inter_non_edges = 0
total_pairs = 0
for i in range(len(nodes)):
for j in range(i+1, len(nodes)):
node1, node2 = nodes[i], nodes[j]
same_community = communities[node1] == communities[node2]
is_edge = network.core_network.has_edge(node1, node2)
if same_community and is_edge:
intra_edges += 1
elif not same_community and not is_edge:
inter_non_edges += 1
total_pairs += 1
return (intra_edges + inter_non_edges) / total_pairs if total_pairs > 0 else 0
performance = calculate_performance(network, communities)
print(f"Performance: {performance:.4f}")
Expected output:
Performance: 0.7456
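The exact pair loop above scales as O(n²) and becomes impractical beyond a few thousand nodes. A hedged sketch of a Monte Carlo estimate (approximate_performance is a hypothetical helper, not a py3plex function):
import random

def approximate_performance(network, communities, n_samples=100000, seed=42):
    """Monte Carlo estimate of the performance metric for large networks."""
    rng = random.Random(seed)
    nodes = list(communities.keys())
    hits = 0
    for _ in range(n_samples):
        u, v = rng.sample(nodes, 2)  # sample a random node pair
        same = communities[u] == communities[v]
        edge = network.core_network.has_edge(u, v)
        # A pair counts as "correct" if it is intra-community and linked,
        # or inter-community and unlinked
        if (same and edge) or (not same and not edge):
            hits += 1
    return hits / n_samples

print(f"Approx. performance: {approximate_performance(network, communities):.4f}")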
3. Conductance (quality of community boundaries):
def calculate_conductance(network, communities, comm_id):
"""Conductance of a specific community (lower is better)."""
comm_nodes = [n for n, c in communities.items() if c == comm_id]
if not comm_nodes:
return None
# Count edges
internal_edges = 0
boundary_edges = 0
for node in comm_nodes:
neighbors = list(network.core_network.neighbors(node))
for neighbor in neighbors:
if communities.get(neighbor) == comm_id:
internal_edges += 0.5 # Count each edge once
else:
boundary_edges += 1
volume = internal_edges * 2 + boundary_edges # Volume of the community
return boundary_edges / volume if volume > 0 else 0
# Calculate for all communities
print("\nConductance per community (lower = better defined):")
for comm_id in sorted(set(communities.values())):
cond = calculate_conductance(network, communities, comm_id)
if cond is not None:
print(f" Community {comm_id}: {cond:.4f}")
Expected output:
Conductance per community (lower = better defined):
Community 0: 0.1234
Community 1: 0.2456
Community 2: 0.0987
Community 3: 0.3123
Community 4: 0.1789
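To compare whole partitions rather than individual communities, you can average conductance across communities (a simple unweighted mean, sketched here; weighting by community size is a reasonable alternative):
conds = [calculate_conductance(network, communities, c)
         for c in sorted(set(communities.values()))]
conds = [c for c in conds if c is not None]
print(f"Mean conductance: {sum(conds) / len(conds):.4f}")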
4. Null model comparison (compare to random partitions):
import random
import numpy as np
# Calculate Q for real partition
Q_real = multilayer_modularity(network, communities, gamma=1.0, omega=1.0)
# Generate random partitions and calculate Q
nodes = list(communities.keys())
n_communities = len(set(communities.values()))
Q_random = []
for trial in range(100):
# Random partition with same number of communities
random_comms = {node: random.randint(0, n_communities-1) for node in nodes}
Q_rand = multilayer_modularity(network, random_comms, gamma=1.0, omega=1.0)
Q_random.append(Q_rand)
Q_rand_mean = np.mean(Q_random)
Q_rand_std = np.std(Q_random)
z_score = (Q_real - Q_rand_mean) / Q_rand_std if Q_rand_std > 0 else 0
print(f"\nNull model comparison:")
print(f" Real Q: {Q_real:.4f}")
print(f" Random Q (mean ± std): {Q_rand_mean:.4f} ± {Q_rand_std:.4f}")
print(f" Z-score: {z_score:.2f}")
if z_score > 3:
print(f" Interpretation: Highly significant (real structure)")
elif z_score > 2:
print(f" Interpretation: Significant (likely real structure)")
else:
print(f" Interpretation: Not significant (could be random)")
Expected output:
Null model comparison:
Real Q: 0.4589
Random Q (mean ± std): 0.0023 ± 0.0145
Z-score: 31.49
Interpretation: Highly significant (real structure)
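The uniform random partition above ignores the size distribution of the detected communities, which can make the z-score look more impressive than it is. A stricter null, sketched here, permutes the detected labels so that community sizes are preserved exactly:
import random
import numpy as np

# Label-permutation null: same community sizes, randomized node assignment
labels = list(communities.values())
Q_perm = []
for trial in range(100):
    random.shuffle(labels)
    permuted = dict(zip(communities.keys(), labels))
    Q_perm.append(multilayer_modularity(network, permuted, gamma=1.0, omega=1.0))
print(f"Permutation null Q (mean ± std): {np.mean(Q_perm):.4f} ± {np.std(Q_perm):.4f}")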
Summary of metrics:
Modularity (Q): Overall quality, general-purpose
Coverage: Simple interpretability (% internal edges)
Performance: Balances true positives and true negatives
Conductance: Community boundary quality (per-community)
Null model: Statistical significance test
Recommendation: Always report modularity + at least one other metric to get a complete picture.
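Following that recommendation, the helpers from this section can be bundled into a single summary (community_report is a hypothetical convenience function, not part of py3plex; it reuses multilayer_modularity and calculate_coverage defined above):
def community_report(network, communities, gamma=1.0, omega=1.0):
    """One-line quality summary combining modularity and coverage."""
    Q = multilayer_modularity(network, communities, gamma=gamma, omega=omega)
    cov = calculate_coverage(network, communities)
    n_comms = len(set(communities.values()))
    print(f"Q = {Q:.4f} | coverage = {cov:.4f} | {n_comms} communities")

community_report(network, communities)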
CLI Cross-Reference (Optional)
py3plex provides command-line tools for quick community detection without writing Python code.
Basic usage:
# Detect communities using Louvain (default algorithm)
py3plex community datasets/network.edgelist \
--algorithm louvain \
--output communities.json
# Using Infomap (requires Infomap binary installed)
py3plex community datasets/network.edgelist \
--algorithm infomap \
--output communities.json
# Using Label Propagation (fast for large networks)
py3plex community datasets/network.edgelist \
--algorithm label_prop \
--output communities.json
# With custom resolution parameter for Louvain
py3plex community datasets/network.edgelist \
--algorithm louvain \
--resolution 1.5 \
--output communities.json
Available algorithms:
louvain: Fast Louvain method (default) - optimizes modularity on the flattened network
infomap: Infomap algorithm - requires the Infomap binary (https://www.mapequation.org/infomap/)
label_prop: Label propagation - very fast, suitable for large networks
Output format:
The CLI outputs JSON files with structure:
{
"algorithm": "louvain",
"num_communities": 5,
"communities": {
"node1": 0,
"node2": 0,
"node3": 1,
...
},
"community_sizes": {
"0": 42,
"1": 27,
...
}
}
Note on multilayer networks:
The current CLI community command operates on flattened networks. For multilayer-specific community detection (with inter-layer coupling), use the Python API with louvain_multilayer() as shown in the examples above. Future CLI versions may add multilayer support.
Viewing results:
After running the CLI command, you can either print statistics straight to the console or analyze the JSON output in Python (see the loading sketch below):
# View community statistics
py3plex community network.edgelist --algorithm louvain
# Output printed to console if no --output specified
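For example, a minimal sketch that reads back the file produced by --output communities.json, assuming the JSON structure documented above:
import json
from collections import Counter

with open("communities.json") as fh:
    result = json.load(fh)

print(f"Algorithm: {result['algorithm']}")
print(f"Communities: {result['num_communities']}")
# Recompute sizes from the assignments as a sanity check
sizes = Counter(result["communities"].values())
print("Largest communities:", sizes.most_common(3))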
For full CLI documentation, see the Command-Line Interface (CLI) Tutorial.
Next Steps
Further reading:
Algorithms: Algorithm Landscape - Deep dive into community detection theory
Visualization: How to Visualize Multilayer Networks - Advanced community visualization techniques
Benchmark: ../tutorials/benchmark_communities - Compare with ground-truth communities
Temporal analysis: ../tutorials/temporal_communities - Track community evolution over time
Recommended workflows:
Exploratory: Start with Louvain → visualize → if unsatisfied, try multilayer Louvain or Infomap
Publication: Run multiple algorithms → compare → report consensus + metrics
Large-scale: Use label propagation for initial exploration → refine with Louvain on filtered subgraph
Temporal: Detect communities in snapshots → track with NMI (see the sketch below) → visualize with alluvial diagrams
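For the temporal workflow, a sketch of snapshot-to-snapshot tracking with NMI; this assumes scikit-learn is installed, and partition_nmi is a hypothetical helper:
from sklearn.metrics import normalized_mutual_info_score

def partition_nmi(comm_t0, comm_t1):
    """NMI between two partitions, restricted to nodes present in both snapshots."""
    shared = [n for n in comm_t0 if n in comm_t1]
    labels_t0 = [comm_t0[n] for n in shared]
    labels_t1 = [comm_t1[n] for n in shared]
    return normalized_mutual_info_score(labels_t0, labels_t1)

# NMI near 1 indicates stable communities; a sharp drop flags reorganization.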
Common pitfalls:
Resolution limit: Modularity cannot resolve communities with fewer than roughly \(\sqrt{m}\) internal edges
Non-determinism: Many algorithms are stochastic; always set random seeds for reproducibility
Overfitting: Too many tiny communities suggests over-segmentation; try lower resolution
Layer coupling: For multilayer networks, always try multiple \(\omega\) values (see the sweep sketch below)
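A sketch of such an ω sweep, mirroring the γ sweep from the Quality Metrics section (reusing louvain_multilayer and multilayer_modularity; the ω grid is an arbitrary choice):
print(f"{'ω':<10} {'#Comm':<10} {'Q':<10}")
for omega in [0.1, 0.5, 1.0, 2.0]:
    comms = louvain_multilayer(network, gamma=1.0, omega=omega, random_state=42)
    Q = multilayer_modularity(network, comms, gamma=1.0, omega=omega)
    print(f"{omega:<10.1f} {len(set(comms.values())):<10} {Q:<10.4f}")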
Community detection checklist:
[ ] Run at least 2 different algorithms
[ ] Calculate modularity and at least one other quality metric
[ ] Visualize size distribution to check for over/under-segmentation
[ ] Compare with null model to ensure statistical significance
[ ] For multilayer: test multiple \(\omega\) values
[ ] Export results to CSV for downstream analysis (see the export sketch below)
[ ] Document random seeds for reproducibility
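A sketch of the CSV export step, assuming community keys are (node, layer) tuples as in the earlier output samples:
import csv

with open("communities.csv", "w", newline="") as fh:
    writer = csv.writer(fh)
    writer.writerow(["node", "layer", "community"])
    for key, comm_id in communities.items():
        node, layer = key  # assumes (node, layer) keys
        writer.writerow([node, layer, comm_id])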
Questions?
GitHub Issues: https://github.com/SkBlaz/py3plex/issues
Documentation: https://skblaz.github.io/py3plex/
Examples:
examples/communities/ directory in the repository
Key References:
Blondel, V. D., Guillaume, J. L., Lambiotte, R., & Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10), P10008.
Mucha, P. J., Richardson, T., Macon, K., Porter, M. A., & Onnela, J. P. (2010). Community structure in time-dependent, multiscale, and multiplex networks. Science, 328(5980), 876-878.
Fortunato, S. (2010). Community detection in graphs. Physics Reports, 486(3-5), 75-174.
Rosvall, M., & Bergstrom, C. T. (2008). Maps of random walks on complex networks reveal community structure. Proceedings of the National Academy of Sciences, 105(4), 1118-1123.
Raghavan, U. N., Albert, R., & Kumara, S. (2007). Near linear time algorithm to detect community structures in large-scale networks. Physical Review E, 76(3), 036106.