Network Decomposition Tutorial
This tutorial demonstrates network decomposition techniques for heterogeneous and multilayer networks using py3plex.
Overview
Network decomposition transforms complex heterogeneous networks into structured feature representations that can be used for:
Node classification
Link prediction
Network comparison
Feature extraction for machine learning
py3plex’s HINMINE module provides methods for decomposing heterogeneous information networks (HINs) based on:
Meta-paths - Typed paths connecting node pairs
Cycles - Closed walks of specific types
Structural patterns - Subgraph patterns and motifs
What is Network Decomposition?
Heterogeneous Information Networks
A heterogeneous information network (HIN) contains multiple node types and edge types.
Example - Academic Network:
Node types: Authors, Papers, Venues
Edge types: Author-Paper (writes), Paper-Venue (published_in), Paper-Paper (cites)
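To make this concrete, here is a minimal sketch of building such a network in py3plex. The dict-based add_edges input (source/target plus their types) follows py3plex's documented multinet conventions; the node names and the exact edge set are illustrative.

from py3plex.core import multinet

# A tiny academic HIN: two authors, two papers, one venue (illustrative names)
network = multinet.multi_layer_network()
network.add_edges([
    {"source": "alice", "source_type": "author",
     "target": "p1", "target_type": "paper", "type": "writes"},
    {"source": "bob", "source_type": "author",
     "target": "p1", "target_type": "paper", "type": "writes"},
    {"source": "p1", "source_type": "paper",
     "target": "kdd", "target_type": "venue", "type": "published_in"},
    {"source": "p1", "source_type": "paper",
     "target": "p2", "target_type": "paper", "type": "cites"},
], input_type="dict")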
Meta-paths
A meta-path is a sequence of node and edge types defining a composite relation.
Examples:
Author → Paper → Author (co-authorship)
Author → Paper → Venue → Paper → Author (same venue)
Paper → Paper → Paper (citation path of length 2)
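Meta-path instance counts have a simple linear-algebra reading: multiplying the typed adjacency (commuting) matrices along the path gives, for each node pair, the number of path instances. A minimal NumPy sketch for the Author → Paper → Author path, using a hypothetical two-author, three-paper incidence matrix:

import numpy as np

# A_ap[i, j] = 1 if author i wrote paper j (toy data)
A_ap = np.array([
    [1, 1, 0],   # author 0 wrote papers 0 and 1
    [0, 1, 1],   # author 1 wrote papers 1 and 2
])

# Author-Paper-Author counts: walk out along writes, back along its transpose
M_apa = A_ap @ A_ap.T
print(M_apa)
# [[2 1]
#  [1 2]] -> authors 0 and 1 share exactly one paper (paper 1)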
HINMINE Decomposition
Basic Usage
from py3plex.core.HINMINE import decomposition
from py3plex.core import multinet
# Load heterogeneous network
network = multinet.multi_layer_network()
network.load_network("academic_network.graphml", input_type="graphml")
# Perform decomposition
decomposer = decomposition.HINMINE()
features = decomposer.run_decomposition(
network,
target_node_type='author',
max_path_length=3
)
# Features is a matrix: (num_nodes, num_meta_paths)
print(f"Feature matrix shape: {features.shape}")
Parameters
features = decomposer.run_decomposition(
network,
target_node_type='author', # Node type to extract features for
max_path_length=3, # Maximum meta-path length
include_cycles=True, # Include closed walks
normalize=True # Normalize features
)
Meta-Path Extraction
Enumerate Meta-Paths
# Get all meta-paths up to length 3
meta_paths = decomposer.enumerate_meta_paths(
network,
max_length=3,
node_types=['author', 'paper', 'venue']
)
print(f"Found {len(meta_paths)} meta-paths")
for mp in meta_paths[:5]:
print(mp)
Example output:
author-paper-author
author-paper-venue-paper-author
paper-paper
paper-author-paper
venue-paper-venue
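Under the hood, enumeration like this is a walk over the network schema (the graph of node types and allowed transitions). A minimal, py3plex-independent sketch, assuming the schema is given as a plain dict of type adjacencies; the helper name is hypothetical:

# Allowed type transitions for the academic schema above
schema = {
    "author": ["paper"],
    "paper": ["author", "paper", "venue"],
    "venue": ["paper"],
}

def enumerate_type_paths(schema, start, max_length):
    """Yield type sequences with 1..max_length edges, starting at `start`."""
    stack = [[start]]
    while stack:
        path = stack.pop()
        if len(path) > 1:            # at least one edge -> a valid meta-path
            yield path
        if len(path) <= max_length:  # room to extend by one more edge
            for next_type in schema[path[-1]]:
                stack.append(path + [next_type])

for path in enumerate_type_paths(schema, "author", max_length=3):
    print("-".join(path))
# author-paper, author-paper-author, author-paper-venue, ...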
Compute Meta-Path Instances
Count instances of specific meta-paths:
# Define a meta-path
meta_path = ['author', 'paper', 'author']
# Count instances for each node
instances = decomposer.compute_meta_path_instances(
network,
meta_path,
source_nodes=['author1', 'author2']
)
for node, count in instances.items():
print(f"{node}: {count} instances")
Cycle Enumeration
Extract Cycles
Find closed walks (cycles) in the network:
# Enumerate all cycles up to length 4
cycles = decomposer.enumerate_cycles(
network,
max_length=4,
starting_node_type='author'
)
print(f"Found {len(cycles)} cycle types")
Common Cycles
Examples of meaningful cycles in academic networks:
Triangle: Paper → cites → Paper → cites → Paper → cites → (back to the first paper)
Square: Author → Paper → Venue → Paper → Author (the same author publishes two papers in one venue)
Collaboration cycle: Author → Paper → Author → Paper → Author (two co-authors share two papers, returning to the first author)
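As a quick sanity check on cycle counts, recall that the i-th diagonal entry of A^k counts closed walks of length k starting and ending at node i. The sketch below uses a plain, untyped adjacency matrix; in an HIN the same idea applies to products of the typed adjacency matrices.

import numpy as np

# Adjacency matrix of a triangle on three nodes
A = np.array([
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
])

A3 = np.linalg.matrix_power(A, 3)
print(np.diag(A3))
# [2 2 2]: each node closes the triangle in two directions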
Feature Matrix Construction
Build Classification Features
# Build feature matrix for author classification
features = decomposer.build_feature_matrix(
network,
target_nodes=['author1', 'author2', 'author3'],
meta_paths=[
['author', 'paper', 'author'], # Co-authorship
['author', 'paper', 'venue'], # Publishing venues
['author', 'paper', 'paper'], # Citation patterns
],
normalize=True
)
# Features shape: (3, 3) for 3 nodes, 3 meta-paths
print(features)
Normalization Options
# L1 normalization (sum to 1)
features_l1 = decomposer.normalize_features(features, method='l1')
# L2 normalization (unit length)
features_l2 = decomposer.normalize_features(features, method='l2')
# Min-max scaling (0-1 range)
features_minmax = decomposer.normalize_features(features, method='minmax')
# Z-score standardization (mean 0, std 1)
features_zscore = decomposer.normalize_features(features, method='zscore')
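If you prefer to stay outside the decomposer, all four options have short NumPy equivalents. The sketch assumes `features` is a dense (num_nodes, num_meta_paths) array and, for brevity, omits guards against all-zero rows or constant columns:

import numpy as np

# Row-wise: each node's feature vector sums to 1 / has unit length
l1 = features / np.abs(features).sum(axis=1, keepdims=True)
l2 = features / np.linalg.norm(features, axis=1, keepdims=True)

# Column-wise: each meta-path feature is rescaled across nodes
minmax = (features - features.min(axis=0)) / (features.max(axis=0) - features.min(axis=0))
zscore = (features - features.mean(axis=0)) / features.std(axis=0)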
Node Classification Example
Complete Workflow
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score
# 1. Load network with node labels
network = multinet.multi_layer_network()
network.load_network("labeled_network.graphml", input_type="graphml")
# 2. Extract features via decomposition
decomposer = decomposition.HINMINE()
features = decomposer.run_decomposition(
network,
target_node_type='author',
max_path_length=3
)
# 3. Get labels
labels = network.labels # Assuming labels are stored in network
# 4. Train-test split
X_train, X_test, y_train, y_test = train_test_split(
features, labels, test_size=0.3, random_state=42
)
# 5. Train classifier
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
# 6. Evaluate
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred, average='weighted')
print(f"Accuracy: {accuracy:.3f}")
print(f"F1-score: {f1:.3f}")
Feature Importance
Analyze which meta-paths are most informative:
import matplotlib.pyplot as plt
# Get feature importances from trained model
importances = clf.feature_importances_
meta_path_names = decomposer.get_meta_path_names()
# Plot
plt.figure(figsize=(10, 6))
plt.barh(range(len(importances)), importances)
plt.yticks(range(len(importances)), meta_path_names)
plt.xlabel('Feature Importance')
plt.title('Meta-Path Importance for Classification')
plt.tight_layout()
plt.show()
Link Prediction Example
Predict Missing Edges
from sklearn.linear_model import LogisticRegression
import numpy as np
import random
# 1. Sample node pairs (positive and negative examples)
positive_pairs = list(network.core_network.edges())  # actual edges
negative_pairs = []                                  # non-edges, sampled below
nodes = list(network.get_nodes())
while len(negative_pairs) < len(positive_pairs):
    u, v = random.sample(nodes, 2)
    if not network.core_network.has_edge(u, v):
        negative_pairs.append((u, v))
# 2. Extract features for node pairs
# Map nodes to feature-matrix rows; assumes rows follow network.get_nodes() order
node_to_idx = {node: idx for idx, node in enumerate(nodes)}
def pair_features(u, v, features, node_to_idx):
    idx_u = node_to_idx[u]
    idx_v = node_to_idx[v]
    # Hadamard product; concatenation or difference are common alternatives
    return features[idx_u] * features[idx_v]
# Build dataset
X_pos = [pair_features(u, v, features, node_to_idx) for u, v in positive_pairs]
X_neg = [pair_features(u, v, features, node_to_idx) for u, v in negative_pairs]
X = np.array(X_pos + X_neg)
y = [1] * len(X_pos) + [0] * len(X_neg)
# 3. Train-test split and classify
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
clf = LogisticRegression()
clf.fit(X_train, y_train)
# 4. Evaluate
accuracy = clf.score(X_test, y_test)
print(f"Link prediction accuracy: {accuracy:.3f}")
Advanced Decomposition
Custom Meta-Paths
Define domain-specific meta-paths:
# Academic domain
academic_meta_paths = [
['author', 'paper', 'author'], # Co-authorship
['author', 'paper', 'venue', 'paper', 'author'], # Same venue
['author', 'paper', 'paper', 'author'], # Citation-based
['venue', 'paper', 'author', 'paper', 'venue'], # Venue similarity
]
features = decomposer.build_feature_matrix(
    network,
    target_nodes=author_nodes,  # list of author node identifiers, defined elsewhere
    meta_paths=academic_meta_paths
)
Weighted Meta-Paths
Assign importance weights to different meta-paths:
# Define meta-paths with weights
meta_path_weights = {
'author-paper-author': 2.0, # High importance
'author-paper-venue': 1.0, # Normal importance
'author-paper-paper-author': 0.5 # Lower importance
}
# Apply weights to features
weighted_features = decomposer.apply_weights(features, meta_path_weights)
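If apply_weights is not available in your version, the same effect is a per-column rescaling. The sketch assumes the columns of `features` are ordered like a hypothetical `meta_path_names` list:

import numpy as np

# Hypothetical column order matching the feature matrix
meta_path_names = ['author-paper-author',
                   'author-paper-venue',
                   'author-paper-paper-author']
weights = np.array([meta_path_weights[name] for name in meta_path_names])

weighted_features = features * weights  # scale each column by its weight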
Temporal Decomposition
For temporal networks with timestamps:
# Decompose network at different time slices
time_slices = [2018, 2019, 2020, 2021]
temporal_features = []
for year in time_slices:
# Filter network to specific time period
subnetwork = network.filter_by_time(year)
# Decompose
features = decomposer.run_decomposition(subnetwork, 'author')
temporal_features.append(features)
# Stack temporal features
import numpy as np
full_features = np.hstack(temporal_features)
Performance Optimization
Caching Results
HINMINE automatically caches decomposition results:
# First run: computes and caches
features1 = decomposer.run_decomposition(network, 'author')
# Second run: loads from cache (much faster)
features2 = decomposer.run_decomposition(network, 'author')
The cache is stored in .{md5_hash} files, where the hash is derived from the network content.
Parallel Processing
For large networks, process meta-paths in parallel:
from joblib import Parallel, delayed
# Compute meta-path instances in parallel
results = Parallel(n_jobs=-1)(
delayed(decomposer.compute_meta_path_instances)(
network, mp, target_nodes
)
for mp in meta_paths
)
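Each call above returns a per-node count dict (as compute_meta_path_instances does earlier in this tutorial); assembling the results into a feature matrix is then a matter of fixing a node order, sketched here under that assumption:

import numpy as np

# One row per target node, one column per meta-path
feature_matrix = np.array([
    [counts[node] for counts in results]
    for node in target_nodes
])
print(feature_matrix.shape)  # (num_nodes, num_meta_paths)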
Sparse Representations
Use sparse matrices for large feature spaces:
from scipy.sparse import csr_matrix
# Convert to sparse format
sparse_features = csr_matrix(features)
# Use with scikit-learn
from sklearn.svm import LinearSVC
clf = LinearSVC()
clf.fit(sparse_features, labels)
Best Practices
Meta-Path Selection
Start simple - Begin with short meta-paths (length 2-3)
Domain knowledge - Use meaningful paths for your domain
Feature importance - Analyze which paths are most informative
Avoid redundancy - Remove highly correlated meta-paths (see the pruning sketch below)
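A simple way to act on the redundancy point: greedily keep only feature columns whose absolute Pearson correlation with every already-kept column stays below a threshold. A sketch, assuming a dense `features` array with more than one row and no constant columns:

import numpy as np

corr = np.corrcoef(features, rowvar=False)  # meta-path x meta-path correlations
threshold = 0.95

kept = []
for j in range(corr.shape[0]):
    if all(abs(corr[j, k]) < threshold for k in kept):
        kept.append(j)

pruned_features = features[:, kept]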
Normalization
L1 normalization - Good for interpretability (sum to 1)
L2 normalization - Good for distance-based methods
Min-max scaling - Good for neural networks
Z-score - Good for algorithms sensitive to scale
Validation
Always validate decomposition quality:
Feature coverage - Check that features capture relevant patterns
Downstream performance - Evaluate on actual task (classification, prediction)
Interpretability - Verify features are meaningful
Computational cost - Balance accuracy vs. efficiency
References
HINMINE:
Kralj, J., Robnik-Šikonja, M., & Lavrač, N. (2018). HINMINE: heterogeneous information network mining with information retrieval heuristics. Journal of Intelligent Information Systems, 50(1), 29-61.
Meta-paths:
Sun, Y., Han, J., Yan, X., Yu, P. S., & Wu, T. (2011). PathSim: Meta path-based top-k similarity search in heterogeneous information networks. VLDB, 4(11), 992-1003.
HIN Classification:
Shi, C., Li, Y., Zhang, J., Sun, Y., & Yu, P. S. (2017). A survey of heterogeneous information network analysis. IEEE TKDE, 29(1), 17-37.
Next Steps
Community Detection Tutorial
Multilayer Network Centrality Measures
Algorithm Selection Guide
Example code: examples/example_decomposition_and_classification.py