Network Decomposition Tutorial
This tutorial demonstrates network decomposition techniques for heterogeneous and multilayer networks using py3plex.
Overview
Network decomposition transforms complex heterogeneous networks into structured feature representations that can be used for:
Node classification
Link prediction
Network comparison
Feature extraction for machine learning
py3plex’s HINMINE module provides methods for decomposing heterogeneous information networks (HINs) based on:
Meta-paths - Typed paths connecting node pairs
Cycles - Closed walks of specific types
Structural patterns - Subgraph patterns and motifs
What is Network Decomposition?
Heterogeneous Information Networks
A heterogeneous information network (HIN) contains multiple node types and edge types.
Example - Academic Network:
Node types: Authors, Papers, Venues
Edge types: Author-Paper (writes), Paper-Venue (published_in), Paper-Paper (cites)
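To make this concrete, here is a minimal sketch of building such a network in py3plex. The dict-based add_edges input (source/target plus their types) follows py3plex's documented multinet conventions; the node names and the exact edge set are illustrative.

from py3plex.core import multinet

# A tiny academic HIN: two authors, two papers, one venue (illustrative names)
network = multinet.multi_layer_network()
network.add_edges([
    {"source": "alice", "source_type": "author",
     "target": "p1", "target_type": "paper", "type": "writes"},
    {"source": "bob", "source_type": "author",
     "target": "p1", "target_type": "paper", "type": "writes"},
    {"source": "p1", "source_type": "paper",
     "target": "kdd", "target_type": "venue", "type": "published_in"},
    {"source": "p1", "source_type": "paper",
     "target": "p2", "target_type": "paper", "type": "cites"},
], input_type="dict")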
Meta-paths
A meta-path is a sequence of node and edge types defining a composite relation.
Examples:
Author → Paper → Author (co-authorship)
Author → Paper → Venue → Paper → Author (same venue)
Paper → Paper → Paper (citation path of length 2)
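Meta-path instance counts have a simple linear-algebra reading: multiplying the typed adjacency (commuting) matrices along the path gives, for each node pair, the number of path instances. A minimal NumPy sketch for the Author → Paper → Author path, using a hypothetical two-author, three-paper incidence matrix:

import numpy as np

# A_ap[i, j] = 1 if author i wrote paper j (toy data)
A_ap = np.array([
    [1, 1, 0],   # author 0 wrote papers 0 and 1
    [0, 1, 1],   # author 1 wrote papers 1 and 2
])

# Author-Paper-Author counts: walk out along writes, back along its transpose
M_apa = A_ap @ A_ap.T
print(M_apa)
# [[2 1]
#  [1 2]] -> authors 0 and 1 share exactly one paper (paper 1)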
HINMINE Decomposition
Basic Usage
from py3plex.core.HINMINE import decomposition
from py3plex.core import multinet
# Load heterogeneous network
network = multinet.multi_layer_network()
network.load_network("academic_network.graphml", input_type="graphml")
# Perform decomposition
decomposer = decomposition.HINMINE()
features = decomposer.run_decomposition(
network,
target_node_type='author',
max_path_length=3
)
# Features is a matrix: (num_nodes, num_meta_paths)
print(f"Feature matrix shape: {features.shape}")
Parameters
features = decomposer.run_decomposition(
network,
target_node_type='author', # Node type to extract features for
max_path_length=3, # Maximum meta-path length
include_cycles=True, # Include closed walks
normalize=True # Normalize features
)
Meta-Path Extraction
Enumerate Meta-Paths
# Get all meta-paths up to length 3
meta_paths = decomposer.enumerate_meta_paths(
network,
max_length=3,
node_types=['author', 'paper', 'venue']
)
print(f"Found {len(meta_paths)} meta-paths")
for mp in meta_paths[:5]:
print(mp)
Example output:
author-paper-author
author-paper-venue-paper-author
paper-paper
paper-author-paper
venue-paper-venue
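Under the hood, enumeration like this is a walk over the network schema (the graph of node types and allowed transitions). A minimal, py3plex-independent sketch, assuming the schema is given as a plain dict of type adjacencies; the helper name is hypothetical:

# Allowed type transitions for the academic schema above
schema = {
    "author": ["paper"],
    "paper": ["author", "paper", "venue"],
    "venue": ["paper"],
}

def enumerate_type_paths(schema, start, max_length):
    """Yield type sequences with 1..max_length edges, starting at `start`."""
    stack = [[start]]
    while stack:
        path = stack.pop()
        if len(path) > 1:            # at least one edge -> a valid meta-path
            yield path
        if len(path) <= max_length:  # room to extend by one more edge
            for next_type in schema[path[-1]]:
                stack.append(path + [next_type])

for path in enumerate_type_paths(schema, "author", max_length=3):
    print("-".join(path))
# author-paper, author-paper-author, author-paper-venue, ...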
Compute Meta-Path Instances
Count instances of specific meta-paths:
# Define a meta-path
meta_path = ['author', 'paper', 'author']
# Count instances for each node
instances = decomposer.compute_meta_path_instances(
network,
meta_path,
source_nodes=['author1', 'author2']
)
for node, count in instances.items():
print(f"{node}: {count} instances")
Cycle Enumeration
Extract Cycles
Find closed walks (cycles) in the network:
# Enumerate all cycles up to length 4
cycles = decomposer.enumerate_cycles(
network,
max_length=4,
starting_node_type='author'
)
print(f"Found {len(cycles)} cycle types")
Common Cycles
Examples of meaningful cycles in academic networks:
Triangle: Paper → cites → Paper → cites → Paper → cites → (back to the first paper)
Square: Author → Paper → Venue → Paper → Author (the same author publishes two papers in one venue)
Collaboration cycle: Author → Paper → Author → Paper → Author (two co-authors share two papers, returning to the first author)
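As a quick sanity check on cycle counts, recall that the i-th diagonal entry of A^k counts closed walks of length k starting and ending at node i. The sketch below uses a plain, untyped adjacency matrix; in an HIN the same idea applies to products of the typed adjacency matrices.

import numpy as np

# Adjacency matrix of a triangle on three nodes
A = np.array([
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
])

A3 = np.linalg.matrix_power(A, 3)
print(np.diag(A3))
# [2 2 2]: each node closes the triangle in two directions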
Feature Matrix Construction
Build Classification Features
# Build feature matrix for author classification
features = decomposer.build_feature_matrix(
network,
target_nodes=['author1', 'author2', 'author3'],
meta_paths=[
['author', 'paper', 'author'], # Co-authorship
['author', 'paper', 'venue'], # Publishing venues
['author', 'paper', 'paper'], # Citation patterns
],
normalize=True
)
# Features shape: (3, 3) for 3 nodes, 3 meta-paths
print(features)
Normalization Options
# L1 normalization (sum to 1)
features_l1 = decomposer.normalize_features(features, method='l1')
# L2 normalization (unit length)
features_l2 = decomposer.normalize_features(features, method='l2')
# Min-max scaling (0-1 range)
features_minmax = decomposer.normalize_features(features, method='minmax')
# Z-score standardization (mean 0, std 1)
features_zscore = decomposer.normalize_features(features, method='zscore')
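If you prefer to stay outside the decomposer, all four options have short NumPy equivalents. The sketch assumes `features` is a dense (num_nodes, num_meta_paths) array and, for brevity, omits guards against all-zero rows or constant columns:

import numpy as np

# Row-wise: each node's feature vector sums to 1 / has unit length
l1 = features / np.abs(features).sum(axis=1, keepdims=True)
l2 = features / np.linalg.norm(features, axis=1, keepdims=True)

# Column-wise: each meta-path feature is rescaled across nodes
minmax = (features - features.min(axis=0)) / (features.max(axis=0) - features.min(axis=0))
zscore = (features - features.mean(axis=0)) / features.std(axis=0)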
Node Classification Example
Complete Workflow
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score
# 1. Load network with node labels
network = multinet.multi_layer_network()
network.load_network("labeled_network.graphml", input_type="graphml")
# 2. Extract features via decomposition
decomposer = decomposition.HINMINE()
features = decomposer.run_decomposition(
network,
target_node_type='author',
max_path_length=3
)
# 3. Get labels
labels = network.labels # Assuming labels are stored in network
# 4. Train-test split
X_train, X_test, y_train, y_test = train_test_split(
features, labels, test_size=0.3, random_state=42
)
# 5. Train classifier
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
# 6. Evaluate
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred, average='weighted')
print(f"Accuracy: {accuracy:.3f}")
print(f"F1-score: {f1:.3f}")
Feature Importance
Analyze which meta-paths are most informative:
import matplotlib.pyplot as plt
# Get feature importances from trained model
importances = clf.feature_importances_
meta_path_names = decomposer.get_meta_path_names()
# Plot
plt.figure(figsize=(10, 6))
plt.barh(range(len(importances)), importances)
plt.yticks(range(len(importances)), meta_path_names)
plt.xlabel('Feature Importance')
plt.title('Meta-Path Importance for Classification')
plt.tight_layout()
plt.show()
Link Prediction Example
Predict Missing Edges
from sklearn.linear_model import LogisticRegression
import numpy as np
import random
# 1. Sample node pairs (positive and negative examples)
positive_pairs = list(network.core_network.edges())  # actual edges
negative_pairs = []                                  # non-edges, sampled below
nodes = list(network.get_nodes())
while len(negative_pairs) < len(positive_pairs):
    u, v = random.sample(nodes, 2)
    if not network.core_network.has_edge(u, v):
        negative_pairs.append((u, v))
# 2. Extract features for node pairs
# Map nodes to feature-matrix rows; assumes rows follow network.get_nodes() order
node_to_idx = {node: idx for idx, node in enumerate(nodes)}
def pair_features(u, v, features, node_to_idx):
    idx_u = node_to_idx[u]
    idx_v = node_to_idx[v]
    # Hadamard product; concatenation or difference are common alternatives
    return features[idx_u] * features[idx_v]
# Build dataset
X_pos = [pair_features(u, v, features, node_to_idx) for u, v in positive_pairs]
X_neg = [pair_features(u, v, features, node_to_idx) for u, v in negative_pairs]
X = np.array(X_pos + X_neg)
y = [1] * len(X_pos) + [0] * len(X_neg)
# 3. Train-test split and classify
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
clf = LogisticRegression()
clf.fit(X_train, y_train)
# 4. Evaluate
accuracy = clf.score(X_test, y_test)
print(f"Link prediction accuracy: {accuracy:.3f}")
Advanced Decomposition
Custom Meta-Paths
Define domain-specific meta-paths:
# Academic domain
academic_meta_paths = [
['author', 'paper', 'author'], # Co-authorship
['author', 'paper', 'venue', 'paper', 'author'], # Same venue
['author', 'paper', 'paper', 'author'], # Citation-based
['venue', 'paper', 'author', 'paper', 'venue'], # Venue similarity
]
features = decomposer.build_feature_matrix(
    network,
    target_nodes=author_nodes,  # list of author node identifiers, defined elsewhere
    meta_paths=academic_meta_paths
)
Weighted Meta-Paths
Assign importance weights to different meta-paths:
# Define meta-paths with weights
meta_path_weights = {
'author-paper-author': 2.0, # High importance
'author-paper-venue': 1.0, # Normal importance
'author-paper-paper-author': 0.5 # Lower importance
}
# Apply weights to features
weighted_features = decomposer.apply_weights(features, meta_path_weights)
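If apply_weights is not available in your version, the same effect is a per-column rescaling. The sketch assumes the columns of `features` are ordered like a hypothetical `meta_path_names` list:

import numpy as np

# Hypothetical column order matching the feature matrix
meta_path_names = ['author-paper-author',
                   'author-paper-venue',
                   'author-paper-paper-author']
weights = np.array([meta_path_weights[name] for name in meta_path_names])

weighted_features = features * weights  # scale each column by its weight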
Temporal Decomposition
For temporal networks with timestamps:
# Decompose network at different time slices
time_slices = [2018, 2019, 2020, 2021]
temporal_features = []
for year in time_slices:
# Filter network to specific time period
subnetwork = network.filter_by_time(year)
# Decompose
features = decomposer.run_decomposition(subnetwork, 'author')
temporal_features.append(features)
# Stack temporal features
import numpy as np
full_features = np.hstack(temporal_features)
Performance Optimization
Caching Results
HINMINE automatically caches decomposition results:
# First run: computes and caches
features1 = decomposer.run_decomposition(network, 'author')
# Second run: loads from cache (much faster)
features2 = decomposer.run_decomposition(network, 'author')
The cache is stored in .{md5_hash} files, where the hash is derived from the network content.
Parallel Processing
For large networks, process meta-paths in parallel:
from joblib import Parallel, delayed
# Compute meta-path instances in parallel
results = Parallel(n_jobs=-1)(
delayed(decomposer.compute_meta_path_instances)(
network, mp, target_nodes
)
for mp in meta_paths
)
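Each call above returns a per-node count dict (as compute_meta_path_instances does earlier in this tutorial); assembling the results into a feature matrix is then a matter of fixing a node order, sketched here under that assumption:

import numpy as np

# One row per target node, one column per meta-path
feature_matrix = np.array([
    [counts[node] for counts in results]
    for node in target_nodes
])
print(feature_matrix.shape)  # (num_nodes, num_meta_paths)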
Sparse Representations
Use sparse matrices for large feature spaces:
from scipy.sparse import csr_matrix
# Convert to sparse format
sparse_features = csr_matrix(features)
# Use with scikit-learn
from sklearn.svm import LinearSVC
clf = LinearSVC()
clf.fit(sparse_features, labels)
Best Practices
Meta-Path Selection
Start simple - Begin with short meta-paths (length 2-3)
Domain knowledge - Use meaningful paths for your domain
Feature importance - Analyze which paths are most informative
Avoid redundancy - Remove highly correlated meta-paths (see the pruning sketch below)
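A simple way to act on the redundancy point: greedily keep only feature columns whose absolute Pearson correlation with every already-kept column stays below a threshold. A sketch, assuming a dense `features` array with more than one row and no constant columns:

import numpy as np

corr = np.corrcoef(features, rowvar=False)  # meta-path x meta-path correlations
threshold = 0.95

kept = []
for j in range(corr.shape[0]):
    if all(abs(corr[j, k]) < threshold for k in kept):
        kept.append(j)

pruned_features = features[:, kept]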
Normalization
L1 normalization - Good for interpretability (sum to 1)
L2 normalization - Good for distance-based methods
Min-max scaling - Good for neural networks
Z-score - Good for algorithms sensitive to scale
Validation
Always validate decomposition quality:
Feature coverage - Check that features capture relevant patterns
Downstream performance - Evaluate on actual task (classification, prediction)
Interpretability - Verify features are meaningful
Computational cost - Balance accuracy vs. efficiency
References
HINMINE:
Kralj, J., Robnik-Šikonja, M., & Lavrač, N. (2018). HINMINE: heterogeneous information network mining with information retrieval heuristics. Journal of Intelligent Information Systems, 50(1), 29-61.
Meta-paths:
Sun, Y., Han, J., Yan, X., Yu, P. S., & Wu, T. (2011). PathSim: Meta path-based top-k similarity search in heterogeneous information networks. VLDB, 4(11), 992-1003.
HIN Classification:
Shi, C., Li, Y., Zhang, J., Sun, Y., & Yu, P. S. (2017). A survey of heterogeneous information network analysis. IEEE TKDE, 29(1), 17-37.
Next Steps
Community Detection Tutorial
Multilayer Network Centrality Measures
Algorithm Selection Guide
Example code: examples/example_decomposition_and_classification.py