Network Decomposition Tutorial

This tutorial demonstrates network decomposition techniques for heterogeneous and multilayer networks using py3plex.

Overview

Network decomposition transforms complex heterogeneous networks into structured feature representations that can be used for:

  • Node classification

  • Link prediction

  • Network comparison

  • Feature extraction for machine learning

py3plex’s HINMINE module provides methods for decomposing heterogeneous information networks (HINs) based on:

  • Meta-paths - Typed paths connecting node pairs

  • Cycles - Closed walks over specific node-type sequences

  • Structural patterns - Subgraph patterns and motifs

What is Network Decomposition?

Heterogeneous Information Networks

A heterogeneous information network (HIN) contains multiple node types and edge types.

Example - Academic Network:

  • Node types: Authors, Papers, Venues

  • Edge types: Author-Paper (writes), Paper-Venue (published_in), Paper-Paper (cites)
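
To make the example concrete, here is a minimal sketch that builds such a network directly with networkx (which py3plex wraps internally). The node names and the node_type/edge_type attribute keys are illustrative choices, not a py3plex convention:

import networkx as nx

# Toy academic HIN: three node types (author, paper, venue)
# and three edge types (writes, published_in, cites)
G = nx.MultiDiGraph()
G.add_node("a1", node_type="author")
G.add_node("a2", node_type="author")
G.add_node("p1", node_type="paper")
G.add_node("p2", node_type="paper")
G.add_node("v1", node_type="venue")

G.add_edge("a1", "p1", edge_type="writes")
G.add_edge("a2", "p1", edge_type="writes")
G.add_edge("a2", "p2", edge_type="writes")
G.add_edge("p1", "v1", edge_type="published_in")
G.add_edge("p2", "p1", edge_type="cites")

print(nx.get_node_attributes(G, "node_type"))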

Meta-paths

A meta-path is a sequence of node and edge types defining a composite relation.

Examples:

  • Author → Paper → Author (co-authorship)

  • Author → Paper → Venue → Paper → Author (same venue)

  • Paper → Paper → Paper (citation path of length 2)
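
Counting meta-path instances reduces to multiplying type-restricted adjacency matrices (the commuting-matrix idea underlying PathSim). A minimal numpy sketch for Author → Paper → Author, using a toy author-paper incidence matrix rather than the py3plex API:

import numpy as np

# Rows: 3 authors, columns: 2 papers; A_ap[i, j] = 1 if author i wrote paper j
A_ap = np.array([[1, 0],
                 [1, 1],
                 [0, 1]])

# Author -> Paper -> Author commuting matrix:
# entry (i, j) counts papers co-written by authors i and j
M_apa = A_ap @ A_ap.T
print(M_apa)  # diagonal: papers per author; off-diagonal: co-authorship counts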

HINMINE Decomposition

Basic Usage

from py3plex.core.HINMINE import decomposition
from py3plex.core import multinet

# Load heterogeneous network
network = multinet.multi_layer_network()
network.load_network("academic_network.graphml", input_type="graphml")

# Perform decomposition
decomposer = decomposition.HINMINE()
features = decomposer.run_decomposition(
    network,
    target_node_type='author',
    max_path_length=3
)

# Features is a matrix: (num_nodes, num_meta_paths)
print(f"Feature matrix shape: {features.shape}")

Parameters

features = decomposer.run_decomposition(
    network,
    target_node_type='author',     # Node type to extract features for
    max_path_length=3,              # Maximum meta-path length
    include_cycles=True,            # Include closed walks
    normalize=True                  # Normalize features
)

Meta-Path Extraction

Enumerate Meta-Paths

# Get all meta-paths up to length 3
meta_paths = decomposer.enumerate_meta_paths(
    network,
    max_length=3,
    node_types=['author', 'paper', 'venue']
)

print(f"Found {len(meta_paths)} meta-paths")
for mp in meta_paths[:5]:
    print(mp)

Example output:

author-paper-author
author-paper-venue-paper-author
paper-paper
paper-author-paper
venue-paper-venue

Compute Meta-Path Instances

Count instances of specific meta-paths:

# Define a meta-path
meta_path = ['author', 'paper', 'author']

# Count instances for each node
instances = decomposer.compute_meta_path_instances(
    network,
    meta_path,
    source_nodes=['author1', 'author2']
)

for node, count in instances.items():
    print(f"{node}: {count} instances")

Cycle Enumeration

Extract Cycles

Find closed walks (cycles) in the network:

# Enumerate all cycles up to length 4
cycles = decomposer.enumerate_cycles(
    network,
    max_length=4,
    starting_node_type='author'
)

print(f"Found {len(cycles)} cycle types")

Common Cycles

Examples of meaningful cycles in academic networks:

  • Citation triangle: Paper → Paper → Paper → (first Paper), three citation edges closing back on the starting paper

  • Citation square: Paper → Paper → Paper → Paper → (first Paper), four citation edges

  • Collaboration cycle: Author → Paper → Author → Paper → (first Author), two authors linked through two shared papers
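
As a sanity check on such counts, in a homogeneous slice of the network (a single node and edge type) the number of closed walks of length k starting and ending at each node is the corresponding diagonal entry of A^k. A minimal numpy sketch with a toy citation matrix, not the py3plex API:

import numpy as np

# Toy paper-cites-paper adjacency matrix: a directed citation triangle
A = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]])

# The diagonal of A^k counts closed walks of length k from each node
A3 = np.linalg.matrix_power(A, 3)
print(np.diag(A3))  # [1 1 1]: every paper lies on one directed 3-cycle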

Feature Matrix Construction

Build Classification Features

# Build feature matrix for author classification
features = decomposer.build_feature_matrix(
    network,
    target_nodes=['author1', 'author2', 'author3'],
    meta_paths=[
        ['author', 'paper', 'author'],           # Co-authorship
        ['author', 'paper', 'venue'],            # Publishing venues
        ['author', 'paper', 'paper'],            # Citation patterns
    ],
    normalize=True
)

# Features shape: (3, 3) for 3 nodes, 3 meta-paths
print(features)

Normalization Options

# L1 normalization (sum to 1)
features_l1 = decomposer.normalize_features(features, method='l1')

# L2 normalization (unit length)
features_l2 = decomposer.normalize_features(features, method='l2')

# Min-max scaling (0-1 range)
features_minmax = decomposer.normalize_features(features, method='minmax')

# Z-score standardization (mean 0, std 1)
features_zscore = decomposer.normalize_features(features, method='zscore')
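
The same four schemes are available in scikit-learn, which can be convenient when the features feed an sklearn pipeline anyway. A sketch assuming features is a dense (num_nodes, num_meta_paths) array:

from sklearn.preprocessing import normalize, MinMaxScaler, StandardScaler

# Row-wise L1 / L2 normalization
features_l1 = normalize(features, norm='l1')
features_l2 = normalize(features, norm='l2')

# Column-wise min-max scaling and z-score standardization
features_minmax = MinMaxScaler().fit_transform(features)
features_zscore = StandardScaler().fit_transform(features)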

Node Classification Example

Complete Workflow

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score

# 1. Load network with node labels
network = multinet.multi_layer_network()
network.load_network("labeled_network.graphml", input_type="graphml")

# 2. Extract features via decomposition
decomposer = decomposition.HINMINE()
features = decomposer.run_decomposition(
    network,
    target_node_type='author',
    max_path_length=3
)

# 3. Get labels
labels = network.labels  # assumes labels are stored on the network, in the same node order as the feature rows

# 4. Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.3, random_state=42
)

# 5. Train classifier
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# 6. Evaluate
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred, average='weighted')

print(f"Accuracy: {accuracy:.3f}")
print(f"F1-score: {f1:.3f}")

Feature Importance

Analyze which meta-paths are most informative:

import matplotlib.pyplot as plt

# Get feature importances from trained model
importances = clf.feature_importances_
meta_path_names = decomposer.get_meta_path_names()

# Plot
plt.figure(figsize=(10, 6))
plt.barh(range(len(importances)), importances)
plt.yticks(range(len(importances)), meta_path_names)
plt.xlabel('Feature Importance')
plt.title('Meta-Path Importance for Classification')
plt.tight_layout()
plt.show()

Advanced Decomposition

Custom Meta-Paths

Define domain-specific meta-paths:

# Academic domain
academic_meta_paths = [
    ['author', 'paper', 'author'],                    # Co-authorship
    ['author', 'paper', 'venue', 'paper', 'author'],  # Same venue
    ['author', 'paper', 'paper', 'author'],           # Citation-based
    ['venue', 'paper', 'author', 'paper', 'venue'],   # Venue similarity
]

features = decomposer.build_feature_matrix(
    network,
    target_nodes=author_nodes,
    meta_paths=academic_meta_paths
)

Weighted Meta-Paths

Assign importance weights to different meta-paths:

# Define meta-paths with weights
meta_path_weights = {
    'author-paper-author': 2.0,           # High importance
    'author-paper-venue': 1.0,            # Normal importance
    'author-paper-paper-author': 0.5      # Lower importance
}

# Apply weights to features
weighted_features = decomposer.apply_weights(features, meta_path_weights)
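
The same effect can be achieved by scaling feature columns directly, assuming the columns follow the order of the meta-path list. A minimal numpy sketch:

import numpy as np

# One weight per meta-path, in the same order as the feature columns
weights = np.array([2.0, 1.0, 0.5])

# Broadcasting multiplies each column by its meta-path weight
weighted_features_manual = features * weights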

Temporal Decomposition

For temporal networks with timestamps:

import numpy as np

# Decompose the network at different time slices
time_slices = [2018, 2019, 2020, 2021]

temporal_features = []
for year in time_slices:
    # Filter network to the given time period
    subnetwork = network.filter_by_time(year)

    # Decompose the slice
    features = decomposer.run_decomposition(subnetwork, 'author')
    temporal_features.append(features)

# Stack per-slice features column-wise
full_features = np.hstack(temporal_features)

Performance Optimization

Caching Results

HINMINE automatically caches decomposition results:

# First run: computes and caches
features1 = decomposer.run_decomposition(network, 'author')

# Second run: loads from cache (much faster)
features2 = decomposer.run_decomposition(network, 'author')

The cache is written to .{md5_hash} files, where the hash is derived from the network content.
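
If similar behavior is needed outside HINMINE, a content-addressed cache is easy to sketch with hashlib and joblib. The key derivation below is illustrative, not HINMINE's actual scheme, and get_edges() stands in for whatever edge iterator your network object provides:

import hashlib
import os
import joblib

def cached_decomposition(network, decomposer, node_type):
    # Illustrative cache key: MD5 over the sorted, stringified edge list
    edges = sorted(str(e) for e in network.get_edges())
    key = hashlib.md5("\n".join(edges).encode()).hexdigest()
    cache_file = f".{key}.pkl"

    if os.path.exists(cache_file):
        return joblib.load(cache_file)  # cache hit: reuse stored features

    features = decomposer.run_decomposition(network, node_type)
    joblib.dump(features, cache_file)  # cache miss: compute and store
    return features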

Parallel Processing

For large networks, process meta-paths in parallel:

from joblib import Parallel, delayed

# Compute meta-path instances in parallel
results = Parallel(n_jobs=-1)(
    delayed(decomposer.compute_meta_path_instances)(
        network, mp, target_nodes
    )
    for mp in meta_paths
)

Sparse Representations

Use sparse matrices for large feature spaces:

from scipy.sparse import csr_matrix

# Convert to sparse format
sparse_features = csr_matrix(features)

# Use with scikit-learn
from sklearn.svm import LinearSVC
clf = LinearSVC()
clf.fit(sparse_features, labels)

Best Practices

Meta-Path Selection

  1. Start simple - Begin with short meta-paths (length 2-3)

  2. Domain knowledge - Use meaningful paths for your domain

  3. Feature importance - Analyze which paths are most informative

  4. Avoid redundancy - Remove highly correlated meta-paths (see the sketch below)
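
For point 4, a simple redundancy filter is to compute pairwise Pearson correlations between feature columns and keep only one column from each highly correlated group. A numpy sketch; the 0.95 threshold is an arbitrary illustration:

import numpy as np

def drop_correlated_columns(features, threshold=0.95):
    # Pearson correlation between feature columns
    corr = np.corrcoef(features, rowvar=False)
    keep = []
    for j in range(corr.shape[0]):
        # Keep column j only if it is not too correlated with an already kept one
        if all(abs(corr[j, k]) < threshold for k in keep):
            keep.append(j)
    return features[:, keep], keep

reduced, kept = drop_correlated_columns(features)
print(f"Kept meta-path columns: {kept}")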

Normalization

  • L1 normalization - Good for interpretability (sum to 1)

  • L2 normalization - Good for distance-based methods

  • Min-max scaling - Good for neural networks

  • Z-score - Good for algorithms sensitive to scale

Validation

Always validate decomposition quality:

  1. Feature coverage - Check that features capture relevant patterns

  2. Downstream performance - Evaluate on the actual task (classification, prediction); see the cross-validation sketch after this list

  3. Interpretability - Verify features are meaningful

  4. Computational cost - Balance accuracy vs. efficiency
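
For point 2, cross-validation gives a more robust estimate than a single train-test split. A sketch reusing the features and labels from the classification workflow above:

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

clf = RandomForestClassifier(n_estimators=100, random_state=42)

# 5-fold cross-validated weighted F1 as a quick decomposition quality check
scores = cross_val_score(clf, features, labels, cv=5, scoring='f1_weighted')
print(f"F1 (5-fold CV): {scores.mean():.3f} ± {scores.std():.3f}")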

References

HINMINE:

Kralj, J., Robnik-Šikonja, M., & Lavrač, N. (2018). HINMINE: heterogeneous information network mining with information retrieval heuristics. Journal of Intelligent Information Systems, 50(1), 29-61.

Meta-paths:

Sun, Y., Han, J., Yan, X., Yu, P. S., & Wu, T. (2011). PathSim: Meta path-based top-k similarity search in heterogeneous information networks. Proceedings of the VLDB Endowment, 4(11), 992-1003.

HIN Classification:

Shi, C., Li, Y., Zhang, J., Sun, Y., & Yu, P. S. (2017). A survey of heterogeneous information network analysis. IEEE Transactions on Knowledge and Data Engineering, 29(1), 17-37.

Next Steps