Network Decomposition Tutorial
==============================

This tutorial demonstrates network decomposition techniques for heterogeneous and multilayer networks using py3plex.

Overview
--------

Network decomposition transforms complex heterogeneous networks into structured feature representations that can be used for:

* Node classification
* Link prediction
* Network comparison
* Feature extraction for machine learning

py3plex's HINMINE module provides methods for decomposing heterogeneous information networks (HINs) based on:

* **Meta-paths** - Typed paths connecting node pairs
* **Cycles** - Closed walks of specific types
* **Structural patterns** - Subgraph patterns and motifs

What is Network Decomposition?
------------------------------

Heterogeneous Information Networks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A **heterogeneous information network (HIN)** contains multiple node types and edge types.

Example - Academic Network:

* **Node types:** Authors, Papers, Venues
* **Edge types:** Author-Paper (writes), Paper-Venue (published_in), Paper-Paper (cites)

Meta-paths
~~~~~~~~~~

A **meta-path** is a sequence of node and edge types defining a composite relation.

Examples:

* Author → Paper → Author (co-authorship)
* Author → Paper → Venue → Paper → Author (same venue)
* Paper → Paper → Paper (citation path of length 2)

HINMINE Decomposition
---------------------

Basic Usage
~~~~~~~~~~~

.. code-block:: python

    from py3plex.core.HINMINE import decomposition
    from py3plex.core import multinet

    # Load heterogeneous network
    network = multinet.multi_layer_network()
    network.load_network("academic_network.graphml", input_type="graphml")

    # Perform decomposition
    decomposer = decomposition.HINMINE()
    features = decomposer.run_decomposition(
        network,
        target_node_type='author',
        max_path_length=3
    )

    # Features is a matrix: (num_nodes, num_meta_paths)
    print(f"Feature matrix shape: {features.shape}")

Parameters
~~~~~~~~~~

.. code-block:: python

    features = decomposer.run_decomposition(
        network,
        target_node_type='author',  # Node type to extract features for
        max_path_length=3,          # Maximum meta-path length
        include_cycles=True,        # Include closed walks
        normalize=True              # Normalize features
    )

Meta-Path Extraction
--------------------

Enumerate Meta-Paths
~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    # Get all meta-paths up to length 3
    meta_paths = decomposer.enumerate_meta_paths(
        network,
        max_length=3,
        node_types=['author', 'paper', 'venue']
    )

    print(f"Found {len(meta_paths)} meta-paths")
    for mp in meta_paths[:5]:
        print(mp)

Example output::

    author-paper-author
    author-paper-venue-paper-author
    paper-paper
    paper-author-paper
    venue-paper-venue

Compute Meta-Path Instances
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Count instances of specific meta-paths:

.. code-block:: python

    # Define a meta-path
    meta_path = ['author', 'paper', 'author']

    # Count instances for each node
    instances = decomposer.compute_meta_path_instances(
        network,
        meta_path,
        source_nodes=['author1', 'author2']
    )

    for node, count in instances.items():
        print(f"{node}: {count} instances")
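For intuition, these counts can also be reproduced with plain matrix algebra: the number of instances of a meta-path connecting two nodes equals the corresponding entry of the product of the typed adjacency matrices along the path (the so-called commuting matrix). A minimal NumPy sketch on a toy author-paper network, independent of the py3plex API:

.. code-block:: python

    import numpy as np

    # Toy author-paper incidence matrix: A_ap[i, j] = 1 if author i wrote paper j
    A_ap = np.array([[1, 1, 0],
                     [0, 1, 1]])

    # Commuting matrix of the meta-path author-paper-author: entry (i, j) counts
    # author-paper-author instances between authors i and j
    C_apa = A_ap @ A_ap.T

    print(C_apa)
    # [[2 1]
    #  [1 2]]  -> off-diagonal entries count shared papers (co-authorship strength)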
Cycle Enumeration
-----------------

Extract Cycles
~~~~~~~~~~~~~~

Find closed walks (cycles) in the network:

.. code-block:: python

    # Enumerate all cycles up to length 4
    cycles = decomposer.enumerate_cycles(
        network,
        max_length=4,
        starting_node_type='author'
    )

    print(f"Found {len(cycles)} cycle types")

Common Cycles
~~~~~~~~~~~~~

Examples of meaningful cycles in academic networks (each walk returns to its starting node):

* **Venue cycle:** Author → Paper → Venue → Paper → Author (two papers by the same author published in the same venue)
* **Citation cycle:** Paper → Paper → Paper → Paper (citations that lead back to the starting paper)
* **Collaboration cycle:** Author → Paper → Author → Paper → Author (alternating co-authorship links back to the starting author)

Feature Matrix Construction
---------------------------

Build Classification Features
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    # Build feature matrix for author classification
    features = decomposer.build_feature_matrix(
        network,
        target_nodes=['author1', 'author2', 'author3'],
        meta_paths=[
            ['author', 'paper', 'author'],  # Co-authorship
            ['author', 'paper', 'venue'],   # Publishing venues
            ['author', 'paper', 'paper'],   # Citation patterns
        ],
        normalize=True
    )

    # Features shape: (3, 3) for 3 nodes, 3 meta-paths
    print(features)

Normalization Options
~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    # L1 normalization (sum to 1)
    features_l1 = decomposer.normalize_features(features, method='l1')

    # L2 normalization (unit length)
    features_l2 = decomposer.normalize_features(features, method='l2')

    # Min-max scaling (0-1 range)
    features_minmax = decomposer.normalize_features(features, method='minmax')

    # Z-score standardization (mean 0, std 1)
    features_zscore = decomposer.normalize_features(features, method='zscore')

Node Classification Example
---------------------------

Complete Workflow
~~~~~~~~~~~~~~~~~

.. code-block:: python

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score, f1_score

    # 1. Load network with node labels
    network = multinet.multi_layer_network()
    network.load_network("labeled_network.graphml", input_type="graphml")

    # 2. Extract features via decomposition
    decomposer = decomposition.HINMINE()
    features = decomposer.run_decomposition(
        network,
        target_node_type='author',
        max_path_length=3
    )

    # 3. Get labels
    labels = network.labels  # Assuming labels are stored in the network

    # 4. Train-test split
    X_train, X_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.3, random_state=42
    )

    # 5. Train classifier
    clf = RandomForestClassifier(n_estimators=100, random_state=42)
    clf.fit(X_train, y_train)

    # 6. Evaluate
    y_pred = clf.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred, average='weighted')

    print(f"Accuracy: {accuracy:.3f}")
    print(f"F1-score: {f1:.3f}")

Feature Importance
~~~~~~~~~~~~~~~~~~

Analyze which meta-paths are most informative:

.. code-block:: python

    import matplotlib.pyplot as plt

    # Get feature importances from the trained model
    importances = clf.feature_importances_
    meta_path_names = decomposer.get_meta_path_names()

    # Plot
    plt.figure(figsize=(10, 6))
    plt.barh(range(len(importances)), importances)
    plt.yticks(range(len(importances)), meta_path_names)
    plt.xlabel('Feature Importance')
    plt.title('Meta-Path Importance for Classification')
    plt.tight_layout()
    plt.show()
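Both the reported scores and the feature importances depend on the particular train/test split, which can be noisy for small networks. A minimal cross-validation sketch (assuming the ``features`` and ``labels`` from the workflow above; plain scikit-learn, no py3plex-specific API):

.. code-block:: python

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import StratifiedKFold, cross_val_score

    # Stratified 5-fold CV: every fold preserves the class distribution of `labels`
    clf = RandomForestClassifier(n_estimators=100, random_state=42)
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

    scores = cross_val_score(clf, features, labels, cv=cv, scoring='f1_weighted')
    print(f"F1 (weighted): {scores.mean():.3f} +/- {scores.std():.3f}")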
Link Prediction Example
-----------------------

Predict Missing Edges
~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    import random

    from sklearn.linear_model import LogisticRegression

    # 1. Sample node pairs (positive and negative examples)
    positive_pairs = []  # Actual edges
    negative_pairs = []  # Non-edges (sampled)

    for u, v in network.core_network.edges():
        positive_pairs.append((u, v))

    # Sample negative examples
    nodes = list(network.get_nodes())
    while len(negative_pairs) < len(positive_pairs):
        u, v = random.sample(nodes, 2)
        if not network.core_network.has_edge(u, v):
            negative_pairs.append((u, v))

    # Map node identifiers to feature-matrix rows
    # (assumes the rows of `features` follow this node ordering)
    node_to_idx = {node: idx for idx, node in enumerate(nodes)}

    # 2. Extract features for node pairs
    def pair_features(u, v, features, node_to_idx):
        idx_u = node_to_idx[u]
        idx_v = node_to_idx[v]
        # Combine node features, e.g. element-wise product
        # (concatenation or difference also work)
        return features[idx_u] * features[idx_v]

    # Build dataset
    X_pos = [pair_features(u, v, features, node_to_idx) for u, v in positive_pairs]
    X_neg = [pair_features(u, v, features, node_to_idx) for u, v in negative_pairs]
    X = X_pos + X_neg
    y = [1] * len(X_pos) + [0] * len(X_neg)

    # 3. Train-test split and classify
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
    clf = LogisticRegression()
    clf.fit(X_train, y_train)

    # 4. Evaluate
    accuracy = clf.score(X_test, y_test)
    print(f"Link prediction accuracy: {accuracy:.3f}")

Advanced Decomposition
----------------------

Custom Meta-Paths
~~~~~~~~~~~~~~~~~

Define domain-specific meta-paths:

.. code-block:: python

    # Academic domain
    academic_meta_paths = [
        ['author', 'paper', 'author'],                    # Co-authorship
        ['author', 'paper', 'venue', 'paper', 'author'],  # Same venue
        ['author', 'paper', 'paper', 'author'],           # Citation-based
        ['venue', 'paper', 'author', 'paper', 'venue'],   # Venue similarity
    ]

    features = decomposer.build_feature_matrix(
        network,
        target_nodes=author_nodes,
        meta_paths=academic_meta_paths
    )

Weighted Meta-Paths
~~~~~~~~~~~~~~~~~~~

Assign importance weights to different meta-paths:

.. code-block:: python

    # Define meta-paths with weights
    meta_path_weights = {
        'author-paper-author': 2.0,        # High importance
        'author-paper-venue': 1.0,         # Normal importance
        'author-paper-paper-author': 0.5   # Lower importance
    }

    # Apply weights to features
    weighted_features = decomposer.apply_weights(features, meta_path_weights)

Temporal Decomposition
~~~~~~~~~~~~~~~~~~~~~~

For temporal networks with timestamps:

.. code-block:: python

    import numpy as np

    # Decompose the network at different time slices
    time_slices = [2018, 2019, 2020, 2021]
    temporal_features = []

    for year in time_slices:
        # Filter network to a specific time period
        subnetwork = network.filter_by_time(year)

        # Decompose
        features = decomposer.run_decomposition(subnetwork, 'author')
        temporal_features.append(features)

    # Stack temporal features
    full_features = np.hstack(temporal_features)

Performance Optimization
------------------------

Caching Results
~~~~~~~~~~~~~~~

HINMINE automatically caches decomposition results:

.. code-block:: python

    # First run: computes and caches
    features1 = decomposer.run_decomposition(network, 'author')

    # Second run: loads from cache (much faster)
    features2 = decomposer.run_decomposition(network, 'author')

The cache is stored in ``.{md5_hash}`` files derived from the network content.

Parallel Processing
~~~~~~~~~~~~~~~~~~~

For large networks, process meta-paths in parallel:

.. code-block:: python

    from joblib import Parallel, delayed

    # Compute meta-path instances in parallel
    results = Parallel(n_jobs=-1)(
        delayed(decomposer.compute_meta_path_instances)(
            network, mp, target_nodes
        )
        for mp in meta_paths
    )
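Assuming each call returns a per-node count dictionary, as in the ``compute_meta_path_instances`` example earlier, the parallel results can be stacked into a feature matrix with one column per meta-path; a sketch (``target_nodes`` and ``meta_paths`` as defined above):

.. code-block:: python

    import numpy as np

    # Fix a node ordering so that every column refers to the same nodes
    node_order = sorted(target_nodes)

    # One row per node, one column per meta-path; nodes without instances get 0
    feature_matrix = np.column_stack([
        np.array([counts.get(node, 0) for node in node_order])
        for counts in results
    ])

    print(feature_matrix.shape)  # (len(node_order), len(meta_paths))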
Sparse Representations
~~~~~~~~~~~~~~~~~~~~~~

Use sparse matrices for large feature spaces:

.. code-block:: python

    from scipy.sparse import csr_matrix
    from sklearn.svm import LinearSVC

    # Convert to sparse format
    sparse_features = csr_matrix(features)

    # Use with scikit-learn
    clf = LinearSVC()
    clf.fit(sparse_features, labels)

Best Practices
--------------

Meta-Path Selection
~~~~~~~~~~~~~~~~~~~

1. **Start simple** - Begin with short meta-paths (length 2-3)
2. **Domain knowledge** - Use meaningful paths for your domain
3. **Feature importance** - Analyze which paths are most informative
4. **Avoid redundancy** - Remove highly correlated meta-paths

Normalization
~~~~~~~~~~~~~

* **L1 normalization** - Good for interpretability (features sum to 1)
* **L2 normalization** - Good for distance-based methods
* **Min-max scaling** - Good for neural networks
* **Z-score** - Good for algorithms sensitive to scale

Validation
~~~~~~~~~~

Always validate decomposition quality:

1. **Feature coverage** - Check that features capture relevant patterns
2. **Downstream performance** - Evaluate on the actual task (classification, prediction)
3. **Interpretability** - Verify that features are meaningful
4. **Computational cost** - Balance accuracy against efficiency

References
----------

**HINMINE:**

Kralj, J., Robnik-Šikonja, M., & Lavrač, N. (2018). HINMINE: heterogeneous information network mining with information retrieval heuristics. *Journal of Intelligent Information Systems*, 50(1), 29-61.

**Meta-paths:**

Sun, Y., Han, J., Yan, X., Yu, P. S., & Wu, T. (2011). PathSim: Meta path-based top-k similarity search in heterogeneous information networks. *Proceedings of the VLDB Endowment*, 4(11), 992-1003.

**HIN Classification:**

Shi, C., Li, Y., Zhang, J., Sun, Y., & Yu, P. S. (2016). A survey of heterogeneous information network analysis. *IEEE Transactions on Knowledge and Data Engineering*, 29(1), 17-37.

Next Steps
----------

* :doc:`community_detection` - Community detection tutorial
* :doc:`multilayer_centrality` - Centrality measures
* :doc:`../algorithm_guide` - Algorithm selection
* Examples in ``examples/example_decomposition_and_classification.py``