Random Walk Algorithms ====================== Py3plex provides comprehensive random walk primitives for graph-based algorithms, including Node2Vec, DeepWalk, and diffusion processes. These implementations support weighted, directed, and multilayer networks with deterministic reproducibility. Overview -------- Random walks are fundamental building blocks for many graph algorithms: - **Node2Vec**: Learn node embeddings using biased random walks - **DeepWalk**: Generate node embeddings through uniform random walks - **Personalized PageRank**: Compute node importance via random walks with restart - **Diffusion processes**: Model information or disease propagation - **Community detection**: Identify network structure through walk patterns Key Features ~~~~~~~~~~~~ - ✅ Basic random walks with proper edge weight handling - ✅ Second-order (biased) random walks following Node2Vec logic - ✅ Multiple simultaneous walks with deterministic seeding - ✅ Support for directed, weighted, and multigraphs - ✅ Multilayer network-aware walks with layer constraints - ✅ Efficient sparse adjacency matrix handling - ✅ Comprehensive test suite validating correctness properties Basic Random Walk ----------------- The basic random walk samples the next node proportionally to normalized edge weights. Simple Example ~~~~~~~~~~~~~~ .. code-block:: python from py3plex.algorithms.general.walkers import basic_random_walk import networkx as nx # Create a simple graph G = nx.karate_club_graph() # Perform a random walk walk = basic_random_walk( G, start_node=0, walk_length=10, seed=42 # for reproducibility ) print(f"Walk: {walk}") print(f"Length: {len(walk)}") # 11 nodes (includes start) Weighted Graphs ~~~~~~~~~~~~~~~ Edge weights are automatically used when ``weighted=True`` (default): .. code-block:: python from py3plex.algorithms.general.walkers import basic_random_walk import networkx as nx # Create weighted graph G = nx.Graph() G.add_weighted_edges_from([ (0, 1, 10.0), # high weight -> more likely (0, 2, 1.0), # low weight -> less likely (1, 2, 5.0), ]) # Weighted walk (respects edge weights) walk = basic_random_walk(G, 0, walk_length=20, weighted=True, seed=42) # Unweighted walk (uniform transitions) walk_uniform = basic_random_walk(G, 0, walk_length=20, weighted=False, seed=42) Directed Graphs ~~~~~~~~~~~~~~~ Directed edges are handled correctly: .. code-block:: python import networkx as nx from py3plex.algorithms.general.walkers import basic_random_walk # Create directed graph G = nx.DiGraph() G.add_edges_from([(0, 1), (1, 2), (2, 0), (1, 3)]) # Walk respects edge direction walk = basic_random_walk(G, 0, walk_length=10, seed=42) # Verify all transitions follow directed edges for i in range(len(walk) - 1): assert G.has_edge(walk[i], walk[i+1]) Node2Vec Biased Random Walk ---------------------------- Second-order random walks with return parameter ``p`` and in-out parameter ``q`` following the Node2Vec algorithm (Grover & Leskovec, 2016). Parameters ~~~~~~~~~~ When transitioning from node ``t → v → x``, the probability of choosing ``x`` is biased: - **If x == t** (return to previous): ``weight / p`` - **If x is neighbor of t** (stay close): ``weight / 1`` - **If x is not neighbor of t** (explore): ``weight / q`` **Parameter Effects:** - **p > 1**: Less likely to return to previous node (explore forward) - **p < 1**: More likely to return (backtrack) - **q > 1**: Less likely to explore distant nodes (stay local, BFS-like) - **q < 1**: More likely to explore distant nodes (venture far, DFS-like) - **p = q = 1**: Equivalent to basic random walk Basic Usage ~~~~~~~~~~~ .. code-block:: python from py3plex.algorithms.general.walkers import node2vec_walk import networkx as nx G = nx.karate_club_graph() # Balanced walk (p=1, q=1) walk_balanced = node2vec_walk(G, 0, walk_length=20, p=1.0, q=1.0, seed=42) # BFS-like walk (stay local) walk_bfs = node2vec_walk(G, 0, walk_length=20, p=1.0, q=2.0, seed=42) # DFS-like walk (explore outward) walk_dfs = node2vec_walk(G, 0, walk_length=20, p=1.0, q=0.5, seed=42) Exploring vs Backtracking ~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: python from py3plex.algorithms.general.walkers import node2vec_walk import networkx as nx # Triangle graph G = nx.Graph() G.add_edges_from([(0, 1), (1, 2), (2, 0)]) # Low p, high q: tends to backtrack walk_backtrack = node2vec_walk(G, 0, walk_length=20, p=0.1, q=10.0, seed=42) # High p, low q: tends to explore outward walk_explore = node2vec_walk(G, 0, walk_length=20, p=10.0, q=0.1, seed=42) # Count backtracking patterns backtracks = sum(1 for i in range(2, len(walk_backtrack)) if walk_backtrack[i] == walk_backtrack[i-2]) print(f"Backtracks: {backtracks}") Multiple Walk Generation ------------------------- Generate multiple walks efficiently with deterministic seeding. Generate Walks from All Nodes ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: python from py3plex.algorithms.general.walkers import generate_walks import networkx as nx G = nx.karate_club_graph() # Generate 10 walks from each node walks = generate_walks( G, num_walks=10, walk_length=10, seed=42 ) print(f"Total walks: {len(walks)}") # 340 (34 nodes × 10 walks) print(f"First walk: {walks[0]}") Generate from Specific Nodes ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: python from py3plex.algorithms.general.walkers import generate_walks import networkx as nx G = nx.karate_club_graph() # Generate walks only from nodes 0, 1, 2 walks = generate_walks( G, num_walks=5, walk_length=15, start_nodes=[0, 1, 2], seed=42 ) print(f"Total walks: {len(walks)}") # 15 (3 nodes × 5 walks) Node2Vec-Style Walks ~~~~~~~~~~~~~~~~~~~~ .. code-block:: python from py3plex.algorithms.general.walkers import generate_walks import networkx as nx G = nx.karate_club_graph() # Generate biased walks with p=0.5, q=2.0 walks = generate_walks( G, num_walks=10, walk_length=20, p=0.5, q=2.0, seed=42 ) Edge Sequences ~~~~~~~~~~~~~~ Return walks as edge sequences instead of node sequences: .. code-block:: python from py3plex.algorithms.general.walkers import generate_walks import networkx as nx G = nx.karate_club_graph() # Generate edge sequences edge_walks = generate_walks( G, num_walks=5, walk_length=10, return_edges=True, seed=42 ) # First walk is a list of edges print(edge_walks[0]) # [(0, 3), (3, 2), (2, 1), ...] Multilayer Network Walks ------------------------- Perform random walks on multilayer networks with layer constraints. Layer-Constrained Walks ~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: python from py3plex.algorithms.general.walkers import layer_specific_random_walk from py3plex.core import multinet # Create multilayer network network = multinet.multi_layer_network() network.add_layer("social") network.add_layer("biological") network.add_nodes([ ("A", "social"), ("B", "social"), ("C", "social"), ("A", "biological"), ("B", "biological") ]) network.add_edges([ (("A", "social"), ("B", "social")), (("B", "social"), ("C", "social")), (("A", "biological"), ("B", "biological")), ]) # Walk constrained to social layer walk = layer_specific_random_walk( network.core_network, start_node="A---social", walk_length=10, layer="social", cross_layer_prob=0.0, # never cross layers seed=42 ) # All nodes in walk are in social layer print(walk) Cross-Layer Transitions ~~~~~~~~~~~~~~~~~~~~~~~ Allow occasional transitions between layers: .. code-block:: python from py3plex.algorithms.general.walkers import layer_specific_random_walk # Same network as above # Walk with 10% probability of crossing layers walk = layer_specific_random_walk( network.core_network, start_node="A---social", walk_length=20, layer="social", cross_layer_prob=0.1, # 10% chance to cross seed=42 ) # May contain nodes from both layers print(walk) Reproducibility --------------- All walk functions support deterministic seeding for reproducibility. Fixed Seed ~~~~~~~~~~ .. code-block:: python from py3plex.algorithms.general.walkers import basic_random_walk import networkx as nx G = nx.karate_club_graph() # Same seed produces identical walks walk1 = basic_random_walk(G, 0, 50, seed=42) walk2 = basic_random_walk(G, 0, 50, seed=42) assert walk1 == walk2 # Identical Different Seeds ~~~~~~~~~~~~~~~ .. code-block:: python # Different seeds produce different walks walk1 = basic_random_walk(G, 0, 50, seed=42) walk2 = basic_random_walk(G, 0, 50, seed=123) assert walk1 != walk2 # Different Batch Reproducibility ~~~~~~~~~~~~~~~~~~~~~ .. code-block:: python from py3plex.algorithms.general.walkers import generate_walks G = nx.karate_club_graph() # Same seed produces identical walk sets walks1 = generate_walks(G, num_walks=100, walk_length=10, seed=42) walks2 = generate_walks(G, num_walks=100, walk_length=10, seed=42) assert walks1 == walks2 Performance Tips ---------------- Large Graphs ~~~~~~~~~~~~ For large sparse graphs (>100k nodes), use basic walks for better performance: .. code-block:: python from py3plex.algorithms.general.walkers import generate_walks import networkx as nx # Large sparse graph G = nx.erdos_renyi_graph(100000, 0.0001, seed=42) # Use basic walk (p=1, q=1) for speed walks = generate_walks( G, num_walks=10, walk_length=100, p=1.0, # No bias = faster q=1.0, seed=42 ) Parallel Processing ~~~~~~~~~~~~~~~~~~~ For generating many walks, consider parallel processing: .. code-block:: python from py3plex.algorithms.general.walkers import generate_walks import networkx as nx from multiprocessing import Pool G = nx.karate_club_graph() def generate_batch(seed): return generate_walks(G, num_walks=10, walk_length=20, seed=seed) # Generate walks in parallel with Pool(4) as pool: batch_walks = pool.map(generate_batch, range(10)) # Flatten results all_walks = [walk for batch in batch_walks for walk in batch] Correctness Validation ---------------------- The implementation has been validated against the following properties: Uniformity Test ~~~~~~~~~~~~~~~ On unweighted regular graphs, transition probabilities are uniform: .. code-block:: python import networkx as nx from py3plex.algorithms.general.walkers import basic_random_walk from collections import Counter # Complete graph (regular) G = nx.complete_graph(10) # Count visits to neighbors from node 0 visits = Counter() for i in range(10000): walk = basic_random_walk(G, 0, 1, weighted=False, seed=i) if len(walk) > 1: visits[walk[1]] += 1 # All neighbors should be visited roughly equally print(visits) # Expected: ~1111 visits to each of 9 neighbors Conservation Test ~~~~~~~~~~~~~~~~~ Sum of transition probabilities equals 1.0 (within machine epsilon): .. code-block:: python import networkx as nx import numpy as np G = nx.karate_club_graph() # For each node, verify probability conservation for node in G.nodes(): neighbors = list(G.neighbors(node)) if len(neighbors) > 0: weights = np.array([G[node][n].get('weight', 1.0) for n in neighbors]) probs = weights / weights.sum() assert abs(probs.sum() - 1.0) < 1e-10 # Machine epsilon Bias Consistency Test ~~~~~~~~~~~~~~~~~~~~~ Node2Vec parameters p and q produce expected biases: .. code-block:: python import networkx as nx from py3plex.algorithms.general.walkers import node2vec_walk # Triangle graph G = nx.Graph() G.add_edges_from([(0, 1), (1, 2), (2, 0)]) # Low p should encourage backtracking backtracks_low_p = 0 for i in range(1000): walk = node2vec_walk(G, 0, 10, p=0.1, q=1.0, seed=i) for j in range(2, len(walk)): if walk[j] == walk[j-2]: backtracks_low_p += 1 # High p should discourage backtracking backtracks_high_p = 0 for i in range(1000): walk = node2vec_walk(G, 0, 10, p=10.0, q=1.0, seed=i) for j in range(2, len(walk)): if walk[j] == walk[j-2]: backtracks_high_p += 1 assert backtracks_low_p > backtracks_high_p * 2 # Significantly more Edge Weight Test ~~~~~~~~~~~~~~~~ Visit frequency matches edge weight ratios: .. code-block:: python import networkx as nx from py3plex.algorithms.general.walkers import basic_random_walk from collections import Counter G = nx.Graph() G.add_weighted_edges_from([ (0, 1, 1.0), (0, 2, 2.0), (0, 3, 3.0), ]) visits = Counter() for i in range(10000): walk = basic_random_walk(G, 0, 1, weighted=True, seed=i) if len(walk) > 1: visits[walk[1]] += 1 # Expected ratio: 1:2:3 total = sum(visits.values()) ratios = [visits[1]/total, visits[2]/total, visits[3]/total] expected = [1/6, 2/6, 3/6] # Within 5% tolerance for obs, exp in zip(ratios, expected): assert abs(obs - exp) < 0.05 References ---------- **Node2Vec:** Grover, A., & Leskovec, J. (2016). node2vec: Scalable feature learning for networks. *Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining*, 855-864. https://doi.org/10.1145/2939672.2939754 **DeepWalk:** Perozzi, B., Al-Rfou, R., & Skiena, S. (2014). DeepWalk: Online learning of social representations. *Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining*, 701-710. https://doi.org/10.1145/2623330.2623732 **Random Walks on Graphs:** Lovász, L. (1993). Random walks on graphs: A survey. *Combinatorics, Paul Erdős is Eighty*, 2, 1-46. API Reference ------------- See :mod:`py3plex.algorithms.general.walkers` for complete API documentation. Examples -------- Complete examples can be found in: - ``examples/example_random_walks.py`` - Basic usage examples - ``tests/test_random_walks.py`` - Comprehensive test suite Repository: https://github.com/SkBlaz/py3plex