Random Walk Algorithms

Py3plex provides comprehensive random walk primitives for graph-based algorithms, including Node2Vec, DeepWalk, and diffusion processes. These implementations support weighted, directed, and multilayer networks with deterministic reproducibility.

Overview

Random walks are fundamental building blocks for many graph algorithms:

Node2Vec: Learn node embeddings using biased random walks
DeepWalk: Generate node embeddings through uniform random walks
Personalized PageRank: Compute node importance via random walks with restart
Diffusion processes: Model information or disease propagation
Community detection: Identify network structure through walk patterns

Key Features

✅ Basic random walks with proper edge weight handling
✅ Second-order (biased) random walks following Node2Vec logic
✅ Multiple simultaneous walks with deterministic seeding
✅ Support for directed, weighted, and multigraphs
✅ Multilayer network-aware walks with layer constraints
✅ Efficient sparse adjacency matrix handling
✅ Comprehensive test suite validating correctness properties

Basic Random Walk

The basic random walk samples the next node proportionally to normalized edge weights.

Simple Example

from py3plex.algorithms.general.walkers import basic_random_walk
import networkx as nx

# Create a simple graph
G = nx.karate_club_graph()

# Perform a random walk
walk = basic_random_walk(
    G,
    start_node=0,
    walk_length=10,
    seed=42  # for reproducibility
)

print(f"Walk: {walk}")
print(f"Length: {len(walk)}")  # 11 nodes (includes start)

Weighted Graphs

Edge weights are automatically used when weighted=True (default):

from py3plex.algorithms.general.walkers import basic_random_walk
import networkx as nx

# Create weighted graph
G = nx.Graph()
G.add_weighted_edges_from([
    (0, 1, 10.0),  # high weight -> more likely
    (0, 2, 1.0),   # low weight -> less likely
    (1, 2, 5.0),
])

# Weighted walk (respects edge weights)
walk = basic_random_walk(G, 0, walk_length=20, weighted=True, seed=42)

# Unweighted walk (uniform transitions)
walk_uniform = basic_random_walk(G, 0, walk_length=20, weighted=False, seed=42)

Directed Graphs

Directed edges are handled correctly:

import networkx as nx
from py3plex.algorithms.general.walkers import basic_random_walk

# Create directed graph
G = nx.DiGraph()
G.add_edges_from([(0, 1), (1, 2), (2, 0), (1, 3)])

# Walk respects edge direction
walk = basic_random_walk(G, 0, walk_length=10, seed=42)

# Verify all transitions follow directed edges
for i in range(len(walk) - 1):
    assert G.has_edge(walk[i], walk[i+1])

Node2Vec Biased Random Walk

Second-order random walks with return parameter p and in-out parameter q following the Node2Vec algorithm (Grover & Leskovec, 2016).

Parameters

When transitioning from node t → v → x, the probability of choosing x is biased:

If x == t (return to previous): weight / p
If x is neighbor of t (stay close): weight / 1
If x is not neighbor of t (explore): weight / q

Parameter Effects:

p > 1: Less likely to return to previous node (explore forward)
p < 1: More likely to return (backtrack)
q > 1: Less likely to explore distant nodes (stay local, BFS-like)
q < 1: More likely to explore distant nodes (venture far, DFS-like)
p = q = 1: Equivalent to basic random walk

Basic Usage

from py3plex.algorithms.general.walkers import node2vec_walk
import networkx as nx

G = nx.karate_club_graph()

# Balanced walk (p=1, q=1)
walk_balanced = node2vec_walk(G, 0, walk_length=20, p=1.0, q=1.0, seed=42)

# BFS-like walk (stay local)
walk_bfs = node2vec_walk(G, 0, walk_length=20, p=1.0, q=2.0, seed=42)

# DFS-like walk (explore outward)
walk_dfs = node2vec_walk(G, 0, walk_length=20, p=1.0, q=0.5, seed=42)

Exploring vs Backtracking

from py3plex.algorithms.general.walkers import node2vec_walk
import networkx as nx

# Triangle graph
G = nx.Graph()
G.add_edges_from([(0, 1), (1, 2), (2, 0)])

# Low p, high q: tends to backtrack
walk_backtrack = node2vec_walk(G, 0, walk_length=20, p=0.1, q=10.0, seed=42)

# High p, low q: tends to explore outward
walk_explore = node2vec_walk(G, 0, walk_length=20, p=10.0, q=0.1, seed=42)

# Count backtracking patterns
backtracks = sum(1 for i in range(2, len(walk_backtrack))
                 if walk_backtrack[i] == walk_backtrack[i-2])
print(f"Backtracks: {backtracks}")

Multiple Walk Generation

Generate multiple walks efficiently with deterministic seeding.

Generate Walks from All Nodes

from py3plex.algorithms.general.walkers import generate_walks
import networkx as nx

G = nx.karate_club_graph()

# Generate 10 walks from each node
walks = generate_walks(
    G,
    num_walks=10,
    walk_length=10,
    seed=42
)

print(f"Total walks: {len(walks)}")  # 340 (34 nodes × 10 walks)
print(f"First walk: {walks[0]}")

Generate from Specific Nodes

from py3plex.algorithms.general.walkers import generate_walks
import networkx as nx

G = nx.karate_club_graph()

# Generate walks only from nodes 0, 1, 2
walks = generate_walks(
    G,
    num_walks=5,
    walk_length=15,
    start_nodes=[0, 1, 2],
    seed=42
)

print(f"Total walks: {len(walks)}")  # 15 (3 nodes × 5 walks)

Node2Vec-Style Walks

from py3plex.algorithms.general.walkers import generate_walks
import networkx as nx

G = nx.karate_club_graph()

# Generate biased walks with p=0.5, q=2.0
walks = generate_walks(
    G,
    num_walks=10,
    walk_length=20,
    p=0.5,
    q=2.0,
    seed=42
)

Edge Sequences

Return walks as edge sequences instead of node sequences:

from py3plex.algorithms.general.walkers import generate_walks
import networkx as nx

G = nx.karate_club_graph()

# Generate edge sequences
edge_walks = generate_walks(
    G,
    num_walks=5,
    walk_length=10,
    return_edges=True,
    seed=42
)

# First walk is a list of edges
print(edge_walks[0])  # [(0, 3), (3, 2), (2, 1), ...]

Multilayer Network Walks

Perform random walks on multilayer networks with layer constraints.

Layer-Constrained Walks

from py3plex.algorithms.general.walkers import layer_specific_random_walk
from py3plex.core import multinet

# Create multilayer network
network = multinet.multi_layer_network()
network.add_layer("social")
network.add_layer("biological")

network.add_nodes([
    ("A", "social"), ("B", "social"), ("C", "social"),
    ("A", "biological"), ("B", "biological")
])

network.add_edges([
    (("A", "social"), ("B", "social")),
    (("B", "social"), ("C", "social")),
    (("A", "biological"), ("B", "biological")),
])

# Walk constrained to social layer
walk = layer_specific_random_walk(
    network.core_network,
    start_node="A---social",
    walk_length=10,
    layer="social",
    cross_layer_prob=0.0,  # never cross layers
    seed=42
)

# All nodes in walk are in social layer
print(walk)

Cross-Layer Transitions

Allow occasional transitions between layers:

from py3plex.algorithms.general.walkers import layer_specific_random_walk

# Same network as above

# Walk with 10% probability of crossing layers
walk = layer_specific_random_walk(
    network.core_network,
    start_node="A---social",
    walk_length=20,
    layer="social",
    cross_layer_prob=0.1,  # 10% chance to cross
    seed=42
)

# May contain nodes from both layers
print(walk)

Reproducibility

All walk functions support deterministic seeding for reproducibility.

Fixed Seed

from py3plex.algorithms.general.walkers import basic_random_walk
import networkx as nx

G = nx.karate_club_graph()

# Same seed produces identical walks
walk1 = basic_random_walk(G, 0, 50, seed=42)
walk2 = basic_random_walk(G, 0, 50, seed=42)

assert walk1 == walk2  # Identical

Different Seeds

# Different seeds produce different walks
walk1 = basic_random_walk(G, 0, 50, seed=42)
walk2 = basic_random_walk(G, 0, 50, seed=123)

assert walk1 != walk2  # Different

Batch Reproducibility

from py3plex.algorithms.general.walkers import generate_walks

G = nx.karate_club_graph()

# Same seed produces identical walk sets
walks1 = generate_walks(G, num_walks=100, walk_length=10, seed=42)
walks2 = generate_walks(G, num_walks=100, walk_length=10, seed=42)

assert walks1 == walks2

Performance Tips

Large Graphs

For large sparse graphs (>100k nodes), use basic walks for better performance:

from py3plex.algorithms.general.walkers import generate_walks
import networkx as nx

# Large sparse graph
G = nx.erdos_renyi_graph(100000, 0.0001, seed=42)

# Use basic walk (p=1, q=1) for speed
walks = generate_walks(
    G,
    num_walks=10,
    walk_length=100,
    p=1.0,  # No bias = faster
    q=1.0,
    seed=42
)

Parallel Processing

For generating many walks, consider parallel processing:

from py3plex.algorithms.general.walkers import generate_walks
import networkx as nx
from multiprocessing import Pool

G = nx.karate_club_graph()

def generate_batch(seed):
    return generate_walks(G, num_walks=10, walk_length=20, seed=seed)

# Generate walks in parallel
with Pool(4) as pool:
    batch_walks = pool.map(generate_batch, range(10))

# Flatten results
all_walks = [walk for batch in batch_walks for walk in batch]

Correctness Validation

The implementation has been validated against the following properties:

Uniformity Test

On unweighted regular graphs, transition probabilities are uniform:

import networkx as nx
from py3plex.algorithms.general.walkers import basic_random_walk
from collections import Counter

# Complete graph (regular)
G = nx.complete_graph(10)

# Count visits to neighbors from node 0
visits = Counter()
for i in range(10000):
    walk = basic_random_walk(G, 0, 1, weighted=False, seed=i)
    if len(walk) > 1:
        visits[walk[1]] += 1

# All neighbors should be visited roughly equally
print(visits)
# Expected: ~1111 visits to each of 9 neighbors

Conservation Test

Sum of transition probabilities equals 1.0 (within machine epsilon):

import networkx as nx
import numpy as np

G = nx.karate_club_graph()

# For each node, verify probability conservation
for node in G.nodes():
    neighbors = list(G.neighbors(node))
    if len(neighbors) > 0:
        weights = np.array([G[node][n].get('weight', 1.0) for n in neighbors])
        probs = weights / weights.sum()
        assert abs(probs.sum() - 1.0) < 1e-10  # Machine epsilon

Bias Consistency Test

Node2Vec parameters p and q produce expected biases:

import networkx as nx
from py3plex.algorithms.general.walkers import node2vec_walk

# Triangle graph
G = nx.Graph()
G.add_edges_from([(0, 1), (1, 2), (2, 0)])

# Low p should encourage backtracking
backtracks_low_p = 0
for i in range(1000):
    walk = node2vec_walk(G, 0, 10, p=0.1, q=1.0, seed=i)
    for j in range(2, len(walk)):
        if walk[j] == walk[j-2]:
            backtracks_low_p += 1

# High p should discourage backtracking
backtracks_high_p = 0
for i in range(1000):
    walk = node2vec_walk(G, 0, 10, p=10.0, q=1.0, seed=i)
    for j in range(2, len(walk)):
        if walk[j] == walk[j-2]:
            backtracks_high_p += 1

assert backtracks_low_p > backtracks_high_p * 2  # Significantly more

Edge Weight Test

Visit frequency matches edge weight ratios:

import networkx as nx
from py3plex.algorithms.general.walkers import basic_random_walk
from collections import Counter

G = nx.Graph()
G.add_weighted_edges_from([
    (0, 1, 1.0),
    (0, 2, 2.0),
    (0, 3, 3.0),
])

visits = Counter()
for i in range(10000):
    walk = basic_random_walk(G, 0, 1, weighted=True, seed=i)
    if len(walk) > 1:
        visits[walk[1]] += 1

# Expected ratio: 1:2:3
total = sum(visits.values())
ratios = [visits[1]/total, visits[2]/total, visits[3]/total]
expected = [1/6, 2/6, 3/6]

# Within 5% tolerance
for obs, exp in zip(ratios, expected):
    assert abs(obs - exp) < 0.05

References

Node2Vec:: Grover, A., & Leskovec, J. (2016). node2vec: Scalable feature learning for networks. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 855-864. https://doi.org/10.1145/2939672.2939754
DeepWalk:: Perozzi, B., Al-Rfou, R., & Skiena, S. (2014). DeepWalk: Online learning of social representations. Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 701-710. https://doi.org/10.1145/2623330.2623732
Random Walks on Graphs:: Lovász, L. (1993). Random walks on graphs: A survey. Combinatorics, Paul Erdős is Eighty, 2, 1-46.

API Reference

See py3plex.algorithms.general.walkers for complete API documentation.

Examples

Complete examples can be found in:

examples/example_random_walks.py - Basic usage examples
tests/test_random_walks.py - Comprehensive test suite

Repository: https://github.com/SkBlaz/py3plex