How to Find Graph Patterns and Motifs with the Pattern Matching API
Goal: Use py3plex's Pattern Matching Builder API to find graph motifs, paths, triangles, and complex subgraph patterns in multilayer networks.
🎯 What You'll Learn
How to express graph patterns using the Q.pattern() builder API
How to match edges, paths, triangles, and custom motifs
How to apply multilayer-aware constraints (within/between layers)
How to filter patterns using node and edge predicates
How to extract and analyze pattern match results
💻 Complete Example
See the full executable example: example_pattern_matching.py
Prerequisites:
A loaded multi_layer_network object (see How to Load and Build Networks)
Basic familiarity with the DSL (see How to Query Multilayer Graphs with the SQL-like DSL)
Understanding of graph motifs (edges, paths, triangles, subgraphs)
Why Pattern Matching?
Pattern matching allows you to find specific subgraph structures in your network. Unlike node/edge queries that return individual elements, pattern matching returns complete matches where multiple nodes and edges satisfy structural constraints.
Common Use Cases:
Motif Detection: Find triangles, cliques, or other structural patterns
Path Analysis: Discover multi-hop connections between nodes
Multilayer Patterns: Identify cross-layer relationships
Complex Queries: Express "find nodes A, B, C where A connects to B with weight > 0.5, B connects to C, and all are in the social layer"
Comparison to Basic Queries:
Quick Start: Your First Pattern
Let's start with a simple example: finding all edges in a network.
from py3plex.core import multinet
from py3plex.dsl import Q
# Create a simple network
network = multinet.multi_layer_network(directed=False)
network.add_nodes([
{'source': 'Alice', 'type': 'social'},
{'source': 'Bob', 'type': 'social'},
{'source': 'Charlie', 'type': 'social'},
])
network.add_edges([
{'source': 'Alice', 'target': 'Bob',
'source_type': 'social', 'target_type': 'social', 'weight': 1.0},
{'source': 'Bob', 'target': 'Charlie',
'source_type': 'social', 'target_type': 'social', 'weight': 2.0},
])
# Find all edges (simple pattern)
pattern = (
Q.pattern()
.node("a") # Define node variable "a"
.node("b") # Define node variable "b"
.edge("a", "b") # Require edge between a and b
)
result = pattern.execute(network)
print(f"Found {result.count} matches")
print(result.to_pandas())
Output:
Found 4 matches
a b
0 (Alice, social) (Bob, social)
1 (Bob, social) (Alice, social)
2 (Bob, social) (Charlie, social)
3 (Charlie, social) (Bob, social)
Notice that each undirected edge produces two matches, one for each way of assigning its endpoints to a and b.
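To see where the four matches come from, here is a minimal pure-Python sketch, not py3plex's actual matcher. It assumes that a two-variable edge pattern binds (a, b) to every ordered pair of adjacent nodes; the function name `match_edge_pattern` is hypothetical.

```python
from itertools import permutations

# Nodes are (name, layer) tuples, as in the output above
alice = ("Alice", "social")
bob = ("Bob", "social")
charlie = ("Charlie", "social")

# Undirected edges stored as unordered pairs
edges = {frozenset({alice, bob}), frozenset({bob, charlie})}


def match_edge_pattern(nodes, edges):
    """Hypothetical matcher: one {'a': u, 'b': v} binding per ordered adjacent pair."""
    return [
        {"a": u, "b": v}
        for u, v in permutations(nodes, 2)
        if frozenset({u, v}) in edges
    ]


matches = match_edge_pattern([alice, bob, charlie], edges)
print(f"Found {len(matches)} matches")  # Found 4 matches
```

Because each stored pair is unordered, an edge satisfies both the (u, v) and the (v, u) assignment, which doubles the count.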
Pattern Components
Nodes
Define node variables that will be bound to actual nodes during matching:
# Simple node
Q.pattern().node("a")
# Node with semantic labels (metadata only)
Q.pattern().node("a", labels="person")
# Node with predicates
Q.pattern().node("a").where(degree__gt=3)
# Node with layer constraint
Q.pattern().node("a").where(layer="social")
Node Predicates:
All standard DSL predicates work with pattern nodes:
degree__gt=5: degree greater than 5
degree__lt=2: degree less than 2
layer="social": node must be in the social layer
Any node attribute: age__gt=30, type="user", etc.
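The field__op naming follows a Django-style lookup convention. As an illustration only (the operator table and the `satisfies` helper are assumptions, not py3plex internals), such predicates could be evaluated against a node's attribute dictionary like this:

```python
import operator

# Assumed lookup suffixes; only __gt and __lt appear in the list above
OPS = {"gt": operator.gt, "lt": operator.lt}


def satisfies(attrs, **predicates):
    """Return True if attrs meets every predicate; a bare field name means equality."""
    for key, expected in predicates.items():
        field, _, suffix = key.partition("__")
        compare = OPS[suffix] if suffix else operator.eq
        if field not in attrs or not compare(attrs[field], expected):
            return False
    return True


node_attrs = {"degree": 4, "layer": "social", "age": 35, "type": "user"}
print(satisfies(node_attrs, degree__gt=3, layer="social"))  # True
print(satisfies(node_attrs, age__lt=30))  # False
```

A missing attribute is treated as a failed predicate here; how py3plex handles missing attributes is not stated in this guide.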
Edges
Define edges between node variables:
# Undirected edge
Q.pattern().edge("a", "b", directed=False)
# Directed edge
Q.pattern().edge("a", "b", directed=True)
# Edge with predicates
Q.pattern().edge("a", "b").where(weight__gt=0.5)
# Edge with optional type
Q.pattern().edge("a", "b", etype="friendship")
Edge Predicates:
Filter edges by attributes:
weight__gt=0.5: edge weight greater than 0.5
weight__lt=1.0: edge weight less than 1.0
Any edge attribute: timestamp__gt=100, color="red", etc.
Paths
Syntactic sugar for defining sequential connections:
# 2-hop path: a → b → c
Q.pattern().path(["a", "b", "c"])
# Equivalent to:
Q.pattern().edge("a", "b").edge("b", "c")
# Directed path
Q.pattern().path(["a", "b", "c"], directed=True)
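The equivalence above amounts to pairing each variable with the next one. A hypothetical helper (not part of py3plex) that performs this expansion into (source, target, directed) edge specifications:

```python
def path_to_edges(variables, directed=False):
    """Expand a path like ["a", "b", "c"] into consecutive (u, v, directed) edge specs."""
    if len(variables) < 2:
        raise ValueError("a path needs at least two variables")
    return [(u, v, directed) for u, v in zip(variables, variables[1:])]


print(path_to_edges(["a", "b", "c"]))
# [('a', 'b', False), ('b', 'c', False)]
print(path_to_edges(["a", "b", "c", "d"], directed=True))
# [('a', 'b', True), ('b', 'c', True), ('c', 'd', True)]
```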
Triangles
Syntactic sugar for triangle motifs:
# Triangle: a-b, b-c, c-a
Q.pattern().triangle("a", "b", "c")
# Equivalent to:
Q.pattern().edge("a", "b").edge("b", "c").edge("c", "a")
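Because each of a, b, and c can bind to any corner of a triangle, a single undirected triangle can satisfy this pattern under several variable assignments. The sketch below assumes distinct bindings and uses plain adjacency sets rather than a py3plex network; `triangle_matches` is a hypothetical name. It counts the ordered matches and then collapses them into unique triangles:

```python
from itertools import permutations

# Undirected graph as adjacency sets: triangle A-B-C plus a pendant node D
adjacency = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B"},
    "D": {"B"},
}


def triangle_matches(adj):
    """Bind (a, b, c) to every ordered triple of distinct nodes with edges a-b, b-c, c-a."""
    return [
        (a, b, c)
        for a, b, c in permutations(adj, 3)
        if b in adj[a] and c in adj[b] and a in adj[c]
    ]


matches = triangle_matches(adjacency)
unique = {frozenset(m) for m in matches}
print(len(matches), len(unique))  # 6 1
```

The 3! = 6 orderings of {A, B, C} all match, so deduplicating by node set is needed if you want to count triangles rather than matches.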
Multilayer-Aware Patterns
One of the most powerful features is specifying layer constraints on edges.
Within Layer
Find edges that exist within a single layer:
# Edges within the social layer
pattern = (
Q.pattern()
.node("a").where(layer="social")
.node("b").where(layer="social")
.edge("a", "b")
)
Between Layers
Find edges that cross between specific layers:
# Edges between social and work layers
pattern = (
Q.pattern()
.node("a").where(layer="social")
.node("b").where(layer="work")
.edge("a", "b").between_layers("social", "work")
)
Any Layer
Allow edges in any layer (default behavior):
pattern = Q.pattern().edge("a", "b").any_layer()
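Conceptually, the three modes differ only in how they compare the layers of an edge's two endpoints. Assuming nodes are (name, layer) tuples, a hypothetical check (`layer_rule_ok` is illustrative, not the py3plex implementation) could look like this:

```python
def layer_rule_ok(u, v, rule, layers=()):
    """u and v are (name, layer) tuples; rule is 'within', 'between', or 'any'."""
    lu, lv = u[1], v[1]
    if rule == "within":
        # Same layer, optionally restricted to the given layers
        return lu == lv and (not layers or lu in layers)
    if rule == "between":
        # Endpoints in two different layers that match the requested pair
        return lu != lv and {lu, lv} == set(layers)
    if rule == "any":
        return True
    raise ValueError(f"unknown rule: {rule}")


edges = [
    (("Alice", "social"), ("Bob", "social")),
    (("Alice", "social"), ("Alice", "work")),
    (("Bob", "work"), ("Carol", "work")),
]
within_social = [e for e in edges if layer_rule_ok(*e, "within", ("social",))]
cross = [e for e in edges if layer_rule_ok(*e, "between", ("social", "work"))]
anywhere = [e for e in edges if layer_rule_ok(*e, "any")]
print(len(within_social), len(cross), len(anywhere))  # 1 1 3
```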
Practical Examples
Example 1: Find Triangles with Weight Constraints
Problem: Find triangles where all edges have weight > 1.0.
pattern = (
Q.pattern()
.triangle("a", "b", "c")
.limit(10)
)
result = pattern.execute(network)
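# Filter by weight in post-processing
# (edge predicates during matching are planned for a future version)
triangles = result.to_pandas()
# Each row is one match; the same triangle can appear under several variable orderings
print(f"Found {len(triangles)} triangle matches")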
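One way to do the post-processing step is to look up each edge weight, keep only matches whose three edges all exceed the threshold, and collapse reorderings of the same triangle. In this sketch the `weights` dictionary and plain string node names are stand-ins for however you read weights from your own network. Note also that `.limit(10)` caps matches before this filter runs, so it can hide qualifying triangles.

```python
# Hypothetical edge weights keyed by unordered node pairs
weights = {
    frozenset({"A", "B"}): 1.5,
    frozenset({"B", "C"}): 2.0,
    frozenset({"A", "C"}): 1.2,
    frozenset({"C", "D"}): 0.4,
    frozenset({"B", "D"}): 3.0,
}

# Example match rows, shaped like result.rows
matches = [
    {"a": "A", "b": "B", "c": "C"},
    {"a": "C", "b": "A", "c": "B"},  # same triangle, different binding
    {"a": "B", "b": "C", "c": "D"},
]


def all_heavy(match, threshold=1.0):
    """True if edges a-b, b-c and c-a all have weight above threshold."""
    a, b, c = match["a"], match["b"], match["c"]
    return all(weights[frozenset(pair)] > threshold for pair in ((a, b), (b, c), (c, a)))


heavy_triangles = {frozenset(m.values()) for m in matches if all_heavy(m)}
print(len(heavy_triangles))  # 1
```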
Example 2: Find 2-Hop Paths in a Specific Layer
Problem: Find all 2-hop paths within the social layer.
pattern = (
Q.pattern()
.node("a").where(layer="social")
.node("b").where(layer="social")
.node("c").where(layer="social")
.path(["a", "b", "c"])
.returning("a", "b", "c")
.limit(5)
)
result = pattern.execute(network)
df = result.to_pandas()
print(f"Found {result.count} paths")
print(df)
Output:
Found 5 paths
a b c
0 (Bob, social) (Alice, social) (Bob, social)
1 (Bob, social) (Alice, social) (Charlie, social)
2 (Charlie, social) (Bob, social) (Alice, social)
...
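The first row, Bob → Alice → Bob, shows that a and c may bind to the same node, because nothing in the pattern forbids it. A small stdlib sketch (the helper `two_hop_paths` is hypothetical and works on adjacency sets keyed by (name, layer) tuples) makes the difference visible by enumerating 2-hop walks in one layer with and without an a != c check:

```python
adjacency = {
    ("Alice", "social"): {("Bob", "social")},
    ("Bob", "social"): {("Alice", "social"), ("Charlie", "social")},
    ("Charlie", "social"): {("Bob", "social")},
}


def two_hop_paths(adj, layer, distinct_ends=False):
    """Enumerate (a, b, c) with edges a-b and b-c, all three nodes in `layer`."""
    paths = []
    for a in adj:
        if a[1] != layer:
            continue
        for b in adj[a]:
            if b[1] != layer:
                continue
            for c in adj[b]:
                if c[1] != layer:
                    continue
                if distinct_ends and a == c:
                    continue  # mimics an a != c constraint
                paths.append((a, b, c))
    return paths


print(len(two_hop_paths(adjacency, "social")))  # 6
print(len(two_hop_paths(adjacency, "social", distinct_ends=True)))  # 2
```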
Example 3: Find High-Degree Nodes with Specific Connections
Problem: Find pairs of high-degree nodes (degree > 2) connected by strong edges (weight > 1.5).
pattern = (
Q.pattern()
.node("a").where(layer="social", degree__gt=2)
.node("b").where(layer="social", degree__gt=2)
.edge("a", "b").where(weight__gt=1.5)
.constraint("a != b") # Ensure different nodes
.returning("a", "b")
)
result = pattern.execute(network)
print(f"Found {result.count} high-degree pairs")
# Get unique nodes involved
nodes = result.to_nodes(unique=True)
print(f"Unique nodes: {len(nodes)}")
Example 4: Cross-Layer Hub Detection
Problem: Find nodes that appear in multiple layers and have high degree in each.
# This requires multiple pattern queries
# Find nodes in social layer
social_hubs = (
Q.pattern()
.node("a").where(layer="social", degree__gt=3)
.returning("a")
.execute(network)
)
# Find nodes in work layer
work_hubs = (
Q.pattern()
.node("a").where(layer="work", degree__gt=3)
.returning("a")
.execute(network)
)
# Find intersection (nodes in both)
social_nodes = set(n[0] for n in social_hubs.to_nodes())
work_nodes = set(n[0] for n in work_hubs.to_nodes())
cross_layer_hubs = social_nodes & work_nodes
print(f"Cross-layer hubs: {cross_layer_hubs}")
Working with Results
The PatternQueryResult object provides multiple ways to access matches.
Accessing Rows
result = pattern.execute(network)
# Get all matches as list of dictionaries
for match in result.rows:
print(f"Match: {match}")
# e.g., {'a': ('Alice', 'social'), 'b': ('Bob', 'social')}
Converting to Pandas
df = result.to_pandas()
# Now use pandas operations
print(df.head())
print(df.describe())
# Export to CSV
df.to_csv("pattern_matches.csv", index=False)
Extracting Nodes
# Get all matched nodes (as tuples)
all_nodes = result.to_nodes()
# Get unique nodes only
unique_nodes = result.to_nodes(unique=True)
# Get nodes for specific variables
a_nodes = result.to_nodes(vars=["a"], unique=True)
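The three calls differ in which variables are read and whether repeats are kept. A hypothetical stand-in over plain row dictionaries (`rows_to_nodes` mirrors the `vars` and `unique` parameters above but is not py3plex's code) behaves like this:

```python
rows = [
    {"a": ("Alice", "social"), "b": ("Bob", "social")},
    {"a": ("Bob", "social"), "b": ("Alice", "social")},
    {"a": ("Bob", "social"), "b": ("Charlie", "social")},
]


def rows_to_nodes(rows, variables=None, unique=False):
    """Collect bound nodes from each row, optionally for selected variables only."""
    nodes = [row[v] for row in rows for v in (variables or row.keys())]
    if unique:
        # dict.fromkeys drops repeats while keeping first-seen order
        return list(dict.fromkeys(nodes))
    return nodes


print(len(rows_to_nodes(rows)))  # 6
print(len(rows_to_nodes(rows, unique=True)))  # 3
print(rows_to_nodes(rows, variables=["a"], unique=True))
# [('Alice', 'social'), ('Bob', 'social')]
```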
Extracting Edges
# Get all matched edges as tuples
edges = result.to_edges()
# Each edge is a tuple of node tuples
for src, dst in edges:
print(f"Edge: {src} -> {dst}")
Creating Subgraphs
# Create induced subgraph of matched nodes
subgraph = result.to_subgraph(network)
# Now use NetworkX operations
print(f"Subgraph has {subgraph.number_of_nodes()} nodes")
print(f"Subgraph has {subgraph.number_of_edges()} edges")
# Create one subgraph per match
subgraphs = result.to_subgraph(network, per_match=True)
print(f"Created {len(subgraphs)} subgraphs")
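An induced subgraph keeps every edge of the original network whose two endpoints were matched, including edges that no single match used. The stdlib sketch below illustrates that definition on adjacency sets (in NetworkX, `G.subgraph(nodes)` returns the same induced view); the helper names are hypothetical.

```python
adjacency = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}


def induced_subgraph(adj, nodes):
    """Keep the given nodes and every edge whose endpoints are both kept."""
    keep = {n for n in nodes if n in adj}
    return {u: adj[u] & keep for u in keep}


def edge_count(sub):
    # Undirected adjacency lists each edge twice
    return sum(len(nbrs) for nbrs in sub.values()) // 2


matches = [{"a": "A", "b": "B"}, {"a": "B", "b": "C"}]
matched = {n for m in matches for n in m.values()}
sub = induced_subgraph(adjacency, matched)
print(len(sub), edge_count(sub))  # 3 3 (A-C is included although no match used it)

per_match = [induced_subgraph(adjacency, m.values()) for m in matches]
print([edge_count(s) for s in per_match])  # [1, 1]
```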
Filtering Results
# Filter matches using a predicate
filtered = result.filter(lambda match: match["a"][0].startswith("A"))
# Limit number of results
limited = result.limit(10)
Query Execution Planning
Use .explain() to see how the pattern will be executed:
pattern = (
Q.pattern()
.node("a").where(degree__gt=3)
.node("b")
.edge("a", "b")
)
plan = pattern.explain()
print(f"Root variable: {plan['root_var']}")
print(f"Join order: {[step['var'] for step in plan['join_order']]}")
print(f"Estimated complexity: {plan['estimated_complexity']}")
Example Output:
Root variable: a
Join order: ['a', 'b']
Estimated complexity: 1000
The planner chooses a as the root because it has the most restrictive predicates (degree__gt=3), which will generate fewer candidates.
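A crude version of this heuristic fits in a few lines. It uses the number of predicates per variable as a stand-in for selectivity, picks the most constrained variable as the root, and then adds variables breadth-first along pattern edges so each new variable is adjacent to one already bound. This is an illustration under those assumptions, not py3plex's planner, and it produces no complexity estimate.

```python
from collections import deque


def plan_join_order(variables, predicate_counts, edges):
    """Return (root, order); variables with more predicates count as more selective."""
    def selectivity(var):
        return predicate_counts.get(var, 0)

    root = max(variables, key=selectivity)  # ties go to the earliest variable
    neighbors = {v: set() for v in variables}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    order, seen, queue = [], {root}, deque([root])
    while queue:
        var = queue.popleft()
        order.append(var)
        # Visit more selective neighbors first; break ties by name
        for nxt in sorted(neighbors[var], key=lambda n: (-selectivity(n), n)):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    # Variables not connected to the root are appended at the end
    order.extend(v for v in variables if v not in seen)
    return root, order


print(plan_join_order(["a", "b"], {"a": 1}, [("a", "b")]))
# ('a', ['a', 'b'])
```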
Performance Tips
Add Predicates Early: The more selective the predicates on the root variable, the faster the query.
# Good: restrictive predicate on the first node
Q.pattern().node("a").where(degree__gt=10).node("b").edge("a", "b")
# Less efficient: no predicates
Q.pattern().node("a").node("b").edge("a", "b")
Use Layer Constraints: Layer filters significantly reduce the search space.
# Good: restrict to one layer
Q.pattern().node("a").where(layer="social")
# Less efficient: search all layers
Q.pattern().node("a")
Set Limits: For large networks, use .limit() to cap results.
pattern.limit(100).execute(network)
Use Constraints: Add a != b constraints to exclude matches in which both variables bind to the same node (for example, through a self-loop) when you don't need them.
Q.pattern().node("a").node("b").edge("a", "b").constraint("a != b")
Advanced Features
Constraints
Global constraints that apply across variables:
# All-different constraint
pattern = (
Q.pattern()
.node("a")
.node("b")
.node("c")
.edge("a", "b")
.edge("b", "c")
.constraint("a != b")
.constraint("b != c")
.constraint("a != c")
)
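Each .constraint("x != y") rules out bindings in which two variables point to the same node. A hypothetical checker that understands only this != form (py3plex may support more) could filter candidate bindings as follows:

```python
def parse_inequality(expr):
    """Parse 'a != b' into ('a', 'b'); no other constraint form is handled here."""
    left, sep, right = (part.strip() for part in expr.partition("!="))
    if not sep or not left or not right:
        raise ValueError(f"unsupported constraint: {expr!r}")
    return left, right


def satisfies_constraints(binding, constraints):
    """True if every 'x != y' constraint holds for this variable binding."""
    pairs = [parse_inequality(c) for c in constraints]
    return all(binding[x] != binding[y] for x, y in pairs)


constraints = ["a != b", "b != c", "a != c"]
candidates = [
    {"a": "Alice", "b": "Bob", "c": "Charlie"},
    {"a": "Bob", "b": "Alice", "c": "Bob"},  # a and c bind to the same node
]
kept = [b for b in candidates if satisfies_constraints(b, constraints)]
print(len(kept))  # 1
```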
Returning Specific Variables
By default, all variables are returned. You can select specific ones:
pattern = (
Q.pattern()
.node("a")
.node("b")
.node("c")
.path(["a", "b", "c"])
.returning("a", "c") # Only return endpoints
)
result = pattern.execute(network)
# Result only contains 'a' and 'c' columns
Execution Options
# Set maximum matches
result = pattern.execute(network, max_matches=100)
# Set timeout (in seconds)
result = pattern.execute(network, timeout=10.0)
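To show what these two options typically cap, here is a minimal backtracking matcher that stops once max_matches results are collected and raises once a wall-clock deadline passes. It is a sketch under assumptions: the function name, the adjacency-set input, and raising `TimeoutError` (rather than returning partial results) are choices made for illustration, not documented py3plex behavior.

```python
import time


def backtrack_match(adj, order, edges, max_matches=None, timeout=None):
    """Bind variables in `order` one at a time; pattern `edges` are (u, v) variable pairs."""
    deadline = None if timeout is None else time.monotonic() + timeout
    results = []

    def consistent(binding):
        # Check every pattern edge whose endpoints are both bound
        return all(binding[v] in adj[binding[u]]
                   for u, v in edges if u in binding and v in binding)

    def extend(binding, i):
        if deadline is not None and time.monotonic() > deadline:
            raise TimeoutError("pattern matching exceeded the timeout")
        if i == len(order):
            results.append(dict(binding))
            return
        for node in adj:
            if max_matches is not None and len(results) >= max_matches:
                return  # cap reached: stop exploring
            binding[order[i]] = node
            if consistent(binding):
                extend(binding, i + 1)
            del binding[order[i]]

    extend({}, 0)
    return results


adjacency = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
print(len(backtrack_match(adjacency, ["a", "b"], [("a", "b")])))  # 4
print(len(backtrack_match(adjacency, ["a", "b"], [("a", "b")], max_matches=2)))  # 2
```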
Comparison with Other Approaches
Pattern Matching vs. NetworkX
| NetworkX Approach | Pattern Matching Approach |
|---|---|
| Manual loops and filtering | Declarative pattern specification |
Advantages of Pattern Matching:
Declarative and composable
Built-in support for multilayer constraints
Integrated with DSL ecosystem
Returns results in consistent format (DataFrame, nodes, edges, subgraph)
Pattern Matching vs. DSL Node/Edge Queries
Use node/edge queries when you need individual elements. Use pattern matching when you need to find subgraph structures.
Limitations and Future Work
Current Limitations (v1)
Fixed-length paths only: Variable-length paths like (a)-[*1..3]-(b) are not yet supported
No optional matching: All pattern elements must match
No UNION patterns: Cannot express "match pattern A OR pattern B"
Limited aggregation: Aggregation over matches not yet implemented
These features are planned for future versions while maintaining the current clean API.
Best Practices
Start simple: Begin with small patterns and add complexity incrementally
Test on small data: Validate pattern logic on toy networks before scaling up
Use explain(): Check execution plans for complex queries
Combine with DSL: Use pattern matching for structure, DSL for attributes
Document patterns: Add comments explaining complex pattern logic
Troubleshooting
Pattern Returns No Matches
Problem: Your pattern should match but returns 0 results.
Solutions:
Check layer constraints: are the nodes actually in the layers you specified?
Verify predicates: are the thresholds too restrictive?
Use .explain() to see the execution plan
Test with simpler patterns first
Check whether your network has the expected structure
# Debug: Check basic statistics first
result = Q.nodes().where(layer="social").execute(network)
print(f"Social layer has {result.count} nodes")
result = Q.edges().where(intralayer=True).execute(network)
print(f"Network has {result.count} intralayer edges")
Too Many Matches
Problem: Pattern returns too many results.
Solutions:
Add more predicates to narrow the search
Use .limit() to cap results
Add layer constraints to restrict the search space
Use .constraint("a != b") to exclude matches in which a and b bind to the same node
# Limit results
pattern.limit(100).execute(network)
Slow Performance
Problem: Pattern matching takes too long.
Solutions:
Add selective predicates to the root variable
Use layer constraints to reduce the search space
Consider breaking the pattern into smaller sub-patterns
Use .explain() to check the estimated complexity
Set a timeout to avoid runaway queries
# Set timeout
result = pattern.execute(network, timeout=30.0)
Next Steps
Explore the Query Zoo: DSL Gallery for Multilayer Analysis for more complex examples
Try combining pattern matching with DSL queries
Experiment with different motifs (triangles, cliques, paths)
Build your own domain-specific pattern library
Tip
Pro Tip: Pattern matching is most powerful when combined with the rest of the DSL. Use patterns to find structures, then use regular DSL queries to compute metrics on the matches!