How to Find Graph Patterns and Motifs with the Pattern Matching API

Goal: Use py3plex’s Pattern Matching Builder API to find graph motifs, paths, triangles, and complex subgraph patterns in multilayer networks.

🎯 What You’ll Learn

  • How to express graph patterns using the Q.pattern() builder API

  • How to match edges, paths, triangles, and custom motifs

  • How to apply multilayer-aware constraints (within/between layers)

  • How to filter patterns using node and edge predicates

  • How to extract and analyze pattern match results

💻 Complete Example

See the full executable example: example_pattern_matching.py

Prerequisites:

Why Pattern Matching?

Pattern matching allows you to find specific subgraph structures in your network. Unlike node/edge queries that return individual elements, pattern matching returns complete matches where multiple nodes and edges satisfy structural constraints.

Common Use Cases:

  • Motif Detection: Find triangles, cliques, or other structural patterns

  • Path Analysis: Discover multi-hop connections between nodes

  • Multilayer Patterns: Identify cross-layer relationships

  • Complex Queries: Express “find nodes A, B, C where A connects to B with weight > 0.5, B connects to C, and all are in the social layer”

Comparison to Basic Queries:

Quick Start: Your First Pattern

Let’s start with a simple example: finding all edges in a network.

from py3plex.core import multinet
from py3plex.dsl import Q

# Create a simple network
network = multinet.multi_layer_network(directed=False)
network.add_nodes([
    {'source': 'Alice', 'type': 'social'},
    {'source': 'Bob', 'type': 'social'},
    {'source': 'Charlie', 'type': 'social'},
])
network.add_edges([
    {'source': 'Alice', 'target': 'Bob',
     'source_type': 'social', 'target_type': 'social', 'weight': 1.0},
    {'source': 'Bob', 'target': 'Charlie',
     'source_type': 'social', 'target_type': 'social', 'weight': 2.0},
])

# Find all edges (simple pattern)
pattern = (
    Q.pattern()
     .node("a")              # Define node variable "a"
     .node("b")              # Define node variable "b"
     .edge("a", "b")         # Require edge between a and b
)

result = pattern.execute(network)
print(f"Found {result.count} matches")
print(result.to_pandas())

Output:

Found 4 matches
                 a                  b
0  (Alice, social)      (Bob, social)
1    (Bob, social)    (Alice, social)
2    (Bob, social)  (Charlie, social)
3  (Charlie, social)    (Bob, social)

Notice that undirected edges produce matches in both directions.
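The four rows come from the two undirected edges, each bound in both orientations. The following plain-Python sketch shows this; it is not py3plex code. It uses the Quick Start edges and assumes nodes are (name, layer) tuples, as in the output above.

```python
# Undirected edges from the Quick Start network, as (node, layer) tuples
edges = [
    (("Alice", "social"), ("Bob", "social")),
    (("Bob", "social"), ("Charlie", "social")),
]

# An undirected edge can bind (a, b) in either orientation,
# so each edge contributes two matches.
matches = []
for u, v in edges:
    matches.append({"a": u, "b": v})
    matches.append({"a": v, "b": u})

print(f"Found {len(matches)} matches")
for m in matches:
    print(m["a"], m["b"])
```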

Pattern Components

Nodes

Define node variables that will be bound to actual nodes during matching:

# Simple node
Q.pattern().node("a")

# Node with semantic labels (metadata only)
Q.pattern().node("a", labels="person")

# Node with predicates
Q.pattern().node("a").where(degree__gt=3)

# Node with layer constraint
Q.pattern().node("a").where(layer="social")

Node Predicates:

All standard DSL predicates work with pattern nodes:

  • degree__gt=5: degree greater than 5

  • degree__lt=2: degree less than 2

  • layer="social": node must be in the social layer

  • Any node attribute: age__gt=30, type="user", etc.
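These predicates follow a field__op naming convention. The sketch below shows one way such keyword predicates could be evaluated against a node's attribute dict. The helper matches_predicates and the operator table are hypothetical illustrations, not py3plex internals. Only the gt and lt suffixes shown above are supported.

```python
import operator

# Suffix -> comparison for the field__op keywords shown above.
# This mapping is an illustrative assumption, not py3plex's internal table.
OPS = {"gt": operator.gt, "lt": operator.lt}

def matches_predicates(attrs, **predicates):
    """Return True if the attribute dict satisfies every keyword predicate."""
    for key, expected in predicates.items():
        field, _, op_name = key.partition("__")
        if field not in attrs:
            return False
        if op_name:
            # Suffixed keyword: compare with the mapped operator
            if not OPS[op_name](attrs[field], expected):
                return False
        elif attrs[field] != expected:
            # Plain keyword: require equality
            return False
    return True

bob = {"layer": "social", "degree": 2, "age": 34}
print(matches_predicates(bob, layer="social", degree__gt=1))  # True
print(matches_predicates(bob, degree__lt=2))                  # False
```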

Edges

Define edges between node variables:

# Undirected edge
Q.pattern().edge("a", "b", directed=False)

# Directed edge
Q.pattern().edge("a", "b", directed=True)

# Edge with predicates
Q.pattern().edge("a", "b").where(weight__gt=0.5)

# Edge with optional type
Q.pattern().edge("a", "b", etype="friendship")

Edge Predicates:

Filter edges by attributes:

  • weight__gt=0.5: edge weight greater than 0.5

  • weight__lt=1.0: edge weight less than 1.0

  • Any edge attribute: timestamp__gt=100, color="red", etc.

Paths

Shorthand (syntactic sugar) for chaining edges between consecutive node variables:

# 2-hop path: a → b → c
Q.pattern().path(["a", "b", "c"])

# Equivalent to:
Q.pattern().edge("a", "b").edge("b", "c")

# Directed path
Q.pattern().path(["a", "b", "c"], directed=True)
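The equivalence above pairs each variable with its successor. The hypothetical helper path_to_edges below (not part of py3plex) makes the expansion explicit. An n-variable path yields n - 1 edge constraints.

```python
def path_to_edges(variables):
    """Expand a path over node variables into consecutive (u, v) edge pairs."""
    if len(variables) < 2:
        raise ValueError("a path needs at least two node variables")
    return list(zip(variables, variables[1:]))

print(path_to_edges(["a", "b", "c"]))       # [('a', 'b'), ('b', 'c')]
print(path_to_edges(["a", "b", "c", "d"]))  # [('a', 'b'), ('b', 'c'), ('c', 'd')]
```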

Triangles

Shorthand for the triangle motif, three edges that close a cycle over three node variables:

# Triangle: a-b, b-c, c-a
Q.pattern().triangle("a", "b", "c")

# Equivalent to:
Q.pattern().edge("a", "b").edge("b", "c").edge("c", "a")
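Because a, b, and c are separate variables, a single undirected triangle can be bound in several ways. The plain-Python sketch below assumes every orientation counts as a separate match, as edges do in the Quick Start output. Under that assumption, one triangle yields 3! = 6 bindings. The sample triangle X-Y-Z is made up.

```python
from itertools import permutations

# One undirected triangle: X-Y, Y-Z, Z-X
edges = {frozenset(p) for p in [("X", "Y"), ("Y", "Z"), ("Z", "X")]}

def connected(u, v):
    return frozenset((u, v)) in edges

nodes = ["X", "Y", "Z"]
bindings = [
    {"a": a, "b": b, "c": c}
    for a, b, c in permutations(nodes, 3)
    # The triangle pattern requires a-b, b-c and c-a
    if connected(a, b) and connected(b, c) and connected(c, a)
]
print(len(bindings))  # 6

# Collapsing bindings by their node set recovers the single triangle
distinct = {frozenset(m.values()) for m in bindings}
print(len(distinct))  # 1
```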

Multilayer-Aware Patterns

One of the most powerful features is the ability to constrain which layers the matched nodes and edges belong to.

Within Layer

Find edges that exist within a single layer:

# Edges within the social layer
pattern = (
    Q.pattern()
     .node("a").where(layer="social")
     .node("b").where(layer="social")
     .edge("a", "b")
)

Between Layers

Find edges that cross between specific layers:

# Edges between social and work layers
pattern = (
    Q.pattern()
     .node("a").where(layer="social")
     .node("b").where(layer="work")
     .edge("a", "b").between_layers("social", "work")
)
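Conceptually, the within-layer versus between-layer distinction depends only on the layer component of each endpoint. Here is a minimal sketch, assuming nodes are (name, layer) tuples as in the match output above. The helpers edge_kind and crosses and the sample edges are hypothetical.

```python
def edge_kind(u, v):
    """Classify an edge between two (name, layer) nodes."""
    return "within" if u[1] == v[1] else "between"

def crosses(u, v, layer_1, layer_2):
    """True if the edge connects layer_1 and layer_2, in either direction."""
    return {u[1], v[1]} == {layer_1, layer_2}

edges = [
    (("Alice", "social"), ("Bob", "social")),
    (("Alice", "social"), ("Alice", "work")),
    (("Bob", "work"), ("Charlie", "work")),
]

for u, v in edges:
    print(u, v, edge_kind(u, v), crosses(u, v, "social", "work"))
```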

Any Layer

Allow edges in any layer (default behavior):

pattern = Q.pattern().edge("a", "b").any_layer()

Practical Examples

Example 1: Find Triangles with Weight Constraints

Problem: Find triangles where all edges have weight > 1.0.

pattern = (
    Q.pattern()
     .triangle("a", "b", "c")
     .limit(10)
)

result = pattern.execute(network)

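# The weight > 1.0 condition is not applied during matching here
# (edge predicates during triangle matching are planned for a future version),
# so the matched rows still need to be filtered by weight in post-processing.
triangles = result.to_pandas()
# This counts matches: one triangle may appear once per ordering of a, b, c
print(f"Found {len(triangles)} triangle matches (before weight filtering)")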
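The weight filter can be sketched in plain Python. This assumes two things:

  • each match is a dict of (name, layer) tuples, like result.rows
  • you have built a weight lookup keyed by unordered node pairs

The rows, weights, and the all_edges_heavier helper are hypothetical sample data, not output from the network above.

```python
# Hypothetical triangle matches, shaped like result.rows
rows = [
    {"a": ("A", "social"), "b": ("B", "social"), "c": ("C", "social")},
    {"a": ("A", "social"), "b": ("B", "social"), "c": ("D", "social")},
]

# Hypothetical edge weights, keyed by unordered node pairs
weights = {
    frozenset({("A", "social"), ("B", "social")}): 1.5,
    frozenset({("B", "social"), ("C", "social")}): 2.0,
    frozenset({("C", "social"), ("A", "social")}): 1.2,
    frozenset({("B", "social"), ("D", "social")}): 0.4,
    frozenset({("D", "social"), ("A", "social")}): 3.0,
}

def all_edges_heavier(match, threshold):
    """True if every triangle edge a-b, b-c, c-a has weight > threshold."""
    a, b, c = match["a"], match["b"], match["c"]
    return all(
        weights[frozenset({u, v})] > threshold
        for u, v in [(a, b), (b, c), (c, a)]
    )

heavy = [m for m in rows if all_edges_heavier(m, 1.0)]
print(f"{len(heavy)} of {len(rows)} triangle matches have all weights > 1.0")
```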

Example 2: Find 2-Hop Paths in Specific Layer

Problem: Find all 2-hop paths within the social layer.

pattern = (
    Q.pattern()
     .node("a").where(layer="social")
     .node("b").where(layer="social")
     .node("c").where(layer="social")
     .path(["a", "b", "c"])
     .returning("a", "b", "c")
     .limit(5)
)

result = pattern.execute(network)
df = result.to_pandas()

print(f"Found {result.count} paths")
print(df)

Output:

Found 5 paths
                   a                b                  c
0      (Bob, social)  (Alice, social)      (Bob, social)
1      (Bob, social)  (Alice, social)  (Charlie, social)
2  (Charlie, social)    (Bob, social)    (Alice, social)
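...

Row 0 binds Bob to both a and c. Without a constraint such as .constraint("a != c"), a path pattern also matches walks that return to their starting node.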

Example 3: Find High-Degree Nodes with Specific Connections

Problem: Find pairs of high-degree nodes (degree > 2) connected by strong edges (weight > 1.5).

pattern = (
    Q.pattern()
     .node("a").where(layer="social", degree__gt=2)
     .node("b").where(layer="social", degree__gt=2)
     .edge("a", "b").where(weight__gt=1.5)
     .constraint("a != b")  # Ensure different nodes
     .returning("a", "b")
)

result = pattern.execute(network)
print(f"Found {result.count} high-degree pairs")

# Get unique nodes involved
nodes = result.to_nodes(unique=True)
print(f"Unique nodes: {len(nodes)}")

Example 4: Cross-Layer Hub Detection

Problem: Find nodes that appear in both the social and work layers and have high degree (> 3) in each.

# This requires multiple pattern queries
# Find nodes in social layer
social_hubs = (
    Q.pattern()
     .node("a").where(layer="social", degree__gt=3)
     .returning("a")
     .execute(network)
)

# Find nodes in work layer
work_hubs = (
    Q.pattern()
     .node("a").where(layer="work", degree__gt=3)
     .returning("a")
     .execute(network)
)

# Find intersection (nodes in both layers).
# Each matched node is a (node_id, layer) tuple, so compare by node_id only.
social_nodes = {n[0] for n in social_hubs.to_nodes()}
work_nodes = {n[0] for n in work_hubs.to_nodes()}
cross_layer_hubs = social_nodes & work_nodes

print(f"Cross-layer hubs: {cross_layer_hubs}")

Working with Results

The PatternQueryResult object provides multiple ways to access matches.

Accessing Rows

result = pattern.execute(network)

# Get all matches as list of dictionaries
for match in result.rows:
    print(f"Match: {match}")
    # e.g., {'a': ('Alice', 'social'), 'b': ('Bob', 'social')}

Converting to Pandas

df = result.to_pandas()

# Now use pandas operations
print(df.head())
print(df.describe())

# Export to CSV
df.to_csv("pattern_matches.csv", index=False)

Extracting Nodes

# Get all matched nodes (as tuples)
all_nodes = result.to_nodes()

# Get unique nodes only
unique_nodes = result.to_nodes(unique=True)

# Get nodes for specific variables
a_nodes = result.to_nodes(vars=["a"], unique=True)
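To reason about what these calls return, here is a plain-Python sketch over match dicts shaped like result.rows. It mirrors the described behavior: all nodes, unique nodes, and nodes for selected variables. The collect_nodes helper is hypothetical. Its first-occurrence ordering is an assumption, not documented py3plex behavior.

```python
def collect_nodes(rows, variables=None, unique=False):
    """Gather bound nodes from match rows, optionally per variable and deduplicated."""
    nodes = []
    for match in rows:
        for var, node in match.items():
            if variables is None or var in variables:
                nodes.append(node)
    if unique:
        # dict.fromkeys keeps first-occurrence order while dropping repeats
        nodes = list(dict.fromkeys(nodes))
    return nodes

rows = [
    {"a": ("Alice", "social"), "b": ("Bob", "social")},
    {"a": ("Bob", "social"), "b": ("Alice", "social")},
]
print(len(collect_nodes(rows)))          # 4
print(collect_nodes(rows, unique=True))  # [('Alice', 'social'), ('Bob', 'social')]
print(collect_nodes(rows, variables=["a"], unique=True))
```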

Extracting Edges

# Get all matched edges as tuples
edges = result.to_edges()

# Each edge is a tuple of node tuples
for src, dst in edges:
    print(f"Edge: {src} -> {dst}")

Creating Subgraphs

# Create induced subgraph of matched nodes
subgraph = result.to_subgraph(network)

# Now use NetworkX operations
print(f"Subgraph has {subgraph.number_of_nodes()} nodes")
print(f"Subgraph has {subgraph.number_of_edges()} edges")

# Create one subgraph per match
subgraphs = result.to_subgraph(network, per_match=True)
print(f"Created {len(subgraphs)} subgraphs")

Filtering Results

# Filter matches using a predicate
filtered = result.filter(lambda match: match["a"][0].startswith("A"))

# Limit number of results
limited = result.limit(10)

Query Execution Planning

Use .explain() to see how the pattern will be executed:

pattern = (
    Q.pattern()
     .node("a").where(degree__gt=3)
     .node("b")
     .edge("a", "b")
)

plan = pattern.explain()
print(f"Root variable: {plan['root_var']}")
print(f"Join order: {[step['var'] for step in plan['join_order']]}")
print(f"Estimated complexity: {plan['estimated_complexity']}")

Example Output:

Root variable: a
Join order: ['a', 'b']
Estimated complexity: 1000

The planner chooses a as the root because it is the only variable with a predicate (degree__gt=3), so it is expected to generate the fewest candidates.
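A small greedy join-ordering heuristic illustrates this choice:

  1. Start from the variable with the fewest estimated candidates.
  2. Repeatedly add the cheapest unbound variable that shares a pattern edge with an already-bound one.

This is a sketch of the general technique, not py3plex's planner. The candidate estimates are made-up numbers.

```python
def plan_join_order(estimates, pattern_edges):
    """Greedy join order: most selective root, then cheapest connected neighbour."""
    adjacency = {var: set() for var in estimates}
    for u, v in pattern_edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    root = min(estimates, key=estimates.get)
    order = [root]
    bound = {root}
    while len(order) < len(estimates):
        frontier = {n for var in bound for n in adjacency[var]} - bound
        if not frontier:
            # Disconnected pattern: fall back to the cheapest remaining variable
            frontier = set(estimates) - bound
        nxt = min(frontier, key=lambda var: (estimates[var], var))
        order.append(nxt)
        bound.add(nxt)
    return order

# Hypothetical candidate counts: "a" has a degree__gt=3 filter, "b" has none
estimates = {"a": 10, "b": 1000}
print(plan_join_order(estimates, [("a", "b")]))  # ['a', 'b']
```

Expanding only along pattern edges means each new variable's candidates can come from the neighbours of an already-bound node, rather than from the whole network.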

Performance Tips

  1. Add Predicates Early: The more selective predicates you add to the root variable, the faster the query.

    # Good: Restrictive predicate on first node
    Q.pattern().node("a").where(degree__gt=10).node("b").edge("a", "b")
    
    # Less efficient: No predicates
    Q.pattern().node("a").node("b").edge("a", "b")
    
  2. Use Layer Constraints: Layer filters significantly reduce the search space.

    # Good: Restrict to one layer
    Q.pattern().node("a").where(layer="social")
    
    # Less efficient: Search all layers
    Q.pattern().node("a")
    
  3. Set Limits: For large networks, use .limit() to cap results.

    pattern.limit(100).execute(network)
    
  4. Use Constraints: Add a != b constraints to exclude matches where a and b bind to the same node (for example, through a self-loop) when you don't need them.

    Q.pattern().node("a").node("b").edge("a", "b").constraint("a != b")
    

Advanced Features

Constraints

Global constraints that apply across variables:

# All-different constraint
pattern = (
    Q.pattern()
     .node("a")
     .node("b")
     .node("c")
     .edge("a", "b")
     .edge("b", "c")
     .constraint("a != b")
     .constraint("b != c")
     .constraint("a != c")
)
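Conceptually, each constraint string acts as a predicate over a candidate binding. The sketch below handles only != between variable names. The parse_neq helper and its mini-syntax are hypothetical, not py3plex's constraint parser. It shows how the three constraints above reject a walk that revisits a node.

```python
def parse_neq(constraint):
    """Parse a constraint of the form 'x != y' into an (x, y) pair."""
    left, sep, right = constraint.partition("!=")
    if not sep:
        raise ValueError(f"unsupported constraint: {constraint!r}")
    return left.strip(), right.strip()

def satisfies(binding, constraints):
    """True if no '!=' constraint binds both variables to the same node."""
    return all(binding[x] != binding[y] for x, y in map(parse_neq, constraints))

constraints = ["a != b", "b != c", "a != c"]
walks = [
    {"a": "Alice", "b": "Bob", "c": "Charlie"},  # simple path
    {"a": "Alice", "b": "Bob", "c": "Alice"},    # returns to its start
]
print([satisfies(w, constraints) for w in walks])  # [True, False]
```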

Returning Specific Variables

By default, all variables are returned. You can select specific ones:

pattern = (
    Q.pattern()
     .node("a")
     .node("b")
     .node("c")
     .path(["a", "b", "c"])
     .returning("a", "c")  # Only return endpoints
)

result = pattern.execute(network)
# Result only contains 'a' and 'c' columns

Execution Options

# Set maximum matches
result = pattern.execute(network, max_matches=100)

# Set timeout (in seconds)
result = pattern.execute(network, timeout=10.0)
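Both options bound a search that could otherwise grow combinatorially. The sketch below is a generic backtracking matcher, not py3plex's engine. It binds variables one at a time, stops once max_matches results are collected, and stops when a time.monotonic() deadline passes. The following are assumptions for illustration:

  • the function name match_edges
  • the adjacency-dict input
  • returning partial results on timeout

```python
import time

def match_edges(adjacency, variables, pattern_edges, max_matches=None, timeout=None):
    """Enumerate bindings of variables to nodes such that every pattern edge exists."""
    deadline = None if timeout is None else time.monotonic() + timeout
    results = []

    def consistent(binding):
        # Check only pattern edges whose endpoints are both bound so far
        return all(
            binding[v] in adjacency[binding[u]]
            for u, v in pattern_edges
            if u in binding and v in binding
        )

    def extend(binding, i):
        # Returns False when the search should stop (limit or deadline reached)
        if deadline is not None and time.monotonic() > deadline:
            return False
        if i == len(variables):
            results.append(dict(binding))
            return max_matches is None or len(results) < max_matches
        for node in adjacency:
            binding[variables[i]] = node
            if consistent(binding) and not extend(binding, i + 1):
                del binding[variables[i]]
                return False
            del binding[variables[i]]
        return True

    extend({}, 0)
    return results

# Quick Start graph, undirected: each edge stored in both directions
adjacency = {
    "Alice": {"Bob"},
    "Bob": {"Alice", "Charlie"},
    "Charlie": {"Bob"},
}
print(len(match_edges(adjacency, ["a", "b"], [("a", "b")])))                 # 4
print(len(match_edges(adjacency, ["a", "b"], [("a", "b")], max_matches=2)))  # 2
```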

Comparison with Other Approaches

Pattern Matching vs. NetworkX

NetworkX Approach                                           Pattern Matching Approach
nx.triangles(G): returns triangle count per node            Q.pattern().triangle("a", "b", "c"): returns actual triangle matches
nx.all_simple_paths(G, source, tgt): enumerates all paths   Q.pattern().path(["a", "b", "c"]): finds paths with constraints
Manual loops and filtering                                  Declarative pattern specification
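A few lines of plain Python over an adjacency dict show the difference between counting triangles per node (what nx.triangles reports) and listing the triangles themselves. This is the "manual loops" style, shown for comparison. It makes no NetworkX or py3plex calls, and the sample graph is made up.

```python
from itertools import combinations

# Hypothetical undirected graph: triangle A-B-C plus a pendant edge C-D
adjacency = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}

# List each triangle once: any pair of a node's neighbours that are themselves connected
triangles = set()
for node, neighbours in adjacency.items():
    for u, v in combinations(sorted(neighbours), 2):
        if v in adjacency[u]:
            triangles.add(frozenset((node, u, v)))

# Per-node triangle counts, the kind of summary nx.triangles(G) returns
counts = {node: sum(node in t for t in triangles) for node in adjacency}

print([sorted(t) for t in triangles])  # [['A', 'B', 'C']]
print(counts)                          # {'A': 1, 'B': 1, 'C': 1, 'D': 0}
```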

Advantages of Pattern Matching:

  • Declarative and composable

  • Built-in support for multilayer constraints

  • Integrated with DSL ecosystem

  • Returns results in consistent format (DataFrame, nodes, edges, subgraph)

Pattern Matching vs. DSL Node/Edge Queries

Use node/edge queries when you need individual elements. Use pattern matching when you need to find subgraph structures.

Limitations and Future Work

Current Limitations (v1)

  • Fixed-length paths only: Variable-length paths like (a)-[*1..3]-(b) not yet supported

  • No optional matching: All pattern elements must match

  • No UNION patterns: Cannot express “match pattern A OR pattern B”

  • Limited aggregation: Aggregation over matches not yet implemented

These features are planned for future versions, with the goal of keeping the current API unchanged.

Best Practices

  1. Start simple: Begin with small patterns and add complexity incrementally

  2. Test on small data: Validate pattern logic on toy networks before scaling up

  3. Use explain(): Check execution plans for complex queries

  4. Combine with DSL: Use pattern matching for structure, DSL for attributes

  5. Document patterns: Add comments explaining complex pattern logic

Troubleshooting

Pattern Returns No Matches

Problem: Your pattern should match but returns 0 results.

Solutions:

  1. Check layer constraints β€” are nodes actually in the layers you specified?

  2. Verify predicates β€” are the thresholds too restrictive?

  3. Use .explain() to see the execution plan

  4. Test with simpler patterns first

  5. Check if your network has the expected structure

# Debug: Check basic statistics first
result = Q.nodes().where(layer="social").execute(network)
print(f"Social layer has {result.count} nodes")

result = Q.edges().where(intralayer=True).execute(network)
print(f"Network has {result.count} intralayer edges")

Too Many Matches

Problem: Pattern returns too many results.

Solutions:

  1. Add more predicates to narrow the search

  2. Use .limit() to cap results

  3. Add layer constraints to restrict search space

  4. Use .constraint("a != b") to drop matches where two variables bind the same node

# Limit results
pattern.limit(100).execute(network)

Slow Performance

Problem: Pattern matching takes too long.

Solutions:

  1. Add selective predicates to the root variable

  2. Use layer constraints to reduce search space

  3. Consider breaking pattern into smaller sub-patterns

  4. Use .explain() to check estimated complexity

  5. Set timeout to avoid runaway queries

# Set timeout
result = pattern.execute(network, timeout=30.0)

Next Steps

  • Explore the Query Zoo: DSL Gallery for Multilayer Analysis for more complex examples

  • Try combining pattern matching with DSL queries

  • Experiment with different motifs (triangles, cliques, paths)

  • Build your own domain-specific pattern library

Tip

Pattern matching is most powerful when combined with the rest of the DSL. Use patterns to find structures, then use regular DSL queries to compute metrics on the matches!