SQL-like DSL for Multilayer Networks

Overview

Py3plex provides a Domain-Specific Language (DSL) for querying and analyzing multilayer networks using SQL-like syntax. This intuitive interface allows users to filter nodes and edges, compute network measures, and perform complex analyses with simple, readable queries.

DSL v2 introduces several major improvements:

  • Python Builder API: Chainable, type-hinted query construction

  • Layer Algebra: Union, difference, and intersection operations on layers

  • Rich Results: Export to pandas, NetworkX, or Arrow formats

  • EXPLAIN Mode: Query execution plans with complexity estimates

  • Parameterized Queries: Safe parameter binding for dynamic queries

  • Better Errors: “Did you mean?” suggestions for typos

Quick Start with Builder API

For the fastest start, see the comprehensive builder API example:

python examples/network_analysis/example_dsl_builder_api.py

This example demonstrates all DSL v2 features with working code and explanations.

The DSL enables you to express complex network queries in a natural, SQL-like language without writing verbose code. For example, instead of manually iterating through nodes and checking conditions, you can write:

String DSL syntax:

execute_query(network, 'SELECT nodes WHERE layer="social" AND degree > 5')

Or using the new Builder API (recommended):

from py3plex.dsl import Q, L

result = (
    Q.nodes()
     .from_layers(L["social"])
     .where(degree__gt=5)
     .execute(network)
)

The DSL is particularly useful for:

  • Interactive network exploration: Quickly test hypotheses and explore network structure

  • Rapid prototyping: Build analysis workflows without extensive coding

  • Educational purposes: Learn network concepts with intuitive queries

  • Production pipelines: Create maintainable, self-documenting analysis code

Basic Syntax

The DSL follows a SQL-inspired syntax:

SELECT target WHERE conditions COMPUTE measures

Where:

  • target: Either nodes or edges

  • conditions: Filtering criteria (optional)

  • measures: Network measures to compute (optional)
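To make the three-part grammar concrete, here is a minimal sketch that splits a query string into its target, conditions, and measures with a regular expression. The helper `split_clauses` is hypothetical and written only for illustration; it is not py3plex's parser and it covers only the SELECT, WHERE, and COMPUTE clauses.

```python
import re

# Hypothetical pattern for illustration; not part of py3plex.
CLAUSE_PATTERN = re.compile(
    r"^\s*SELECT\s+(?P<target>nodes|edges)"
    r"(?:\s+WHERE\s+(?P<conditions>.+?))?"
    r"(?:\s+COMPUTE\s+(?P<measures>.+?))?\s*$",
    re.IGNORECASE,
)


def split_clauses(query: str) -> dict:
    """Split a query into its target, conditions, and list of measures."""
    match = CLAUSE_PATTERN.match(query)
    if match is None:
        raise ValueError(f"Not a valid query: {query!r}")
    measures = match.group("measures")
    return {
        "target": match.group("target").lower(),
        "conditions": match.group("conditions"),  # None when WHERE is absent
        "measures": measures.split() if measures else [],
    }


print(split_clauses('SELECT nodes WHERE layer="social" COMPUTE degree'))
```

Both optional clauses may be omitted, so `split_clauses("SELECT nodes")` yields no conditions and an empty measure list.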

DSL Cheat Sheet

Quick Syntax Reference:

SELECT target WHERE conditions COMPUTE measures ORDER BY field LIMIT n

Common Query Patterns:

Task                        | DSL Query
----------------------------+------------------------------------------------------------
Select all nodes in a layer | SELECT nodes WHERE layer="social"
Find high-degree nodes      | SELECT nodes WHERE degree > 5
Filter by degree range      | SELECT nodes WHERE degree >= 2 AND degree <= 10
Compute centrality          | SELECT nodes COMPUTE betweenness_centrality
Filter + compute            | SELECT nodes WHERE layer="social" COMPUTE degree_centrality

DSL String vs Python Builder API:

DSL String / Task                                  | Python Builder API
---------------------------------------------------+-----------------------------------------------------------
'SELECT nodes WHERE layer="social"'                | Q.nodes().where(layer="social")
'SELECT nodes WHERE degree > 5'                    | Q.nodes().where(degree__gt=5)
'SELECT nodes WHERE layer="social" AND degree > 3' | Q.nodes().where(layer="social", degree__gt=3)
Layer union (social OR work)                       | Q.nodes().from_layers(L["social"] + L["work"])
Layer difference (social NOT bots)                 | Q.nodes().from_layers(L["social"] - L["bots"])
Order and limit                                    | Q.nodes().compute("degree").order_by("-degree").limit(10)
Export to CSV                                      | Q.nodes().compute("degree").export_csv("output.csv")
Export to JSON                                     | Q.nodes().compute("degree").export_json("output.json")

Quick Start Example

Here’s a complete working example to get you started:

from py3plex.core import multinet
from py3plex.dsl import execute_query, format_result

# Create a multilayer network
network = multinet.multi_layer_network(directed=False)

# Add nodes to different layers
network.add_nodes([
    {'source': 'Alice', 'type': 'social'},
    {'source': 'Bob', 'type': 'social'},
    {'source': 'Charlie', 'type': 'social'},
    {'source': 'Alice', 'type': 'work'},
    {'source': 'Bob', 'type': 'work'},
])

# Add edges
network.add_edges([
    {'source': 'Alice', 'target': 'Bob', 'source_type': 'social', 'target_type': 'social'},
    {'source': 'Bob', 'target': 'Charlie', 'source_type': 'social', 'target_type': 'social'},
    {'source': 'Alice', 'target': 'Bob', 'source_type': 'work', 'target_type': 'work'},
])

# Query 1: Select all nodes in the social layer
result = execute_query(network, 'SELECT nodes WHERE layer="social"')
print(f"Found {result['count']} nodes in social layer")
print(result['nodes'])

# Query 2: Find high-degree nodes
result = execute_query(network, 'SELECT nodes WHERE degree > 1')
print(format_result(result))

# Query 3: Compute centrality for filtered nodes
result = execute_query(
    network,
    'SELECT nodes WHERE layer="social" COMPUTE betweenness_centrality'
)
for node, centrality in result['computed']['betweenness_centrality'].items():
    print(f"{node}: {centrality:.4f}")

Expected Output:

Found 3 nodes in social layer
[('Alice', 'social'), ('Bob', 'social'), ('Charlie', 'social')]

Query: SELECT nodes WHERE degree > 1
Target: nodes
Count: 1

Nodes (showing 1 of 1):
  ('Bob', 'social')

('Alice', 'social'): 0.0000
('Bob', 'social'): 1.0000
('Charlie', 'social'): 0.0000

Query Components

SELECT Clause

Specifies what to select from the network:

SELECT nodes     # Select nodes

Warning

Edge Queries (Experimental): Edge queries (SELECT edges) are still in development and not fully supported; the current version primarily supports node queries. Use node-based queries for production work.

WHERE Clause

Filters results based on conditions. Supports:

Layer filtering:

WHERE layer="transport"
WHERE layer="social"

Degree filtering:

WHERE degree > 5
WHERE degree >= 3
WHERE degree <= 10

Logical operators:

WHERE layer="social" AND degree > 3
WHERE layer="work" OR layer="social"
WHERE NOT layer="transport"

Comparison operators:

  • = : Equal to

  • != : Not equal to

  • > : Greater than

  • < : Less than

  • >= : Greater than or equal

  • <= : Less than or equal
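As a rough illustration of how these operators and the logical keywords combine, the sketch below maps each symbol to a function from Python's `operator` module and filters a few node records. The record layout and the `matches` helper are assumptions made for this example, not py3plex internals.

```python
import operator

# Symbol-to-function table for the six comparison operators listed above.
COMPARATORS = {
    "=": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
}


def matches(record: dict, field: str, symbol: str, value) -> bool:
    """Return True when record[field] <symbol> value holds."""
    return COMPARATORS[symbol](record[field], value)


# Hypothetical node records: (node, layer) pairs with a precomputed degree.
records = [
    {"node": "Alice", "layer": "social", "degree": 1},
    {"node": "Bob", "layer": "social", "degree": 2},
    {"node": "Bob", "layer": "work", "degree": 1},
]

# WHERE layer="social" AND degree >= 2
selected = [
    r for r in records
    if matches(r, "layer", "=", "social") and matches(r, "degree", ">=", 2)
]

# WHERE NOT layer="social"
not_social = [r for r in records if not matches(r, "layer", "=", "social")]

print([r["node"] for r in selected], [r["node"] for r in not_social])
```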

COMPUTE Clause

Calculates network measures for filtered nodes:

COMPUTE degree
COMPUTE betweenness_centrality
COMPUTE closeness_centrality
COMPUTE eigenvector_centrality

Supported measures:

  • degree - Node degree

  • degree_centrality - Normalized degree centrality

  • betweenness_centrality - Betweenness centrality

  • closeness_centrality - Closeness centrality

  • eigenvector_centrality - Eigenvector centrality

  • pagerank - PageRank score

  • clustering - Clustering coefficient
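For intuition about the first two measures, the following sketch computes degree and normalized degree centrality by hand for the social layer of the Quick Start network. It uses the common normalization degree / (n - 1); how py3plex normalizes across layers is not stated here, so treat the formula as an assumption.

```python
from collections import Counter

# Undirected edges of the "social" layer from the Quick Start example.
edges = [("Alice", "Bob"), ("Bob", "Charlie")]

# degree: number of edges incident to each node.
degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

# degree_centrality: degree divided by (n - 1), the usual normalization.
# n counts only nodes that appear in at least one edge.
n = len(degree)
degree_centrality = {node: d / (n - 1) for node, d in degree.items()}

print(dict(degree))
print(degree_centrality)
```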

Multiple measures:

COMPUTE degree betweenness_centrality closeness_centrality

DSL Syntax Comparison: String vs Builder API

Py3plex provides two complementary ways to query networks: the SQL-like string DSL and the Python builder API (DSL v2). Both run on the same underlying query engine but offer different developer experiences.

When to Use Each

Use String DSL when:

  • Writing quick, exploratory queries in notebooks

  • Teaching network concepts with familiar SQL syntax

  • Scripting simple one-off analyses

  • Maximizing readability for domain experts

Use Builder API when:

  • Building production pipelines

  • Needing IDE autocompletion and type checking

  • Constructing complex, dynamic queries programmatically

  • Exporting results to multiple formats

  • Requiring advanced features (layer algebra, EXPLAIN mode)

Side-by-Side Examples

Here’s the same query implemented both ways:

Example 1: Basic node filtering

from py3plex.core import multinet
from py3plex.dsl import execute_query, Q, L

# Create a small network
network = multinet.multi_layer_network(directed=False)
network.add_nodes([
    {'source': 'Alice', 'type': 'social'},
    {'source': 'Bob', 'type': 'social'},
    {'source': 'Carol', 'type': 'social'},
])
network.add_edges([
    {'source': 'Alice', 'target': 'Bob', 'source_type': 'social', 'target_type': 'social'},
    {'source': 'Bob', 'target': 'Carol', 'source_type': 'social', 'target_type': 'social'},
])

# STRING DSL: SQL-like syntax
result_string = execute_query(
    network,
    'SELECT nodes WHERE layer="social" AND degree > 1'
)
print(f"String DSL found: {result_string['count']} nodes")

# BUILDER API: Pythonic chainable calls
result_builder = (
    Q.nodes()
     .from_layers(L["social"])
     .where(degree__gt=1)
     .execute(network)
)
print(f"Builder API found: {result_builder.count} nodes")

Expected output:

String DSL found: 1 nodes
Builder API found: 1 nodes

Example 2: Computing centrality with ordering

# STRING DSL: Compute and return all results
result_string = execute_query(
    network,
    'SELECT nodes WHERE layer="social" '
    'COMPUTE betweenness_centrality'
)
# Manual sorting needed
centralities = result_string['computed']['betweenness_centrality']
sorted_nodes = sorted(centralities.items(), key=lambda x: -x[1])
top_3 = sorted_nodes[:3]

# BUILDER API: Ordering and limiting built-in
result_builder = (
    Q.nodes()
     .from_layers(L["social"])
     .compute("betweenness_centrality")
     .order_by("-betweenness_centrality")
     .limit(3)
     .execute(network)
)
# Results already ordered and limited
top_3 = list(result_builder)

Example 3: Layer algebra

# BUILDER API: Advanced layer operations
# Union: nodes in social OR work layer
result = (
    Q.nodes()
     .from_layers(L["social"] + L["work"])
     .execute(network)
)

# Difference: nodes in social BUT NOT bots
result = (
    Q.nodes()
     .from_layers(L["social"] - L["bots"])
     .execute(network)
)

# Intersection: nodes in BOTH social AND work
result = (
    Q.nodes()
     .from_layers(L["social"] & L["work"])
     .execute(network)
)

Note

Layer algebra operations (union, difference, intersection) are only available in the Builder API. The string DSL’s OR/AND operators are not a substitute: they apply boolean logic to each node’s conditions rather than combining layer sets.

Recommendation: Start with the string DSL for learning and exploration. Migrate to the builder API when building production workflows or needing advanced features.

Python Builder API (DSL v2)

DSL v2 introduces a Pythonic builder API that provides type hints, autocompletion, and a chainable interface for constructing queries. The builder API maps directly to the DSL syntax but with Python-native ergonomics.

Basic Usage

Import the builder components:

from py3plex.dsl import Q, L, Param

Create and execute a simple query:

# Select nodes in the social layer
result = Q.nodes().where(layer="social").execute(network)

# Get the count
print(f"Found {result.count} nodes")

# Iterate over results
for node in result:
    print(node)

Query Builder Methods

The Q class provides factory methods to start building queries:

  • Q.nodes() - Start a query for nodes

  • Q.edges() - Start a query for edges

The returned QueryBuilder supports these chainable methods:

Q.nodes()
 .from_layers(layer_expr)    # Filter by layers (optional)
 .where(**conditions)        # Filter by conditions (optional)
 .compute(*measures)         # Compute measures (optional)
 .order_by(*keys)            # Order results (optional)
 .limit(n)                   # Limit results (optional)
 .execute(network, **params) # Execute the query

WHERE Conditions

The where() method supports Django-style field lookups:

Equality:

.where(layer="social")

Comparisons (using double-underscore suffixes):

.where(degree__gt=5)      # degree > 5
.where(degree__gte=5)     # degree >= 5
.where(degree__lt=10)     # degree < 10
.where(degree__lte=10)    # degree <= 10
.where(layer__ne="bots")  # layer != "bots"

Multiple conditions (combined with AND):

.where(layer="social", degree__gt=5)
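The correspondence between the double-underscore suffixes and the string DSL operators can be sketched as a small translation function. `to_where_clause` and its suffix table are hypothetical, written only to show the mapping; py3plex's own implementation may differ.

```python
# Hypothetical suffix table for illustration; py3plex's internals may differ.
LOOKUP_SYMBOLS = {"gt": ">", "gte": ">=", "lt": "<", "lte": "<=", "ne": "!="}


def to_where_clause(**conditions) -> str:
    """Render keyword lookups as a WHERE clause joined with AND."""
    parts = []
    for key, value in conditions.items():
        field, sep, suffix = key.partition("__")
        # No "__" in the key means a plain equality test.
        symbol = LOOKUP_SYMBOLS[suffix] if sep else "="
        # Quote strings so the output matches the string DSL's literal syntax.
        literal = f'"{value}"' if isinstance(value, str) else str(value)
        parts.append(f"{field} {symbol} {literal}")
    return " AND ".join(parts)


print(to_where_clause(layer="social", degree__gt=5))
# → layer = "social" AND degree > 5
```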

Special predicates:

.where(intralayer=True)                    # Edges within same layer
.where(interlayer=("social", "work"))     # Edges between specific layers

COMPUTE with Aliases

Compute network measures with optional aliases:

# Single measure
result = Q.nodes().compute("betweenness_centrality").execute(network)

# Single measure with alias
result = Q.nodes().compute("betweenness_centrality", alias="bc").execute(network)

# Multiple measures
result = Q.nodes().compute("degree", "clustering").execute(network)

# Multiple measures with aliases
result = Q.nodes().compute(aliases={
    "betweenness_centrality": "bc",
    "closeness_centrality": "cc"
}).execute(network)

ORDER BY and LIMIT

Sort and limit results:

# Order by degree (ascending)
result = Q.nodes().compute("degree").order_by("degree").execute(network)

# Order descending with - prefix
result = Q.nodes().compute("degree").order_by("-degree").execute(network)

# Order by multiple keys
result = Q.nodes().compute("degree", "clustering").order_by("-degree", "clustering").execute(network)

# Limit results
result = Q.nodes().compute("degree").order_by("-degree").limit(10).execute(network)
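The "-" prefix convention and multi-key ordering can be reproduced on plain dictionaries. The sketch below is one way to do it with stable sorts; the `order_rows` helper and the sample values are hypothetical and stand in for query results.

```python
def order_rows(rows, *keys, limit=None):
    """Sort dict rows by several keys; a leading '-' means descending."""
    ordered = list(rows)
    # Python's sort is stable, so sorting by the last key first and the
    # first key last yields a correct multi-key ordering.
    for key in reversed(keys):
        descending = key.startswith("-")
        field = key[1:] if descending else key
        ordered.sort(key=lambda row: row[field], reverse=descending)
    return ordered if limit is None else ordered[:limit]


# Hypothetical rows; the values are made up for the example.
rows = [
    {"node": "Alice", "degree": 2, "clustering": 1.0},
    {"node": "Bob", "degree": 2, "clustering": 0.5},
    {"node": "Charlie", "degree": 3, "clustering": 0.3},
]
top = order_rows(rows, "-degree", "clustering", limit=2)
print([row["node"] for row in top])
# → ['Charlie', 'Bob']
```

Charlie comes first on degree; Bob precedes Alice because the tie on degree is broken by ascending clustering.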

Layer Algebra

DSL v2 introduces layer algebra for combining multiple layers. Use the L proxy to reference layers and combine them with operators:

Union (+): Nodes from either layer:

layers = L["social"] + L["work"]
result = Q.nodes().from_layers(layers).execute(network)

Difference (-): Nodes from one layer but not another:

layers = L["social"] - L["bots"]
result = Q.nodes().from_layers(layers).execute(network)

Intersection (&): Nodes in both layers:

layers = L["social"] & L["work"]
result = Q.nodes().from_layers(layers).execute(network)

Complex expressions:

# (social OR work) - bots
layers = L["social"] + L["work"] - L["bots"]
result = Q.nodes().from_layers(layers).execute(network)
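When mixing operators, keep Python's precedence rules in mind: + and - bind equally and evaluate left to right, while & binds more loosely than both. The sketch below demonstrates this with a hypothetical `LayerSet` class that models each layer as a set of node names. That modelling is an assumption made for illustration; only the operator precedence is a property of Python itself.

```python
class LayerSet:
    """Hypothetical stand-in for a layer expression, backed by a frozenset."""

    def __init__(self, names):
        self.names = frozenset(names)

    def __add__(self, other):  # union
        return LayerSet(self.names | other.names)

    def __sub__(self, other):  # difference
        return LayerSet(self.names - other.names)

    def __and__(self, other):  # intersection
        return LayerSet(self.names & other.names)


social = LayerSet({"Alice", "Bob", "Charlie"})
work = LayerSet({"Alice", "Bob"})
bots = LayerSet({"Bob"})

# Left to right: (social + work) - bots
print(sorted((social + work - bots).names))    # → ['Alice', 'Charlie']
# Parentheses change the result: social + (work - bots)
print(sorted((social + (work - bots)).names))  # → ['Alice', 'Bob', 'Charlie']
# & binds more loosely than -: (social - bots) & work
print(sorted((social - bots & work).names))    # → ['Alice']
```

Adding explicit parentheses to any expression that mixes & with + or - avoids surprises.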

Complete Builder Example

Here’s a comprehensive example using the builder API:

from py3plex.core import multinet
from py3plex.dsl import Q, L

# Create network
network = multinet.multi_layer_network(directed=False)
network.add_nodes([
    {'source': 'Alice', 'type': 'social'},
    {'source': 'Bob', 'type': 'social'},
    {'source': 'Charlie', 'type': 'social'},
    {'source': 'Dave', 'type': 'work'},
    {'source': 'Eve', 'type': 'work'},
])
network.add_edges([
    {'source': 'Alice', 'target': 'Bob', 'source_type': 'social', 'target_type': 'social'},
    {'source': 'Bob', 'target': 'Charlie', 'source_type': 'social', 'target_type': 'social'},
    {'source': 'Alice', 'target': 'Charlie', 'source_type': 'social', 'target_type': 'social'},
    {'source': 'Dave', 'target': 'Eve', 'source_type': 'work', 'target_type': 'work'},
])

# Query using builder API
result = (
    Q.nodes()
     .from_layers(L["social"] + L["work"])
     .where(degree__gt=0)
     .compute("betweenness_centrality", alias="bc")
     .order_by("-bc")
     .limit(3)
     .execute(network)
)

# Access results
print(f"Top {result.count} nodes by betweenness centrality:")
df = result.to_pandas()
print(df)

QueryResult Object

The builder API returns a QueryResult object with rich export capabilities:

Properties:

result.target    # 'nodes' or 'edges'
result.items     # List of node/edge tuples
result.count     # Number of items
result.nodes     # Alias for items (when target='nodes')
result.edges     # Alias for items (when target='edges')
result.attributes  # Computed measure values

Export methods:

# Export to pandas DataFrame
df = result.to_pandas()

# Export to NetworkX subgraph
G = result.to_networkx(network)

# Export to Apache Arrow table
table = result.to_arrow()

# Export to dictionary
d = result.to_dict()

Iteration:

for node in result:
    print(node)

# Length
print(len(result))

Declarative File Exports

DSL v2 supports declarative file exports, allowing you to export query results to files as part of the query pipeline itself. The export is a side effect: the query still returns a QueryResult object to Python.

Basic CSV Export:

from py3plex.dsl import Q, L

# Export to CSV file
result = (
    Q.nodes()
     .from_layers(L["social"])
     .compute("degree")
     .export_csv("results/social_degree.csv")
     .execute(network)
)

# Result is still available in Python
print(f"Exported {result.count} nodes")

JSON Export with Options:

# Export to JSON with custom format
result = (
    Q.nodes()
     .compute("degree", "betweenness_centrality")
     .order_by("degree", desc=True)
     .limit(10)
     .export_json(
         "results/top_nodes.json",
         columns=["id", "degree", "betweenness_centrality"],
         orient="records"
     )
     .execute(network)
)

Generic Export Method:

# Export with explicit format specification
result = (
    Q.nodes()
     .from_layers(L["social"])
     .compute("degree")
     .export(
         path="results/output.csv",
         fmt="csv",
         columns=["id", "degree"],
         delimiter=";"
     )
     .execute(network)
)

Supported Export Formats:

  • csv - Comma-separated values (default)

  • json - JSON format with various orientations

  • tsv - Tab-separated values

Export Options:

CSV/TSV Options:

  • delimiter - Field delimiter (default: , for CSV, \t for TSV)

  • columns - List of columns to include/order

JSON Options:

  • orient - JSON orientation (records, columns, split, index, values)

  • indent - Indentation level (default: 2)

  • columns - List of columns to include/order
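To see what these options do to the output, the sketch below reproduces them with Python's standard `csv` and `json` modules on a few hypothetical rows. It assumes that orient="records" follows the pandas convention of one object per row, and it is not py3plex's export code.

```python
import csv
import io
import json

# Hypothetical result rows; "id" and "degree" mirror the column names above.
rows = [
    {"id": "Alice", "layer": "social", "degree": 1},
    {"id": "Bob", "layer": "social", "degree": 2},
]
columns = ["id", "degree"]  # select and order the exported columns

# CSV/TSV: the delimiter option maps onto csv.DictWriter's delimiter.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer,
    fieldnames=columns,
    delimiter=";",
    extrasaction="ignore",  # silently drop keys not listed in columns
    lineterminator="\n",
)
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()

# JSON with orient="records": a list with one object per row.
records = [{col: row[col] for col in columns} for row in rows]
json_records = json.dumps(records, indent=2)

print(csv_text)
print(json_records)
```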

Column Selection:

# Export only specific columns in specific order
result = (
    Q.nodes()
     .compute("degree", "betweenness_centrality", "clustering")
     .export_csv(
         "results/selected.csv",
         columns=["id", "degree"]  # Only export ID and degree
     )
     .execute(network)
)

Complete Export Example:

from py3plex.core import multinet
from py3plex.dsl import Q, L

# Create network
network = multinet.multi_layer_network(directed=False)
# ... add nodes and edges ...

# Export social layer analysis to CSV
(
    Q.nodes()
     .from_layers(L["social"])
     .compute("degree", "betweenness_centrality")
     .order_by("degree", desc=True)
     .export_csv("results/social_analysis.csv")
     .execute(network)
)

# Export work layer analysis to JSON
(
    Q.nodes()
     .from_layers(L["work"])
     .compute("degree")
     .export_json("results/work_analysis.json", orient="records")
     .execute(network)
)

# Export combined analysis with custom delimiter
(
    Q.nodes()
     .compute("degree")
     .export_csv("results/all_nodes.tsv", delimiter="\t")
     .execute(network)
)

The export functionality automatically creates parent directories if needed and provides clear error messages for unsupported formats or file I/O issues.
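The behavior described above (creating missing parent directories and rejecting unknown formats) can be approximated with the standard library. `export_rows` is a hypothetical helper written for this sketch and does not reflect how py3plex implements its exports.

```python
import csv
import json
import tempfile
from pathlib import Path

DELIMITERS = {"csv": ",", "tsv": "\t"}


def export_rows(path, rows, fmt="csv"):
    """Write rows (a non-empty list of dicts) to path in the given format."""
    if fmt not in ("csv", "tsv", "json"):
        raise ValueError(f"Unsupported export format: {fmt!r}")
    target = Path(path)
    # Create results/ (and any deeper folders) if they do not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    if fmt == "json":
        target.write_text(json.dumps(rows, indent=2), encoding="utf-8")
        return target
    with target.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=list(rows[0]), delimiter=DELIMITERS[fmt]
        )
        writer.writeheader()
        writer.writerows(rows)
    return target


rows = [{"id": "Alice", "degree": 1}, {"id": "Bob", "degree": 2}]
with tempfile.TemporaryDirectory() as tmp:
    out = export_rows(Path(tmp) / "results" / "nested" / "degree.tsv", rows, fmt="tsv")
    lines = out.read_text(encoding="utf-8").splitlines()
print(lines)
```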

See also

For a comprehensive example with 7 different usage patterns, see: examples/network_analysis/example_dsl_export.py

EXPLAIN Mode

Get a query execution plan without actually running the query:

from py3plex.dsl import Q

# Build a query
q = Q.nodes().where(layer="social").compute("betweenness_centrality")

# Get execution plan
plan = q.explain().execute(network)

# Inspect the plan
for step in plan.steps:
    print(f"{step.description} ({step.estimated_complexity})")

# Check for warnings
for warning in plan.warnings:
    print(f"Warning: {warning}")

The execution plan includes:

  • Step-by-step breakdown of query execution

  • Estimated time complexity for each step

  • Warnings for expensive operations (e.g., betweenness centrality on large graphs)
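The shape of such a plan can be mocked up with two small dataclasses whose attribute names match the loop above (`steps`, `description`, `estimated_complexity`, `warnings`). Everything else in this sketch is an assumption: the `sketch_plan` function, the warning threshold, and the cost table, which uses textbook figures (a linear scan for filtering, Brandes' algorithm for betweenness on unweighted graphs).

```python
from dataclasses import dataclass, field


@dataclass
class PlanStep:
    description: str
    estimated_complexity: str


@dataclass
class Plan:
    steps: list = field(default_factory=list)
    warnings: list = field(default_factory=list)


# Textbook costs, not values reported by py3plex.
COSTS = {"filter": "O(V)", "betweenness_centrality": "O(V * E)"}


def sketch_plan(measures, node_count, warn_above=10_000):
    """Build a mock plan; warn_above is an arbitrary illustrative threshold."""
    plan = Plan()
    plan.steps.append(PlanStep("Filter nodes by layer", COSTS["filter"]))
    for measure in measures:
        cost = COSTS.get(measure, "unknown")
        plan.steps.append(PlanStep(f"Compute {measure}", cost))
        if cost == "O(V * E)" and node_count > warn_above:
            plan.warnings.append(f"{measure} may be slow on {node_count} nodes")
    return plan


plan = sketch_plan(["betweenness_centrality"], node_count=50_000)
for step in plan.steps:
    print(f"{step.description} ({step.estimated_complexity})")
for warning in plan.warnings:
    print(f"Warning: {warning}")
```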

Parameterized Queries

Use Param to create queries with placeholders that are bound at execution time:

from py3plex.dsl import Q, Param

# Create a reusable query template
q = Q.nodes().where(layer="social", degree__gt=Param.int("min_degree"))

# Execute with different parameters
result1 = q.execute(network, min_degree=5)
result2 = q.execute(network, min_degree=10)

Parameter types:

  • Param.int("name") - Integer parameter

  • Param.float("name") - Float parameter

  • Param.str("name") - String parameter

  • Param.ref("name") - Untyped parameter
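Conceptually, binding replaces each placeholder with the supplied value and checks its type first. The sketch below shows that idea with a hypothetical `Placeholder` class and `bind` function; the names, the error types, and the dictionary representation of conditions are assumptions for illustration, not py3plex's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Placeholder:
    """Hypothetical stand-in for a Param: a name plus an optional type."""

    name: str
    expected_type: Optional[type] = None  # None models an untyped reference


def bind(conditions: dict, **params) -> dict:
    """Replace placeholders with supplied values, validating as we go."""
    bound = {}
    for key, value in conditions.items():
        if isinstance(value, Placeholder):
            if value.name not in params:
                raise KeyError(f"missing parameter: {value.name}")
            supplied = params[value.name]
            if value.expected_type and not isinstance(supplied, value.expected_type):
                raise TypeError(
                    f"{value.name} must be {value.expected_type.__name__}"
                )
            value = supplied
        bound[key] = value
    return bound


template = {"layer": "social", "degree__gt": Placeholder("min_degree", int)}
print(bind(template, min_degree=5))
# → {'layer': 'social', 'degree__gt': 5}
```

Because values are bound rather than interpolated into a query string, the template can be reused safely with different arguments.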

Convert Builder to DSL String

Convert a builder query back to DSL string format:

q = Q.nodes().where(layer="social", degree__gt=5).compute("degree").limit(10)

# Get DSL string
dsl_string = q.to_dsl()
print(dsl_string)
# Output: SELECT nodes WHERE layer = "social" AND degree > 5 COMPUTE degree LIMIT 10

This is useful for:

  • Debugging queries

  • Logging and auditing

  • Serializing queries for later use

Error Handling with Suggestions

DSL v2 provides helpful error messages with “Did you mean?” suggestions:

from py3plex.dsl import Q, UnknownMeasureError

try:
    # Typo in measure name
    result = Q.nodes().compute("betweenes").execute(network)
except UnknownMeasureError as e:
    print(e)
    # Output: Unknown measure 'betweenes'. Did you mean 'betweenness_centrality'?
    #         Known measures: betweenness_centrality, closeness_centrality, ...

Measure Registry

DSL v2 includes a centralized registry for network measures. View available measures:

from py3plex.dsl import measure_registry

# List all measures
print(measure_registry.list_measures())

# Check if a measure exists
if measure_registry.has("degree"):
    print("degree is available")

# Get measure description
desc = measure_registry.get_description("betweenness_centrality")
print(desc)

Example Queries

Basic Queries

Select all nodes in a layer:

result = execute_query(network, 'SELECT nodes WHERE layer="social"')

Select high-degree nodes:

result = execute_query(network, 'SELECT nodes WHERE degree > 5')

Select all nodes (no filter):

result = execute_query(network, 'SELECT nodes')

Complex Queries

Combine multiple conditions:

# Nodes in transport layer with high degree
result = execute_query(
    network,
    'SELECT nodes WHERE layer="transport" AND degree > 5'
)

Use OR operator:

# Nodes in either social or work layer
result = execute_query(
    network,
    'SELECT nodes WHERE layer="social" OR layer="work"'
)

Degree range filtering:

# Nodes with moderate degree
result = execute_query(
    network,
    'SELECT nodes WHERE degree >= 2 AND degree <= 5'
)

Analytical Queries

Compute centrality for a layer:

result = execute_query(
    network,
    'SELECT nodes WHERE layer="transport" COMPUTE betweenness_centrality'
)

# Access computed values
for node, centrality in result['computed']['betweenness_centrality'].items():
    print(f"{node}: {centrality}")

Multiple measures for filtered nodes:

result = execute_query(
    network,
    'SELECT nodes WHERE degree > 3 COMPUTE degree_centrality closeness_centrality'
)

Working with Results

The execute_query function returns a dictionary containing:

  • query: Original query string

  • target: Query target (nodes or edges)

  • nodes or edges: List of selected items

  • count: Number of items returned

  • computed: Dictionary of computed measures (if COMPUTE used)

Example:

result = execute_query(network, 'SELECT nodes WHERE layer="social"')

# Access results
print(f"Found {result['count']} nodes")
for node in result['nodes']:
    print(node)

# If COMPUTE was used
if 'computed' in result:
    for measure, values in result['computed'].items():
        print(f"{measure}:")
        for node, value in values.items():
            print(f"  {node}: {value}")

Example Output:

Found 3 nodes
('Alice', 'social')
('Bob', 'social')
('Charlie', 'social')
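For downstream processing it is often convenient to flatten this dictionary into one row per node. The sketch below does so for a literal result that follows the keys listed above and reuses the values from the Quick Start's expected output; `to_rows` is a hypothetical helper, not a py3plex function.

```python
# Literal result shaped like the keys listed above, using the values from
# the Quick Start's expected output.
result = {
    "query": 'SELECT nodes WHERE layer="social" COMPUTE betweenness_centrality',
    "target": "nodes",
    "nodes": [("Alice", "social"), ("Bob", "social"), ("Charlie", "social")],
    "count": 3,
    "computed": {
        "betweenness_centrality": {
            ("Alice", "social"): 0.0,
            ("Bob", "social"): 1.0,
            ("Charlie", "social"): 0.0,
        }
    },
}


def to_rows(result: dict) -> list:
    """Flatten a node result into one dict per node, one key per measure."""
    computed = result.get("computed", {})
    rows = []
    for node, layer in result[result["target"]]:
        row = {"node": node, "layer": layer}
        for measure, values in computed.items():
            row[measure] = values.get((node, layer))
        rows.append(row)
    return rows


rows = to_rows(result)
for row in rows:
    print(row)
```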

Formatting Results

Use format_result for human-readable output:

from py3plex.dsl import format_result

result = execute_query(network, 'SELECT nodes WHERE degree > 3')
print(format_result(result, limit=10))

Convenience Functions

The DSL module provides convenience functions for common operations:

Select nodes by layer:

from py3plex.dsl import select_nodes_by_layer

nodes = select_nodes_by_layer(network, 'transport')

Select high-degree nodes:

from py3plex.dsl import select_high_degree_nodes

# All high-degree nodes
nodes = select_high_degree_nodes(network, min_degree=5)

# High-degree nodes in specific layer
nodes = select_high_degree_nodes(network, min_degree=5, layer='social')

Compute centrality for a layer:

from py3plex.dsl import compute_centrality_for_layer

centrality = compute_centrality_for_layer(
    network,
    layer='transport',
    centrality='betweenness_centrality'
)

Use Cases

Hub Identification

Find important nodes in each layer:

for layer in ['social', 'work', 'transport']:
    result = execute_query(
        network,
        f'SELECT nodes WHERE layer="{layer}" AND degree > 5'
    )
    print(f"Hubs in {layer}: {result['count']}")

Layer Comparison

Compare network properties across layers:

layers = ['social', 'work', 'transport']

for layer in layers:
    result = execute_query(
        network,
        f'SELECT nodes WHERE layer="{layer}" COMPUTE degree'
    )
    degrees = result['computed']['degree']
    avg_degree = sum(degrees.values()) / len(degrees) if degrees else 0
    print(f"{layer} average degree: {avg_degree:.2f}")

Node Importance Ranking

Rank nodes by multiple measures:

result = execute_query(
    network,
    'SELECT nodes WHERE layer="social" COMPUTE betweenness_centrality degree_centrality'
)

# Combine measures for ranking
scores = {}
for node in result['nodes']:
    betweenness = result['computed']['betweenness_centrality'].get(node, 0)
    degree_cent = result['computed']['degree_centrality'].get(node, 0)
    scores[node] = betweenness + degree_cent

# Show top nodes
for node, score in sorted(scores.items(), key=lambda x: x[1], reverse=True)[:5]:
    print(f"{node}: {score:.4f}")

Network Filtering

Create subnetworks based on queries:

# Get high-degree nodes
result = execute_query(network, 'SELECT nodes WHERE degree > 5')
high_degree_nodes = result['nodes']

# Create subnetwork with these nodes
subnetwork = network.subnetwork(
    list(high_degree_nodes),
    subset_by='node_layer_names'
)

Error Handling

The DSL raises specific exceptions for different error types.

Legacy Error Types

For string DSL queries:

from py3plex.dsl import execute_query, DSLSyntaxError, DSLExecutionError

try:
    result = execute_query(network, 'SELECT nodes WHERE invalid_condition')
except DSLSyntaxError as e:
    print(f"Syntax error: {e}")
except DSLExecutionError as e:
    print(f"Execution error: {e}")

DSL v2 Error Types

For builder API queries, more specific error types are available:

from py3plex.dsl import (
    Q,
    DslError,              # Base error class
    DslSyntaxError,        # Syntax errors
    DslExecutionError,     # Execution errors
    UnknownAttributeError, # Unknown attribute name
    UnknownMeasureError,   # Unknown measure name
    UnknownLayerError,     # Unknown layer name
    ParameterMissingError, # Missing parameter
    TypeMismatchError,     # Type mismatch
)

try:
    result = Q.nodes().compute("unknwon_measure").execute(network)
except UnknownMeasureError as e:
    print(e)  # Includes "Did you mean?" suggestion
except DslError as e:
    print(f"DSL error: {e}")

All DSL v2 errors include:

  • Original query context (when available)

  • Line and column information for syntax errors

  • “Did you mean?” suggestions using Levenshtein distance
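Suggestions of this kind can be built on a short edit-distance routine. The sketch below implements Levenshtein distance with the classic two-row dynamic program and picks the closest known name; the `suggest` helper and its `max_distance` cutoff are assumptions for illustration, not py3plex's matching rules.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic two-row dynamic program."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def suggest(name, known, max_distance=2):
    """Return the closest known name, or None if nothing is close enough."""
    best = min(known, key=lambda candidate: levenshtein(name, candidate))
    return best if levenshtein(name, best) <= max_distance else None


measures = ["degree", "pagerank", "clustering", "closeness_centrality"]
print(suggest("clusterng", measures))  # → clustering
print(suggest("xyz", measures))        # → None
```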

Common syntax errors:

  • Missing SELECT keyword

  • Invalid target (not ‘nodes’ or ‘edges’)

  • Malformed conditions

  • Unknown operators

  • Invalid measure names

Common DSL Errors

Here are two common errors and how to fix them:

Malformed Query (missing quotes around layer name):

# Wrong - missing quotes around layer name
result = execute_query(network, 'SELECT nodes WHERE layer=social')

Error:

DslSyntaxError: Invalid condition at position 27: expected quoted string for layer value.
Hint: Use layer="social" instead of layer=social

Fix:

# Correct - layer name is quoted
result = execute_query(network, 'SELECT nodes WHERE layer="social"')

Unknown measure name:

result = Q.nodes().compute("betweenes").execute(network)
# UnknownMeasureError: Unknown measure 'betweenes'. Did you mean 'betweenness_centrality'?

See the API Documentation for complete details on DSL exceptions and error types.

Complete Working Examples

This section provides complete, runnable examples demonstrating various DSL features with expected outputs.

Example 1: Basic Network Querying

Create a simple social network and query it:

from py3plex.core import multinet
from py3plex.dsl import execute_query, format_result

# Create network
network = multinet.multi_layer_network(directed=False)

# Add nodes in social layer
network.add_nodes([
    {'source': 'Alice', 'type': 'social'},
    {'source': 'Bob', 'type': 'social'},
    {'source': 'Charlie', 'type': 'social'},
    {'source': 'David', 'type': 'social'},
])

# Add edges
network.add_edges([
    {'source': 'Alice', 'target': 'Bob', 'source_type': 'social', 'target_type': 'social'},
    {'source': 'Bob', 'target': 'Charlie', 'source_type': 'social', 'target_type': 'social'},
    {'source': 'Charlie', 'target': 'David', 'source_type': 'social', 'target_type': 'social'},
    {'source': 'Alice', 'target': 'Charlie', 'source_type': 'social', 'target_type': 'social'},
])

# Query all nodes
result = execute_query(network, 'SELECT nodes WHERE layer="social"')
print(format_result(result))

# Find high-degree nodes
result = execute_query(network, 'SELECT nodes WHERE degree > 1')
print(f"High-degree nodes: {result['count']}")

Expected Output:

Query: SELECT nodes WHERE layer="social"
Target: nodes
Count: 4

Nodes (showing 4 of 4):
  ('Alice', 'social')
  ('Bob', 'social')
  ('Charlie', 'social')
  ('David', 'social')

High-degree nodes: 3

Example 2: Multilayer Network Analysis

Analyze a network with multiple layers:

from py3plex.core import multinet
from py3plex.dsl import execute_query

# Create multilayer network
network = multinet.multi_layer_network(directed=False)

# Add nodes to multiple layers
nodes = []
for person in ['Alice', 'Bob', 'Charlie']:
    for layer in ['social', 'work', 'family']:
        nodes.append({'source': person, 'type': layer})
network.add_nodes(nodes)

# Add edges in different layers
edges = [
    # Social connections
    {'source': 'Alice', 'target': 'Bob', 'source_type': 'social', 'target_type': 'social'},
    {'source': 'Bob', 'target': 'Charlie', 'source_type': 'social', 'target_type': 'social'},
    # Work connections
    {'source': 'Alice', 'target': 'Charlie', 'source_type': 'work', 'target_type': 'work'},
    # Family connections
    {'source': 'Alice', 'target': 'Charlie', 'source_type': 'family', 'target_type': 'family'},
]
network.add_edges(edges)

# Compare layers
for layer in ['social', 'work', 'family']:
    result = execute_query(network, f'SELECT nodes WHERE layer="{layer}"')
    print(f"{layer} layer: {result['count']} nodes")

    # Compute degree for this layer
    result = execute_query(network, f'SELECT nodes WHERE layer="{layer}" COMPUTE degree')
    degrees = result['computed']['degree']
    avg_degree = sum(degrees.values()) / len(degrees) if degrees else 0
    print(f"  Average degree: {avg_degree:.2f}")

Expected Output:

social layer: 3 nodes
  Average degree: 1.33
work layer: 3 nodes
  Average degree: 0.67
family layer: 3 nodes
  Average degree: 0.67

Example 3: Hub Identification

Find and rank important nodes using multiple centrality measures:

from py3plex.core import multinet
from py3plex.dsl import execute_query

# Create network
network = multinet.multi_layer_network(directed=False)

# Add nodes
network.add_nodes([
    {'source': 'Alice', 'type': 'social'},
    {'source': 'Bob', 'type': 'social'},
    {'source': 'Charlie', 'type': 'social'},
    {'source': 'David', 'type': 'social'},
    {'source': 'Eve', 'type': 'social'},
])

# Add edges creating a star network centered on Bob
network.add_edges([
    {'source': 'Alice', 'target': 'Bob', 'source_type': 'social', 'target_type': 'social'},
    {'source': 'Bob', 'target': 'Charlie', 'source_type': 'social', 'target_type': 'social'},
    {'source': 'Bob', 'target': 'David', 'source_type': 'social', 'target_type': 'social'},
    {'source': 'Bob', 'target': 'Eve', 'source_type': 'social', 'target_type': 'social'},
])

# Find high-degree nodes in social layer
result = execute_query(
    network,
    'SELECT nodes WHERE layer="social" AND degree >= 2'
)
print(f"Found {result['count']} hub nodes")

# Compute multiple centrality measures for hubs
result = execute_query(
    network,
    'SELECT nodes WHERE layer="social" AND degree >= 2 '
    'COMPUTE betweenness_centrality closeness_centrality degree_centrality'
)

# Rank nodes by betweenness centrality
if 'computed' in result and 'betweenness_centrality' in result['computed']:
    centralities = result['computed']['betweenness_centrality']
    sorted_nodes = sorted(centralities.items(), key=lambda x: x[1], reverse=True)

    print("\nTop nodes by betweenness centrality:")
    for node, centrality in sorted_nodes[:5]:
        print(f"  {node}: {centrality:.4f}")

Expected Output:

Found 1 hub nodes

Top nodes by betweenness centrality:
  ('Bob', 'social'): 1.0000
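
Bob's score of 1.0 can be checked independently. The sketch below is a brute-force, standard-library-only computation of normalized betweenness for a small undirected, unweighted graph. It is an illustration with hypothetical function names, not the routine the DSL calls (NetworkX, for instance, uses Brandes' algorithm), and it uses plain names rather than (node, layer) tuples.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, source):
    """Return (dist, sigma): hop distance and number of shortest paths from source."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:            # first time w is reached
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:   # u precedes w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma


def normalized_betweenness(edges):
    """Brute-force normalized betweenness for a small undirected graph."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    info = {s: bfs_counts(adj, s) for s in adj}
    n = len(adj)
    pairs = (n - 1) * (n - 2) / 2        # node pairs that exclude a given node
    score = {v: 0.0 for v in adj}
    for s, t in combinations(adj, 2):
        dist_s, sigma_s = info[s]
        dist_t, sigma_t = info[t]
        if t not in dist_s:
            continue                     # s and t are disconnected
        for v in adj:
            if v in (s, t) or v not in dist_s or v not in dist_t:
                continue
            if dist_s[v] + dist_t[v] == dist_s[t]:
                # fraction of shortest s-t paths that pass through v
                score[v] += sigma_s[v] * sigma_t[v] / sigma_s[t]
    return {v: (b / pairs if pairs else 0.0) for v, b in score.items()}


star = [("Alice", "Bob"), ("Bob", "Charlie"), ("Bob", "David"), ("Bob", "Eve")]
scores = normalized_betweenness(star)
print(scores["Bob"])  # 1.0: all 6 leaf pairs route through Bob
```

Every leaf scores 0.0 because no shortest path passes through a leaf.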

Example 4: Layer Comparison Workflow

Compare network structure across different layers:

from py3plex.core import multinet
from py3plex.dsl import execute_query

# Create multilayer network
network = multinet.multi_layer_network(directed=False)

# Add nodes to multiple layers
people = ['Alice', 'Bob', 'Charlie', 'David']
nodes = []
for person in people:
    for layer in ['social', 'work', 'transport']:
        nodes.append({'source': person, 'type': layer})
network.add_nodes(nodes)

# Add edges in different layers
network.add_edges([
    # Social (well connected)
    {'source': 'Alice', 'target': 'Bob', 'source_type': 'social', 'target_type': 'social'},
    {'source': 'Bob', 'target': 'Charlie', 'source_type': 'social', 'target_type': 'social'},
    {'source': 'Charlie', 'target': 'David', 'source_type': 'social', 'target_type': 'social'},
    {'source': 'Alice', 'target': 'Charlie', 'source_type': 'social', 'target_type': 'social'},
    # Work (moderately connected)
    {'source': 'Alice', 'target': 'Bob', 'source_type': 'work', 'target_type': 'work'},
    {'source': 'Bob', 'target': 'Charlie', 'source_type': 'work', 'target_type': 'work'},
    # Transport (sparsely connected)
    {'source': 'Alice', 'target': 'David', 'source_type': 'transport', 'target_type': 'transport'},
])

layers = ['social', 'work', 'transport']
layer_stats = {}

for layer in layers:
    # Get nodes in this layer
    result = execute_query(network, f'SELECT nodes WHERE layer="{layer}"')
    node_count = result['count']

    # Compute centrality measures
    result = execute_query(
        network,
        f'SELECT nodes WHERE layer="{layer}" COMPUTE betweenness_centrality'
    )

    if 'computed' in result and 'betweenness_centrality' in result['computed']:
        centralities = result['computed']['betweenness_centrality']
        avg_centrality = sum(centralities.values()) / len(centralities) if centralities else 0
        max_centrality = max(centralities.values()) if centralities else 0

        layer_stats[layer] = {
            'nodes': node_count,
            'avg_centrality': avg_centrality,
            'max_centrality': max_centrality
        }

# Print comparison
print("\nLayer Comparison:")
print(f"{'Layer':<12} {'Nodes':<8} {'Avg Centrality':<16} {'Max Centrality':<16}")
print("-" * 55)
for layer, stats in layer_stats.items():
    print(f"{layer:<12} {stats['nodes']:<8} {stats['avg_centrality']:<16.4f} {stats['max_centrality']:<16.4f}")

Expected Output:

Layer Comparison:
Layer        Nodes    Avg Centrality   Max Centrality
-------------------------------------------------------
social       4        0.1667           0.6667
work         4        0.0833           0.3333
transport    4        0.0000           0.0000

Example Files

Additional complete examples are available in the repository:

  • examples/network_analysis/example_dsl_builder_api.py - Comprehensive builder API examples (recommended starting point for DSL v2)

  • examples/network_analysis/example_dsl_queries.py - Basic DSL usage with string syntax

  • examples/network_analysis/example_dsl_advanced.py - Advanced queries and transportation network analysis

  • examples/network_analysis/example_dsl_community_detection.py - Community detection with DSL

  • examples/cli/example_3_dsl_queries.sh - CLI usage examples for both string and builder syntax

Run these examples:

# Recommended: Comprehensive builder API examples
python examples/network_analysis/example_dsl_builder_api.py

# String DSL examples
python examples/network_analysis/example_dsl_queries.py

# Advanced queries
python examples/network_analysis/example_dsl_advanced.py

API Reference

Main Functions

def execute_query(network: Any, query: str) -> Dict[str, Any]:
    """Execute a DSL query on a multilayer network.

    Args:
        network: Multilayer network object
        query: DSL query string

    Returns:
        Dictionary with 'nodes'/'edges', 'count', and optionally 'computed'
    """

def format_result(result: Dict[str, Any], limit: int = 10) -> str:
    """Format query result as human-readable string.

    Args:
        result: Result from execute_query
        limit: Maximum items to display

    Returns:
        Formatted string
    """

Convenience Functions

def select_nodes_by_layer(network: Any, layer: str) -> List[Any]:
    """Select all nodes in a specific layer."""

def select_high_degree_nodes(network: Any, min_degree: int,
                             layer: Optional[str] = None) -> List[Any]:
    """Select nodes with degree above threshold."""

def compute_centrality_for_layer(network: Any, layer: str,
                                 centrality: str = 'betweenness_centrality') -> Dict[Any, float]:
    """Compute centrality for all nodes in a layer."""

DSL v2 Builder API

class Q:
    """Query factory for creating QueryBuilder instances."""

    @staticmethod
    def nodes() -> QueryBuilder:
        """Create a query builder for nodes."""

    @staticmethod
    def edges() -> QueryBuilder:
        """Create a query builder for edges."""

class QueryBuilder:
    """Chainable query builder."""

    def from_layers(self, layer_expr: LayerExprBuilder) -> QueryBuilder:
        """Filter by layers using layer algebra."""

    def where(self, **kwargs) -> QueryBuilder:
        """Add WHERE conditions."""

    def compute(self, *measures: str, alias: str = None) -> QueryBuilder:
        """Add measures to compute."""

    def order_by(self, *keys: str, desc: bool = False) -> QueryBuilder:
        """Add ORDER BY clause."""

    def limit(self, n: int) -> QueryBuilder:
        """Limit number of results."""

    def explain(self) -> ExplainQuery:
        """Create EXPLAIN query for execution plan."""

    def execute(self, network: Any, **params) -> QueryResult:
        """Execute the query."""

    def to_ast(self) -> Query:
        """Export as AST Query object."""

    def to_dsl(self) -> str:
        """Export as DSL string."""

class QueryResult:
    """Rich result object from query execution."""

    target: str       # 'nodes' or 'edges'
    items: List[Any]  # List of node/edge tuples
    count: int        # Number of items
    attributes: Dict  # Computed measure values

    def to_pandas(self):
        """Export to pandas DataFrame."""

    def to_networkx(self, network=None):
        """Export to NetworkX subgraph."""

    def to_arrow(self):
        """Export to Apache Arrow table."""

    def to_dict(self) -> Dict[str, Any]:
        """Export as dictionary."""

class L:
    """Layer proxy for layer algebra."""

    def __getitem__(self, name: str) -> LayerExprBuilder:
        """Create layer expression: L['social']"""

class Param:
    """Factory for parameter references."""

    @staticmethod
    def int(name: str) -> ParamRef:
        """Create integer parameter."""

    @staticmethod
    def float(name: str) -> ParamRef:
        """Create float parameter."""

    @staticmethod
    def str(name: str) -> ParamRef:
        """Create string parameter."""

DSL-Based Dynamics Simulation

The py3plex DSL extends beyond network queries to support declarative dynamics simulation on multilayer networks. This section demonstrates how to use the dynamics DSL for epidemic modeling and other dynamical processes.

For detailed documentation and formalism, see ../../../book/part3_dsl/chapter10_advanced_queries_workflows.

Quickstart

The dynamics DSL uses a builder API similar to the query DSL:

from py3plex.dynamics import D, SIS
from py3plex.core import multinet

# Create network
network = multinet.multi_layer_network()
# ... add nodes and edges ...

# Define SIS simulation
sim = (
    D.process(SIS(beta=0.3, mu=0.1))  # Transmission and recovery rates
     .initial(infected=0.05)           # 5% initially infected
     .steps(100)                       # Run for 100 time steps
     .measure("prevalence", "incidence")  # Track measures
     .replicates(10)                   # Run 10 independent simulations
     .seed(42)                         # For reproducibility
)

# Execute simulation
result = sim.run(network)

# Access results
print(f"Mean final prevalence: {result.data['prevalence'][:, -1].mean():.3f}")

# Convert to pandas for analysis
df_dict = result.to_pandas()
prevalence_df = df_dict['prevalence']

Available Processes

The dynamics module supports several built-in processes:

  • SIS - Susceptible-Infected-Susceptible (endemic diseases)

  • SIR - Susceptible-Infected-Recovered (epidemic diseases with immunity)

  • RandomWalk - Random walk dynamics on networks

Each process has configurable parameters:

from py3plex.dynamics import SIS, SIR, RandomWalk

# SIS with transmission rate β=0.3, recovery rate μ=0.1
SIS(beta=0.3, mu=0.1)

# SIR with transmission rate β=0.4, recovery rate γ=0.15
SIR(beta=0.4, gamma=0.15)

# Random walk with teleportation probability
RandomWalk(teleport=0.05)
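
To make the roles of beta and mu concrete, here is a minimal standard-library sketch of one common discrete-time SIS update rule. It does not use py3plex, and py3plex's own update rule, scheduling, and random number handling may differ. The names simulate_sis and ring are illustrative.

```python
import random


def simulate_sis(adj, beta, mu, initial_infected, steps, seed=0):
    """Discrete-time SIS; returns the infected fraction at t = 0..steps."""
    rng = random.Random(seed)
    infected = set(initial_infected)
    prevalence = [len(infected) / len(adj)]
    for _ in range(steps):
        nxt = set()
        for node in adj:
            if node in infected:
                # An infected node recovers (turns susceptible) with probability mu.
                if rng.random() >= mu:
                    nxt.add(node)
            else:
                # Each infected neighbor independently transmits with probability beta.
                for nb in adj[node]:
                    if nb in infected and rng.random() < beta:
                        nxt.add(node)
                        break
        infected = nxt
        prevalence.append(len(infected) / len(adj))
    return prevalence


# A 6-node ring as a toy contact network.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
trajectory = simulate_sis(ring, beta=0.3, mu=0.1,
                          initial_infected=[0], steps=20, seed=42)
print(len(trajectory))  # 21 values, one per time point t = 0..20
```

Updates are synchronous: the next state is computed entirely from the current one, so the order in which nodes are visited does not change who can infect whom within a step.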

Multilayer Dynamics

The dynamics DSL accepts the same layer expressions as the query DSL:

from py3plex.dsl import L
from py3plex.dynamics import D, SIS

# Simulate on specific layers
sim = (
    D.process(SIS(beta=0.25, mu=0.08))
     .on_layers(L["offline"] + L["online"])  # Select layers using layer algebra
     .coupling(node_replicas="strong")       # Nodes share states across layers
     .initial(infected=0.1)
     .steps(120)
     .measure("prevalence", "prevalence_by_layer")
     .replicates(15)
)

# multilayer_network: an existing multi_layer_network with "offline" and "online" layers
result = sim.run(multilayer_network)
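
The layer expression in this example can be read as set algebra over layer names. The toy classes below sketch that reading with plain Python operator overloading. They are hypothetical stand-ins, not py3plex's L or LayerExprBuilder. Treating + as union follows the example above, while the - and & operators for difference and intersection are assumptions.

```python
class LayerExpr:
    """Toy layer expression that resolves to a set of layer names."""

    def __init__(self, names):
        self.names = frozenset(names)

    def __add__(self, other):   # union, as in L["offline"] + L["online"]
        return LayerExpr(self.names | other.names)

    def __sub__(self, other):   # difference (operator choice is an assumption)
        return LayerExpr(self.names - other.names)

    def __and__(self, other):   # intersection (operator choice is an assumption)
        return LayerExpr(self.names & other.names)


class LayerProxy:
    """Toy stand-in for L: indexing by name yields a single-layer expression."""

    def __getitem__(self, name):
        return LayerExpr([name])


L_toy = LayerProxy()
selected = L_toy["offline"] + L_toy["online"]
print(sorted(selected.names))  # ['offline', 'online']
```

Because every operator returns a new LayerExpr, expressions compose freely, for example (L_toy["a"] + L_toy["b"]) - L_toy["a"].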

Integration with Query DSL

Use query DSL to specify targeted initial conditions:

from py3plex.dsl import Q
from py3plex.dynamics import D, SIS

# Start infection at high-degree nodes (hubs)
sim = (
    D.process(SIS(beta=0.35, mu=0.12))
     .initial(
         infected=Q.nodes().where(degree__gte=5)  # Query selects hubs
     )
     .steps(100)
     .measure("prevalence")
     .replicates(10)
)

result = sim.run(network)

This combination lets you seed a simulation from any node set a query can select, whether by network structure, centrality, or another computable property.

Result Analysis

The SimulationResult object provides rich analysis capabilities:

# Get summary statistics
summary = result.summary()
print(summary)

# Plot time series with confidence intervals
import matplotlib.pyplot as plt
result.plot("prevalence")
plt.show()

# Export to pandas for custom analysis
df_dict = result.to_pandas()
prevalence_df = df_dict['prevalence']

# Compute mean trajectory across replicates
mean_trajectory = (
    prevalence_df
    .groupby('t')['value']
    .agg(['mean', 'std'])
)
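
For readers without pandas at hand, the same per-time-step aggregation can be sketched with the standard library. The long-format rows below, with a replicate index, a time step t, and a value, are made-up data. The replicate column is an assumption, since only the 't' and 'value' columns appear in the snippet above.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical long-format rows: (replicate, t, value).
rows = [
    (0, 0, 0.10), (0, 1, 0.20), (0, 2, 0.30),
    (1, 0, 0.10), (1, 1, 0.30), (1, 2, 0.50),
]

# Group the values by time step, ignoring which replicate they came from.
by_t = defaultdict(list)
for _replicate, t, value in rows:
    by_t[t].append(value)

# stdev is the sample standard deviation (ddof=1), matching pandas' 'std'.
trajectory = {t: (mean(vals), stdev(vals)) for t, vals in sorted(by_t.items())}
for t, (m, s) in trajectory.items():
    print(t, round(m, 3), round(s, 3))
```

Each time step needs at least two replicates, because the sample standard deviation is undefined for a single value.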

Complete Example

See examples/network_analysis/example_dsl_dynamics.py for a comprehensive example demonstrating:

  • SIS and SIR epidemic simulations

  • Multilayer dynamics with coupling

  • Random walk dynamics

  • Query DSL integration for initial conditions

  • Parameter comparison across simulations

Run the example:

python examples/network_analysis/example_dsl_dynamics.py

Further Reading

For mathematical formalism and detailed documentation:

  • ../../../book/part3_dsl/chapter10_advanced_queries_workflows - Complete dynamics DSL guide with formalism

  • examples/network_analysis/example_dsl_dynamics.py - Comprehensive dynamics examples

  • examples/advanced/example_dynamics_core.py - Core dynamics classes (OOP-style)

  • SIR Epidemic Simulator on Multiplex Graphs - SIR multiplex simulator documentation

Limitations and Future Work

Current limitations:

  • Edge queries are not yet fully supported

  • Complex nested conditions require multiple queries

  • Limited to NetworkX-based measures

  • No aggregation functions (SUM, AVG, etc.)

Planned enhancements:

  • Full edge query support

  • Nested subqueries

  • Aggregation operators

  • Custom measure registration

  • Query optimization

  • Save/load query results

Best Practices

1. Choose the right API

  • Builder API (Q.nodes()): Recommended for production code, complex queries, and when type hints are important

  • String DSL: Good for simple queries, interactive exploration, and when learning the syntax

2. Start simple, build incrementally

Begin with basic queries and add complexity step by step:

# Start simple
result = Q.nodes().execute(network)

# Add filtering
result = Q.nodes().where(layer="social").execute(network)

# Add computation
result = Q.nodes().where(layer="social").compute("degree").execute(network)

# Add ordering and limiting
result = (
    Q.nodes()
    .where(layer="social")
    .compute("degree")
    .order_by("-degree")
    .limit(10)
    .execute(network)
)

3. Use parameterized queries for reusability

Create reusable query templates with Param:

from py3plex.dsl import Q, Param

# Define once
top_nodes_query = (
    Q.nodes()
    .where(layer=Param.str("layer_name"), degree__gt=Param.int("threshold"))
    .compute("betweenness_centrality")
    .order_by("-betweenness_centrality")
    .limit(Param.int("top_n"))
)

# Execute many times with different parameters
social_hubs = top_nodes_query.execute(network, layer_name="social", threshold=5, top_n=10)
work_hubs = top_nodes_query.execute(network, layer_name="work", threshold=3, top_n=20)
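
The idea behind parameter binding can be sketched without py3plex: a template stores typed placeholders, and values are checked and substituted only at execution time. ParamRefSketch and bind are hypothetical names for illustration. How py3plex's ParamRef resolves and validates values is not shown in this guide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParamRefSketch:
    """Hypothetical placeholder: a parameter name plus the type it must have."""
    name: str
    type_: type


def bind(template, **params):
    """Replace every placeholder in a condition dict with a type-checked value."""
    bound = {}
    for key, value in template.items():
        if isinstance(value, ParamRefSketch):
            if value.name not in params:
                raise KeyError(f"missing parameter: {value.name}")
            supplied = params[value.name]
            if not isinstance(supplied, value.type_):
                raise TypeError(f"{value.name} must be {value.type_.__name__}")
            bound[key] = supplied
        else:
            bound[key] = value           # literal values pass through unchanged
    return bound


template = {
    "layer": ParamRefSketch("layer_name", str),
    "degree__gt": ParamRefSketch("threshold", int),
}
print(bind(template, layer_name="social", threshold=5))
# {'layer': 'social', 'degree__gt': 5}
```

Values are inserted as typed data rather than spliced into a query string, so a supplied value cannot alter the structure of the query.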

4. Use EXPLAIN for expensive queries

Before running expensive queries on large networks, check the execution plan:

q = Q.nodes().compute("betweenness_centrality")
plan = q.explain().execute(network)

for step in plan.steps:
    print(f"{step.description} - {step.estimated_complexity}")

if plan.warnings:
    print("Warnings:", plan.warnings)

5. Validate data and check results

Always inspect result counts and samples before processing large result sets:

result = Q.nodes().where(degree__gt=5).execute(network)

print(f"Found {result.count} nodes")
if result.count > 0:
    print(f"Sample: {result.items[:3]}")
    # Process results...
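
The degree__gt=5 keyword above names its comparison with a double-underscore suffix. As a rough illustration of how such keywords can be turned into predicates, the sketch below parses them against plain dictionaries. It is not py3plex's implementation. Only the __gt and __gte suffixes appear in this guide, so the other table entries and the AND semantics across keywords are assumptions.

```python
import operator

# Hypothetical suffix table: only __gt and __gte appear in this guide;
# the remaining entries are assumptions added for illustration.
SUFFIX_OPS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def parse_condition(key):
    """Split 'degree__gt' into ('degree', operator.gt); plain keys mean equality."""
    field, sep, suffix = key.rpartition("__")
    if sep and suffix in SUFFIX_OPS:
        return field, SUFFIX_OPS[suffix]
    return key, operator.eq


def matches(record, **conditions):
    """Return True if the record satisfies every keyword condition (AND)."""
    for key, expected in conditions.items():
        field, op = parse_condition(key)
        if field not in record or not op(record[field], expected):
            return False
    return True


nodes = [
    {"id": "Alice", "layer": "social", "degree": 7},
    {"id": "Bob", "layer": "social", "degree": 3},
    {"id": "Carol", "layer": "work", "degree": 9},
]
hubs = [n["id"] for n in nodes if matches(n, layer="social", degree__gt=5)]
print(hubs)  # ['Alice']
```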

6. Choose appropriate export format

  • to_pandas(): Best for data analysis, statistical operations, and visualization

  • to_networkx(): Best for further NetworkX operations or subgraph analysis

  • to_arrow(): Best for large datasets, columnar operations, or data interchange

  • to_dict(): Best for serialization, API responses, or custom processing

7. Handle errors gracefully

Catch the DSL's specific exceptions and read their messages, which can include a suggested correction for a misspelled name:

from py3plex.dsl import Q, UnknownMeasureError

try:
    result = Q.nodes().compute("my_measure").execute(network)
except UnknownMeasureError as e:
    print(f"Measure not found: {e}")
    # Fallback logic or use suggested measure
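
One way to build a fallback of your own is to look up the closest known name with the standard library's difflib. This is only a sketch of the general technique and may not be how py3plex produces its suggestions. The KNOWN_MEASURES list is a hypothetical registry built from measure names used elsewhere in this guide.

```python
from difflib import get_close_matches

# Hypothetical registry; the measures py3plex actually ships may differ.
KNOWN_MEASURES = [
    "degree",
    "degree_centrality",
    "betweenness_centrality",
    "closeness_centrality",
]


def suggest_measure(name, known=KNOWN_MEASURES):
    """Return the closest known measure name, or None if nothing is similar."""
    candidates = get_close_matches(name, known, n=1, cutoff=0.6)
    return candidates[0] if candidates else None


print(suggest_measure("betweeness_centrality"))  # betweenness_centrality
print(suggest_measure("xyz"))                    # None
```

The cutoff of 0.6 is difflib's default similarity threshold. Raising it makes suggestions stricter at the cost of missing more distant typos.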

8. Performance optimization

For large networks, follow these guidelines:

  • Filter by layer first to reduce search space

  • Use limit() to restrict result size when you don’t need all results

  • Cache computed measures if reusing them multiple times

  • Consider using degree instead of more expensive centrality measures for initial filtering

# Less efficient - computes centrality for all nodes
result = Q.nodes().compute("betweenness_centrality").order_by("-betweenness_centrality").limit(10).execute(network)

# More efficient - filter by degree first
result = Q.nodes().where(degree__gt=5).compute("betweenness_centrality").order_by("-betweenness_centrality").limit(10).execute(network)

Performance Considerations

  • Computing centrality measures can be expensive on large networks

  • Filter by layer first to reduce search space

  • Cache computed measures if reusing them

  • Consider using convenience functions for better performance

  • Pre-compute measures and store in node attributes for repeated use

Example performance optimization:

# Less efficient - computes centrality multiple times
for threshold in [3, 5, 7]:
    result = execute_query(
        network,
        f'SELECT nodes WHERE degree > {threshold} COMPUTE betweenness_centrality'
    )

# More efficient - compute once, filter in post-processing
result = execute_query(
    network,
    'SELECT nodes COMPUTE betweenness_centrality'
)
centralities = result['computed']['betweenness_centrality']

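for threshold in [3, 5, 7]:
    high_degree = [n for n in result['nodes']
                   if network.core_network.degree(n) > threshold]
    # Reuse the values computed once above instead of recomputing them
    high_degree_centrality = {n: centralities[n] for n in high_degree}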
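
The "cache computed measures" advice can also be applied with a small memoizing wrapper. The sketch below uses a stand-in function in place of a real query, so expensive_measure, cached_measure, and the node labels are all hypothetical. In practice the wrapped call would be whichever query you repeat.

```python
calls = {"count": 0}


def expensive_measure(layer, measure):
    """Stand-in for a costly computation; counts how often it really runs."""
    calls["count"] += 1
    return {f"{layer}:node{i}": float(i) for i in range(3)}


_cache = {}


def cached_measure(layer, measure):
    """Compute each (layer, measure) pair once and reuse the stored result."""
    key = (layer, measure)
    if key not in _cache:
        _cache[key] = expensive_measure(layer, measure)
    return _cache[key]


for threshold in [0.5, 1.5]:
    values = cached_measure("social", "betweenness_centrality")
    selected = [node for node, v in values.items() if v > threshold]

print(calls["count"])  # 1: the second iteration reuses the cached result
```

A cache like this must be cleared whenever the network changes, otherwise it will return stale values.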

Further Reading

See Also

  • Dplyr-style Chainable Graph Operations - an alternative API for complex transformations

  • NetworkX documentation for centrality measures

  • Examples directory for complete use cases

  • API documentation for detailed function signatures