How to Query Multilayer Graphs with the SQL-like DSL
Goal: Use py3plex’s SQL-inspired Domain-Specific Language (DSL) to query, filter, and analyze multilayer networks. The DSL is a first-class query language specifically designed for multilayer graph structures, providing both string syntax for interactive exploration and a type-safe builder API for production code.
📓 Run this guide online
You can run this tutorial in your browser without any local installation, or see the full executable example: example_dsl_builder_api.py
What Makes This DSL Special:
Graph-aware: Unlike generic query languages, the DSL understands multilayer structures—layers, layer intersections, intralayer vs. interlayer edges, and (node, layer) tuple semantics.
Dual interfaces: String syntax for rapid prototyping in notebooks; the builder API (Q, L) for IDE autocompletion and type checking.
Integrated computation: Compute centrality, clustering, and other network metrics directly in queries, with results returned as pandas DataFrames or NetworkX graphs.
Temporal support: Query network snapshots and time ranges when your network includes temporal information.
Prerequisites:
A loaded multi_layer_network object (see How to Load and Build Networks)
Basic familiarity with multilayer network concepts (nodes, layers, intralayer/interlayer edges)
For complete DSL grammar and operator reference, see DSL Reference
Conceptual Overview
The DSL has two complementary interfaces that compile to the same internal representation:
String Syntax (execute_query(network, "SELECT nodes WHERE ..."))
SQL-like, human-readable
Ideal for interactive exploration in Jupyter notebooks or the REPL
Quick one-liners for common queries
Builder API (Q.nodes().where(...).compute(...).execute(network))
Pythonic, chainable methods
Type-safe with IDE autocompletion
Recommended for production code and complex workflows
Mental Model:
A typical DSL query follows this pipeline:
SELECT nodes/edges
→ FROM LAYERS (restrict to specific layers)
→ WHERE (filter by attributes or special predicates)
→ COMPUTE (calculate metrics like degree, centrality)
→ ORDER BY (sort results)
→ LIMIT (cap number of results)
→ EXPORT (materialize as DataFrame, NetworkX graph, etc.)
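A minimal sketch of this pipeline using the builder API introduced below (the layer name "social" and the degree threshold are illustrative, not part of your data):
from py3plex.dsl import Q, L

df = (
    Q.nodes()                              # SELECT nodes
    .from_layers(L["social"])              # FROM LAYERS
    .where(degree__gt=5)                   # WHERE
    .compute("betweenness_centrality")     # COMPUTE
    .order_by("-betweenness_centrality")   # ORDER BY (descending)
    .limit(10)                             # LIMIT
    .execute(network)                      # run the query against a loaded network
    .to_pandas()                           # EXPORT as a pandas DataFrame
)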
Key Concepts:
Nodes as (node, layer) tuples: In multilayer networks, a node may appear in multiple layers. The DSL represents these as (node_id, layer_name) pairs.
Layer set algebra: Combine layers with set operations (| union, & intersection, - difference, ~ complement). The LayerSet algebra enables expressive layer selection like L["* - coupling"] or L["(ppi | gene) & disease"]. See Layer Set Algebra for complete documentation.
Special predicates: intralayer=True selects edges within a layer; interlayer=("layer1", "layer2") selects edges crossing specific layers.
Lazy execution: Queries are built incrementally and executed only when .execute(network) is called.
Comparison to SQL:
Think of the DSL as SQL for graphs:
SELECT nodes WHERE degree > 5 ≈ SQL's SELECT * FROM nodes WHERE degree > 5
But instead of tables, you're querying nodes and edges with multilayer and temporal attributes
Layer filters and graph-specific predicates (intralayer, interlayer) have no SQL equivalent
String Syntax (Quick and Readable)
The string syntax provides a concise, SQL-like way to express queries. Best for exploratory analysis and quick investigations.
Basic SELECT
Select all nodes and inspect the result:
Note
Where to find this data
The examples in this guide use one of the following:
Built-in data generators like random_generators.random_multilayer_ER(...) (recommended for self-contained examples)
Example files from the repository at datasets/multiedgelist.txt or similar
The built-in datasets module:
from py3plex.datasets import fetch_multilayer
For this example, we’ll create a simple network programmatically:
from py3plex.core import multinet
from py3plex.dsl import execute_query
# Create a simple multilayer network
network = multinet.multi_layer_network()
network.add_edges([
['alice', 'social', 'bob', 'social', 1],
['bob', 'social', 'charlie', 'social', 1],
['alice', 'work', 'charlie', 'work', 1],
['bob', 'work', 'dave', 'work', 1],
], input_type="list")
# Get all nodes
result = execute_query(network, 'SELECT nodes')
print(f"Found {len(result)} nodes")
# Inspect a few items
for i, (node, data) in enumerate(result.items()):
    print(f"  {node}: {data}")
    if i >= 4:
        break
Expected output:
Found 6 nodes
('alice', 'social'): {'degree': 1, 'layer': 'social', 'layer_count': 2}
('bob', 'social'): {'degree': 2, 'layer': 'social', 'layer_count': 2}
('charlie', 'social'): {'degree': 1, 'layer': 'social', 'layer_count': 2}
('alice', 'work'): {'degree': 1, 'layer': 'work', 'layer_count': 2}
('bob', 'work'): {'degree': 1, 'layer': 'work', 'layer_count': 2}
Tip
Loading from files
To load from a file in the repository:
# Using a file from the datasets/ directory
network.load_network("datasets/multiedgelist.txt", input_type="multiedgelist")
# Or using an absolute path
import os
path = os.path.join(os.path.dirname(__file__), "datasets", "multiedgelist.txt")
network.load_network(path, input_type="multiedgelist")
Note: Keys are (node, layer) tuples representing node-layer pairs. The layer_count attribute indicates how many layers the node appears in across the entire network.
Filter by Layer
Restrict queries to nodes in a specific layer:
# Get nodes in the 'friends' layer only
result = execute_query(
network,
'SELECT nodes WHERE layer="friends"'
)
print(f"Nodes in 'friends' layer: {len(result)}")
Understanding Layer Filters:
layer="friends"selects only the node-layer pairs wherelayer == "friends"This does not select all occurrences of nodes across layers—only their representation in the specified layer
Use
layer_count >= 2to find nodes appearing in multiple layers
Example with statistics:
result = execute_query(
network,
'SELECT nodes WHERE layer="friends" COMPUTE degree'
)
df = result.to_pandas()
print(f"Nodes in 'friends': {len(df)}")
print(f"Average degree in 'friends': {df['degree'].mean():.2f}")
print(f"Max degree in 'friends': {df['degree'].max()}")
Expected output:
Nodes in 'friends': 42
Average degree in 'friends': 5.23
Max degree in 'friends': 15
Filter by Property
Use comparisons to filter nodes by computed or intrinsic attributes:
# High-degree nodes
result = execute_query(
network,
'SELECT nodes WHERE degree > 5'
)
print(f"High-degree nodes: {len(result)}")
# Multilayer nodes with high degree
result = execute_query(
network,
'SELECT nodes WHERE degree > 5 AND layer_count >= 2'
)
print(f"High-degree multilayer nodes: {len(result)}")
Supported operators: >, >=, <, <=, =, !=
Multiple conditions are combined with AND. For more complex logic, use the builder API (see below).
Expected output:
High-degree nodes: 34
High-degree multilayer nodes: 18
Compute Statistics
The COMPUTE clause calculates network metrics and attaches them to result rows. This is where the DSL becomes powerful for analysis:
# Compute degree and betweenness centrality for nodes in 'social' layer
result = execute_query(
network,
'SELECT nodes WHERE layer="social" '
'COMPUTE degree COMPUTE betweenness_centrality'
)
# Convert to pandas for analysis
df = result.to_pandas()
print("Top nodes by betweenness centrality:")
print(df[['id', 'degree', 'betweenness_centrality']].head())
print("\nSummary statistics:")
print(df[['degree', 'betweenness_centrality']].describe())
Expected output:
Top nodes by betweenness centrality:
id degree betweenness_centrality
0 (alice, social) 12 0.245
1 (bob, social) 8 0.189
2 (eve, social) 15 0.301
3 (frank, social) 7 0.134
4 (grace, social) 11 0.221
Summary statistics:
degree betweenness_centrality
count 65.000000 65.000000
mean 6.846154 0.112308
std 3.241057 0.089542
min 1.000000 0.000000
25% 4.000000 0.045000
50% 7.000000 0.089000
75% 10.000000 0.167000
max 15.000000 0.301000
Available measures include: degree, betweenness_centrality, closeness_centrality, eigenvector_centrality, pagerank, clustering, communities. See DSL Reference for the complete list.
Use case: This pattern is ideal for generating summary statistics for papers, reports, or further statistical analysis.
Builder API (Type-Safe)
The builder API is the recommended approach for production code. It provides:
IDE autocompletion and inline documentation
Type checking with tools like mypy
Clearer error messages
Easier refactoring and composition of queries
All builder queries compile to the same AST as string queries, ensuring consistent semantics.
Basic Queries
Create and execute queries using the Q and L imports:
from py3plex.dsl import Q, L
# Get all nodes
result = Q.nodes().execute(network)
print(f"Total nodes: {len(result)}")
# Get nodes from a specific layer
result = (
Q.nodes()
.from_layers(L["friends"])
.execute(network)
)
print(f"Nodes in 'friends' layer: {len(result)}")
Query reusability: You can define a query once and execute it with different networks:
high_degree_query = Q.nodes().where(degree__gt=10).compute("betweenness_centrality")
# Execute on multiple networks
result_network1 = high_degree_query.execute(network1)
result_network2 = high_degree_query.execute(network2)
Filtering
Use where() to add filter conditions. The builder API uses Django-style __ suffixes for comparisons:
# Filter by property
result = (
Q.nodes()
.where(degree__gt=5)
.execute(network)
)
print(f"Nodes with degree > 5: {len(result)}")
# Multiple conditions (combined with AND)
result = (
Q.nodes()
.from_layers(L["work"])
.where(degree__gt=3, layer_count__gte=2)
.execute(network)
)
print(f"Multilayer high-degree nodes in 'work': {len(result)}")
Supported comparison suffixes:
__gt: greater than (>)
__gte or __ge: greater than or equal (>=)
__lt: less than (<)
__lte or __le: less than or equal (<=)
__eq: equal (=)
__ne or __neq: not equal (!=)
Understanding layer_count:
In multilayer networks, a node may appear in multiple layers. The layer_count attribute indicates how many layers the node participates in:
layer_count__gte=2: nodes appearing in at least 2 layers
layer_count__eq=1: nodes appearing in exactly 1 layer (layer-specific nodes)
This is useful for identifying “connector” nodes that bridge multiple contexts.
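For example, a minimal sketch that lists such connector nodes (builder API; assumes layer_count is present in the result data, as shown in the earlier output):
from py3plex.dsl import Q

connectors = Q.nodes().where(layer_count__gte=2).execute(network)
for node, data in connectors.items():
    print(f"{node}: appears in {data['layer_count']} layers")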
Computing Metrics
Use compute() to calculate network metrics. Metrics are computed efficiently and attached to result rows:
# Compute multiple metrics
result = (
Q.nodes()
.compute("degree", "betweenness_centrality", "clustering")
.execute(network)
)
# Convert to DataFrame and analyze
df = result.to_pandas()
print(df.head(10))
# Get top nodes by a metric
top_by_betweenness = df.nlargest(10, 'betweenness_centrality')
print("\nTop 10 nodes by betweenness centrality:")
print(top_by_betweenness[['id', 'betweenness_centrality', 'degree']])
Order of operations:
compute() can be called at any point in the chain
Filters (where()) can reference computed metrics only if the metric is computed before the filter
For best performance, filter first, then compute:
# Good: Filter first, then compute
result = (
Q.nodes()
.from_layers(L["social"])
.where(degree__gt=5)
.compute("betweenness_centrality")
.execute(network)
)
Expected output:
id degree betweenness_centrality clustering
0 (alice, social) 12 0.245000 0.545455
1 (bob, social) 8 0.189000 0.642857
2 (eve, social) 15 0.301000 0.428571
3 (frank, social) 7 0.134000 0.666667
4 (grace, social) 11 0.221000 0.509091
...
Top 10 nodes by betweenness centrality:
id betweenness_centrality degree
2 (eve, social) 0.301000 15
0 (alice, social) 0.245000 12
4 (grace, social) 0.221000 11
1 (bob, social) 0.189000 8
...
Computing Metrics with Uncertainty
New in py3plex 1.0: The DSL now supports first-class uncertainty for computed metrics. This allows you to estimate statistical uncertainty (confidence intervals, standard deviations) for network statistics via bootstrap, perturbation, or Monte Carlo methods.
Why uncertainty matters:
Networks are often noisy or sampled (e.g., social networks with missing edges)
Centrality metrics can be sensitive to small perturbations
Uncertainty quantification helps distinguish signal from noise
Required for robust statistical inference and hypothesis testing
Basic usage:
# Compute degree with uncertainty estimation
result = (
Q.nodes()
.compute(
"degree",
"betweenness_centrality",
uncertainty=True,
method="perturbation", # or "bootstrap", "seed"
n_samples=100, # number of resamples
ci=0.95 # confidence interval level
)
.execute(network)
)
# Access uncertainty information
df = result.to_pandas()
print(df.head())
# Results contain mean, std, and quantiles for each metric
# The 'degree' column now has dict values with uncertainty info
Uncertainty methods:
"perturbation": Drop a small fraction of edges/nodes randomly (default: 5%)"bootstrap": Resample nodes/edges with replacement"seed": Run stochastic algorithms with different random seeds"jackknife": Leave-one-out resampling
Parameters:
uncertainty (bool): Enable uncertainty estimation (default: False)
method (str): Resampling strategy (default: "perturbation")
n_samples (int): Number of resamples (default: 50)
ci (float): Confidence interval level, e.g., 0.95 for 95% CI (default: 0.95)
Example with confidence intervals:
# Find hubs with uncertainty bounds
hubs = (
Q.nodes()
.compute(
"degree",
"betweenness_centrality",
uncertainty=True,
method="perturbation",
n_samples=200,
ci=0.95
)
.order_by("-betweenness_centrality")
.limit(10)
.execute(network)
)
# Extract uncertainty information
df = hubs.to_pandas()
# When uncertainty=True, values are dicts with mean, std, quantiles
for idx, row in df.head().iterrows():
    node_id = row['id']
    bc_info = row['betweenness_centrality']
    if isinstance(bc_info, dict):
        mean = bc_info['mean']
        std = bc_info.get('std', 0)
        ci_low = bc_info.get('quantiles', {}).get(0.025, mean)
        ci_high = bc_info.get('quantiles', {}).get(0.975, mean)
        print(f"{node_id}:")
        print(f"  Betweenness: {mean:.4f} ± {std:.4f}")
        print(f"  95% CI: [{ci_low:.4f}, {ci_high:.4f}]")
Expected output:
('eve', 'social'):
Betweenness: 0.3010 ± 0.0234
95% CI: [0.2589, 0.3442]
('alice', 'social'):
Betweenness: 0.2450 ± 0.0198
95% CI: [0.2087, 0.2821]
('grace', 'social'):
Betweenness: 0.2210 ± 0.0176
95% CI: [0.1901, 0.2534]
Backward compatibility:
When uncertainty=False (the default), metrics return scalar values as before. Your existing queries work unchanged:
# Traditional deterministic computation
result = Q.nodes().compute("degree").execute(network)
# 'degree' values are scalars (int/float)
# With uncertainty
result_unc = Q.nodes().compute("degree", uncertainty=True).execute(network)
# 'degree' values are dicts with mean, std, quantiles
Use cases:
Comparing networks: Test if centrality differences between networks are statistically significant
Robust ranking: Identify nodes that consistently rank high across perturbations
Network inference: Quantify uncertainty when inferring networks from noisy data
Hypothesis testing: Generate null distributions for significance testing
Performance notes:
Uncertainty estimation is opt-in and only runs when explicitly requested
Cost scales linearly with n_samples (e.g., 100 samples ≈ 100× slower)
Use smaller n_samples (20-50) for exploration, larger (100-500) for publication
Perturbation is fastest; bootstrap and jackknife are more expensive
Further reading:
How to Compute Network Statistics: General guide to network statistics and uncertainty
examples/uncertainty/example_first_class_uncertainty.py: Complete examples
py3plex.uncertainty module: Low-level API for custom uncertainty workflows
Sorting and Limiting
Use order_by() and limit() to control result ordering and size:
# Get top 10 nodes by degree
result = (
Q.nodes()
.compute("degree")
.order_by("-degree") # "-" prefix for descending
.limit(10)
.execute(network)
)
print("Top 10 highest-degree nodes:")
for node, data in result.items():
    print(f"  {node}: degree={data['degree']}")
Sorting conventions:
order_by("degree"): ascending (low to high)order_by("-degree"): descending (high to low)Multiple keys:
order_by("-degree", "layer_count"): sort by degree descending, then layer_count ascending
Expected output:
Top 10 highest-degree nodes:
('eve', 'social'): degree=15
('alice', 'social'): degree=12
('grace', 'social'): degree=11
('charlie', 'work'): degree=10
('henry', 'friends'): degree=9
('diana', 'social'): degree=9
('bob', 'social'): degree=8
('frank', 'social'): degree=7
('iris', 'work'): degree=7
('jake', 'friends'): degree=6
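Multi-key sorting follows the same convention; a minimal sketch (degree descending, then layer_count ascending, as listed above):
result = (
    Q.nodes()
    .compute("degree")
    .order_by("-degree", "layer_count")
    .limit(10)
    .execute(network)
)
for node, data in result.items():
    print(f"  {node}: degree={data['degree']}, layers={data['layer_count']}")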
Working with Results
DSL queries return a QueryResult object that provides multiple ways to access and export data. Understanding how to work with results is crucial for integrating DSL queries into analysis pipelines.
Access as Dictionary
QueryResult provides dictionary-like access via .items():
result = Q.nodes().compute("degree").execute(network)
# Iterate over all items
for node, data in result.items():
    print(f"{node}: degree={data['degree']}")
# Inspect one sample entry
sample_key, sample_value = next(iter(result.items()))
print(f"Sample key type: {type(sample_key)}")
print(f"Sample key: {sample_key}")
print(f"Sample value: {sample_value}")
Result structure for nodes:
Keys: (node_id, layer) tuples (for multilayer queries) or node_id (for single-layer queries)
Values: Dictionaries with computed attributes ({"degree": 5, "betweenness_centrality": 0.23, ...})
Result structure for edges:
Keys: ((source, source_layer), (target, target_layer), {edge_data}) tuples
Values: Dictionaries with edge attributes and computed metrics
Expected output:
Sample key type: <class 'tuple'>
Sample key: ('alice', 'social')
Sample value: {'degree': 12, 'layer': 'social', 'layer_count': 2}
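Edge results support the same dictionary-style access; a minimal sketch based on the key structure documented above (treat the exact key layout as an assumption and verify against your version):
edges = Q.edges().execute(network)
for key, attrs in edges.items():
    # key follows ((source, source_layer), (target, target_layer), {edge_data})
    src, dst = key[0], key[1]
    print(f"{src} -> {dst}: {attrs}")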
Convert to Pandas
This is the recommended way to integrate DSL queries with statistical analysis and plotting libraries.
result = (
Q.nodes()
.from_layers(L["social"])
.compute("degree", "betweenness_centrality", "clustering")
.execute(network)
)
# Convert to DataFrame
df = result.to_pandas()
# Inspect structure
print(df.head())
print("\nColumn names:", df.columns.tolist())
print("\nSummary statistics:")
print(df[['degree', 'betweenness_centrality', 'clustering']].describe())
# Use pandas for further analysis
high_influence = df[
(df['degree'] > 10) &
(df['betweenness_centrality'] > 0.2)
]
print(f"\nHigh-influence nodes: {len(high_influence)}")
DataFrame structure:
For node queries: Columns include id (the node-layer tuple or node ID), plus all computed attributes
For edge queries: Columns include source, target, source_layer, target_layer, weight, plus computed attributes
Expected output:
id degree betweenness_centrality clustering
0 (alice, social) 12 0.245000 0.545455
1 (bob, social) 8 0.189000 0.642857
2 (eve, social) 15 0.301000 0.428571
3 (frank, social) 7 0.134000 0.666667
4 (grace, social) 11 0.221000 0.509091
Column names: ['id', 'degree', 'betweenness_centrality', 'clustering']
Summary statistics:
degree betweenness_centrality clustering
count 65.000000 65.000000 65.000000
mean 6.846154 0.112308 0.587692
std 3.241057 0.089542 0.145231
...
High-influence nodes: 8
Multi-index option:
For more complex analyses, you can reshape the id tuple into a multi-index:
import pandas as pd

df = result.to_pandas()
# Split 'id' tuple into separate columns
df[['node', 'layer']] = pd.DataFrame(df['id'].tolist(), index=df.index)
df = df.drop('id', axis=1)
df = df.set_index(['node', 'layer'])
print(df.head())
Filter Results
You can filter results in two ways: using the DSL’s where() clause (recommended) or post-processing with Python/pandas.
Option 1: Filter in the query (recommended for large networks):
# Filter before computation for efficiency
result = (
    Q.nodes()
    .where(degree__gt=5)
    .compute("degree", "betweenness_centrality")
    .execute(network)
)
Option 2: Filter the result dictionary (for small networks or ad-hoc filtering):
result = Q.nodes().compute("degree").execute(network)
# Pure Python filtering
high_degree = {
node: data
for node, data in result.items()
if data['degree'] > 5
}
print(f"High-degree nodes: {len(high_degree)}")
Option 3: Filter the DataFrame (most flexible for complex conditions):
df = result.to_pandas()
# Use pandas boolean indexing
filtered = df[df['degree'] > 5]
# Complex conditions
interesting_nodes = df[
(df['degree'] > 5) &
(df['betweenness_centrality'] > df['betweenness_centrality'].mean())
]
Performance note: For very large networks (millions of nodes), filtering in the DSL query (Option 1) is most efficient because it avoids materializing unnecessary results. For smaller networks, pandas filtering (Option 3) is often more convenient.
Advanced Queries
This section showcases the DSL’s power for sophisticated multilayer network analysis. These patterns are common in research and can be adapted to your specific needs.
Multiple Layer Selection
Use layer algebra to combine layers. The L object supports set operations:
from py3plex.dsl import Q, L
# Union: nodes/edges from EITHER layer
result = (
Q.nodes()
.from_layers(L["friends"] + L["work"])
.compute("degree")
.execute(network)
)
df = result.to_pandas()
print(f"Combined nodes from 'friends' and 'work': {len(df)}")
print(f"Average degree across both layers: {df['degree'].mean():.2f}")
Set semantics:
L["friends"] + L["work"]: Union of nodes/edges from both layers (nodes appearing in either layer)L["friends"] & L["work"]: Intersection (see next section)L["friends"] - L["work"]: Difference (nodes in friends but not work)
Use case: Compare activity across related contexts. For example, analyze user behavior across social and professional networks together.
Expected output:
Combined nodes from 'friends' and 'work': 87
Average degree across both layers: 6.12
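The difference operator from the set semantics above works the same way; a minimal sketch (layer names as used in this section):
# Nodes represented in 'friends' but not in 'work'
friends_only = (
    Q.nodes()
    .from_layers(L["friends"] - L["work"])
    .compute("degree")
    .execute(network)
)
print(f"Nodes only in 'friends': {len(friends_only)}")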
Layer Intersection
Find nodes that appear in multiple specific layers:
# Nodes present in BOTH 'friends' AND 'work' layers
result = (
Q.nodes()
.from_layers(L["friends"] & L["work"])
.compute("degree", "betweenness_centrality")
.execute(network)
)
df = result.to_pandas()
print(f"Nodes in both 'friends' and 'work': {len(df)}")
print("\nThese are 'connector' nodes bridging social and professional contexts")
print(df.head(10))
Semantics:
L["friends"] & L["work"]selects nodes that have representations in both layersThis is different from
layer_count >= 2, which selects nodes in any two layersUse intersection to find nodes bridging specific contexts
Alternative approach using layer_count:
# More general: nodes in at least 2 layers (any layers)
result = (
Q.nodes()
.where(layer_count__gte=2)
.compute("degree")
.execute(network)
)
print(f"Multilayer nodes (any 2+ layers): {len(result)}")
Expected output:
Nodes in both 'friends' and 'work': 23
These are 'connector' nodes bridging social and professional contexts
id degree betweenness_centrality
0 (alice, friends) 12 0.245000
1 (alice, work) 8 0.189000
2 (charlie, friends) 10 0.201000
3 (charlie, work) 7 0.145000
...
Query Edges
The DSL supports edge queries with the same flexibility as node queries:
# Select edges from a layer with weight filter
edges = (
Q.edges()
.from_layers(L["social"])
.where(weight__gt=0.5)
.compute("edge_betweenness")
.execute(network)
)
df = edges.to_pandas()
print(f"High-weight edges in 'social' layer: {len(df)}")
print("\nSample edges:")
print(df.head())
# Analyze edge distribution
print(f"\nMean edge weight: {df['weight'].mean():.3f}")
print(f"Mean edge betweenness: {df['edge_betweenness'].mean():.3f}")
Edge result structure:
For edge queries, the DataFrame includes:
source, target: node identifiers
source_layer, target_layer: layer names (same for intralayer edges)
weight: edge weight (default 1.0 if not specified)
Computed attributes: edge_betweenness, etc.
Filter by edge type:
# Only intralayer edges (within a layer)
intralayer_edges = (
Q.edges()
.where(intralayer=True)
.execute(network)
)
print(f"Intralayer edges: {len(intralayer_edges)}")
# Only interlayer edges between specific layers
interlayer_edges = (
Q.edges()
.where(interlayer=("social", "work"))
.execute(network)
)
print(f"Edges between 'social' and 'work': {len(interlayer_edges)}")
Expected output:
High-weight edges in 'social' layer: 156
Sample edges:
source target source_layer target_layer weight edge_betweenness
0 alice bob social social 0.75 0.023400
1 bob charlie social social 0.80 0.034500
2 alice diana social social 0.92 0.019800
3 diana eve social social 0.65 0.028900
4 eve frank social social 0.88 0.041200
Mean edge weight: 0.723
Mean edge betweenness: 0.028
Smart Defaults and Error Messages
The DSL includes smart defaults that automatically compute commonly used centrality metrics when referenced but not explicitly computed. This feature makes queries more ergonomic while maintaining predictable behavior.
Auto-Computing Centrality Metrics
When you reference a centrality metric in operations like top_k(), order_by(), or other ranking operations, the DSL will automatically compute it if not already present:
from py3plex.dsl import Q, L
# The DSL auto-computes betweenness_centrality when needed
result = (
Q.nodes()
.from_layers(L["*"])
.per_layer()
.top_k(5, "betweenness_centrality") # Auto-computed here
.end_grouping()
.execute(network)
)
df = result.to_pandas()
# betweenness_centrality column is available even though
# we didn't explicitly call .compute("betweenness_centrality")
Supported centrality aliases:
degree, degree_centrality
betweenness, betweenness_centrality
closeness, closeness_centrality
eigenvector, eigenvector_centrality
pagerank
When auto-compute happens:
When the attribute is referenced in top_k()
When the attribute is used in order_by()
For both per-group (with grouping) and global operations
Example with multiple auto-computed metrics:
# Auto-compute degree for filtering and betweenness for ranking
result = (
Q.nodes()
.from_layers(L["social"])
.where(degree__gt=2) # degree auto-computed here
.order_by("betweenness_centrality", desc=True) # betweenness auto-computed here
.limit(10)
.execute(network)
)
Expected output:
node layer degree betweenness_centrality
0 alice social 8 0.143000
1 bob social 7 0.098000
2 carol social 6 0.067000
...
Controlling Autocompute Behavior
You can explicitly control whether metrics are automatically computed using the autocompute parameter:
# Disable autocompute - require explicit .compute() calls
result = (
Q.nodes(autocompute=False) # Autocompute disabled
.from_layers(L["social"])
.compute("degree") # Must explicitly compute
.where(degree__gt=5)
.execute(network)
)
# This would raise DslMissingMetricError because betweenness is not computed:
# Q.nodes(autocompute=False).order_by("betweenness_centrality").execute(net)
When to disable autocompute:
Performance-critical code: Avoid unexpected expensive computations
Explicit control: Make all metric computations visible in code
Debugging: Understand exactly which metrics are computed and when
Tracking computed metrics:
Query results include a computed_metrics attribute that tracks which metrics were computed during execution:
result = (
Q.nodes()
.from_layers(L["social"])
.compute("degree")
.order_by("betweenness_centrality") # Auto-computed
.execute(network)
)
# Check which metrics were computed
print(f"Computed metrics: {result.computed_metrics}")
# Output: Computed metrics: {'degree', 'betweenness_centrality'}
Use cases for computed_metrics:
Performance profiling: identify expensive operations
Query optimization: avoid redundant computations
Debugging: verify expected metrics were computed
Helpful Error Messages with Suggestions
When you reference an unknown attribute, the DSL provides did you mean? suggestions using fuzzy string matching:
# Typo in attribute name
try:
    result = (
        Q.nodes()
        .from_layers(L["*"])
        .per_layer()
        .top_k(5, "betweness_centrality")  # Typo: "betweness" instead of "betweenness"
        .end_grouping()
        .execute(network)
    )
except UnknownAttributeError as e:
    print(e)
Output:
Unknown attribute 'betweness_centrality'. Did you mean 'betweenness_centrality'?
Known attributes: betweenness, betweenness_centrality, closeness, closeness_centrality,
degree, degree_centrality, eigenvector, eigenvector_centrality, pagerank
The error includes:
The incorrect attribute name
A suggestion for the most similar correct name (using Levenshtein distance)
A list of all available attributes
Grouping Requirements and Clear Errors
Some operations require active grouping (via per_layer() or group_by()). The DSL raises GroupingError with clear guidance when these operations are used incorrectly:
from py3plex.dsl.errors import GroupingError
# This will raise GroupingError
try:
    result = (
        Q.nodes()
        .from_layers(L["*"])
        .coverage(mode="all")  # Error: no grouping active
        .execute(network)
    )
except GroupingError as e:
    print(e)
Output:
coverage() requires an active grouping (e.g. per_layer(), group_by('layer')).
No grouping is currently active.
Example:
Q.nodes().from_layers(L["*"])
.per_layer().top_k(5, "degree").end_grouping()
.coverage(mode="all")
Correct usage:
# With proper grouping
result = (
Q.nodes()
.from_layers(L["*"])
.per_layer() # Add grouping here
.top_k(5, "degree")
.end_grouping()
.coverage(mode="all") # Now works correctly
.execute(network)
)
When Smart Defaults DON’T Apply
Smart defaults are predictable and conservative. They only apply in specific scenarios:
Only for centrality metrics: Smart defaults work for recognized centrality metrics (degree, betweenness, etc.), not arbitrary attributes.
Explicit compute takes precedence: If you explicitly compute a metric, the DSL uses your computation and doesn’t auto-compute:
# Explicit compute - no auto-compute happens
result = (
    Q.nodes()
    .from_layers(L["*"])
    .compute("betweenness_centrality")  # Explicit
    .per_layer()
    .top_k(5, "betweenness_centrality")  # Uses explicit computation
    .end_grouping()
    .execute(network)
)
Edge attributes are not auto-computed: For edge queries, attributes like weight are read from edge data, not auto-computed:
# Edge weight is read from edge data, not computed
result = (
    Q.edges()
    .from_layers(L["*"])
    .per_layer()
    .top_k(5, "weight")  # Uses edge data['weight']
    .end_grouping()
    .execute(network)
)
Benefits of Smart Defaults
Ergonomics: Write less boilerplate for common patterns:
# Before smart defaults (verbose)
result = (
Q.nodes()
.from_layers(L["*"])
.compute("degree", "betweenness_centrality", "closeness_centrality")
.per_layer()
.top_k(5, "betweenness_centrality")
.end_grouping()
.execute(network)
)
# With smart defaults (concise)
result = (
Q.nodes()
.from_layers(L["*"])
.per_layer()
.top_k(5, "betweenness_centrality") # Auto-computes what's needed
.end_grouping()
.execute(network)
)
Teaching errors: When something goes wrong, you get actionable guidance instead of cryptic messages.
Predictability: Smart defaults only activate for well-known patterns. Your explicit operations always take precedence.
Temporal Queries
The DSL supports temporal filtering for networks with time-stamped edges or nodes. Four convenience methods provide intuitive temporal filtering: .at(t), .during(t0, t1), .before(t), and .after(t).
Prerequisites for temporal queries:
Edges or nodes must have temporal attributes:
Point-in-time: t attribute (e.g., {"t": 150.0})
Intervals: t_start and t_end attributes (e.g., {"t_start": 100.0, "t_end": 200.0})
Time values are typically numeric (timestamps) or ISO date strings
Temporal Semantics Reference
The following table summarizes temporal query semantics:
| Method | Description | Interval Semantics | Inclusivity |
|---|---|---|---|
| .at(t) | Snapshot at time t | Entities active at exactly t | Point (closed) |
| .during(t0, t1) | Range from t0 to t1 | Entities active during [t0, t1] | [t0, t1] (closed interval) |
| .before(t) | Before time t | Equivalent to .during(None, t) | (-∞, t] (closed at t) |
| .after(t) | After time t | Equivalent to .during(t, None) | [t, +∞) (closed at t) |
Detailed semantics:
at(t): Selects entities active at a specific moment
For point-in-time edges: includes edges where the edge's t equals t
For interval edges: includes edges where t is in [t_start, t_end]
during(t0, t1): Selects entities active during a time window
For point-in-time edges: includes edges where t is in [t0, t1] (closed interval)
For interval edges: includes edges whose interval overlaps [t0, t1]
None values: use t0=None for an open lower bound, t1=None for an open upper bound
before(t): Selects entities active before (and at) time t
Convenience method equivalent to .during(None, t)
Inclusive of the boundary: includes entities at exactly time t
after(t): Selects entities active after (and at) time t
Convenience method equivalent to .during(t, None)
Inclusive of the boundary: includes entities at exactly time t
Filter by Time (Snapshot)
Query the network state at a specific point in time:
# Nodes active at t=150.0
result = (
Q.nodes()
.at(150.0)
.compute("degree")
.execute(network)
)
df = result.to_pandas()
print(f"Nodes active at t=150: {len(df)}")
print(f"Average degree at t=150: {df['degree'].mean():.2f}")
print("\nTop nodes by degree at this snapshot:")
print(df.nlargest(5, 'degree')[['id', 'degree']])
Use case: Analyze network structure at specific moments (e.g., before and after an event, at regular intervals for time series).
Expected output:
Nodes active at t=150: 78
Average degree at t=150: 5.12
Top nodes by degree at this snapshot:
id degree
12 (eve, social) 14
5 (alice, social) 11
23 (grace, social) 10
8 (bob, social) 9
31 (henry, work) 9
Time Range
Query entities active during a time window:
# Nodes active during January 2024 (assuming numeric timestamps)
# For ISO dates, use strings: .during("2024-01-01", "2024-01-31")
result = (
Q.nodes()
.during(100.0, 200.0)
.compute("degree")
.execute(network)
)
df = result.to_pandas()
print(f"Nodes active during [100, 200]: {len(df)}")
# Compare to snapshot
snapshot_result = Q.nodes().at(150.0).execute(network)
print(f"Nodes at t=150 (snapshot): {len(snapshot_result)}")
print(f"Nodes during [100, 200] (range): {len(df)}")
print(f"Ratio: {len(df) / len(snapshot_result):.2f}x more nodes in range")
Open-ended ranges:
# From t=100 onwards (no upper limit)
result_after = Q.edges().during(100.0, None).execute(network)
# Up to t=200 (no lower limit)
result_before = Q.edges().during(None, 200.0).execute(network)
Use case: Study network evolution, identify persistent vs. transient connections, analyze activity bursts.
Expected output:
Nodes active during [100, 200]: 142
Nodes at t=150 (snapshot): 78
Ratio: 1.82x more nodes in range
Before and After (Convenience Methods)
The .before() and .after() methods provide intuitive alternatives for open-ended temporal queries:
# Get all edges before time 100 (inclusive)
early_edges = Q.edges().before(100.0).execute(network)
# Get all edges after time 200 (inclusive)
late_edges = Q.edges().after(200.0).execute(network)
# Common pattern: compare network before and after an event
event_time = 150.0
before_event = (
Q.nodes()
.before(event_time)
.compute("degree", "betweenness_centrality")
.execute(network)
)
after_event = (
Q.nodes()
.after(event_time)
.compute("degree", "betweenness_centrality")
.execute(network)
)
# Compare metrics
df_before = before_event.to_pandas()
df_after = after_event.to_pandas()
print(f"Average degree before event: {df_before['degree'].mean():.2f}")
print(f"Average degree after event: {df_after['degree'].mean():.2f}")
print(f"Network became {'denser' if df_after['degree'].mean() > df_before['degree'].mean() else 'sparser'}")
Expected output:
Average degree before event: 4.35
Average degree after event: 5.87
Network became denser
Temporal edges example:
# Edges active during a period
edges = (
Q.edges()
.during(100.0, 200.0)
.compute("edge_betweenness")
.execute(network)
)
df = edges.to_pandas()
print(f"Active edges during [100, 200]: {len(df)}")
print(f"Mean edge betweenness: {df['edge_betweenness'].mean():.4f}")
Note on implementation status:
Temporal queries are fully implemented for edge-level temporal data. Node-level temporal filtering depends on your network’s representation:
If nodes have explicit t attributes, .at(), .during(), .before(), and .after() work directly
If only edges are timestamped, node activity is inferred from edge presence
For most use cases, temporal edge queries are sufficient
See DSL Reference for complete temporal query syntax and examples with ISO date strings.
Common Patterns
This section presents end-to-end recipes for common multilayer network analysis tasks. These patterns are production-ready and can be adapted to your research questions.
Pattern: Find Influential Nodes
Identify nodes that are both well-connected (high degree) and structurally important (high betweenness centrality):
# High-degree nodes ranked by betweenness centrality
result = (
Q.nodes()
.compute("degree", "betweenness_centrality", "layer_count")
.where(degree__gt=10)
.order_by("-betweenness_centrality")
.limit(20)
.execute(network)
)
df = result.to_pandas()
print(f"Top 20 influential nodes (degree > 10):")
print(df[['id', 'degree', 'betweenness_centrality', 'layer_count']])
# Export for further analysis or publication
df.to_csv("influential_nodes.csv", index=False)
# Visualize
import matplotlib.pyplot as plt
plt.figure(figsize=(10, 6))
plt.scatter(df['degree'], df['betweenness_centrality'],
s=df['layer_count']*50, alpha=0.6)
plt.xlabel("Degree")
plt.ylabel("Betweenness Centrality")
plt.title("Influential Nodes (size = layer_count)")
plt.tight_layout()
plt.savefig("influential_nodes.png", dpi=300)
Why this pattern works:
Degree measures local connectivity (how many neighbors)
Betweenness centrality measures global importance (how often the node appears on shortest paths)
Nodes high in both metrics are influential bridges in the network
Expected output:
Top 20 influential nodes (degree > 10):
id degree betweenness_centrality layer_count
0 (eve, social) 15 0.301000 3
1 (alice, social) 12 0.245000 2
2 (grace, social) 11 0.221000 2
3 (bob, social) 12 0.198000 1
4 (diana, work) 14 0.187000 3
...
Pattern: Compare Layer Activity
Compute summary statistics for each layer to understand layer-specific dynamics:
layers = network.get_layers()
layer_stats = []
for layer in layers:
    result = (
        Q.nodes()
        .from_layers(L[layer])
        .compute("degree", "clustering")
        .execute(network)
    )
    df = result.to_pandas()
    layer_stats.append({
        'layer': layer,
        'num_nodes': len(df),
        'mean_degree': df['degree'].mean(),
        'max_degree': df['degree'].max(),
        'mean_clustering': df['clustering'].mean(),
    })
    print(f"{layer}: {len(df)} nodes, "
          f"avg degree={df['degree'].mean():.2f}, "
          f"avg clustering={df['clustering'].mean():.3f}")
# Create comparison DataFrame
import pandas as pd
comparison = pd.DataFrame(layer_stats)
print("\nLayer comparison:")
print(comparison)
# Visualize
import matplotlib.pyplot as plt
comparison.plot(x='layer', y=['mean_degree', 'mean_clustering'],
                kind='bar', figsize=(10, 5))
plt.ylabel("Value")
plt.title("Layer Activity Comparison")
plt.xticks(rotation=45)
plt.tight_layout()
plt.savefig("layer_comparison.png", dpi=300)
Use case: Understand how network structure varies across different contexts (e.g., online vs. offline interactions, different communication channels).
Expected output:
friends: 50 nodes, avg degree=4.20, avg clustering=0.623
work: 72 nodes, avg degree=3.15, avg clustering=0.512
social: 65 nodes, avg degree=5.01, avg clustering=0.587
family: 38 nodes, avg degree=6.84, avg clustering=0.701
Layer comparison:
layer num_nodes mean_degree max_degree mean_clustering
0 friends 50 4.20 15 0.623
1 work 72 3.15 12 0.512
2 social 65 5.01 18 0.587
3 family 38 6.84 21 0.701
Pattern: Export Subnetwork
Extract a subnetwork based on query criteria for focused analysis or visualization:
# Extract high-activity multilayer nodes
active_nodes = (
Q.nodes()
.where(layer_count__gt=2)
.compute("degree", "betweenness_centrality")
.execute(network)
)
print(f"Selected {len(active_nodes)} multilayer nodes")
# Create subnetwork containing only these nodes
subnetwork = network.subgraph(active_nodes.keys())
print(f"Subnetwork: {subnetwork.number_of_nodes()} nodes, "
f"{subnetwork.number_of_edges()} edges")
# Analyze subnetwork
df = active_nodes.to_pandas()
print(f"\nSubnetwork mean degree: {df['degree'].mean():.2f}")
print(f"Subnetwork mean betweenness: {df['betweenness_centrality'].mean():.4f}")
# Export for visualization or further analysis
from py3plex.visualization import draw_multilayer_default
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(12, 10))
draw_multilayer_default(subnetwork, ax=ax, display=False)
plt.savefig("subnetwork_viz.png", dpi=300)
plt.close()
# Or export in various formats
subnetwork.save_network("subnetwork.edgelist", output_type="edgelist")
What layer_count__gt=2 means:
Selects nodes appearing in more than 2 layers
These are “connector” nodes that participate in multiple contexts
Useful criterion for studying nodes that bridge different social spheres
Alternative criteria:
# High betweenness nodes
influential = Q.nodes().compute("betweenness_centrality").where(
betweenness_centrality__gt=0.1
).execute(network)
# Nodes in specific community
community_nodes = Q.nodes().compute("communities").where(
communities__eq=3
).execute(network)
Expected output:
Selected 34 multilayer nodes
Subnetwork: 34 nodes, 127 edges
Subnetwork mean degree: 7.47
Subnetwork mean betweenness: 0.0892
Workflow integration:
This pattern is often combined with community detection, dynamics simulation, or centrality analysis:
# 1. Select subnetwork
core_nodes = Q.nodes().where(layer_count__gte=2, degree__gt=5).execute(network)
subnetwork = network.subgraph(core_nodes.keys())
# 2. Run community detection on subnetwork
from py3plex.algorithms.community_detection.community_wrapper import louvain_communities
communities = louvain_communities(subnetwork)
# 3. Analyze communities
print(f"Found {len(set(communities.values()))} communities")
# 4. Visualize or export
from py3plex.visualization import draw_multilayer_default
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(12, 10))
# Note: communities dict can be used for node coloring if the visualization function supports it
draw_multilayer_default(subnetwork, ax=ax, display=False)
plt.savefig("core_network_communities.png", dpi=300)
plt.close()
Pattern: Per-Layer Top-K with Coverage (Multi-Layer Hub Detection)
Find nodes that are top-k hubs (by any centrality metric) across all layers. This pattern is essential for identifying nodes that maintain high influence in the entire multilayer structure, not just in isolated layers.
The Problem:
Traditional approaches require manual loops over layers:
# Old approach: manual iteration
layer_top = {}
for layer in network.layers:
    res = (
        Q.nodes()
        .from_layers(L[str(layer)])
        .where(degree__gt=1)
        .compute("betweenness_centrality")
        .order_by("-betweenness_centrality")
        .limit(5)
        .execute(network)
    )
    layer_top[layer] = set(res.to_pandas()["id"])
# Find intersection
multi_hubs = set.intersection(*layer_top.values())
The Solution: Grouping and Coverage API
The new DSL supports per-layer operations in a single query:
from py3plex.core import random_generators
from py3plex.dsl import Q, L
# Generate example network
net = random_generators.random_multilayer_ER(n=200, l=3, p=0.05, directed=False)
# Find nodes that are top-5 betweenness hubs in ALL layers (single query!)
multi_hubs = (
Q.nodes()
.from_layers(L["*"]) # wildcard: all layers
.where(degree__gt=1)
.compute("degree", "betweenness_centrality")
.per_layer() # group by layer
.top_k(5, "betweenness_centrality") # top 5 per layer
.end_grouping()
.coverage(mode="all") # nodes in top-5 in ALL layers
.execute(net)
)
df = multi_hubs.to_pandas()
print(f"Multi-layer hubs (in top-5 of ALL layers): {set(df['id'])}")
print(f"Count: {len(df['id'].unique())}")
print(f"\nDetailed results:")
print(df[['id', 'layer', 'degree', 'betweenness_centrality']].to_string())
Expected output:
Multi-layer hubs (in top-5 of ALL layers): {23, 45, 67}
Count: 3
Detailed results:
id layer degree betweenness_centrality
0 23 0 12 0.2456
1 23 1 14 0.2891
2 23 2 11 0.2234
3 45 0 13 0.2567
4 45 1 11 0.2123
5 45 2 15 0.3012
6 67 0 10 0.2001
7 67 1 12 0.2345
8 67 2 13 0.2678
Coverage Modes:
The coverage() method supports multiple modes for cross-layer analysis:
# Mode 1: "all" - intersection (nodes in top-k of ALL layers)
all_layers_hubs = (
Q.nodes()
.from_layers(L["*"])
.compute("degree")
.per_layer()
.top_k(10, "degree")
.end_grouping()
.coverage(mode="all")
.execute(net)
)
# Mode 2: "any" - union (nodes in top-k of AT LEAST ONE layer)
any_layer_hubs = (
Q.nodes()
.from_layers(L["*"])
.compute("degree")
.per_layer()
.top_k(10, "degree")
.end_grouping()
.coverage(mode="any")
.execute(net)
)
# Mode 3: "at_least" - nodes in top-k of at least K layers
two_layer_hubs = (
Q.nodes()
.from_layers(L["*"])
.compute("betweenness_centrality")
.per_layer()
.top_k(5, "betweenness_centrality")
.end_grouping()
.coverage(mode="at_least", k=2) # In at least 2 layers
.execute(net)
)
# Mode 4: "exact" - nodes in top-k of exactly K layers (layer-specific hubs)
single_layer_specialists = (
Q.nodes()
.from_layers(L["*"])
.compute("degree")
.per_layer()
.top_k(10, "degree")
.end_grouping()
.coverage(mode="exact", k=1) # Exactly 1 layer
.execute(net)
)
print(f"Hubs in ALL layers: {len(all_layers_hubs.to_pandas()['id'].unique())}")
print(f"Hubs in ANY layer: {len(any_layer_hubs.to_pandas()['id'].unique())}")
print(f"Hubs in ≥2 layers: {len(two_layer_hubs.to_pandas()['id'].unique())}")
print(f"Layer specialists (exactly 1): {len(single_layer_specialists.to_pandas()['id'].unique())}")
Expected output:
Hubs in ALL layers: 3
Hubs in ANY layer: 27
Hubs in ≥2 layers: 12
Layer specialists (exactly 1): 15
Wildcard Layer Selection:
The L["*"] wildcard automatically expands to all layers in the network:
# All layers
Q.nodes().from_layers(L["*"])
# All layers except one
Q.nodes().from_layers(L["*"] - L["bots"])
# All layers intersected with a specific one (same as selecting that layer)
Q.nodes().from_layers(L["*"] & L["social"])
Use Cases:
Identify persistent influencers: Nodes that maintain high centrality across all contexts (layers)
Find layer specialists: Nodes that are important in only one layer (mode="exact", k=1)
Detect multi-context bridges: Nodes in top-k in at least 2 layers connect different contexts
Community structure analysis: Compare mode="all" vs mode="any" to understand layer cohesion
Why This Pattern Matters:
In real-world multilayer networks (social media, collaboration networks, biological systems), understanding cross-layer vs. layer-specific importance is crucial:
Email + Phone + Chat network: Who are the omnipresent communicators vs. email-only specialists?
Author collaboration network: Who publishes top papers in multiple fields vs. specialists in one domain?
Transportation network: Which locations are hubs in all modes (bus, train, bike) vs. single-mode hubs?
Performance Note:
The per-layer computation is optimized: measures are computed on the selected nodes after layer filtering, and grouping operations leverage efficient dictionaries. For large networks (>100K nodes), consider filtering with where() before computing expensive metrics like betweenness centrality.
DSL Result Interoperability
DSL query results integrate seamlessly with pandas for data transformation workflows. While QueryResult doesn’t implement pipeline verbs directly, it provides a clean .to_pandas() export that enables the same workflow patterns:
Start with a DSL query to filter and compute metrics
Export to pandas with .to_pandas()
Use pandas operations for additional transformations
Leverage the full pandas ecosystem for analysis and visualization
QueryResult to pandas Workflow
The recommended pattern for combining DSL queries with data transformations:
from py3plex.dsl import Q, L
from py3plex.core import multinet
# Create a sample network
network = multinet.multi_layer_network()
network.add_edges([
['A', 'layer1', 'B', 'layer1', 1],
['B', 'layer1', 'C', 'layer1', 1],
['A', 'layer2', 'C', 'layer2', 1],
], input_type="list")
# Start with DSL query
result = (
Q.nodes()
.from_layers(L["*"])
.compute("degree", "betweenness_centrality")
.execute(network)
)
# Export to pandas for flexible transformations
df = result.to_pandas()
# Continue with pandas operations
df = df[df["degree"] > 5] # Filter
df['influence_score'] = ( # Mutate
df["degree"] * df["betweenness_centrality"]
)
df = df.sort_values('influence_score', ascending=False) # Arrange
print(df.head(10))
What happens here:
DSL phase: Q.nodes()...execute() filters nodes and computes centrality metrics
Export: .to_pandas() materializes the result as a DataFrame
pandas phase: Standard pandas operations for transformation and analysis
Pandas operations equivalent to pipeline verbs:
Filter rows: df[df["degree"] > 5]
Add columns: df['new_col'] = ...
Sort: df.sort_values('col', ascending=False)
Select columns: df[['col1', 'col2']]
Group and aggregate: df.groupby('col').agg(...)
Verb Mapping Table
The following table shows how concepts map across the three interfaces:
| Concept | String DSL | Builder DSL | pandas |
|---|---|---|---|
| Filter rows | WHERE degree > 5 | .where(degree__gt=5) | df[df["degree"] > 5] |
| Select columns | COMPUTE degree, ... | .compute("degree", ...) | df[['col1', 'col2']] |
| Sort/Order | ORDER BY ... | .order_by("-degree") | df.sort_values('col', ascending=False) |
| Group by field | — | .per_layer() / .group_by('layer') | df.groupby('col') |
| Add column | (not available) | (use pandas after export) | df['new_col'] = ... |
| Aggregate | (not available) | (use pandas after export) | df.groupby('col').agg(...) |
| Limit results | LIMIT 10 | .limit(10) | df.head(10) |
Design rationale:
DSL: Declarative, optimized for graph queries (layer algebra, centrality, grouping)
pandas: Procedural, flexible for data transformations (arbitrary computations, reshaping)
Workflow: Use DSL for graph-specific operations, export to pandas for data munging
Example: Combined Workflow
A realistic workflow combining both DSL and pandas operations:
from py3plex.dsl import Q, L
# Scenario: Find influential nodes in social network, normalize scores,
# rank within communities, export for visualization
# DSL: Query and compute graph metrics
result = (
Q.nodes()
.from_layers(L["social"])
.where(degree__gt=3)
.compute("degree", "betweenness_centrality", "clustering")
.execute(network)
)
# Export to pandas for transformations
df = result.to_pandas()
# pandas: Transform and enhance data
max_betweenness = df['betweenness_centrality'].max()
# Normalize centrality to [0, 1]
df['norm_betweenness'] = (
df['betweenness_centrality'] / max_betweenness
if max_betweenness > 0 else 0
)
# Composite influence score
df['influence'] = (
0.5 * df['degree'] +
0.3 * df['norm_betweenness'] +
0.2 * (1 - df['clustering'])
)
# Group by community and compute statistics
# (the 'communities' column comes from the compute() call above)
community_stats = df.groupby('communities').agg({
    'influence': ['count', 'mean', 'max']
}).round(2)
# Sort communities by average influence
community_stats = community_stats.sort_values(
('influence', 'mean'),
ascending=False
)
print(community_stats)
Expected output:
influence
count mean max
communities
5 23 0.72 0.89
2 31 0.68 0.85
8 19 0.61 0.79
1 28 0.58 0.74
...
Why this matters:
Single pipeline: No need to export intermediate results to disk or juggle multiple DataFrames
Flexibility: DSL for graph operations, pandas for everything else
Performance: DSL computes centrality on the multilayer graph once, pandas transforms in-memory
Ecosystem: Full pandas ecosystem available (plotting, statistics, export formats)
When to use each:
DSL alone: Simple queries, need graph-specific operations (centrality, grouping, coverage)
pandas alone: Non-graph data, pure data transformations
Combined (DSL → pandas): Complex analytical workflows, need both graph metrics and custom computations
See How to Build Analysis Pipelines with Dplyr-style Operations for the dplyr-style pipeline API (nodes(), edges() functions) which provides an alternative approach using chainable operations directly on networks.
Next Steps
Now that you understand the DSL, explore these related resources:
DSL Reference (DSL Reference): Complete grammar, all operators, full list of built-in measures, and advanced features (EXPLAIN queries, parameter binding, custom operators)
Dplyr-Style Pipelines (How to Build Analysis Pipelines with Dplyr-style Operations): Combine DSL queries with pipeline operations for more complex data transformation workflows. The pipeline API (nodes(), mutate(), arrange()) complements the DSL when you need procedural transformations.
Community Detection (How to Run Community Detection on Multilayer Networks): Use DSL queries to select nodes, then apply community detection algorithms. Pattern: query → detect communities → analyze community structure.
Network Dynamics (How to Simulate Multilayer Dynamics): Run dynamics simulations on DSL-selected subnetworks. Pattern: query → extract subnetwork → simulate → analyze outcomes.
Linting and Validation (DSL Reference): The DSL includes a linting subsystem (py3plex dsl-lint) that checks queries for errors and performance issues and suggests optimizations. Use it to validate complex queries.
Examples Repository (Examples & Recipes): Full scripts showing the DSL in context, including data loading, query composition, analysis, and visualization.
Key Takeaways:
Use the builder API (Q, L) for production code—it's type-safe, refactorable, and IDE-friendly.
Filter early: Add where() clauses before compute() for better performance on large networks.
Embrace pandas: Use .to_pandas() for result analysis—it integrates seamlessly with the scientific Python stack.
Layer algebra is powerful: L["a"] + L["b"] (union) and L["a"] & L["b"] (intersection) enable sophisticated multilayer queries.
Temporal queries require timestamped edges/nodes but unlock time-series network analysis.
Community and Support:
Report issues or request features: https://github.com/SkBlaz/py3plex/issues
Example notebooks: https://github.com/SkBlaz/py3plex/tree/main/examples
py3plex documentation: https://skblaz.github.io/py3plex/
Further Reading: The Py3plex Book
For a deeper theoretical and practical treatment of the DSL and multilayer network concepts, see the Py3plex Book:
Chapter 8 — Introduction to the DSL: Motivations, design principles, and comparison with alternatives
Chapter 9 — Builder API Deep Dive: Complete reference with advanced patterns
Chapter 10 — Advanced Queries & Workflows: Complex real-world query examples
The book is available as:
* PDF in the repository: docs/py3plex_book.pdf
* Online HTML (if built): docs/book/
The book provides:
* Formal definitions of multilayer network operations
* Detailed algorithmic complexity analysis
* Extensive case studies with real datasets
* Performance benchmarking and optimization strategies