Query Zoo: DSL Gallery for Multilayer Analysis
==============================================

.. meta::
   :description: A gallery of example queries showcasing the py3plex DSL for multilayer network analysis
   :keywords: DSL, multilayer networks, query examples, graph analysis

**The Query Zoo is a curated gallery of DSL queries that demonstrate the
expressiveness and power of py3plex for multilayer network analysis.**

Each example:

* Solves a real multilayer analysis problem
* Uses the DSL end-to-end with idiomatic patterns
* Produces concrete, reproducible outputs
* Is fully tested and documented

.. admonition:: Why a Query Zoo?
   :class: note

   The DSL is most powerful when you see it in action on realistic problems.
   This gallery shows you **how** to think about multilayer queries, not just
   **what** the syntax is. Use these examples as recipes and starting points
   for your own analyses.

.. contents::
   :local:
   :depth: 2

Overview
--------

The Query Zoo is organized around common multilayer analysis tasks:

1. **Basic Multilayer Exploration** — Understand layer statistics and structure
2. **Cross-Layer Hubs** — Find nodes that are important across multiple layers
3. **Layer Similarity** — Measure structural alignment between layers
4. **Community Structure** — Detect and analyze multilayer communities
5. **Multiplex PageRank** — Compute multilayer-aware centrality
6. **Robustness Analysis** — Assess network resilience to layer failures
7. **Advanced Centrality Comparison** — Identify versatile vs specialized hubs
8. **Edge Grouping and Coverage** — Analyze edges across layer pairs with top-k and coverage

All examples use small, reproducible multilayer networks from the
``examples/dsl_query_zoo/datasets.py`` module.

.. tip::

   **Running the Examples**

   All query functions are available in ``examples/dsl_query_zoo/queries.py``.
   To run all queries and generate outputs::

      python examples/dsl_query_zoo/run_all.py

   Test the queries with::

      pytest tests/test_dsl_query_zoo.py

----

1. Basic Multilayer Exploration
-------------------------------

**Problem:** You've loaded a multilayer network and want to quickly understand
its structure. Which layers are densest? How many nodes and edges does each
layer have?

**Solution:** Compute basic statistics per layer using the DSL.

Query Code
~~~~~~~~~~

.. literalinclude:: ../../examples/dsl_query_zoo/queries.py
   :pyobject: query_basic_exploration
   :language: python

Why It's Interesting
~~~~~~~~~~~~~~~~~~~~

* **First step in any analysis** — Before diving into complex queries, understand your data
* **Reveals layer diversity** — Different layers often have vastly different structures
* **Identifies sparse vs dense layers** — Helps decide which layers need special handling

Example Output
~~~~~~~~~~~~~~

Running on the ``social_work_network`` (12 people across social/work/family layers):

.. csv-table::
   :header: "Layer", "Nodes", "Edges", "Avg Degree"
   :widths: 25, 20, 20, 20

   "social", 12, 11, 1.83
   "work", 11, 9, 1.64
   "family", 11, 6, 1.09

.. image:: ../_static/query_zoo/basic_exploration_plot.png
   :alt: Layer Statistics
   :width: 100%
   :align: center

**Interpretation:** The social layer is densest (highest average degree),
while family is sparsest. All layers have similar numbers of nodes, indicating
good cross-layer coverage.

DSL Concepts Demonstrated
~~~~~~~~~~~~~~~~~~~~~~~~~

* ``Q.nodes().from_layers(L[name])`` — Select nodes from a specific layer
* ``.compute("degree")`` — Add computed attributes to results
* ``.execute(network)`` — Run the query and get results
* ``.to_pandas()`` — Convert to DataFrame for analysis
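Putting these pieces together, the per-layer statistics pattern looks roughly
like the sketch below. This is a sketch rather than the tested implementation
in ``queries.py``: the ``py3plex.dsl`` import path and the ``degree`` column
name in the resulting DataFrame are assumptions, so check the DSL reference
for the exact entry points.

.. code-block:: python

   # Sketch only: assumes Q and L are importable from py3plex.dsl and that
   # .to_pandas() exposes the computed "degree" column under that name.
   from py3plex.dsl import Q, L


   def layer_statistics_sketch(network, layer_names):
       """Collect node count and average degree for each named layer."""
       rows = []
       for name in layer_names:
           df = (
               Q.nodes()
               .from_layers(L[name])   # restrict the query to one layer
               .compute("degree")      # attach per-node degree
               .execute(network)
               .to_pandas()
           )
           rows.append({
               "layer": name,
               "nodes": len(df),
               "avg_degree": float(df["degree"].mean()) if len(df) else 0.0,
           })
       return rows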
----

2. Cross-Layer Hubs
-------------------

**Problem:** Which nodes are consistently important across *multiple* layers?
These "super hubs" are critical because they bridge different contexts.

**Solution:** Find top-k central nodes per layer, then identify which nodes
appear in multiple layers' top lists.

Query Code
~~~~~~~~~~

.. literalinclude:: ../../examples/dsl_query_zoo/queries.py
   :pyobject: query_cross_layer_hubs
   :language: python

Why It's Interesting
~~~~~~~~~~~~~~~~~~~~

* **Reveals cross-context influence** — Nodes central in one layer might be peripheral in another
* **Identifies key connectors** — Nodes that appear in multiple layers' top-k are especially important
* **Robust hub detection** — More reliable than single-layer centrality

Example Output
~~~~~~~~~~~~~~

Top cross-layer hubs (k=5):

.. csv-table::
   :header: "Node", "Layer", "Degree", "Betweenness", "Layer Count"
   :widths: 20, 20, 15, 20, 15

   "Bob", "social", 3, 0.0273, 3
   "Bob", "work", 2, 0.0, 3
   "Bob", "family", 1, 0.0, 3
   "Alice", "work", 3, 0.0889, 2
   "Charlie", "social", 3, 0.0273, 2

**Interpretation:** Bob appears as a top-5 hub in *all three layers*
(layer_count=3), making him the most versatile connector. Alice and Charlie
are hubs in two layers each.

DSL Concepts Demonstrated
~~~~~~~~~~~~~~~~~~~~~~~~~

* ``.compute("betweenness_centrality", "degree")`` — Compute multiple metrics at once
* ``.order_by("-betweenness_centrality")`` — Sort descending (``-`` prefix)
* ``.limit(k)`` — Take top-k results
* Per-layer iteration and aggregation across layers

----

3. Layer Similarity
-------------------

**Problem:** How similar are different layers structurally? Do they serve
redundant or complementary roles?

**Solution:** Compute degree distributions per layer and measure pairwise
correlations.

Query Code
~~~~~~~~~~

.. literalinclude:: ../../examples/dsl_query_zoo/queries.py
   :pyobject: query_layer_similarity
   :language: python

Why It's Interesting
~~~~~~~~~~~~~~~~~~~~

* **Detects redundancy** — High correlation suggests layers capture similar structure
* **Guides simplification** — Nearly identical layers might be merged
* **Reveals specialization** — Low/negative correlation shows layers serve different roles

Example Output
~~~~~~~~~~~~~~

Correlation matrix for ``social_work_network``:

.. image:: ../_static/query_zoo/layer_similarity_heatmap.png
   :alt: Layer Similarity Heatmap
   :width: 80%
   :align: center

.. csv-table::
   :header: "", "social", "work", "family"
   :widths: 25, 25, 25, 25

   "social", "1.000", "0.159", "0.000"
   "work", "0.159", "1.000", "-0.267"
   "family", "0.000", "-0.267", "1.000"

**Interpretation:** Social and work layers have weak positive correlation
(0.159), suggesting some structural overlap. Family and work are *negatively*
correlated (-0.267), indicating they capture different connectivity patterns.

DSL Concepts Demonstrated
~~~~~~~~~~~~~~~~~~~~~~~~~

* Layer-by-layer degree computation
* Aggregating results across layers for meta-analysis
* Using computed attributes for layer-level comparisons
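The aggregation behind the correlation matrix can be sketched as follows. This
is illustrative rather than the tested ``query_layer_similarity``
implementation: it assumes the same ``Q``/``L`` entry points as above plus
``node`` and ``degree`` column names in the DataFrame returned by
``.to_pandas()``.

.. code-block:: python

   # Sketch only: build a node x layer degree table, then correlate its columns.
   import pandas as pd

   from py3plex.dsl import Q, L  # assumed import path


   def layer_degree_correlation_sketch(network, layer_names):
       """Pairwise Pearson correlation of per-layer degree sequences."""
       per_layer = {}
       for name in layer_names:
           df = (
               Q.nodes()
               .from_layers(L[name])
               .compute("degree")
               .execute(network)
               .to_pandas()
           )
           per_layer[name] = df.set_index("node")["degree"]

       # Align on the union of nodes; nodes absent from a layer get degree 0.
       degree_table = pd.DataFrame(per_layer).fillna(0)
       return degree_table.corr(method="pearson")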
----

4. Community Structure
----------------------

**Problem:** What communities exist in the multilayer network? How do they
manifest across layers?

**Solution:** Detect communities using multilayer community detection, then
analyze their distribution across layers.

Query Code
~~~~~~~~~~

.. literalinclude:: ../../examples/dsl_query_zoo/queries.py
   :pyobject: query_community_structure
   :language: python

Why It's Interesting
~~~~~~~~~~~~~~~~~~~~

* **Mesoscale structure** — Communities reveal organizational patterns
* **Cross-layer community tracking** — See if communities are layer-specific or global
* **Dominant layers** — Identify which layer best represents each community

Example Output
~~~~~~~~~~~~~~

Running on ``communication_network`` (email/chat/phone layers):

.. csv-table::
   :header: "Community", "Layer", "Size", "Avg Degree", "Dominant Layer"
   :widths: 15, 15, 15, 20, 25

   0, "email", 10, 1.8, "email"
   1, "chat", 6, 2.17, "chat"
   2, "chat", 3, 1.67, "chat"
   3, "phone", 7, 1.71, "phone"

**Interpretation:** Community 0 is email-dominated (10 nodes), while
communities 1 and 2 are chat-specific. Community 3 appears primarily in phone
communication.

DSL Concepts Demonstrated
~~~~~~~~~~~~~~~~~~~~~~~~~

* ``Q.nodes().from_layers(L["*"])`` — Select from all layers
* ``.compute("communities")`` — Built-in community detection
* Grouping by ``(community_id, layer)`` for analysis
* Identifying dominant layers via aggregation

----

5. Multiplex PageRank
---------------------

**Problem:** Standard PageRank treats each layer independently. How do we
compute importance considering the full multiplex structure?

**Solution:** Compute PageRank per layer, then aggregate across layers.
(Note: This is a simplified version; true multiplex PageRank uses
supra-adjacency matrices.)

Query Code
~~~~~~~~~~

.. literalinclude:: ../../examples/dsl_query_zoo/queries.py
   :pyobject: query_multiplex_pagerank
   :language: python

Why It's Interesting
~~~~~~~~~~~~~~~~~~~~

* **Multilayer-aware centrality** — Accounts for importance across all layers
* **More robust than single-layer** — Averages out layer-specific biases
* **Essential for multiplex influence** — Key for viral marketing, information diffusion

Example Output
~~~~~~~~~~~~~~

Top nodes by multiplex PageRank in ``transport_network``:

.. csv-table::
   :header: "Node", "Multiplex PR", "Total Degree", "Bus PR", "Metro PR", "Walking PR"
   :widths: 25, 15, 15, 15, 15, 15

   "ShoppingMall", 0.1811, 6, 0.1362, 0.1909, 0.2164
   "Park", 0.1806, 4, 0.1449, 0.0, 0.2164
   "CentralStation", 0.1683, 6, 0.1971, 0.1909, 0.117
   "BusinessDistrict", 0.1484, 4, 0.079, 0.1994, 0.1667

**Interpretation:** ShoppingMall has the highest multiplex PageRank (0.1811)
because it's central across all three transport modes. Park has high walking
PageRank but zero metro PageRank, reflecting its limited accessibility.

DSL Concepts Demonstrated
~~~~~~~~~~~~~~~~~~~~~~~~~

* ``.compute("pagerank")`` — Built-in PageRank computation
* Per-layer iteration with result aggregation
* Pivot tables for layer-wise breakdowns
* Combining degree and PageRank for richer analysis
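The simplified aggregation described above (per-layer PageRank averaged across
layers, not a supra-adjacency computation) can be sketched like this. The
``Q``/``L`` entry points and the ``node``/``pagerank`` column names are
assumptions, not the tested ``query_multiplex_pagerank`` code.

.. code-block:: python

   # Sketch only: average per-layer PageRank scores into one multiplex score.
   import pandas as pd

   from py3plex.dsl import Q, L  # assumed import path


   def multiplex_pagerank_sketch(network, layer_names):
       """Average per-layer PageRank; layers a node is absent from contribute 0."""
       scores = {}
       for name in layer_names:
           df = (
               Q.nodes()
               .from_layers(L[name])
               .compute("pagerank")
               .execute(network)
               .to_pandas()
           )
           scores[name] = df.set_index("node")["pagerank"]

       table = pd.DataFrame(scores).fillna(0)
       table["multiplex_pr"] = table.mean(axis=1)
       return table.sort_values("multiplex_pr", ascending=False)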
----

6. Robustness Analysis
----------------------

**Problem:** How robust is the network to layer failures? What happens if one
layer goes offline?

**Solution:** Simulate removing each layer and measure connectivity loss.

Query Code
~~~~~~~~~~

.. literalinclude:: ../../examples/dsl_query_zoo/queries.py
   :pyobject: query_robustness_analysis
   :language: python

Why It's Interesting
~~~~~~~~~~~~~~~~~~~~

* **Critical infrastructure identification** — Reveals which layers are essential
* **Redundancy assessment** — High robustness indicates good backup coverage
* **Failure planning** — Informs which layers need extra protection

Example Output
~~~~~~~~~~~~~~

Robustness of ``transport_network``:

.. image:: ../_static/query_zoo/robustness_analysis_plot.png
   :alt: Robustness Analysis
   :width: 90%
   :align: center

.. csv-table::
   :header: "Scenario", "Nodes", "Avg Degree", "Total Edges", "Connectivity Loss (%)"
   :widths: 30, 15, 15, 15, 20

   "baseline (all layers)", 14, 2.14, 15, 0.0
   "without bus", 11, 1.45, 8, 46.67
   "without metro", 11, 1.82, 10, 33.33
   "without walking", 14, 2.0, 14, 6.67

**Interpretation:** Removing the bus layer causes 46.67% connectivity loss —
it's the most critical layer. Walking is least critical (only 6.67% loss),
indicating good redundancy from the other transport modes.

DSL Concepts Demonstrated
~~~~~~~~~~~~~~~~~~~~~~~~~

* Layer algebra: ``L["layer1"] + L["layer2"]`` — Combine layers
* ``Q.nodes().from_layers(layer_expr)`` — Query with dynamic layer selections
* Baseline vs scenario comparison
* Measuring connectivity metrics before/after perturbations

----

7. Advanced Centrality Comparison
---------------------------------

**Problem:** Different centralities capture different notions of importance.
Which nodes are "versatile hubs" (high in many centralities) vs "specialized
hubs" (high in only one)?

**Solution:** Compute multiple centralities, normalize them, and classify
nodes by how many centralities place them in the top tier.

Query Code
~~~~~~~~~~

.. literalinclude:: ../../examples/dsl_query_zoo/queries.py
   :pyobject: query_advanced_centrality_comparison
   :language: python

Why It's Interesting
~~~~~~~~~~~~~~~~~~~~

* **Centrality is multifaceted** — Degree ≠ betweenness ≠ closeness ≠ PageRank
* **Versatile hubs are robust** — High scores across many metrics indicate genuine importance
* **Specialized hubs reveal roles** — A high score in a single metric reveals a specific structural position

Example Output
~~~~~~~~~~~~~~

Running on ``communication_network`` (email layer):

.. csv-table::
   :header: "Node", "Degree", "Betweenness", "Closeness", "PageRank", "Versatility", "Type"
   :widths: 20, 12, 15, 15, 13, 12, 18

   "Manager", 9, 1.0, 1.0, 0.4676, 4, "versatile_hub"
   "Dev1", 1, 0.0, 0.5294, 0.0592, 0, "peripheral"
   "Dev2", 1, 0.0, 0.5294, 0.0592, 0, "peripheral"

**Interpretation:** Manager is a **versatile hub** (top 30% in all 4
centralities). All other nodes are peripheral in this star-topology email
network.

DSL Concepts Demonstrated
~~~~~~~~~~~~~~~~~~~~~~~~~

* ``.compute("degree", "betweenness_centrality", "closeness_centrality", "pagerank")`` — Compute multiple centralities
* Normalizing centralities for comparison
* Derived metrics (versatility score)
* Classification based on computed attributes
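The classification step can be sketched on top of a DataFrame of computed
centralities, for example one obtained via ``.compute(...)`` and
``.to_pandas()``. The top-30% cutoff mirrors the interpretation above; the
column names, tie handling, and exact thresholds in
``query_advanced_centrality_comparison`` may differ.

.. code-block:: python

   # Sketch only: classify nodes by how many centrality metrics place them in
   # the top tier (here: the top 30% by percentile rank). Assumes `df` holds
   # one row per node with the listed centrality columns.
   import pandas as pd

   METRICS = ["degree", "betweenness_centrality", "closeness_centrality", "pagerank"]


   def classify_hubs_sketch(df: pd.DataFrame, top_fraction: float = 0.3) -> pd.DataFrame:
       df = df.copy()
       in_top_tier = pd.DataFrame(index=df.index)
       for metric in METRICS:
           # Percentile rank handles ties gracefully (tied values share a rank).
           in_top_tier[metric] = df[metric].rank(pct=True) >= 1 - top_fraction

       df["versatility"] = in_top_tier.sum(axis=1)
       df["type"] = "peripheral"
       df.loc[df["versatility"] >= 1, "type"] = "specialized_hub"
       df.loc[df["versatility"] == len(METRICS), "type"] = "versatile_hub"
       return df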
----

8. Edge Grouping and Coverage
-----------------------------

**Problem:** You want to analyze which edges (connections) are important
within and between layers. Which edges consistently appear in the top-k
across different layer-pair contexts?

**Solution:** Use the new ``.per_layer_pair()`` method to group edges by
(src_layer, dst_layer) pairs, then apply top-k and coverage filtering.

Query Code
~~~~~~~~~~

.. literalinclude:: ../../examples/dsl_query_zoo/queries.py
   :pyobject: query_edge_grouping_and_coverage
   :language: python

Why It's Interesting
~~~~~~~~~~~~~~~~~~~~

* **Layer-pair-aware analysis** — Different layer pairs may have very different edge patterns
* **Universal edges** — Edges important across multiple contexts are more robust
* **Cross-layer dynamics** — Reveals how connections vary between intra-layer and inter-layer contexts
* **Edge-centric view** — Complements node-centric analyses like hub detection

Example Output
~~~~~~~~~~~~~~

Running on ``social_work_network`` with k=3:

**Edges Grouped by Layer Pair (top 3 per pair):**

.. csv-table::
   :header: "Source", "Target", "Source Layer", "Target Layer"
   :widths: 25, 25, 25, 25

   "Alice", "Bob", "social", "social"
   "Alice", "Carol", "social", "social"
   "Bob", "Carol", "social", "social"
   "Alice", "Bob", "work", "work"
   "Alice", "Carol", "work", "work"
   "Bob", "Carol", "work", "work"

**Group Summary:**

.. csv-table::
   :header: "Source Layer", "Target Layer", "# Edges"
   :widths: 35, 35, 25

   "social", "social", 3
   "work", "work", 3
   "family", "family", 3
   "social", "work", 1

**Interpretation:** The query reveals edge distribution across layer pairs.
Each pair (e.g., social-social, work-work) contains up to k=3 edges.
Inter-layer pairs (social-work) typically have fewer connections, showing the
separation between layers. The family layer has sparser connectivity overall.

DSL Concepts Demonstrated
~~~~~~~~~~~~~~~~~~~~~~~~~

* ``.per_layer_pair()`` — Group edges by (src_layer, dst_layer) pairs
* ``.top_k(k, "weight")`` — Select top-k items per group
* ``.coverage(mode="at_least", k=2)`` — Cross-group filtering
* ``.group_summary()`` — Get aggregate statistics per group
* Edge-specific grouping metadata in ``QueryResult.meta["grouping"]``

.. tip::

   **New in DSL v2**

   Edge grouping and coverage are new features that parallel the existing
   node grouping capabilities. Use ``.per_layer_pair()`` for edges and
   ``.per_layer()`` for nodes. Both support the same coverage modes and
   grouping operations.

----

Using the Query Zoo
-------------------

Getting Started
~~~~~~~~~~~~~~~

1. **Install py3plex** (if not already installed)::

      pip install py3plex

2. **Run a single query**::

      from examples.dsl_query_zoo.datasets import create_social_work_network
      from examples.dsl_query_zoo.queries import query_basic_exploration

      net = create_social_work_network(seed=42)
      result = query_basic_exploration(net)
      print(result)

3. **Run all queries**::

      cd examples/dsl_query_zoo
      python run_all.py

4. **Run tests**::

      pytest tests/test_dsl_query_zoo.py -v

Adapting Queries to Your Data
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All queries are designed to work with any ``multi_layer_network`` object. To adapt:

1. **Replace the dataset**:

   .. code-block:: python

      from py3plex.core import multinet

      # Load your own network
      my_network = multinet.multi_layer_network()
      my_network.load_network("mydata.edgelist", input_type="edgelist_mx")

      # Run any query
      result = query_cross_layer_hubs(my_network, k=10)

2. **Adjust parameters**:

   * ``k`` in ``query_cross_layer_hubs`` — Number of top nodes per layer
   * Layer names in filters — Replace ``L["social"]`` with your layer names
   * Centrality thresholds — Adjust percentile cutoffs as needed

3. **Extend queries**: All query functions are in
   ``examples/dsl_query_zoo/queries.py``. Copy, modify, and experiment!
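If your data is not yet in an edgelist file, you can also assemble a small
multilayer network directly in code and pass it to any query function. A
minimal sketch follows; the list-based ``add_edges`` format mirrors the
py3plex core examples, so adapt the fields to your own data.

.. code-block:: python

   # Sketch only: build a tiny two-layer network in memory instead of loading
   # a file, then run one of the Query Zoo functions on it. Each edge is
   # [source, source_layer, target, target_layer, weight].
   from py3plex.core import multinet

   from examples.dsl_query_zoo.queries import query_basic_exploration

   net = multinet.multi_layer_network()
   net.add_edges(
       [
           ["Alice", "social", "Bob", "social", 1],
           ["Bob", "social", "Carol", "social", 1],
           ["Alice", "work", "Carol", "work", 1],
       ],
       input_type="list",
   )

   print(query_basic_exploration(net))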
Datasets
~~~~~~~~

Three multilayer networks are provided:

1. **social_work_network**

   * **Layers:** social, work, family
   * **Nodes:** 12 people
   * **Structure:** Overlapping social circles with different connectivity patterns per layer

2. **communication_network**

   * **Layers:** email, chat, phone
   * **Nodes:** 10 people (Manager, Dev team, Marketing, Support, HR)
   * **Structure:** Star topology in email, distributed in chat/phone

3. **transport_network**

   * **Layers:** bus, metro, walking
   * **Nodes:** 8 locations (CentralStation, ShoppingMall, Park, etc.)
   * **Structure:** Bus covers most locations, metro is faster but selective, walking is local

All datasets use fixed random seeds (``seed=42``) for reproducibility.

Further Reading
---------------

* :doc:`query_with_dsl` — Complete DSL reference with syntax and operators
* :doc:`../concepts/multilayer_networks_101` — Theory of multilayer networks
* :doc:`../reference/dsl_reference` — Full DSL grammar and API reference
* :doc:`../tutorials/tutorial_10min` — Quick start tutorial

.. admonition:: Contributing Queries
   :class: tip

   Have an interesting multilayer query pattern? **Contribute it to the Query Zoo!**

   1. Add your query function to ``examples/dsl_query_zoo/queries.py``
   2. Add tests to ``tests/test_dsl_query_zoo.py``
   3. Update this documentation page
   4. Submit a pull request!

   See :doc:`../project/contributing` for details.