Dplyr-style Chainable Graph Operations
=======================================

.. contents:: Table of Contents
   :local:
   :depth: 2

Overview
--------

Py3plex provides a dplyr-style, fluent method-chaining interface for working with nodes and edges in multilayer networks. The API is inspired by R's dplyr package and provides verbs such as ``filter``, ``select``, ``mutate``, ``arrange``, ``group_by``, and ``summarise``.

It lets you express complex data manipulations as readable chains of operations::

    from py3plex.graph_ops import nodes, edges
    import numpy as np

    # ``multinet`` is an existing py3plex multi_layer_network object
    df = (
        nodes(multinet, layers=["ppi"])
        .filter(lambda n: n["degree"] > 10)
        .mutate(k=lambda n: n["degree"] / (n.get("weight", 1) + 1))
        .group_by("layer")
        .summarise(avg_degree=("degree", np.mean))
        .to_pandas()
    )

The graph_ops module is particularly useful for:

- **Fluent data manipulation**: Chain operations naturally without intermediate variables
- **Aggregation and summarization**: Group data and compute statistics
- **Integration with pandas**: Export results directly to DataFrames
- **Filtering and transformation**: Apply complex logic to nodes and edges

Key Concepts
------------

NodeFrame and EdgeFrame
~~~~~~~~~~~~~~~~~~~~~~~

The two main "frame" types wrap collections of nodes or edges:

- **NodeFrame**: A chainable view over a collection of nodes
- **EdgeFrame**: A chainable view over a collection of edges

Both wrap:

- A reference to the underlying py3plex graph object (multinet)
- A current selection of nodes or edges (as a list of dicts)

Every verb returns a new frame, which is what enables method chaining. ``group_by`` returns a grouped frame intended for use with ``summarise``.
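
The wrapping pattern described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not py3plex's actual implementation: ``SketchFrame`` and its ``graph`` field are hypothetical names chosen for this example, and only ``filter``, ``mutate``, and ``head`` are shown.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class SketchFrame:
    """Hypothetical stand-in for NodeFrame/EdgeFrame (illustration only)."""

    graph: Any  # reference to the underlying network object
    data: List[Dict[str, Any]] = field(default_factory=list)  # current selection

    def filter(self, predicate: Callable[[Dict[str, Any]], bool]) -> "SketchFrame":
        # Keep only the rows for which the predicate is true.
        return SketchFrame(self.graph, [row for row in self.data if predicate(row)])

    def mutate(self, **new_fields: Callable[[Dict[str, Any]], Any]) -> "SketchFrame":
        # Copy each row, then add the computed attributes to the copy.
        rows = []
        for row in self.data:
            new_row = dict(row)
            for name, fn in new_fields.items():
                new_row[name] = fn(row)
            rows.append(new_row)
        return SketchFrame(self.graph, rows)

    def head(self, n: int = 5) -> "SketchFrame":
        # Keep the first n rows.
        return SketchFrame(self.graph, self.data[:n])


rows = [
    {"id": "A", "layer": "ppi", "degree": 12},
    {"id": "B", "layer": "ppi", "degree": 3},
    {"id": "C", "layer": "ppi", "degree": 20},
]
frame = SketchFrame(graph=None, data=rows)
hubs = frame.filter(lambda n: n["degree"] > 10).mutate(double=lambda n: n["degree"] * 2)
print([r["id"] for r in hubs.data])  # ids of the rows that survived the filter
print(len(frame.data))               # the original frame still holds every row
```

In this sketch each verb constructs a fresh object, so ``frame`` is unchanged after the chain runs and intermediate frames can be reused safely.
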
Top-level Helpers
~~~~~~~~~~~~~~~~~

Use the ``nodes()`` and ``edges()`` functions to create frames::

    from py3plex.graph_ops import nodes, edges

    # Get all nodes from the network
    node_frame = nodes(multinet)

    # Get nodes from specific layers
    node_frame = nodes(multinet, layers=["layer1", "layer2"])

    # Get all edges
    edge_frame = edges(multinet)

    # Get edges from specific layers
    edge_frame = edges(multinet, layers=["ppi"])

Quick Start Example
-------------------

Here's a complete working example::

    from py3plex.core import multinet
    from py3plex.graph_ops import nodes, edges
    import numpy as np

    # Create a multilayer network
    network = multinet.multi_layer_network(directed=False)

    # Add nodes
    network.add_nodes([
        {'source': 'A', 'type': 'layer1'},
        {'source': 'B', 'type': 'layer1'},
        {'source': 'C', 'type': 'layer1'},
        {'source': 'D', 'type': 'layer1'},
        {'source': 'A', 'type': 'layer2'},
        {'source': 'B', 'type': 'layer2'},
    ])

    # Add edges
    network.add_edges([
        {'source': 'A', 'target': 'B',
         'source_type': 'layer1', 'target_type': 'layer1', 'weight': 1.0},
        {'source': 'B', 'target': 'C',
         'source_type': 'layer1', 'target_type': 'layer1', 'weight': 2.0},
        {'source': 'C', 'target': 'D',
         'source_type': 'layer1', 'target_type': 'layer1', 'weight': 3.0},
        {'source': 'A', 'target': 'B',
         'source_type': 'layer2', 'target_type': 'layer2', 'weight': 0.5},
    ])

    # Example 1: Filter + mutate + to_pandas
    df = (
        nodes(network, layers=["layer1"])
        .filter(lambda n: n["degree"] > 1)
        .mutate(k=lambda n: n["degree"] / (n.get("weight", 1) + 1))
        .to_pandas()
    )
    print(df)

    # Example 2: Grouping and summarising
    df_summary = (
        nodes(network)
        .group_by("layer")
        .summarise(
            avg_degree=("degree", np.mean),
            n=("id", len),
        )
        .arrange("avg_degree", reverse=True)
        .to_pandas()
    )
    print(df_summary)

    # Example 3: Edges
    df_edges = (
        edges(network, layers=["layer1"])
        .filter(lambda e: e.get("weight", 0) > 1.5)
        .head(10)
        .to_pandas()
    )
    print(df_edges)

Verb Reference
--------------

Filter
~~~~~~

Filter nodes/edges using a predicate
function::

    # Using a lambda predicate
    result = nodes(network).filter(lambda n: n["degree"] > 5)

    # Filter edges by weight
    result = edges(network).filter(lambda e: e.get("weight", 0) > 0.8)

Or use expression strings for simple conditions::

    result = nodes(network).filter_expr("degree > 10 and layer == 'ppi'")

**Signature:**

.. code-block:: python

    def filter(self, predicate: Callable[[dict], bool]) -> "NodeFrame": ...
    def filter_expr(self, expr: str) -> "NodeFrame": ...

Select
~~~~~~

Keep only the specified attributes::

    result = nodes(network).select("id", "layer", "degree")

**Signature:**

.. code-block:: python

    def select(self, *fields: str) -> "NodeFrame": ...

If no fields are passed, ``select`` is a no-op.

Mutate
~~~~~~

Compute new attributes from existing values::

    import math

    result = nodes(network).mutate(
        k=lambda n: n["degree"] / (n.get("weight", 1) + 1),
        log_degree=lambda n: math.log1p(n["degree"]),
    )

**Signature:**

.. code-block:: python

    def mutate(self, **new_fields: Callable[[dict], Any]) -> "NodeFrame": ...

Arrange (Sort)
~~~~~~~~~~~~~~

Sort nodes/edges by an attribute or custom key::

    # Sort by attribute name
    result = nodes(network).arrange("degree", reverse=True)

    # Sort using a custom key function
    result = nodes(network).arrange(lambda n: -n["degree"])

**Signature:**

.. code-block:: python

    def arrange(self, key: Union[str, Callable[[dict], Any]], reverse: bool = False) -> "NodeFrame": ...

Head (Take)
~~~~~~~~~~~

Keep only the first n rows::

    result = nodes(network).head(10)  # First 10 nodes

**Signature:**

.. code-block:: python

    def head(self, n: int = 5) -> "NodeFrame": ...

Group By + Summarise
~~~~~~~~~~~~~~~~~~~~

Group data and compute aggregations::

    import numpy as np

    result = (
        nodes(network)
        .group_by("layer")
        .summarise(
            avg_degree=("degree", np.mean),
            n=("id", len),
        )
    )

**Signatures:**

.. code-block:: python

    def group_by(self, *fields: str) -> "GroupedNodeFrame": ...
    def summarise(self, **aggregations: tuple[str, Callable[[list[Any]], Any]]) -> "NodeFrame": ...

Each aggregation is a ``(field_name, aggregation_function)`` tuple. The function receives the list of that field's values within a group and returns a single result.

Export Methods
--------------

to_pandas
~~~~~~~~~

Convert the current selection to a pandas DataFrame::

    df = nodes(network).to_pandas()

    # Chain with other operations
    df = (
        nodes(network)
        .filter(lambda n: n["degree"] > 5)
        .select("id", "layer", "degree")
        .to_pandas()
    )

**Signature:**

.. code-block:: python

    def to_pandas(self) -> pandas.DataFrame: ...

to_subgraph
~~~~~~~~~~~

Build a py3plex subgraph containing only the selected nodes::

    subgraph = (
        nodes(network)
        .filter(lambda n: n["layer"] == "ppi")
        .to_subgraph()
    )

**Signature:**

.. code-block:: python

    def to_subgraph(self) -> Any: ...

Mapping to dplyr Concepts
-------------------------

The following table shows the correspondence between dplyr verbs and graph_ops methods:

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - dplyr Verb
     - graph_ops Method
   * - ``dplyr::filter``
     - ``NodeFrame.filter`` / ``EdgeFrame.filter``
   * - ``dplyr::select``
     - ``NodeFrame.select`` / ``EdgeFrame.select``
   * - ``dplyr::mutate``
     - ``NodeFrame.mutate`` / ``EdgeFrame.mutate``
   * - ``dplyr::arrange``
     - ``NodeFrame.arrange`` / ``EdgeFrame.arrange``
   * - ``dplyr::slice_head`` (or base R ``head``)
     - ``NodeFrame.head`` / ``EdgeFrame.head``
   * - ``dplyr::group_by``
     - ``NodeFrame.group_by`` / ``EdgeFrame.group_by``
   * - ``dplyr::summarise``
     - ``GroupedNodeFrame.summarise``

.. note::

   There is no equivalent of dplyr's joins yet. The design leaves room for future extensions such as joining on node ID or layer.

Complete Examples
-----------------

Example 1: Hub Identification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Find and analyze hub nodes across layers::

    from py3plex.core import multinet
    from py3plex.graph_ops import nodes
    import numpy as np

    network = multinet.multi_layer_network(directed=False)
    # ... add nodes and edges ...

    # Find high-degree nodes (hubs) in each layer
    hub_summary = (
        nodes(network)
        .filter(lambda n: n["degree"] >= 5)  # Only high-degree nodes
        .group_by("layer")
        .summarise(
            hub_count=("id", len),
            avg_hub_degree=("degree", np.mean),
            max_degree=("degree", max),
        )
        .arrange("hub_count", reverse=True)
        .to_pandas()
    )
    print(hub_summary)

Example 2: Layer Comparison
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Compare network properties across different layers::

    from py3plex.graph_ops import nodes
    import numpy as np

    # Get per-layer statistics
    layer_stats = (
        nodes(network)
        .group_by("layer")
        .summarise(
            node_count=("id", len),
            avg_degree=("degree", np.mean),
            total_degree=("degree", sum),
        )
        .to_pandas()
    )
    print(layer_stats)

Example 3: Edge Filtering and Analysis
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Filter edges and compute statistics::

    from py3plex.graph_ops import edges

    # Find high-weight edges
    high_weight_edges = (
        edges(network, layers=["ppi", "coexpr"])
        .filter(lambda e: e.get("weight", 0) > 0.8)
        .mutate(
            is_intra_layer=lambda e: e["source_layer"] == e["target_layer"],
            weight_class=lambda e: "high" if e.get("weight", 0) > 0.9 else "medium",
        )
        .arrange("weight", reverse=True)
        .head(100)
        .to_pandas()
    )
    print(high_weight_edges)

Example 4: Subgraph Extraction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Create a subgraph from filtered nodes::

    from py3plex.graph_ops import nodes

    # Extract a subgraph of high-degree nodes in layer1
    subgraph = (
        nodes(network)
        .filter(lambda n: n["layer"] == "layer1")
        .filter(lambda n: n["degree"] >= 3)
        .to_subgraph()
    )

    # Analyze the subgraph
    print(f"Subgraph has {len(list(subgraph.get_nodes()))} nodes")

API Reference
-------------

nodes()
~~~~~~~

.. code-block:: python

    def nodes(multinet: Any, layers: Optional[List[str]] = None) -> NodeFrame:
        """Create a NodeFrame from a py3plex multi_layer_network.

        Args:
            multinet: py3plex multi_layer_network object
            layers: Optional list of layers to restrict to

        Returns:
            A NodeFrame wrapping the network's nodes
        """

edges()
~~~~~~~

.. code-block:: python

    def edges(multinet: Any, layers: Optional[List[str]] = None) -> EdgeFrame:
        """Create an EdgeFrame from a py3plex multi_layer_network.

        Args:
            multinet: py3plex multi_layer_network object
            layers: Optional list of layers to restrict to

        Returns:
            An EdgeFrame wrapping the network's edges
        """

NodeFrame
~~~~~~~~~

.. code-block:: python

    @dataclass
    class NodeFrame:
        """A chainable view over a collection of nodes.

        Attributes:
            multinet: Reference to the underlying py3plex multi_layer_network
            data: Current selection of nodes as a list of dicts
        """

        def filter(self, predicate: Callable[[dict], bool]) -> "NodeFrame": ...
        def filter_expr(self, expr: str) -> "NodeFrame": ...
        def select(self, *fields: str) -> "NodeFrame": ...
        def mutate(self, **new_fields: Callable[[dict], Any]) -> "NodeFrame": ...
        def arrange(self, key: Union[str, Callable], reverse: bool = False) -> "NodeFrame": ...
        def head(self, n: int = 5) -> "NodeFrame": ...
        def group_by(self, *fields: str) -> "GroupedNodeFrame": ...
        def to_pandas(self) -> pandas.DataFrame: ...
        def to_subgraph(self) -> Any: ...

EdgeFrame
~~~~~~~~~

.. code-block:: python

    @dataclass
    class EdgeFrame:
        """A chainable view over a collection of edges.

        Attributes:
            multinet: Reference to the underlying py3plex multi_layer_network
            data: Current selection of edges as a list of dicts
        """

        def filter(self, predicate: Callable[[dict], bool]) -> "EdgeFrame": ...
        def filter_expr(self, expr: str) -> "EdgeFrame": ...
        def select(self, *fields: str) -> "EdgeFrame": ...
        def mutate(self, **new_fields: Callable[[dict], Any]) -> "EdgeFrame": ...
        def arrange(self, key: Union[str, Callable], reverse: bool = False) -> "EdgeFrame": ...
        def head(self, n: int = 5) -> "EdgeFrame": ...
        def group_by(self, *fields: str) -> "GroupedEdgeFrame": ...
        def to_pandas(self) -> pandas.DataFrame: ...

GroupedNodeFrame
~~~~~~~~~~~~~~~~

.. code-block:: python

    @dataclass
    class GroupedNodeFrame:
        """A grouped view of nodes for aggregation operations.

        Attributes:
            parent: The parent NodeFrame from which this was created
            group_fields: Tuple of field names to group by
        """

        def summarise(self, **aggregations: tuple) -> "NodeFrame": ...
        def summarize(self, **aggregations: tuple) -> "NodeFrame": ...  # Alias

Best Practices
--------------

1. **Start simple**: Begin with basic filters and add complexity incrementally
2. **Use method chaining**: Chain operations for readable, fluent code
3. **Handle missing values**: Use ``.get()`` for optional attributes::

       frame.mutate(x=lambda n: n.get("weight", 1) * 2)

4. **Export early for debugging**: Use ``.to_pandas()`` to inspect intermediate results
5. **Filter before grouping**: Reduce data volume before aggregation for better performance
6. **Name aggregations clearly**: Use descriptive names in ``summarise()``

Comparison with DSL
-------------------

The graph_ops module complements the existing SQL-like DSL:

.. list-table:: DSL vs graph_ops Comparison
   :header-rows: 1
   :widths: 20 40 40

   * - Feature
     - DSL
     - graph_ops
   * - **Syntax**
     - SQL-like strings or Builder API
     - Python method chaining
   * - **Filtering**
     - WHERE clauses
     - ``.filter()`` with lambdas
   * - **Aggregation**
     - COMPUTE measures
     - ``.group_by().summarise()``
   * - **Custom logic**
     - Limited
     - Full Python expressiveness
   * - **Type safety**
     - None (strings) / Partial (Builder API)
     - Full with type hints
   * - **Best for**
     - Quick queries, exploration
     - Complex transformations, data pipelines

When to Use Which
~~~~~~~~~~~~~~~~~

**Use DSL when:**

- You need quick, exploratory queries
- You want SQL-like syntax for familiar, readable code
- You're doing simple filtering and centrality computation
- You need to quickly prototype an analysis

**Use graph_ops when:**

- You need complex data transformations with custom logic
- You want full type hints and IDE autocompletion
- You're building reusable analysis pipelines
- You need grouping and aggregation operations
- You're integrating with pandas workflows

See Also
--------

- :doc:`dsl` - SQL-like DSL for network queries (use for quick exploratory queries)
- :doc:`networks` - Working with multilayer networks
- :doc:`statistics` - Network statistics and measures