I/O and Serialization
======================

py3plex provides an I/O system for reading and writing multilayer graphs in several file formats. Readers and writers are registered by format name, so the system can be extended with custom formats.

Supported Formats
-----------------

The I/O system supports multiple file formats, each with different trade-offs:

* **JSON** - Human-readable, widely compatible, good for small to medium networks
* **JSONL** - Streaming JSON format, efficient for large networks
* **CSV** - Spreadsheet-compatible, easy to edit manually
* **Arrow/Feather** - High-performance columnar format (requires pyarrow)
* **Parquet** - Compressed columnar format, best for storage (requires pyarrow)

Basic Usage
-----------

The I/O system provides two main functions: ``read()`` and ``write()``. Both infer the format from the file extension unless ``format`` is passed explicitly.

Reading Graphs
~~~~~~~~~~~~~~

.. code-block:: python

    from py3plex.io import read

    # Auto-detect format from extension
    graph = read('network.json')
    graph = read('network.csv')
    graph = read('network.arrow')

    # Or specify format explicitly
    graph = read('myfile.dat', format='json')

Writing Graphs
~~~~~~~~~~~~~~

.. code-block:: python

    from py3plex.io import write

    # Auto-detect format from extension
    write(graph, 'network.json')
    write(graph, 'network.arrow')
    write(graph, 'network.parquet')

    # Or specify format explicitly
    write(graph, 'myfile.dat', format='json')

Creating Graphs with the Schema API
------------------------------------

The I/O system uses a schema-based API for creating graphs from ``Layer``, ``Node``, and ``Edge`` objects:

.. code-block:: python

    from py3plex.io import MultiLayerGraph, Node, Layer, Edge

    # Create graph
    graph = MultiLayerGraph(
        directed=True,
        attributes={'name': 'Social Network'}
    )

    # Add layers
    graph.add_layer(Layer(id='facebook', attributes={'type': 'social'}))
    graph.add_layer(Layer(id='twitter', attributes={'type': 'social'}))

    # Add nodes
    graph.add_node(Node(id='alice', attributes={'age': 30}))
    graph.add_node(Node(id='bob', attributes={'age': 25}))

    # Add edges
    graph.add_edge(Edge(
        src='alice',
        dst='bob',
        src_layer='facebook',
        dst_layer='facebook',
        attributes={'weight': 0.8}
    ))

Apache Arrow Format
-------------------

Apache Arrow is a high-performance columnar format designed for efficient data interchange. py3plex supports Arrow through two sub-formats:

* **Feather** - Fast, uncompressed format ideal for temporary storage
* **Parquet** - Compressed format ideal for long-term storage

Installing Arrow Support
~~~~~~~~~~~~~~~~~~~~~~~~~

Arrow support requires the pyarrow package:

.. code-block:: bash

    pip install 'py3plex[arrow]'
    # or directly
    pip install pyarrow

Using Arrow Format
~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from py3plex.io import read, write

    # Feather format (fast, uncompressed)
    write(graph, 'network.arrow')
    graph = read('network.arrow')

    # Parquet format (compressed)
    write(graph, 'network.parquet', format='parquet')
    graph = read('network.parquet', format='parquet')

Benefits of Arrow Format
~~~~~~~~~~~~~~~~~~~~~~~~

1. **Performance**: Columnar storage enables fast read/write operations
2. **Compression**: Parquet produces the smallest files of the formats compared below
3. **Interoperability**: Arrow is an industry-standard format supported by:

   - pandas, polars (Python data analysis)
   - Apache Spark (big data processing)
   - R, Julia (statistical computing)
   - DuckDB (analytical database)

4. **Type Safety**: Schema preservation with strong typing
5. **Zero-Copy**: Efficient in-memory representation

Performance Comparison
~~~~~~~~~~~~~~~~~~~~~~

For a typical multilayer network with 1000 nodes and ~5000 edges:

+---------+------------+-----------+-------------+
| Format  | Write Time | Read Time | File Size   |
+=========+============+===========+=============+
| Arrow   | 0.016s     | 0.008s    | 0.46 MB     |
+---------+------------+-----------+-------------+
| Parquet | 0.020s     | 0.010s    | 0.35 MB     |
+---------+------------+-----------+-------------+
| JSON    | 0.046s     | 0.030s    | 1.09 MB     |
+---------+------------+-----------+-------------+

In this benchmark, Arrow writes about **2.9x faster** and reads about **3.8x faster** than JSON, and its file is about **2.4x smaller**. Parquet produces the smallest file, about **3.1x smaller** than JSON, at a slightly higher read and write cost than Arrow.

When to Use Each Format
~~~~~~~~~~~~~~~~~~~~~~~

**Use Arrow/Feather when:**

- You need maximum read/write performance
- Working with large networks (>10k nodes)
- Interoperating with data science tools (pandas, polars)
- Building data pipelines

**Use Parquet when:**

- Long-term storage is important
- Minimizing storage costs
- Sharing data across platforms
- Archiving networks

**Use JSON when:**

- Human readability is important
- Working with small networks
- Debugging or manual editing
- Maximum compatibility needed

**Use CSV when:**

- Working with spreadsheet tools (Excel)
- Simple edge lists
- Manual data entry/editing

CSV Format with Sidecars
-------------------------

The CSV format stores edges in the main file and supports optional sidecar files for node and layer attributes:

.. code-block:: python

    from py3plex.io import read, write

    # Write with sidecars
    write(graph, 'edges.csv', format='csv', write_sidecars=True)
    # Creates: edges.csv, nodes.csv, layers.csv

    # Read with sidecars
    graph = read('edges.csv', format='csv',
                 nodes_file='nodes.csv',
                 layers_file='layers.csv')

Integration with NetworkX
--------------------------

Convert between the py3plex I/O graph model and NetworkX:

.. code-block:: python

    from py3plex.io import read, to_networkx, from_networkx

    # Load graph
    graph = read('network.json')

    # Convert to NetworkX
    G = to_networkx(graph, mode='union')      # Merge all layers
    # or
    G = to_networkx(graph, mode='multiplex')  # Preserve layers as (node, layer)

    # Convert back from NetworkX
    graph = from_networkx(G, mode='multiplex')

Example: Complete Workflow
---------------------------

The following example builds a small network, saves it in several formats, reloads it, and analyzes it with NetworkX:

.. code-block:: python

    import networkx as nx

    from py3plex.io import (
        MultiLayerGraph, Node, Layer, Edge,
        read, write, to_networkx
    )

    # Create a multilayer network
    graph = MultiLayerGraph(directed=True)

    # Add layers
    for layer_id in ['social', 'work', 'family']:
        graph.add_layer(Layer(id=layer_id))

    # Add nodes
    for name in ['alice', 'bob', 'charlie']:
        graph.add_node(Node(id=name))

    # Add edges
    edges = [
        ('alice', 'bob', 'social', 'social', 0.8),
        ('bob', 'charlie', 'work', 'work', 0.6),
        ('alice', 'charlie', 'family', 'family', 0.9),
    ]

    for src, dst, src_layer, dst_layer, weight in edges:
        graph.add_edge(Edge(
            src=src, dst=dst,
            src_layer=src_layer, dst_layer=dst_layer,
            attributes={'weight': weight}
        ))

    # Save in multiple formats
    write(graph, 'network.json')
    write(graph, 'network.arrow')
    write(graph, 'network.parquet')

    # Load back
    loaded = read('network.arrow')

    # Convert to NetworkX for analysis
    G = to_networkx(loaded, mode='union')

    # Use NetworkX algorithms
    centrality = nx.degree_centrality(G)
    print(f"Most central node: {max(centrality, key=centrality.get)}")

Checking Supported Formats
---------------------------

You can query which formats are available at runtime:

.. code-block:: python

    from py3plex.io import supported_formats

    formats = supported_formats()
    print(f"Read formats: {formats['read']}")
    print(f"Write formats: {formats['write']}")

This is useful for checking whether optional dependencies (like pyarrow) are installed.

Schema Validation
-----------------

The I/O system validates graphs automatically as they are built:

.. code-block:: python

    from py3plex.io import (
        MultiLayerGraph, Node, Edge,
        ReferentialIntegrityError
    )

    graph = MultiLayerGraph()
    graph.add_node(Node(id='alice'))

    try:
        # This will fail - bob doesn't exist (and layer 'l1' was never added)
        graph.add_edge(Edge(
            src='alice', dst='bob',
            src_layer='l1', dst_layer='l1'
        ))
    except ReferentialIntegrityError as e:
        print(f"Validation error: {e}")

Validation ensures:

1. All edge endpoints reference existing nodes
2. All edge layers reference existing layers
3. All attributes are JSON-serializable
4. No duplicate edges (by src, dst, src_layer, dst_layer, key)

Advanced: Custom Formats
-------------------------

The I/O system is extensible. You can register custom format readers and writers under a format name, which ``read()`` and ``write()`` then match against the file extension:

.. code-block:: python

    from py3plex.io import (
        MultiLayerGraph, read, write,
        register_reader, register_writer
    )

    def my_reader(filepath, **kwargs):
        # Custom reading logic: build and return a MultiLayerGraph
        graph = MultiLayerGraph()
        # ... populate graph ...
        return graph

    def my_writer(graph, filepath, **kwargs):
        # Custom writing logic: serialize the graph to filepath
        with open(filepath, 'w') as f:
            # ... write graph ...
            pass

    # Register
    register_reader('myformat', my_reader)
    register_writer('myformat', my_writer)

    # Now you can use it
    write(graph, 'network.myformat')
    graph = read('network.myformat')

Examples
--------

Complete examples are available in ``examples/io_and_data/``:

* ``example_new_io.py`` - Comprehensive I/O demonstration
* ``example_save_to_arrow.py`` - Apache Arrow format usage
* ``example_save_to_gpickle.py`` - NetworkX pickle format
* ``example_save_to_edgelist.py`` - Edge list format
* ``example_schema_validation.py`` - Schema validation examples

See Also
--------

* :doc:`../getting_started/quickstart_5min` - Getting started guide
* :doc:`networks` - Basic network operations
* :doc:`../concepts/py3plex_core_model` - NetworkX integration details
* :doc:`../deployment/performance_scalability` - Performance optimization tips
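
As a supplement to the extension-based format detection in Basic Usage and the reader registry in Advanced: Custom Formats, the following standalone sketch shows one way such dispatch could work. It does not import py3plex and is not a description of its implementation; ``detect_format``, ``register_reader_sketch``, ``read_sketch``, and the extension table are hypothetical names chosen for illustration.

```python
from pathlib import Path

# Hypothetical registry and extension table; py3plex's internals may differ.
_READERS = {}
_EXTENSIONS = {
    ".json": "json",
    ".csv": "csv",
    ".arrow": "arrow",
    ".parquet": "parquet",
}


def register_reader_sketch(name, func):
    """Associate a format name with a reader callable."""
    _READERS[name] = func


def detect_format(filepath):
    """Map a file extension to a format name.

    Unknown extensions fall back to the bare extension, so a reader
    registered as 'myformat' matches 'network.myformat'.
    """
    suffix = Path(filepath).suffix.lower()
    if not suffix:
        raise ValueError(f"Cannot detect format of {filepath!r}: no extension")
    return _EXTENSIONS.get(suffix, suffix.lstrip("."))


def read_sketch(filepath, format=None, **kwargs):
    """Dispatch to the registered reader; an explicit format wins over detection."""
    name = format or detect_format(filepath)
    if name not in _READERS:
        raise ValueError(f"No reader registered for format {name!r}")
    return _READERS[name](filepath, **kwargs)


# Usage: register a toy reader and dispatch by extension or explicit format.
register_reader_sketch("myformat", lambda path, **kw: ("loaded", path))
print(read_sketch("network.myformat"))
print(read_sketch("myfile.dat", format="myformat"))
```

In this sketch an explicit ``format`` argument bypasses detection entirely, which mirrors the ``read('myfile.dat', format='json')`` call shown in Basic Usage.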