I/O and Serialization
py3plex provides a comprehensive I/O system for reading and writing multilayer graphs in various formats. The system is designed to be extensible, efficient, and easy to use.
Supported Formats
The I/O system supports multiple file formats, each with different trade-offs:
JSON - Human-readable, widely compatible, good for small to medium networks
JSONL - Streaming JSON format, efficient for large networks
CSV - Spreadsheet-compatible, easy to edit manually
Arrow/Feather - High-performance columnar format (requires pyarrow)
Parquet - Compressed columnar format, best for storage (requires pyarrow)
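Format dispatch of this kind is typically driven by the file extension. The sketch below illustrates such a lookup; the mapping table and helper name are illustrative, not py3plex's actual internals:

```python
from pathlib import Path

# Illustrative extension-to-format mapping (not py3plex's actual table)
EXTENSION_MAP = {
    ".json": "json",
    ".jsonl": "jsonl",
    ".csv": "csv",
    ".arrow": "arrow",    # Feather sub-format
    ".feather": "arrow",
    ".parquet": "parquet",
}

def detect_format(filepath, format=None):
    """Return the explicit format if given, else infer it from the extension."""
    if format is not None:
        return format
    ext = Path(filepath).suffix.lower()
    try:
        return EXTENSION_MAP[ext]
    except KeyError:
        raise ValueError(f"Cannot infer format from extension {ext!r}; pass format=...")

print(detect_format("network.parquet"))            # parquet
print(detect_format("myfile.dat", format="json"))  # json
```

An unknown extension with no explicit format raises an error rather than guessing, which is why the `format=` argument exists in the examples below.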
Basic Usage
The I/O system provides two main functions: read() and write().
Reading Graphs
from py3plex.io import read
# Auto-detect format from extension
graph = read('network.json')
graph = read('network.csv')
graph = read('network.arrow')
# Or specify format explicitly
graph = read('myfile.dat', format='json')
Writing Graphs
from py3plex.io import write
# Auto-detect format from extension
write(graph, 'network.json')
write(graph, 'network.arrow')
write(graph, 'network.parquet')
# Or specify format explicitly
write(graph, 'myfile.dat', format='json')
Creating Graphs with the Schema API
The modern I/O system uses a schema-based API for creating graphs:
from py3plex.io import MultiLayerGraph, Node, Layer, Edge
# Create graph
graph = MultiLayerGraph(
directed=True,
attributes={'name': 'Social Network'}
)
# Add layers
graph.add_layer(Layer(id='facebook', attributes={'type': 'social'}))
graph.add_layer(Layer(id='twitter', attributes={'type': 'social'}))
# Add nodes
graph.add_node(Node(id='alice', attributes={'age': 30}))
graph.add_node(Node(id='bob', attributes={'age': 25}))
# Add edges
graph.add_edge(Edge(
src='alice',
dst='bob',
src_layer='facebook',
dst_layer='facebook',
attributes={'weight': 0.8}
))
Apache Arrow Format
Apache Arrow is a high-performance columnar format designed for efficient data interchange. py3plex supports Arrow through two sub-formats:
Feather - Fast, uncompressed format ideal for temporary storage
Parquet - Compressed format ideal for long-term storage
Installing Arrow Support
Arrow support requires the pyarrow package:
pip install 'py3plex[arrow]'
# or directly
pip install pyarrow
Using Arrow Format
from py3plex.io import read, write
# Feather format (fast, uncompressed)
write(graph, 'network.arrow')
graph = read('network.arrow')
# Parquet format (compressed)
write(graph, 'network.parquet', format='parquet')
graph = read('network.parquet', format='parquet')
Benefits of Arrow Format
Performance: Columnar storage enables fast read/write operations
Compression: Parquet format provides excellent compression ratios
Interoperability: Arrow is an industry-standard format supported by:
pandas, polars (Python data analysis)
Apache Spark (big data processing)
R, Julia (statistical computing)
DuckDB (analytical database)
Type Safety: Schema preservation with strong typing
Zero-Copy: Efficient in-memory representation
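The performance benefits above come from the columnar layout itself: instead of storing one record per edge, each field is stored as a contiguous array. A dependency-free sketch of that transposition (the edge data here is illustrative):

```python
# Row-oriented edge records, as a schema API might hold them
edges = [
    {"src": "alice", "dst": "bob",     "src_layer": "facebook", "weight": 0.8},
    {"src": "bob",   "dst": "charlie", "src_layer": "twitter",  "weight": 0.6},
]

# Columnar (Arrow-style) layout: one contiguous array per field
columns = {key: [edge[key] for edge in edges] for key in edges[0]}

print(columns["weight"])  # [0.8, 0.6] -- a whole column in one contiguous list
```

Scanning or compressing a single column (e.g. all weights) never touches the other fields, which is what makes columnar reads and compression fast.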
Performance Comparison
For a typical multilayer network with 1000 nodes and ~5000 edges:
| Format  | Write Time | Read Time | File Size |
|---|---|---|---|
| Arrow   | 0.016s | 0.008s | 0.46 MB |
| Parquet | 0.020s | 0.010s | 0.35 MB |
| JSON    | 0.046s | 0.030s | 1.09 MB |
In this benchmark, the columnar formats are roughly 2-3x faster than JSON for both reads and writes, and produce files 2-3x smaller.
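Exact numbers vary by machine, so it is worth reproducing them locally. The JSON side of the measurement can be sketched with only the standard library (the synthetic edge list below is illustrative, not the benchmark's actual data):

```python
import json
import os
import tempfile
import time

# Build a synthetic edge list comparable in shape to the benchmark above
edges = [
    {"src": f"n{i}", "dst": f"n{(i * 7) % 1000}", "layer": "l1", "weight": 0.5}
    for i in range(5000)
]

path = os.path.join(tempfile.mkdtemp(), "edges.json")

start = time.perf_counter()
with open(path, "w") as f:
    json.dump(edges, f)
write_time = time.perf_counter() - start

start = time.perf_counter()
with open(path) as f:
    loaded = json.load(f)
read_time = time.perf_counter() - start

print(f"JSON write: {write_time:.4f}s, read: {read_time:.4f}s, "
      f"size: {os.path.getsize(path) / 1e6:.2f} MB")
```

Swapping the `json.dump`/`json.load` calls for `write`/`read` with the other formats gives a like-for-like comparison on your own hardware.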
When to Use Each Format
Use Arrow/Feather when:
You need maximum read/write performance
Working with large networks (>10k nodes)
Interoperating with data science tools (pandas, polars)
Building data pipelines
Use Parquet when:
Long-term storage is important
Minimizing storage costs
Sharing data across platforms
Archiving networks
Use JSON when:
Human readability is important
Working with small networks
Debugging or manual editing
Maximum compatibility needed
Use CSV when:
Working with spreadsheet tools (Excel)
Simple edge lists
Manual data entry/editing
CSV Format with Sidecars
CSV format supports optional sidecar files for node and layer attributes:
from py3plex.io import read, write
# Write with sidecars
write(graph, 'edges.csv', format='csv', write_sidecars=True)
# Creates: edges.csv, nodes.csv, layers.csv
# Read with sidecars
graph = read('edges.csv', format='csv',
nodes_file='nodes.csv',
layers_file='layers.csv')
Integration with NetworkX
Convert between py3plex I/O format and NetworkX:
from py3plex.io import read, to_networkx, from_networkx
# Load graph
graph = read('network.json')
# Convert to NetworkX
G = to_networkx(graph, mode='union') # Merge all layers
# or
G = to_networkx(graph, mode='multiplex') # Preserve layers as (node, layer)
# Convert back from NetworkX
graph = from_networkx(G, mode='multiplex')
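The difference between the two modes comes down to node identity. A dependency-free sketch of the idea (the edge data is illustrative):

```python
# Edges as (src, dst, layer) triples from a two-layer network
edges = [
    ("alice", "bob", "facebook"),
    ("alice", "bob", "twitter"),
]

# mode='union': layers are merged, so both edges collapse onto one node pair
union_nodes = {n for src, dst, _ in edges for n in (src, dst)}

# mode='multiplex': each node becomes a (node, layer) tuple, keeping layers apart
multiplex_nodes = {(n, layer) for src, dst, layer in edges for n in (src, dst)}

print(sorted(union_nodes))      # ['alice', 'bob']
print(sorted(multiplex_nodes))  # [('alice', 'facebook'), ('alice', 'twitter'), ...]
```

Union mode is convenient for standard single-layer algorithms (like the centrality example below), while multiplex mode preserves the layer structure at the cost of more nodes.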
Example: Complete Workflow
Here’s a complete example demonstrating the I/O system:
from py3plex.io import (
MultiLayerGraph, Node, Layer, Edge,
read, write, to_networkx
)
# Create a multilayer network
graph = MultiLayerGraph(directed=True)
# Add layers
for layer_id in ['social', 'work', 'family']:
graph.add_layer(Layer(id=layer_id))
# Add nodes
for name in ['alice', 'bob', 'charlie']:
graph.add_node(Node(id=name))
# Add edges
edges = [
('alice', 'bob', 'social', 'social', 0.8),
('bob', 'charlie', 'work', 'work', 0.6),
('alice', 'charlie', 'family', 'family', 0.9),
]
for src, dst, src_layer, dst_layer, weight in edges:
graph.add_edge(Edge(
src=src, dst=dst,
src_layer=src_layer, dst_layer=dst_layer,
attributes={'weight': weight}
))
# Save in multiple formats
write(graph, 'network.json')
write(graph, 'network.arrow')
write(graph, 'network.parquet')
# Load back
loaded = read('network.arrow')
# Convert to NetworkX for analysis
G = to_networkx(loaded, mode='union')
# Use NetworkX algorithms
import networkx as nx
centrality = nx.degree_centrality(G)
print(f"Most central node: {max(centrality, key=centrality.get)}")
Checking Supported Formats
You can query which formats are available at runtime:
from py3plex.io import supported_formats
formats = supported_formats()
print(f"Read formats: {formats['read']}")
print(f"Write formats: {formats['write']}")
This is useful for checking if optional dependencies (like pyarrow) are installed.
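Such a runtime check typically boils down to probing whether the optional dependency is importable. The sketch below shows one way to do that with the standard library; the mapping and function name are illustrative, not py3plex's actual registry:

```python
import importlib.util

# A format is available only if its optional dependency can be imported;
# this mapping is illustrative, not py3plex's actual registry
OPTIONAL_DEPS = {"arrow": "pyarrow", "parquet": "pyarrow"}
ALWAYS_AVAILABLE = ["json", "jsonl", "csv"]

def available_formats():
    formats = list(ALWAYS_AVAILABLE)
    for fmt, module in OPTIONAL_DEPS.items():
        if importlib.util.find_spec(module) is not None:
            formats.append(fmt)
    return formats

print(available_formats())  # always includes 'json'; 'arrow' only if pyarrow is installed
```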
Schema Validation
The I/O system includes automatic validation:
from py3plex.io import (
MultiLayerGraph, Node, Edge,
ReferentialIntegrityError
)
graph = MultiLayerGraph()
graph.add_node(Node(id='alice'))
try:
# This will fail - bob doesn't exist
graph.add_edge(Edge(
src='alice', dst='bob',
src_layer='l1', dst_layer='l1'
))
except ReferentialIntegrityError as e:
print(f"Validation error: {e}")
Validation ensures:
All edge endpoints reference existing nodes
All edge layers reference existing layers
All attributes are JSON-serializable
No duplicate edges (by src, dst, src_layer, dst_layer, key)
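The first two checks can be sketched without py3plex at all; the exception class and helper below are illustrative stand-ins for the library's actual validation:

```python
class ReferentialIntegrityError(ValueError):
    """Raised when an edge references a missing node or layer (illustrative)."""

def check_edge(edge, node_ids, layer_ids):
    # Endpoints must reference existing nodes
    for endpoint in (edge["src"], edge["dst"]):
        if endpoint not in node_ids:
            raise ReferentialIntegrityError(f"unknown node: {endpoint!r}")
    # Layers must reference existing layers
    for layer in (edge["src_layer"], edge["dst_layer"]):
        if layer not in layer_ids:
            raise ReferentialIntegrityError(f"unknown layer: {layer!r}")

nodes = {"alice"}
layers = {"l1"}

try:
    check_edge({"src": "alice", "dst": "bob", "src_layer": "l1", "dst_layer": "l1"},
               nodes, layers)
except ReferentialIntegrityError as e:
    print(f"Validation error: {e}")  # bob is not a known node
```

Validating at insertion time keeps the graph consistent by construction, so readers and writers never have to handle dangling references.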
Advanced: Custom Formats
The I/O system is extensible. You can register custom format readers/writers:
from py3plex.io import register_reader, register_writer
def my_reader(filepath, **kwargs):
# Custom reading logic
graph = MultiLayerGraph()
# ... populate graph ...
return graph
def my_writer(graph, filepath, **kwargs):
# Custom writing logic
with open(filepath, 'w') as f:
# ... write graph ...
pass
# Register
register_reader('myformat', my_reader)
register_writer('myformat', my_writer)
# Now you can use it
write(graph, 'network.myformat')
graph = read('network.myformat')
Examples
Complete examples are available in examples/io_and_data/:
example_new_io.py - Comprehensive I/O demonstration
example_save_to_arrow.py - Apache Arrow format usage
example_save_to_gpickle.py - NetworkX pickle format
example_save_to_edgelist.py - Edge list format
example_schema_validation.py - Schema validation examples
See Also
5-Minute Quickstart - Getting started guide
Working with Networks - Basic network operations
py3plex Core Model - NetworkX integration details
Performance and Scalability Best Practices - Performance optimization tips