I/O and Serialization
py3plex provides a comprehensive I/O system for reading and writing multilayer graphs in various formats. The system is designed to be extensible, efficient, and easy to use.
Supported Formats
The I/O system supports multiple file formats, each with different trade-offs:
JSON - Human-readable, widely compatible, good for small to medium networks
JSONL - Streaming JSON format, efficient for large networks
CSV - Spreadsheet-compatible, easy to edit manually
Arrow/Feather - High-performance columnar format (requires pyarrow)
Parquet - Compressed columnar format, best for storage (requires pyarrow)
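Format dispatch of this kind is typically driven by the file extension. The sketch below illustrates such a lookup; the mapping table and helper name are illustrative, not py3plex's actual internals:

```python
from pathlib import Path

# Illustrative extension-to-format mapping (not py3plex's actual table)
EXTENSION_MAP = {
    ".json": "json",
    ".jsonl": "jsonl",
    ".csv": "csv",
    ".arrow": "arrow",    # Feather sub-format
    ".feather": "arrow",
    ".parquet": "parquet",
}

def detect_format(filepath, format=None):
    """Return the explicit format if given, else infer it from the extension."""
    if format is not None:
        return format
    ext = Path(filepath).suffix.lower()
    try:
        return EXTENSION_MAP[ext]
    except KeyError:
        raise ValueError(f"Cannot infer format from extension {ext!r}; pass format=...")

print(detect_format("network.parquet"))            # parquet
print(detect_format("myfile.dat", format="json"))  # json
```

An unknown extension with no explicit format raises an error rather than guessing, which is why the `format=` argument exists in the examples below.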
Basic Usage
The I/O system provides two main functions: read() and write().
Reading Graphs
from py3plex.io import read
# Auto-detect format from extension
graph = read('network.json')
graph = read('network.csv')
graph = read('network.arrow')
# Or specify format explicitly
graph = read('myfile.dat', format='json')
Writing Graphs
from py3plex.io import write
# Auto-detect format from extension
write(graph, 'network.json')
write(graph, 'network.arrow')
write(graph, 'network.parquet')
# Or specify format explicitly
write(graph, 'myfile.dat', format='json')
Creating Graphs with the Schema API
The modern I/O system uses a schema-based API for creating graphs:
from py3plex.io import MultiLayerGraph, Node, Layer, Edge
# Create graph
graph = MultiLayerGraph(
directed=True,
attributes={'name': 'Social Network'}
)
# Add layers
graph.add_layer(Layer(id='facebook', attributes={'type': 'social'}))
graph.add_layer(Layer(id='twitter', attributes={'type': 'social'}))
# Add nodes
graph.add_node(Node(id='alice', attributes={'age': 30}))
graph.add_node(Node(id='bob', attributes={'age': 25}))
# Add edges
graph.add_edge(Edge(
src='alice',
dst='bob',
src_layer='facebook',
dst_layer='facebook',
attributes={'weight': 0.8}
))
Apache Arrow Format
Apache Arrow is a high-performance columnar format designed for efficient data interchange. py3plex supports Arrow through two sub-formats:
Feather - Fast, uncompressed format ideal for temporary storage
Parquet - Compressed format ideal for long-term storage
Installing Arrow Support
Arrow support requires the pyarrow package:
pip install 'py3plex[arrow]'
# or directly
pip install pyarrow
Using Arrow Format
from py3plex.io import read, write
# Feather format (fast, uncompressed)
write(graph, 'network.arrow')
graph = read('network.arrow')
# Parquet format (compressed)
write(graph, 'network.parquet', format='parquet')
graph = read('network.parquet', format='parquet')
Benefits of Arrow Format
Performance: Columnar storage enables fast read/write operations
Compression: Parquet format provides excellent compression ratios
Interoperability: Arrow is an industry-standard format supported by:
pandas, polars (Python data analysis)
Apache Spark (big data processing)
R, Julia (statistical computing)
DuckDB (analytical database)
Type Safety: Schema preservation with strong typing
Zero-Copy: Efficient in-memory representation
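The performance benefits above come from the columnar layout itself: instead of storing one record per edge, each field is stored as a contiguous array. A dependency-free sketch of that transposition (the edge data here is illustrative):

```python
# Row-oriented edge records, as a schema API might hold them
edges = [
    {"src": "alice", "dst": "bob",     "src_layer": "facebook", "weight": 0.8},
    {"src": "bob",   "dst": "charlie", "src_layer": "twitter",  "weight": 0.6},
]

# Columnar (Arrow-style) layout: one contiguous array per field
columns = {key: [edge[key] for edge in edges] for key in edges[0]}

print(columns["weight"])  # [0.8, 0.6] -- a whole column in one contiguous list
```

Scanning or compressing a single column (e.g. all weights) never touches the other fields, which is what makes columnar reads and compression fast.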
Performance Comparison
For a typical multilayer network with 1000 nodes and ~5000 edges:
| Format  | Write Time | Read Time | File Size |
|---|---|---|---|
| Arrow   | 0.016s | 0.008s | 0.46 MB |
| Parquet | 0.020s | 0.010s | 0.35 MB |
| JSON    | 0.046s | 0.030s | 1.09 MB |
In this benchmark, the columnar formats are roughly 2-3x faster than JSON for both reads and writes, and produce files 2-3x smaller.
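Exact numbers vary by machine, so it is worth reproducing them locally. The JSON side of the measurement can be sketched with only the standard library (the synthetic edge list below is illustrative, not the benchmark's actual data):

```python
import json
import os
import tempfile
import time

# Build a synthetic edge list comparable in shape to the benchmark above
edges = [
    {"src": f"n{i}", "dst": f"n{(i * 7) % 1000}", "layer": "l1", "weight": 0.5}
    for i in range(5000)
]

path = os.path.join(tempfile.mkdtemp(), "edges.json")

start = time.perf_counter()
with open(path, "w") as f:
    json.dump(edges, f)
write_time = time.perf_counter() - start

start = time.perf_counter()
with open(path) as f:
    loaded = json.load(f)
read_time = time.perf_counter() - start

print(f"JSON write: {write_time:.4f}s, read: {read_time:.4f}s, "
      f"size: {os.path.getsize(path) / 1e6:.2f} MB")
```

Swapping the `json.dump`/`json.load` calls for `write`/`read` with the other formats gives a like-for-like comparison on your own hardware.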
When to Use Each Format
Use Arrow/Feather when:
You need maximum read/write performance
Working with large networks (>10k nodes)
Interoperating with data science tools (pandas, polars)
Building data pipelines
Use Parquet when:
Long-term storage is important
Minimizing storage costs
Sharing data across platforms
Archiving networks
Use JSON when:
Human readability is important
Working with small networks
Debugging or manual editing
Maximum compatibility needed
Use CSV when:
Working with spreadsheet tools (Excel)
Simple edge lists
Manual data entry/editing
CSV Format with Sidecars
CSV format supports optional sidecar files for node and layer attributes:
from py3plex.io import read, write
# Write with sidecars
write(graph, 'edges.csv', format='csv', write_sidecars=True)
# Creates: edges.csv, nodes.csv, layers.csv
# Read with sidecars
graph = read('edges.csv', format='csv',
nodes_file='nodes.csv',
layers_file='layers.csv')
Integration with NetworkX
Convert between py3plex I/O format and NetworkX:
from py3plex.io import read, to_networkx, from_networkx
# Load graph
graph = read('network.json')
# Convert to NetworkX
G = to_networkx(graph, mode='union') # Merge all layers
# or
G = to_networkx(graph, mode='multiplex') # Preserve layers as (node, layer)
# Convert back from NetworkX
graph = from_networkx(G, mode='multiplex')
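The difference between the two modes comes down to node identity. A dependency-free sketch of the idea (the edge data is illustrative):

```python
# Edges as (src, dst, layer) triples from a two-layer network
edges = [
    ("alice", "bob", "facebook"),
    ("alice", "bob", "twitter"),
]

# mode='union': layers are merged, so both edges collapse onto one node pair
union_nodes = {n for src, dst, _ in edges for n in (src, dst)}

# mode='multiplex': each node becomes a (node, layer) tuple, keeping layers apart
multiplex_nodes = {(n, layer) for src, dst, layer in edges for n in (src, dst)}

print(sorted(union_nodes))      # ['alice', 'bob']
print(sorted(multiplex_nodes))  # [('alice', 'facebook'), ('alice', 'twitter'), ...]
```

Union mode is convenient for standard single-layer algorithms (like the centrality example below), while multiplex mode preserves the layer structure at the cost of more nodes.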
Example: Complete Workflow
Here’s a complete example demonstrating the I/O system:
from py3plex.io import (
MultiLayerGraph, Node, Layer, Edge,
read, write, to_networkx
)
# Create a multilayer network
graph = MultiLayerGraph(directed=True)
# Add layers
for layer_id in ['social', 'work', 'family']:
graph.add_layer(Layer(id=layer_id))
# Add nodes
for name in ['alice', 'bob', 'charlie']:
graph.add_node(Node(id=name))
# Add edges
edges = [
('alice', 'bob', 'social', 'social', 0.8),
('bob', 'charlie', 'work', 'work', 0.6),
('alice', 'charlie', 'family', 'family', 0.9),
]
for src, dst, src_layer, dst_layer, weight in edges:
graph.add_edge(Edge(
src=src, dst=dst,
src_layer=src_layer, dst_layer=dst_layer,
attributes={'weight': weight}
))
# Save in multiple formats
write(graph, 'network.json')
write(graph, 'network.arrow')
write(graph, 'network.parquet')
# Load back
loaded = read('network.arrow')
# Convert to NetworkX for analysis
G = to_networkx(loaded, mode='union')
# Use NetworkX algorithms
import networkx as nx
centrality = nx.degree_centrality(G)
print(f"Most central node: {max(centrality, key=centrality.get)}")
Checking Supported Formats
You can query which formats are available at runtime:
from py3plex.io import supported_formats
formats = supported_formats()
print(f"Read formats: {formats['read']}")
print(f"Write formats: {formats['write']}")
This is useful for checking if optional dependencies (like pyarrow) are installed.
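Such a runtime check typically boils down to probing whether the optional dependency is importable. The sketch below shows one way to do that with the standard library; the mapping and function name are illustrative, not py3plex's actual registry:

```python
import importlib.util

# A format is available only if its optional dependency can be imported;
# this mapping is illustrative, not py3plex's actual registry
OPTIONAL_DEPS = {"arrow": "pyarrow", "parquet": "pyarrow"}
ALWAYS_AVAILABLE = ["json", "jsonl", "csv"]

def available_formats():
    formats = list(ALWAYS_AVAILABLE)
    for fmt, module in OPTIONAL_DEPS.items():
        if importlib.util.find_spec(module) is not None:
            formats.append(fmt)
    return formats

print(available_formats())  # always includes 'json'; 'arrow' only if pyarrow is installed
```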
Schema Validation
The I/O system includes automatic validation:
from py3plex.io import (
MultiLayerGraph, Node, Edge,
ReferentialIntegrityError
)
graph = MultiLayerGraph()
graph.add_node(Node(id='alice'))
try:
# This will fail - bob doesn't exist
graph.add_edge(Edge(
src='alice', dst='bob',
src_layer='l1', dst_layer='l1'
))
except ReferentialIntegrityError as e:
print(f"Validation error: {e}")
Validation ensures:
All edge endpoints reference existing nodes
All edge layers reference existing layers
All attributes are JSON-serializable
No duplicate edges (by src, dst, src_layer, dst_layer, key)
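The first two checks can be sketched without py3plex at all; the exception class and helper below are illustrative stand-ins for the library's actual validation:

```python
class ReferentialIntegrityError(ValueError):
    """Raised when an edge references a missing node or layer (illustrative)."""

def check_edge(edge, node_ids, layer_ids):
    # Endpoints must reference existing nodes
    for endpoint in (edge["src"], edge["dst"]):
        if endpoint not in node_ids:
            raise ReferentialIntegrityError(f"unknown node: {endpoint!r}")
    # Layers must reference existing layers
    for layer in (edge["src_layer"], edge["dst_layer"]):
        if layer not in layer_ids:
            raise ReferentialIntegrityError(f"unknown layer: {layer!r}")

nodes = {"alice"}
layers = {"l1"}

try:
    check_edge({"src": "alice", "dst": "bob", "src_layer": "l1", "dst_layer": "l1"},
               nodes, layers)
except ReferentialIntegrityError as e:
    print(f"Validation error: {e}")  # bob is not a known node
```

Validating at insertion time keeps the graph consistent by construction, so readers and writers never have to handle dangling references.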
Advanced: Custom Formats
The I/O system is extensible. You can register custom format readers/writers:
from py3plex.io import register_reader, register_writer
def my_reader(filepath, **kwargs):
# Custom reading logic
graph = MultiLayerGraph()
# ... populate graph ...
return graph
def my_writer(graph, filepath, **kwargs):
# Custom writing logic
with open(filepath, 'w') as f:
# ... write graph ...
pass
# Register
register_reader('myformat', my_reader)
register_writer('myformat', my_writer)
# Now you can use it
write(graph, 'network.myformat')
graph = read('network.myformat')
Examples
Complete examples are available in examples/io_and_data/:
example_new_io.py - Comprehensive I/O demonstration
example_save_to_arrow.py - Apache Arrow format usage
example_save_to_gpickle.py - NetworkX pickle format
example_save_to_edgelist.py - Edge list format
example_schema_validation.py - Schema validation examples
See Also
5-Minute Quickstart - Getting started guide
Working with Networks - Basic network operations
py3plex Core Model - NetworkX integration details
Performance and Scalability Best Practices - Performance optimization tips