CSV Data Loading Examples
==========================

This guide provides practical examples for loading multilayer network data from CSV files, one of the most common formats for network data.

.. contents:: Table of Contents
   :local:
   :depth: 2

Standard CSV Schema
-------------------

Multilayer Edge List Format
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The most common format for multilayer networks is a CSV with these columns:

**Required columns:** ``source``, ``target``, ``layer``

**Optional columns:** ``weight`` (defaults to 1.0)

**Example CSV file** (save as ``network.csv``):

.. code-block:: text

    source,target,layer,weight
    A,B,collaboration,1.0
    A,C,dependency,0.8
    B,C,collaboration,1.0
    A,B,dependency,0.5
    C,D,collaboration,0.9

Loading Multilayer CSV
~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from py3plex.core import multinet

    # Create network object
    network = multinet.multi_layer_network()

    # Load from CSV
    network.load_network(
        "network.csv",
        input_type="multiedgelist",
        directed=False
    )

    # Display basic statistics
    network.basic_stats()

**Expected Output:**

.. code-block:: text

    Number of nodes: 4
    Number of edges: 5
    Number of unique nodes (as node-layer tuples): 7
    Number of unique node IDs (across all layers): 4
    Nodes per layer:
      Layer 'collaboration': 4 nodes
      Layer 'dependency': 3 nodes

Simple Edge List Format
~~~~~~~~~~~~~~~~~~~~~~~~

For single-layer networks, use a simpler format:

**Required columns:** ``source``, ``target``

**Optional columns:** ``weight`` (defaults to 1.0)

**Example CSV file** (save as ``simple_network.csv``):

.. code-block:: text

    source,target,weight
    A,B,1.0
    B,C,0.8
    C,D,1.5
    D,A,0.9

Loading Simple CSV
~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from py3plex.core import multinet

    network = multinet.multi_layer_network()

    # Load simple edge list
    network.load_network(
        "simple_network.csv",
        input_type="edgelist",
        directed=False
    )

    network.basic_stats()

Creating CSV from Pandas DataFrame
-----------------------------------

Building Network from Existing Data
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If your data is in a Pandas DataFrame, write it to a CSV file and load that file:

.. code-block:: python

    import pandas as pd
    from py3plex.core import multinet

    # Create sample data
    data = {
        'source': ['Alice', 'Alice', 'Bob', 'Bob', 'Charlie'],
        'target': ['Bob', 'Charlie', 'Charlie', 'David', 'David'],
        'layer': ['friendship', 'work', 'friendship', 'work', 'friendship'],
        'weight': [1.0, 0.8, 1.0, 0.6, 0.9]
    }
    df = pd.DataFrame(data)

    # Save to CSV
    df.to_csv('social_network.csv', index=False)

    # Load into py3plex
    network = multinet.multi_layer_network()
    network.load_network('social_network.csv', input_type="multiedgelist")

    print(f"Loaded {network.core_network.number_of_nodes()} nodes")
    print(f"Loaded {network.core_network.number_of_edges()} edges")

Alternative Formats
-------------------

Space-Delimited Multiedgelist
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Py3plex also supports space-delimited files (no header):

**Format:** ``source layer1 target layer2 [weight]``

**Example file** (save as ``network.txt``):

.. code-block:: text

    A layer1 B layer1 1.0
    B layer1 C layer1 1.0
    A layer2 B layer2 0.8
    B layer2 D layer2 0.6

.. code-block:: python

    from py3plex.core import multinet

    # Load space-delimited format
    network = multinet.multi_layer_network()
    network.load_network(
        "network.txt",
        input_type="multiedgelist",
        directed=False
    )

Common Issues and Solutions
----------------------------

Issue: "Layer name missing" error
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Problem:** The CSV is missing the ``layer`` column.

**Solution 1:** Add a ``layer`` column to the CSV

.. code-block:: python

    import pandas as pd

    # Read CSV without layer
    df = pd.read_csv('network.csv')

    # Add default layer
    df['layer'] = 'layer1'

    # Save updated CSV
    df.to_csv('network_with_layer.csv', index=False)

**Solution 2:** Use the ``edgelist`` format for single-layer networks

.. code-block:: python

    from py3plex.core import multinet

    network = multinet.multi_layer_network()

    # If your data has only source/target columns
    network.load_network(
        "network.csv",
        input_type="edgelist",  # Use edgelist instead of multiedgelist
        directed=False
    )

Issue: "Could not load network" error
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Problem:** The CSV format doesn't match the expected schema.

**Solution:** Verify the CSV structure

.. code-block:: python

    import pandas as pd

    # Check CSV columns
    df = pd.read_csv('network.csv')
    print("Columns:", df.columns.tolist())
    print("\nFirst few rows:")
    print(df.head())

    # Verify required columns exist
    required = ['source', 'target', 'layer']
    missing = [col for col in required if col not in df.columns]
    if missing:
        print(f"\nMissing columns: {missing}")
    else:
        print("\n[OK] All required columns present")

Issue: Encoding errors with special characters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Problem:** The CSV contains non-ASCII characters (é, ñ, 中文, etc.).

**Solution:** Specify the encoding when creating the CSV

.. code-block:: python

    import pandas as pd

    # df is an existing DataFrame of edges
    # Save with UTF-8 encoding
    df.to_csv('network.csv', index=False, encoding='utf-8')

    # If loading fails, try reading with explicit encoding
    df = pd.read_csv('network.csv', encoding='utf-8')

Complete Example Workflow
--------------------------

End-to-End Pipeline: CSV → Analysis → Visualization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    import pandas as pd
    import matplotlib.pyplot as plt
    from py3plex.core import multinet
    from py3plex.visualization.multilayer import draw_multilayer_default
    from py3plex.algorithms.community_detection import community_louvain
    from py3plex.algorithms.statistics import basic_statistics

    # Step 1: Create sample CSV data
    data = {
        'source': ['A', 'A', 'B', 'B', 'C', 'C', 'D'],
        'target': ['B', 'C', 'C', 'D', 'D', 'E', 'E'],
        'layer': ['social', 'work', 'social', 'work', 'social', 'work', 'social'],
        'weight': [1.0, 0.8, 1.0, 0.7, 0.9, 0.6, 0.8]
    }
    df = pd.DataFrame(data)
    df.to_csv('example_network.csv', index=False)

    # Step 2: Load network
    network = multinet.multi_layer_network()
    network.load_network('example_network.csv', input_type="multiedgelist")

    # Step 3: Basic analysis
    print("=== Network Statistics ===")
    network.basic_stats()

    # Step 4: Community detection
    communities = community_louvain.best_partition(network.core_network)
    print(f"\nDetected {len(set(communities.values()))} communities")

    # Step 5: Centrality analysis
    hubs = basic_statistics.identify_n_hubs(network.core_network, top_n=3)
    print("\nTop 3 hub nodes:")
    for node, degree in hubs.items():
        print(f"  {node}: degree {degree}")

    # Step 6: Visualization
    fig, ax = plt.subplots(1, 1, figsize=(10, 8))
    draw_multilayer_default(
        network.get_layers(),
        display=False,
        axis=ax,
        labels=True,
        background_shape="circle"
    )
    plt.title("Multilayer Network from CSV")
    plt.savefig('network_visualization.png', dpi=300, bbox_inches='tight')
    print("\n[OK] Visualization saved to network_visualization.png")

    # Step 7: Export to NetworkX for further analysis
    nx_graph = network.to_nx_network()
    print(f"\n[OK] Exported to NetworkX: {nx_graph.number_of_nodes()} nodes, "
          f"{nx_graph.number_of_edges()} edges")

**Expected Output:**

.. code-block:: text

    === Network Statistics ===
    Number of nodes: 5
    Number of edges: 7
    Number of unique nodes (as node-layer tuples): 10
    Number of unique node IDs (across all layers): 5
    Nodes per layer:
      Layer 'social': 5 nodes
      Layer 'work': 5 nodes

    Detected 2 communities

    Top 3 hub nodes:
      ('C', 'social'): degree 3
      ('D', 'social'): degree 3
      ('B', 'social'): degree 2

    [OK] Visualization saved to network_visualization.png

    [OK] Exported to NetworkX: 10 nodes, 7 edges

Validation Before Loading
--------------------------

Pre-validate CSV Format
~~~~~~~~~~~~~~~~~~~~~~~

Use the validation module to check the CSV format before loading:

.. code-block:: python

    from py3plex.core import multinet
    from py3plex.validation import validate_network_data, ParsingError

    try:
        # Validate before loading
        validate_network_data('network.csv', 'multiedgelist')
        print("[OK] Validation passed")

        # Safe to load
        network = multinet.multi_layer_network()
        network.load_network('network.csv', input_type='multiedgelist')
    except ParsingError as e:
        print(f"[X] Validation failed:\n{e}")
        # Fix CSV and try again

This performs the following checks:

- File exists and is readable
- Required columns are present
- CSV format is valid
- Data types are appropriate

Tips and Best Practices
------------------------

1. **Always use column names** in CSV (first row should be header)
2. **Use UTF-8 encoding** for files with special characters
3. **Validate data** before loading large networks
4. **Use meaningful layer names** (e.g., 'friendship', 'work' instead of 'layer1', 'layer2')
5. **Normalize weights** if needed (e.g., scale to [0, 1] range)
6. **Handle missing values** appropriately:

   .. code-block:: python

       import pandas as pd

       # Load CSV and handle missing values
       df = pd.read_csv('network.csv')

       # Fill missing weights with default
       df['weight'] = df['weight'].fillna(1.0)

       # Remove rows with a missing source, target, or layer
       df = df.dropna(subset=['source', 'target', 'layer'])

       # Save cleaned data
       df.to_csv('network_cleaned.csv', index=False)

Next Steps
----------

After loading CSV data:

* :doc:`../getting_started/tutorial_10min` - Learn basic operations and analysis
* :doc:`../user_guide/visualization` - Create publication-quality visualizations
* :doc:`../user_guide/community_detection` - Detect communities and clusters

For more examples, see the ``examples/`` directory in the GitHub repository.
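
As a follow-up to tip 5 on normalizing weights, here is a minimal sketch of min-max scaling with pandas, done before the CSV is written. It assumes a ``weight`` column as in the schema above. The inline sample data and the ``weight_norm`` column name are illustrative only and are not part of py3plex:

.. code-block:: python

    import pandas as pd

    # Illustrative edge data; in practice this would come from pd.read_csv('network.csv')
    df = pd.DataFrame({
        'source': ['A', 'A', 'B'],
        'target': ['B', 'C', 'C'],
        'layer': ['social', 'work', 'social'],
        'weight': [2.0, 5.0, 8.0],
    })

    w_min, w_max = df['weight'].min(), df['weight'].max()

    if w_max > w_min:
        # Min-max scaling maps the smallest weight to 0 and the largest to 1
        df['weight_norm'] = (df['weight'] - w_min) / (w_max - w_min)
    else:
        # All weights are equal, so the range is zero; fall back to a constant
        df['weight_norm'] = 1.0

Min-max scaling assigns weight 0 to the lightest edge. If every edge should keep a positive weight, dividing by the maximum weight instead is a simple alternative.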