Community Detection Tutorial ============================= This tutorial demonstrates community detection algorithms for multilayer networks using py3plex. Overview -------- Community detection identifies groups of nodes that are more densely connected to each other than to the rest of the network. For multilayer networks, communities can span multiple layers, accounting for both intra-layer and inter-layer structure. Supported Algorithms -------------------- py3plex provides several community detection algorithms: * **Louvain** - Fast modularity optimization (recommended for most use cases) * **Infomap** - Flow-based community detection (requires external binary) * **Label Propagation** - Semi-supervised approach with known seed communities * **Multilayer Modularity** - True multilayer community detection (Mucha et al. 2010) Louvain Algorithm ----------------- Basic Usage ~~~~~~~~~~~ Fastest algorithm for large networks: .. code-block:: python from py3plex.core import multinet from py3plex.algorithms.community_detection import community_louvain # Create or load network network = multinet.multi_layer_network() network.load_network("data.graphml", input_type="graphml") # Detect communities using Louvain communities = community_louvain.best_partition(network.core_network) # Print results for node, community_id in communities.items(): print(f"Node {node} -> Community {community_id}") Parameters ~~~~~~~~~~ .. code-block:: python # With custom resolution parameter communities = community_louvain.best_partition( network.core_network, resolution=1.0 # Higher = more communities, Lower = fewer communities ) **Resolution parameter:** * ``resolution=1.0`` - Standard modularity * ``resolution>1.0`` - More, smaller communities * ``resolution<1.0`` - Fewer, larger communities Advantages ~~~~~~~~~~ * Very fast: :math:`O(n \\log n)` * Scales to millions of nodes * BSD license (commercial-friendly) * Well-established and widely used Disadvantages ~~~~~~~~~~~~~ * Non-deterministic (random initialization) * Cannot find overlapping communities * Resolution limit issues Infomap Algorithm ----------------- Basic Usage ~~~~~~~~~~~ Flow-based approach for detecting communities: .. code-block:: python from py3plex.algorithms.community_detection import community_wrapper # Detect communities using Infomap communities = community_wrapper.infomap_communities( network.core_network, binary_path="/path/to/infomap" # Path to Infomap binary ) With Hierarchical Structure ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: python # Get hierarchical community structure hierarchical_communities = community_wrapper.infomap_communities( network.core_network, binary_path="/path/to/infomap", hierarchical=True ) Advantages ~~~~~~~~~~ * Can detect overlapping communities * Flow-based (natural for many applications) * Hierarchical structure * Information-theoretic foundation Disadvantages ~~~~~~~~~~~~~ * Requires external binary * AGPLv3 license (viral copyleft - problematic for commercial use) * Slower than Louvain Label Propagation ----------------- Semi-Supervised Detection ~~~~~~~~~~~~~~~~~~~~~~~~~ Use when you have some known community memberships: .. code-block:: python from py3plex.algorithms.community_detection import label_propagation # Provide seed labels for some nodes seed_labels = { 'node1': 0, 'node2': 0, 'node3': 1, 'node4': 1 } # Propagate labels to unlabeled nodes communities = label_propagation.propagate( network.core_network, seed_labels=seed_labels, max_iter=100 ) Fully Unsupervised ~~~~~~~~~~~~~~~~~~ .. code-block:: python # Without seed labels (random initialization) communities = label_propagation.propagate( network.core_network, max_iter=100 ) Advantages ~~~~~~~~~~ * Very fast: :math:`O(m)` linear in edges * Can incorporate prior knowledge * MIT license * Simple and interpretable Disadvantages ~~~~~~~~~~~~~ * Non-deterministic * Sensitive to initialization * May not converge * Lower quality than Louvain/Infomap Multilayer Modularity --------------------- True Multilayer Detection ~~~~~~~~~~~~~~~~~~~~~~~~~ Accounts for multilayer structure following Mucha et al. (2010): .. code-block:: python from py3plex.algorithms.community_detection import multilayer_modularity as mlm # Get supra-adjacency matrix supra_adj = network.get_supra_adjacency_matrix(sparse=True) # Detect communities with multilayer modularity communities = mlm.multilayer_louvain( supra_adj, gamma=1.0, # Resolution parameter omega=1.0 # Inter-layer coupling strength ) Parameter Tuning ~~~~~~~~~~~~~~~~ .. code-block:: python # Emphasize layer-specific structure communities = mlm.multilayer_louvain( supra_adj, gamma=1.0, omega=0.1 # Low coupling = layer-specific communities ) # Emphasize cross-layer structure communities = mlm.multilayer_louvain( supra_adj, gamma=1.0, omega=10.0 # High coupling = cross-layer communities ) Mathematical Formulation ~~~~~~~~~~~~~~~~~~~~~~~~ Multilayer modularity is defined as: .. math:: Q^{ML} = \\frac{1}{2\\mu} \\sum_{ij\\alpha\\beta} \\left[ (A_{ij}^{[\\alpha]} - \\gamma_{\\alpha} P_{ij}^{[\\alpha]})\\delta_{\\alpha\\beta} + \\omega_{\\alpha\\beta}\\delta_{ij} \\right] \\delta(g_{i}^{[\\alpha]}, g_{j}^{[\\beta]}) Where: * :math:`A_{ij}^{[\\alpha]}` is the adjacency matrix of layer :math:`\\alpha` * :math:`P_{ij}^{[\\alpha]}` is the null model (e.g., configuration model) * :math:`\\gamma_{\\alpha}` is the resolution parameter for layer :math:`\\alpha` * :math:`\\omega_{\\alpha\\beta}` is the coupling strength between layers * :math:`\\delta(g_{i}^{[\\alpha]}, g_{j}^{[\\beta]})` is 1 if nodes are in the same community, 0 otherwise Advantages ~~~~~~~~~~ * Accounts for multilayer structure * Implements state-of-the-art algorithm * Configurable inter-layer coupling * Published in Science (Mucha et al. 2010) Disadvantages ~~~~~~~~~~~~~ * More computationally expensive * Requires parameter tuning * May not scale to very large networks (>100k nodes) Evaluating Community Quality ----------------------------- Modularity Score ~~~~~~~~~~~~~~~~ .. code-block:: python import networkx as nx # Compute modularity modularity = nx.community.modularity(network.core_network, communities) print(f"Modularity: {modularity:.3f}") **Interpretation:** * :math:`Q > 0.3`: Good community structure * :math:`Q > 0.5`: Strong community structure * :math:`Q < 0.3`: Weak or no community structure Coverage and Performance ~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: python # Coverage: fraction of edges within communities coverage = nx.community.coverage(network.core_network, communities) # Performance: fraction of correctly classified node pairs performance = nx.community.performance(network.core_network, communities) print(f"Coverage: {coverage:.3f}") print(f"Performance: {performance:.3f}") Visualizing Communities ----------------------- Color by Community ~~~~~~~~~~~~~~~~~~ .. code-block:: python from py3plex.visualization.multilayer import hairball_plot import matplotlib.pyplot as plt # Map communities to colors node_colors = [communities.get(node, 0) for node in network.core_network.nodes()] # Visualize with community colors hairball_plot( network.core_network, node_color=node_colors, layout_algorithm='force', cmap='tab10' ) plt.show() Community Size Distribution ~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: python from collections import Counter import matplotlib.pyplot as plt # Count community sizes community_sizes = Counter(communities.values()) sizes = list(community_sizes.values()) # Plot distribution plt.hist(sizes, bins=20) plt.xlabel('Community Size') plt.ylabel('Frequency') plt.title('Community Size Distribution') plt.show() Comparing Algorithms -------------------- Run Multiple Algorithms ~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: python # Run different algorithms louvain_comms = community_louvain.best_partition(network.core_network) label_prop_comms = label_propagation.propagate(network.core_network) # Compare number of communities print(f"Louvain: {len(set(louvain_comms.values()))} communities") print(f"Label Prop: {len(set(label_prop_comms.values()))} communities") # Compare modularity louvain_mod = nx.community.modularity(network.core_network, [set(n for n, c in louvain_comms.items() if c == i) for i in set(louvain_comms.values())]) label_mod = nx.community.modularity(network.core_network, [set(n for n, c in label_prop_comms.items() if c == i) for i in set(label_prop_comms.values())]) print(f"Louvain modularity: {louvain_mod:.3f}") print(f"Label Prop modularity: {label_mod:.3f}") Normalized Mutual Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Compare similarity between community structures: .. code-block:: python from sklearn.metrics import normalized_mutual_info_score # Convert to lists louvain_list = [louvain_comms[node] for node in network.core_network.nodes()] label_list = [label_prop_comms[node] for node in network.core_network.nodes()] # Compute NMI nmi = normalized_mutual_info_score(louvain_list, label_list) print(f"NMI between Louvain and Label Prop: {nmi:.3f}") **Interpretation:** * NMI = 1.0: Identical community structures * NMI = 0.0: Completely different structures * NMI > 0.5: Similar structures Best Practices -------------- Algorithm Selection ~~~~~~~~~~~~~~~~~~~ .. list-table:: :header-rows: 1 :widths: 20 20 30 30 * - Network Size - Speed Priority - Quality Priority - Recommendation * - Small (<1K) - Any - Any - Try all algorithms * - Medium (1K-10K) - Louvain - Louvain/Infomap - Louvain (good balance) * - Large (10K-100K) - Louvain/Label Prop - Louvain - Louvain * - Very Large (>100K) - Label Prop - Louvain - Label Prop or sample Parameter Guidelines ~~~~~~~~~~~~~~~~~~~~ **Louvain resolution:** * Start with ``resolution=1.0`` * Increase if communities are too large * Decrease if communities are too fragmented **Multilayer coupling (omega):** * ``omega=1.0`` - Default, balanced * ``omega<1.0`` - Emphasize layer-specific structure * ``omega>1.0`` - Emphasize cross-layer structure Validation ~~~~~~~~~~ Always validate community detection results: 1. **Visual inspection** - Plot and examine communities 2. **Modularity** - Check modularity score (>0.3 is good) 3. **Size distribution** - Check for giant communities or singletons 4. **Domain knowledge** - Do communities make sense for your application? References ---------- **Louvain:** Blondel, V. D., et al. (2008). Fast unfolding of communities in large networks. *Journal of Statistical Mechanics*. **Infomap:** Rosvall, M., & Bergstrom, C. T. (2008). Maps of random walks on complex networks reveal community structure. *PNAS*, 105(4), 1118-1123. **Multilayer Modularity:** Mucha, P. J., et al. (2010). Community structure in time-dependent, multiscale, and multiplex networks. *Science*, 328(5980), 876-878. See :doc:`../citation` for complete citations with DOIs. Next Steps ---------- * :doc:`multilayer_centrality` - Centrality measures tutorial * :doc:`multilayer_modularity` - Multilayer modularity details * :doc:`../algorithm_guide` - Algorithm selection guide * Examples in ``examples/example_community_detection.py``