Community Detection Tutorial ============================= This tutorial demonstrates **community detection algorithms** for **multilayer networks** using py3plex. Overview -------- Community detection identifies **groups of nodes** that are more **densely connected** to each other than to the rest of the network. For **multilayer networks**, communities can **span multiple layers**, accounting for both **intra-layer** and **inter-layer** structure. Supported Algorithms -------------------- py3plex provides **several community detection algorithms**: * **Louvain** - **Fast modularity optimization** (recommended for most use cases) * **Infomap** - **Flow-based** community detection (requires external binary) * **Label Propagation** - **Semi-supervised** approach with known seed communities * **Multilayer Modularity** - **True multilayer** community detection (Mucha et al. 2010) Louvain Algorithm ----------------- Basic Usage ~~~~~~~~~~~ **Fastest algorithm** for large networks: .. code-block:: python from py3plex.core import multinet from py3plex.algorithms.community_detection import community_louvain # Create or load network network = multinet.multi_layer_network() network.load_network("data.graphml", input_type="graphml") # Detect communities using Louvain communities = community_louvain.best_partition(network.core_network) # Print results for node, community_id in communities.items(): print(f"Node {node} -> Community {community_id}") Parameters ~~~~~~~~~~ .. code-block:: python # With custom resolution parameter communities = community_louvain.best_partition( network.core_network, resolution=1.0 # Higher = more communities, Lower = fewer communities ) **Resolution parameter:** * ``resolution=1.0`` - **Standard modularity** * ``resolution>1.0`` - **More, smaller communities** * ``resolution<1.0`` - **Fewer, larger communities** Advantages ~~~~~~~~~~ * **Very fast**: :math:`O(n \log n)` * **Scales to millions** of nodes * **BSD license** (commercial-friendly) * **Well-established** and widely used Disadvantages ~~~~~~~~~~~~~ * **Non-deterministic** (random initialization) * **Cannot find overlapping** communities * **Resolution limit** issues Infomap Algorithm ----------------- Basic Usage ~~~~~~~~~~~ Flow-based approach for detecting communities: .. code-block:: python from py3plex.algorithms.community_detection import community_wrapper # Detect communities using Infomap communities = community_wrapper.infomap_communities( network.core_network, binary_path="/path/to/infomap" # Path to Infomap binary ) With Hierarchical Structure ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: python # Get hierarchical community structure hierarchical_communities = community_wrapper.infomap_communities( network.core_network, binary_path="/path/to/infomap", hierarchical=True ) Advantages ~~~~~~~~~~ * Can detect overlapping communities * Flow-based (natural for many applications) * Hierarchical structure * Information-theoretic foundation Disadvantages ~~~~~~~~~~~~~ * Requires external binary * AGPLv3 license (viral copyleft - problematic for commercial use) * Slower than Louvain Label Propagation ----------------- Semi-Supervised Detection ~~~~~~~~~~~~~~~~~~~~~~~~~ Use when you have some known community memberships: .. code-block:: python from py3plex.algorithms.community_detection import label_propagation # Provide seed labels for some nodes seed_labels = { 'node1': 0, 'node2': 0, 'node3': 1, 'node4': 1 } # Propagate labels to unlabeled nodes communities = label_propagation.propagate( network.core_network, seed_labels=seed_labels, max_iter=100 ) Fully Unsupervised ~~~~~~~~~~~~~~~~~~ .. code-block:: python # Without seed labels (random initialization) communities = label_propagation.propagate( network.core_network, max_iter=100 ) Advantages ~~~~~~~~~~ * Very fast: :math:`O(m)` linear in edges * Can incorporate prior knowledge * MIT license * Simple and interpretable Disadvantages ~~~~~~~~~~~~~ * Non-deterministic * Sensitive to initialization * May not converge * Lower quality than Louvain/Infomap Multilayer Modularity --------------------- True Multilayer Detection ~~~~~~~~~~~~~~~~~~~~~~~~~ Accounts for multilayer structure following Mucha et al. (2010): .. code-block:: python from py3plex.algorithms.community_detection import multilayer_modularity as mlm # Get supra-adjacency matrix supra_adj = network.get_supra_adjacency_matrix(sparse=True) # Detect communities with multilayer modularity communities = mlm.multilayer_louvain( supra_adj, gamma=1.0, # Resolution parameter omega=1.0 # Inter-layer coupling strength ) Parameter Tuning ~~~~~~~~~~~~~~~~ .. code-block:: python # Emphasize layer-specific structure communities = mlm.multilayer_louvain( supra_adj, gamma=1.0, omega=0.1 # Low coupling = layer-specific communities ) # Emphasize cross-layer structure communities = mlm.multilayer_louvain( supra_adj, gamma=1.0, omega=10.0 # High coupling = cross-layer communities ) Mathematical Formulation ~~~~~~~~~~~~~~~~~~~~~~~~ Multilayer modularity is defined as: .. math:: Q^{ML} = \frac{1}{2\mu} \sum_{ij\alpha\beta} \left[ (A_{ij}^{[\alpha]} - \gamma_{\alpha} P_{ij}^{[\alpha]})\delta_{\alpha\beta} + \omega_{\alpha\beta}\delta_{ij} \right] \delta(g_{i}^{[\alpha]}, g_{j}^{[\beta]}) Where: * :math:`A_{ij}^{[\alpha]}` is the adjacency matrix of layer :math:`\alpha` * :math:`P_{ij}^{[\alpha]}` is the null model (e.g., configuration model) * :math:`\gamma_{\alpha}` is the resolution parameter for layer :math:`\alpha` * :math:`\omega_{\alpha\beta}` is the coupling strength between layers * :math:`\delta(g_{i}^{[\alpha]}, g_{j}^{[\beta]})` is 1 if nodes are in the same community, 0 otherwise Advantages ~~~~~~~~~~ * Accounts for multilayer structure * Implements state-of-the-art algorithm * Configurable inter-layer coupling * Published in Science (Mucha et al. 2010) Disadvantages ~~~~~~~~~~~~~ * More computationally expensive * Requires parameter tuning * May not scale to very large networks (>100k nodes) Evaluating Community Quality ----------------------------- Modularity Score ~~~~~~~~~~~~~~~~ .. code-block:: python import networkx as nx # Compute modularity modularity = nx.community.modularity(network.core_network, communities) print(f"Modularity: {modularity:.3f}") **Interpretation:** * :math:`Q > 0.3`: Good community structure * :math:`Q > 0.5`: Strong community structure * :math:`Q < 0.3`: Weak or no community structure Coverage and Performance ~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: python # Coverage: fraction of edges within communities coverage = nx.community.coverage(network.core_network, communities) # Performance: fraction of correctly classified node pairs performance = nx.community.performance(network.core_network, communities) print(f"Coverage: {coverage:.3f}") print(f"Performance: {performance:.3f}") Visualizing Communities ----------------------- Color by Community ~~~~~~~~~~~~~~~~~~ .. code-block:: python from py3plex.visualization.multilayer import hairball_plot import matplotlib.pyplot as plt # Map communities to colors node_colors = [communities.get(node, 0) for node in network.core_network.nodes()] # Visualize with community colors hairball_plot( network.core_network, node_color=node_colors, layout_algorithm='force', cmap='tab10' ) plt.show() Community Size Distribution ~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: python from collections import Counter import matplotlib.pyplot as plt # Count community sizes community_sizes = Counter(communities.values()) sizes = list(community_sizes.values()) # Plot distribution plt.hist(sizes, bins=20) plt.xlabel('Community Size') plt.ylabel('Frequency') plt.title('Community Size Distribution') plt.show() Comparing Algorithms -------------------- Run Multiple Algorithms ~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: python # Run different algorithms louvain_comms = community_louvain.best_partition(network.core_network) label_prop_comms = label_propagation.propagate(network.core_network) # Compare number of communities print(f"Louvain: {len(set(louvain_comms.values()))} communities") print(f"Label Prop: {len(set(label_prop_comms.values()))} communities") # Compare modularity louvain_mod = nx.community.modularity(network.core_network, [set(n for n, c in louvain_comms.items() if c == i) for i in set(louvain_comms.values())]) label_mod = nx.community.modularity(network.core_network, [set(n for n, c in label_prop_comms.items() if c == i) for i in set(label_prop_comms.values())]) print(f"Louvain modularity: {louvain_mod:.3f}") print(f"Label Prop modularity: {label_mod:.3f}") Normalized Mutual Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Compare similarity between community structures: .. code-block:: python from sklearn.metrics import normalized_mutual_info_score # Convert to lists louvain_list = [louvain_comms[node] for node in network.core_network.nodes()] label_list = [label_prop_comms[node] for node in network.core_network.nodes()] # Compute NMI nmi = normalized_mutual_info_score(louvain_list, label_list) print(f"NMI between Louvain and Label Prop: {nmi:.3f}") **Interpretation:** * NMI = 1.0: Identical community structures * NMI = 0.0: Completely different structures * NMI > 0.5: Similar structures Best Practices -------------- Algorithm Selection ~~~~~~~~~~~~~~~~~~~ .. list-table:: :header-rows: 1 :widths: 20 20 30 30 * - Network Size - Speed Priority - Quality Priority - Recommendation * - Small (<1K) - Any - Any - Try all algorithms * - Medium (1K-10K) - Louvain - Louvain/Infomap - Louvain (good balance) * - Large (10K-100K) - Louvain/Label Prop - Louvain - Louvain * - Very Large (>100K) - Label Prop - Louvain - Label Prop or sample Parameter Guidelines ~~~~~~~~~~~~~~~~~~~~ **Louvain resolution:** * Start with ``resolution=1.0`` * Increase if communities are too large * Decrease if communities are too fragmented **Multilayer coupling (omega):** * ``omega=1.0`` - Default, balanced * ``omega<1.0`` - Emphasize layer-specific structure * ``omega>1.0`` - Emphasize cross-layer structure Validation ~~~~~~~~~~ Always validate community detection results: 1. **Visual inspection** - Plot and examine communities 2. **Modularity** - Check modularity score (>0.3 is good) 3. **Size distribution** - Check for giant communities or singletons 4. **Domain knowledge** - Do communities make sense for your application? References ---------- **Louvain:** Blondel, V. D., et al. (2008). Fast unfolding of communities in large networks. *Journal of Statistical Mechanics*. **Infomap:** Rosvall, M., & Bergstrom, C. T. (2008). Maps of random walks on complex networks reveal community structure. *PNAS*, 105(4), 1118-1123. **Multilayer Modularity:** Mucha, P. J., et al. (2010). Community structure in time-dependent, multiscale, and multiplex networks. *Science*, 328(5980), 876-878. See :doc:`../citation` for complete citations with DOIs. Next Steps ---------- * :doc:`./multilayer_centrality` - Centrality measures tutorial * :doc:`./multilayer_modularity` - Multilayer modularity details * :doc:`../algorithm_guide` - Algorithm selection guide * Examples in ``examples/communities/example_community_detection.py``