Community Detection Tutorial
=============================

This tutorial demonstrates **community detection algorithms** for **multilayer networks** using py3plex.

Overview
--------

Community detection identifies **groups of nodes** that are more **densely connected** to each other than to the rest of the network. For **multilayer networks**, communities can **span multiple layers**, accounting for both **intra-layer** and **inter-layer** structure.

Supported Algorithms
--------------------

py3plex provides **several community detection algorithms**:

* **Louvain** - **Fast modularity optimization** (recommended for most use cases)
* **Infomap** - **Flow-based** community detection (requires external binary)
* **Label Propagation** - **Semi-supervised** approach with known seed communities
* **Multilayer Modularity** - **True multilayer** community detection (Mucha et al. 2010)

Louvain Algorithm
-----------------

Basic Usage
~~~~~~~~~~~

**Fastest algorithm** for large networks:

.. code-block:: python

    from py3plex.core import multinet
    from py3plex.algorithms.community_detection import community_louvain
    
    # Create or load network
    network = multinet.multi_layer_network()
    network.load_network("data.graphml", input_type="graphml")
    
    # Detect communities using Louvain
    communities = community_louvain.best_partition(network.core_network)
    
    # Print results
    for node, community_id in communities.items():
        print(f"Node {node} -> Community {community_id}")

Parameters
~~~~~~~~~~

.. code-block:: python

    # With custom resolution parameter
    communities = community_louvain.best_partition(
        network.core_network,
        resolution=1.0  # Higher = more communities, Lower = fewer communities
    )

**Resolution parameter:**

* ``resolution=1.0`` - **Standard modularity**
* ``resolution>1.0`` - **More, smaller communities**
* ``resolution<1.0`` - **Fewer, larger communities**

Advantages
~~~~~~~~~~

* **Very fast**: :math:`O(n \log n)`
* **Scales to millions** of nodes
* **BSD license** (commercial-friendly)
* **Well-established** and widely used

Disadvantages
~~~~~~~~~~~~~

* **Non-deterministic** (random initialization)
* **Cannot find overlapping** communities
* **Resolution limit** issues

Infomap Algorithm
-----------------

Basic Usage
~~~~~~~~~~~

Flow-based approach for detecting communities:

.. code-block:: python

    from py3plex.algorithms.community_detection import community_wrapper
    
    # Detect communities using Infomap
    communities = community_wrapper.infomap_communities(
        network.core_network,
        binary_path="/path/to/infomap"  # Path to Infomap binary
    )

With Hierarchical Structure
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    # Get hierarchical community structure
    hierarchical_communities = community_wrapper.infomap_communities(
        network.core_network,
        binary_path="/path/to/infomap",
        hierarchical=True
    )

Advantages
~~~~~~~~~~

* Can detect overlapping communities
* Flow-based (natural for many applications)
* Hierarchical structure
* Information-theoretic foundation

Disadvantages
~~~~~~~~~~~~~

* Requires external binary
* AGPLv3 license (viral copyleft - problematic for commercial use)
* Slower than Louvain

Label Propagation
-----------------

Semi-Supervised Detection
~~~~~~~~~~~~~~~~~~~~~~~~~

Use when you have some known community memberships:

.. code-block:: python

    from py3plex.algorithms.community_detection import label_propagation
    
    # Provide seed labels for some nodes
    seed_labels = {
        'node1': 0,
        'node2': 0,
        'node3': 1,
        'node4': 1
    }
    
    # Propagate labels to unlabeled nodes
    communities = label_propagation.propagate(
        network.core_network,
        seed_labels=seed_labels,
        max_iter=100
    )

Fully Unsupervised
~~~~~~~~~~~~~~~~~~

.. code-block:: python

    # Without seed labels (random initialization)
    communities = label_propagation.propagate(
        network.core_network,
        max_iter=100
    )

Advantages
~~~~~~~~~~

* Very fast: :math:`O(m)` linear in edges
* Can incorporate prior knowledge
* MIT license
* Simple and interpretable

Disadvantages
~~~~~~~~~~~~~

* Non-deterministic
* Sensitive to initialization
* May not converge
* Lower quality than Louvain/Infomap

Multilayer Modularity
---------------------

True Multilayer Detection
~~~~~~~~~~~~~~~~~~~~~~~~~

Accounts for multilayer structure following Mucha et al. (2010):

.. code-block:: python

    from py3plex.algorithms.community_detection import multilayer_modularity as mlm
    
    # Get supra-adjacency matrix
    supra_adj = network.get_supra_adjacency_matrix(sparse=True)
    
    # Detect communities with multilayer modularity
    communities = mlm.multilayer_louvain(
        supra_adj,
        gamma=1.0,      # Resolution parameter
        omega=1.0       # Inter-layer coupling strength
    )

Parameter Tuning
~~~~~~~~~~~~~~~~

.. code-block:: python

    # Emphasize layer-specific structure
    communities = mlm.multilayer_louvain(
        supra_adj,
        gamma=1.0,
        omega=0.1  # Low coupling = layer-specific communities
    )
    
    # Emphasize cross-layer structure
    communities = mlm.multilayer_louvain(
        supra_adj,
        gamma=1.0,
        omega=10.0  # High coupling = cross-layer communities
    )

Mathematical Formulation
~~~~~~~~~~~~~~~~~~~~~~~~

Multilayer modularity is defined as:

.. math::

    Q^{ML} = \frac{1}{2\mu} \sum_{ij\alpha\beta} \left[ (A_{ij}^{[\alpha]} - \gamma_{\alpha} P_{ij}^{[\alpha]})\delta_{\alpha\beta} + \omega_{\alpha\beta}\delta_{ij} \right] \delta(g_{i}^{[\alpha]}, g_{j}^{[\beta]})

Where:

* :math:`A_{ij}^{[\alpha]}` is the adjacency matrix of layer :math:`\alpha`
* :math:`P_{ij}^{[\alpha]}` is the null model (e.g., configuration model)
* :math:`\gamma_{\alpha}` is the resolution parameter for layer :math:`\alpha`
* :math:`\omega_{\alpha\beta}` is the coupling strength between layers
* :math:`\delta(g_{i}^{[\alpha]}, g_{j}^{[\beta]})` is 1 if nodes are in the same community, 0 otherwise

Advantages
~~~~~~~~~~

* Accounts for multilayer structure
* Implements state-of-the-art algorithm
* Configurable inter-layer coupling
* Published in Science (Mucha et al. 2010)

Disadvantages
~~~~~~~~~~~~~

* More computationally expensive
* Requires parameter tuning
* May not scale to very large networks (>100k nodes)

Evaluating Community Quality
-----------------------------

Modularity Score
~~~~~~~~~~~~~~~~

.. code-block:: python

    import networkx as nx
    
    # Compute modularity
    modularity = nx.community.modularity(network.core_network, communities)
    print(f"Modularity: {modularity:.3f}")

**Interpretation:**

* :math:`Q > 0.3`: Good community structure
* :math:`Q > 0.5`: Strong community structure
* :math:`Q < 0.3`: Weak or no community structure

Coverage and Performance
~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    # Coverage: fraction of edges within communities
    coverage = nx.community.coverage(network.core_network, communities)
    
    # Performance: fraction of correctly classified node pairs
    performance = nx.community.performance(network.core_network, communities)
    
    print(f"Coverage: {coverage:.3f}")
    print(f"Performance: {performance:.3f}")

Visualizing Communities
-----------------------

Color by Community
~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from py3plex.visualization.multilayer import hairball_plot
    import matplotlib.pyplot as plt
    
    # Map communities to colors
    node_colors = [communities.get(node, 0) for node in network.core_network.nodes()]
    
    # Visualize with community colors
    hairball_plot(
        network.core_network,
        node_color=node_colors,
        layout_algorithm='force',
        cmap='tab10'
    )
    plt.show()

Community Size Distribution
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from collections import Counter
    import matplotlib.pyplot as plt
    
    # Count community sizes
    community_sizes = Counter(communities.values())
    sizes = list(community_sizes.values())
    
    # Plot distribution
    plt.hist(sizes, bins=20)
    plt.xlabel('Community Size')
    plt.ylabel('Frequency')
    plt.title('Community Size Distribution')
    plt.show()

Comparing Algorithms
--------------------

Run Multiple Algorithms
~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    # Run different algorithms
    louvain_comms = community_louvain.best_partition(network.core_network)
    label_prop_comms = label_propagation.propagate(network.core_network)
    
    # Compare number of communities
    print(f"Louvain: {len(set(louvain_comms.values()))} communities")
    print(f"Label Prop: {len(set(label_prop_comms.values()))} communities")
    
    # Compare modularity
    louvain_mod = nx.community.modularity(network.core_network, 
                                          [set(n for n, c in louvain_comms.items() if c == i) 
                                           for i in set(louvain_comms.values())])
    label_mod = nx.community.modularity(network.core_network,
                                        [set(n for n, c in label_prop_comms.items() if c == i)
                                         for i in set(label_prop_comms.values())])
    
    print(f"Louvain modularity: {louvain_mod:.3f}")
    print(f"Label Prop modularity: {label_mod:.3f}")

Normalized Mutual Information
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Compare similarity between community structures:

.. code-block:: python

    from sklearn.metrics import normalized_mutual_info_score
    
    # Convert to lists
    louvain_list = [louvain_comms[node] for node in network.core_network.nodes()]
    label_list = [label_prop_comms[node] for node in network.core_network.nodes()]
    
    # Compute NMI
    nmi = normalized_mutual_info_score(louvain_list, label_list)
    print(f"NMI between Louvain and Label Prop: {nmi:.3f}")

**Interpretation:**

* NMI = 1.0: Identical community structures
* NMI = 0.0: Completely different structures
* NMI > 0.5: Similar structures

Best Practices
--------------

Algorithm Selection
~~~~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1
   :widths: 20 20 30 30

   * - Network Size
     - Speed Priority
     - Quality Priority
     - Recommendation
   * - Small (<1K)
     - Any
     - Any
     - Try all algorithms
   * - Medium (1K-10K)
     - Louvain
     - Louvain/Infomap
     - Louvain (good balance)
   * - Large (10K-100K)
     - Louvain/Label Prop
     - Louvain
     - Louvain
   * - Very Large (>100K)
     - Label Prop
     - Louvain
     - Label Prop or sample

Parameter Guidelines
~~~~~~~~~~~~~~~~~~~~

**Louvain resolution:**

* Start with ``resolution=1.0``
* Increase if communities are too large
* Decrease if communities are too fragmented

**Multilayer coupling (omega):**

* ``omega=1.0`` - Default, balanced
* ``omega<1.0`` - Emphasize layer-specific structure
* ``omega>1.0`` - Emphasize cross-layer structure

Validation
~~~~~~~~~~

Always validate community detection results:

1. **Visual inspection** - Plot and examine communities
2. **Modularity** - Check modularity score (>0.3 is good)
3. **Size distribution** - Check for giant communities or singletons
4. **Domain knowledge** - Do communities make sense for your application?

References
----------

**Louvain:**

Blondel, V. D., et al. (2008). Fast unfolding of communities in large networks. *Journal of Statistical Mechanics*.

**Infomap:**

Rosvall, M., & Bergstrom, C. T. (2008). Maps of random walks on complex networks reveal community structure. *PNAS*, 105(4), 1118-1123.

**Multilayer Modularity:**

Mucha, P. J., et al. (2010). Community structure in time-dependent, multiscale, and multiplex networks. *Science*, 328(5980), 876-878.

See :doc:`../citation` for complete citations with DOIs.

Next Steps
----------

* :doc:`./multilayer_centrality` - Centrality measures tutorial
* :doc:`./multilayer_modularity` - Multilayer modularity details
* :doc:`../algorithm_guide` - Algorithm selection guide
* Examples in ``examples/communities/example_community_detection.py``