Attributed Node Clustering

Overview

class cdlib.AttrNodeClustering(communities: list, graph: object, method_name: str = '', coms_labels: dict | None = None, method_parameters: dict | None = None, overlap: bool = False)

Attribute Node Communities representation.

Parameters:
  • communities – list of communities

  • graph – a networkx/igraph object

  • method_name – community discovery algorithm name

  • coms_labels – dictionary specifying for each community the frequency of the attribute values

  • method_parameters – configuration for the community discovery algorithm used

  • overlap – boolean, whether the partition is overlapping or not

adjusted_mutual_information(clustering: Clustering) MatchingResult

Adjusted Mutual Information between two clusterings.

Adjusted Mutual Information (AMI) is an adjustment of the Mutual Information (MI) score to account for chance. It accounts for the fact that the MI is generally higher for two clusterings with a larger number of clusters, regardless of whether there is actually more information shared. For two clusterings \(U\) and \(V\), the AMI is given as:

AMI(U, V) = [MI(U, V) - E(MI(U, V))] / [max(H(U), H(V)) - E(MI(U, V))]

This metric is independent of the absolute values of the labels: a permutation of the class or cluster label values won’t change the score value in any way.

This metric is furthermore symmetric: switching label_true with label_pred will return the same score value. This can be useful to measure the agreement of two independent label assignments strategies on the same dataset when the real ground truth is not known.

Be mindful that this function is an order of magnitude slower than other metrics, such as the Adjusted Rand Index.

Parameters:

clustering – NodeClustering object

Returns:

AMI score

Example:

>>> from cdlib import algorithms
>>> g = nx.karate_club_graph()
>>> communities = algorithms.louvain(g)
>>> leiden_communities = algorithms.leiden(g)
>>> mod = communities.adjusted_mutual_information(leiden_communities)
Reference:

  1. Vinh, N. X., Epps, J., & Bailey, J. (2010). Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance. Journal of Machine Learning Research, 11(Oct), 2837-2854.

adjusted_rand_index(clustering: Clustering) MatchingResult

Rand index adjusted for chance.

The Rand Index computes a similarity measure between two clusterings by considering all pairs of samples and counting pairs that are assigned in the same or different clusters in the predicted and true clusterings.

The raw RI score is then “adjusted for chance” into the ARI score using the following scheme:

ARI = (RI - Expected_RI) / (max(RI) - Expected_RI)

The adjusted Rand index is thus ensured to have a value close to 0.0 for random labeling independently of the number of clusters and samples and exactly 1.0 when the clusterings are identical (up to a permutation).

ARI is a symmetric measure:

adjusted_rand_index(a, b) == adjusted_rand_index(b, a)
Parameters:

clustering – NodeClustering object

Returns:

ARI score

Example:

>>> from cdlib import algorithms
>>> g = nx.karate_club_graph()
>>> communities = algorithms.louvain(g)
>>> leiden_communities = algorithms.leiden(g)
>>> mod = communities.adjusted_rand_index(leiden_communities)
Reference:

  1. Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of classification, 2(1), 193-218.

average_internal_degree(**kwargs: dict) FitnessResult

The average internal degree of the algorithms set.

\[f(S) = \frac{2m_S}{n_S}\]

where \(m_S\) is the number of algorithms internal edges and \(n_S\) is the number of algorithms nodes.

Parameters:

summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score

Returns:

a FitnessResult object/a list of community-wise score

Example:

>>> from cdlib.algorithms import louvain
>>> g = nx.karate_club_graph()
>>> communities = louvain(g)
>>> mod = communities.average_internal_degree()
avg_distance(**kwargs: dict) FitnessResult

Average distance.

The average distance of a community is defined average path length across all possible pair of nodes composing it.

Parameters:

summary – boolean. If True it is returned an aggregated score for the partition is returned, otherwise individual-community ones. Default True.

Returns:

If summary==True a FitnessResult object, otherwise a list of floats.

Example:

>>> from cdlib.algorithms import louvain
>>> from cdlib import evaluation
>>> g = nx.karate_club_graph()
>>> communities = louvain(g)
>>> scd = communities.avg_distance()
avg_embeddedness(**kwargs: dict) FitnessResult

Average embeddedness of nodes within the community.

The embeddedness of a node n w.r.t. a community C is the ratio of its degree within the community and its overall degree.

\[emb(n,C) = \frac{k_n^C}{k_n}\]

The average embeddedness of a community C is:

\[avg_embd(c) = \frac{1}{|C|} \sum_{i \in C} \frac{k_n^C}{k_n}\]
Parameters:

summary – boolean. If True it is returned an aggregated score for the partition is returned, otherwise individual-community ones. Default True.

Returns:

If summary==True a FitnessResult object, otherwise a list of floats.

Example:

>>> from cdlib.algorithms import louvain
>>> from cdlib import evaluation
>>> g = nx.karate_club_graph()
>>> communities = louvain(g)
>>> ave = communities.avg_embeddedness()
avg_odf(**kwargs: dict) FitnessResult

Average fraction of edges of a node of a algorithms that point outside the algorithms itself.

\[\frac{1}{n_S} \sum_{u \in S} \frac{|\{(u,v)\in E: v \not\in S\}|}{d(u)}\]

where \(E\) is the graph edge set, \(v\) is a node in \(S\), \(d(u)\) is the degree of \(u\) and \(n_S\) is the set of algorithms nodes.

Parameters:

summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score

Returns:

a FitnessResult object/a list of community-wise score

Example:

>>> from cdlib.algorithms import louvain
>>> g = nx.karate_club_graph()
>>> communities = louvain(g)
>>> mod = communities.avg_odf()
avg_transitivity(**kwargs: dict) FitnessResult

Average transitivity.

The average transitivity of a community is defined the as the average clustering coefficient of its nodes w.r.t. their connection within the community itself.

Parameters:

summary – boolean. If True it is returned an aggregated score for the partition is returned, otherwise individual-community ones. Default True.

Returns:

If summary==True a FitnessResult object, otherwise a list of floats.

Example:

>>> from cdlib.algorithms import louvain
>>> from cdlib import evaluation
>>> g = nx.karate_club_graph()
>>> communities = louvain(g)
>>> scd = communities.avg_transitivity()
classification_error(clustering: object) MatchingResult

This function calculates the Jaccard index between two clusterings.

\[CE = 1 - PI\]
Parameters:

clustering – NodeClustering object

Returns:

MatchingResult object

Example:

>>> from cdlib import algorithms
>>> import networkx as nx
>>> g = nx.karate_club_graph()
>>> louvain_communities = algorithms.louvain(g)
>>> leiden_communities = algorithms.leiden(g)
>>> louvain_communities.classification_error(leiden_communities)

Note

The function requires the clusim library to be installed. You can install it via pip: pip install clusim

conductance(**kwargs: dict) FitnessResult

Fraction of total edge volume that points outside the algorithms.

\[f(S) = \frac{c_S}{2 m_S+c_S}\]

where \(c_S\) is the number of algorithms nodes and, \(m_S\) is the number of algorithms edges

Parameters:

summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score

Returns:

a FitnessResult object/a list of community-wise score

Example:

>>> from cdlib.algorithms import louvain
>>> g = nx.karate_club_graph()
>>> communities = louvain(g)
>>> mod = communities.conductance()
cut_ratio(**kwargs: dict) FitnessResult

Fraction of existing edges (out of all possible edges) leaving the algorithms.

..math:: f(S) = frac{c_S}{n_S (n − n_S)}

where \(c_S\) is the number of algorithms nodes and, \(n_S\) is the number of edges on the algorithms boundary

Parameters:

summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score

Returns:

a FitnessResult object/a list of community-wise score

Example:

>>> from cdlib.algorithms import louvain
>>> g = nx.karate_club_graph()
>>> communities = louvain(g)
>>> mod = communities.cut_ratio()
czekanowski_index(clustering: object) MatchingResult

This function calculates the Czekanowski between two clusterings.

Also known as: Dice Symmetric index Sorensen index

\[F = \frac{2*N11}{(2*N11 + N10 + N01)}\]
Parameters:

clustering – NodeClustering object

Returns:

MatchingResult object

Example:

>>> from cdlib import algorithms
>>> import networkx as nx
>>> g = nx.karate_club_graph()
>>> louvain_communities = algorithms.louvain(g)
>>> leiden_communities = algorithms.leiden(g)
>>> louvain_communities.czekanowski_index(leiden_communities)

Note

The function requires the clusim library to be installed. You can install it via pip: pip install clusim

dice_index(clustering: object) MatchingResult

This function calculates the Czekanowski between two clusterings.

Also known as: Czekanowski index Sorensen index

\[F = \frac{2*N11}{(2*N11 + N10 + N01)}\]
Parameters:

clustering – NodeClustering object

Returns:

MatchingResult object

Example:

>>> from cdlib import algorithms
>>> import networkx as nx
>>> g = nx.karate_club_graph()
>>> louvain_communities = algorithms.louvain(g)
>>> leiden_communities = algorithms.leiden(g)
>>> louvain_communities.dice_index(leiden_communities)

Note

The function requires the clusim library to be installed. You can install it via pip: pip install clusim

ecs(clustering: object, alpha: float = 0.9, r: float = 1.0, r2: float | None = None, rescale_path_type: str = 'max', ppr_implementation: str = 'prpack') MatchingResult

The element-centric clustering similarity.

Parameters:
  • clustering – NodeClustering object

  • alpha – The personalized page-rank return probability as a float in [0,1]. float, default 0.9

  • r – The hierarchical scaling parameter for clustering1. float, default 1.0

  • r2 – The hierarchical scaling parameter for clustering2. float, default None

  • rescale_path_type – rescale the hierarchical height by: ‘max’ the maximum path from the root; ‘min’ the minimum path form the root; ‘linkage’ use the linkage distances in the clustering.

  • ppr_implementation – Choose an implementation for personalized page-rank calculation: ‘prpack’ use PPR algorithms in igraph; ‘power_iteration’: use power_iteration method.

Returns:

MatchingResult object

Example:

>>> from cdlib import algorithms
>>> import networkx as nx
>>> g = nx.karate_club_graph()
>>> louvain_communities = algorithms.louvain(g)
>>> leiden_communities = algorithms.leiden(g)
>>> louvain_communities.ecs(leiden_communities)
Reference:

A.J. Gates, I.B. Wood, W.P. Hetrick, and YY Ahn [2019]. “Element-centric clustering comparison unifies overlaps and hierarchy”. Scientific Reports 9, 8574

Note

The function requires the clusim library to be installed. You can install it via pip: pip install clusim

edges_inside(**kwargs: dict) FitnessResult

Number of edges internal to the algorithms.

Parameters:

summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score

Returns:

a FitnessResult object/a list of community-wise score

Example:

>>> from cdlib.algorithms import louvain
>>> g = nx.karate_club_graph()
>>> communities = louvain(g)
>>> mod = communities.edges_inside()
erdos_renyi_modularity() FitnessResult

Erdos-Renyi modularity is a variation of the Newman-Girvan one. It assumes that vertices in a network are connected randomly with a constant probability \(p\).

\[Q(S) = \frac{1}{m}\sum_{c \in S} (m_S − \frac{mn_S(n_S −1)}{n(n−1)})\]

where \(m\) is the number of graph edges, \(m_S\) is the number of algorithms edges, \(l_S\) is the number of edges from nodes in S to nodes outside S.

Returns:

the Erdos-Renyi modularity score

Example:

>>> from cdlib.algorithms import louvain
>>> g = nx.karate_club_graph()
>>> communities = louvain(g)
>>> mod = communities.erdos_renyi_modularity()
References:

Erdos, P., & Renyi, A. (1959). On random graphs I. Publ. Math. Debrecen, 6, 290-297.

expansion(**kwargs: dict) FitnessResult

Number of edges per algorithms node that point outside the cluster.

\[f(S) = \frac{c_S}{n_S}\]

where \(n_S\) is the number of edges on the algorithms boundary, \(c_S\) is the number of algorithms nodes.

Parameters:

summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score

Returns:

a FitnessResult object/a list of community-wise score

Example:

>>> from cdlib.algorithms import louvain
>>> g = nx.karate_club_graph()
>>> communities = louvain(g)
>>> mod = communities.expansion()
f1(clustering: Clustering) MatchingResult

Compute the average F1 score of the optimal algorithms matches among the partitions in input. Works on overlapping/non-overlapping complete/partial coverage partitions.

Parameters:

clustering – NodeClustering object

Returns:

F1 score (harmonic mean of precision and recall)

Example:

>>> from cdlib import algorithms
>>> g = nx.karate_club_graph()
>>> communities = algorithms.louvain(g)
>>> leiden_communities = algorithms.leiden(g)
>>> mod = communities.f1(leiden_communities)
Reference:

  1. Rossetti, G., Pappalardo, L., & Rinzivillo, S. (2016). A novel approach to evaluate algorithms detection internal on ground truth. In Complex Networks VII (pp. 133-144). Springer, Cham.

flake_odf(**kwargs: dict) FitnessResult

Fraction of nodes in S that have fewer edges pointing inside than to the outside of the algorithms.

\[f(S) = \frac{| \{ u:u \in S,| \{(u,v) \in E: v \in S \}| < d(u)/2 \}|}{n_S}\]

where \(E\) is the graph edge set, \(v\) is a node in \(S\), \(d(u)\) is the degree of \(u\) and \(n_S\) is the set of algorithms nodes.

Parameters:

summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score

Returns:

a FitnessResult object/a list of community-wise score

Example:

>>> from cdlib.algorithms import louvain
>>> g = nx.karate_club_graph()
>>> communities = louvain(g)
>>> mod = communities.flake_odf()
fowlkes_mallows_index(clustering: object) MatchingResult

This function calculates the Fowlkes and Mallows index between two clusterings

\[FM = \frac{N11}{ \sqrt{ (N11 + N10) * (N11 + N01) }}\]
Parameters:

clustering – NodeClustering object

Returns:

MatchingResult object

Example:

>>> from cdlib import algorithms
>>> import networkx as nx
>>> g = nx.karate_club_graph()
>>> louvain_communities = algorithms.louvain(g)
>>> leiden_communities = algorithms.leiden(g)
>>> louvain_communities.fowlkes_mallows_index(leiden_communities)
Reference:

Edward B. Fowlkes and Colin L. Mallows. A method for comparing two hierarchical clusterings. Journal of the American Statistical Association, 78(383):553–569, 1983.

Note

The function requires the clusim library to be installed. You can install it via pip: pip install clusim

fraction_over_median_degree(**kwargs: dict) FitnessResult

Fraction of algorithms nodes of having internal degree higher than the median degree value.

\[f(S) = \frac{|\{u: u \in S,| \{(u,v): v \in S\}| > d_m\}| }{n_S}\]

where \(d_m\) is the internal degree median value

Parameters:

summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score

Returns:

a FitnessResult object/a list of community-wise score

Example:

>>> from cdlib.algorithms import louvain
>>> g = nx.karate_club_graph()
>>> communities = louvain(g)
>>> mod = communities.fraction_over_median_degree()
geometric_accuracy(clustering: object) MatchingResult

This function calculates the geometric accuracy between two (overlapping) clusterings.

Parameters:

clustering – NodeClustering object

Returns:

MatchingResult object

Example:

>>> from cdlib import algorithms
>>> import networkx as nx
>>> g = nx.karate_club_graph()
>>> louvain_communities = algorithms.louvain(g)
>>> leiden_communities = algorithms.leiden(g)
>>> louvain_communities.geometric_accuracy(leiden_communities)
Reference:

Tamás Nepusz, Haiyuan Yu, and Alberto Paccanaro. Detecting overlapping protein complexes in protein-protein interaction networks. Nature Methods, 9(5):471–472, 2012.

Note

The function requires the clusim library to be installed. You can install it via pip: pip install clusim

get_description(parameters_to_display: list | None = None, precision: int = 3) str

Return a description of the clustering, with the name of the method and its numeric parameters.

Parameters:
  • parameters_to_display – parameters to display. By default, all float parameters.

  • precision – precision used to plot parameters. default: 3

Returns:

a string description of the method.

hub_dominance(**kwargs: dict) FitnessResult

Hub dominance.

The hub dominance of a community is defined as the ratio of the degree of its most connected node w.r.t. the theoretically maximal degree within the community.

Parameters:

summary – boolean. If True it is returned an aggregated score for the partition is returned, otherwise individual-community ones. Default True.

Returns:

If summary==True a FitnessResult object, otherwise a list of floats.

Example:

>>> from cdlib.algorithms import louvain
>>> from cdlib import evaluation
>>> g = nx.karate_club_graph()
>>> communities = louvain(g)
>>> scd = communities.hub_dominance()
internal_edge_density(**kwargs: dict) FitnessResult

The internal density of the algorithms set.

\[f(S) = \frac{m_S}{n_S(n_S−1)/2}\]

where \(m_S\) is the number of algorithms internal edges and \(n_S\) is the number of algorithms nodes.

Parameters:

summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score

Returns:

a FitnessResult object/a list of community-wise score

Example:

>>> from cdlib.algorithms import louvain
>>> g = nx.karate_club_graph()
>>> communities = louvain(g)
>>> mod = communities.internal_edge_density()
jaccard_index(clustering: object) MatchingResult

This function calculates the Jaccard index between two clusterings.

\[J = \frac{N11}{(N11+N10+N01)}\]
Parameters:

clustering – NodeClustering object

Returns:

MatchingResult object

Example:

>>> from cdlib import algorithms
>>> import networkx as nx
>>> g = nx.karate_club_graph()
>>> louvain_communities = algorithms.louvain(g)
>>> leiden_communities = algorithms.leiden(g)
>>> louvain_communities.jaccard_index(leiden_communities)
Reference:

Paul Jaccard. The distribution of the flora in the alpine zone. New Phytologist, 11(2):37–50, 1912.

Note

The function requires the clusim library to be installed. You can install it via pip: pip install clusim

Quality function designed for directed graphs with overlapping communities.

Returns:

the link modularity score

Example:

>>> from cdlib import evaluation
>>> from cdlib.algorithms import louvain
>>> g = nx.karate_club_graph()
>>> communities = louvain(g)
>>> mod = communities.link_modularity()
max_odf(**kwargs: dict) FitnessResult

Maximum fraction of edges of a node of a algorithms that point outside the algorithms itself.

\[max_{u \in S} \frac{|\{(u,v)\in E: v \not\in S\}|}{d(u)}\]

where \(E\) is the graph edge set, \(v\) is a node in \(S\) and \(d(u)\) is the degree of \(u\)

Parameters:

summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score

Returns:

a FitnessResult object/a list of community-wise score

Example:

>>> from cdlib.algorithms import louvain
>>> g = nx.karate_club_graph()
>>> communities = louvain(g)
>>> mod = communities.max_odf()
mi(clustering: object) MatchingResult

This function calculates the Mutual Information (MI) between two clusterings.

\[MI = (S(c1) + S(c2) - S(c1, c2))\]

where S(c1) is the Shannon Entropy of the clustering size distribution, S(c1, c2) is the Shannon Entropy of the join clustering size distribution,

Parameters:

clustering – NodeClustering object

Returns:

MatchingResult object

Example:

>>> from cdlib import algorithms
>>> import networkx as nx
>>> g = nx.karate_club_graph()
>>> louvain_communities = algorithms.louvain(g)
>>> leiden_communities = algorithms.leiden(g)
>>> louvain_communities.mi(leiden_communities)
Reference:

Leon Danon, Albert D ıaz-Guilera, Jordi Duch, and Alex Arenas. Comparing community structure identification. Journal of Statistical Mechanics: Theory and Experiment, 2005(09):P09008–P09008, September 2005.

Note

The function requires the clusim library to be installed. You can install it via pip: pip install clusim

modularity_density() FitnessResult

The modularity density is one of several propositions that envisioned to palliate the resolution limit issue of modularity based measures. The idea of this metric is to include the information about algorithms size into the expected density of algorithms to avoid the negligence of small and dense communities. For each algorithms \(C\) in partition \(S\), it uses the average modularity degree calculated by \(d(C) = d^{int(C)} − d^{ext(C)}\) where \(d^{int(C)}\) and \(d^{ext(C)}\) are the average internal and external degrees of \(C\) respectively to evaluate the fitness of \(C\) in its network. Finally, the modularity density can be calculated as follows:

\[Q(S) = \sum_{C \in S} \frac{1}{n_C} ( \sum_{i \in C} k^{int}_{iC} - \sum_{i \in C} k^{out}_{iC})\]

where \(n_C\) is the number of nodes in C, \(k^{int}_{iC}\) is the degree of node i within \(C\) and \(k^{out}_{iC}\) is the deree of node i outside \(C\).

Returns:

the modularity density score

Example:

>>> from cdlib.algorithms import louvain
>>> g = nx.karate_club_graph()
>>> communities = louvain(g)
>>> mod = communities.modularity_density()
References:

Li, Z., Zhang, S., Wang, R. S., Zhang, X. S., & Chen, L. (2008). Quantitative function for algorithms detection. Physical review E, 77(3), 036109.

modularity_overlap(weight: str | None = None) FitnessResult

Determines the Overlapping Modularity of a partition C on a graph G.

Overlapping Modularity is defined as

\[M_{c_{r}}^{ov} = \sum_{i \in c_{r}} \frac{\sum_{j \in c_{r}, i \neq j}a_{ij} - \sum_{j \not \in c_{r}}a_{ij}}{d_{i} \cdot s_{i}} \cdot \frac{n_{c_{r}}^{e}}{n_{c_{r}} \cdot \binom{n_{c_{r}}}{2}}\]
Parameters:

weight – label identifying the edge weight parameter name (if present), default None

Returns:

FitnessResult object

Example:

>>> from cdlib.algorithms import louvain
>>> from cdlib import evaluation
>>> g = nx.karate_club_graph()
>>> communities = louvain(g)
>>> mod = communities.modularity_overlap()
References:

    1. Lazar, D. Abel and T. Vicsek, “Modularity measure of networks with overlapping communities” EPL, 90 (2010) 18001 doi: 10.1209/0295-5075/90/18001

newman_girvan_modularity() FitnessResult

Difference the fraction of intra algorithms edges of a partition with the expected number of such edges if distributed according to a null model.

In the standard version of modularity, the null model preserves the expected degree sequence of the graph under consideration. In other words, the modularity compares the real network structure with a corresponding one where nodes are connected without any preference about their neighbors.

\[Q(S) = \frac{1}{m}\sum_{c \in S}(m_S - \frac{(2 m_S + l_S)^2}{4m})\]

where \(m\) is the number of graph edges, \(m_S\) is the number of algorithms edges, \(l_S\) is the number of edges from nodes in S to nodes outside S.

Returns:

the Newman-Girvan modularity score

Example:

>>> from cdlib.algorithms import louvain
>>> g = nx.karate_club_graph()
>>> communities = louvain(g)
>>> mod = communities.newman_girvan_modularity()
References:

Newman, M.E.J. & Girvan, M. Finding and evaluating algorithms structure in networks. Physical Review E 69, 26113(2004).

nf1(clustering: Clustering) MatchingResult

Compute the Normalized F1 score of the optimal algorithms matches among the partitions in input. Works on overlapping/non-overlapping complete/partial coverage partitions.

Parameters:

clustering – NodeClustering object

Returns:

MatchingResult instance

Example:

>>> from cdlib import algorithms
>>> g = nx.karate_club_graph()
>>> communities = algorithms.louvain(g)
>>> leiden_communities = algorithms.leiden(g)
>>> mod = communities.nf1(leiden_communities)
Reference:

  1. Rossetti, G., Pappalardo, L., & Rinzivillo, S. (2016). A novel approach to evaluate algorithms detection internal on ground truth.

  2. Rossetti, G. (2017). : RDyn: graph benchmark handling algorithms dynamics. Journal of Complex Networks. 5(6), 893-912.

normalized_cut(**kwargs: dict) FitnessResult

Normalized variant of the Cut-Ratio

\[: f(S) = \frac{c_S}{2m_S+c_S} + \frac{c_S}{2(m−m_S )+c_S}\]

where \(m\) is the number of graph edges, \(m_S\) is the number of algorithms internal edges and \(c_S\) is the number of algorithms nodes.

Parameters:

summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score

Returns:

a FitnessResult object/a list of community-wise score

Example:

>>> from cdlib.algorithms import louvain
>>> g = nx.karate_club_graph()
>>> communities = louvain(g)
>>> mod = communities.normalized_cut()
normalized_mutual_information(clustering: Clustering) MatchingResult

Normalized Mutual Information between two clusterings.

Normalized Mutual Information (NMI) is an normalization of the Mutual Information (MI) score to scale the results between 0 (no mutual information) and 1 (perfect correlation). In this function, mutual information is normalized by sqrt(H(labels_true) * H(labels_pred))

Parameters:

clustering – NodeClustering object

Returns:

normalized mutual information score

Example:

>>> from cdlib import algorithms
>>> g = nx.karate_club_graph()
>>> communities = algorithms.louvain(g)
>>> leiden_communities = algorithms.leiden(g)
>>> mod = communities.normalized_mutual_information(leiden_communities)
omega(clustering: Clustering) MatchingResult

Index of resemblance for overlapping, complete coverage, network clusterings.

Parameters:

clustering – NodeClustering object

Returns:

omega index

Example:

>>> from cdlib import algorithms
>>> g = nx.karate_club_graph()
>>> communities = algorithms.louvain(g)
>>> leiden_communities = algorithms.leiden(g)
>>> mod = communities.omega(leiden_communities)
Reference:

  1. Gabriel Murray, Giuseppe Carenini, and Raymond Ng. 2012. Using the omega index for evaluating abstractive algorithms detection. In Proceedings of Workshop on Evaluation Metrics and System Comparison for Automatic Summarization. Association for Computational Linguistics, Stroudsburg, PA, USA, 10-18.

overlap_quality(clustering: object) MatchingResult

This function calculates the overlap quality between two (overlapping) clusterings.

Parameters:

clustering – NodeClustering object

Returns:

MatchingResult object

Example:

>>> from cdlib import algorithms
>>> import networkx as nx
>>> g = nx.karate_club_graph()
>>> louvain_communities = algorithms.louvain(g)
>>> leiden_communities = algorithms.leiden(g)
>>> louvain_communities.overlap_quality(leiden_communities)
Reference:

Yong-Yeol Ahn, James P Bagrow, and Sune Lehmann. Link communities reveal multiscale complexity in networks. Nature, 466(7307):761–764, June 2010.

Note

The function requires the clusim library to be installed. You can install it via pip: pip install clusim

overlapping_normalized_mutual_information_LFK(clustering: Clustering) MatchingResult

Overlapping Normalized Mutual Information between two clusterings.

Extension of the Normalized Mutual Information (NMI) score to cope with overlapping partitions. This is the version proposed by Lancichinetti et al.

Parameters:

clustering – NodeClustering object

Returns:

onmi score

Example:

>>> from cdlib import algorithms
>>> g = nx.karate_club_graph()
>>> communities = algorithms.louvain(g)
>>> leiden_communities = algorithms.leiden(g)
>>> mod = communities.overlapping_normalized_mutual_information_LFK(leiden_communities)
Reference:

  1. Lancichinetti, A., Fortunato, S., & Kertesz, J. (2009). Detecting the overlapping and hierarchical community structure in complex networks. New Journal of Physics, 11(3), 033015.

overlapping_normalized_mutual_information_MGH(clustering: Clustering, normalization: str = 'max') MatchingResult

Overlapping Normalized Mutual Information between two clusterings.

Extension of the Normalized Mutual Information (NMI) score to cope with overlapping partitions. This is the version proposed by McDaid et al. using a different normalization than the original LFR one. See ref. for more details.

Parameters:
  • clustering – NodeClustering object

  • normalization – one of “max” or “LFK”. Default “max” (corresponds to the main method described in the article)

Returns:

onmi score

Example:

>>> from cdlib import evaluation, algorithms
>>> g = nx.karate_club_graph()
>>> louvain_communities = algorithms.louvain(g)
>>> leiden_communities = algorithms.leiden(g)
>>> evaluation.overlapping_normalized_mutual_information_MGH(louvain_communities,leiden_communities)
:Reference:
  1. McDaid, A. F., Greene, D., & Hurley, N. (2011). Normalized mutual information to evaluate overlapping community finding algorithms. arXiv preprint arXiv:1110.2515. Chicago

partition_closeness_simple(clustering: Clustering) MatchingResult

Community size density closeness. Simple implementation that does not leverage kernel density estimator.

$$ S_G(A,B) = frac{1}{2} Sum_{i=1}^{r}Sum_{j=1}^{s} min(frac{n^a(x^a_i)}{N^a}, frac{n^b_j(x^b_j)}{N^b}) delta(x_i^a,x_j^b) $$

where:

$$ N^a $$ total number of communities in A of any size; $$ x^a $$ ordered list of community sizes for A; $$ n^a $$ multiplicity of community sizes for A.

(symmetrically for B)

Parameters:

clustering – NodeClustering object

Returns:

MatchingResult object

Example:

>>> from cdlib import algorithms
>>> import networkx as nx
>>> g = nx.karate_club_graph()
>>> louvain_communities = algorithms.louvain(g)
>>> leiden_communities = algorithms.leiden(g)
>>> louvain_communities.partition_closeness_simple(leiden_communities)
Reference:

  1. Dao, Vinh-Loc, Cécile Bothorel, and Philippe Lenca. “Estimating the similarity of community detection methods based on cluster size distribution.” International Conference on Complex Networks and their Applications. Springer, Cham, 2018.

purity() FitnessResult

Purity is the product of the frequencies of the most frequent labels carried by the nodes within the communities :return: FitnessResult object

rand_index(clustering: object) MatchingResult

This function calculates the Rand index between two clusterings.

\[RI = \frac{(N11 + N00)}{(N11 + N10 + N01 + N00)}\]
Parameters:

clustering – NodeClustering object

Returns:

MatchingResult object

Example:

>>> from cdlib import algorithms
>>> import networkx as nx
>>> g = nx.karate_club_graph()
>>> louvain_communities = algorithms.louvain(g)
>>> leiden_communities = algorithms.leiden(g)
>>> louvain_communities.rand_index(leiden_communities)
Reference:

William M Rand. Objective Criteria for the Evaluation of Clustering Methods. Journal of the American Statistical Association, 66(336):846, 1971.

Note

The function requires the clusim library to be installed. You can install it via pip: pip install clusim

rmi(clustering: object, norm_type: str = 'none', logbase: int = 2) MatchingResult

This function calculates the Reduced Mutual Information (RMI) between two clusterings.

\[RMI = MI(c1, c2) - \log \frac{Omega(a, b)}{n}\]

where MI(c1, c2) is mutual information of the clusterings c1 and c2, and Omega(a, b) is the number of contingency tables with row and column sums equal to a and b.

Parameters:
  • clustering – NodeClustering object

  • norm_type – The normalization types are: ‘none’ returns the RMI without a normalization; ‘normalized’ returns the RMI with upper bound equals to 1.

  • logbase – int, default 2

Returns:

MatchingResult object

Example:

>>> from cdlib import algorithms
>>> import networkx as nx
>>> g = nx.karate_club_graph()
>>> louvain_communities = algorithms.louvain(g)
>>> leiden_communities = algorithms.leiden(g)
>>> louvain_communities.rmi(leiden_communities)
Reference:

      1. Newman, George T. Cantwell, and Jean-Gabriel Young. Improved mutual information measure for classification and community detection. arXiv:1907.12581, 2019.

Note

The function requires the clusim library to be installed. You can install it via pip: pip install clusim

rogers_tanimoto_index(clustering: object) MatchingResult

This function calculates the Rogers and Tanimoto index between two clusterings.

\[RT = \frac{(N11 + N00)}{(N11 + 2*(N10+N01) + N00)}\]
Parameters:

clustering – NodeClustering object

Returns:

MatchingResult object

Example:

>>> from cdlib import algorithms
>>> import networkx as nx
>>> g = nx.karate_club_graph()
>>> louvain_communities = algorithms.louvain(g)
>>> leiden_communities = algorithms.leiden(g)
>>> louvain_communities.rogers_tanimoto_index(leiden_communities)

Note

The function requires the clusim library to be installed. You can install it via pip: pip install clusim

sample_expected_sim(clustering: object, measure: str = 'jaccard_index', random_model: str = 'perm', n_samples: int = 1, keep_samples: bool = False) MatchingResult

This function calculates the expected Similarity for all pair-wise comparisons between Clusterings drawn from one of six random models.

Note

Clustering 2 is considered the gold-standard clustering for one-sided expectations

Parameters:
  • clustering – NodeClustering object

  • measure – The similarity measure to evaluate. Must be one of [ecs, jaccard_index, rand_index, fowlkes_mallows_index, classification_error, czekanowski_index, dice_index, sorensen_index, rogers_tanimoto_index, southwood_index, mi, rmi, vi, geometric_accuracy, overlap_quality, sample_expected_sim]

  • random_model

    The random model to use:

    ’all’uniform distribution over the set of all clusterings of

    n_elements

    ’all1’one-sided selection from the uniform distribution over the set

    of all clusterings of n_elements

    ’num’uniform distribution over the set of all clusterings of

    n_elements in n_clusters

    ’num1’one-sided selection from the uniform distribution over the set

    of all clusterings of n_elements in n_clusters

    ’perm’ : the permutation model for a fixed cluster size sequence

    ’perm1’one-sided selection from the permutation model for a fixed

    cluster size sequence, same as ‘perm’

  • n_samples – The number of random Clusterings sampled to determine the expected similarity.

  • keep_samples – If True, returns the Similarity samples themselves, otherwise return their mean.

Returns:

MatchingResult object

Example:

>>> from cdlib import algorithms
>>> import networkx as nx
>>> g = nx.karate_club_graph()
>>> louvain_communities = algorithms.louvain(g)
>>> leiden_communities = algorithms.leiden(g)
>>> louvain_communities.sample_expected_sim(leiden_communities)

Note

The function requires the clusim library to be installed. You can install it via pip: pip install clusim

scaled_density(**kwargs: dict) FitnessResult

Scaled density.

The scaled density of a community is defined as the ratio of the community density w.r.t. the complete graph density.

Parameters:

summary – boolean. If True it is returned an aggregated score for the partition is returned, otherwise individual-community ones. Default True.

Returns:

If summary==True a FitnessResult object, otherwise a list of floats.

Example:

>>> from cdlib.algorithms import louvain
>>> from cdlib import evaluation
>>> g = nx.karate_club_graph()
>>> communities = louvain(g)
>>> scd = communities.scaled_density()
significance() FitnessResult

Significance estimates how likely a partition of dense communities appear in a random graph.

Returns:

the significance score

Example:

>>> from cdlib.algorithms import louvain
>>> g = nx.karate_club_graph()
>>> communities = louvain(g)
>>> mod = communities.significance()
References:

Traag, V. A., Aldecoa, R., & Delvenne, J. C. (2015). Detecting communities using asymptotical surprise. Physical Review E, 92(2), 022816.

size(**kwargs: dict) FitnessResult

Size is the number of nodes in the community

Parameters:

summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score

Returns:

a FitnessResult object/a list of community-wise score

Example:

>>> from cdlib.algorithms import louvain
>>> g = nx.karate_club_graph()
>>> communities = louvain(g)
>>> mod = communities.size()
sorensen_index(clustering: object) MatchingResult

This function calculates the Sorensen between two clusterings.

Also known as: Czekanowski index Dice index

\[F = \frac{2*N11}{(2*N11 + N10 + N01)}\]
Parameters:

clustering – NodeClustering object

Returns:

MatchingResult object

Example:

>>> from cdlib import algorithms
>>> import networkx as nx
>>> g = nx.karate_club_graph()
>>> louvain_communities = algorithms.louvain(g)
>>> leiden_communities = algorithms.leiden(g)
>>> louvain_communities.sorensen_index(leiden_communities)

Note

The function requires the clusim library to be installed. You can install it via pip: pip install clusim

southwood_index(clustering: object) MatchingResult

This function calculates the Southwood index between two clusterings.

\[\frac{N11}{(N10 + N01)}\]
Parameters:

clustering – NodeClustering object

Returns:

MatchingResult object

Example:

>>> from cdlib import algorithms
>>> import networkx as nx
>>> g = nx.karate_club_graph()
>>> louvain_communities = algorithms.louvain(g)
>>> leiden_communities = algorithms.leiden(g)
>>> louvain_communities.southwood_index(leiden_communities)

Note

The function requires the clusim library to be installed. You can install it via pip: pip install clusim

surprise() FitnessResult

Surprise is statistical approach proposes a quality metric assuming that edges between vertices emerge randomly according to a hyper-geometric distribution.

According to the Surprise metric, the higher the score of a partition, the less likely it is resulted from a random realization, the better the quality of the algorithms structure.

Returns:

the surprise score

Example:

>>> from cdlib.algorithms import louvain
>>> g = nx.karate_club_graph()
>>> communities = louvain(g)
>>> mod = communities.surprise()
References:

Traag, V. A., Aldecoa, R., & Delvenne, J. C. (2015). Detecting communities using asymptotical surprise. Physical Review E, 92(2), 022816.

to_json() str

Generate a JSON representation of the algorithms object

Returns:

a JSON formatted string representing the object

to_node_community_map() dict

Generate a <node, list(communities)> representation of the current clustering

Returns:

dict of the form <node, list(communities)>

triangle_participation_ratio(**kwargs: dict) FitnessResult

Fraction of algorithms nodes that belong to a triad.

\[f(S) = \frac{ | \{ u: u \in S,\{(v,w):v, w \in S,(u,v) \in E,(u,w) \in E,(v,w) \in E \} \not = \emptyset \} |}{n_S}\]

where \(n_S\) is the set of algorithms nodes.

Parameters:

summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score

Returns:

a FitnessResult object/a list of community-wise score

Example:

>>> from cdlib.algorithms import louvain
>>> g = nx.karate_club_graph()
>>> communities = louvain(g)
>>> mod = communities.triangle_participation_ratio()
variation_of_information(clustering: Clustering) MatchingResult

Variation of Information among two nodes partitions.

$$ H(p)+H(q)-2MI(p, q) $$

where MI is the mutual information, H the partition entropy and p,q are the algorithms sets

Parameters:

clustering – NodeClustering object

Returns:

VI score

Example:

>>> from cdlib import algorithms
>>> g = nx.karate_club_graph()
>>> communities = algorithms.louvain(g)
>>> leiden_communities = algorithms.leiden(g)
>>> mod = communities.variation_of_information(leiden_communities)
Reference:

  1. Meila, M. (2007). Comparing clusterings - an information based distance. Journal of Multivariate Analysis, 98, 873-895. doi:10.1016/j.jmva.2006.11.013

z_modularity() FitnessResult

Z-modularity is another variant of the standard modularity proposed to avoid the resolution limit. The concept of this version is based on an observation that the difference between the fraction of edges inside communities and the expected number of such edges in a null model should not be considered as the only contribution to the final quality of algorithms structure.

Returns:

the z-modularity score

Example:

>>> from cdlib.algorithms import louvain
>>> g = nx.karate_club_graph()
>>> communities = louvain(g)
>>> mod = communities.z_modularity()
References:

Miyauchi, Atsushi, and Yasushi Kawase. Z-score-based modularity for algorithms detection in networks. PloS one 11.1 (2016): e0147805.

Methods

Evaluating Node Clustering

AttrNodeClustering.purity()

Purity is the product of the frequencies of the most frequent labels carried by the nodes within the communities :return: FitnessResult object