py3plex.algorithms.statistics package

Submodules

py3plex.algorithms.statistics.basic_statistics module

py3plex.algorithms.statistics.basic_statistics.core_network_statistics(G, labels=None, name='example')
py3plex.algorithms.statistics.basic_statistics.identify_n_hubs(G, top_n=100, node_type=None)

py3plex.algorithms.statistics.bayesian_distances module

py3plex.algorithms.statistics.bayesian_distances.generate_bayesian_diagram(result_matrices, algo_names=['algo1', 'algo2'], rope=0.01, rho=0.2, show_diagram=True, save_diagram=None)

py3plex.algorithms.statistics.bayesiantests module

py3plex.algorithms.statistics.bayesiantests.correlated_ttest(x, rope, runs=1, verbose=False, names=('C1', 'C2'))
py3plex.algorithms.statistics.bayesiantests.correlated_ttest_MC(x, rope, runs=1, nsamples=50000)

See correlated_ttest for explanations.

py3plex.algorithms.statistics.bayesiantests.heaviside(X)
py3plex.algorithms.statistics.bayesiantests.hierarchical(diff, rope, rho, upperAlpha=2, lowerAlpha=1, lowerBeta=0.01, upperBeta=0.1, std_upper_bound=1000, verbose=False, names=('C1', 'C2'))
py3plex.algorithms.statistics.bayesiantests.hierarchical_MC(diff, rope, rho, upperAlpha=2, lowerAlpha=1, lowerBeta=0.01, upperBeta=0.1, std_upper_bound=1000, names=('C1', 'C2'))
py3plex.algorithms.statistics.bayesiantests.plot_posterior(samples, names=('C1', 'C2'), proba_triplet=None)
Parameters
  • x (array) – a vector of differences or a 2d array with pairs of scores.

  • names (pair of str) – the names of the two classifiers

Returns

matplotlib.pyplot.figure

py3plex.algorithms.statistics.bayesiantests.plot_simplex(points, names=('C1', 'C2'), proba_triplet=None)
py3plex.algorithms.statistics.bayesiantests.signrank(x, rope, prior_strength=0.6, prior_place=1, nsamples=50000, verbose=False, names=('C1', 'C2'))
Parameters
  • x (array) – a vector of differences or a 2d array with pairs of scores.

  • rope (float) – the width of the rope

  • prior_strength (float) – prior strength (default: 0.6)

  • prior_place (LEFT, ROPE or RIGHT) – the region to which the prior is assigned (default: ROPE)

  • nsamples (int) – the number of Monte Carlo samples

  • verbose (bool) – report the computed probabilities

  • names (pair of str) – the names of the two classifiers

Returns

p_left, p_rope, p_right

py3plex.algorithms.statistics.bayesiantests.signrank_MC(x, rope, prior_strength=0.6, prior_place=1, nsamples=50000)
Parameters
  • x (array) – a vector of differences or a 2d array with pairs of scores.

  • rope (float) – the width of the rope

  • prior_strength (float) – prior strength (default: 0.6)

  • prior_place (LEFT, ROPE or RIGHT) – the region to which the prior is assigned (default: ROPE)

  • nsamples (int) – the number of Monte Carlo samples

Returns

2-d array with rows corresponding to samples and columns to probabilities [p_left, p_rope, p_right]

py3plex.algorithms.statistics.bayesiantests.signtest(x, rope, prior_strength=1, prior_place=1, nsamples=50000, verbose=False, names=('C1', 'C2'))
Parameters
  • x (array) – a vector of differences or a 2d array with pairs of scores.

  • rope (float) – the width of the rope

  • prior_strength (float) – prior strength (default: 1)

  • prior_place (LEFT, ROPE or RIGHT) – the region to which the prior is assigned (default: ROPE)

  • nsamples (int) – the number of Monte Carlo samples

  • verbose (bool) – report the computed probabilities

  • names (pair of str) – the names of the two classifiers

Returns

p_left, p_rope, p_right
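
One way a Bayesian sign test can arrive at such a triplet is sketched below. It counts the differences that fall left of, inside, and right of the rope, adds the prior strength to the rope region, and samples a Dirichlet posterior. This is a standalone standard-library sketch, not the py3plex implementation: the name sign_test_sketch, the small stabilising constant, the seed argument, and the restriction to a plain vector of differences with the prior fixed on the rope are all assumptions made for the example.

```python
import random


def sign_test_sketch(diffs, rope, prior_strength=1.0, nsamples=5000, seed=0):
    """Return (p_left, p_rope, p_right) for a vector of score differences."""
    rng = random.Random(seed)

    # Count how many differences fall in each of the three regions.
    n_left = sum(1 for d in diffs if d < -rope)
    n_right = sum(1 for d in diffs if d > rope)
    n_rope = len(diffs) - n_left - n_right

    # Dirichlet parameters: counts plus a tiny constant (assumed, keeps every
    # parameter positive); the prior mass is placed on the rope region.
    alpha = [n_left + 1e-4, n_rope + 1e-4 + prior_strength, n_right + 1e-4]

    # A Dirichlet sample is a vector of normalised Gamma draws. Only the
    # largest component matters here, so normalisation is skipped.
    wins = [0, 0, 0]
    for _ in range(nsamples):
        draws = [rng.gammavariate(a, 1.0) for a in alpha]
        wins[draws.index(max(draws))] += 1

    # Each probability is the share of posterior samples in which that
    # region had the largest mass.
    return tuple(w / nsamples for w in wins)


p_left, p_rope, p_right = sign_test_sketch([0.1] * 20, rope=0.01)
print(p_left, p_rope, p_right)
```

With twenty differences all well to the right of the rope, nearly all posterior mass should land on p_right.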

py3plex.algorithms.statistics.bayesiantests.signtest_MC(x, rope, prior_strength=1, prior_place=1, nsamples=50000)
Parameters
  • x (array) – a vector of differences or a 2d array with pairs of scores.

  • rope (float) – the width of the rope

  • prior_strength (float) – prior strength (default: 1)

  • prior_place (LEFT, ROPE or RIGHT) – the region to which the prior is assigned (default: ROPE)

  • nsamples (int) – the number of Monte Carlo samples

Returns

2-d array with rows corresponding to samples and columns to probabilities [p_left, p_rope, p_right]

py3plex.algorithms.statistics.correlation_networks module

py3plex.algorithms.statistics.correlation_networks.default_correlation_to_network(matrix, input_type='matrix', preprocess='standard')
py3plex.algorithms.statistics.correlation_networks.pick_threshold(matrix)

py3plex.algorithms.statistics.critical_distances module

py3plex.algorithms.statistics.critical_distances.center(width, n)

Computes free space on the figure on both sides.

Parameters
  • width –

  • n – number of algorithms

py3plex.algorithms.statistics.critical_distances.diagram(list_of_algorithms, the_algorithm_candidate, output_figure_file, fontsize=10)

Draws a critical distance diagram for the Nemenyi or Bonferroni-Dunn post-hoc test. The diagram is shown if output_figure_file is None; otherwise it is saved to that file.

Parameters
  • list_of_algorithms – [[(alg_name1, avg_rank1)], …]

  • critical_distance –

  • output_figure_file – If not None, the diagram produced is saved to the specified file. Otherwise, the diagram is shown.

  • the_algorithm_candidate – For the Bonferroni-Dunn post-hoc test (1 vs all), the algorithm from list_of_algorithms to which the other algorithms are compared. For the Nemenyi post-hoc test (all vs all), this should be None.

Returns

output_figure_file
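
For orientation, the critical distance behind a Nemenyi diagram is commonly computed as CD = q_alpha * sqrt(k(k+1) / (6N)) for k algorithms ranked over N datasets. The sketch below is independent of py3plex: the function names, the dictionary of average ranks, and the hard-coded q_alpha values (the commonly tabulated two-tailed Nemenyi values at alpha = 0.05, as reported in Demšar, 2006) are assumptions made for the example.

```python
import math
from itertools import combinations

# Commonly tabulated critical values q_alpha for the two-tailed Nemenyi test
# at alpha = 0.05, indexed by the number of compared algorithms k.
Q_ALPHA_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850,
               7: 2.949, 8: 3.031, 9: 3.102, 10: 3.164}


def nemenyi_critical_distance(k, n_datasets):
    """Critical distance for k algorithms ranked over n_datasets datasets."""
    return Q_ALPHA_005[k] * math.sqrt(k * (k + 1) / (6.0 * n_datasets))


def significantly_different(avg_ranks, n_datasets):
    """Pairs of algorithms whose average ranks differ by more than the CD."""
    cd = nemenyi_critical_distance(len(avg_ranks), n_datasets)
    return [(a, b) for a, b in combinations(sorted(avg_ranks), 2)
            if abs(avg_ranks[a] - avg_ranks[b]) > cd]


# Hypothetical average ranks of four algorithms over 14 datasets.
ranks = {"A": 1.2, "B": 2.0, "C": 3.1, "D": 3.7}
print(nemenyi_critical_distance(len(ranks), 14))
print(significantly_different(ranks, 14))
```

In a critical distance diagram, algorithms whose average ranks lie within one CD of each other are typically joined by a bar.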

py3plex.algorithms.statistics.critical_distances.plot_critical_distance(fname, groupby=['dataset', 'setting'], groupby_target='macro_F', outfile='./micro_cd.pdf', aggregator='mean', fontsize=10)
py3plex.algorithms.statistics.critical_distances.remove_backslash(file_name)

py3plex.algorithms.statistics.enrichment module

py3plex.algorithms.statistics.enrichment_modules module

py3plex.algorithms.statistics.enrichment_modules.calculate_pval(term, alternative='two-sided')

Parallel kernel for computing p-values. All partitions are considered with respect to a given GO term. Counts in a given partition are compared to the population.
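
A comparison of partition counts against population counts can be illustrated with a two-sided Fisher's exact test on a 2x2 table. The sketch below assumes that the "fet" prefix in this module refers to Fisher's exact test, and its table layout (term present or absent, inside or outside the partition) is likewise an assumption; it does not reproduce the actual kernel.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Assumed layout: a = annotated nodes in the partition, b = other nodes in
    the partition, c = annotated nodes outside it, d = other nodes outside it.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def table_probability(x):
        # Hypergeometric probability of seeing x in the top-left cell
        # when all margins are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = table_probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)

    # Sum the probabilities of all tables that are no more likely than the
    # observed one (small relative tolerance for floating-point ties).
    p_value = sum(
        table_probability(x)
        for x in range(lowest, highest + 1)
        if table_probability(x) <= observed * (1.0 + 1e-7)
    )
    return min(1.0, p_value)


print(fisher_exact_two_sided(3, 1, 1, 3))
```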

py3plex.algorithms.statistics.enrichment_modules.compute_enrichment(term_dataset, term_database, topology_map, all_counts, whole_term_list=False, pvalue=0.05, multitest_method='fdr_bh', alternative='two-sided', intra_community=False)

The main method for computing the enrichment of a subnetwork. It works in parallel and also offers methods for multiple test correction.

py3plex.algorithms.statistics.enrichment_modules.fet_enrichment_generic(term_dataset, term_database, all_counts, topology_map)

A generic enrichment method useful for arbitrary partition enrichment (CBSSD baseline).

  • term_dataset – defaultdict(list) of node:[a1..an] mappings

  • term_database – Counter object of individual annotation occurrences

  • all_counts – number of all annotation occurrences

py3plex.algorithms.statistics.enrichment_modules.fet_enrichment_terms(partition_mappings, annotation_mappings, alternative='two-sided', intra_community=False, pvalue=0.1, multitest_method='fdr_bh')

This is the most generic enrichment process.

py3plex.algorithms.statistics.enrichment_modules.fet_enrichment_uniprot(partition_mappings, annotation_mappings)

This is a pre-designed wrapper for uniprot-like annotation.

py3plex.algorithms.statistics.enrichment_modules.multiple_test_correction(input_dataset)

Multiple test correction. Given a dataset with corresponding significance levels, perform MTC.
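
The default multitest_method='fdr_bh' seen elsewhere in this module is the usual name for the Benjamini-Hochberg procedure. The following is a minimal standalone sketch of that adjustment, under the assumption that this is what the name denotes here; the function name and the plain list input are hypothetical and do not mirror the input_dataset structure expected above.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])

    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay
    # monotone: each one is capped by the adjusted value of the next rank.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.5]
corrected = benjamini_hochberg(raw)
significant = [p <= 0.05 for p in corrected]
print(corrected, significant)
```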

py3plex.algorithms.statistics.enrichment_modules.parallel_enrichment(term)

A simple kernel for parallel computation of p-values. (assuming independence of experiments)

py3plex.algorithms.statistics.powerlaw module

class py3plex.algorithms.statistics.powerlaw.Distribution(xmin=1, xmax=None, discrete=False, fit_method='Likelihood', data=None, parameters=None, parameter_range=None, initial_parameters=None, discrete_approximation='round', parent_Fit=None, **kwargs)

Bases: object

An abstract class for theoretical probability distributions. Can be created with particular parameter values, or fitted to a dataset. Fitting is by maximum likelihood estimation by default.

Parameters
  • xmin (int or float, optional) – The data value beyond which distributions should be fitted. If None an optimal one will be calculated.

  • xmax (int or float, optional) – The maximum value of the fitted distributions.

  • discrete (boolean, optional) – Whether the distribution is discrete (integers).

  • data (list or array, optional) – The data to which to fit the distribution. If provided, the fit will be created at initialization.

  • fit_method ("Likelihood" or "KS", optional) – Method for fitting the distribution. “Likelihood” is maximum likelihood estimation. “KS” is minimal distance estimation using the Kolmogorov-Smirnov test.

  • parameters (tuple or list, optional) – The parameters of the distribution. Will be overridden if data is given or the fit method is called.

  • parameter_range (dict, optional) – Dictionary of valid parameter ranges for fitting. Formatted as a dictionary of parameter names (‘alpha’ and/or ‘sigma’) and tuples of their lower and upper limits (ex. (1.5, 2.5), (None, .1)).

  • initial_parameters (tuple or list, optional) – Initial values for the parameter in the fitting search.

  • discrete_approximation ("round", "xmax" or int, optional) – If the discrete form of the theoretical distribution is not known, it can be estimated. One estimation method is “round”, which sums the probability mass from x-.5 to x+.5 for each data point. The other option is to calculate the probability for each x from 1 to N and normalize by their sum. N can be “xmax” or an integer.

  • parent_Fit (Fit object, optional) – A Fit object from which to use data, if it exists.

KS(data=None)

Returns the Kolmogorov-Smirnov distance D between the distribution and the data. Also sets the properties D+, D-, V (the Kuiper testing statistic), and Kappa (1 + the average difference between the theoretical and empirical distributions).

Parameters

data (list or array, optional) – If not provided, attempts to use the data from the Fit object in which the Distribution object is contained.

ccdf(data=None, survival=True)

The complementary cumulative distribution function (CCDF) of the theoretical distribution. Calculated for the values given in data within xmin and xmax, if present.

Parameters
  • data (list or array, optional) – If not provided, attempts to use the data from the Fit object in which the Distribution object is contained.

  • survival (bool, optional) – Whether to calculate a CDF (False) or CCDF (True). True by default.

Returns

  • X (array) – The sorted, unique values in the data.

  • probabilities (array) – The portion of the data that is greater than or equal to X (or less than or equal to X if survival is False).

cdf(data=None, survival=False)

The cumulative distribution function (CDF) of the theoretical distribution. Calculated for the values given in data within xmin and xmax, if present.

Parameters
  • data (list or array, optional) – If not provided, attempts to use the data from the Fit object in which the Distribution object is contained.

  • survival (bool, optional) – Whether to calculate a CDF (False) or CCDF (True). False by default.

Returns

  • X (array) – The sorted, unique values in the data.

  • probabilities (array) – The portion of the data that is less than or equal to X.

fit(data=None, suppress_output=False)

Fits the parameters of the distribution to the data. Uses options set at initialization.

generate_random(n=1, estimate_discrete=None)

Generates random numbers from the theoretical probability distribution. If xmax is present, it is currently ignored.

Parameters
  • n (int or float) – The number of random numbers to generate

  • estimate_discrete (boolean) – For discrete distributions, whether to use a faster approximation of the random number generator. If None, attempts to inherit the estimate_discrete behavior used for fitting from the Distribution object or the parent Fit object, if present. Approximations only exist for some distributions (namely the power law). If an approximation does not exist an estimate_discrete setting of True will not be inherited.

Returns

r – Random numbers drawn from the distribution

Return type

array
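
For a continuous power law, random numbers can be drawn by inverse transform sampling: with u uniform on [0, 1), x = xmin * (1 - u)^(-1 / (alpha - 1)). The sketch below illustrates that technique only and is not taken from this module; the function name and the seed argument are assumptions, and like the method above it ignores xmax.

```python
import random


def sample_power_law(n, alpha, xmin=1.0, seed=None):
    """Draw n values from a continuous power law p(x) ~ x**(-alpha), x >= xmin."""
    rng = random.Random(seed)
    exponent = -1.0 / (alpha - 1.0)
    # rng.random() lies in [0, 1), so 1 - u lies in (0, 1] and the power
    # is always well defined.
    return [xmin * (1.0 - rng.random()) ** exponent for _ in range(n)]


samples = sample_power_law(10000, alpha=3.0, xmin=1.0, seed=1)
median = sorted(samples)[len(samples) // 2]
# For alpha = 3 and xmin = 1 the theoretical median is 2 ** 0.5.
print(min(samples), median)
```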

in_range()

Whether the current parameters of the distribution are within the range of valid parameters.

initial_parameters(data)

Return previously user-provided initial parameters or, if never provided, calculate new ones. Default initial parameter estimates are unique to each theoretical distribution.

likelihoods(data)

The likelihoods of the observed data from the theoretical distribution. Another name for the probabilities or probability density function.

loglikelihoods(data)

The logarithm of the likelihoods of the observed data from the theoretical distribution.

parameter_range(r, initial_parameters=None)

Set the limits on the range of valid parameters to be considered while fitting.

Parameters
  • r (dict) – A dictionary of the parameter range. Restricted parameter names are keys, and with tuples of the form (lower_bound, upper_bound) as values.

  • initial_parameters (tuple or list, optional) – Initial parameter values to start the fitting search from.

pdf(data=None)

Returns the probability density function (normalized histogram) of the theoretical distribution for the values in data within xmin and xmax, if present.

Parameters

data (list or array, optional) – If not provided, attempts to use the data from the Fit object in which the Distribution object is contained.

Returns

probabilities

Return type

array

plot_ccdf(data=None, ax=None, survival=True, **kwargs)

Plots the complementary cumulative distribution function (CCDF) of the theoretical distribution for the values given in data within xmin and xmax, if present. Plots to a new figure or to axis ax if provided.

Parameters
  • data (list or array, optional) – If not provided, attempts to use the data from the Fit object in which the Distribution object is contained.

  • ax (matplotlib axis, optional) – The axis to which to plot. If None, a new figure is created.

  • survival (bool, optional) – Whether to plot a CDF (False) or CCDF (True). True by default.

Returns

ax – The axis to which the plot was made.

Return type

matplotlib axis

plot_cdf(data=None, ax=None, survival=False, **kwargs)

Plots the cumulative distribution function (CDF) of the theoretical distribution for the values given in data within xmin and xmax, if present. Plots to a new figure or to axis ax if provided.

Parameters
  • data (list or array, optional) – If not provided, attempts to use the data from the Fit object in which the Distribution object is contained.

  • ax (matplotlib axis, optional) – The axis to which to plot. If None, a new figure is created.

  • survival (bool, optional) – Whether to plot a CDF (False) or CCDF (True). False by default.

Returns

ax – The axis to which the plot was made.

Return type

matplotlib axis

plot_pdf(data=None, ax=None, **kwargs)

Plots the probability density function (PDF) of the theoretical distribution for the values given in data within xmin and xmax, if present. Plots to a new figure or to axis ax if provided.

Parameters
  • data (list or array, optional) – If not provided, attempts to use the data from the Fit object in which the Distribution object is contained.

  • ax (matplotlib axis, optional) – The axis to which to plot. If None, a new figure is created.

Returns

ax – The axis to which the plot was made.

Return type

matplotlib axis

class py3plex.algorithms.statistics.powerlaw.Distribution_Fit(data, name, xmin, discrete=False, xmax=None, method='Likelihood', estimate_discrete=True)

Bases: object

class py3plex.algorithms.statistics.powerlaw.Exponential(xmin=1, xmax=None, discrete=False, fit_method='Likelihood', data=None, parameters=None, parameter_range=None, initial_parameters=None, discrete_approximation='round', parent_Fit=None, **kwargs)

Bases: py3plex.algorithms.statistics.powerlaw.Distribution

loglikelihoods(data=None)

The logarithm of the likelihoods of the observed data from the theoretical distribution.

property name
parameters(params)
pdf(data=None)

Returns the probability density function (normalized histogram) of the theoretical distribution for the values in data within xmin and xmax, if present.

Parameters

data (list or array, optional) – If not provided, attempts to use the data from the Fit object in which the Distribution object is contained.

Returns

probabilities

Return type

array

class py3plex.algorithms.statistics.powerlaw.Fit(data, discrete=False, xmin=None, xmax=None, fit_method='Likelihood', estimate_discrete=True, discrete_approximation='round', sigma_threshold=None, parameter_range=None, fit_optimizer=None, xmin_distance='D', **kwargs)

Bases: object

A fit of a data set to various probability distributions, namely power laws. For fits to power laws, the methods of Clauset et al. 2007 are used. These methods identify the portion of the tail of the distribution that follows a power law, beyond a value xmin. If no xmin is provided, the optimal one is calculated and assigned at initialization.

Parameters
  • data (list or array) –

  • discrete (boolean, optional) – Whether the data is discrete (integers).

  • xmin (int or float, optional) – The data value beyond which distributions should be fitted. If None an optimal one will be calculated.

  • xmax (int or float, optional) – The maximum value of the fitted distributions.

  • estimate_discrete (bool, optional) – Whether to estimate the fit of a discrete power law using fast analytical methods, instead of calculating the fit exactly with slow numerical methods. Very accurate with xmin>6.

  • sigma_threshold (float, optional) – Upper limit on the standard error of the power law fit. Used after fitting, when identifying valid xmin values.

  • parameter_range (dict, optional) – Dictionary of valid parameter ranges for fitting. Formatted as a dictionary of parameter names (‘alpha’ and/or ‘sigma’) and tuples of their lower and upper limits (ex. (1.5, 2.5), (None, .1)).
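
To make the roles of xmin, alpha and sigma concrete, the sketch below applies the closed-form maximum likelihood estimate for a continuous power law tail, alpha = 1 + n / sum(ln(x / xmin)), with standard error sigma = (alpha - 1) / sqrt(n). It is a standard-library illustration that does not call the Fit class; the function name and the synthetic data are assumptions.

```python
import math
import random


def fit_alpha_continuous(data, xmin):
    """MLE of the exponent of a continuous power law for values >= xmin."""
    tail = [x for x in data if x >= xmin]
    n = len(tail)
    # Closed-form maximum likelihood estimate of the scaling exponent.
    alpha = 1.0 + n / sum(math.log(x / xmin) for x in tail)
    # Standard error of the estimate.
    sigma = (alpha - 1.0) / math.sqrt(n)
    return alpha, sigma


# Synthetic data with a known exponent of 2.5, drawn by inverse transform.
rng = random.Random(0)
data = [(1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(20000)]

alpha_hat, sigma_hat = fit_alpha_continuous(data, xmin=1.0)
print(alpha_hat, sigma_hat)
```

The estimate should recover the generating exponent to within a few standard errors.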

ccdf(original_data=False, survival=True, **kwargs)

Returns the complementary cumulative distribution function of the data.

Parameters
  • original_data (bool, optional) – Whether to use all of the data initially passed to the Fit object. If False, uses only the data used for the fit (within xmin and xmax.)

  • survival (bool, optional) – Whether to return the complementary cumulative distribution function, also known as the survival function, or the cumulative distribution function, 1-CCDF.

Returns

  • X (array) – The sorted, unique values in the data.

  • probabilities (array) – The portion of the data that is greater than or equal to X.

cdf(original_data=False, survival=False, **kwargs)

Returns the cumulative distribution function of the data.

Parameters
  • original_data (bool, optional) – Whether to use all of the data initially passed to the Fit object. If False, uses only the data used for the fit (within xmin and xmax.)

  • survival (bool, optional) – Whether to return the complementary cumulative distribution function, 1-CDF, also known as the survival function.

Returns

  • X (array) – The sorted, unique values in the data.

  • probabilities (array) – The portion of the data that is less than or equal to X.

distribution_compare(dist1, dist2, nested=None, **kwargs)

Returns the loglikelihood ratio, and its p-value, between the two distribution fits. Whether the candidate distributions are treated as nested is controlled by the nested parameter.

Parameters
  • dist1 (string) – Name of the first candidate distribution (ex. ‘power_law’)

  • dist2 (string) – Name of the second candidate distribution (ex. ‘exponential’)

  • nested (bool or None, optional) – Whether to assume the candidate distributions are nested versions of each other. None assumes not unless the name of one distribution is a substring of the other.

Returns

  • R (float) – Loglikelihood ratio of the two distributions’ fit to the data. If greater than 0, the first distribution is preferred. If less than 0, the second distribution is preferred.

  • p (float) – Significance of R

find_xmin(xmin_distance=None)

Returns the optimal xmin beyond which the scaling regime of the power law fits best. The attribute self.xmin of the Fit object is also set.

The optimal xmin beyond which the scaling regime of the power law fits best is identified by minimizing the Kolmogorov-Smirnov distance between the data and the theoretical power law fit. This is the method of Clauset et al. 2007.
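
The search described above can be sketched in a few lines: try each distinct data value as xmin, fit the exponent on the tail, and keep the candidate with the smallest Kolmogorov-Smirnov distance. The code below is a simplified, continuous-only illustration with hypothetical names (find_xmin_sketch, min_tail); it assumes strictly positive data and is not the implementation used by this class.

```python
import math
import random


def ks_distance_power_law(tail, xmin, alpha):
    """KS distance between a sorted tail and a continuous power law CDF."""
    n = len(tail)
    worst = 0.0
    for i, x in enumerate(tail):
        model_cdf = 1.0 - (x / xmin) ** (1.0 - alpha)
        # Compare against the empirical CDF just before and just after x.
        worst = max(worst, abs(model_cdf - i / n), abs(model_cdf - (i + 1) / n))
    return worst


def find_xmin_sketch(data, min_tail=10):
    """Return (xmin, alpha, D) minimising the KS distance D over candidates."""
    values = sorted(data)
    best = None
    for i, xmin in enumerate(values):
        if i > 0 and values[i - 1] == xmin:
            continue  # each distinct value is tried once
        tail = values[i:]
        if len(tail) < min_tail:
            break  # too few points left for a meaningful fit
        log_sum = sum(math.log(x / xmin) for x in tail)
        if log_sum <= 0.0:
            continue  # degenerate tail, all values equal to xmin
        alpha = 1.0 + len(tail) / log_sum
        distance = ks_distance_power_law(tail, xmin, alpha)
        if best is None or distance < best[2]:
            best = (xmin, alpha, distance)
    return best


# Synthetic power law data with xmin = 2 and exponent 2.5.
rng = random.Random(3)
data = [2.0 * (1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(300)]
xmin_hat, alpha_hat, d_hat = find_xmin_sketch(data)
print(xmin_hat, alpha_hat, d_hat)
```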

loglikelihood_ratio(dist1, dist2, nested=None, **kwargs)

Another name for distribution_compare.

nested_distribution_compare(dist1, dist2, nested=True, **kwargs)

Returns the loglikelihood ratio, and its p-value, between the two distribution fits, assuming the candidate distributions are nested.

Parameters
  • dist1 (string) – Name of the first candidate distribution (ex. ‘power_law’)

  • dist2 (string) – Name of the second candidate distribution (ex. ‘exponential’)

  • nested (bool or None, optional) – Whether to assume the candidate distributions are nested versions of each other. None assumes not unless the name of one distribution is a substring of the other. True by default.

Returns

  • R (float) – Loglikelihood ratio of the two distributions’ fit to the data. If greater than 0, the first distribution is preferred. If less than 0, the second distribution is preferred.

  • p (float) – Significance of R

pdf(original_data=False, **kwargs)

Returns the probability density function (normalized histogram) of the data.

Parameters

original_data (bool, optional) – Whether to use all of the data initially passed to the Fit object. If False, uses only the data used for the fit (within xmin and xmax.)

Returns

  • bin_edges (array) – The edges of the bins of the probability density function.

  • probabilities (array) – The portion of the data that is within the bin. Length 1 less than bin_edges, as it corresponds to the spaces between them.

plot_ccdf(ax=None, original_data=False, survival=True, **kwargs)

Plots the CCDF to a new figure or to axis ax if provided.

Parameters
  • ax (matplotlib axis, optional) – The axis to which to plot. If None, a new figure is created.

  • original_data (bool, optional) – Whether to use all of the data initially passed to the Fit object. If False, uses only the data used for the fit (within xmin and xmax.)

  • survival (bool, optional) – Whether to plot a CDF (False) or CCDF (True). True by default.

Returns

ax – The axis to which the plot was made.

Return type

matplotlib axis

plot_cdf(ax=None, original_data=False, survival=False, **kwargs)

Plots the CDF to a new figure or to axis ax if provided.

Parameters
  • ax (matplotlib axis, optional) – The axis to which to plot. If None, a new figure is created.

  • original_data (bool, optional) – Whether to use all of the data initially passed to the Fit object. If False, uses only the data used for the fit (within xmin and xmax.)

  • survival (bool, optional) – Whether to plot a CDF (False) or CCDF (True). False by default.

Returns

ax – The axis to which the plot was made.

Return type

matplotlib axis

plot_pdf(ax=None, original_data=False, linear_bins=False, **kwargs)

Plots the probability density function (PDF) of the data to a new figure or to axis ax if provided.

Parameters
  • ax (matplotlib axis, optional) – The axis to which to plot. If None, a new figure is created.

  • original_data (bool, optional) – Whether to use all of the data initially passed to the Fit object. If False, uses only the data used for the fit (within xmin and xmax.)

  • linear_bins (bool, optional) – Whether to use linearly spaced bins (True) or logarithmically spaced bins (False). False by default.

Returns

ax – The axis to which the plot was made.

Return type

matplotlib axis

class py3plex.algorithms.statistics.powerlaw.Lognormal(xmin=1, xmax=None, discrete=False, fit_method='Likelihood', data=None, parameters=None, parameter_range=None, initial_parameters=None, discrete_approximation='round', parent_Fit=None, **kwargs)

Bases: py3plex.algorithms.statistics.powerlaw.Distribution

cdf(data=None, survival=False)

The cumulative distribution function (CDF) of the lognormal distribution. Calculated for the values given in data within xmin and xmax, if present. Calculation was reformulated to avoid underflow errors.

Parameters
  • data (list or array, optional) – If not provided, attempts to use the data from the Fit object in which the Distribution object is contained.

  • survival (bool, optional) – Whether to calculate a CDF (False) or CCDF (True). False by default.

Returns

  • X (array) – The sorted, unique values in the data.

  • probabilities (array) – The portion of the data that is less than or equal to X.

property name
parameters(params)
pdf(data=None)

Returns the probability density function (normalized histogram) of the theoretical distribution for the values in data within xmin and xmax, if present.

Parameters

data (list or array, optional) – If not provided, attempts to use the data from the Fit object in which the Distribution object is contained.

Returns

probabilities

Return type

array

class py3plex.algorithms.statistics.powerlaw.Lognormal_Positive(xmin=1, xmax=None, discrete=False, fit_method='Likelihood', data=None, parameters=None, parameter_range=None, initial_parameters=None, discrete_approximation='round', parent_Fit=None, **kwargs)

Bases: py3plex.algorithms.statistics.powerlaw.Lognormal

property name
class py3plex.algorithms.statistics.powerlaw.Power_Law(estimate_discrete=True, **kwargs)

Bases: py3plex.algorithms.statistics.powerlaw.Distribution

fit(data=None)

Fits the parameters of the distribution to the data. Uses options set at initialization.

property name
parameters(params)
property sigma
class py3plex.algorithms.statistics.powerlaw.Stretched_Exponential(xmin=1, xmax=None, discrete=False, fit_method='Likelihood', data=None, parameters=None, parameter_range=None, initial_parameters=None, discrete_approximation='round', parent_Fit=None, **kwargs)

Bases: py3plex.algorithms.statistics.powerlaw.Distribution

loglikelihoods(data=None)

The logarithm of the likelihoods of the observed data from the theoretical distribution.

property name
parameters(params)
pdf(data=None)

Returns the probability density function (normalized histogram) of the theoretical distribution for the values in data within xmin and xmax, if present.

Parameters

data (list or array, optional) – If not provided, attempts to use the data from the Fit object in which the Distribution object is contained.

Returns

probabilities

Return type

array

class py3plex.algorithms.statistics.powerlaw.Truncated_Power_Law(xmin=1, xmax=None, discrete=False, fit_method='Likelihood', data=None, parameters=None, parameter_range=None, initial_parameters=None, discrete_approximation='round', parent_Fit=None, **kwargs)

Bases: py3plex.algorithms.statistics.powerlaw.Distribution

property name
parameters(params)
pdf(data=None)

Returns the probability density function (normalized histogram) of the theoretical distribution for the values in data within xmin and xmax, if present.

Parameters

data (list or array, optional) – If not provided, attempts to use the data from the Fit object in which the Distribution object is contained.

Returns

probabilities

Return type

array

py3plex.algorithms.statistics.powerlaw.bisect_map(mn, mx, function, target)

Uses binary search to find the target solution to a function, searching in a given ordered sequence of integer values.

Parameters
  • seq (list or array) – Monotonically increasing integers.

  • function – A function that takes a single integer input, which monotonically decreases over the range of seq.

  • target – The target value of the function.

Returns

value – The input value that yields the target solution. If there is no exact solution in the input sequence, finds the nearest value k such that function(k) <= target < function(k+1). This is similar to the behavior of bisect_left in the bisect package. If even the first, leftmost value of seq does not satisfy this condition, -1 is returned.

py3plex.algorithms.statistics.powerlaw.ccdf(data, survival=True, **kwargs)

The complementary cumulative distribution function (CCDF) of the data.

Parameters
  • data (list or array, optional) –

  • survival (bool, optional) – Whether to calculate a CDF (False) or CCDF (True). True by default.

Returns

  • X (array) – The sorted, unique values in the data.

  • probabilities (array) – The portion of the data that is greater than or equal to X (or less than or equal to X if survival is False).

py3plex.algorithms.statistics.powerlaw.cdf(data, survival=False, **kwargs)

The cumulative distribution function (CDF) of the data.

Parameters
  • data (list or array, optional) –

  • survival (bool, optional) – Whether to calculate a CDF (False) or CCDF (True). False by default.

Returns

  • X (array) – The sorted, unique values in the data.

  • probabilities (array) – The portion of the data that is less than or equal to X.

py3plex.algorithms.statistics.powerlaw.checkunique(data)

Quickly checks if a sorted array is all unique elements.

py3plex.algorithms.statistics.powerlaw.cumulative_distribution_function(data, xmin=None, xmax=None, survival=False, **kwargs)

The cumulative distribution function (CDF) of the data.

Parameters
  • data (list or array, optional) –

  • survival (bool, optional) – Whether to calculate a CDF (False) or CCDF (True). False by default.

  • xmin (int or float, optional) – The minimum data size to include. Values less than xmin are excluded.

  • xmax (int or float, optional) – The maximum data size to include. Values greater than xmax are excluded.

Returns

  • X (array) – The sorted, unique values in the data.

  • probabilities (array) – The portion of the data that is less than or equal to X.
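
The behaviour described by these parameters can be illustrated with a small empirical CDF over the sorted, unique values of the data. This is a standard-library sketch under the assumption that the CDF counts values less than or equal to X and the survival variant counts values greater than or equal to X; the name empirical_cdf is hypothetical and the function above may differ in detail.

```python
from bisect import bisect_left, bisect_right


def empirical_cdf(data, xmin=None, xmax=None, survival=False):
    """Return (X, probabilities) for the values of data within [xmin, xmax]."""
    kept = sorted(
        x for x in data
        if (xmin is None or x >= xmin) and (xmax is None or x <= xmax)
    )
    n = len(kept)
    xs = sorted(set(kept))
    if survival:
        # Share of values greater than or equal to each x (CCDF).
        probabilities = [(n - bisect_left(kept, x)) / n for x in xs]
    else:
        # Share of values less than or equal to each x (CDF).
        probabilities = [bisect_right(kept, x) / n for x in xs]
    return xs, probabilities


print(empirical_cdf([1, 2, 2, 3]))
print(empirical_cdf([1, 2, 2, 3], survival=True))
print(empirical_cdf([1, 2, 2, 3, 10], xmax=3))
```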

py3plex.algorithms.statistics.powerlaw.distribution_compare(data, distribution1, parameters1, distribution2, parameters2, discrete, xmin, xmax, nested=None, **kwargs)
py3plex.algorithms.statistics.powerlaw.distribution_fit(data, distribution='all', discrete=False, xmin=None, xmax=None, comparison_alpha=None, search_method='Likelihood', estimate_discrete=True)
py3plex.algorithms.statistics.powerlaw.exponential_likelihoods(data, Lambda, xmin, xmax=False, discrete=False)
py3plex.algorithms.statistics.powerlaw.find_xmin(data, discrete=False, xmax=None, search_method='Likelihood', return_all=False, estimate_discrete=True, xmin_range=None)
py3plex.algorithms.statistics.powerlaw.gamma_likelihoods(data, k, theta, xmin, xmax=False, discrete=False)
py3plex.algorithms.statistics.powerlaw.is_discrete(data)

Checks if every element of the array is an integer.

py3plex.algorithms.statistics.powerlaw.likelihood_function_generator(distribution_name, discrete=False, xmin=1, xmax=None)
py3plex.algorithms.statistics.powerlaw.loglikelihood_ratio(loglikelihoods1, loglikelihoods2, nested=False, normalized_ratio=False)

Calculates a loglikelihood ratio and the p-value for testing which of two probability distributions is more likely to have created a set of observations.

Parameters
  • loglikelihoods1 (list or array) – The logarithms of the likelihoods of each observation, calculated from a particular probability distribution.

  • loglikelihoods2 (list or array) – The logarithms of the likelihoods of each observation, calculated from a particular probability distribution.

  • nested (bool, optional) – Whether one of the two probability distributions that generated the likelihoods is a nested version of the other. False by default.

  • normalized_ratio (bool, optional) – Whether to return the loglikelihood ratio, R, or the normalized ratio R/sqrt(n*variance)

Returns

  • R (float) – The loglikelihood ratio of the two sets of likelihoods. If positive, the first set of likelihoods is more likely (and so the probability distribution that produced them is a better fit to the data). If negative, the reverse is true.

  • p (float) – The significance of the sign of R. If below a critical value (typically .05) the sign of R is taken to be significant. If above the critical value the sign of R is taken to be due to statistical fluctuations.
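
The relationship between R, the normalized ratio, and p can be sketched as follows. This is a minimal illustration assuming the usual Vuong-style test for non-nested models; the function name sketch_loglikelihood_ratio is hypothetical and the code is not taken from the module's source.

```python
import math
import numpy as np

def sketch_loglikelihood_ratio(ll1, ll2, normalized_ratio=False):
    """Illustrative only: Vuong-style comparison of two non-nested fits."""
    ll1 = np.asarray(ll1, dtype=float)
    ll2 = np.asarray(ll2, dtype=float)
    n = ll1.size
    diff = ll1 - ll2                 # per-observation loglikelihood differences
    R = diff.sum()                   # positive R favours the first distribution
    variance = diff.var()            # population variance of the differences
    # Two-sided p-value for the sign of R (undefined if variance is zero).
    p = math.erfc(abs(R) / math.sqrt(2 * n * variance))
    if normalized_ratio:
        R = R / math.sqrt(n * variance)
    return R, p

ll_first = [-1.0, -1.2, -0.9, -1.1]   # made-up loglikelihoods, first model
ll_second = [-1.5, -1.4, -1.6, -1.3]  # made-up loglikelihoods, second model
R, p = sketch_loglikelihood_ratio(ll_first, ll_second)
```

Swapping the two arguments flips the sign of R and leaves p unchanged, which is why p is described as the significance of the sign of R.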

py3plex.algorithms.statistics.powerlaw.lognormal_likelihoods(data, mu, sigma, xmin, xmax=False, discrete=False)
py3plex.algorithms.statistics.powerlaw.negative_binomial_likelihoods(data, r, p, xmin=0, xmax=False)
py3plex.algorithms.statistics.powerlaw.nested_loglikelihood_ratio(loglikelihoods1, loglikelihoods2, **kwargs)

Calculates a loglikelihood ratio and the p-value for testing which of two probability distributions is more likely to have created a set of observations. Assumes one of the probability distributions is a nested version of the other.

Parameters
  • loglikelihoods1 (list or array) – The logarithms of the likelihoods of each observation, calculated from the first candidate probability distribution.

  • loglikelihoods2 (list or array) – The logarithms of the likelihoods of each observation, calculated from the second candidate probability distribution.

  • nested (bool, optional) – Whether one of the two probability distributions that generated the likelihoods is a nested version of the other. True by default.

  • normalized_ratio (bool, optional) – If True, return the normalized ratio R/sqrt(n*variance) instead of the raw loglikelihood ratio R.

Returns

  • R (float) – The loglikelihood ratio of the two sets of likelihoods. If positive, the first set of likelihoods is more likely (and so the probability distribution that produced them is a better fit to the data). If negative, the reverse is true.

  • p (float) – The significance of the sign of R. If below a critical value (typically .05) the sign of R is taken to be significant. If above the critical value the sign of R is taken to be due to statistical fluctuations.
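
For nested models, a common choice is to compare 2|R| against a chi-squared distribution with one degree of freedom. The sketch below assumes that convention (it is not confirmed by this page) and uses the identity that the chi-squared survival function with one degree of freedom equals erfc(sqrt(x/2)); the function name sketch_nested_p_value is hypothetical.

```python
import math

def sketch_nested_p_value(R):
    """Illustrative only: p-value of 2|R| under chi-squared with 1 dof."""
    statistic = abs(2.0 * R)
    # Survival function of chi-squared (1 dof): erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2.0))

p_nested = sketch_nested_p_value(1.92)  # 2|R| = 3.84, close to the 5% critical value
```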

py3plex.algorithms.statistics.powerlaw.pdf(data, xmin=None, xmax=None, linear_bins=False, **kwargs)

Returns the probability density function (normalized histogram) of the data.

Parameters
  • data (list or array) –

  • xmin (float, optional) – Minimum value of the PDF. If None, uses the smallest value in the data.

  • xmax (float, optional) – Maximum value of the PDF. If None, uses the largest value in the data.

  • linear_bins (bool, optional) – Whether to use linearly spaced bins (True) or logarithmically spaced bins (False, recommended for log-log plots). False by default.

Returns

  • bin_edges (array) – The edges of the bins of the probability density function.

  • probabilities (array) – The portion of the data that is within the bin. Length 1 less than bin_edges, as it corresponds to the spaces between them.
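
The difference between logarithmic and linear binning can be illustrated with plain NumPy. This is a rough sketch under stated assumptions, not the module's implementation: the function name sketch_pdf and the n_bins parameter are hypothetical, and logarithmic bins require a strictly positive xmin.

```python
import numpy as np

def sketch_pdf(data, xmin=None, xmax=None, linear_bins=False, n_bins=10):
    """Illustrative only: normalized histogram with log or linear bins."""
    data = np.asarray(data, dtype=float)
    xmin = data.min() if xmin is None else xmin
    xmax = data.max() if xmax is None else xmax
    data = data[(data >= xmin) & (data <= xmax)]
    if linear_bins:
        bin_edges = np.linspace(xmin, xmax, n_bins + 1)
    else:
        bin_edges = np.logspace(np.log10(xmin), np.log10(xmax), n_bins + 1)
    # Pin the outer edges so rounding cannot push the extremes out of range.
    bin_edges[0], bin_edges[-1] = xmin, xmax
    probabilities, bin_edges = np.histogram(data, bins=bin_edges, density=True)
    return bin_edges, probabilities

rng = np.random.default_rng(0)
sample = rng.pareto(1.5, 1000) + 1.0   # synthetic heavy-tailed data, all >= 1
edges, probs = sketch_pdf(sample)
```

As in the return description above, the probabilities array is one element shorter than the edges array, and the densities integrate to one over the bins.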

py3plex.algorithms.statistics.powerlaw.plot_ccdf(data, ax=None, survival=False, **kwargs)
py3plex.algorithms.statistics.powerlaw.plot_cdf(data, ax=None, survival=False, **kwargs)

Plots the cumulative distribution function (CDF) of the data to a new figure or to axis ax if provided.

Parameters
  • data (list or array) –

  • ax (matplotlib axis, optional) – The axis to which to plot. If None, a new figure is created.

  • survival (bool, optional) – Whether to plot a CDF (False) or CCDF (True). False by default.

Returns

ax – The axis to which the plot was made.

Return type

matplotlib axis
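
What the survival flag changes can be shown without any plotting. The sketch below computes an empirical CDF or CCDF under one common convention (CDF as the fraction of observations less than or equal to x, CCDF as the fraction greater than or equal to x); the module may use a slightly different convention, and the function name sketch_cdf is hypothetical.

```python
import numpy as np

def sketch_cdf(data, survival=False):
    """Illustrative only: empirical CDF (or CCDF) of the data."""
    x = np.sort(np.asarray(data, dtype=float))
    n = x.size
    if survival:
        # Fraction of observations >= x; never zero, so safe on log axes.
        y = 1.0 - np.arange(n) / n
    else:
        # Fraction of observations <= x (tied values get consecutive ranks).
        y = np.arange(1, n + 1) / n
    return x, y

x_cdf, y_cdf = sketch_cdf([3, 1, 2])
x_ccdf, y_ccdf = sketch_cdf([3, 1, 2], survival=True)
```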

py3plex.algorithms.statistics.powerlaw.plot_pdf(data, ax=None, linear_bins=False, **kwargs)

Plots the probability density function (PDF) to a new figure or to axis ax if provided.

Parameters
  • data (list or array) –

  • ax (matplotlib axis, optional) – The axis to which to plot. If None, a new figure is created.

  • linear_bins (bool, optional) – Whether to use linearly spaced bins (True) or logarithmically spaced bins (False). False by default.

Returns

ax – The axis to which the plot was made.

Return type

matplotlib axis

py3plex.algorithms.statistics.powerlaw.power_law_ks_distance(data, alpha, xmin, xmax=None, discrete=False, kuiper=False)
py3plex.algorithms.statistics.powerlaw.power_law_likelihoods(data, alpha, xmin, xmax=False, discrete=False)
py3plex.algorithms.statistics.powerlaw.stretched_exponential_likelihoods(data, Lambda, beta, xmin, xmax=False, discrete=False)
py3plex.algorithms.statistics.powerlaw.trim_to_range(data, xmin=None, xmax=None, **kwargs)

Removes elements of the data that are below xmin or above xmax (if present).
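
A minimal sketch of range trimming, keeping only values that lie between xmin and xmax. Whether the bounds are inclusive in the module itself is an assumption here, and the function name sketch_trim_to_range is hypothetical.

```python
import numpy as np

def sketch_trim_to_range(data, xmin=None, xmax=None):
    """Illustrative only: keep values with xmin <= value <= xmax."""
    data = np.asarray(data)
    if xmin is not None:
        data = data[data >= xmin]   # drop values below xmin
    if xmax is not None:
        data = data[data <= xmax]   # drop values above xmax
    return data

trimmed = sketch_trim_to_range([1, 2, 3, 4, 5], xmin=2, xmax=4)
```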

py3plex.algorithms.statistics.powerlaw.truncated_power_law_likelihoods(data, alpha, Lambda, xmin, xmax=False, discrete=False)

py3plex.algorithms.statistics.statistics module

py3plex.algorithms.statistics.statistics.core_network_statistics(G, labels=None, name='example')

py3plex.algorithms.statistics.topology module

py3plex.algorithms.statistics.topology.basic_pl_stats(degree_sequence)

Parameters
  • degree_sequence – degree sequence of individual nodes

py3plex.algorithms.statistics.topology.plot_power_law(degree_sequence, title, xlabel, plabel, ylabel='Number of nodes', formula_x=70, formula_y=0.05, show=True, use_normalization=False)

Module contents