reliefe package

Submodules

reliefe.minimal_neighborhood_ablation module

reliefe.reliefe_classes module

ReliefE algorithm: Skrlj and Petkovic, 2021

class reliefe.reliefe_classes.ReliefE(num_iter: Union[float, int, List[Union[float, int]]] = 1.0, k=8, normalize_descriptive=True, embedding_based_distances=False, num_threads='all', verbose=False, mlc_distance='cosine', latent_dimension=128, sparsity_threshold=0.15, determine_k_automatically=False, samples=2048, use_average_neighbour=False)

Bases: object

static compute_members(y)
determine_latent_dim(xs, true_dim=None, empirical_left_continuous=True, metric='euclidean')

A method for computing the latent dimension, based on the paper: https://www.nature.com/articles/s41598-017-11873-y.pdf

Parameters
  • xs – data matrix X

  • true_dim – the actual dimension or None

  • empirical_left_continuous – Whether only the left-continuous part should be considered.

  • metric – The metric to compute the distances.

Returns

Latent dimension.

fit(x, y, embedding_method=None, store_neighborhoods=None)

Key idea of ReliefE: embed the instance space, compute the mean embedding for each of the classes, and, in the update step, compare to the joint label embedding instead of the single instance.

Parameters
  • x – Feature space, array-like.

  • y – Target space, a 0/1 array-like structure (1-hot encoded for classification).

  • embedding_method – Custom embedding class (e.g., TruncatedSVD())

  • store_neighborhoods – File in which to store the adaptive k values for ablation.

Returns

None.
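
The "mean embedding for each of the classes" step described for fit can be illustrated in plain Python. This is only a sketch under stated assumptions, not the library's implementation: class_mean_embeddings is a hypothetical helper, the embeddings are assumed to be already computed, and y is assumed to be 1-hot encoded as described in the parameter list above.

```python
from typing import List


def class_mean_embeddings(embeddings: List[List[float]],
                          y_onehot: List[List[int]]) -> List[List[float]]:
    """Return one mean embedding per class (hypothetical helper, not reliefe API)."""
    n_classes = len(y_onehot[0])
    dim = len(embeddings[0])
    sums = [[0.0] * dim for _ in range(n_classes)]
    counts = [0] * n_classes
    for vector, labels in zip(embeddings, y_onehot):
        for c, flag in enumerate(labels):
            if flag:  # the instance belongs to class c
                counts[c] += 1
                for j, value in enumerate(vector):
                    sums[c][j] += value
    # Classes without members keep a zero vector (an arbitrary choice for this sketch).
    return [[s / counts[c] if counts[c] else 0.0 for s in sums[c]]
            for c in range(n_classes)]


# Three embedded instances, two classes (assumed toy data).
embeddings = [[0.0, 2.0], [2.0, 4.0], [10.0, 10.0]]
y_onehot = [[1, 0], [1, 0], [0, 1]]
means = class_mean_embeddings(embeddings, y_onehot)
```

Here the first two instances share a class, so their embeddings are averaged into a single vector, while the third instance defines the mean of its own class.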

send_message(message)
reliefe.reliefe_classes.sgn(el)

The standard sgn function.

Parameters

el – float input

Returns

sign of the number
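
For reference, the conventional sign function can be written as follows. This is a minimal sketch, not the library's code: sgn_sketch is a hypothetical name, and returning the integer 0 for a zero input is an assumption, since the documentation above does not state how zero is handled.

```python
def sgn_sketch(el: float) -> int:
    """Sign of a number: -1 for negative, 0 for zero, 1 for positive (assumed convention)."""
    # Comparisons yield booleans, and subtracting them gives an int in {-1, 0, 1}.
    return (el > 0) - (el < 0)


signs = [sgn_sketch(v) for v in (-2.5, 0.0, 3)]
```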

reliefe.reliefe_evaluation module

reliefe.reliefe_evaluation.evaluate_importances_with_logistic_regression(x_train, x_test, y_train, y_test, importances, k)

reliefe.utils module

class reliefe.utils.MLCDistances

Bases: object

ACCURACY = 'accuracy'
DISTANCES = ['hamming', 'f1', 'accuracy', 'subset', 'cosine', 'hyperbolic']
F1 = 'f1'
HAMMING = 'hamming'
SUBSET = 'subset'
class reliefe.utils.TaskTypes

Bases: object

CLASSIFICATION = 'classification'
HMLC = 'hierarchical multi-label classification'
MLC = 'multi-label classification'
REGRESSION = 'regression'
reliefe.utils.aggregate_nom(values)
reliefe.utils.aggregate_num(values)
reliefe.utils.basic_parse_sparse_line(line, offset=0)
reliefe.utils.feature_ranking_wrapper(path_to_arff: str, descriptive_indices: Union[None, List[int]], target_index: Union[None, int], feature_ranking: Callable, *args)

Takes care of datasets that contain nominal attributes:

  • loads the data from the arff file and 1-hot encodes the nominal features

  • computes the relevance of the transformed features

  • computes the relevance of the original features by summing up the relevance of the 1-hot encoded groups

E.g., if the data has two features, x1, which is numeric, and x2, which can take values A, B, and C, the pipeline is as follows:

  • load the data, which now has 4 features: x1, x2-A, x2-B, x2-C (e.g., if the first example is originally 2.3,B, the converted example is 2.3,0,1,0)

  • compute relevance for the transformed features, e.g., [0.1, 0.2, 0.3, 0.05]

  • compute relevance for the original features: [0.1, 0.55], since 0.2 + 0.3 + 0.05 = 0.55.
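
The final summation step of this example can be reproduced in a few lines. The sketch below is an illustration only: sum_group_relevance is a hypothetical helper, and the ranges dictionary is assumed to follow the {index: (start, end)} convention that load_arff documents for attribute_ranges.

```python
from typing import Dict, List, Tuple


def sum_group_relevance(transformed: List[float],
                        ranges: Dict[int, Tuple[int, int]]) -> List[float]:
    """Sum the relevance of each 1-hot encoded group (hypothetical helper)."""
    # Columns start..end-1 of the transformed data belong to one original attribute.
    return [sum(transformed[start:end])
            for _, (start, end) in sorted(ranges.items())]


# x1 is numeric (one column), x2 is nominal with values A, B, C (three columns).
transformed_relevance = [0.1, 0.2, 0.3, 0.05]
ranges = {0: (0, 1), 1: (1, 4)}
original_relevance = sum_group_relevance(transformed_relevance, ranges)
```

Up to floating-point rounding, original_relevance matches the [0.1, 0.55] from the example.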

reliefe.utils.get_task_type(target_attributes_values: List[List[str]])

Finds the task type from the possible attribute values.

reliefe.utils.impute_missing_values(column, possible_values, is_numeric, missing_char_nom)

Replaces missing_char_nom for nominal and np.NaNs for numeric attributes with the modes and averages. This is done in-place.

Parameters
  • column – list of values

  • possible_values – empty for numeric (and ignored), and list of possible values otherwise

  • is_numeric – is the attribute numeric

  • missing_char_nom – character that denotes missing nominal value

Returns

number of missing values
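
The in-place mode/mean imputation described here can be sketched as follows. This is not the library's implementation: impute_in_place is a hypothetical function, it omits the possible_values argument, and its handling of a column with no observed values is an arbitrary choice made for this sketch.

```python
import math
from collections import Counter
from typing import List


def impute_in_place(column: List, is_numeric: bool, missing_char_nom: str = "?") -> int:
    """Fill missing entries in-place and return how many were missing (hypothetical sketch)."""
    if is_numeric:
        missing = [i for i, v in enumerate(column) if math.isnan(v)]
        known = [v for v in column if not math.isnan(v)]
        # Mean of the observed values; 0.0 if nothing was observed (assumption).
        fill = sum(known) / len(known) if known else 0.0
    else:
        missing = [i for i, v in enumerate(column) if v == missing_char_nom]
        known = [v for v in column if v != missing_char_nom]
        # Most frequent observed value; left unchanged if nothing was observed (assumption).
        fill = Counter(known).most_common(1)[0][0] if known else missing_char_nom
    for i in missing:
        column[i] = fill
    return len(missing)


numeric = [1.0, float("nan"), 3.0]
nominal = ["A", "?", "B", "A"]
n_missing_numeric = impute_in_place(numeric, is_numeric=True)
n_missing_nominal = impute_in_place(nominal, is_numeric=False)
```

Because the lists are modified in place, numeric and nominal hold the imputed values after the calls, and the return values report how many entries were filled.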

reliefe.utils.is_one_based_sparse_arff(path_to_data, n_attributes)

Determines if the data is given in sparse format. If it is, determines whether the attribute indices are 0- or 1-based.

Returns

(True/False, 0/1). The second component is not used if the arff is not sparse, i.e., if the first component equals False.

reliefe.utils.load_arff(path_to_data, descriptive_indices: List[int], target_indices: Union[int, List[int]], missing_character='?', impute_missing=True)

Loads arff in sparse or dense form. Nominal attributes (descriptive or target) are 1-hot encoded.

Because data creators use quotation marks inconsistently, single and double quotation marks are simply removed prior to any processing.

Parameters
  • path_to_data – path to arff

  • descriptive_indices – list of 0-based indices of descriptive attributes

  • target_indices – analogue of descriptive indices. Can be a single number.

  • missing_character – character that denotes missing value in the arff

  • impute_missing – should missing values be imputed? If so, simple per-column means/modes are computed. Otherwise, missing numeric values are converted to np.NaNs.

Returns

descriptive matrix (dense), target matrix (sparse), attribute_ranges. The ranges are given as a dictionary {index of original descriptive attribute: (start, end), …}, where columns with indices in range(start, end) in the descriptive matrix belong to the descriptive attribute. For example, if the attribute was originally numeric, then start = end - 1.
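
To make the attribute_ranges dictionary concrete, the sketch below builds one for a small schema. It is an illustration under assumptions, not the library's code: build_attribute_ranges is a hypothetical helper, and it assumes that a numeric attribute occupies one column and a nominal attribute occupies one column per possible value after 1-hot encoding.

```python
from typing import Dict, List, Optional, Tuple


def build_attribute_ranges(
        nominal_values: List[Optional[List[str]]]) -> Dict[int, Tuple[int, int]]:
    """Map each original attribute index to its (start, end) column range.

    nominal_values[i] is None for a numeric attribute and the list of
    possible values for a nominal one (hypothetical input format).
    """
    ranges = {}
    start = 0
    for index, values in enumerate(nominal_values):
        width = 1 if values is None else len(values)
        ranges[index] = (start, start + width)
        start += width
    return ranges


# x1 is numeric, x2 is nominal with values A, B, C.
ranges = build_attribute_ranges([None, ["A", "B", "C"]])
```

For the numeric attribute the range covers a single column, so start = end - 1, while the nominal attribute spans three columns.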

reliefe.utils.measure_time(f)
reliefe.utils.read_meta(dframe, specs='dataset_specifications.txt')
reliefe.utils.show_arff_attributes(path_to_data, k=5)
reliefe.utils.test_stuff(test_first=False, test_second=False)

Module contents