reliefe package¶

Submodules¶

reliefe.minimal_neighborhood_ablation module¶

reliefe.reliefe_classes module¶

ReliefE algorithm: Skrlj and Petkovic, 2021

class reliefe.reliefe_classes.ReliefE(num_iter: Union[float, int, List[Union[float, int]]] = 1.0, k=8, normalize_descriptive=True, embedding_based_distances=False, num_threads='all', verbose=False, mlc_distance='cosine', latent_dimension=128, sparsity_threshold=0.15, determine_k_automatically=False, samples=2048, use_average_neighbour=False)¶

Bases: object

static compute_members(y)¶

determine_latent_dim(xs, true_dim=None, empirical_left_continuous=True, metric='euclidean')¶

A method for computing the latent dimension, based on the paper: https://www.nature.com/articles/s41598-017-11873-y.pdf

Parameters

xs – data matrix X
true_dim – the actual dimension or None
empirical_left_continuous – Whether only the left-continuous part should be considered.
metric – The metric used to compute the distances.

Returns

Latent dimension.
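The following is a minimal sketch of a latent dimension estimate, assuming the linked paper describes the two-nearest-neighbour (Two-NN) estimator. It uses the maximum-likelihood form of that estimator on the ratio of second to first neighbour distances; the function name `two_nn_dimension` is hypothetical, and the library's own method (with its `empirical_left_continuous` and `metric` options) may differ.

```python
import math
import random


def two_nn_dimension(points):
    """Estimate intrinsic dimension from ratios of 2nd to 1st neighbour distances."""
    log_ratios = []
    for i, p in enumerate(points):
        # Euclidean distances from point i to every other point, sorted ascending.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        r1, r2 = dists[0], dists[1]
        if r1 > 0:  # skip exact duplicates, where the ratio is undefined
            log_ratios.append(math.log(r2 / r1))
    # Under the Two-NN model, r2 / r1 is Pareto-distributed with shape d,
    # so the maximum-likelihood estimate of d is n / sum(log(r2 / r1)).
    return len(log_ratios) / sum(log_ratios)


random.seed(0)
# 400 points on a 2-D plane embedded in 3-D space
# (the third coordinate is a linear combination of the first two).
plane = []
for _ in range(400):
    u, v = random.random(), random.random()
    plane.append((u, v, 0.5 * u - 0.25 * v))

estimate = two_nn_dimension(plane)
print(round(estimate, 2))  # expected to be close to 2 for a 2-D plane
```

The brute-force neighbour search is quadratic in the number of points, which is acceptable for an illustration but not for large data matrices.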

fit(x, y, embedding_method=None, store_neighborhoods=None)¶

Key idea of ReliefE: embed the instance space, compute the mean embedding for each of the classes, and, in the update step, compare to the joint label embedding instead of the single instance.

Parameters

x – Feature space, array-like.
y – Target space, a 0/1 array-like structure (1-hot encoded for classification).
embedding_method – Custom embedding class (e.g., TruncatedSVD())
store_neighborhoods – File in which to store the adaptive k values for ablation.

Returns

None.
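As an illustration of the "mean embedding for each of the classes" step only, here is a small sketch that assumes the instances have already been embedded and that y is 1-hot encoded. The helper `class_mean_embeddings` is hypothetical and is not part of the package; the embedding itself and the update step are not shown.

```python
def class_mean_embeddings(embedded, y_one_hot):
    """Return one mean embedding vector per class (per column of y_one_hot)."""
    n_classes = len(y_one_hot[0])
    dim = len(embedded[0])
    sums = [[0.0] * dim for _ in range(n_classes)]
    counts = [0] * n_classes
    for vector, labels in zip(embedded, y_one_hot):
        for c, flag in enumerate(labels):
            if flag:  # the instance belongs to class c
                counts[c] += 1
                for d, value in enumerate(vector):
                    sums[c][d] += value
    # Classes without any instance get None instead of a mean vector.
    return [
        [s / counts[c] for s in sums[c]] if counts[c] else None
        for c in range(n_classes)
    ]


embedded = [[1.0, 0.0], [3.0, 2.0], [0.0, 4.0]]  # three embedded instances
y = [[1, 0], [1, 0], [0, 1]]                     # 1-hot class membership
means = class_mean_embeddings(embedded, y)
print(means)  # [[2.0, 1.0], [0.0, 4.0]]
```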

send_message(message)¶

reliefe.reliefe_classes.sgn(el)¶

The standard sgn function.

Parameters: el – float input
Returns: sign of the number
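For reference, a sign function can be sketched as below. The name `sgn_sketch` is hypothetical, and returning 0 for a zero input is an assumption about the convention; the package's own `sgn` may behave differently at zero.

```python
def sgn_sketch(el: float) -> int:
    """Return 1 for positive input, -1 for negative input, and 0 for zero."""
    if el > 0:
        return 1
    if el < 0:
        return -1
    return 0  # assumed convention for el == 0


signs = [sgn_sketch(v) for v in (-2.5, 0.0, 3.0)]
print(signs)  # [-1, 0, 1]
```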

reliefe.reliefe_evaluation module¶

reliefe.reliefe_evaluation.evaluate_importances_with_logistic_regression(x_train, x_test, y_train, y_test, importances, k)¶

reliefe.utils module¶

class reliefe.utils.MLCDistances¶

Bases: object

ACCURACY = 'accuracy'¶

DISTANCES = ['hamming', 'f1', 'accuracy', 'subset', 'cosine', 'hyperbolic']¶

F1 = 'f1'¶

HAMMING = 'hamming'¶

SUBSET = 'subset'¶

class reliefe.utils.TaskTypes¶

Bases: object

CLASSIFICATION = 'classification'¶

HMLC = 'hierarchical multi-label classification'¶

MLC = 'multi-label classification'¶

REGRESSION = 'regression'¶

reliefe.utils.aggregate_nom(values)¶

reliefe.utils.aggregate_num(values)¶

reliefe.utils.basic_parse_sparse_line(line, offset=0)¶

reliefe.utils.feature_ranking_wrapper(path_to_arff: str, descriptive_indices: Union[None, List[int]], target_index: Union[None, int], feature_ranking: Callable, *args)¶

Takes care of datasets that contain nominal attributes:

- loads the data from the arff file and 1-hot encodes the nominal features,
- computes the relevance of the transformed features,
- computes the relevance of the original features by summing up the relevance of the 1-hot encoded groups.

E.g., if the data has two features, x1, which is numeric, and x2, which can take the values A, B, and C, the pipeline is as follows:

- load the data, which now has 4 features: x1, x2-A, x2-B, x2-C (e.g., if the first example is originally 2.3,B, the converted example is 2.3,0,1,0),
- compute the relevance of the transformed features, e.g., [0.1, 0.2, 0.3, 0.05],
- compute the relevance of the original features: [0.1, 0.55], since 0.2 + 0.3 + 0.05 = 0.55.
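The summation over 1-hot encoded groups in the example above can be sketched as follows. The helper `aggregate_relevance` is hypothetical; it assumes the groups are described by a dictionary of the form {index of original attribute: (start, end)}, the same shape as the attribute ranges returned by load_arff.

```python
def aggregate_relevance(transformed_scores, attribute_ranges):
    """Sum the scores of the transformed columns that belong to each original attribute."""
    return [
        sum(transformed_scores[start:end])
        for _, (start, end) in sorted(attribute_ranges.items())
    ]


scores = [0.1, 0.2, 0.3, 0.05]   # relevance of x1, x2-A, x2-B, x2-C
ranges = {0: (0, 1), 1: (1, 4)}  # x1 occupies column 0, x2 occupies columns 1..3
original = aggregate_relevance(scores, ranges)
print([round(value, 2) for value in original])  # [0.1, 0.55]
```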

reliefe.utils.get_task_type(target_attributes_values: List[List[str]])¶

Finds the task type from the possible attribute values.

Parameters

target_attributes_values – possible values of the target attributes

Returns

The task type.

reliefe.utils.impute_missing_values(column, possible_values, is_numeric, missing_char_nom)¶

Replaces missing values (missing_char_nom for nominal attributes, np.NaN for numeric attributes) with the mode or the average, respectively. This is done in place.

Parameters

column – list of values
possible_values – list of possible values for a nominal attribute; empty (and ignored) for a numeric one
is_numeric – whether the attribute is numeric
missing_char_nom – character that denotes a missing nominal value

Returns

Number of missing values.
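A minimal sketch of this kind of in-place imputation, using only the standard library, might look as follows. The function `impute_sketch` is hypothetical: it drops the possible_values argument, uses float NaN in place of np.NaN, and leaves the column unchanged when every value is missing, all of which are assumptions rather than documented behavior.

```python
import math
from collections import Counter


def impute_sketch(column, is_numeric, missing_char_nom="?"):
    """Fill missing entries of `column` in place and return how many were missing."""

    def is_missing(value):
        if is_numeric:
            return isinstance(value, float) and math.isnan(value)
        return value == missing_char_nom

    missing = [i for i, value in enumerate(column) if is_missing(value)]
    present = [value for value in column if not is_missing(value)]
    if missing and present:
        if is_numeric:
            fill = sum(present) / len(present)            # column average
        else:
            fill = Counter(present).most_common(1)[0][0]  # column mode
        for i in missing:
            column[i] = fill
    return len(missing)


numeric = [1.0, float("nan"), 3.0]
n_numeric = impute_sketch(numeric, is_numeric=True)
nominal = ["A", "?", "B", "A"]
n_nominal = impute_sketch(nominal, is_numeric=False)
print(numeric, n_numeric)  # [1.0, 2.0, 3.0] 1
print(nominal, n_nominal)  # ['A', 'A', 'B', 'A'] 1
```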

reliefe.utils.is_one_based_sparse_arff(path_to_data, n_attributes)¶

Determines whether the data is given in sparse format. If it is, determines whether the attribute indices are 0- or 1-based.

Parameters

path_to_data – path to arff
n_attributes – number of attributes

Returns

A pair (True/False, 0/1). The second component is not used if the arff is not sparse, which is indicated by the first component.
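One way such a check could work is sketched below, on data lines held in memory rather than on a file path. The heuristic is an assumption and not necessarily what the package does: a data line wrapped in braces is treated as sparse, an index equal to 0 implies 0-based indices, and an index equal to n_attributes implies 1-based indices. The name `detect_sparse_index_base` is hypothetical, and quoted values that contain commas are not handled.

```python
def detect_sparse_index_base(data_lines, n_attributes):
    """Return (is_sparse, index_base) for ARFF data lines, using a simple heuristic."""
    is_sparse = False
    for line in data_lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # dense data line or something else
        is_sparse = True
        body = line.strip("{}")
        for pair in body.split(","):
            parts = pair.split()
            if not parts:
                continue  # empty entry, e.g. "{}" or a trailing comma
            index = int(parts[0])
            if index == 0:
                return True, 0  # index 0 only exists with 0-based indices
            if index == n_attributes:
                return True, 1  # index n_attributes only exists with 1-based indices
    # Undecided or dense: the base defaults to 0 (assumption).
    return is_sparse, 0


one_based = detect_sparse_index_base(["{1 2.3, 3 B}", "{2 1.0}"], n_attributes=3)
zero_based = detect_sparse_index_base(["{0 2.3, 2 B}"], n_attributes=3)
dense = detect_sparse_index_base(["2.3,B"], n_attributes=2)
print(one_based, zero_based, dense)  # (True, 1) (True, 0) (False, 0)
```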

reliefe.utils.load_arff(path_to_data, descriptive_indices: List[int], target_indices: Union[int, List[int]], missing_character='?', impute_missing=True)¶

Loads arff in sparse or dense form. Nominal attributes (descriptive or target) are 1-hot encoded.

Because data creators use quotation marks inconsistently, single and double quotation marks are simply removed prior to any processing.

Parameters

path_to_data – path to arff
descriptive_indices – list of 0-based indices of descriptive attributes
target_indices – analogous to descriptive_indices, for the target attributes. Can be a single number.
missing_character – character that denotes missing value in the arff
impute_missing – should missing values be imputed? If so, simple per-column means/modes are computed. Otherwise, missing numeric values are converted to np.NaNs.

Returns

Descriptive matrix (dense), target matrix (sparse), and attribute_ranges. The ranges are given as a dictionary {index of original descriptive attribute: (start, end), …}, where the columns with indices in range(start, end) in the descriptive matrix belong to that descriptive attribute. For example, if the attribute was originally numeric, then start = end - 1.
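To make the shape of the attribute ranges concrete, here is a small sketch that 1-hot encodes nominal columns of in-memory rows and records the (start, end) range of each original attribute. The helper `one_hot_with_ranges` and its `nominal_values` argument are hypothetical; the sketch does no arff parsing and is not the package's loader.

```python
def one_hot_with_ranges(rows, nominal_values):
    """1-hot encode nominal columns; return (encoded rows, {column: (start, end)})."""
    ranges = {}
    start = 0
    for col in range(len(rows[0])):
        # A nominal column expands to one column per possible value, a numeric one stays single.
        width = len(nominal_values[col]) if col in nominal_values else 1
        ranges[col] = (start, start + width)
        start += width

    encoded = []
    for row in rows:
        out = []
        for col, value in enumerate(row):
            if col in nominal_values:
                out.extend(1.0 if value == v else 0.0 for v in nominal_values[col])
            else:
                out.append(float(value))
        encoded.append(out)
    return encoded, ranges


rows = [[2.3, "B"], [1.0, "A"]]
encoded, ranges = one_hot_with_ranges(rows, {1: ["A", "B", "C"]})
print(encoded)  # [[2.3, 0.0, 1.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
print(ranges)   # {0: (0, 1), 1: (1, 4)}
```

For the numeric attribute at index 0 the range is (0, 1), so start = end - 1, as stated above.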

reliefe.utils.measure_time(f)¶

reliefe.utils.read_meta(dframe, specs='dataset_specifications.txt')¶

reliefe.utils.show_arff_attributes(path_to_data, k=5)¶

reliefe.utils.test_stuff(test_first=False, test_second=False)¶

Module contents¶