reliefe package¶
Submodules¶
reliefe.minimal_neighborhood_ablation module¶
reliefe.reliefe_classes module¶
ReliefE algorithm: Skrlj and Petkovic, 2021
-
class
reliefe.reliefe_classes.
ReliefE
(num_iter: Union[float, int, List[Union[float, int]]] = 1.0, k=8, normalize_descriptive=True, embedding_based_distances=False, num_threads='all', verbose=False, mlc_distance='cosine', latent_dimension=128, sparsity_threshold=0.15, determine_k_automatically=False, samples=2048, use_average_neighbour=False)¶ Bases:
object
-
static
compute_members
(y)¶
-
determine_latent_dim
(xs, true_dim=None, empirical_left_continuous=True, metric='euclidean')¶ A method for computing the latent dimension, based on the paper: https://www.nature.com/articles/s41598-017-11873-y.pdf
- Parameters
xs – data matrix X
true_dim – the actual dimension or None
empirical_left_continuous – Whether only the left-continuous part should be considered.
metric – The metric to compute the distances.
- Returns
Latent dimension.
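As a rough illustration of this kind of estimate, the sketch below implements a simplified two-nearest-neighbour estimator in plain Python. It assumes the cited paper's idea of using, for every point, the ratio of the distances to its second and first nearest neighbours. It uses the maximum-likelihood form N / sum(log(ratio)), which is not necessarily the fit that determine_latent_dim performs. The name two_nn_dimension is hypothetical and not part of reliefe.

```python
import math
import random


def two_nn_dimension(points):
    """Estimate intrinsic dimension from ratios of 2nd to 1st neighbour distances.

    Hypothetical helper for illustration; not the reliefe implementation.
    """
    log_ratios = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        r1, r2 = dists[0], dists[1]
        if r1 > 0:  # skip duplicated points, whose ratio is undefined
            log_ratios.append(math.log(r2 / r1))
    # Maximum-likelihood estimate, assuming the ratios follow a Pareto law.
    return len(log_ratios) / sum(log_ratios)


# Points on a straight line embedded in 3-D: the latent dimension is 1.
rng = random.Random(0)
line = [(t, 2.0 * t, -t) for t in (rng.random() for _ in range(300))]
estimate = two_nn_dimension(line)
```

For data like this, the estimate should land close to 1 even though each point has three coordinates. The quadratic neighbour search is only meant for small inputs.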
-
fit
(x, y, embedding_method=None, store_neighborhoods=None)¶ Key idea of ReliefE: embed the instance space and compute the mean embedding for each of the classes. In the update step, compare to the joint label embedding instead of the single instance.
- Parameters
x – Feature space, array-like.
y – Target space, a 0/1 array-like structure (1-hot encoded for classification).
embedding_method – Custom embedding class (e.g., TruncatedSVD())
store_neighborhoods – File in which to store the adaptive k values for ablation.
- Returns
None.
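The "mean embedding for each of the classes" step can be pictured with the minimal sketch below. It assumes the instances are already embedded as plain lists of floats and that y is 1-hot encoded, as described for fit. The helper class_mean_embeddings is hypothetical and only illustrates the averaging; it does not reproduce the update step or any other part of the library's internals.

```python
def class_mean_embeddings(embeddings, y_one_hot):
    """Average the embedded instances that belong to each class.

    embeddings: list of equal-length float lists (one per instance).
    y_one_hot:  list of 0/1 lists (one per instance, one column per class).
    Hypothetical helper for illustration; not part of reliefe.
    """
    n_classes = len(y_one_hot[0])
    dim = len(embeddings[0])
    sums = [[0.0] * dim for _ in range(n_classes)]
    counts = [0] * n_classes
    for vector, labels in zip(embeddings, y_one_hot):
        for c, is_member in enumerate(labels):
            if is_member:
                counts[c] += 1
                for d, value in enumerate(vector):
                    sums[c][d] += value
    # Classes without members keep an all-zero mean.
    return [
        [s / counts[c] if counts[c] else 0.0 for s in sums[c]]
        for c in range(n_classes)
    ]


embeddings = [[1.0, 0.0], [3.0, 2.0], [0.0, 4.0]]
y = [[1, 0], [1, 0], [0, 1]]
means = class_mean_embeddings(embeddings, y)  # [[2.0, 1.0], [0.0, 4.0]]
```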
-
send_message
(message)¶
-
static
-
reliefe.reliefe_classes.
sgn
(el)¶ The standard sgn function.
- Parameters
el – float input
- Returns
sign of the number
reliefe.reliefe_evaluation module¶
-
reliefe.reliefe_evaluation.
evaluate_importances_with_logistic_regression
(x_train, x_test, y_train, y_test, importances, k)¶
reliefe.utils module¶
-
class
reliefe.utils.
MLCDistances
¶ Bases:
object
-
ACCURACY
= 'accuracy'¶
-
DISTANCES
= ['hamming', 'f1', 'accuracy', 'subset', 'cosine', 'hyperbolic']¶
-
F1
= 'f1'¶
-
HAMMING
= 'hamming'¶
-
SUBSET
= 'subset'¶
-
-
class
reliefe.utils.
TaskTypes
¶ Bases:
object
-
CLASSIFICATION
= 'classification'¶
-
HMLC
= 'hierarchical multi-label classification'¶
-
MLC
= 'multi-label classification'¶
-
REGRESSION
= 'regression'¶
-
-
reliefe.utils.
aggregate_nom
(values)¶
-
reliefe.utils.
aggregate_num
(values)¶
-
reliefe.utils.
basic_parse_sparse_line
(line, offset=0)¶
-
reliefe.utils.
feature_ranking_wrapper
(path_to_arff: str, descriptive_indices: Union[None, List[int]], target_index: Union[None, int], feature_ranking: Callable, *args)¶ Takes care of datasets that contain nominal attributes:
- loads the data from the arff file and 1-hot encodes the nominal features,
- computes the relevance of the transformed features,
- computes the relevance of the original features by summing the relevances of each 1-hot encoded group.
E.g., if the data has two features, x1, which is numeric, and x2, which can take the values A, B, and C, the pipeline is as follows:
- load the data, which now has 4 features: x1, x2-A, x2-B, x2-C (e.g., if the first example is originally 2.3,B, the converted example is 2.3,0,1,0),
- compute the relevance of the transformed features, e.g., [0.1, 0.2, 0.3, 0.05],
- compute the relevance of the original features: [0.1, 0.55], since 0.2 + 0.3 + 0.05 = 0.55.
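The final summation in the example above can be sketched as follows. This is a minimal illustration, assuming the column range of every original feature is available as a dictionary of (start, end) pairs with end exclusive. The helper aggregate_relevance is hypothetical and not part of reliefe.

```python
def aggregate_relevance(transformed_relevance, attribute_ranges):
    """Sum the relevances of the 1-hot columns that belong to each original feature.

    attribute_ranges maps an original feature index to a (start, end) pair of
    column indices in the transformed data, with end exclusive.
    Hypothetical helper for illustration; not part of reliefe.
    """
    return [
        sum(transformed_relevance[start:end])
        for _, (start, end) in sorted(attribute_ranges.items())
    ]


# x1 is numeric (one column); x2 takes the values A, B, C (three columns).
ranges = {0: (0, 1), 1: (1, 4)}
original = aggregate_relevance([0.1, 0.2, 0.3, 0.05], ranges)
print([round(r, 2) for r in original])  # [0.1, 0.55]
```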
-
reliefe.utils.
get_task_type
(target_attributes_values: List[List[str]])¶ Finds the task type from the possible attribute values.
-
reliefe.utils.
impute_missing_values
(column, possible_values, is_numeric, missing_char_nom)¶ Replaces missing_char_nom for nominal attributes and np.NaNs for numeric attributes with the modes and averages, respectively. This is done in-place.
- Parameters
column – list of values
possible_values – empty for numeric (and ignored), and list of possible values otherwise
is_numeric – is the attribute numeric
missing_char_nom – character that denotes missing nominal value
- Returns
number of missing values
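A minimal sketch of this kind of in-place imputation, using only the standard library, might look like the code below. The function impute_in_place is a hypothetical stand-in written for illustration. It drops the possible_values argument and assumes each column contains at least one observed value.

```python
import math
from statistics import mean, mode


def impute_in_place(column, is_numeric, missing_char_nom="?"):
    """Fill missing entries of column in place and return how many were missing.

    Numeric columns: NaN entries are replaced by the mean of the observed values.
    Nominal columns: missing_char_nom entries are replaced by the most common value.
    Hypothetical helper; assumes at least one observed value in the column.
    """
    if is_numeric:
        missing = [i for i, v in enumerate(column) if math.isnan(v)]
        fill = mean(v for v in column if not math.isnan(v))
    else:
        missing = [i for i, v in enumerate(column) if v == missing_char_nom]
        fill = mode(v for v in column if v != missing_char_nom)
    for i in missing:
        column[i] = fill
    return len(missing)


numeric = [1.0, float("nan"), 3.0]
nominal = ["A", "?", "B", "A"]
n_numeric = impute_in_place(numeric, is_numeric=True)    # numeric -> [1.0, 2.0, 3.0]
n_nominal = impute_in_place(nominal, is_numeric=False)   # nominal -> ["A", "A", "B", "A"]
```

Because the list is modified in place, the return value can be reserved for the count of missing entries, matching the behaviour described above.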
-
reliefe.utils.
is_one_based_sparse_arff
(path_to_data, n_attributes)¶ Determines whether the data is given in sparse format. If it is, determines whether the attribute indices are 0- or 1-based.
- Returns
(True/False, 0/1). The second component is not used if the arff is not sparse, i.e., if the first component equals False.
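To make the sparse case concrete, the sketch below parses sparse ARFF rows of the form {index value, index value, ...} and guesses the index base. The heuristic is an assumption, not necessarily what the library does: an index of 0 can only occur with 0-based indexing, and an index equal to n_attributes can only occur with 1-based indexing. Both helper names are hypothetical.

```python
def parse_sparse_line(line, offset=0):
    """Parse a sparse ARFF data row such as '{1 X, 3 Y}' into {index: value}.

    offset is subtracted from every index, so offset=1 converts 1-based
    indices to 0-based ones. Hypothetical helper; quoted values that contain
    commas or spaces are not handled.
    """
    body = line.strip()
    if not (body.startswith("{") and body.endswith("}")):
        raise ValueError("not a sparse ARFF row")
    pairs = {}
    for item in body[1:-1].split(","):
        if item.strip():
            index, value = item.split(maxsplit=1)
            pairs[int(index) - offset] = value.strip()
    return pairs


def guess_index_base(rows, n_attributes):
    """Return 0 or 1 if the rows settle the index base, else None."""
    for row in rows:
        indices = parse_sparse_line(row)
        if 0 in indices:
            return 0  # index 0 is only valid with 0-based indexing
        if n_attributes in indices:
            return 1  # index n_attributes is only valid with 1-based indexing
    return None


rows = ["{1 2.5, 3 B}", "{2 1.0, 4 A}"]
base = guess_index_base(rows, n_attributes=4)      # 1
first = parse_sparse_line(rows[0], offset=base)    # {0: '2.5', 2: 'B'}
```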
-
reliefe.utils.
load_arff
(path_to_data, descriptive_indices: List[int], target_indices: Union[int, List[int]], missing_character='?', impute_missing=True)¶ Loads arff in sparse or dense form. Nominal attributes (descriptive or target) are 1-hot encoded.
Because data creators use quotation marks inconsistently, single and double quotation marks are simply removed prior to any processing.
- Parameters
path_to_data – path to arff
descriptive_indices – list of 0-based indices of descriptive attributes
target_indices – analogue of descriptive indices. Can be a single number.
missing_character – character that denotes missing value in the arff
impute_missing – should missing values be imputed? If so, simple per-column means/modes are computed. Otherwise, missing numeric values are converted to np.NaNs.
- Returns
descriptive matrix (dense), target matrix (sparse), attribute_ranges. The ranges are given as a dictionary {index of original descriptive attribute: (start, end), …}, where the columns with indices in range(start, end) in the descriptive matrix belong to that descriptive attribute. For example, if the attribute was originally numeric, then start = end - 1.
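The (start, end) convention can be illustrated with a short sketch that builds such a dictionary from a list of attribute descriptions. It assumes a numeric attribute occupies one column and a nominal attribute occupies one column per possible value. The helper build_attribute_ranges is hypothetical and not part of reliefe.

```python
def build_attribute_ranges(attribute_values):
    """Map each original attribute index to its (start, end) columns after 1-hot encoding.

    attribute_values[i] is an empty list for a numeric attribute (one column)
    or the list of possible values for a nominal attribute (one column per value).
    Hypothetical helper for illustration; not part of reliefe.
    """
    ranges = {}
    start = 0
    for index, values in enumerate(attribute_values):
        width = len(values) if values else 1
        ranges[index] = (start, start + width)
        start += width
    return ranges


ranges = build_attribute_ranges([[], ["A", "B", "C"], []])
# ranges == {0: (0, 1), 1: (1, 4), 2: (4, 5)}
row = [2.3, 0, 1, 0, 7.0]
start, end = ranges[1]
x2_columns = row[start:end]  # [0, 1, 0], the 1-hot block of the nominal attribute
```

Both numeric attributes end up with start = end - 1, while the nominal one spans three columns.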
-
reliefe.utils.
measure_time
(f)¶
-
reliefe.utils.
read_meta
(dframe, specs='dataset_specifications.txt')¶
-
reliefe.utils.
show_arff_attributes
(path_to_data, k=5)¶
-
reliefe.utils.
test_stuff
(test_first=False, test_second=False)¶