autoBOTLib.features package¶
Submodules¶
autoBOTLib.features.features_concepts module¶
- class autoBOTLib.features.features_concepts.ConceptFeatures(max_features=10000, targets=None, knowledge_graph='../memory')¶
Bases: object
Core class describing the concept-based feature construction employed here.
- __init__(max_features=10000, targets=None, knowledge_graph='../memory')¶
Initialize self. See help(type(self)) for accurate signature.
- get_grounded_from_path(present_tokens, graph_path)¶
Performs a very simple term grounding: a relation is kept only if both of its terms are present in the corpus.
- Parameters
present_tokens (list) – The present tokens
graph_path (str) – Path to the triplet base (compressed)
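The grounding check described above can be sketched in plain Python. The helper below is hypothetical (the actual triplet-base format is not shown in this reference): a triplet survives only when both of its terms occur in the token set built from the corpus.

```python
def ground_triplets(present_tokens, triplets):
    """Keep only triplets whose head and tail both occur in the corpus tokens."""
    token_set = set(present_tokens)
    return [(head, rel, tail) for (head, rel, tail) in triplets
            if head in token_set and tail in token_set]
```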
- add_triplet(tokens, index, relations=['is_a'])¶
- concept_graph(document_space, graph_path)¶
If no prior knowledge graph is supplied, one is constructed.
- Parameters
document_space – The list of input documents
graph_path – The path of the knowledge graph used.
- Return grounded
Grounded relations.
- get_propositionalized_rep(documents)¶
The method for constructing the representation.
- Parameters
documents – The input list of documents.
- fit(text_vector, refit=False, knowledge_graph=None)¶
Fit the model to a text vector.
- Parameters
text_vector – Input list of documents.
- transform(text_vector, use_conc_docs=False)¶
Transform the data into suitable form.
- get_feature_names()¶
- fit_transform(text_vector, b=None)¶
A classic fit-transform method.
- Parameters
text_vector – The input list of documents.
- Return transformedObj
Transformed texts (to features).
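All feature classes in this package expose the same sklearn-like contract: fit, transform, fit_transform, and get_feature_names. A minimal stdlib-only sketch of that contract, with a toy bag-of-words standing in for the actual concept features (class and attribute names here are illustrative, not part of autoBOTLib):

```python
from collections import Counter

class ToyFeatures:
    """Minimal sketch of the sklearn-like contract used by the feature classes."""

    def __init__(self, max_features=10000):
        self.max_features = max_features
        self.vocabulary_ = []

    def fit(self, text_vector, refit=False):
        # Keep the max_features most frequent tokens across all documents.
        counts = Counter(tok for doc in text_vector for tok in doc.lower().split())
        self.vocabulary_ = [w for w, _ in counts.most_common(self.max_features)]
        return self

    def transform(self, text_vector):
        # One row per document, one count column per vocabulary token.
        rows = []
        for doc in text_vector:
            toks = Counter(doc.lower().split())
            rows.append([toks[w] for w in self.vocabulary_])
        return rows

    def fit_transform(self, text_vector, b=None):
        return self.fit(text_vector).transform(text_vector)

    def get_feature_names(self):
        return self.vocabulary_
```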
autoBOTLib.features.features_contextual module¶
- class autoBOTLib.features.features_contextual.ContextualDocs(model='all-mpnet-base-v2')¶
Bases: object
- __init__(model='all-mpnet-base-v2')¶
Class initialization method.
- Parameters
model – The sentence-transformer model
- fit(documents)¶
- Parameters
documents – The input set of documents.
- transform(documents)¶
- Parameters
documents – The input set of documents.
- fit_transform(documents, b=None)¶
- Parameters
documents – The input set of documents.
- get_feature_names()¶
- Return fnames
Feature names (custom API artefact)
autoBOTLib.features.features_contextual_supervised module¶
autoBOTLib.features.features_document_graph module¶
- class autoBOTLib.features.features_document_graph.RelationalDocs(ndim=128, random_seed=1965123, targets=None, ed_cutoff=-2, verbose=True, neigh_size=None, doc_limit=4096, percentile_threshold=95)¶
Bases: object
- __init__(ndim=128, random_seed=1965123, targets=None, ed_cutoff=-2, verbose=True, neigh_size=None, doc_limit=4096, percentile_threshold=95)¶
Class initialization method.
- Parameters
ndim – Number of latent dimensions
targets – The target vector
random_seed – The random seed used
ed_cutoff – Cutoff for fuzzy string matching when comparing documents
doc_limit – The max number of documents to be considered.
verbose – Whether to print progress output
- jaccard_index(set1, set2)¶
The classic Jaccard index.
- Parameters
set1 – First set
set2 – Second set
- Return JaccardIndex
The Jaccard index of the two sets.
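The Jaccard index above is |set1 ∩ set2| / |set1 ∪ set2|; a direct stdlib sketch (the function name mirrors the method above, but this is an illustration, not the library implementation):

```python
def jaccard_index(set1, set2):
    """Classic Jaccard index: |intersection| / |union| (0.0 for two empty sets)."""
    union = set1 | set2
    if not union:
        return 0.0
    return len(set1 & set2) / len(union)
```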
- fit(text_list)¶
The fit method.
- Parameters
text_list – List of input texts
- transform(new_documents)¶
Transform method.
- Parameters
new_documents – The new set of documents to be transformed.
- Return all_embeddings
The final embedding matrix
- fit_transform(documents, b=None)¶
The sklearn-like fit-transform method.
- get_feature_names()¶
- get_graph(wspace, ltl)¶
A method to obtain a graph from a weighted space of documents.
- Parameters
wspace – A mapping from (node1, node2) pairs to edge weights
ltl – The number of documents
- Return G
The document graph
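The idea behind get_graph can be sketched with a plain adjacency dict in place of a graph library (hypothetical helper; the concrete graph type returned by the real method is not specified in this reference):

```python
def build_graph(wspace):
    """Build an undirected adjacency map from a {(node1, node2): weight} mapping."""
    graph = {}
    for (u, v), weight in wspace.items():
        # Store the edge in both directions so the graph is undirected.
        graph.setdefault(u, {})[v] = weight
        graph.setdefault(v, {})[u] = weight
    return graph
```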
autoBOTLib.features.features_keyword module¶
- class autoBOTLib.features.features_keyword.KeywordFeatures(max_features=10000, targets=None)¶
Bases: object
Core class describing the keyword-based feature construction employed here.
- __init__(max_features=10000, targets=None)¶
Initialize self. See help(type(self)) for accurate signature.
- fit(text_vector, refit=False)¶
Fit the model to a text vector.
- Parameters
text_vector – The input list of texts
- transform(text_vector)¶
Transform the data into suitable form.
- Parameters
text_vector – The input list of texts.
- Return transformedObject
The transformed input texts (feature space)
- get_feature_names()¶
- fit_transform(text_vector, b=None)¶
A classic fit-transform method.
- Parameters
text_vector – Input list of texts.
- Return transformedObject
Transformed list of texts
autoBOTLib.features.features_sentence_embeddings module¶
- class autoBOTLib.features.features_sentence_embeddings.documentEmbedder(max_features=10000, num_cpu=8, dm=1, pretrained_path='doc2vec.bin', ndim=512)¶
Bases: object
Core class describing sentence embedding methodology employed here. The class functions as a sklearn-like object.
- __init__(max_features=10000, num_cpu=8, dm=1, pretrained_path='doc2vec.bin', ndim=512)¶
Class initialization method.
- Parameters
max_features – integer, number of latent dimensions
num_cpu – integer, number of CPUs to be used
dm – Whether to use the “distributed memory” model
pretrained_path – The path where a pretrained model is located (if any)
- fit(text_vector, b=None, refit=False)¶
Fit the model to a text vector.
- Parameters
text_vector – A list of texts
- transform(text_vector)¶
Transform the data into suitable form.
- Parameters
text_vector – The text vector to be transformed via a trained model
- get_feature_names()¶
- fit_transform(text_vector, a2=None)¶
A classic fit-transform method.
- Parameters
text_vector – A text vector used to build and transform a corpus.
autoBOTLib.features.features_token_relations module¶
- class autoBOTLib.features.features_token_relations.relationExtractor(max_features=10000, split_char='|||', witem_separator='&&&&', num_cpu=8, neighborhood_token=64, min_token='bigrams', targets=None, verbose=True)¶
Bases: object
The main token relation extraction class. Works for arbitrary tokens.
- __init__(max_features=10000, split_char='|||', witem_separator='&&&&', num_cpu=8, neighborhood_token=64, min_token='bigrams', targets=None, verbose=True)¶
Initialize self. See help(type(self)) for accurate signature.
- compute_distance(pair, token_dict)¶
The core routine for computing index-based distances between tokens.
- Parameters
pair – The pair of tokens
token_dict – Distance map
- Return pair[0], pair[1], dist
The two tokens and the distance
- witem_kernel(instance)¶
A simple kernel for traversing a given document.
- Parameters
instance – A piece of text
- Return global_distances
Distances between tokens
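The index-based distance idea behind compute_distance and witem_kernel can be sketched as follows (hypothetical helper, not the library code): for each pair of distinct tokens in a document, record the absolute difference of their first positions.

```python
from itertools import combinations

def token_distances(instance):
    """Map each unordered token pair to the absolute difference of their first positions."""
    first_pos = {}
    for idx, tok in enumerate(instance.split()):
        first_pos.setdefault(tok, idx)  # keep the first occurrence only
    return {(a, b): abs(first_pos[a] - first_pos[b])
            for a, b in combinations(sorted(first_pos), 2)}
```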
- fit(text_vector, b=None)¶
Fit the model to a text vector.
- Parameters
text_vector – The input list of texts.
- get_feature_names()¶
Return exact feature names.
- transform(text_vector, custom_shape=None)¶
Transform the data into suitable form.
- Parameters
text_vector – The input list of texts.
- fit_transform(text_vector, a2)¶
A classic fit-transform method.
- Parameters
text_vector – Input list of texts.
autoBOTLib.features.features_topic module¶
- class autoBOTLib.features.features_topic.TopicDocs(ndim=128, random_seed=1965123, topic_tokens=8196, verbose=True)¶
Bases: object
- __init__(ndim=128, random_seed=1965123, topic_tokens=8196, verbose=True)¶
Class initialization method.
- Parameters
ndim – Number of latent dimensions
random_seed – The random seed used
verbose – Whether to print progress output
- fit(text_list)¶
The fit method.
- Parameters
text_list – List of input texts
- transform(new_documents)¶
Transform method.
- Parameters
new_documents – The new set of documents to be transformed.
- Return all_embeddings
The final embedding matrix
- fit_transform(documents, b=None)¶
The sklearn-like fit-transform method.
- get_feature_names()¶
Get feature names.