Feature types

autoBOT’s supported feature types

Feature name

Description

Code (autoBOT)

Symbolic feature types

Concept features

Features derived from ConceptNet

concept_features

Relational features - token

Features based on longer distances between tokens

relational_features_token

Relational features - char

Features based on longer distances between characters

relational_features_char

Relational features - bigram

Features based on longer distances between character pairs

relational_features_bigram

Topic features

KMeans clusters of TF-IDF into d dimensions, presences are feature values

topic_features

Keywords

Keyword-based features

keyword_features

Word-based features

Standard word TF-IDF-based features

word_features

Character-based features

Standard character TF-IDF-based features

char_features

Part-of-speech tag-based features

TF-IDF of documents, created based on POS annotations

pos_features

Sub-symbolic feature types

Neural - contextual

Features based on sentence-transformers library (XLM, multilingual)

contextual_features

Neural - dbow

doc2vec - dbow

neural_features_dbow

Neural - dm

doc2vec - dm

neural_features_dm

Document graph

Features derived as node embeddings of the corpus graph

document_graph

Representation types are defined as follows (within autoBOT); the keys of these dictionaries are the arguments under the field representation_type (see examples). Finally, note that new features are added on regular basis, hence there are more feature types available as in the original paper.

# Full stack
feature_presets['neurosymbolic'] = [
    'concept_features', 'document_graph', 'neural_features_dbow',
    'neural_features_dm', 'relational_features_token', 'topic_features',
    'keyword_features', 'relational_features_char', 'char_features',
    'word_features', 'contextual_features', 'relational_features_bigram'
]

# This one is ~language agnostic
feature_presets['neurosymbolic-lite'] = [
    'document_graph', 'neural_features_dbow', 'neural_features_dm',
    'topic_features', 'keyword_features', 'relational_features_char', 'relational_features_token','char_features', 'word_features', 'relational_features_bigram'
]

# MLJ paper versions
feature_presets['neurosymbolic-default'] = [
    'neural_features_dbow', 'neural_features_dm', 'keyword_features',
    'relational_features_char', 'char_features', 'word_features', "pos_features",
    'concept_features'
]

feature_presets['neural'] = [
    'document_graph', 'neural_features_dbow', 'neural_features_dm'
]

feature_presets['symbolic'] = [
    'concept_features', 'relational_features_token', 'topic_features',
    'keyword_features', 'relational_features_char', 'char_features',
    'word_features', 'pos_features', 'relational_features_bigram'
]

So, for example, to use the symbolic feature space only, one can simply:

autoBOTLibObj = autoBOTLib.GAlearner(train_sequences,
train_targets,
time_constraint = 1,
representation_type = "symbolic").evolve()