Feature types¶
Feature name |
Description |
Code (autoBOT) |
---|---|---|
Symbolic feature types |
||
Concept features |
Features derived from ConceptNet |
concept_features |
Relational features - token |
Features based on longer distances between tokens |
relational_features_token |
Relational features - char |
Features based on longer distances between characters |
relational_features_char |
Relational features - bigram |
Features based on longer distances between character pairs |
relational_features_bigram |
Topic features |
KMeans clusters of TF-IDF into d dimensions, presences are feature values |
topic_features |
Keywords |
Keyword-based features |
keyword_features |
Word-based features |
Standard word TF-IDF-based features |
word_features |
Character-based features |
Standard character TF-IDF-based features |
char_features |
Part-of-speech tag-based features |
TF-IDF of documents, created based on POS annotations |
pos_features |
Sub-symbolic feature types |
||
Neural - contextual |
Features based on sentence-transformers library (XLM, multilingual) |
contextual_features |
Neural - dbow |
doc2vec - dbow |
neural_features_dbow |
Neural - dm |
doc2vec - dm |
neural_features_dm |
Document graph |
Features derived as node embeddings of the corpus graph |
document_graph |
Representation types are defined as follows (within autoBOT); the keys of these dictionaries are the arguments under the field representation_type (see examples). Finally, note that new features are added on regular basis, hence there are more feature types available as in the original paper.
# Full stack feature_presets['neurosymbolic'] = [ 'concept_features', 'document_graph', 'neural_features_dbow', 'neural_features_dm', 'relational_features_token', 'topic_features', 'keyword_features', 'relational_features_char', 'char_features', 'word_features', 'contextual_features', 'relational_features_bigram' ] # This one is ~language agnostic feature_presets['neurosymbolic-lite'] = [ 'document_graph', 'neural_features_dbow', 'neural_features_dm', 'topic_features', 'keyword_features', 'relational_features_char', 'relational_features_token','char_features', 'word_features', 'relational_features_bigram' ] # MLJ paper versions feature_presets['neurosymbolic-default'] = [ 'neural_features_dbow', 'neural_features_dm', 'keyword_features', 'relational_features_char', 'char_features', 'word_features', "pos_features", 'concept_features' ] feature_presets['neural'] = [ 'document_graph', 'neural_features_dbow', 'neural_features_dm' ] feature_presets['symbolic'] = [ 'concept_features', 'relational_features_token', 'topic_features', 'keyword_features', 'relational_features_char', 'char_features', 'word_features', 'pos_features', 'relational_features_bigram' ]
So, for example, to use the symbolic feature space only, one can simply:
autoBOTLibObj = autoBOTLib.GAlearner(train_sequences,
train_targets,
time_constraint = 1,
representation_type = "symbolic").evolve()