Using custom feature transformers¶
You can use any custom feature transformer classes as part of the evolution. In the following examples, we demonstrate one of the most powerful functionalities of autoBOTLib - its modularity.
Using custom features¶
Let’s explore the following example.
## how to use custom features?
import autoBOTLib
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn import pipeline ## A necessary import
## Load example data frame
dataframe = pd.read_csv("../data/insults/train.tsv", sep="\t")
train_sequences = dataframe['text_a'].values.tolist()
train_targets = dataframe['label'].values
## Let's say we wish to use only the following two text-to-feature transformer objects
tfidf_word_unigram = TfidfVectorizer(ngram_range=(1,2),
sublinear_tf=False,
max_features=100)
tfidf_char_bigram = TfidfVectorizer(analyzer='char',
ngram_range=(1,2),
max_features=100)
## Note: You can use any transformer class that is implemented in accordance with the scikit-learn API (.fit, .transform, .fit_transform, .get_feature_names, etc.)
## Next, put them into a list. Note the use of text_col class.
custom_features = [
('word_features',
pipeline.Pipeline([('s1', autoBOTLib.feature_constructors.text_col(key='no_stopwords')),
('word_tfidf_unigram', tfidf_word_unigram)])),
('char_features',
pipeline.Pipeline([('s2', autoBOTLib.feature_constructors.text_col(key='no_stopwords')),
('char_tfidf_bigram', tfidf_char_bigram)]))
]
## Finally, specify this list as
autoBOTLibObj = autoBOTLib.GAlearner(train_sequences,
train_targets,
time_constraint = 1,
custom_transformer_pipeline = custom_features).evolve()
## Down-stream task
dataframe2 = pd.read_csv("../data/insults/test.tsv", sep="\t")
test_sequences = dataframe2['text_a'].values.tolist()
predictions = autoBOTLibObj.predict(test_sequences)
Note that the only constraints for you to include a custom transformation class are the following:
The class must have the sklearn-like API
Join it as a series of pipeline objects.
To see how to implement an example custom class, you can inspect for example the keyword-based features.
Extending existing feature spaces¶
Let’s finally inspect how to extend the existing feature space of autoBOTLib with some custom features. Why would this be useful? Assume you have your one interesting feature constructor, but would like to exploit the existing ones (instead of implementing them from scratch).
import autoBOTLib
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn import pipeline ## A necessary import
## Load example data frame
dataframe = pd.read_csv("../data/insults/train.tsv", sep="\t")
train_sequences = dataframe['text_a'].values.tolist()
train_targets = dataframe['label'].values
## Define custom transformer classes as in the example above
tfidf_word_unigram = TfidfVectorizer(ngram_range=(1, 2),
sublinear_tf=False,
max_features=100)
tfidf_char_bigram = TfidfVectorizer(analyzer='char',
ngram_range=(1, 2),
max_features=100)
custom_features = [
('word_features_custom',
pipeline.Pipeline([
('s1', autoBOTLib.feature_constructors.text_col(key='no_stopwords')),
('word_tfidf_unigram', tfidf_word_unigram)
])),
('char_features_cusom',
pipeline.Pipeline([
('s2', autoBOTLib.feature_constructors.text_col(key='no_stopwords')),
('char_tfidf_bigram', tfidf_char_bigram)
]))
]
## Finally, use the flag "combine_with_existing_representation" to append the new transformer pipeline to an existing one (e.g., neurosymbolic). This way, you can easily extend current autoBOTLib!
autoBOTLibObj = autoBOTLib.GAlearner(
train_sequences,
train_targets,
time_constraint=1,
representation_type = "neurosymbolic",
custom_transformer_pipeline=custom_features,
combine_with_existing_representation = True).evolve()
dataframe2 = pd.read_csv("../data/insults/test.tsv", sep="\t")
test_sequences = dataframe2['text_a'].values.tolist()
predictions = autoBOTLibObj.predict(test_sequences)
Note how you only need to specify the combine_with_existing_representation flag apart from the custom pipeline, which will be appended to the existing (e.g., neurosymbolic) one.