Suggested use
The current implementation of autoBOT enables the user to explore multiple different representation spaces. However, the end goal is to offer a tool that serves as a very strong baseline. To this end, the configuration below has been shown to perform well across multiple benchmarks and shared tasks. Note that this configuration requires the sentence-transformers library (for multilingual contextual representations).
import autoBOTLib
import pandas as pd
# Load the example data frame
dataframe = pd.read_csv("../data/insults/train.tsv", sep="\t")
train_sequences = dataframe['text_a'].values.tolist()
train_targets = dataframe['label'].values

autoBOTLibObj = autoBOTLib.GAlearner(
    train_sequences,  # input sequences
    train_targets,  # target space
    time_constraint=3,  # time budget in hours
    num_cpu="all",  # number of CPUs to use
    task_name="example test",  # task identifier
    scoring_metric="f1",  # sklearn-compatible scoring metric used as the fitness
    hof_size=3,  # size of the hall of fame
    top_k_importances=25,  # how many top features to output in the final ranking
    memory_storage="./memory",  # triplet base for concept features (see ./examples folder)
    representation_type="neurosymbolic")

autoBOTLibObj.evolve(
    nind=10,  # population size
    crossover_proba=0.6,  # crossover rate
    mutpb=0.4)  # mutation rate
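Once evolution finishes, the fitted object can be used for downstream prediction. The sketch below assumes a held-out test file with the same column layout as the training data; the test file path is hypothetical, and the call to predict follows the usage pattern shown in the autoBOT examples.

# Predict on held-out data (test file path is an assumption for illustration)
test_dataframe = pd.read_csv("../data/insults/test.tsv", sep="\t")
test_sequences = test_dataframe['text_a'].values.tolist()
predictions = autoBOTLibObj.predict(test_sequences)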
The triplet knowledge bases can be downloaded from, e.g., https://github.com/totogo/awesome-knowledge-graph#knowledge-graph-dataset. See the autobot/examples folder for further examples.
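As a rough illustration of what such a knowledge base contains, the snippet below writes a toy triplet file into the memory folder. The tab-separated subject-predicate-object layout and the file name are assumptions for illustration only; consult the files under autobot/examples for the exact format autoBOT expects.

# Toy triplet file -- the layout below is an assumed illustration, not the
# authoritative format; verify against the examples shipped with autoBOT.
with open("./memory/toy_triplets.tsv", "w") as handle:
    handle.write("dog\tisA\tanimal\n")          # (subject, predicate, object)
    handle.write("dog\thasProperty\tloyal\n")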