Table 2 A summary of DL methods for DGA detection and classification

From: Use of subword tokenization for domain generation algorithm classification

| Detection or classification problem | DL models | Dataset | F1 score |
|---|---|---|---|
| Detection (Berman 2019) | CNN: embedding + CNN (1D) + fully connected layers | 1 million benign; 852,116 DGA from 50 classes | 0.9933 |
| Detection (Selvi et al. 2021) | LSTM: embedding + LSTM + fully connected layers | 32,000 benign; 32,000 DGA | 0.9762 |
| Classification (Qiao et al. 2019) | LSTM with attention: embedding + LSTM + attention + fully connected layers | 910,313 benign; 765,091 DGA from 15 classes (759,091 from 14 random-looking classes; 6000 from 1 word-looking class) | 0.9458 |
| Detection and classification (Vij et al. 2020) | LSTM: embedding + LSTM + fully connected layers | 109,935 benign; 109,935 DGA from 11 classes (all random-looking) | Detection: 0.9804; Classification: 0.7192 |
| Detection and classification (Ren et al. 2020) | CNN-BiLSTM with attention: embedding + CNN + BiLSTM + attention + fully connected layer | 1 million benign; 308,230 DGA from 24 classes (19 arithmetic-based, 2 wordlist-based, 3 part-wordlist-based) | Detection: 0.9879; Classification: 0.8300 |
| Detection (Yang et al. 2022) | Subword tokenization + transformer | 10,000 benign; 10,000 DGA from 9 classes (one wordlist-based) | 0.9697 |
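The Yang et al. (2022) row applies subword tokenization before a transformer, so that wordlist-based DGA domains decompose into word-like units rather than single characters. As an illustration only (not the tokenizer used in any of the cited papers), the sketch below implements a simplified byte-pair-encoding (BPE) learner in pure Python: it repeatedly merges the most frequent adjacent token pair in a training corpus, then applies the learned merges to new domain strings. Function names and the toy corpus are hypothetical.

```python
from collections import Counter

def byte_pair_merges(words, num_merges):
    """Learn BPE-style merge rules from a list of strings (simplified sketch).

    Each word starts as a tuple of characters; on every iteration the most
    frequent adjacent token pair across the corpus is merged into one token.
    """
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        merged = Counter()
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])  # apply the merge
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            merged[tuple(out)] += freq
        vocab = merged
    return merges

def tokenize(word, merges):
    """Split a string into subword tokens by replaying the learned merges."""
    tokens = list(word)
    for a, b in merges:
        out, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(tokens[i])
                i += 1
        tokens = out
    return tokens

if __name__ == "__main__":
    # Hypothetical training set of benign, word-like domain labels.
    domains = ["secureupdate", "loginsecure", "securemail", "updatelogin"]
    merges = byte_pair_merges(domains, 20)
    print(tokenize("securelogin", merges))
```

The resulting token sequence (rather than raw characters) would then be fed to an embedding layer and transformer encoder; random-looking DGA domains tend to fragment into many short tokens, while wordlist-based ones compress into a few dictionary-like subwords.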