From: Use of subword tokenization for domain generation algorithm classification
| Dataset | Characteristics | Method | F1 (word-looking DGAs) | F1 (random-looking DGAs) |
|---|---|---|---|---|
| 1 | 11 word-looking DGAs, 39 random-looking DGAs, ratio (W/R) = 0.282 (Zago et al. 2020b) | ML (lexical features, RF) (Zago et al. 2020a) | 0.6820 | 0.7360 |
| | | ML (n-gram) (Cucchiarelli et al. 2021) | 0.9084 | 0.7848 |
| | | BiLSTM (Cucchiarelli et al. 2021) | 0.8745 | 0.6989 |
| | | CNN-BiLSTM (Cucchiarelli et al. 2021) | 0.9010 | 0.7026 |
| 2 | 2 word-looking DGAs, 19 random-looking DGAs, ratio (W/R) = 0.105 | ML (SVM) (Ren et al. 2020) | 0.3670 | 0.6180 |
| | | LSTM (Ren et al. 2020) | 0.5500 | 0.6600 |
| | | CNN (Ren et al. 2020) | 0.5135 | 0.7053 |
| | | CNN-BiLSTM (Ren et al. 2020) | 0.4503 | 0.7009 |
| | | CNN-BiLSTM with attention (Ren et al. 2020) | 0.8854 | 0.7853 |
| 3 | 4 word-looking DGAs, 53 random-looking DGAs, ratio (W/R) = 0.075 | ML (MLP) (Vranken and Alizadeh 2022) | 0.7887 | 0.7750 |
| | | ML (RF) (Vranken and Alizadeh 2022) | 0.3444 | 0.6498 |
| | | ML (SVM) (Vranken and Alizadeh 2022) | 0.8371 | 0.7513 |
| | | LSTM (Vranken and Alizadeh 2022) | 0.7331 | 0.8348 |
| 4 | 1 word-looking DGA, 14 random-looking DGAs, ratio (W/R) = 0.071 | LSTM (Qiao et al. 2019) | 0.1626 | 0.9445 |
| | | LSTM with attention (Qiao et al. 2019) | 0.1743 | 0.9458 |
| 5 | 11 random-looking DGAs | LSTM (Vij et al. 2020) | – | 0.7192 |
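The per-class F1 scores reported above are the standard harmonic mean of precision and recall, and the W/R ratios are simply the count of word-looking DGA families divided by the count of random-looking ones. A minimal sketch of both computations (the confusion-matrix counts used here are illustrative, not from any of the cited papers):

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Per-class F1: harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Illustrative counts for one class (e.g. word-looking DGAs):
# 8 true positives, 2 false positives, 2 false negatives.
print(round(f1_score(tp=8, fp=2, fn=2), 4))  # 0.8

# W/R ratios recomputed from the family counts in the table:
# dataset 1: 11/39, dataset 2: 2/19, dataset 3: 4/53, dataset 4: 1/14.
for word, rand in [(11, 39), (2, 19), (4, 53), (1, 14)]:
    print(f"{word}/{rand} = {word / rand:.3f}")
```

Recomputing the ratios this way reproduces the values in the Characteristics column (0.282, 0.105, 0.075, 0.071) and makes clear how strongly random-looking families dominate every dataset.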