
Table 1 A summary of recent ML methods for DGA detection and classification

From: Use of subword tokenization for domain generation algorithm classification

| Detection or classification problem | Features | ML models | Dataset | Results |
| --- | --- | --- | --- | --- |
| Detection (Almashhadani et al. 2020) | Lexical features | DT, SVM, kNN | 85,000 benign; 85,000 DGA from 20 classes | F1 scores: 0.9437 (DT), 0.9411 (SVM), 0.9443 (kNN) |
| Detection (Wang et al. 2022) | Distance-based features (KL distance, edit distance, Jaccard index) | SVM, NN | 10,000 benign; 10,000 DGA from 12 classes | Accuracy close to 1 |
| Classification (Vranken and Alizadeh 2022) | TF-IDF of the n-grams in domain names | SVM, MLP, RF, DT, kNN | 583,9543 benign; 492,800 DGA from 57 classes | F1 scores: 0.7573 (SVM), 0.7759 (MLP), 0.6284 (RF), 0.6443 (DT) |
| Detection and Classification (Zago et al. 2020a, b) | Lexical features | Adaboost, NN, RF, SVM, DT, kNN | 10,000 benign; 50 DGA classes, 10,000 each | F1 scores: 0.556–0.989 (detection), 0.297–0.769 (classification) |
| Detection and Classification (Cucchiarelli et al. 2021) | n-gram features | MLP | 10,000 benign; 50 DGA classes, 10,000 each | F1 scores: 0.964 (detection), 0.823 (classification) |
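
Most results in Table 1 are F1 scores, the harmonic mean of precision and recall. The sketch below shows one way such a score can be computed from predicted and true labels. The function names `f1_score` and `macro_f1` are illustrative, and the use of macro averaging across classes is an assumption: the table does not state which averaging scheme each cited study used.

```python
from typing import Sequence


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 for one class from true-positive, false-positive, false-negative counts."""
    if tp == 0:
        # No true positives: precision and recall are 0 or undefined; report 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 scores (assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        scores.append(f1_score(tp, fp, fn))
    return sum(scores) / len(scores)


# Toy binary detection example with made-up labels (not data from any cited study).
y_true = ["benign", "benign", "dga", "dga"]
y_pred = ["benign", "dga", "dga", "dga"]
print(round(macro_f1(y_true, y_pred), 4))
```

For multi-class DGA classification, the same functions apply with one label per DGA family; with many imbalanced classes, macro and weighted averages can differ noticeably, which is worth keeping in mind when comparing rows of the table.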