From: Use of subword tokenization for domain generation algorithm classification
DGA type | Class name | URL | Character tokenization | Subword tokenization | \(N\left({Tokens}_{char}\right)\) | \(N\left({Tokens}_{subword}\right)\) | RIndex |
---|---|---|---|---|---|---|---|
Word-looking | pizd | animalforget.net | “a”, “n”, “i”, “m”, “a”, “l”, “f”, “o”, “r”, “g”, “e”, “t” | “animal”, “for”, “get” | 12 | 3 | 0.75 |
Word-looking | rovnix | thesetobewarfarebecomes.net | “t”, “h”, “e”, “s”, “e”, “t”, “o”, “b”, “e”, “w”, “a”, “r”, “f”, “a”, “r”, “e”, “b”, “e”, “c”, “o”, “m”, “e”, “s” | “these”, “to”, “be”, “war”, “fare”, “be”, “com”, “es” | 23 | 8 | 0.65 |
Random-looking | ramnit | ukphbhncsdpgo.com | “u”, “k”, “p”, “h”, “b”, “h”, “n”, “c”, “s”, “d”, “p”, “g”, “o” | “uk”, “p”, “hb”, “hn”, “cs”, “dp”, “go” | 13 | 7 | 0.46 |
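
The RIndex values in the table are consistent with the relation RIndex = 1 − N(Tokens_subword) / N(Tokens_char). That formula is inferred from the three rows here and is not stated in this excerpt, so the Python sketch below should be read as an assumption-laden check rather than the authors' definition. The function name `reduction_index` and the `EXAMPLES` dictionary are hypothetical, and the subword splits are copied from the table instead of being produced by a real tokenizer.

```python
def reduction_index(n_char_tokens: int, n_subword_tokens: int) -> float:
    """Assumed RIndex: 1 - N(subword tokens) / N(character tokens)."""
    if n_char_tokens <= 0:
        raise ValueError("n_char_tokens must be positive")
    return 1.0 - n_subword_tokens / n_char_tokens


# Domain labels (top-level domain removed) mapped to the subword splits
# listed in the table. These splits are copied, not computed.
EXAMPLES = {
    "animalforget": ["animal", "for", "get"],
    "thesetobewarfarebecomes": ["these", "to", "be", "war", "fare", "be", "com", "es"],
    "ukphbhncsdpgo": ["uk", "p", "hb", "hn", "cs", "dp", "go"],
}


def rindex_for(label: str, subwords: list[str]) -> float:
    """Compute the assumed RIndex for one label and its subword split."""
    # Sanity check: the subwords must concatenate back to the label.
    if "".join(subwords) != label:
        raise ValueError("subword split does not reconstruct the label")
    char_tokens = list(label)  # character tokenization: one token per character
    return reduction_index(len(char_tokens), len(subwords))


if __name__ == "__main__":
    for label, subwords in EXAMPLES.items():
        print(f"{label}: RIndex = {rindex_for(label, subwords):.2f}")
```

Under this reading, word-looking domains compress into few subword tokens and score a high RIndex, while random-looking domains split into many short fragments and score lower, which matches the ordering of the three rows.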