LSTM RNN: detecting exploit kits using redirection chain sequences

Burgess, Jonah; O’Kane, Philip; Sezer, Sakir; Carlin, Domhnall

doi:10.1186/s42400-021-00093-7

Cybersecurity

Table 1 Comparison of related works

Ref	Dataset	Approach	Results
Nikolaev et al. (2016)	HTTP logs from 200+ networks over 6 months and PCAPs (Duncan 2020) over 3 months	Compares EK detection using 5 indicators (MIME type, structure, duration, repetition, browser agent) against RegEx-only detection	Average precision of 0.95 and recall of 0.92-0.95 using all 5 indicators
Harnmetta and Ngamsuriyaroj (2018)	820 PCAPs (Duncan 2020) (2014-2016)	Applies Decision Tree classifier to content-based, interaction and connection-specific features extracted from the HTTP, DNS and Files logs produced by Zeek	Classified EK traffic with 0.99 accuracy and 0.92 precision, and EK families with 0.82-0.99 accuracy and 0.8-0.99 precision
Singh and Goyal (2019)	Dataset extracted from 3496 malicious and 2907 benign websites (MalCrawler)	Determines importance of 25 different features for detecting malicious websites, according to accuracy and computational costs. Applies 10-fold cross-validation (CV) in WEKA using Naive Bayes and C4.5 classifiers	Identifies top 5 attributes of malicious sites: cloaking, use of iFrame, redirection, size of obfuscated code and pop-ups using Window.open() function
Süren et al. (2019)	240 PCAPs (Duncan 2020) (2016)	Extracts 20 URL-based features from each domain in EK attack chain and compares ML algorithms	KNN, SVM and GBC achieved 0.958, 0.916 and 1.0 accuracy, respectively
Stringhini et al. (2013)	5000 redirect chains from a large AV vendor (2012)	Builds redirection graphs by aggregating redirect chains from a collection of different users and extracts 28 features from 5 categories for SVM	Achieved F1 score of up to 0.881, depending on the range of features considered
Li et al. (2014)	Crawled Alexa top 1m domains and Microsoft’s feed of malicious URLs over 4-6 weeks (2012)	Detects mass redirect-script injections by comparing suspicious JS files to their original versions. Based on the observation that redirection scripts are often quietly injected into legitimate JS libraries, whose unaltered code is publicly available	Produced detailed analysis of malicious JS/redirects and quantified the use of obfuscation/evasion techniques
Matsunaka et al. (2014)	D3M 2013 dataset of 108 malicious websites (Marionette)	Uses monitoring sensors on the client-side (browser, web proxy and DNS) and an analysis centre on the server-side to detect EK attacks. EXE downloads are classified as malicious if the URL is not present in previous HTTP headers or web content	Achieved 0% FPR with 24.2% FNR when tested against a dataset of 108 URLs (33 malicious)
Mekky et al. (2014)	15,000 malicious paths and 225,000 benign paths, provided by a large ISP (2011-2012)	Reconstructs user browsing activity into trees, representing time-based sessions, and extracts 8 redirection-based features for use with a Decision Tree classifier	Extracted redirection trees with 0.965 accuracy and classified with precision and recall values of 0.9-0.98
Takata et al. (2015)	Crawled 19,899 EK landing pages over 3 years (Marionette)	Applies program slicing to JS; executes each code segment and extracts URLs, even when cloaking prevents the execution of malicious JS branches	Extracted 30,000 new URLs compared to existing techniques
Nelms et al. (2015)	Dataset of 683 manually labelled, malicious download paths (164 EK instances)	Investigates browsing paths followed by users before an attack. WebWitness identifies a malicious download and traces back through HTTP requests, building a tree of redirects that led to the malware	Identified EKs with 0.9919 accuracy when tested against 48 EK samples using 10-fold CV
Taylor et al. (2016)	688 million redirection trees, extracted from 3800 hours of traffic (2013-2014)	Builds web session trees (WST) and extracts URL-based features. Subtree similarity searches are performed against the WSTs to identify node-level and structural similarities with known malicious trees	Achieved 95% FPR against a dataset of 85 EK samples and identified 28 new EK instances during analysis
Nagai et al. (2019)	D3M 2015 dataset of 256 malicious websites (Marionette)	Builds WSTs similar to Taylor et al. (2016), but aims to handle incomplete redirection data using time-based clustering. Focus is on WST construction rather than feature extraction	Average accuracy of 0.862 using 2-fold CV. Scored higher on EK families represented in both train and test sets
Takata et al. (2018)	8467 JS samples from 20,272 malicious websites (2012-2016)	Compares redirection graphs from browsers running different JS implementations to identify structural differences resulting from evasive code	Discovered several new evasion techniques that abuse JS implementation differences
Shibahara et al. (2019)	Crawled 455,860 websites, 1.3% labelled as malicious or evasive (2016)	Graph mining approach to detect malicious sites, even if full chain of redirects cannot be extracted. 22 redirect, HTML and JS-based features obtained from each graph, evaluated with RF classifier	Achieved F1 score of 0.766 for sites hosting EK URLs and identified 143 more malicious sites than conventional systems
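
The Results column mixes accuracy, precision, recall, F1, FPR and FNR, which makes the rows hard to compare directly. The following minimal Python sketch shows how these metrics relate to one confusion matrix. The class name `Confusion` and the example counts are illustrative assumptions, not taken from any of the cited works: the counts are inferred from the Matsunaka et al. (2014) row (108 URLs, 33 malicious, 0% FPR, 24.2% FNR), where 8 missed detections out of 33 gives 8/33, or roughly 24.2%.

```python
from dataclasses import dataclass


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


@dataclass(frozen=True)
class Confusion:
    """Counts of a binary classifier's outcomes (hypothetical helper)."""
    tp: int  # malicious samples flagged as malicious
    fp: int  # benign samples flagged as malicious
    tn: int  # benign samples passed as benign
    fn: int  # malicious samples passed as benign

    def accuracy(self) -> float:
        return _ratio(self.tp + self.tn, self.tp + self.fp + self.tn + self.fn)

    def precision(self) -> float:
        return _ratio(self.tp, self.tp + self.fp)

    def recall(self) -> float:
        # Also called the true positive rate (TPR).
        return _ratio(self.tp, self.tp + self.fn)

    def fpr(self) -> float:
        return _ratio(self.fp, self.fp + self.tn)

    def fnr(self) -> float:
        return _ratio(self.fn, self.fn + self.tp)

    def f1(self) -> float:
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r) if (p + r) else 0.0


# Assumed counts, inferred from the table row: 33 malicious and 75 benign
# URLs, no false positives, 8 false negatives.
example = Confusion(tp=25, fp=0, tn=75, fn=8)

if __name__ == "__main__":
    for name in ("accuracy", "precision", "recall", "fpr", "fnr", "f1"):
        print(f"{name}: {getattr(example, name)():.3f}")
```

Under these assumed counts, accuracy is 100/108 even though about a quarter of the malicious URLs are missed, which is why works reporting only accuracy are difficult to compare with those reporting FNR or recall.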
