From: LSTM RNN: detecting exploit kits using redirection chain sequences
Dataset | Approach | Results | Ref |
---|---|---|---|
HTTP logs from 200+ networks over 6 months and PCAPs (Duncan 2020) over 3 months | Compares EK detection using 5 indicators (MIME type, structure, duration, repetition, browser agent) against RegEx-only detection | Average precision of 0.95 and recall of 0.92-0.95 using all 5 indicators | |
820 PCAPs (Duncan 2020) (2014-2016) | Applies a Decision Tree classifier to content-based, interaction and connection-specific features extracted from the HTTP, DNS and Files logs produced by Zeek | Classified EK traffic with 0.99 accuracy and 0.92 precision, and EK families with 0.82-0.99 accuracy and 0.8-0.99 precision | |
Dataset extracted from 3496 malicious and 2907 benign websites (MalCrawler) | Determines the importance of 25 different features for detecting malicious websites, according to accuracy and computational cost. Applies 10-fold cross-validation (CV) in WEKA using Naive Bayes and C4.5 classifiers | Identifies the top 5 attributes of malicious sites: cloaking, use of iFrames, redirection, size of obfuscated code and pop-ups using the window.open() function | |
240 PCAPs (Duncan 2020) (2016) | Extracts 20 URL-based features from each domain in the EK attack chain and compares ML algorithms | KNN, SVM and GBC achieved 0.958, 0.916 and 1.0 accuracy, respectively | |
5000 redirect chains from a large AV vendor (2012) | Builds redirection graphs by aggregating redirect chains from a collection of different users, and extracts 28 features from 5 categories for an SVM | Achieved an F1 score of up to 0.881, depending on the range of features considered | |
Crawled Alexa top 1M domains and Microsoft’s feed of malicious URLs over 4-6 weeks (2012) | Detects mass redirect-script injections by comparing suspicious JS files to their original versions. Based on the observation that redirection scripts are often quietly injected into legitimate JS libraries, whose unaltered code is publicly available | Produced a detailed analysis of malicious JS/redirects and quantified the use of obfuscation/evasion techniques | |
D3M 2013 dataset of 108 malicious websites (Marionette) | Uses monitoring sensors on the client side (browser, web proxy and DNS) and an analysis centre on the server side to detect EK attacks. EXE downloads are classified as malicious if the URL is not present in previous HTTP headers or web content | Achieved 0% FPR with 24.2% FNR when tested against a dataset of 108 URLs (33 malicious) | |
15,000 malicious paths and 225,000 benign paths, provided by a large ISP (2011-2012) | Reconstructs user browsing activity into trees representing time-based sessions, and extracts 8 redirection-based features for use with a Decision Tree classifier | Extracted redirection trees with 0.965 accuracy, and classified them with precision and recall values of 0.9-0.98 | |
Crawled 19,899 EK landing pages over 3 years (Marionette) | Applies program slicing to JS; executes each code segment and extracts URLs, even when cloaking prevents the execution of malicious JS branches | Extracted 30,000 new URLs compared to existing techniques | |
Dataset of 683 manually labelled, malicious download paths (164 EK instances) | Investigates browsing paths followed by users before an attack. WebWitness identifies a malicious download and traces back through HTTP requests, building a tree of redirects that led to the malware | Identified EKs with 0.9919 accuracy when tested against 48 EK samples using 10-fold CV | |
688 million redirection trees, extracted from 3800 hours of traffic (2013-2014) | Builds web session trees (WSTs) and extracts URL-based features. Subtree similarity searches are performed against the WSTs to identify node-level and structural similarities with known malicious trees | Achieved 95% FPR against a dataset of 85 EK samples, and identified 28 new EK instances during analysis | |
D3M 2015 dataset of 256 malicious websites (Marionette) | Builds WSTs similar to (Taylor et al. 2016), but aims to handle incomplete redirection data using time-based clustering. The focus is on WST construction rather than feature extraction | Average accuracy of 0.862 using 2-fold CV. Scored higher on EK families represented in both train and test sets | |
8467 JS samples from 20,272 malicious websites (2012-2016) | Compares redirection graphs from browsers running different JS implementations to identify structural differences resulting from evasive code | Discovered several new evasion techniques that abuse JS implementation differences | |
Crawled 455,860 websites, 1.3% labelled as malicious or evasive (2016) | Graph mining approach to detect malicious sites, even if the full chain of redirects cannot be extracted. 22 redirect, HTML and JS-based features are obtained from each graph and evaluated with an RF classifier | Achieved an F1 score of 0.766 for sites hosting EK URLs, and identified 143 more malicious sites than conventional systems |
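
Several of the approaches summarised above rebuild redirection trees from HTTP traffic before extracting features. The following is a minimal Python sketch of that general idea, not the method of any listed work: it assumes each request is reduced to a hypothetical `(url, referrer)` pair in time order, links requests to their parents by referrer only, and computes three illustrative chain features. The function names, feature names and example URLs are all assumptions made for illustration.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def build_redirect_tree(requests):
    """Link each request to its parent via the referrer field.

    `requests` is a list of (url, referrer) pairs in time order; referrer is
    None when no parent is known. Returns (roots, children), where children
    maps a URL to the URLs it led to. Repeated URLs are ignored so the
    result is always acyclic.
    """
    children = defaultdict(list)
    roots = []
    seen = set()
    for url, referrer in requests:
        if url in seen:
            continue
        if referrer is not None and referrer in seen:
            children[referrer].append(url)
        else:
            roots.append(url)
        seen.add(url)
    return roots, dict(children)


def longest_chain(root, children):
    """Return the longest root-to-leaf path as a list of URLs."""
    best = [root]
    for child in children.get(root, []):
        candidate = [root] + longest_chain(child, children)
        if len(candidate) > len(best):
            best = candidate
    return best


def chain_features(chain):
    """Compute simple, illustrative features for one redirect chain."""
    hosts = [urlsplit(u).hostname for u in chain]
    return {
        "length": len(chain),
        "distinct_hosts": len(set(hosts)),
        "cross_host_hops": sum(a != b for a, b in zip(hosts, hosts[1:])),
    }


# Hypothetical session: a page load that passes through an ad script and a
# gate before reaching a landing page, plus an unrelated stylesheet request.
example_requests = [
    ("http://news.example/index.html", None),
    ("http://ads.example/banner.js", "http://news.example/index.html"),
    ("http://gate.example/r?id=1", "http://ads.example/banner.js"),
    ("http://landing.example/ek.html", "http://gate.example/r?id=1"),
    ("http://news.example/style.css", "http://news.example/index.html"),
]

roots, children = build_redirect_tree(example_requests)
chain = longest_chain(roots[0], children)
features = chain_features(chain)
```

Real traffic is messier than this sketch assumes: referrers are often missing or stripped, which is why some of the works above fall back on time-based sessions or clustering to recover incomplete trees.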