Table 3 Comparison of ML approaches proposed by the surveyed papers, in terms of feature extraction and representation; model creation, deployment and use; and benefits and drawbacks

From: On building machine learning pipelines for Android malware detection: a procedural survey of practices, challenges and opportunities

 

Paper

Feature: Extraction

Feature: Representation

Feature: Selection

Model: ML Approach

Model: Evaluation

Model: Deployment

Model: Explanation

Contributions

Liu and Liu (2014)

Static

Boolean vector

Manual

Decision Tree

TPR = 0.813

FPR = 0.0046

Precision = 0.89

Accuracy = 0.98

Off-device

None

Using “used” permissions instead of requested ones can reduce feature noise. A two-step detection process can increase performance

Arp et al. (2014)

Static

Boolean vector

Manual

SVM

Recall = 0.94

Accuracy = 0.93

FPR = 0.01

Run-time

Hybrid (training off-device, feature extraction and detection on-device)

Uses feature weights to explain predictions

Emphasis on prediction explanation provides clarity to the user, increasing usability. Diversity of features used can alleviate concept drift

Yuan et al. (2016)

Static & Dynamic

Boolean vector

Manual

Deep Belief Network (DBN)

Precision = 0.94

Accuracy = 0.93

FPR = 0.01

Hybrid (training off-device, feature extraction and detection on-device)

Uses feature weights to explain predictions

Using deep learning for Android malware detection shows promise. Results show resistance to re-packaged malware

Alzaylaee et al. (2020)

Static & Dynamic

Boolean vector

Information Gain

Multilayer perceptron (MLP)

TPR = 0.98

TNR = 0.91

FPR = 0.09

FNR = 0.02

Accuracy = 0.95

F = 0.96

AUC = 0.99

Run-time

Off-device

None

Stateful input generation for dynamic analysis improves code coverage compared to other works. A clear run-time performance evaluation is conducted and reported

Zhang et al. (2014)

Static

API graph similarity scores

Manual

Naive Bayes

FNR = 0.02

FPR = 0.05

Recall = 0.93

Run-time

Off-device

None

Using semantically-aware dependency graphs lessens the reliance on syntax, helping to detect zero-day malware and potentially alleviating concept drift

McLaughlin et al. (2017)

Static

Opcode sequences

None

Convolutional Neural Network (CNN)

Accuracy = 0.87

Precision = 0.87

Recall = 0.85

F = 0.86

Run-time

Off-device

None

Using opcodes and deep learning eliminates the need for manual feature engineering and could also alleviate concept drift. A thorough run-time performance evaluation is conducted and reported

Li et al. (2018)

Static

Boolean vector

SFS

SVM

Accuracy = 0.95

Precision = 0.97

Recall = 0.93

FPR = 2.36%

FM = 0.95

Hybrid (feature extraction on-device, training and detection off-device)

None

Using only “significant” permissions reduces feature noise and model complexity, potentially leading to better accuracy and less over-fitting

Wang et al. (2014)

Static

Boolean vector

MI, SFS, Manual

SVM, Decision Tree, Random Forest

Accuracy = 0.95

TPR = 0.94

FPR = 0.006

F = 0.90

ROC

Off-device

Decision tree rules to explain predictions

The permission ranking used can lead to reduced feature noise and improved accuracy. The model explanation approach is novel and can inspire future efforts

Yuan et al. (2014)

Static & Dynamic

Boolean vector

Manual

DBN

Accuracy = 0.96

Off-device

None

Novel use of deep learning leads to improved accuracy. Using both static and dynamic features can alleviate susceptibility to evasion attacks

Wu et al. (2012)

Static

Boolean vector

Manual

K-Means, EM, kNN, Naive Bayes

Accuracy = 0.93

Recall = 0.87

Precision = 0.96

F = 0.91

Off-device

None

Malware family detection can help human analysts. Classification is augmented with clustering for more accurate detection

Milosevic et al. (2017)

Static

Boolean & Integer vectors

None

K-Means, EM; Ensemble of SVM, Naive Bayes, Decision Tree

Precision = 0.89

Recall = 0.89

F = 0.89

Run-time

Off-device

None

Clustering can help obtain ground truth for unlabeled samples based on their neighbors. It can also help with malware family detection and manual analysis

Demontis et al. (2019)

Static

Boolean vector

Manual

Secure SVM

Attack-resistance

ROC

Hybrid (training off-device, feature extraction and detection on-device)

Uses feature weights to explain predictions

The proposed uniform feature weights lessen the SVM’s reliance on any single feature, alleviating certain evasion attacks. An extensive attack evaluation is performed

Yerima (2013)

Static

Boolean vector

Mutual information

Bayesian Classification

Accuracy = 0.92

FPR = 0.63

TPR = 0.90

FNR = 0.94

AUC = 0.97

Off-device

None

The use of a Bayesian model makes integrating expert knowledge easier. AUC is provided, which allows for easier model comparison

Kim et al. (2019)

Static

Boolean vector, Similarity scores

Topological Data Analysis

Deep learning

Accuracy = 0.98

Recall = 0.99

Precision = 0.98

F = 0.99

Resilience to obfuscation attacks

Off-device

None

The great variety of static features used can improve detection accuracy, as can the use of deep learning. A thorough investigation of resilience to different types of attacks is performed and reported

Sahs and Khan (2012)

Static

Boolean vectors, Graphs

None

SVM

TPR

Precision

Recall

F-graph

Off-device

None

The novel use of SVM kernels to represent graphs and strings can be inspirational for future work. It can also improve accuracy and alleviate concept drift

Feng et al. (2018)

Dynamic

Boolean vector

Chi-square

Ensemble: Stacking of SVM, Decision Tree, Extra Trees, Random Forest, Boosted Tree

Accuracy = 0.97

Precision = 0.95

TPR = 0.97

FPR = 0.016

AUC = 0.97

Off-device

None

Provides novel insight into the use of ensembles for Android malware detection. A comparison of different ensembling approaches is also provided, showing an advantage for stacking. Provides evidence for the unsuitability of kNN for Android malware detection

Zhu et al. (2018)

Static

Boolean vector, PCA

TF-IDF

Rotation Forest

Sensitivity = 0.88

Precision = 0.88

Accuracy = 0.88

AUC = 0.89

Off-device

None

Use of Rotation Forest for Android malware detection can improve accuracy over individual models. However, there might be a performance penalty

Zhang et al. (2018)

Static

Boolean vector

None

CNN

Precision = 0.96

Recall = 0.98

Accuracy = 0.97

F = 0.97

Run-time

Off-device

None

The use of a complex neural network architecture such as a CNN leads to improved accuracy and helps with zero-day malware detection

Yerima et al. (2015)

Static

Boolean vector

Manual

Random Forest

TPR = 0.97

TNR = 0.97

FPR = 0.02

Accuracy = 0.97

Error rate = 0.02

AUC = 0.99

Off-device

None

Use of ensembles can help with detection of zero-day malware. Features are extracted from both Manifest and DEX, increasing their diversity and potentially alleviating concept drift

Yerima et al. (2014)

Static

Boolean vector

Manual

Ensemble: Decision Tree, Logistic Regression (LR), Naive Bayes (NB)

TPR = 0.97

TNR = 0.97

FPR = 0.03

FNR = 0.02

Accuracy = 0.97

AUC = 0.95

Off-device

None

A thorough investigation of the effectiveness of different ensembling techniques is performed. Ensembling can also improve zero-day detection due to model diversity

Xu et al. (2018)

Static

Boolean and Bytecode vectors

None

MLP

Accuracy = 0.97

TPR = 0.97

FPR = 0.02

Run-time

Off-device

None

The two-layered detection design can improve performance without losing accuracy. Use of deep learning reduces the need for manual feature engineering. An investigation of the resilience of the model against different attacks is reported

Wu and Hung (2014)

Dynamic

Boolean vector, 2-grams

Manual

SVM

Accuracy = 0.86

F = 0.85

Recall = 0.82

Precision = 0.9

FPR = 0.1

FNR = 0.18

Off-device

None

Uses APE, a complex input generation scheme for dynamic analysis, as opposed to the simplistic random models used in prior literature. This can improve code coverage and detection accuracy

Wang et al. (2016)

Static

Boolean vector

None

DBN

Precision = 0.93

Recall = 0.94

F = 0.93

Off-device

None

Use of deep learning can improve detection accuracy and eliminate the need for manual feature engineering

Karbab et al. (2018)

Static

Vector sequence of API calls

None

CNN

F = 0.96

Precision = 0.96

Recall = 0.96

FPR = 0.031

Family detection

Concept drift

Attack resilience

Run-time

Hybrid (feature extraction on-device; training and detection off-device)

None

Provides a thorough requirement analysis for Android malware detection, which clearly lays out the expectations of such a system. This allows for better comparison of the different solutions proposed in the literature. Also, all API calls are considered for analysis, not just a subset, as done by prior work

Aafer et al. (2013)

Static

Boolean vector

Manual

Decision Tree

Accuracy \(\sim\) 99

TPR \(\sim\) 97

TNR \(\sim\) 100

Run-time

Off-device

None

Provides a novel way of extracting API calls from DEX files. High run-time performance leads to increased practicality

Burguera et al. (2011)

Dynamic

Integer vector

Manual

K-Means

Detection rate = 0.85 \(\sim\) 1.0

Hybrid (feature extraction on-device; training and detection off-device)

None

Proposes an approach that compares execution traces of different versions of an app to detect re-packaged malware (e.g., Trojans)

Dini et al. (2012)

Dynamic

Integer vector

Manual

kNN

FPR = 0.001

Family detection

Run-time

On-device

None

The approach makes novel use of on-device dynamic analysis for anomaly-based Android malware detection

Peiravian and Zhu (2013)

Static

Boolean vector

None

Ensemble: Bagging with SVM and Decision Tree

Accuracy = 0.96

Precision = 0.95

Recall = 0.94

AUC = 0.96

Off-device

None

Provides a comparison of the use of permissions and API calls for malware detection. Ensemble learning can improve zero-day detection

Gascon et al. (2013)

Static

Graph (Integer vector)

None

SVM

FPR = 0.01

Detection rate = 0.89

ROC

Off-device

Uses feature weights to explain predictions

Proposes a new way of labeling Dalvik functions for easier call graph generation. Makes novel use of kernels for embedding call graphs for digestion by ML models

Saracino et al. (2018)

Static & Dynamic

Integer vector

Manual

kNN

Accuracy = 0.96

FPR = 0.00001

Run-time

Battery

On-device

None

Uses a combination of on-device dynamic ML-based detection and signature-based techniques to achieve higher accuracy. Also uses metadata from market listings as features

Sanz et al. (2013)

Static

Boolean vector

None

LR, NB, BayesNet, Decision Tree, Random Tree, Random Forest

TPR = 0.91

FPR = 0.19

AUC = 0.92

Accuracy = 0.86

ROC

Off-device

None

A pioneering work in the use of permissions for Android malware detection

Zarni Aung (2013)

Static

Boolean vector

Information gain

K-Means, Decision Tree, Random Forest, CART

TPR = 0.97

FPR = 0.15

Precision = 0.84

Recall = 0.97

ROC Area = 0.87

Off-device

None

Pioneering work in the use of clustering with permissions as features for Android malware detection. Makes novel use of hardware features to detect certain types of malware (e.g., those that use the camera or microphone for spying)

Yang et al. (2014)

Static

Boolean vector

Manual

NB, SVM, Decision Tree, Random Forest

Accuracy = 0.95

FPR = 0.4

Family detection

Run-time

Off-device

None

Novel use of a behavioral graph for detecting “malicious behavior” in Android apps, as opposed to simply labeling them as malware or benign. Clustering is performed to detect malware families

Amos et al. (2013)

Dynamic

Boolean vector

None

Random Forest, NB, MLP, BayesNet, LR, Decision Tree

Accuracy = 0.91

TPR = 0.97

FPR = 0.31

Run-time performance

Hybrid (feature extraction on-device; training and detection off-device)

None

Proposes a distributed system for large-scale detection of Android malware. Dynamic analysis alleviates evasion by code obfuscation or dynamic loading

Lindorfer et al. (2015)

Static & Dynamic

Boolean vector

Fisher score

LR, SVM

Accuracy = 0.99

Recall = 0.98

Precision = 0.99

Commercial comparison

Concept drift

Off-device

Uses F-score to find the most discriminative features

Provides a malice score for apps to better communicate risk, as opposed to binary malware/benign labeling. Hybrid analysis allows for more accurate detection

Shabtai et al. (2014)

Dynamic

Integer and boolean vectors

Manual

LR, Decision Tree, SVM, Gaussian Regression, Isotonic Regression

TPR = 0.8

FPR = 0

Accuracy = 0.87

Run-time

Hybrid (feature extraction on-device; training and detection off-device)

None

Makes novel use of network traffic patterns of apps for malware detection. Reports on a thorough investigation of the CPU/RAM/Storage overhead of the proposed solution for on-device deployment

Suarez-Tangil et al. (2017)

Static

Boolean vectors

Mean decrease impurity

Extra Trees

Accuracy = 99.64%

Family classification

Off-device

None

Novel use of a high variety of features to combat obfuscation. Novel use of feature ranking for dimensionality reduction

Bakour and Ünver (2021)

Static

Grayscale Image

Manual

Random Forest, Decision Tree, kNN, Ensembles

Accuracy = 0.98

Off-device

None

Pioneering work in the use of image representation for Android malware detection. A great variety of image feature extraction techniques are used

Casolare et al. (2021)

Static

Color Image

Manual

Random Forest, SVM, MLP, CNN

Accuracy = 0.86

Precision = 0.86

Recall = 0.86

Off-device

None

Combines dynamic analysis with color image representation for Android malware detection

Cai et al. (2018)

Dynamic

Integer vector

Manual

Random Forest

Precision = 0.97

Recall = 0.99

F1 = 0.98

ROC curve

AUC = 0.98

Family classification

Concept drift

Off-device

None

Makes novel use of ICC Intents for dynamic detection of Android malware. Can handle reflection when detecting API and system calls. Evaluated concept drift

Taheri et al. (2020)

Static

Boolean vector

Manual

FNN, ANN, WANN, KMNN

Accuracy = 0.99

FPR = 0.005

AUC = 0.99

Off-device

None

Makes novel use of the Hamming distance of static binary features for detecting malware. Extensive comparison of the use of different features and ML algorithms
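
The Evaluation cells above mix several names for closely related quantities (TPR, recall, sensitivity and detection rate; FPR and TNR; F, FM and F1). As a reading aid, the following minimal sketch shows how these metrics are conventionally derived from a binary confusion matrix, assuming malware is the positive class. The class name `ConfusionCounts`, the function `detection_metrics` and the counts used are hypothetical and are not taken from any surveyed paper; the sketch also assumes every denominator is non-zero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical binary confusion matrix; malware is the positive class."""
    tp: int  # malware correctly flagged
    fp: int  # benign apps wrongly flagged
    tn: int  # benign apps correctly passed
    fn: int  # malware missed


def detection_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Compute the standard metrics; assumes all denominators are non-zero."""
    tpr = c.tp / (c.tp + c.fn)        # also called recall, sensitivity, detection rate
    fpr = c.fp / (c.fp + c.tn)
    precision = c.tp / (c.tp + c.fp)
    accuracy = (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)
    f_measure = 2 * precision * tpr / (precision + tpr)  # harmonic mean (F1)
    return {
        "TPR": tpr,
        "FNR": 1.0 - tpr,
        "FPR": fpr,
        "TNR": 1.0 - fpr,
        "Precision": precision,
        "Accuracy": accuracy,
        "F": f_measure,
    }


if __name__ == "__main__":
    # Illustrative counts only: 100 malware samples and 900 benign samples.
    example = ConfusionCounts(tp=90, fp=5, tn=895, fn=10)
    for name, value in detection_metrics(example).items():
        print(f"{name} = {value:.3f}")
```

Because TPR + FNR = 1 and TNR + FPR = 1 by definition, an entry that reports both members of a pair can be checked for internal consistency, and values greater than 1 are most likely percentages rather than fractions.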