Table 3 Comparison of ML approaches proposed by the surveyed papers, in terms of feature extraction and representation; model creation, deployment and use; and benefits and drawbacks

From: On building machine learning pipelines for Android malware detection: a procedural survey of practices, challenges and opportunities

Columns: Paper; Feature (Extraction, Representation, Selection); Model (ML Approach, Evaluation, Deployment, Explanation); Contributions
Liu and Liu (2014) Static Boolean vector Manual Decision Tree TPR = 0.813
FPR = 0.0046
Precision = 0.89
Accuracy = 0.98
Off-device None Using “Used” permissions instead of requested ones can reduce feature noise. Using a two-step detection process can increase performance
Arp et al. (2014) Static Boolean vector Manual SVM Recall = 0.94
Accuracy = 0.93
FPR = 0.01
Run-time
Hybrid (training off-device, feature extraction and detection on-device) Uses feature weights to explain predictions Emphasis on prediction explanation provides clarity to the user, increasing usability. Diversity of features used can alleviate concept drift
Yuan et al. (2016) Static & Dynamic Boolean vector Manual Deep Belief Network (DBN) Precision = 0.94
Accuracy = 0.93
FPR = 0.01
Hybrid (training off-device, feature extraction and detection on-device) Uses feature weights to explain predictions Using deep learning for Android malware detection shows promise. Results show resistance to re-packaged malware
Alzaylaee et al. (2020) Static & Dynamic Boolean vector Information Gain Multilayer perceptron (MLP) TPR = 0.98
TNR = 0.91
FPR = 0.09
FNR = 0.02
Accuracy = 0.95
F = 0.96
AUC = 0.99
Run-time
Off-device None Stateful input generation for dynamic analysis improves code coverage compared to other works. A clear run-time performance evaluation is conducted and reported
Zhang et al. (2014) Static API graph similarity scores Manual Naive Bayes FNR = 0.02
FPR = 0.05
Recall = 0.93
Run-time
Off-device None Using semantically-aware dependency graphs lessens the reliance on syntax, helping to detect zero-day malware and potentially alleviating concept drift
McLaughlin et al. (2017) Static Opcode sequences None Convolutional Neural Network (CNN) Accuracy = 0.87
Precision = 0.87
Recall = 0.85
F = 0.86
Run-time
Off-device None Using opcodes and deep learning eliminates the need for manual feature engineering, and also could alleviate concept drift. Thorough run-time performance evaluation is conducted and reported
Li et al. (2018) Static Boolean vector SFS SVM Accuracy = 0.95
Precision = 0.97
Recall = 0.93
FPR = 2.36
FM = 0.95
Hybrid (feature extraction on-device, training and detection off-device) None Using only “significant” permissions reduces feature noise and model complexity, potentially leading to better accuracy and lower over-fitting
Wang et al. (2014) Static Boolean vector MI, SFS, Manual SVM, Decision Tree, Random Forest Accuracy = 0.95
TPR = 0.94
FPR = 0.006
F = 0.90
ROC
Off-device Decision tree rules to explain predictions The permission ranking used can lead to reduced feature noise and improved accuracy. The model explanation approach is novel and can inspire future efforts
Yuan et al. (2014) Static & Dynamic Boolean vector Manual DBN Accuracy = 0.96 Off-device None Novel use of deep learning leads to improved accuracy. Using both static and dynamic features can alleviate susceptibility to evasion attacks
Wu et al. (2012) Static Boolean vector Manual K-Means, EM, kNN, Naive Bayes Accuracy = 0.93
Recall = 0.87
Precision = 0.96
F = 0.91
Off-device None The performed malware family detection can help human analysts. Classification is augmented with clustering for more accurate detection
Milosevic et al. (2017) Static Boolean & Integer vectors None K-Means, EM; Ensemble of SVM, Naive Bayes, Decision Tree Precision = 0.89
Recall = 0.89
F = 0.89
Run-time
Off-device None The performed clustering can help with obtaining ground truth for unlabeled samples, based on their neighbors. It can also help with malware family detection and manual analysis
Demontis et al. (2019) Static Boolean vector Manual Secure SVM Attack-resistance
ROC
Hybrid (training off-device, feature extraction and detection on-device) Uses feature weights to explain predictions The proposed uniform feature weights lessen SVM’s reliance on any single feature, alleviating certain evasion attacks. Extensive attack evaluation is performed
Yerima (2013) Static Boolean vector Mutual information Bayesian Classification Accuracy = 0.92
FPR = 0.63
TPR = 0.90
FNR = 0.94
AUC = 0.97
Off-device None The use of a Bayesian model makes integrating expert knowledge easier. AUC is provided, which allows for easier model comparison
Kim et al. (2019) Static Boolean vector, Similarity scores Topological Data Analysis Deep learning Accuracy = 0.98
Recall = 0.99
Precision = 0.98
F = 0.99
Resilience to obfuscation attacks
Off-device None The great variety of static features used can improve detection accuracy, as can the use of deep learning. A thorough investigation of resilience to different types of attacks is performed and reported
Sahs and Khan (2012) Static Boolean vectors, Graphs None SVM TPR
Precision
Recall
F-graph
Off-device None The novel use of SVM kernels to represent graphs and strings can be inspirational for future work. It can also improve accuracy and alleviate concept drift
Feng et al. (2018) Dynamic Boolean vector Chi-square Ensemble: Stacking of SVM, Decision Tree, Extra Trees, Random Forest, Boosted Tree Accuracy = 0.97
Precision = 0.95
TPR = 0.97
FPR = 0.016
AUC = 0.97
Off-device None Provides novel insight into the use of ensembles for Android malware detection. A comparison of different ensembling approaches is also provided, showing an advantage for stacking. Provides evidence for the unsuitability of kNN for Android malware detection
Zhu et al. (2018) Static Boolean vector, PCA TF-IDF Rotation Forest Sensitivity = 0.88
Precision = 0.88
Accuracy = 0.88
AUC = 0.89
Off-device None Use of Rotation Forest for Android malware detection can improve accuracy over individual models. However, there might be a performance penalty
Zhang et al. (2018) Static Boolean vector None CNN Precision = 0.96
Recall = 0.98
Accuracy = 0.97
F = 0.97
Run-time
Off-device None The use of a complex neural network architecture like CNN leads to improved accuracy and helps with zero-day malware detection
Yerima et al. (2015) Static Boolean vector Manual Random Forest TPR = 0.97
TNR = 0.97
FPR = 0.02
Accuracy = 0.97
Error rate = 0.02
AUC = 0.99
Off-device None Use of ensembles can help with detection of zero-day malware. Features are extracted from both Manifest and DEX, increasing their diversity and potentially alleviating concept drift
Yerima et al. (2014) Static Boolean vector Manual Ensemble: Decision Tree, Logistic Regression (LR), Naive Bayes (NB) TPR = 0.97
TNR = 0.97
FPR = 0.03
FNR = 0.02
Accuracy = 0.97
AUC = 0.95
Off-device None A thorough investigation of the effectiveness of different ensembling techniques is performed. Ensembling can also improve zero-day detection due to model diversity
Xu et al. (2018) Static Boolean and Bytecode vectors None MLP Accuracy = 0.97
TPR = 0.97
FPR = 0.02
Run-time
Off-device None The two-layered detection design can improve performance without losing accuracy. Use of deep learning reduces the need for manual feature engineering. An investigation of the resilience of the model against different attacks is reported
Wu and Hung (2014) Dynamic Boolean vector, 2-grams Manual SVM Accuracy = 0.86
F = 0.85
Recall = 0.82
Precision = 0.9
FPR = 0.1
FNR = 0.18
Off-device None Uses APE, a complex input generation scheme for dynamic analysis, as opposed to simplistic random models used by prior literature. This can improve code coverage and detection accuracy
Wang et al. (2016) Static Boolean vector None DBN Precision = 0.93
Recall = 0.94
F = 0.93
Off-device None Use of deep learning can improve detection accuracy and eliminate the need for manual feature engineering
Karbab et al. (2018) Static Vector sequence of API calls None CNN F = 0.96
Precision = 0.96
Recall = 0.96
FPR = 0.031
Family detection
Concept drift
Attack resilience
Run-time
Hybrid (feature extraction on-device; training and detection off-device) None Provides a thorough requirement analysis for Android malware detection, which clearly lays out expectations of such a system. This allows for better comparison of different solutions proposed in the literature. Also, all API calls are considered for analysis, rather than only a subset as in prior work
Aafer et al. (2013) Static Boolean vector Manual Decision Tree Accuracy ~ 99
TPR ~ 97
TNR ~ 100
Run-time
Off-device None Provides a novel way of extracting API calls from DEX files. High run-time performance which leads to increased practicality
Burguera et al. (2011) Dynamic Integer vector Manual K-Means Detection rate = 0.85 ~ 1.0 Hybrid (feature extraction on-device; training and detection off-device) None Proposes an approach that compares execution traces of different versions of an app to detect re-packaged malware (e.g., Trojans)
Dini et al. (2012) Dynamic Integer vector Manual kNN FPR = 0.001
Family detection
Run-time
On-device None The approach makes novel use of on-device dynamic analysis for anomaly-based Android malware detection
Peiravian and Zhu (2013) Static Boolean vector None Ensemble: Bagging with SVM and Decision Tree Accuracy = 0.96
Precision = 0.95
Recall = 0.94
AUC = 0.96
Off-device None Provides a comparison of the use of permissions and API calls for malware detection. Ensemble learning can improve zero-day detection
Gascon et al. (2013) Static Graph (Integer vector) None SVM FPR = 0.01
Detection rate = 0.89
ROC
Off-device Using feature weights to explain predictions Proposes a new way of labeling Dalvik functions for easier call graph generation. Makes novel use of kernels for embedding call graphs for digestion by ML models
Saracino et al. (2018) Static & Dynamic Integer vector Manual kNN Accuracy = 0.96
FPR = 0.00001
Run-time
Battery
On-device None Uses a combination of on-device dynamic ML-based detection and signature-based techniques to achieve higher accuracy. Also uses metadata from market listings as features
Sanz et al. (2013) Static Boolean vector None LR, NB, BayesNet, Decision Tree, Random Tree, Random Forest TPR = 0.91
FPR = 0.19
AUC = 0.92
Accuracy = 0.86
ROC
Off-device None A pioneering work in the use of permissions for Android malware detection
Zarni Aung (2013) Static Boolean vector Information gain K-Means, Decision Tree, Random Forest, CART TPR = 0.97
FPR = 0.15
Precision = 0.84
Recall = 0.97
ROC Area = 0.87
Off-device None Pioneering work in the use of clustering with permissions as features for Android malware detection. Makes novel use of hardware features to detect certain types of malware (e.g., those that use the camera or microphone for spying)
Yang et al. (2014) Static Boolean vector Manual NB, SVM, Decision Tree, Random Forest Accuracy = 0.95
FPR = 0.4
Family detection
Run-time
Off-device None Novel use of a behavioral graph for detecting “malicious behavior” in Android apps, as opposed to simply labeling them as malware or benign. Clustering is performed to detect malware families
Amos et al. (2013) Dynamic Boolean vector None Random Forest, NB, MLP, BayesNet, LR, Decision Tree Accuracy = 0.91
TPR = 0.97
FPR = 0.31
Run-time performance
Hybrid (feature extraction on-device; training and detection off-device) None Proposes a distributed system for large-scale detection of Android malware. Dynamic analysis alleviates evasion by code obfuscation or dynamic loading
Lindorfer et al. (2015) Static & Dynamic Boolean vector Fisher score LR, SVM Accuracy = 0.99
Recall = 0.98
Precision = 0.99
Commercial comparison
Concept drift
Off-device Using F-score to find the most discriminative features Provides a malice score for apps to better communicate risk, as opposed to binary malware/benign labeling. Hybrid analysis allows for more accurate detection
Shabtai et al. (2014) Dynamic Integer and boolean vectors Manual LR, Decision Tree, SVM, Gaussian Regression, Isotonic Regression TPR = 0.8
FPR = 0
Accuracy = 0.87
Run-time
Hybrid (feature extraction on-device; training and detection off-device) None Makes novel use of network traffic patterns of apps for malware detection. Reports on a thorough investigation of the CPU/RAM/Storage overhead of the proposed solution for on-device deployment
Suarez-Tangil et al. (2017) Static Boolean vectors Mean decrease impurity Extra Trees Accuracy = 99.64%
Family classification
Off-device None Novel use of a wide variety of features to combat obfuscation. Novel use of feature ranking for dimensionality reduction
Bakour and Ünver (2021) Static Grayscale Image Manual Random Forest, Decision trees, kNN, Ensembles Accuracy = 0.98 Off-device None Pioneering work in the use of image representation for Android malware detection. A great variety of image feature extraction techniques are used
Casolare et al. (2021) Static Color Image Manual Random Forest, SVM, MLP, CNN Accuracy = 0.86
Precision = 0.86
Recall = 0.86
Off-device None Combines dynamic analysis with color image representation for Android malware detection
Cai et al. (2018) Dynamic Integer vector Manual Random Forest Precision = 0.97
Recall = 0.99
F1 = 0.98
ROC curve
AUC = 0.98
Family classification
Concept drift
Off-device None Makes novel use of ICC Intents for dynamic detection of Android malware. Can handle reflection when detecting API and system calls. Evaluates concept drift
Taheri et al. (2020) Static Boolean vector Manual FNN, ANN, WANN, KMNN Accuracy = 0.99
FPR = 0.005
AUC = 0.99
Off-device None Makes novel use of the Hamming distance of static binary features for detecting malware. Extensive comparison of the use of different features and ML algorithms
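
The Evaluation column mixes several rates (TPR, FPR, TNR, FNR, Precision, Recall, Accuracy, F), and different papers report different subsets. As a reading aid, the sketch below shows how these quantities relate under the standard confusion-matrix definitions, treating malware as the positive class. It assumes the surveyed papers follow these standard definitions, which the table itself does not state. The names `ConfusionCounts` and `metrics` and the example counts are hypothetical and not taken from any surveyed paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from a binary malware/benign evaluation (malware = positive)."""
    tp: int  # malware correctly flagged
    fp: int  # benign apps wrongly flagged
    tn: int  # benign apps correctly passed
    fn: int  # malware missed


def metrics(c: ConfusionCounts) -> dict[str, float]:
    """Compute the standard rates; assumes no denominator is zero."""
    tpr = c.tp / (c.tp + c.fn)        # also called recall or sensitivity
    fpr = c.fp / (c.fp + c.tn)
    precision = c.tp / (c.tp + c.fp)
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "TPR": tpr,
        "FNR": 1.0 - tpr,             # FNR is the complement of TPR
        "FPR": fpr,
        "TNR": 1.0 - fpr,             # TNR is the complement of FPR
        "Precision": precision,
        "Accuracy": (c.tp + c.tn) / total,
        "F": 2 * precision * tpr / (precision + tpr),  # harmonic mean (F1)
    }


# Hypothetical counts for illustration only.
example = metrics(ConfusionCounts(tp=90, fp=5, tn=95, fn=10))
for name, value in example.items():
    print(f"{name} = {value:.3f}")
```

Under these definitions TPR and FNR sum to 1, as do FPR and TNR, so a row that reports one of each pair implicitly fixes the other. Accuracy also depends on the malware-to-benign ratio of the dataset, which is why accuracy values from different rows are not directly comparable.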