Work | Feature Extraction | Feature Representation | Feature Selection | ML Approach | Evaluation | Deployment | Explanation | Contributions
---|---|---|---|---|---|---|---|---
Liu and Liu (2014) | Static | Boolean vector | Manual | Decision Tree | TPR = 0.813 FPR = 0.0046 Precision = 0.89 Accuracy = 0.98 | Off-device | None | Using “used” permissions instead of requested ones can reduce feature noise. A two-step detection process can increase performance
Arp et al. (2014) | Static | Boolean vector | Manual | SVM | Recall = 0.94 Accuracy = 0.93 FPR = 0.01 Run-time | Hybrid (training off-device, feature extraction and detection on-device) | Uses feature weights to explain predictions | Emphasis on prediction explanation provides clarity to the user, increasing usability. Diversity of features used can alleviate concept drift |
Yuan et al. (2016) | Static & Dynamic | Boolean vector | Manual | Deep Belief Network (DBN) | Precision = 0.94 Accuracy = 0.93 FPR = 0.01 | Hybrid (training off-device, feature extraction and detection on-device) | Uses feature weights to explain predictions | Using deep learning for Android malware detection shows promise. Results show resistance to re-packaged malware
Alzaylaee et al. (2020) | Static & Dynamic | Boolean vector | Information Gain | Multilayer perceptron (MLP) | TPR = 0.98 TNR = 0.91 FPR = 0.09 FNR = 0.02 Accuracy = 0.95 F = 0.96 AUC = 0.99 Run-time | Off-device | None | Using stateful input generation for dynamic analysis improves code coverage compared to other works. Clear run-time performance evaluation is conducted and reported
Zhang et al. (2014) | Static | API graph similarity scores | Manual | Naive Bayes | FNR = 0.02 FPR = 0.05 Recall = 0.93 Run-time | Off-device | None | Using semantically-aware dependency graphs lessens the reliance on syntax, helping to detect zero-day malware and potentially alleviating concept drift |
McLaughlin et al. (2017) | Static | Opcode sequences | None | Convolutional Neural Network (CNN) | Accuracy = 0.87 Precision = 0.87 Recall = 0.85 F = 0.86 Run-time | Off-device | None | Using opcodes and deep learning eliminates the need for manual feature engineering, and also could alleviate concept drift. Thorough run-time performance evaluation is conducted and reported |
Li et al. (2018) | Static | Boolean vector | SFS | SVM | Accuracy = 0.95 Precision = 0.97 Recall = 0.93 FPR = 2.36% F = 0.95 | Hybrid (feature extraction on-device, training and detection off-device) | None | Using only “significant” permissions reduces feature noise and model complexity, potentially leading to better accuracy and lower over-fitting
Wang et al. (2014) | Static | Boolean vector | MI, SFS, Manual | SVM, Decision Tree, Random Forest | Accuracy = 0.95 TPR = 0.94 FPR = 0.006 F = 0.90 ROC | Off-device | Decision tree rules to explain predictions | The permission ranking used can lead to reduced feature noise and improved accuracy. The model explanation approach is novel and can inspire future efforts |
Yuan et al. (2014) | Static & Dynamic | Boolean vector | Manual | DBN | Accuracy = 0.96 | Off-device | None | Novel use of deep learning leads to improved accuracy. Using both static and dynamic features can alleviate susceptibility to evasion attacks |
Wu et al. (2012) | Static | Boolean vector | Manual | K-Means, EM, kNN, Naive Bayes | Accuracy = 0.93 Recall = 0.87 Precision = 0.96 F = 0.91 | Off-device | None | The performed malware family detection can help human analysts. Classification is augmented with clustering for more accurate detection |
Milosevic et al. (2017) | Static | Boolean & Integer vectors | None | K-Means, EM; Ensemble of SVM, Naive Bayes, Decision Tree | Precision = 0.89 Recall = 0.89 F = 0.89 Run-time | Off-device | None | The performed clustering can help with obtaining ground truth for unlabeled samples, based on their neighbors. It can also help with malware family detection and manual analysis |
Demontis et al. (2019) | Static | Boolean vector | Manual | Secure SVM | Attack-resistance ROC | Hybrid (training off-device, feature extraction and detection on-device) | Uses feature weights to explain predictions | The proposed uniform feature weights lessen SVM’s reliance on any single feature, alleviating certain evasion attacks. Extensive attack evaluations are performed
Yerima (2013) | Static | Boolean vector | Mutual information | Bayesian Classification | Accuracy = 0.92 FPR = 0.063 TPR = 0.90 FNR = 0.094 AUC = 0.97 | Off-device | None | The use of a Bayesian model makes integrating expert knowledge easier. AUC is provided, which allows for easier model comparison
Kim et al. (2019) | Static | Boolean vector, Similarity scores | Topological Data Analysis | Deep learning | Accuracy = 0.98 Recall = 0.99 Precision = 0.98 F = 0.99 Resilience to obfuscation attacks | Off-device | None | The great variety of static features used, together with deep learning, can improve detection accuracy. A thorough investigation of resilience to different types of attacks is performed and reported
Sahs and Khan (2012) | Static | Boolean vectors, Graphs | None | SVM | TPR, Precision, Recall, F (graphs) | Off-device | None | The novel use of SVM kernels to represent graphs and strings can be inspirational for future work. It can also improve accuracy and alleviate concept drift
Feng et al. (2018) | Dynamic | Boolean vector | Chi-square | Ensemble: Stacking of SVM, Decision Tree, Extra Trees, Random Forest, Boosted Tree | Accuracy = 0.97 Precision = 0.95 TPR = 0.97 FPR = 0.016 AUC = 0.97 | Off-device | None | Provides novel insight into the use of ensembles for Android malware detection. A comparison of different ensembling approaches is also provided, showing an advantage for stacking. Provides evidence for the unsuitability of kNN for Android malware detection
Zhu et al. (2018) | Static | Boolean vector, PCA | TF-IDF | Rotation Forest | Sensitivity = 0.88 Precision = 0.88 Accuracy = 0.88 AUC = 0.89 | Off-device | None | Use of Rotation Forest for Android malware detection can improve accuracy over individual models. However, there might be a performance penalty |
Zhang et al. (2018) | Static | Boolean vector | None | CNN | Precision = 0.96 Recall = 0.98 Accuracy = 0.97 F = 0.97 Run-time | Off-device | None | The use of a complex neural network architecture like a CNN improves accuracy and helps with zero-day malware detection
Yerima et al. (2015) | Static | Boolean vector | Manual | Random Forest | TPR = 0.97 TNR = 0.97 FPR = 0.02 Accuracy = 0.97 Error rate = 0.02 AUC = 0.99 | Off-device | None | Use of ensembles can help with detection of zero-day malware. Features are extracted from both Manifest and DEX, increasing their diversity and potentially alleviating concept drift
Yerima et al. (2014) | Static | Boolean vector | Manual | Ensemble: Decision Tree, Logistic Regression (LR), Naive Bayes (NB) | TPR = 0.97 TNR = 0.97 FPR = 0.03 FNR = 0.02 Accuracy = 0.97 AUC = 0.95 | Off-device | None | A thorough investigation of the effectiveness of different ensembling techniques is performed. Ensembling can also improve zero-day detection due to model diversity |
Xu et al. (2018) | Static | Boolean and Bytecode vectors | None | MLP | Accuracy = 0.97 TPR = 0.97 FPR = 0.02 Run-time | Off-device | None | The two-layered detection design can improve performance without losing accuracy. Use of deep learning reduces the need for manual feature engineering. An investigation of the resilience of the model against different attacks is reported
Wu and Hung (2014) | Dynamic | Boolean vector, 2-grams | Manual | SVM | Accuracy = 0.86 F = 0.85 Recall = 0.82 Precision = 0.9 FPR = 0.1 FNR = 0.18 | Off-device | None | Uses APE, a complex input generation scheme for dynamic analysis, as opposed to simplistic random models used by prior literature. This can improve code coverage and detection accuracy |
Wang et al. (2016) | Static | Boolean vector | None | DBN | Precision = 0.93 Recall = 0.94 F = 0.93 | Off-device | None | Use of deep learning can improve detection accuracy and eliminate the need for manual feature engineering |
Karbab et al. (2018) | Static | Vector sequence of API calls | None | CNN | F = 0.96 Precision = 0.96 Recall = 0.96 FPR = 0.031 Family detection Concept drift Attack resilience Run-time | Hybrid (feature extraction on-device; training and detection off-device) | None | Provides a thorough requirement analysis for Android malware detection, which clearly lays out expectations from such a system. This allows for better comparison of different solutions proposed by the literature. Also, all API calls are considered for analysis, not just a subset, as done by prior work
Aafer et al. (2013) | Static | Boolean vector | Manual | Decision Tree | Accuracy ≈ 0.99 TPR ≈ 0.97 TNR ≈ 1.00 Run-time | Off-device | None | Provides a novel way of extracting API calls from DEX files. High run-time performance, which leads to increased practicality
Burguera et al. (2011) | Dynamic | Integer vector | Manual | K-Means | Detection rate = 0.85–1.0 | Hybrid (feature extraction on-device; training and detection off-device) | None | Proposes an approach that compares execution traces of different versions of an app to detect re-packaged malware (e.g., Trojans)
Dini et al. (2012) | Dynamic | Integer vector | Manual | kNN | FPR = 0.001 Family detection Run-time | On-device | None | The approach makes novel use of on-device dynamic analysis for anomaly-based Android malware detection |
Peiravian and Zhu (2013) | Static | Boolean vector | None | Ensemble: Bagging with SVM and Decision Tree | Accuracy = 0.96 Precision = 0.95 Recall = 0.94 AUC = 0.96 | Off-device | None | Provides a comparison of permissions and API calls as features for malware detection. Ensemble learning can improve zero-day detection
Gascon et al. (2013) | Static | Graph (Integer vector) | None | SVM | FPR = 0.01 Detection rate = 0.89 ROC | Off-device | Uses feature weights to explain predictions | Proposes a new way of labeling Dalvik functions for easier call graph generation. Makes novel use of kernels for embedding call graphs for digestion by ML models
Saracino et al. (2018) | Static & Dynamic | Integer vector | Manual | kNN | Accuracy = 0.96 FPR = 0.00001 Run-time Battery | On-device | None | Uses a combination of on-device dynamic ML-based detection and signature-based techniques to achieve higher accuracy. Also uses metadata from market listings as features |
Sanz et al. (2013) | Static | Boolean vector | None | LR, NB, BayesNet, Decision Tree, Random Tree, Random Forest | TPR = 0.91 FPR = 0.19 AUC = 0.92 Accuracy = 0.86 ROC | Off-device | None | A pioneering work in the use of permissions for Android malware detection |
Zarni Aung (2013) | Static | Boolean vector | Information gain | K-Means, Decision Tree, Random Forest, CART | TPR = 0.97 FPR = 0.15 Precision = 0.84 Recall = 0.97 ROC Area = 0.87 | Off-device | None | Pioneering work in the use of clustering with permissions as features for Android malware detection. Makes novel use of hardware features to detect certain types of malware (e.g., those that use the camera or microphone for spying)
Yang et al. (2014) | Static | Boolean vector | Manual | NB, SVM, Decision Tree, Random Forest | Accuracy = 0.95 FPR = 0.4 Family detection Run-time | Off-device | None | Novel use of behavioral graphs for detecting “malicious behavior” in Android apps, as opposed to simply labeling them as malware or benign. Clustering is performed to detect malware families
Amos et al. (2013) | Dynamic | Boolean vector | None | Random Forest, NB, MLP, BayesNet, LR, Decision Tree | Accuracy = 0.91 TPR = 0.97 FPR = 0.31 Run-time performance | Hybrid (feature extraction on-device; training and detection off-device) | None | Proposes a distributed system for large-scale detection of Android malware. Dynamic analysis alleviates evasion by code obfuscation or dynamic loading |
Lindorfer et al. (2015) | Static & Dynamic | Boolean vector | Fisher score | LR, SVM | Accuracy = 0.99 Recall = 0.98 Precision = 0.99 Commercial comparison Concept drift | Off-device | Uses F-score to find the most discriminative features | Provides a malice score for apps to better communicate risk, as opposed to binary malware/benign labeling. Hybrid analysis allows for more accurate detection
Shabtai et al. (2014) | Dynamic | Integer and boolean vectors | Manual | LR, Decision Tree, SVM, Gaussian Regression, Isotonic Regression | TPR = 0.8 FPR = 0 Accuracy = 0.87 Run-time | Hybrid (feature extraction on-device; training and detection off-device) | None | Makes novel use of network traffic patterns of apps for malware detection. Reports on a thorough investigation of the CPU/RAM/Storage overhead of the proposed solution for on-device deployment |
Suarez-Tangil et al. (2017) | Static | Boolean vectors | Mean decrease impurity | Extra Trees | Accuracy = 0.9964 Family classification | Off-device | None | Novel use of a high variety of features to combat obfuscation. Novel use of feature ranking for dimensionality reduction
Bakour and Ünver (2021) | Static | Grayscale Image | Manual | Random Forest, Decision Tree, kNN, Ensembles | Accuracy = 0.98 | Off-device | None | Pioneering work in the use of image representation for Android malware detection. A great variety of image feature extraction techniques are used
Casolare et al. (2021) | Static | Color Image | Manual | Random Forest, SVM, MLP, CNN | Accuracy = 0.86 Precision = 0.86 Recall = 0.86 | Off-device | None | Combines dynamic analysis with color image representation for Android malware detection |
Cai et al. (2018) | Dynamic | Integer vector | Manual | Random Forest | Precision = 0.97 Recall = 0.99 F1 = 0.98 ROC curve AUC = 0.98 Family classification Concept drift | Off-device | None | Makes novel use of ICC Intents for dynamic detection of Android malware. Can handle reflection when detecting API and system calls. Evaluated concept drift
Taheri et al. (2020) | Static | Boolean vector | Manual | FNN, ANN, WANN, KMNN | Accuracy = 0.99 FPR = 0.005 AUC = 0.99 | Off-device | None | Makes novel use of the Hamming distance of static binary features for detecting malware. Extensive comparison of the use of different features and ML algorithms