On building machine learning pipelines for Android malware detection: a procedural survey of practices, challenges and opportunities

Mehrabi Koushki, Masoud; AbuAlhaol, Ibrahim; Raju, Anandharaju Durai; Zhou, Yang; Giagone, Ronnie Salvador; Shengqiang, Huang

doi:10.1186/s42400-022-00119-8

Cybersecurity

Table 3 Comparison of ML approaches proposed by the surveyed papers, in terms of feature extraction and representation; model creation, deployment and use; and benefits and drawbacks


Paper	Feature extraction	Feature representation	Feature selection	Model: ML approach	Model: evaluation	Model: deployment	Model: explanation	Contributions
Liu and Liu (2014)	Static	Boolean vector	Manual	Decision Tree	TPR = 0.813 FPR = 0.0046 Precision = 0.89 Accuracy = 0.98	Off-device	None	Using “used” permissions instead of requested ones can reduce feature noise. Using a two-step detection process can increase performance
Arp et al. (2014)	Static	Boolean vector	Manual	SVM	Recall = 0.94 Accuracy = 0.93 FPR = 0.01 Run-time	Hybrid (training off-device, feature extraction and detection on-device)	Uses feature weights to explain predictions	Emphasis on prediction explanation provides clarity to the user, increasing usability. Diversity of features used can alleviate concept drift
Yuan et al. (2016)	Static & Dynamic	Boolean vector	Manual	Deep Belief Network (DBN)	Precision = 0.94 Accuracy = 0.93 FPR = 0.01	Hybrid (training off-device, feature extraction and detection on-device)	Uses feature weights to explain predictions	Using deep learning for Android malware detection shows promise. Results show resistance to re-packaged malware
Alzaylaee et al. (2020)	Static & Dynamic	Boolean vector	Information Gain	Multilayer perceptron (MLP)	TPR = 0.98 TNR = 0.91 FPR = 0.09 FNR = 0.02 Accuracy = 0.95 F = 0.96 AUC = 0.99 Run-time	Off-device	None	Using stateful input generation for dynamic analysis has improved code coverage, when compared to other works. Clear run-time performance evaluation is conducted and reported
Zhang et al. (2014)	Static	API graph similarity scores	Manual	Naive Bayes	FNR = 0.02 FPR = 0.05 Recall = 0.93 Run-time	Off-device	None	Using semantically-aware dependency graphs lessens the reliance on syntax, helping to detect zero-day malware and potentially alleviating concept drift
McLaughlin et al. (2017)	Static	Opcode sequences	None	Convolutional Neural Network (CNN)	Accuracy = 0.87 Precision = 0.87 Recall = 0.85 F = 0.86 Run-time	Off-device	None	Using opcodes and deep learning eliminates the need for manual feature engineering, and also could alleviate concept drift. Thorough run-time performance evaluation is conducted and reported
Li et al. (2018)	Static	Boolean vector	SFS	SVM	Accuracy = 0.95 Precision = 0.97 Recall = 0.93 FPR = 2.36 FM = 0.95	Hybrid (feature extraction on-device, training and detection off-device)	None	Using only “significant” permissions reduces feature noise and model complexity, potentially leading to better accuracy and lower over-fitting
Wang et al. (2014)	Static	Boolean vector	MI, SFS, Manual	SVM, Decision Tree, Random Forest	Accuracy = 0.95 TPR = 0.94 FPR = 0.006 F = 0.90 ROC	Off-device	Decision tree rules to explain predictions	The permission ranking used can lead to reduced feature noise and improved accuracy. The model explanation approach is novel and can inspire future efforts
Yuan et al. (2014)	Static & Dynamic	Boolean vector	Manual	DBN	Accuracy = 0.96	Off-device	None	Novel use of deep learning leads to improved accuracy. Using both static and dynamic features can alleviate susceptibility to evasion attacks
Wu et al. (2012)	Static	Boolean vector	Manual	K-Means, EM, kNN, Naive Bayes	Accuracy = 0.93 Recall = 0.87 Precision = 0.96 F = 0.91	Off-device	None	The performed malware family detection can help human analysts. Classification is augmented with clustering for more accurate detection
Milosevic et al. (2017)	Static	Boolean & Integer vectors	None	K-Means, EM; Ensemble of SVM, Naive Bayes, Decision Tree	Precision = 0.89 Recall = 0.89 F = 0.89 Run-time	Off-device	None	The performed clustering can help with obtaining ground truth for unlabeled samples, based on their neighbors. It can also help with malware family detection and manual analysis
Demontis et al. (2019)	Static	Boolean vector	Manual	Secure SVM	Attack-resistance ROC	Hybrid (training off-device, feature extraction and detection on-device)	Uses feature weights to explain predictions	The proposed uniform feature weights lessen SVM’s reliance on any single feature, alleviating certain evasion attacks. Extensive attack evaluation is performed
Yerima (2013)	Static	Boolean vector	Mutual information	Bayesian Classification	Accuracy = 0.92 FPR = 0.63 TPR = 0.90 FNR = 0.94 AUC = 0.97	Off-device	None	The use of a Bayesian model makes integrating expert knowledge easier. AUC is provided, which allows for easier model comparison
Kim et al. (2019)	Static	Boolean vector, Similarity scores	Topological Data Analysis	Deep learning	Accuracy = 0.98 Recall = 0.99 Precision = 0.98 F = 0.99 Resilience to obfuscation attacks	Off-device	None	The great variety of static features used, together with the use of deep learning, can improve detection accuracy. A thorough investigation of resilience to different types of attacks is performed and reported
Sahs and Khan (2012)	Static	Boolean vectors, Graphs	None	SVM	TPR Precision Recall F-graph	Off-device	None	The novel use of SVM kernels to represent graphs and strings can be inspirational for future work. It can also improve accuracy and alleviate concept drift
Feng et al. (2018)	Dynamic	Boolean vector	Chi-square	Ensemble: Stacking of SVM, Decision Tree, Extra Trees, Random Forest, Boosted Tree	Accuracy = 0.97 Precision = 0.95 TPR = 0.97 FPR = 0.016 AUC = 0.97	Off-device	None	Provides novel insight into the use of ensembles for Android malware detection. A comparison of different ensembling approaches is also provided, showing advantage for stacking. Provides evidence for the unsuitability of kNN for Android malware detection
Zhu et al. (2018)	Static	Boolean vector, PCA	TF-IDF	Rotation Forest	Sensitivity = 0.88 Precision = 0.88 Accuracy = 0.88 AUC = 0.89	Off-device	None	Use of Rotation Forest for Android malware detection can improve accuracy over individual models. However, there might be a performance penalty
Zhang et al. (2018)	Static	Boolean vector	None	CNN	Precision = 0.96 Recall = 0.98 Accuracy = 0.97 F = 0.97 Run-time	Off-device	None	The use of a complex neural network architecture like CNN leads to improved accuracy and helps with zero-day malware detection
Yerima et al. (2015)	Static	Boolean vector	Manual	Random Forest	TPR = 0.97 TNR = 0.97 FPR = 0.02 Accuracy = 0.97 Error rate = 0.02 AUC = 0.99	Off-device	None	Use of ensembles can help with detection of zero-day malware. Features are extracted from both Manifest and DEX, increasing their diversity and potentially alleviating concept drift
Yerima et al. (2014)	Static	Boolean vector	Manual	Ensemble: Decision Tree, Logistic Regression (LR), Naive Bayes (NB)	TPR = 0.97 TNR = 0.97 FPR = 0.03 FNR = 0.02 Accuracy = 0.97 AUC = 0.95	Off-device	None	A thorough investigation of the effectiveness of different ensembling techniques is performed. Ensembling can also improve zero-day detection due to model diversity
Xu et al. (2018)	Static	Boolean and Bytecode vectors	None	MLP	Accuracy = 0.97 TPR = 0.97 FPR = 0.02 Run-time	Off-device	None	The two-layered detection design can improve performance without losing accuracy. Use of deep learning reduces the need for manual feature engineering. An investigation of the resilience of the model against different attacks is reported
Wu and Hung (2014)	Dynamic	Boolean vector, 2-grams	Manual	SVM	Accuracy = 0.86 F = 0.85 Recall = 0.82 Precision = 0.9 FPR = 0.1 FNR = 0.18	Off-device	None	Uses APE, a complex input generation scheme for dynamic analysis, as opposed to simplistic random models used by prior literature. This can improve code coverage and detection accuracy
Wang et al. (2016)	Static	Boolean vector	None	DBN	Precision = 0.93 Recall = 0.94 F = 0.93	Off-device	None	Use of deep learning can improve detection accuracy and eliminate the need for manual feature engineering
Karbab et al. (2018)	Static	Vector sequence of API calls	None	CNN	F = 0.96 Precision = 0.96 Recall = 0.96 FPR = 0.031 Family detection Concept drift Attack resilience Run-time	Hybrid (feature extraction on-device; training and detection off-device)	None	Provides a thorough requirement analysis for Android malware detection, which clearly lays out what is expected of such a system. This allows for better comparison of the different solutions proposed in the literature. Also, all API calls are considered for analysis, not just a subset as in prior work
Aafer et al. (2013)	Static	Boolean vector	Manual	Decision Tree	Accuracy ≈ 99 TPR ≈ 97 TNR ≈ 100 Run-time	Off-device	None	Provides a novel way of extracting API calls from DEX files. High run-time performance, which leads to increased practicality
Burguera et al. (2011)	Dynamic	Integer vector	Manual	K-Means	Detection rate = 0.85–1.0	Hybrid (feature extraction on-device; training and detection off-device)	None	Proposes an approach that compares execution traces of different versions of an app to detect re-packaged malware (e.g., Trojans)
Dini et al. (2012)	Dynamic	Integer vector	Manual	kNN	FPR = 0.001 Family detection Run-time	On-device	None	The approach makes novel use of on-device dynamic analysis for anomaly-based Android malware detection
Peiravian and Zhu (2013)	Static	Boolean vector	None	Ensemble: Bagging with SVM and Decision Tree	Accuracy = 0.96 Precision = 0.95 Recall = 0.94 AUC = 0.96	Off-device	None	Provides a comparison of permissions and API calls as features for malware detection. Ensemble learning can improve zero-day detection
Gascon et al. (2013)	Static	Graph (Integer vector)	None	SVM	FPR = 0.01 Detection rate = 0.89 ROC	Off-device	Using feature weights to explain predictions	Proposes a new way of labeling Dalvik functions for easier call graph generation. Makes novel use of kernels for embedding call graphs for digestion by ML models
Saracino et al. (2018)	Static & Dynamic	Integer vector	Manual	kNN	Accuracy = 0.96 FPR = 0.00001 Run-time Battery	On-device	None	Uses a combination of on-device dynamic ML-based detection and signature-based techniques to achieve higher accuracy. Also uses metadata from market listings as features
Sanz et al. (2013)	Static	Boolean vector	None	LR, NB, BayesNet, Decision Tree, Random Tree, Random Forest	TPR = 0.91 FPR = 0.19 AUC = 0.92 Accuracy = 0.86 ROC	Off-device	None	A pioneering work in the use of permissions for Android malware detection
Zarni Aung (2013)	Static	Boolean vector	Information gain	K-Means, Decision Tree, Random Forest, CART	TPR = 0.97 FPR = 0.15 Precision = 0.84 Recall = 0.97 ROC Area = 0.87	Off-device	None	Pioneering work in the use of clustering with permissions as features for Android malware detection. Makes novel use of hardware features to detect certain types of malware (e.g., those that use the camera or microphone for spying)
Yang et al. (2014)	Static	Boolean vector	Manual	NB, SVM, Decision Tree, Random Forest	Accuracy = 0.95 FPR = 0.4 Family detection Run-time	Off-device	None	Novel use of a behavioral graph for detecting “malicious behavior” in Android apps, as opposed to simply labeling them as malware or benign. Clustering is performed to detect malware families
Amos et al. (2013)	Dynamic	Boolean vector	None	Random Forest, NB, MLP, BayesNet, LR, Decision Tree	Accuracy = 0.91 TPR = 0.97 FPR = 0.31 Run-time performance	Hybrid (feature extraction on-device; training and detection off-device)	None	Proposes a distributed system for large-scale detection of Android malware. Dynamic analysis alleviates evasion by code obfuscation or dynamic loading
Lindorfer et al. (2015)	Static & Dynamic	Boolean vector	Fisher score	LR, SVM	Accuracy = 0.99 Recall = 0.98 Precision = 0.99 Commercial comparison Concept drift	Off-device	Using F-score to find the most discriminative features	Provides a malice score for apps to better communicate risk, as opposed to binary malware/benign labeling. Hybrid analysis allows for more accurate detection
Shabtai et al. (2014)	Dynamic	Integer and boolean vectors	Manual	LR, Decision Tree, SVM, Gaussian Regression, Isotonic Regression	TPR = 0.8 FPR = 0 Accuracy = 0.87 Run-time	Hybrid (feature extraction on-device; training and detection off-device)	None	Makes novel use of network traffic patterns of apps for malware detection. Reports on a thorough investigation of the CPU/RAM/Storage overhead of the proposed solution for on-device deployment
Suarez-Tangil et al. (2017)	Static	Boolean vectors	Mean decrease impurity	Extra Trees	Accuracy = 99.64% Family classification	Off-device	None	Novel use of a high variety of features to combat obfuscation. Novel use of feature ranking for dimensionality reduction
Bakour and Ünver (2021)	Static	Grayscale Image	Manual	Random Forest, Decision trees, kNN, Ensembles	Accuracy = 0.98	Off-device	None	Pioneering work in the use of image representation for Android malware detection. A great variety of image feature extraction techniques are used
Casolare et al. (2021)	Static	Color Image	Manual	Random Forest, SVM, MLP, CNN	Accuracy = 0.86 Precision = 0.86 Recall = 0.86	Off-device	None	Combines dynamic analysis with color image representation for Android malware detection
Cai et al. (2018)	Dynamic	Integer vector	Manual	Random Forest	Precision = 0.97 Recall = 0.99 F1 = 0.98 ROC curve AUC = 0.98 Family classification Concept drift	Off-device	None	Makes novel use of ICC Intents for dynamic detection of Android malware. Can handle reflection when detecting API and system calls. Evaluates concept drift
Taheri et al. (2020)	Static	Boolean vector	Manual	FNN, ANN, WANN, KMNN	Accuracy = 0.99 FPR = 0.005 AUC = 0.99	Off-device	None	Makes novel use of the Hamming distance of static binary features for detecting malware. Extensive comparison of the use of different features and ML algorithms
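
The Evaluation column mixes several rates (TPR, FPR, TNR, FNR, precision, accuracy, F) that all derive from the same four confusion-matrix counts. The following is a minimal sketch of the standard definitions, assuming malware is the positive class and F denotes the F1 score; the surveyed papers may define or average these metrics differently. The names `ConfusionCounts` and `detection_metrics` are hypothetical and do not come from the survey.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container; malware is assumed to be the positive class."""
    tp: int  # malware correctly flagged as malware
    fp: int  # benign apps wrongly flagged as malware
    tn: int  # benign apps correctly passed
    fn: int  # malware missed by the detector


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def detection_metrics(c: ConfusionCounts) -> dict:
    """Compute the rates that appear in the Evaluation column."""
    tpr = _ratio(c.tp, c.tp + c.fn)          # recall / sensitivity / detection rate
    fpr = _ratio(c.fp, c.fp + c.tn)
    precision = _ratio(c.tp, c.tp + c.fp)
    accuracy = _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)
    f1 = _ratio(2 * precision * tpr, precision + tpr)
    return {
        "TPR": tpr,
        "FNR": _ratio(c.fn, c.tp + c.fn),    # equals 1 - TPR
        "FPR": fpr,
        "TNR": _ratio(c.tn, c.fp + c.tn),    # equals 1 - FPR
        "Precision": precision,
        "Accuracy": accuracy,
        "F1": f1,
    }


if __name__ == "__main__":
    # Illustrative counts only; they are not taken from any surveyed paper.
    example = ConfusionCounts(tp=90, fp=5, tn=895, fn=10)
    for name, value in detection_metrics(example).items():
        print(f"{name} = {value:.3f}")
```

Under these definitions TPR + FNR = 1 and FPR + TNR = 1, which gives a quick consistency check when comparing figures reported on different scales (fractions versus percentages).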
