Table 2 Overview of static features extracted from Android APKs by the reviewed papers

From: On building machine learning pipelines for Android malware detection: a procedural survey of practices, challenges and opportunities

| Feature | APK part | Rationale | Tools | Used by |
| --- | --- | --- | --- | --- |
| Requested permissions | Manifest | Malware tends to request more permissions, and more dangerous ones. | APKTool (2021), aapt (Portal AD 2021b), androguard (Project A 2021) | Liu and Liu (2014), Arp et al. (2014), Yuan et al. (2014, 2016), Alzaylaee et al. (2020), Sahs and Khan (2012), Li et al. (2018), Wang et al. (2014), Wu et al. (2012), Demontis et al. (2019), Yerima (2013), Kim et al. (2019), Zhu et al. (2018), Zhang et al. (2018), Yerima et al. (2014, 2015), Wang et al. (2016), Aafer et al. (2013), Peiravian and Zhu (2013), Saracino et al. (2018), Sanz et al. (2013), Zarni Aung (2013), Lindorfer et al. (2015), Suarez-Tangil et al. (2017) |
| Used permissions | DEX | Not all requested permissions are actually used. Unused permissions introduce noise and should be eliminated. | APKTool (2021), PScout (Project P 2021b), baksmali (Project B 2021) | Liu and Liu (2014), Arp et al. (2014), Demontis et al. (2019), Lindorfer et al. (2015) |
| Hardware requirements | Manifest | Malware tends to request more sensitive hardware (e.g., the camera). | aapt (Portal AD 2021b) | Arp et al. (2014), Demontis et al. (2019), Sanz et al. (2013) |
| Names and types of app components | Manifest | To detect code reuse by malware (common services, broadcast receivers, or other app components). | | Arp et al. (2014), Wu et al. (2012), Demontis et al. (2019), Kim et al. (2019), Suarez-Tangil et al. (2017) |
| Filtered intents | Manifest | Malware tends to subscribe to sensitive system broadcasts, such as BOOT_COMPLETED. | | Arp et al. (2014), Wu et al. (2012), Demontis et al. (2019), Zhu et al. (2018), Zhang et al. (2018), Lindorfer et al. (2015), Suarez-Tangil et al. (2017) |
| API calls | DEX | Malware may call sensitive or suspicious APIs, such as those that access SMS. | baksmali (Project B 2021), soot (Project S 2021), androguard (Project A 2021), dexdump (Man Pages U 2021) | Arp et al. (2014), Yuan et al. (2016), Sahs and Khan (2012), Wu et al. (2012), Yuan et al. (2014), Demontis et al. (2019), Yerima (2013), Kim et al. (2019), Zhu et al. (2018), Zhang et al. (2018), Yerima et al. (2014, 2015), Karbab et al. (2018), Aafer et al. (2013), Peiravian and Zhu (2013), Gascon et al. (2013), Yang et al. (2014), Suarez-Tangil et al. (2017) |
| Network addresses | DEX | Malware may communicate with untrustworthy internet hosts. | | Arp et al. (2014), Demontis et al. (2019) |
| Opcodes | DEX, Shared libraries | Certain sequences of opcodes may reveal malicious intent in apps. | baksmali (Project B 2021), IDA Pro (Hex-rays 2021) | McLaughlin et al. (2017), Kim et al. (2019) |
| Bytecodes | DEX | Certain bytecode sequences may reveal malicious intent in apps. | | Grace et al. (2012), Xu et al. (2018), Bakour and Ünver (2021) |
| Decompiled Java code | DEX | Certain patterns of code may reveal malicious intent. | dex2jar (Project D 2021b), Procyon (Project P 2021a) | Milosevic et al. (2017), Wang et al. (2016) |
| Linux command strings | DEX & Resources | Malware may use dangerous commands to exploit the phone and gain privileged access. | | Yerima (2013), Yerima et al. (2014, 2015) |
| Use of encryption routines | DEX | Malware may use encryption to hide its intent. | | Yerima (2013), Lindorfer et al. (2015), Suarez-Tangil et al. (2017) |
| Presence of secondary APK or shell scripts | Assets | Malware may hide APK files that are installed after infection. Shell scripts might be used for exploitation. | | Yerima (2013), Lindorfer et al. (2015) |
| Environmental information | Manifest | Malware may target a specific vulnerable execution environment (e.g., Android version). | | Kim et al. (2019), Suarez-Tangil et al. (2017) |
| Constant strings | Resources | Malware may contain suspicious strings (e.g., fake ads). | APKTool (2021) | Kim et al. (2019), Zhang et al. (2018), Xu et al. (2018), Suarez-Tangil et al. (2017) |
| Use of Java reflection | DEX | Malware may use reflection to dynamically load code and thwart static analysis efforts. | | Lindorfer et al. (2015), Suarez-Tangil et al. (2017) |
| Signing certificate data | META-INF | The fingerprint, serial number, owner, or other data from the certificate may correspond to known malware authors. | | Lindorfer et al. (2015), Suarez-Tangil et al. (2017) |
| Presence of native executables or libraries | Lib | Malware often uses native code to perform exploits or make reverse-engineering harder. | | Lindorfer et al. (2015), Suarez-Tangil et al. (2017) |
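As an illustration of the manifest-based rows in Table 2 (requested permissions, hardware requirements, app components, filtered intents, and environmental information), the following minimal sketch reads these features from a manifest that has already been decoded to plain-text XML, for example by a tool such as APKTool. It is not taken from any of the reviewed papers: the function name `manifest_features`, the returned dictionary keys, and the sample manifest are all hypothetical and chosen only for demonstration.

```python
import xml.etree.ElementTree as ET

# Namespace used for android:* attributes in a decoded AndroidManifest.xml.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"
COMPONENT_TAGS = ("activity", "service", "receiver", "provider")

# Invented sample manifest, used only to demonstrate the function below.
SAMPLE_MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android" package="com.example.demo">
  <uses-sdk android:minSdkVersion="16"/>
  <uses-permission android:name="android.permission.SEND_SMS"/>
  <uses-permission android:name="android.permission.INTERNET"/>
  <uses-feature android:name="android.hardware.camera"/>
  <application>
    <receiver android:name=".BootReceiver">
      <intent-filter>
        <action android:name="android.intent.action.BOOT_COMPLETED"/>
      </intent-filter>
    </receiver>
    <service android:name=".SyncService"/>
  </application>
</manifest>"""


def manifest_features(manifest_xml: str) -> dict:
    """Collect manifest-level static features (hypothetical helper)."""
    root = ET.fromstring(manifest_xml)

    def names(path: str) -> list:
        # Sorted, de-duplicated android:name values of elements matching path.
        found = {el.get(ANDROID_NS + "name") for el in root.iterfind(path)}
        return sorted(name for name in found if name)

    sdk = root.find("uses-sdk")
    return {
        "requested_permissions": names("uses-permission"),
        "hardware_requirements": names("uses-feature"),
        "filtered_intents": names(".//intent-filter/action"),
        "components": {tag: names(f"application/{tag}") for tag in COMPONENT_TAGS},
        # Environmental information: minimum Android API level, if declared.
        "min_sdk": sdk.get(ANDROID_NS + "minSdkVersion") if sdk is not None else None,
    }


if __name__ == "__main__":
    for key, value in manifest_features(SAMPLE_MANIFEST).items():
        print(key, value)
```

In a detection pipeline, each extracted name would typically become one binary dimension of the feature vector. The sketch only handles text XML; the binary manifest inside an unprocessed APK must be decoded first.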