Over the past two decades, security has been a secondary consideration in computer system design, with priority given instead to performance, power, and cost (area). This subordination leads to potential security risks and weak defense mechanisms.
Existing architecture-level defenses usually result from passive responses to specific attacks
Several micro-architecture level defense mechanisms have been integrated into commercial chips. Here, we discuss three representative types of architecture-level defenses.
Memory overflow defense: Memory corruption bugs enable attackers to maliciously change a program’s behavior (van der Veen et al. 2012; Szekeres et al. 2013). Applications written in low-level languages like C/C++ are prone to these bugs due to the lack of memory safety. Since it is difficult to find all potential memory overflow bugs, an efficient way to enforce memory safety is to add extra hardware protection mechanisms. For example, Intel recently released a new ISA extension named Memory Protection Extensions (MPX) (Oleksenko et al. 2017). To facilitate memory safety checking, application developers can insert boundary record instructions to store the bounds of protected memory regions, such as arrays, at the places where these data structures are defined, and insert boundary check instructions to ensure that accesses to these data structures fall within their valid ranges. Unfortunately, MPX suffers from high performance overhead, which limits its adoption.
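The record-then-check discipline described above can be illustrated with a small sketch. The helper names mirror the MPX instruction mnemonics (BNDMK, BNDCL, BNDCU) but are purely illustrative Python, not the actual ISA semantics:

```python
# Illustrative sketch of MPX-style bounds checking (helper names are
# hypothetical stand-ins for the BNDMK/BNDCL/BNDCU instructions).

class BoundsViolation(Exception):
    """Raised when an access falls outside the recorded bounds."""

def bndmk(base, size):
    # BNDMK-style: record the lower and upper bound of a protected object
    return (base, base + size - 1)

def bndcl(bounds, addr):
    # BNDCL-style: check the address against the lower bound
    if addr < bounds[0]:
        raise BoundsViolation(f"address {addr:#x} below lower bound {bounds[0]:#x}")

def bndcu(bounds, addr):
    # BNDCU-style: check the address against the upper bound
    if addr > bounds[1]:
        raise BoundsViolation(f"address {addr:#x} above upper bound {bounds[1]:#x}")

# Compiler-inserted checks around accesses to a 16-byte array at 0x1000
bounds = bndmk(0x1000, 16)
bndcl(bounds, 0x100f); bndcu(bounds, 0x100f)   # last valid byte: passes
try:
    bndcu(bounds, 0x1010)                       # one past the end: overflow caught
except BoundsViolation as e:
    print("caught:", e)
```

In real MPX, the bounds live in dedicated bounds registers and tables, and the checks are emitted by the compiler around pointer dereferences; the per-access check cost is one source of the overhead noted above.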
Pointer integrity defense: The Pointer Authentication (PA) mechanism was added to ARMv8.3-A to prevent memory corruption attacks (Qualcomm Technologies 2017). PA guarantees the integrity of pointers by binding each pointer to a Pointer Authentication Code (PAC). Because the actual address space in 64-bit architectures is less than 64 bits, PA places the PAC in the unused upper bits of the pointer value to minimize the size and performance impact. PACs are computed by a lightweight cryptographic algorithm (Avanzi 2017) and added to pointer values by the PAC instructions; the integrity of a pointer is verified, and the original value restored, by the AUT instructions. To restrict pointer use to its proper context, PA provides separate keys for instruction pointers, data pointers, and generic authentication. The keys are still managed by software.
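The sign/authenticate cycle can be sketched as follows. This is a functional model only: HMAC-SHA256 stands in for the lightweight QARMA-based MAC, a 48-bit virtual address space is assumed, and the function names merely echo the PACIA/AUTIA mnemonics:

```python
import hmac, hashlib

# Functional sketch of ARM Pointer Authentication (assumptions: 48-bit
# addresses, HMAC-SHA256 as a stand-in for the real lightweight cipher).
VA_BITS = 48
ADDR_MASK = (1 << VA_BITS) - 1
PAC_BITS = 64 - VA_BITS
KEY = b"per-context-key"            # real keys live in system registers

def compute_pac(ptr, modifier):
    msg = (ptr & ADDR_MASK).to_bytes(8, "little") + modifier.to_bytes(8, "little")
    mac = int.from_bytes(hmac.new(KEY, msg, hashlib.sha256).digest()[:8], "little")
    return mac & ((1 << PAC_BITS) - 1)

def pacia(ptr, modifier):
    # PACIA-style sign: place the PAC in the unused upper pointer bits
    return (compute_pac(ptr, modifier) << VA_BITS) | (ptr & ADDR_MASK)

def autia(signed_ptr, modifier):
    # AUTIA-style authenticate: verify the PAC, then restore the raw pointer
    ptr = signed_ptr & ADDR_MASK
    if (signed_ptr >> VA_BITS) != compute_pac(ptr, modifier):
        raise ValueError("pointer authentication failure")
    return ptr

p = 0x7FFF_DEAD_BEEF
signed = pacia(p, modifier=0x42)
assert autia(signed, 0x42) == p      # valid PAC: original pointer restored
try:
    autia(signed, 0x43)              # wrong context modifier: rejected
except ValueError as e:
    print("rejected:", e)
```

The modifier models the context binding (e.g., the stack pointer when signing return addresses), so a pointer signed in one context fails authentication in another.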
Control-flow integrity defense: Control-flow integrity (CFI) (Abadi et al. 2005; Davi 2015; Burow et al. 2017) is considered a general and promising defense against code-reuse attacks (Shacham 2007; Bletsch et al. 2011; Carlini and Wagner 2014; Schuster et al. 2015). CFI restricts the control flow of an application program to valid execution traces, where validity is defined by the program’s predefined Control-Flow Graph (CFG). Forward-edge control flow covers transfers caused by indirect jumps and function calls; backward-edge control flow covers transfers caused by function return instructions. Intel offers an ISA-level CFI extension named Control-flow Enforcement Technology (CET) (Intel Corporation 2016), which protects forward-edge CFI by indirect branch tracking and ensures backward-edge CFI by hardware shadow stacks. However, CET suffers from two problems. One is the difficulty of defining a complete and precise legal CFG (Evans et al. 2015), because some information, such as the target of an indirect jump, is only available at runtime. The other is the limited size of the hardware shadow stack, which has to rely on the operating system (OS) to handle context switches and deeply nested function calls (Frantzen and Shuey 2001).
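The backward-edge protection can be sketched with a minimal shadow-stack model (names and structure are illustrative, not the CET hardware interface): every call pushes the return address onto both the regular stack and a protected shadow copy, and every return compares the two.

```python
# Minimal sketch of CET-style backward-edge CFI via a shadow stack.

class ControlFlowViolation(Exception):
    pass

regular_stack, shadow_stack = [], []

def call(return_addr):
    regular_stack.append(return_addr)
    shadow_stack.append(return_addr)   # hardware-protected duplicate

def ret():
    addr = regular_stack.pop()
    if addr != shadow_stack.pop():     # mismatch means the on-stack copy
        raise ControlFlowViolation("return address mismatch")
    return addr

call(0x401000)
call(0x401234)
assert ret() == 0x401234               # normal return: copies agree

# A stack-smashing attack overwrites the on-stack return address...
regular_stack[-1] = 0xDEADBEEF
try:
    ret()
except ControlFlowViolation:
    print("corrupted return address detected")
```

Because the attacker can corrupt only the regular stack, the mismatch is detected on return; the bounded depth of this protected structure is exactly the limitation noted above for deeply nested calls.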
In summary, a common characteristic of these three representative types of architecture-level defenses is that each seeks to protect the chip or system from a specific vulnerability. In other words, existing architecture-level defenses usually result from passively responding to specific attacks. As a consequence, such patch-like approaches are usually not generic and cannot handle attacks that exploit zero-day vulnerabilities.
Trusted computing alone is not secure
Trusted computing ensures that executable binaries are not tampered with (David et al. 2008; Trusted Computing 2008). The foundation of trusted computing is a dedicated chip, called a Trusted Platform Module (TPM), which measures the integrity of executable binaries by verifying their hash values. To implement this measurement idea, the TPM is designed as a coprocessor that helps the platform software verify itself using a predefined sequence, as well as a cryptographic engine that accelerates encryption, digital signatures, and hashing.
The TPM was initially used as the static root of trust for measurement (SRTM) (Trusted Computing Group 2003). SRTM uses the TPM to verify the integrity of the boot process. Once a chain of trust is established during boot, the boundary of trust can be extended to include more than one level of software within the system. To set the system into a clean state without rebooting, researchers proposed the dynamic root of trust for measurement (DRTM). Two DRTM implementations are as follows: Intel developed the Trusted Execution Technology (TXT) (Intel Corporation 2006) to securely launch software (such as a hypervisor or security kernel) at an arbitrary time, and AMD offers similar capabilities with its Secure Virtual Machine (SVM) extensions.
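The measurement chain underlying SRTM can be sketched with the TPM extend operation, PCR_new = H(PCR_old || H(component)): each boot stage is hashed into a Platform Configuration Register, so the final value commits to the entire boot sequence and its order. The stage names below are illustrative.

```python
import hashlib

# Sketch of TPM-style integrity measurement: the extend operation folds each
# component's hash into a Platform Configuration Register (PCR).

def extend(pcr, component):
    measurement = hashlib.sha256(component).digest()
    return hashlib.sha256(pcr + measurement).digest()

def measured_boot(stages):
    pcr = bytes(32)                  # PCRs reset to a known value at power-on
    for stage in stages:
        pcr = extend(pcr, stage)     # measure the stage before running it
    return pcr

golden = measured_boot([b"bootloader", b"kernel", b"init"])

# Tampering with any stage (or reordering stages) yields a different PCR,
# so the chain of trust detects the modification.
tampered = measured_boot([b"bootloader", b"evil-kernel", b"init"])
assert tampered != golden
print("boot measurement differs after tampering:", tampered != golden)
```

An attestation protocol would compare the reported PCR against the golden value (signed by the TPM) to decide whether the platform booted trusted software.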
However, security requirements go far beyond guaranteeing the integrity of executable binaries. Even if the executable binaries are unchanged, there are still security risks, as control-flow integrity (CFI) and data-flow integrity (DFI) can be violated by malware. Dedicated mechanisms implemented in processor chips could be very helpful in defending against such advanced attacks.
Logical isolation suffers from information leakage through physical side channels
ARM TrustZone (Wojtczuk and Rutkowska 2017) and Intel SGX (McKeen et al. 2013) are similar technologies that adopt logical isolation to provide a trusted execution environment for security-sensitive data and code. For performance reasons, secure and non-secure programs (or worlds) still share substantial physical hardware resources, including the on-chip cache hierarchy and TLBs. This sharing creates the risk of side-channel information leakage.
ARM TrustZone adds an NS (Non-Secure) bit to the memory system to divide the processor into two worlds: a normal world and an isolated secure world. The normal world cannot directly access the resources used by the secure world (ARM Limited 2009); the two worlds communicate with each other through a security monitor. To improve system performance, caches are not flushed during world switches. However, this allows cache lines from the secure world to be evicted by cache lines from the normal world and vice versa, and such evictions can be exploited as a cache side channel to leak security-sensitive information. By evicting secure cache lines from the shared cache using normal-world cache lines, a prime+probe attack (Zhang et al. 2016) was able to infer the full AES-128 secret key in 2.5 seconds from the normal-world kernel, or in 14 minutes from a user-space Android application. Allowing non-secure and secure cache lines to coexist in caches may also result in incoherent cache behavior, which can be exploited to install a rootkit that evades memory introspection mechanisms (Zhang et al. 2016).
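The prime+probe technique itself can be illustrated with a toy cache-set simulation (a 4-way, LRU-replaced set; real attacks infer hits and misses from access latency rather than from a model): the attacker fills a set, lets the victim run, then re-accesses its own lines and observes which were evicted.

```python
# Simulation sketch of prime+probe on one cache set (4-way, LRU replacement).

WAYS = 4

class CacheSet:
    def __init__(self):
        self.lines = []                 # most recently used at the end
    def access(self, tag):
        hit = tag in self.lines
        if hit:
            self.lines.remove(tag)
        elif len(self.lines) == WAYS:
            self.lines.pop(0)           # evict the least recently used line
        self.lines.append(tag)
        return hit

cache_set = CacheSet()

# Prime: the attacker fills every way of the set with its own lines
for tag in ["A0", "A1", "A2", "A3"]:
    cache_set.access(tag)

# Victim executes; whether it touches this set depends on its secret
cache_set.access("V0")                  # e.g., a key-dependent AES table lookup

# Probe: any miss on the attacker's own lines betrays the victim's access
misses = [t for t in ["A0", "A1", "A2", "A3"] if not cache_set.access(t)]
print("victim touched the set:", bool(misses))   # → victim touched the set: True
```

Repeating this over many sets and victim runs, and correlating which sets show evictions with the structure of the victim's lookup tables, is what recovers key material in attacks like the one above.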
Intel SGX is designed to increase the security of software through an “inverse sandbox” mechanism. Rather than attempting to identify and isolate all the malware on the platform, legitimate software can be sealed inside an enclave and protected from attacks by malware, irrespective of the latter’s privilege level. In other words, an application can create a secure enclave at the CPU level that is protected even from the OS it runs on. Data inside an enclave may be accessed only by code located in the same enclave. Moreover, the content of an enclave is encrypted when stored in memory, so even an attacker who can snoop the memory bus cannot obtain any useful information. Although the OS cannot directly access the memory region used by an enclave, to simplify deployment and improve performance, SGX still leaves the OS in charge of setting up the page tables used by enclaves. By observing an application’s page faults and page table attributes, a malicious OS can infer part of the application’s memory access pattern: by hooking the OS page fault handler, attackers can track the enclave’s access patterns at page granularity. Xu et al. (2015) implemented such a controlled-channel attack on SGX-enabled platforms to reveal the input-dependent control transfers and data accesses of the target program. A similar but stealthier attack can be launched using the access bit inside the page table entry (Wang et al. 2017). Leveraging contention in memory resources such as caches and TLBs, Wang et al. also implemented a traditional prime+probe attack that is even more fine-grained and powerful on SGX-enabled platforms.
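The controlled-channel idea can be reduced to a small model (names, addresses, and the page size are illustrative): the malicious OS unmaps the enclave's pages, so the first access to each page faults into the OS, and the recorded fault sequence reveals the enclave's page-granular, possibly secret-dependent, access path.

```python
# Simulation sketch of a controlled-channel observation: the OS records the
# page of every fault, reconstructing the enclave's page-level access trace.

PAGE_SIZE = 4096
fault_log = []          # pages observed by the malicious page-fault handler
mapped = set()

def enclave_access(vaddr):
    page = vaddr // PAGE_SIZE
    if page not in mapped:          # OS left the page unmapped on purpose
        fault_log.append(page)      # handler logs the faulting page...
        mapped.add(page)            # ...then maps it and resumes the enclave

# Secret-dependent control flow touches different code/data pages
secret_bit = 1
enclave_access(0x1000 if secret_bit else 0x5000)
enclave_access(0x9000)

print(fault_log)        # the page trace [1, 9] vs [5, 9] leaks secret_bit
```

The enclave's data never leaves the enclave, yet the fault sequence alone distinguishes the two branches, which is why page-granularity traces sufficed for the attacks cited above.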
Current architecture-level security subsystems are not really secure
Most current server chips include a subsystem dedicated to system management. The subsystem is basically a tiny computer-within-a-computer embedded directly in the chip alongside the host processor cores. Typical examples of such subsystems include the Intel Management Engine (ME) (Datenschutz and Pataky 2017; Bogowitz and Swinford 2004), the AMD Platform Security Processor (PSP) (Advanced Micro Devices 2018b; Wikimedia Foundation 2018), and the POWER On Chip Controller (OCC) (Sinharoy et al. 2015). The functions of these subsystems include managing the boot process, keeping the system within thermal limits, and operating according to user-specified modes and parameters. The subsystems can also be used to monitor the system for suspicious activities or events and to respond appropriately. As security subsystems usually contain a rich set of hardware-based cryptographic primitives, they perform security-related functions, such as secure boot, secure updates, secure debug, and secure communication, significantly faster than software-only solutions.
However, the current designs of these subsystems need improvement in terms of both performance and security. First, complicated logic is expected to run on these subsystems to detect and handle potentially malicious behaviors, including unknown attacks. These security inspection workloads can have strong hardware performance requirements, but popular designs use tiny embedded processor cores, such as ARM Cortex-M series processors. Hence, there is a performance gap between the required workloads and the actual computational power. Second, the security of such a subsystem is itself a concern: since the subsystem has access to everything, it could provide an attacker with a powerful backdoor if compromised. On the one hand, the subsystem consists of layers of quite complex software. For example, the Intel ME runs a hidden MINIX OS (Datenschutz and Pataky 2017; Bogowitz and Swinford 2004), and various software components run on top of it, such as the Intel Active Management Technology (AMT) (Bogowitz and Swinford 2004). This complexity gives an attacker a good chance of compromising the subsystem. On the other hand, the subsystem is implemented with embedded processors (Sinharoy et al. 2015; Datenschutz and Pataky 2017; Advanced Micro Devices 2018b) that usually lack common security hardening such as stack cookies, No-eXecute (NX) flags, or address space layout randomization (ASLR). Hence, there is also a security gap between the security requirements and the current security hardening. It has been reported that vulnerabilities in the Intel ME and AMD PSP allow attackers to gain administrative capabilities (Intel Security 2017; Advanced Micro Devices 2018a).