GPT Conjecture: Understanding the Trade-offs between Granularity, Performance and Timeliness in Control-Flow Integrity

Although a performance/security trade-off is widely noted in CFI research, we observe that not every CFI scheme is subject to it. Motivated by this key observation, we ask three questions. Although the three questions probably cannot be answered directly, they are inspiring. We find that a deeper understanding of the nature of the trade-off helps answer them. Accordingly, we propose the GPT conjecture to pinpoint the trade-off in designing CFI schemes: at most two out of three properties (fine granularity, acceptable performance, and preventive protection) can be achieved.


I. INTRODUCTION
As software grows more complex, it becomes harder for developers to ensure that their products execute correctly, especially products written in low-level programming languages such as C and C++. In the real world, a substantial share of incorrect execution is caused by the exploitation of software vulnerabilities. Software inevitably contains a wide variety of vulnerabilities, which opens a window for attackers to compromise the system. Attackers have developed a series of attack methods, such as shellcode injection [1], return-to-libc [2], and ROP [3], to exploit vulnerabilities such as buffer overflows, format string bugs, and use-after-free errors [4]. Among these attacks, control-flow hijacking is the most dangerous, because it allows the attacker to control the program's execution, execute arbitrary malicious code, and attain Turing-complete operation [3]. To mitigate these threats, researchers have put forward many defense mechanisms, such as the stack smashing protector (SSP) [5], address space layout randomization (ASLR) [6], and data execution prevention (DEP) [7], which have been applied in real-world software products.
Among these defense techniques, schemes based on control-flow integrity (CFI) have attracted much attention because of their simplicity of implementation, their effectiveness against the full spectrum of control-flow hijacking attacks, and their flexibility in trading security against efficiency. CFI schemes enforce the correctness of a program's control flow by dynamically checking each control-flow transfer and confining the target address to a legal set.
Since CFI was introduced by Abadi et al. in 2005 [8], many researchers have worked to improve its runtime performance, security, scalability, compatibility, and so on. According to the mainstream taxonomy, most CFI schemes can be classified into two categories: fine-grained CFI schemes, which provide stronger security guarantees, and coarse-grained CFI schemes, which attain higher runtime performance. However, both categories have noticeable limitations that have not yet been addressed. As shown in a previous survey [9], lightweight CFI schemes cannot fully prevent sophisticated code-reuse attacks. The adversary's strategy is to search for chains of large gadgets whose starting addresses are allowed by the imprecise control-flow graph that coarse-grained CFI schemes adopt [10], [11]. Precise CFI schemes, in turn, usually suffer from unacceptable runtime overhead. Hence, it is widely believed that a "performance/security trade-off" exists between runtime overhead and security across CFI schemes [9], [12].
However, we observe that not every CFI scheme is subject to the trade-off between performance and security. In fact, several CFI schemes appear to be "immune" to it. For instance, πCFI, designed by Niu et al., achieves fine-grained security with an average runtime overhead of 3.2%, which is fairly low and acceptable [13]. Van der Veen et al. proposed a context-sensitive CFI scheme that achieves stronger security than conventional fine-grained schemes with an overhead lower than that of some coarse-grained ones [14].
Key Observation. The trade-off between performance and security does not universally exist in meaningful CFI schemes. This intriguing observation motivates us to ask three questions: ➊ Does a trade-off really exist across different CFI schemes? ➋ If a trade-off does exist, how do previous works comply with it? ➌ How can it inspire future research?
Although these questions probably cannot be answered directly, they are inspiring. We find that a deeper understanding of the nature of the trade-off helps answer them. Accordingly, we propose the GPT conjecture to pinpoint the general trade-offs in CFI schemes: the impossibility of guaranteeing both fine granularity and acceptable performance in a Just-In-Time CFI scheme. We assess its plausibility through an empirical study, in which we survey a series of representative CFI schemes and show how they comply with our conjecture. Finally, we give some recommendations for future researchers. We believe that our conjecture will give researchers a clearer understanding of the internal relations among the properties of CFI schemes, thereby motivating future research in this area.

II. BACKGROUND
When source code written in a low-level language (such as C or C++) is compiled into machine code, the compiler emits control data [15] (data that are loaded into the processor's program counter at some point during execution, e.g., return addresses and function pointers) into the binary without any protection. The security of control data depends on checks inserted by the programmer to enforce memory safety [16]. During execution, an attacker who tampers with control data through a software vulnerability, such as a buffer overflow, can redirect the program's control flow to any executable address in the process's address space.
Based on this observation, researchers invented CFI to protect programs against control-flow hijacking attacks by checking control data before it is loaded into the program counter (the EIP/RIP register on x86/x64). CFI's strategy is to restrict the control flow of a program to a pre-computed CFG by checking indirect control-flow transfers at runtime [9]. Most CFI schemes follow a common design that consists of two phases.
In phase one, an analyzer statically computes the program's control-flow graph (CFG). A CFG is a graph representation of all legitimate control-flow transfers (also called branches) in the program. It consists of nodes and directed edges: each node denotes a basic block, and each edge denotes a valid branch. For a comprehensive treatment, we refer the reader to the formal definition of the CFG by Allen et al. [17].
In phase two, a runtime control-flow checking (validation) component validates the just-fetched control data before each indirect branch, according to the legitimate CFG generated in phase one. An indirect branch passes the check only if it matches an edge in the CFG. A failed validation causes the process to terminate and report an error. In this fashion, control-flow attacks, which usually introduce branches outside the CFG, are largely blocked. Researchers need to design efficient data structures to represent the CFG and enable runtime checking.
Despite its straightforward main idea, designing a CFI scheme with strong security, acceptable performance, and high compatibility is challenging [9], [12]. Researchers have designed hundreds of CFI schemes to explore its potential from different perspectives. The main differences among these schemes fall into three aspects: 1) the precision of the CFG they employ; 2) the algorithm they use to check indirect branches; and 3) the point in time at which the checking algorithm is activated.
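The two phases can be sketched in a few lines. This is only a minimal illustration, assuming the CFG is available as a mapping from each indirect-branch site to its set of legal targets; the names `CFG`, `CFIViolation`, `check_branch`, and all addresses are hypothetical and are not taken from any particular CFI scheme.

```python
# Phase one output (hypothetical): branch site -> set of legal target addresses.
CFG = {
    0x401000: {0x401100, 0x401200},
    0x401050: {0x401300},
}


class CFIViolation(Exception):
    """Raised when an indirect branch has no matching edge in the CFG."""


def check_branch(cfg, site, target):
    """Phase two: validate a just-fetched target before the branch is taken."""
    if target not in cfg.get(site, set()):
        # A real scheme would terminate the process and report an error here.
        raise CFIViolation(f"illegal transfer {site:#x} -> {target:#x}")
    return target
```

In this sketch, a target that is legal for one site (such as `0x401300`) is still rejected at another site whose edge set does not contain it.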

1) Precision of CFG Analyzer: A CFG can be obtained by analyzing the program's source code or binary code. As with pointer analysis [18], a perfectly precise CFG cannot yet be generated in many situations [10]. Researchers have so far adopted several types of analysis (insensitive, context-sensitive, and path-sensitive) in their CFG analyzers, achieving different precisions. It is widely agreed that path-sensitive analysis is more precise than context-sensitive analysis, which in turn is more precise than insensitive analysis [19].
2) Algorithm to Enforce Checking: The efficiency of a CFI scheme depends largely on its validation algorithm, which is tightly coupled to the data structure that represents the CFG and enables runtime checking. Researchers have designed different algorithms and data structures for different CFI schemes. For example, the original CFI scheme proposed by Abadi et al. groups branch targets into sets, assigns each set a label, and inlines the label at each jump target, i.e., at the target basic block in the code. Based on this data structure, "guard instructions" are emitted before each indirect-branch instruction to compare its expected label with the one in the target basic block [8]. A mismatch indicates that the control data is corrupted, and the program's execution is then redirected to error-handling code.
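The label comparison described above can be modeled roughly as follows. This is a simplified sketch rather than the instrumentation of Abadi et al.: the real scheme embeds labels in machine code, whereas here two hypothetical dictionaries (`TARGET_LABELS`, `SITE_LABELS`) and a hypothetical `guard` function stand in for the inlined labels and the guard instructions.

```python
# Hypothetical labels: all targets in the same set share one label.
TARGET_LABELS = {"handler_a": 0x11, "handler_b": 0x11, "parser": 0x22}

# Each indirect-branch site expects exactly one label.
SITE_LABELS = {"dispatch_site": 0x11, "parse_site": 0x22}


def guard(site, target):
    """Emulate the guard emitted before an indirect branch.

    Returns True when the label found at the target matches the label
    expected by the branch site, and False otherwise (including for
    targets that carry no label at all).
    """
    return TARGET_LABELS.get(target) == SITE_LABELS[site]
```

Note that `guard("dispatch_site", "handler_b")` passes even if the site only ever calls `handler_a` in practice: every target that shares a label is accepted, which is where label-based designs can lose precision.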

3) Just-In-Time Checking vs. Lazy Checking: Another difference among CFI schemes is how they schedule their checking operations. Most CFI schemes check the target address before the indirect branch occurs (we call this Just-In-Time checking). To achieve better performance, however, some works log each indirect branch at runtime and check the log in an accompanying thread [14], [21], [22], [23] (we call this lazy checking). For example, PITTYPAT [22] enforces path-sensitive CFI by maintaining a "shadow" execution/analyzer that runs concurrently with the protected process and checks its completed indirect branches. Such non-intrusive checking does not disturb the normal execution of the monitored process and hence achieves path-sensitive CFI with practical runtime overhead.
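The difference between the two scheduling strategies can be illustrated with a small simulation. This is a sketch under simplifying assumptions: branch targets are plain strings, "executing" a target means appending it to a list, and lazy checking validates the log every `batch` branches instead of running in a separate thread. The function names and the `batch` parameter are hypothetical.

```python
def run_jit(trace, allowed, executed):
    """Check each target before it executes; stop at the first violation."""
    for target in trace:
        if target not in allowed:
            return "blocked"
        executed.append(target)  # runs only after passing the check
    return "ok"


def run_lazy(trace, allowed, executed, batch=3):
    """Execute first, log each target, and validate the log in batches."""
    log = []
    for target in trace:
        executed.append(target)  # the branch executes unchecked
        log.append(target)
        if len(log) == batch:
            if any(t not in allowed for t in log):
                return "detected"
            log.clear()
    # Validate whatever is left in the log at the end of the trace.
    return "detected" if any(t not in allowed for t in log) else "ok"
```

With the trace `["a", "b", "evil", "c"]` and `allowed = {"a", "b", "c"}`, the Just-In-Time variant stops before `evil` runs, while the lazy variant reports the violation only after `evil` has already executed.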

III. CONJECTURE
This section aims to answer Question ➊ and Question ➋. We observe that some terms, such as coarse-grained and fine-grained, have not been clearly defined. Before introducing the GPT conjecture, we therefore give more precise definitions of the terms and concepts used throughout the paper. We then propose the GPT conjecture, which helps to answer Question ➊. Finally, we collect evidence from an empirical study to answer Question ➋.

Property 1. (Granularity)
Suppose a program has n indirect branch instructions. Let Z_i denote the set of valid successors (basic blocks) of the i-th indirect branch instruction, and let S denote the set of all successor sets, namely, S = {Z_1, Z_2, ..., Z_n}. For a CFI scheme, let C_i denote the checking set that the scheme defines and assigns to the i-th indirect branch instruction and then uses to check the branch's target at runtime. Only the elements of C_i are valid successors, authorized by the CFI scheme, to which the i-th branch instruction may jump.
Definition 1. For any two sets Z_i, Z_j from S satisfying Z_i ∩ Z_j = ∅ ∨ Z_i = Z_j, if the CFI scheme merges Z_i and Z_j when defining its C_i or C_j (that is, C_i or C_j contains Z_i ∪ Z_j), we define the scheme as a coarse-grained CFI scheme. Otherwise, we define it as a fine-grained CFI scheme. This definition enables us to determine the granularity property of CFI schemes.
REMARK 1. According to Definition 1, both context-sensitive and path-sensitive CFI schemes are fine-grained CFI schemes. In essence, they reduce the size of their checking sets C_i for i ∈ [1, n] based on context-sensitive or path-sensitive pointer analysis. Their protection is generally considered to be more powerful than that of insensitive fine-grained CFI schemes.
REMARK 2. Note that CFI schemes that adopt a pointer encryption approach [24], [25] should be classified as coarse-grained CFI schemes. They cannot fully prevent code-reuse attacks because of two noticeable drawbacks. First, as discussed in Cryptographically Enforced Control Flow Integrity (CCFI) [24], it is still possible to replace the current encrypted pointer with another one from the program space and potentially disrupt control flow. Second, these schemes suffer from key leakage: the key can be inferred by a brute-force attack or a known-plaintext attack [26], especially in schemes that adopt a linear encryption/decryption method (XOR) [25].
REMARK 3. We remark that schemes providing only partial protection, i.e., protecting only a subset of the indirect branches in the program, are coarse-grained CFI schemes. For instance, vfGuard [27], VTV [28], and SAFEDISPATCH [29] achieve strict protection only for virtual function calls in COTS binaries.
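One possible reading of Definition 1 can be expressed as a small predicate. The sketch below is an interpretation, not the authors' formalization: it assumes that "merging" means the checking set of one branch contains the union of two disjoint, non-empty successor sets, and the function name `is_coarse_grained` and the list-of-sets encoding are hypothetical.

```python
from itertools import permutations


def is_coarse_grained(Z, C):
    """Z[i]: valid successors of branch i; C[i]: checking set of branch i.

    Under the assumed reading, a scheme is coarse-grained if some checking
    set C[i] contains the union of two disjoint, non-empty successor sets
    Z[i] and Z[j].
    """
    for i, j in permutations(range(len(Z)), 2):
        disjoint = bool(Z[i]) and bool(Z[j]) and Z[i].isdisjoint(Z[j])
        if disjoint and (Z[i] | Z[j]) <= C[i]:
            return True
    return False
```

For example, with `Z = [{"a"}, {"b"}]`, the checking sets `[{"a"}, {"b"}]` are classified as fine-grained, whereas `[{"a", "b"}, {"a", "b"}]` are classified as coarse-grained. A checking set that is a strict subset of Z_i, as in context-sensitive schemes, is also classified as fine-grained.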

Property 2. (Performance)
Evidence 1. As discussed in many papers [4], [9], [30], runtime performance is one of the most important factors in whether a defense technique is adopted by industry. Generally, to be adopted, a defense technique should introduce less than 5% average overhead, as StackGuard, ASLR, and DEP do. Techniques incurring an overhead larger than 10% tend not to gain wide adoption in production environments. Accordingly, the threshold should lie between 5% and 10%.

Evidence 2. Besides runtime performance, space performance is another important measure of a scheme. A program's runtime memory consumption consists of four aspects: code, global data, heap, and stack. Different programs have different ratios among the four aspects, and a defense technique commonly increases memory consumption in one or more of them. We observe that shadow-based protections, such as shadow stacks [31], shadow memory [32], and shadow processing [33], which double memory consumption in one or more aspects, are unlikely to be deployed in practice.

Definition 2. Conservatively, we define acceptable performance as a runtime overhead of less than 10% together with a space overhead of less than 100% (in each of the four aforementioned aspects). Anything else is unacceptable performance. This definition enables us to determine the performance property of CFI schemes.
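Definition 2 translates directly into a predicate. The sketch below assumes overheads are given as fractions (0.10 for 10%) and that space overhead is reported per aspect; the names `acceptable_performance`, `RUNTIME_LIMIT`, and `SPACE_LIMIT` are hypothetical.

```python
RUNTIME_LIMIT = 0.10  # runtime overhead must be below 10%
SPACE_LIMIT = 1.00    # space overhead must be below 100% in every aspect


def acceptable_performance(runtime_overhead, space_overhead):
    """Apply the two thresholds of Definition 2.

    space_overhead maps each aspect (code, global data, heap, stack)
    to its overhead as a fraction.
    """
    return runtime_overhead < RUNTIME_LIMIT and all(
        value < SPACE_LIMIT for value in space_overhead.values()
    )
```

Under this predicate, a scheme with 3% runtime overhead that doubles stack memory (`{"stack": 1.0}`) is still classified as having unacceptable performance.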

Property 3. (Timeliness)
Observation 1. Whereas the term "integrity" in the context of CFI implies that it can prevent attacks [8], some CFI schemes do not hit the mark. To achieve higher efficiency, some CFI schemes, as mentioned in Section II-3, adopt a lazy checking mechanism, which checks the program's control flow after execution rather than before each indirect branch. Generally, they log the program's runtime control-flow transfers as it executes and then check the control flow offline or in an accompanying thread. In these designs, a sliding window exists between a control-flow transfer and its check. Within this window, the attacker can compromise the system without being noticed, which means that this kind of CFI cannot protect software against such attacks.
Definition 3. We consider that lazy-checking CFI schemes provide less protection than CFI schemes that perform Just-In-Time checking. We define the protection provided by lazy checking as detective protection and the protection provided by Just-In-Time checking as preventive protection. This definition enables us to determine the timeliness property of CFI schemes.

GPT Conjecture:
A control-flow integrity scheme can have at most two out of three properties:
P1. Fine granularity
P2. Acceptable performance
P3. Preventive protection
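The conjecture can be stated as a simple check over the three properties. The sketch below is illustrative only: the `Scheme` record and the `complies` function are hypothetical names, the GRIFFIN entry follows the classification given later in the text (fine-grained, 11.9% overhead, detective protection), and the "ideal" entry is an invented scheme that would contradict the conjecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scheme:
    name: str
    fine_grained: bool     # P1
    acceptable_perf: bool  # P2
    preventive: bool       # P3


def complies(scheme):
    """True if the scheme has at most two of the three properties."""
    properties = (scheme.fine_grained, scheme.acceptable_perf, scheme.preventive)
    return sum(properties) <= 2


griffin = Scheme("GRIFFIN", fine_grained=True, acceptable_perf=False, preventive=False)
ideal = Scheme("ideal (hypothetical)", True, True, True)
```

Here `complies(griffin)` holds, while `complies(ideal)` does not; finding a real scheme like `ideal` would refute the conjecture.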

C. Some Evidence of the GPT Conjecture
In this section, we reflect on our conjecture through several pieces of evidence. To assess the plausibility of our conjecture, we conduct an empirical study of 32 representative works and show the results in Table I. The three columns P1, P2, and P3 in the table correspond to the three properties defined in Section III-A. The P1 column denotes granularity: a check mark indicates a fine-grained scheme, whereas a cross mark indicates a coarse-grained scheme. The P2 column shows the performance overheads reported in the corresponding papers. Note that we prefer evaluation results based on the SPEC CPU2006 benchmarks [53]. The P3 column indicates whether a CFI scheme provides preventive protection. We mark the data in a column in red when it fails to meet the requirement defined in the conjecture.
Notes to Table I: (1) If a CFI scheme supports different security levels, e.g., it has both coarse-grained and fine-grained versions, we focus on its most secure version. (2) 'H', 'P', and 'C' denote hardware-assisted, path-sensitive, and context-sensitive CFI schemes, respectively.
Evidence i. Table I clearly shows that all CFI schemes we surveyed comply with our conjecture: no CFI scheme achieves all three properties. Moreover, some schemes, such as PITTYPAT [22] and GRIFFIN [23], achieve only one property, namely fine granularity.
Evidence ii. MCFI [20] and πCFI [13], developed by Niu et al., achieve fine granularity with acceptable runtime overheads, i.e., 5.0% and 3.2%, respectively. However, it has gone largely unnoticed that their better runtime overhead is achieved by sacrificing space performance. Although the authors did not report space overhead explicitly in their papers, we can infer it in a reasonable manner.
As discussed in Section II-2, both schemes use two tables, namely Bary and Tary, to support their runtime checking. Accordingly, 1GB of memory space on x86-32 and 4GB on x86-64 need to be reserved in each process for the tables. As stated by the authors, "On x86-32, memory segmentation is used, as in NaCl [54]. A 1GB segment is reserved for running the application code and another 1GB segment is reserved for the table region. x86-64, however, does not support memory segmentation. Instead, memory writes are instrumented so that they are restricted to the [0, 4GB) memory region. Another 4GB memory region is reserved for tables." Given the memory consumption of typical programs (mostly less than 1GB [53]), their space overhead already reaches 100%, even without counting the code bloat caused by the extra no-op instructions inserted to enforce four-byte alignment of indirect-branch targets.
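The inference can be made concrete with a short calculation. This is a sketch under the text's own assumptions: it counts the reserved table region as memory consumption and takes 1GB as an upper bound on a typical program's footprint, so the resulting ratios are lower bounds on the overhead. The helper name `table_space_overhead` is hypothetical.

```python
GB = 1 << 30


def table_space_overhead(table_bytes, program_bytes):
    """Space overhead of the reserved table region, as a fraction."""
    return table_bytes / program_bytes


# Assumption: the program's footprint is at most 1GB, so dividing by 1GB
# yields the smallest overhead consistent with the text.
overhead_x86_32 = table_space_overhead(1 * GB, 1 * GB)  # 1GB table region
overhead_x86_64 = table_space_overhead(4 * GB, 1 * GB)  # 4GB table region
```

Both values are at least 1.0 (100%), so neither configuration stays below the 100% space threshold of Definition 2; smaller programs would only make the ratio larger.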
Evidence iii. GRIFFIN [23] is a hardware-assisted CFI scheme that leverages Intel PT to record the control flow of a monitored program. It supports multiple CFI policies to enable flexible trade-offs between security and performance. Its fine-grained policy incurs an average overhead of 11.9%. It leverages idle cores on a multi-core system for security checking, using multiple worker threads to check the runtime control flow in parallel. Most of the time, it performs non-blocking checking, which analyzes the Intel PT trace buffer whenever it becomes full; in the few cases where security-sensitive system calls are invoked, it performs blocking checking, which stops the target thread until all control transfers in the buffer have been checked. According to Definition 3, it can therefore provide only detective protection. This case indicates that the GPT conjecture also applies to hardware-assisted CFI schemes.
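The combination of non-blocking and blocking checks can be mimicked with a toy monitor. This is a single-threaded simulation and not GRIFFIN's implementation: the class `TraceMonitor`, its methods, and the `capacity` parameter are hypothetical, and the buffer stands in for the Intel PT trace buffer.

```python
class TraceMonitor:
    """Toy model of buffered (lazy) control-flow checking."""

    def __init__(self, allowed, capacity):
        self.allowed = allowed
        self.capacity = capacity
        self.buffer = []
        self.violations = []

    def _drain(self):
        # Check every logged transfer, then empty the buffer.
        self.violations += [t for t in self.buffer if t not in self.allowed]
        self.buffer.clear()

    def on_branch(self, target):
        # The branch has already executed; it is only logged here.
        self.buffer.append(target)
        if len(self.buffer) >= self.capacity:
            self._drain()  # in a real system, worker threads would do this

    def on_sensitive_syscall(self):
        # Blocking check: the target thread waits until the buffer is checked.
        self._drain()
        return not self.violations
```

In this model, an illegal transfer logged into a half-full buffer stays unnoticed until the buffer fills or a sensitive system call forces a blocking check, which is the window that makes the protection detective rather than preventive.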
Evidence iv. PITTYPAT [22], µCFI [21], and PathArmor [14] are path- or context-sensitive CFI schemes, which use path-sensitive or context-sensitive analysis to generate their CFGs. However, such analysis is generally considered more time-consuming and space-consuming than insensitive analysis [19]. We find that all three schemes share two features: hardware-assisted branch recording and lazy checking. Specifically, PITTYPAT and µCFI employ Intel PT, a recent hardware feature of Intel CPUs, to efficiently record the conditional and indirect branches taken by a program at runtime, while PathArmor uses the Last Branch Record (LBR) registers available in Intel processors to efficiently monitor recently exercised control-flow transfers. Their control-flow checking is performed by accompanying threads. This case indicates that both path-sensitive and context-sensitive CFI schemes conform to the GPT conjecture.
REMARK 4. Our observations indicate that the GPT conjecture is applicable across all the kinds of CFI schemes we surveyed. Further, these four pieces of evidence are not meant to be exhaustive, and more evidence is easy to find.

IV. IMPLICATIONS OF THE GPT CONJECTURE
In this section, we focus on answering Question ➌: how can the GPT conjecture inspire future research?
First of all, the GPT conjecture illustrates the inherent trade-offs among three important properties (fine granularity, acceptable performance, and preventive protection) in CFI schemes. It helps researchers gain a deeper understanding of the nature of CFI-based protection. Accordingly, future researchers should decide which property to sacrifice before designing a new CFI scheme. In a broader context, the GPT conjecture provides insights into the feasible design space for CFI schemes and sheds some light on the ways in which algorithm designers and software engineers have circumvented the conjecture.
Second, for more than a decade, security researchers have focused on the runtime performance of CFI schemes and have made their best effort to improve it. Evidence ii shows that, in some cases, better runtime performance is achieved by sacrificing space performance. As Woeginger states, "For some problems, we can reach an improved time complexity, but it seems that we have to pay for this with an exponential space complexity" [55]. Therefore, performance evaluation in future research should not be limited to runtime performance, and researchers should evaluate their schemes more comprehensively.
Third, Evidence iii shows that even powerful hardware support cannot reduce the runtime overhead of Just-In-Time CFI schemes to an acceptable level, which implies that the challenge of implementing CFI cannot be solved through engineering effort alone; instead, it may be related to computational complexity theory [56]. More broadly, we observe that indirect branching poses challenges not only in the security field but also in many others: precise pointer analysis is NP-hard [57], and indirect branch prediction is a performance-limiting factor for current computer systems [58]. Hence, the GPT conjecture hints at the inherent complexity of the CFI problem, which deserves to be investigated through theoretical methods.
Finally, despite the inspiring implications of the GPT conjecture, we admit that we cannot yet prove it.

V. CONCLUSION
Control-flow integrity is a popular defense technique for detecting and defeating control-flow hijacking attacks. Since its inception more than a decade ago, researchers have put great effort into exploring its potential regarding security, performance, compatibility, and so on. Even though a performance/security trade-off is widely noted in CFI research, we observe that not every CFI scheme is subject to it. In this paper, we propose the GPT conjecture to illustrate the general trade-offs in CFI schemes. The conjecture points out the impossibility of guaranteeing both fine granularity and acceptable performance in a Just-In-Time CFI scheme. We have assessed the plausibility of our conjecture through an empirical study of existing works. Even though we cannot prove the conjecture at this time, we believe that it will help researchers gain a deeper understanding of the nature of the CFI problem and will guide future research in this area.