Workflow of IM-Visor
As a pre-IME design, IM-Visor always recognizes and isolates sensitive keystrokes before the IMEs could access them. To achieve this, whenever a user intends to type in a soft keyboard, the STIE will be initialized to intercept touch events and analyze whether it is a sensitive keystroke. From the perspective of how touch events (i.e., keystrokes or non-keystrokes) are handled, Fig. 8 shows the workflow of IM-Visor after the STIE has been initialized. The red data path indicates the trusted path from touch screen to a user app. On the other hand, as shown in green color, when non-sensitive touch events (i.e., non-sensitive keystrokes or non-keystrokes) are found, the pre-IME Guard asks the Replay Executor to replay the corresponding touch event to the targeted apps (e.g., IME apps or other apps).
Address challenge 1: isolation ahead of IME translation
At first, let’s recall some backgrounds about the IMF and event subsystem in “Android IME, Input Method Framework (IMF) and event subsystem” section. A keystroke would be preprocessed by event subsystem before any IME app could access it. Specifically, TouchInputMapper in event subsystem is the class for touch event processing. InputMethodManagerService (IMMS) in the IMF is a global system service that manages the interaction across IME apps and user apps. Anytime a user app requests a soft keyboard, IMMS would ask an IME app to show a soft keyboard by calling showSoftInput.
STIE initialization
The primary technical challenge of the STIE initialization is guaranteeing that IM-Visor is always aware of when a user is typing in a soft keyboard prior to the execution of any IME app code. If IM-Visor can create the STIE as soon as a user firstly puts his or her finger on a soft keyboard in normal world, then the pre-IME Guard is able to intercept user keystrokes from the start of input and the pre-IME nature can be ensured. To address this challenge, the key idea is to check whether a soft keyboard has been shown up each time a touch event arrives in event subsystem. In modern mobile devices with a touch screen, we assume that a user intends to type text when he or she taps on touch screen after a soft keyboard has been shown up. And the keyboard display information is maintained in secure world.
Figure 9 shows how we initialize the STIE. The first user tap on the edit box of a user app will ask IMMS to start up an IME app. This process in fact invokes two hooks: sync and showSoftInput. IM-Visor will ignore the touch but update keyboard display information in secure world. At this moment, the STIE has not been initialized yet. Then the user may taps on a soft keyboard. This behaviour of course invokes sync again. At this moment, the STIE must be initialized, because tapping on an IME soft keyboard is obviously a keystroke. We reconfigure peripherals like display controller and touch screen as secure. Then the pre-IME Guard receives touch events directly through secure touch screen.
Touch event processing and keystroke translation
In order to intercept user keystrokes in secure world, the touch screen is reconfigured to be only accessed by secure world, and a separate touch screen driver is implemented in secure world. As a result, anytime a touch interrupt arrives, the driver will be the first to receive user touch coordinates. In order to figure out which keystroke a user types, we need two pieces of information: touch coordinates and the current soft keyboard layout. The touch screen driver in secure world provides a secure way to obtain touch coordinates. Now we explain how the pre-IME Guard gets the soft keyboard layout securely. The soft keyboard layout is a piece of display data in normal world that an IME app puts in framebuffer. And framebuffer is a region of memory which is allocated by a Linux display driver. The display controller is a peripheral to generate the necessary control signals for data display. To obtain the soft keyboard layout on which a user is typing, the pre-IME Guard takes two steps. Step 1) The display controller is reconfigured as secure by TrustZone TZPC so that normal world cannot change it. This is important because display controller provides information about the start region of framebuffer. If the display controller is not controlled by secure world, normal world software can deceive the pre-IME Guard into a wrong framebuffer and translated in a wrong soft keyboard layout. Step 2) After a touch event happened, the pre-IME Guard reads framebuffer and checks correctness of the layout. As a proof-of-concept prototype, IM-Visor preloads the layout information of popular IME apps and determines whether the layout is correct by comparing the hash of the current layout with the preloaded standard one. As a future work, in step 2, instead of the “preload&check” way, we will obtain the current layout by an efficient optical character recognition (OCR).
Leaving framebuffer in normal world is not a secure concern as it seems. Supposing an untrusted IME app intends to figure out which keystroke a user types, it also needs the above two pieces of information. But sensitive touch coordinates only stay in secure world. So an IME app cannot succeed in finding sensitive keystrokes without touch coordinates.
After identifying a keystroke, the pre-IME Guard will translate it into a character. Now we give an example of keystroke translation. For Latin language, every keystroke can directly correspond to a character, but for non-Latin languages candidate words often need to be shown. Here, we only discuss Latin language translation with a qwerty keyboard. Supposing the user types “a” in the soft keyboard, then secure touch screen gets the touch point C(x, y). The preloaded keyboard layout helps the pre-IME Guard determine whether this point falls in the geo range of “a” key button, which is defined by the top-left point A(x1, y1) and the bottom-right point B(x2, y2). If it falls in, the pre-IME Guard translate the keystroke as a character “a”.
Sensitive keystroke analysis
In order to analyze whether keystrokes are sensitive, we accept the I-BOX’s policy engine, which enforces a specific context-based policy and a specific prefix-matching policy. In the IMF, text fields in user apps have different types, such as dates and passwords. IM-Visor can leverage these information to decide whether current input is sensitive or not. Specifically, the hook startInput in IMMS can provide information of text fields. If the current edit box works for passwords (or something sensitive like that), the pre-IME Guard will know it from the start of a soft keyboard display and treat all following keystrokes as sensitive. This is called the “Context-based Policy”. User activities such as logging in is a typical case that IM-Visor can enforce such policy. For general user input stream, after translating keystrokes to string, IM-Visor leverages prefix-matching to search all possible substrings when a new char is typed (Aho and Corasick 1975). The sensitive data set used for searching is defined by users. As a user could consider large numbers of data instances as sensitive, IM-Visor uses a trie-like structure to maintain it in secure world. This is called the “Prefix-matching Policy”.
Address challenge 2: trusted path
After isolating and translating sensitive keystrokes, we should commit them to the targeted user app. Obviously, as a pre-IME defense, we cannot use untrusted IME apps to commit sensitive string. So we have to find another data path isolated from IME apps. Our main idea here is to add an independent system service that can commit sensitive string from secure world to a user app for the trusted path.
Normally, the edit box of a user app uses a local binder named IInputContext.Stub to receive char strings. And the client of IInputContext.Stub is initialized in the IMMS at the start of input. In light of the above fact, we add some code in the IMF to make the IMMS create an extra binder client for our newly added service commit-proxy. And then the commit-proxy is capable of committing sensitive string to the user app. Because the new IPC and new service are independent of an IME app, sensitive string in this data path cannot be accessed by any IMEs.
Create a new IPC. Figure 10 shows how the commit-proxy creates a new IPC with a user app. When a user taps on the edit box, the user app asks IMMS to callback functions in the current active IME app. Hooks in startInput will be invoked. The pre-IME Guard creates a token to IMMS, which contains a unique id to identify the current user app. Then IMMS requests the commit-proxy to bind the user app with two parameters (token, InputConnection). When the commit-proxy receives this bind request, it makes a SMC call to check whether the id in token is valid. If it is valid, the commit-proxy will add the new connection. Otherwise the bind request will be refused. If any sensitive string is found, the pre-IME Guard sends sensitive string to the commit-proxy by a shared memory, and then the commit-proxy will commit it with the above new IPC.
Address challenge 3: benefits retaining
To retain the extra benefits of IME apps (e.g., auto correcting and word association), one possible way is to implement the value added feature in IM-Visor. However, this way makes all IME apps useless and means a lot of extra coding works for IM-Visor (e.g., cloud-based word association and local trusted GUI lib). We look for a more efficient and elegant approach.
Our key idea here is to design a replay mechanism and let IME apps work for non-sensitive keystrokes. We designed a daemon thread named replay executor running in System Server process to replay touch events. If some touch events need to be replayed, the pre-IME Guard puts them in a shared memory and then the Replay Executor reads and replays them. We explain the detail as follows.
In Android system, every activity or service maintains a thread loop to receive touch events or other input events by an input channel. If the Replay Executor intends to replay a touch event directly, it needs to maintain the input channels and selects which activity or service will receive the touch event. The selection is based on not only touch coordinates but also window layouts and the current window focus.
The key point is that the Replay Executor only receives touch events from the pre-IME Guard and then triggers event subsystem to complete the “maintain&selection”. As mentioned in “Android IME, Input Method Framework (IMF) and event subsystem” section, an input dispatch thread in WindowManagerService is responsible for touch events dispatching. In most cases, the input dispatch thread sleeps on an input event queue. When a touch event needs to be dispatched, it wakes up and dequeues an event, then selects an activity or service for dispatching. If we can handle this input event queue and wake up the thread when a replay is needed, we are able to let the event subsystem do the “maintain&selection” work. This is exactly how the Replay Executor works. The context of WindowManagerService provides the event queue of input dispatch thread. The Replay Executor encapsulates non-sensitive keystrokes as required Android touch event format and enqueues them. Then it simply wakes up the dispatch thread to do the rest work for our replay. As a related issue, non-keystroke touch events also can be replayed by this way.
Minor challenge 4: buffer revisiting threat
As discussed in “Threat model” section, we discover a new data leakage path from a user app to an IME app by some revisit APIs. To prevent this threat, we hook all revisit APIs (see Listing 1) and analyze the revisited char string again to detect and block sensitive text when a third-party IME app revisits the buffer of a user app.
Implementation
We have implemented IM-Visor on Samsung 4412 development board equipped with ARM TrustZone. Android and kernel version on the board are 4.0 and 3.0.2 respectively.
Pre-IME guard and services. Specifically, the pre-IME Guard runs as a trustlet in secure world and Android runs in normal world. The commit-proxy is a system service in Android System Server process. The Replay Executor is a daemon thread running in System Server process. Both of them are passively waiting to receive data from the pre-IME Guard. When a user types in the STIE, the pre-IME Guard receives keystrokes from touch screen and translates them into a char string. Corresponding to its sensitiveness, we return it through green path or red path. (see Fig. 8).
Hooks in the IMF and event subsystem. To minimize system overhead, we have to hook as less as possible. Specifically, only three classes in Android has been hooked: InputMethodManagerService, BaseInputConnection and TouchInputMapper. In order to jump into secure world, a TrustZone driver is installed in Linux kernel. Hooks make SMC calls through the newly installed driver. When these hooks are invoked, the pre-IME Guard can intervene the data-flows in Android IMF. Listing 1 shows all the hooks we put in Android.
Secure touch screen and display controller reconfiguration. When a user intends to type in an IME soft keyboard, some reconfiguration should be done for the STIE initialization. We reconfigure Interrupt Security Register(ICDISR), Priority Mask Register (ICCPMR) and Enable Set Register (ICDISER) to make the touch input as a secure interrupt and mask all non-secure intterupt. In CPU Interface Control Register (ICCICR), FIQEn, EnableS are set to 1 to enable FIQ interrupt. FIQ bit in Secure Configuration Register (SCR) is also set to 1 to ensure FIQ interrupt routing to TrustZone monitor mode. Besides, touch screen and display controller are set to be secure peripherals with TZPC. As a proof-of-concept prototype, we only implement single-touch in the separate touch driver and leave multi-touch as a future work.