Paper

TaintDroid: An Information-Flow Tracking System for Realtime Privacy Monitoring on Smartphones

Authors:

William Enck, The Pennsylvania State University
Peter Gilbert, Duke University
Byung-Gon Chun, Intel Labs
Landon P. Cox, Duke University
Jaeyeon Jung, Intel Labs
Patrick McDaniel, The Pennsylvania State University
Anmol N. Sheth, Intel Labs

Official Website: http://www.appanalysis.org/

Direct Link to Paper: http://appanalysis.org/tdroid10.pdf

Video demonstration of TaintDroid in action: http://www.youtube.com/watch?v=qnLujX1Dw4Y

Background Concepts

There are two concepts that are central to this paper.
Information Flow
Information flow, as the name suggests, is the transfer of information. This transfer can occur between two processes or within a single process, for example from a variable x to a variable y. Information flow theory tries to capture this transfer in a mathematical model.
In a security model, information flow can be categorized into:
Explicit Flow
An explicit flow is a direct transfer of information, for example by assignment. It causes a security breach when information subject to a 'security classification' is transferred to a variable (or process) that is not subject to the same or a higher level of 'security'. Put simply, an explicit flow leaks when 'secure' information is copied to a place where it is publicly observable.
A Pseudo Code Example:
    PRIVATE VAR secure
    PUBLIC VAR notsecure
    notsecure = secure
Here the information in 'secure', which is PRIVATE, is transferred to 'notsecure', which is PUBLIC. This is an 'Information Leak'.
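The pseudo code above can be turned into a small runnable sketch. This is only an illustration of the idea, not something taken from the paper: the names Level, Var, and assign are hypothetical, and the two-level ordering (PUBLIC below PRIVATE) is an assumed, minimal stand-in for the lattice described in [1].

```python
from enum import IntEnum


class Level(IntEnum):
    """Assumed two-point ordering: PUBLIC is lower than PRIVATE."""
    PUBLIC = 0
    PRIVATE = 1


class Var:
    """A named variable that carries a security level (hypothetical helper)."""

    def __init__(self, name, level, value=None):
        self.name = name
        self.level = level
        self.value = value


def assign(dst, src):
    """Copy src into dst, rejecting any flow from a higher to a lower level."""
    if src.level > dst.level:
        raise PermissionError(
            f"explicit flow {src.name} -> {dst.name} would leak {src.level.name} data"
        )
    dst.value = src.value


secure = Var("secure", Level.PRIVATE, "555-0100")
notsecure = Var("notsecure", Level.PUBLIC)

try:
    assign(notsecure, secure)  # the leak from the pseudo code
    leaked = True
except PermissionError:
    leaked = False
```

In this sketch the assignment is refused, so notsecure keeps its initial value. A flow in the other direction, from PUBLIC to PRIVATE, is allowed because it does not lower the level of the data.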
Implicit Flow
An implicit flow occurs when information subject to a 'security classification' is deduced indirectly. Here the information leaks through the program's control flow: which branch the program takes depends on the secure information, so observing the outcome of the branch reveals something about it.
A Pseudo Code Example:
    PRIVATE VAR secure
    PUBLIC VAR notsecure
    if secure = "blah blah" then:
        notsecure = 1
    else:
        notsecure = 0
We can deduce whether the information in 'secure' is "blah blah" by checking the value of 'notsecure', even though 'secure' is never assigned to it. Information leakage due to implicit flows is much harder to detect and to protect against.
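To see why implicit flows are harder to catch, consider a naive tracker that attaches a flag to each value and copies the flag only when data is copied. The sketch below is a toy model under that assumption; Tainted, copy_value, and branch_on are hypothetical names, not part of any real system.

```python
class Tainted:
    """A value paired with a boolean taint flag (toy model)."""

    def __init__(self, value, tainted=False):
        self.value = value
        self.tainted = tainted


def copy_value(src):
    # Explicit flow: the flag travels with the data, so the copy is flagged.
    return Tainted(src.value, src.tainted)


def branch_on(src, guess):
    # Implicit flow: the result is a fresh constant chosen by the branch,
    # so a tracker that only follows data copies leaves it unflagged.
    if src.value == guess:
        return Tainted(1)
    return Tainted(0)


secure = Tainted("blah blah", tainted=True)
explicit_copy = copy_value(secure)
implicit_copy = branch_on(secure, "blah blah")
```

After running this, explicit_copy is flagged, while implicit_copy holds 1 (which reveals that the guess was right) but carries no flag. Catching the second case requires reasoning about control flow as well as data flow.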

For more on information flow and its mathematical model, refer to "A Lattice Model of Secure Information Flow" [1].

Taint Analysis

Explain briefly the background concepts and ideas that your fellow classmates will need to know first in order to understand your assigned paper.

Background on Information Flow Theory. Explicit and Implicit Flow.
Background on the taint data tracking method, how it has been used in other systems (i.e. not phones)
A reader's digest version of any new articles about this kind of security vulnerability on phones, on apps that collect more personal data than users would expect.

Research problem

Smartphones are now everywhere, and by their nature they are linked to many private details of our lives: not only classic data like contact lists, but also kinds of data that are new to phones, such as location. Except for the odd tunnel or elevator, these phones are constantly connected to the internet. Smartphones can also download and run third-party applications; indeed, this is why we call them "smart". When you combine third-party applications with an always-on internet connection, it becomes unclear how your data is being used. What is to stop a third-party application from disseminating your private information? As it turns out, very little.

A telling example of this is a wallpaper application that sends your phone number back to the developer. Once the app is running on your phone, it can typically access any of the information on your phone, and it is not necessarily clear when it has done so, or what it is doing with it.

The authors of this paper set out to try to understand what kind of information is being collected and where that information is being sent, and in order to do that, they first needed to build a means of tracking that information.

The strategy they chose is called Dynamic Taint Analysis, sometimes called Taint Tracking. The basic idea is to mark (taint) sensitive information at its source and then follow that mark as the data moves through the system. In the context of this paper, if marked data is ever seen leaving the phone through the network interface, then we know that some sensitive information has been disseminated.
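The mark-and-follow idea can be sketched in a few lines. This is a toy model under stated assumptions, not the mechanism implemented in the paper: TaintedData, read_phone_number (the source), and network_send (the sink) are hypothetical names, and the phone number and host are made-up values.

```python
class TaintedData:
    """A string value plus the set of sensitive sources it was derived from."""

    def __init__(self, value, sources=frozenset()):
        self.value = value
        self.sources = frozenset(sources)

    def __add__(self, other):
        # Propagation: the result of combining data carries the union of marks.
        if isinstance(other, TaintedData):
            return TaintedData(self.value + other.value,
                               self.sources | other.sources)
        return TaintedData(self.value + other, self.sources)


def read_phone_number():
    # Hypothetical taint source: data is marked where it enters the program.
    return TaintedData("555-0100", {"PHONE_NUMBER"})


alerts = []


def network_send(host, payload):
    # Hypothetical taint sink: record an alert if marked data is about to leave.
    if isinstance(payload, TaintedData) and payload.sources:
        alerts.append((host, sorted(payload.sources)))


message = TaintedData("id=") + read_phone_number()
network_send("example.com", message)                 # marked: raises an alert
network_send("example.com", TaintedData("hello"))    # unmarked: no alert
```

Only the first send is recorded in alerts, because its payload was built from the marked phone number. The mark survives the string concatenation, which is the property that makes the approach useful when applications transform data before sending it.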

There are many difficulties associated with implementing such a system on a smartphone. Their design goals were to create a lightweight, low-overhead, real-time tracking system that runs directly on a real phone, with real applications. To be really useful, the tracking system must not impact the user experience too heavily.

Some of the difficulties include:

Smartphones are resource constrained. Processing power and memory are limited, and any processing that we do perform will consume battery power. If the tracking is to happen in real time while the phone remains "usable" for the end user, the system must be truly lightweight.
Third-party applications arrive in a compiled format; we cannot analyze their source code.
Applications may do complex things with the sensitive data. It is unlikely that an application will simply read a location from the GPS and dump it straight out over the network. More likely, the application will use that data in some way, or combine it with other data, before it is sent. We need to be able to track sensitive data throughout this entire process if we hope to perform any useful analysis.
Applications can share information with other applications, meaning that our tracking has to work across multiple processes.
The tracking must operate on a real phone, not a simulated one. With a simulated system, where we control the virtual hardware and memory, we can be certain that we can see everything that an application might do. On a real device, how can we get "low enough" to see everything the applications do?
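Two of the difficulties above, limited resources and data that gets combined with other data, suggest keeping the mark itself very small. One compact encoding, shown here purely as an illustrative sketch, is a bit vector with one bit per kind of sensitive data, merged with a bitwise OR whenever values are combined. The Tag members and the combine helper are hypothetical names chosen for this example.

```python
from enum import IntFlag


class Tag(IntFlag):
    """One bit per kind of sensitive data (illustrative categories)."""
    NONE = 0
    LOCATION = 1
    CONTACTS = 2
    PHONE_NUMBER = 4


def combine(*tags):
    """Tag for a value computed from several inputs: the OR of their tags."""
    result = Tag.NONE
    for tag in tags:
        result |= tag
    return result


# A message built from a GPS fix, a contact name, and an untagged constant.
merged = combine(Tag.LOCATION, Tag.CONTACTS, Tag.NONE)
```

With this encoding the merged tag fits in a single integer, and checking whether a value depends on a particular source is a single bit test, for example Tag.LOCATION in merged.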

How does this problem relate to past related work?

Contribution

What are the research contribution(s) of this work? Specifically, what are the key research results, and what do they mean? (What was implemented? Why is it any better than what came before?)

Critique

What is good and not-so-good about this paper? You may discuss both the style and content; be sure to ground your discussion with specific references. Simple assertions that something is good or bad is not enough - you must explain why.

References

[1] DENNING, D. E. A Lattice Model of Secure Information Flow. Communications of the ACM 19, 5 (May 1976), 236–243.