COMP 3000 Essay 2 2010 Question 8

From Soma-notes
Revision as of 18:12, 29 November 2010

Paper

TaintDroid: An Information-Flow Tracking System for Realtime Privacy Monitoring on Smartphones

Authors:

  • William Enck, The Pennsylvania State University
  • Peter Gilbert, Duke University
  • Byung-Gon Chun, Intel Labs
  • Landon P. Cox, Duke University
  • Jaeyeon Jung, Intel Labs
  • Patrick McDaniel, The Pennsylvania State University
  • Anmol N. Sheth, Intel Labs


Official Website: http://www.appanalysis.org/

Direct Link to Paper: http://appanalysis.org/tdroid10.pdf

Video demonstration of TaintDroid in action: http://www.youtube.com/watch?v=qnLujX1Dw4Y

Background Concepts

As even a brief glance at the paper suggests, it has much to do with "Information Flow Tracking" and "Dynamic Taint Analysis". To follow the paper, the ideas that form the basis of these theories must be understood first. The following two concepts are central to understanding this paper:

  • Information Flow
  • Taint Analysis

Information Flow
Information flow, as the name suggests, is essentially the transfer of information. This transfer can be between two processes, or within a single process, for example from a variable x to a variable y. Information flow theory tries to capture this flow of information in a mathematical model.
In a security model, information flow can be categorized into:
Explicit Flow
Explicit flow occurs when information subject to 'security classifications' is transferred to a variable (or process) that is not subject to the same or a higher level of 'security', causing a security breach. Put simply, explicit flow is when 'secure' information is transferred so that it becomes publicly observable.
A Pseudo Code Example:
PRIVATE VAR secure
PUBLIC VAR notsecure
notsecure=secure

Here the information in 'secure', which is PRIVATE, is transferred to 'notsecure', which is PUBLIC; this is an 'information leak'.
Implicit Flow
Implicit flow occurs when information subject to 'security classifications' is deduced indirectly. Here the information leaks through the program's control flow: the path the program takes depends on the secure information, and so reveals it.
A Pseudo Code Example:
PRIVATE VAR secure
PUBLIC VAR notsecure
if secure == "blah blah" then:
    notsecure = 1
else:
    notsecure = 0

We can deduce whether the information in 'secure' is "blah blah" by checking the value of 'notsecure'. Information leakage due to implicit flows is much harder to detect and protect against.
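The difference between the two kinds of flow can be sketched with a toy Python tracker that follows only data flow. The `Labeled` class and the `assign`/`const` helpers are hypothetical illustrations, not part of any real system: the explicit copy carries the taint with it, while the branch-based leak produces an untainted result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Labeled:
    """A value paired with a taint flag (True = derived from PRIVATE data)."""
    value: object
    tainted: bool


def assign(src: Labeled) -> Labeled:
    # Explicit flow: copying a value copies its taint as well.
    return Labeled(src.value, src.tainted)


def const(value: object) -> Labeled:
    # Constants written by the program itself carry no taint.
    return Labeled(value, False)


secure = Labeled("blah blah", tainted=True)  # PRIVATE

# Explicit flow (notsecure = secure): the taint travels with the value.
explicit = assign(secure)
print("explicit flow tainted:", explicit.tainted)  # prints True: leak detected

# Implicit flow: the branch depends on `secure`, but only constants are assigned,
# so a data-flow-only tracker never propagates the taint to the result.
if secure.value == "blah blah":
    implicit = const(1)
else:
    implicit = const(0)
print("implicit flow tainted:", implicit.tainted)  # prints False: leak missed
```

Catching the second case would require also tainting values assigned under a branch whose condition is tainted, which is extra work that a low-overhead tracker may choose to skip.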

For more on information flow and its mathematical model, refer to "A Lattice Model of Secure Information Flow" [1].
Taint Analysis

The main idea behind taint analysis is that any variable that can be modified, directly or indirectly, by the user, and could therefore become a security vulnerability, is "tainted". Through various operations the taint can be passed from variable to variable, and when a tainted variable is used to execute a dangerous command, a security breach may occur. The goal of taint analysis is to identify these "tainted" variables and ensure that they do not cause a security breach.
Taint analysis done at run time is called dynamic taint analysis. Its approach is to label data originating from untrusted sources as tainted and to keep track of all tainted data in memory; when such data is used in a dangerous situation, a possible bug is detected. This approach can detect most input validation vulnerabilities with a very low false positive rate. Its main disadvantage is that the program runs more slowly because of the necessary additional checks.
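A minimal Python sketch of dynamic taint analysis in its classic setting, assuming user input is the untrusted source and a shell command is the dangerous sink. `TaintedStr`, `from_user`, `concat`, `run_command` and `TaintViolation` are hypothetical names; `run_command` only returns a string and never executes anything. The `isinstance` checks performed on every operation are a small-scale picture of the runtime overhead mentioned above.

```python
class TaintedStr(str):
    """A str subclass marking data that came from an untrusted source."""


class TaintViolation(Exception):
    """Raised when tainted data reaches a dangerous sink."""


def from_user(text: str) -> TaintedStr:
    # Taint source: anything the user supplies is labeled as tainted.
    return TaintedStr(text)


def concat(a: str, b: str) -> str:
    # Propagation rule: the result is tainted if either operand is tainted.
    # (str + str always yields a plain str, so the label is re-applied here.)
    result = a + b
    if isinstance(a, TaintedStr) or isinstance(b, TaintedStr):
        return TaintedStr(result)
    return result


def run_command(cmd: str) -> str:
    # Sink check: refuse to "run" a command built from tainted data.
    if isinstance(cmd, TaintedStr):
        raise TaintViolation(f"tainted data reached a dangerous sink: {cmd!r}")
    return f"ran: {cmd}"


print(run_command(concat("ls ", "/tmp")))  # prints: ran: ls /tmp
try:
    run_command(concat("ls ", from_user("/tmp; rm -rf ~")))
except TaintViolation as err:
    print("blocked:", err)
```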
Static taint analysis computes an over-approximation of the set of instructions that are influenced by user input. This set of tainted instructions is computed statically, by analyzing only the program's source. The main advantage of static taint analysis is that it takes into account all possible execution paths of the program. On the other hand, it may be less accurate than dynamic analysis, because the static analyzer does not have access to the program's runtime information.
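A Python sketch of a very simple flow-insensitive static taint analysis, assuming a toy program representation of (assigned variable, variables read) pairs; the representation and `static_taint` are hypothetical. Because assignments from both arms of a branch are included, `b` is reported as tainted even on runs where the untainted arm would execute, which is the over-approximation described above.

```python
# Each statement is (target, [variables it reads]). Statements from BOTH arms
# of an if/else are listed, because a static analyzer cannot know which arm
# will run; this is what makes the result an over-approximation.
Program = list[tuple[str, list[str]]]


def static_taint(program: Program, sources: set[str]) -> set[str]:
    tainted = set(sources)
    changed = True
    while changed:  # iterate until no new variable becomes tainted
        changed = False
        for target, reads in program:
            if target not in tainted and any(r in tainted for r in reads):
                tainted.add(target)
                changed = True
    return tainted


program: Program = [
    ("a", ["user_input"]),  # a = user_input
    ("b", ["a"]),           # if flag: b = a
    ("b", []),              # else:    b = 0
    ("c", ["b"]),           # c = b + 1
    ("d", []),              # d = 42
]
print(sorted(static_taint(program, {"user_input"})))
# prints ['a', 'b', 'c', 'user_input']
```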

A Math Model:
There are two types of variables

  • Tainted
  • Untainted

Thus $V = \{T, U\}$, where $T$ denotes "Tainted" and $U$ denotes "Untainted".
Binary operator: $V \times V \to V$, written $xy$
$xy = T$ if $x = T$ or $y = T$
$xy = U$ if $x = y = U$

It is now intuitively easy to see that whenever a tainted variable is used in an operation, the result is tainted, and thus the taint is propagated. Taking this further, we can, if needed, mark variables as tainted by attaching a taint tag to them, let's say "T'", which can then be tracked or used as wanted.
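The binary operator above can be checked directly in Python. `V` and `combine` are illustrative names; in lattice terms, `combine` is the join (least upper bound) of the two-point lattice with U ordered below T, which ties this model back to Denning's lattice model [1].

```python
from enum import Enum
from functools import reduce


class V(Enum):
    T = "Tainted"
    U = "Untainted"


def combine(x: V, y: V) -> V:
    # Binary operator V x V -> V: tainted if either operand is tainted.
    return V.T if V.T in (x, y) else V.U


# Print the full operator table.
for x in V:
    for y in V:
        print(x.name, y.name, "->", combine(x, y).name)

# Folding the operator over several inputs: one tainted input taints the result.
print(reduce(combine, [V.U, V.U, V.T, V.U]).name)  # prints T
```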

Note: The paper is about dynamic taint analysis. TaintDroid makes ingenious use of "taint" by tainting the variables that are of value and tracking their progress. In the usual context of taint analysis, "taint" marks untrusted information; in this case, however, the tainted variables are in fact important private data. (Just in case it confused someone :D)
For more detailed information on taint analysis, refer to "Detecting Software Vulnerabilities: Static Taint Analysis" [2].

Explain briefly the background concepts and ideas that your fellow classmates will need to know first in order to understand your assigned paper.

  • Background on the taint data tracking method, how it has been used in other systems (i.e. not phones)
  • A reader's digest version of any new articles about this kind of security vulnerability on phones, on apps that collect more personal data than users would expect.

Research problem

In today's society, smartphones are the new big thing. By their nature, smartphones are linked to many private details of our lives, including not only classic data like our contact list but also new kinds of data unique to smartphones, such as location data. Except in the odd tunnel or elevator, these phones are constantly connected to the internet. Smartphones can also download and run third-party applications; indeed, this is why we call them "smart". Combine third-party applications with an internet connection and you suddenly find yourself unsure of how your data is being used: what is to stop a third-party application from disseminating your private information? As it turns out, very little.

A telling example is a wallpaper application that sends your phone number back to its developer. Once an app is running on your phone, it can typically access any of the information it has been granted permission to read, and it is not necessarily clear when it has done so, or what it is doing with that information.

The authors of this paper set out to try to understand what kind of information is being collected and where that information is being sent, and in order to do that, they first needed to build a means of tracking that information.

The strategy they chose is called dynamic taint analysis, sometimes called taint tracking. The basic idea is to mark (taint) sensitive information at its source, and then to follow that mark as the information moves through the system. In the context of this paper, if marked data is ever seen leaving the phone's network interface, then we know that some sensitive information has been disseminated.
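A minimal Python sketch of this source-to-sink idea in the paper's setting, assuming a hypothetical `Tracked` wrapper that carries a set of source labels. `read_gps` returns a made-up coordinate, `ads.example.com` is a placeholder host, and `network_send` only records an alert instead of sending anything; none of these are TaintDroid or Android APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Tracked:
    """Hypothetical wrapper: a value plus the set of sensitive sources it came from."""
    value: str
    sources: frozenset = field(default_factory=frozenset)

    def __add__(self, other: "Tracked") -> "Tracked":
        # Propagation: a derived value carries the union of its inputs' marks.
        return Tracked(self.value + other.value, self.sources | other.sources)


def read_gps() -> Tracked:
    # Taint source (stand-in): a fixed, made-up coordinate marked as LOCATION.
    return Tracked("45.42,-75.69", frozenset({"LOCATION"}))


def plain(text: str) -> Tracked:
    # Data the app creates itself starts with no marks.
    return Tracked(text)


alerts: list[str] = []


def network_send(host: str, payload: Tracked) -> None:
    # Taint sink: record an alert if marked data is about to leave the phone.
    if payload.sources:
        alerts.append(f"{sorted(payload.sources)} sent to {host}")


location = read_gps()
request = plain("GET /ad?loc=") + location + plain("&v=2")
network_send("ads.example.com", request)
print(alerts)
```

The location is embedded inside a larger request string before it is sent, yet the mark survives the concatenations and is still visible at the sink.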

There are many difficulties in implementing such a system on a smartphone. The authors' design goal was a lightweight, low-overhead, real-time tracking system that runs directly on a real phone, with real applications. To be truly useful, the tracking system must not impact the user experience too heavily.

Some of the difficulties include

  • Smartphones are resource constrained. Processing power and memory are limited, and any processing we do perform consumes battery power. If the tracking system is to run in real time while the phone remains "usable" for the end user, the system must be truly lightweight.
  • Third party applications arrive in a compiled format; we cannot analyze their source code.
  • Applications may do complex things with the sensitive data. An application is unlikely to simply read a location from the GPS and dump it straight out over the network; more likely, it will use that data in some way, or combine it with other data, before sending it. We need to be able to track sensitive data throughout this entire process if we hope to perform any useful analysis.
  • Applications can share information with other applications, meaning that our tracking has to work across multiple processes.
  • The tracking must operate on a real phone, not a simulated one. With a simulated system, where we control the virtual hardware and memory, we can be certain that we can see everything that an application might do. On a real device, how can we get "low enough" to see everything the applications do?


How does this problem relate to past related work?

Contribution

The contribution of "TaintDroid: An Information-Flow Tracking System for Realtime Privacy Monitoring on Smartphones" is not that the authors achieved information-flow tracking, but that they achieved it efficiently enough to run in real time, with minimal overhead, on real, resource-constrained devices. As stated, "TaintDroid only incurs an approximate 14% CPU overhead and an approximate 4.4% memory overhead for simultaneously tracking 32 taint markings per data unit." Note that the 14% CPU overhead applies only to a "CPU-bound micro-benchmark", and that TaintDroid "imposes negligible overhead on interactive third-party applications."
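Tracking "32 taint markings per data unit" can be pictured as a 32-bit tag with one bit per marking, where combining data ORs the tags together. A minimal Python sketch under that assumption; the marking names and bit positions (`TAINT_LOCATION`, `TAINT_IMEI`, and so on) are hypothetical and are not TaintDroid's actual constants.

```python
# Hypothetical bit assignments, one bit per kind of sensitive data.
TAINT_CLEAR = 0x0
TAINT_LOCATION = 1 << 0
TAINT_CONTACTS = 1 << 1
TAINT_IMEI = 1 << 2
TAINT_PHONE_NUMBER = 1 << 3

TAG_MASK = 0xFFFFFFFF  # a tag fits in one 32-bit word: up to 32 markings at once


def union(*tags: int) -> int:
    # Combining data ORs the tags together, so every marking is kept.
    result = TAINT_CLEAR
    for t in tags:
        result |= t
    return result & TAG_MASK


def describe(tag: int) -> list[str]:
    names = {TAINT_LOCATION: "location", TAINT_CONTACTS: "contacts",
             TAINT_IMEI: "IMEI", TAINT_PHONE_NUMBER: "phone number"}
    return [name for bit, name in names.items() if tag & bit]


payload_tag = union(TAINT_LOCATION, TAINT_IMEI, TAINT_CLEAR)
print(hex(payload_tag), describe(payload_tag))  # prints 0x5 ['location', 'IMEI']
```

A single OR per operation keeps propagation cheap, which is one plausible reason a fixed-width bit vector suits a low-overhead tracker.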

This is achieved by modifying the Dalvik VM interpreter of the Android system to provide variable-level tracking, which follows how private information, such as location details from the GPS, moves through an application's variables. Next, they modify the Binder IPC mechanism to provide message-level tracking of inter-process (that is, inter-application) communication, and the JNI call bridge so that they can "patch the taint propagation on return" and keep track of information that passes through native code. Finally, by modifying the secondary storage interface, they provide file-level taint tracking, which ensures that "persistent information conservatively retains its taint markings"; taint is then checked at the network interface, where tainted data leaving the phone is flagged.
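A toy Python sketch of two of the coarser levels, assuming for illustration that an IPC message carries a single tag equal to the union of everything written into it, and that a native method's return value receives the union of its arguments' tags. `Message`, `native_call` and the bit values are hypothetical stand-ins, not Android or TaintDroid APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    """Hypothetical IPC message: one taint tag covers the whole message."""
    values: list = field(default_factory=list)
    tag: int = 0

    def write(self, value, tag: int) -> None:
        self.values.append(value)
        self.tag |= tag  # message-level: tags are merged, not kept per value

    def read_all(self) -> list:
        # On the receiving side, every value inherits the message's tag.
        return [(v, self.tag) for v in self.values]


def native_call(func, *args_with_tags):
    # Method-level heuristic for native code that cannot be instrumented:
    # the return value's tag is the union of the argument tags.
    values = [v for v, _ in args_with_tags]
    tag = 0
    for _, t in args_with_tags:
        tag |= t
    return func(*values), tag


LOCATION, IMEI = 0x1, 0x4  # hypothetical bit assignments

p = Message()
p.write("45.42,-75.69", LOCATION)
p.write("hello", 0)
print(p.read_all())  # prints [('45.42,-75.69', 1), ('hello', 1)]

result, tag = native_call(lambda a, b: a + b, (3, 0), (4, IMEI))
print(result, tag)  # prints 7 4
```

Both rules are conservative: "hello" comes out tagged even though it never touched location data, trading some precision for low overhead.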

By combining these levels of taint tracking, TaintDroid was used to monitor 30 randomly selected popular third-party Android applications. It flagged 105 instances of tainted information being transmitted, of which only 37 were clearly legitimate. It also found that half of the applications sent the user's location to advertising servers, and that 5 of the applications transmitted the user's device ID, phone number and SIM card serial number.

The other contribution of TaintDroid is the accuracy with which it tracks sensitive data. Unlike existing solutions that rely on heavyweight whole-system emulation, TaintDroid leverages Android's virtualized architecture to integrate four granularities of taint propagation: variable-level, method-level, message-level, and file-level. Many factors influence TaintDroid's performance and accuracy; taint tracking granularity and flow semantics are two of them. At the variable level, TaintDroid tracks taint on variables inside the VM interpreter, and these variables provide flow semantics for taint propagation that distinguish data from pointers, which helps ensure accuracy. Existing taint tracking approaches, like the Panorama taint system, rely on instruction-level dynamic taint analysis using whole-system emulation. This leads to a 2 to 20 times slowdown of the system, which makes it unsuitable for real-time analysis. Moreover, instruction-level tracking faces a serious problem, taint explosion: complex instructions such as CMPXCHG and REP MOV can cause the stack pointer to become falsely tainted, and such instructions can also cause taint to be lost. TaintDroid avoids this problem by combining its four coarser levels of tracking instead of tracking individual instructions.

These results show that more effective, finer-grained permission systems are needed, and that TaintDroid is a step in the right direction, providing a highly efficient real-time tracking system.

Critique

What is good and not-so-good about this paper? You may discuss both the style and content; be sure to ground your discussion with specific references. Simple assertions that something is good or bad is not enough - you must explain why.

This paper contains a lot of information and has a very strong structure for explaining what TaintDroid is and what it does. It begins with a high-level overview of TaintDroid, then explains the background, followed by an explanation of the sources that TaintDroid tracks and of its design. It continues with test results and the strengths and weaknesses of TaintDroid, with references to related work. One way to improve the paper's readability might be to present the background information on the Android platform before the overview of TaintDroid, so that the paper's concepts are easier to understand.

The paper does a good job of explaining the challenges in monitoring the network disclosure of privacy-sensitive information, and how TaintDroid succeeds despite these challenges:

  1. Smartphones are resource constrained
  2. Third-party applications are entrusted with several types of privacy-sensitive information
  3. Context-based privacy-sensitive information is dynamic and can be difficult to identify even when sent in the clear
  4. Applications can share information

TaintDroid uses dynamic taint analysis to work around these challenges: each taint source is a targeted piece of sensitive information, and a taint marking identifies the type of information. The paper discusses TaintDroid's strengths and weaknesses very effectively. To minimize overhead, TaintDroid tracks only data flows and does not track control flows. The paper also explains the overhead trade-offs of taint tag storage well: to save memory, TaintDroid stores one taint tag per array rather than one per element, which is reasonable because the elements of most strings share the same tag; however, this coarser storage means that false positives can occur. The test results on smartphones running TaintDroid make it very clear that personal information is often misused, and that TaintDroid identifies a high percentage of this misuse. One possible improvement would be for the paper to predict future smartphone security measures. The paper does a great job of explaining TaintDroid, as well as related work on the security of personal information. Where it falls short is in discussing what changes could be made, or what updates to current programs could be implemented, to further improve the tracking of information misuse and even prevent misuse from occurring.
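The storage trade-off can be illustrated with a small Python sketch, assuming one tag per array as described; `TaggedArray` and the `LOCATION` bit are hypothetical. Storing a single integer per array saves memory, but a harmless element written next to a sensitive one is reported as tainted.

```python
class TaggedArray:
    """Hypothetical array that stores ONE taint tag for all of its elements."""

    def __init__(self, length: int):
        self.items = [None] * length
        self.tag = 0

    def store(self, index: int, value, tag: int) -> None:
        self.items[index] = value
        self.tag |= tag  # the array-wide tag only ever grows

    def load(self, index: int):
        # Every element reports the array's tag, whatever was stored there.
        return self.items[index], self.tag


LOCATION = 0x1  # hypothetical marking

buf = TaggedArray(2)
buf.store(0, "45.42,-75.69", LOCATION)  # sensitive element
buf.store(1, "Hello", 0)                # harmless element
value, tag = buf.load(1)
print(value, "flagged" if tag else "clean")  # prints Hello flagged (a false positive)
```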

References

[1] DENNING, D. E. A Lattice Model of Secure Information Flow. Communications of the ACM 19, 5 (May 1976), 236–243.
[2] CEARA, D., POTET, M. L., et al. Detecting Software Vulnerabilities: Static Taint Analysis. GINP ENSIMAG, Google Code (2009).
[3] NEWSOME, J., AND SONG, D. Dynamic Taint Analysis for Automatic Detection, Analysis, and Signature Generation of Exploits on Commodity Software. Proceedings of the Network and Distributed System Security Symposium (NDSS 2005).