Talk:COMP 3000 Essay 2 2010 Question 8


Group Members

Trevor Bonesaw Malone - tmalone@connect.carleton.ca //FIRST POST!

Qi Zhang - qzhang13@connect.carleton.ca

Gregory Bint - gbint@connect.carleton.ca

Gautam Akiwate - gakiwate@connect.carleton.ca

Relevant Sources

Seems to be THE Dynamic Taint Analysis paper. Talks about implementation on TaintCheck. Could also be useful for the critique section. -[Gautam]

Work Plan

As Trevor intimated, we should have a clear division of work going forward. This is sort of the breakdown as I see it. Please edit as you think of new ideas!

  • Background Concepts
    • Information Flow Theory. (Implicit and Explicit Flows.)
    • What is dynamic taint analysis
    • What is the difference between dynamic and static analysis
  • Research Problem
    • How do we build a DTA engine for a phone?
    • Why do we want to? (information misuse)
  • Contribution
    • How did they implement their DTA engine
    • What did they find about information misuse
  • Critique
  • References
    • The article has 61 references! We can probably use some of them

List of information we need to find external sources for:

  • History of taint analysis
  • History of privacy research relating to smart phones

Work In Progress

Log what you are working on *right now* so that other people don't try to do the same thing. Make sure to clear your name from here when you are done.

  • Gregory Bint: Research Problem
    • If someone wants to find some sources for some of the questions I'm asking, that would be helpful!
  • Gautam Akiwate: Background Concepts
    • Any resources on Dynamic Taint Analysis would be appreciated!
  • Qi Zhang: Contributions

Some Notes from the Video

Tracking of privacy-sensitive data through Dynamic Taint Analysis (aka Taint Tracking). The trick is to mark private data as it is sourced, and then follow those marks until (unless) they leave the phone.

Android phones run Java apps, which are compiled into DEX, and then run on top of the Dalvik VM. It is this VM that we modify so that we can support the storage and tracking of taint tags.

Taint sources

  • Low-bandwidth sensors
    • Location
    • Accelerometer
  • High-bandwidth sensors
    • Mic
    • Camera
  • Information DB
    • Address book
    • SMS storage
  • Device ID
    • IMEI
    • IMSI (don't actually track this one because of false positives)
    • ICC_ID
    • Phone Number

Taint sink (where marked data can leave the phone)

  • Network Taint Sink
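
To make the source/sink idea concrete, here is a minimal sketch in Python (not TaintDroid's actual code). It assumes each taint source from the list above gets one bit in a tag, and that the network sink simply logs any tagged data it is asked to send. The names `Taint`, `Tainted`, and `network_send` are hypothetical and chosen only for illustration.

```python
from enum import IntFlag


class Taint(IntFlag):
    """Hypothetical tag bits, one per taint source listed in the notes."""
    NONE = 0
    LOCATION = 1
    ACCELEROMETER = 2
    MIC = 4
    CAMERA = 8
    ADDRESS_BOOK = 16
    SMS = 32
    IMEI = 64
    ICC_ID = 128
    PHONE_NUMBER = 256


class Tainted:
    """A value paired with the set of sources it was derived from."""

    def __init__(self, value, tag=Taint.NONE):
        self.value = value
        self.tag = tag


def network_send(data, log):
    """Hypothetical network sink: record the tag of any marked data that leaves."""
    if data.tag != Taint.NONE:
        log.append(data.tag)
    return data.value


log = []
network_send(Tainted("hello"), log)                          # unmarked, not logged
network_send(Tainted("45.38,-75.69", Taint.LOCATION), log)   # marked, logged
```

Because the tag is a bit set, a single value can carry marks from several sources at once (for example `Taint.IMEI | Taint.LOCATION`).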

Taint propagation

  • ???

Taint tags are stored in memory interleaved with the variables they are tracking.
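
One way to picture "interleaved" storage is the rough sketch below. It assumes a frame where every variable slot is immediately followed by a slot holding its tag; the class name `InterleavedFrame` and the exact layout are assumptions for illustration, not details taken from the video.

```python
class InterleavedFrame:
    """Hypothetical frame: slot 2*i holds variable i, slot 2*i + 1 holds its tag."""

    def __init__(self, num_vars):
        self.slots = [0] * (2 * num_vars)

    def store(self, index, value, tag=0):
        # The value and its tag are written side by side.
        self.slots[2 * index] = value
        self.slots[2 * index + 1] = tag

    def load(self, index):
        # Returns (value, tag) from two adjacent slots.
        return self.slots[2 * index], self.slots[2 * index + 1]


frame = InterleavedFrame(2)
frame.store(0, 42)              # ordinary value, tag 0
frame.store(1, 1234, tag=0b1)   # marked value, tag bit 0 set
```

Keeping the tag next to its variable means a lookup of one usually brings the other along, which is presumably part of why this layout is cheap.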

A standard data-flow technique is used to propagate these tags: when a marked variable is assigned to another variable, that second variable needs to be tracked as well.
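
A small sketch of what tag propagation could look like, assuming three simple rules that are common in taint trackers: a copy carries the tag along, a binary operation takes the union of both operand tags, and a constant assignment clears the tag. The notes do not spell out the exact rules TaintDroid uses, so the function names and rules here are illustrative assumptions.

```python
def assign(tags, dst, src):
    """dst = src: dst takes over src's tag."""
    tags[dst] = tags.get(src, 0)


def binary_op(tags, dst, a, b):
    """dst = a op b: dst gets the union of both operand tags."""
    tags[dst] = tags.get(a, 0) | tags.get(b, 0)


def assign_const(tags, dst):
    """dst = constant: dst no longer depends on any marked data."""
    tags[dst] = 0


IMEI, LOCATION = 0b01, 0b10           # hypothetical tag bits
tags = {"imei": IMEI, "loc": LOCATION}

assign(tags, "copy", "imei")           # copy is now marked IMEI
binary_op(tags, "msg", "copy", "loc")  # msg is marked IMEI and LOCATION
assign_const(tags, "copy")             # copy is overwritten and unmarked
```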

Tracks explicit flows of data, not implicit ones. To fully capture implicit flows, you need to do static analysis, which is hard with closed-source apps and cannot be done in real time.

Implicit flows are not tracked

  • Implicit flows can involve "taint-scope", tracking based on conditionals in code
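
To illustrate the difference, here is a hedged sketch of an explicit flow next to an implicit one. The `Tainted` class and both functions are hypothetical; the tag value of 0 on the rebuilt number models what a tracker that only follows assignments would record, since every value written to `rebuilt` is a constant.

```python
class Tainted:
    """A value paired with a tag (0 means unmarked)."""

    def __init__(self, value, tag=0):
        self.value = value
        self.tag = tag


def explicit_copy(secret):
    # Explicit flow: the data itself is assigned, so the tag travels with it.
    return Tainted(secret.value, secret.tag)


def implicit_copy(secret, width=32):
    # Implicit flow: the secret only decides which branch runs.
    # Each bit written to `rebuilt` is a constant, so an explicit-only
    # tracker sees nothing marked being assigned.
    rebuilt = 0
    for bit in range(width):
        if secret.value & (1 << bit):
            rebuilt |= 1 << bit
    return Tainted(rebuilt, 0)


secret = Tainted(0xCAFE, tag=0b1)
tracked = explicit_copy(secret)
leaked = implicit_copy(secret)
```

Here `leaked.value` equals the secret but carries no mark, which is the kind of case that "taint-scope" tracking over conditionals would be needed to catch.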


Performance

The goal is to create a real-time tracking system, so TaintDroid's performance impact is of some importance.

14% CPU overhead, 4.4% memory overhead

Macro benchmarks (to get a feel for what the phone's usability is like with TD running)

  • App load: 3% (2ms)


Findings

20 out of 30 tested applications share data in a way that is not expected.

67 of 105 flagged pieces of data leaving the device had no obviously legitimate purpose (verified by the authors).

Many apps sent location data and other unique identifiers to advertising servers.

Most apps do not mention anything to the user.


Limitations

Tracks only explicit data flows.

An application *could* launder the tags off of the data, if they really wanted to hide this sort of thing from TaintDroid.

There are methods that could be used to protect against this, but they go against the goal of a light-weight, real-time tracking system. TD is not necessarily about catching truly malicious programs, but rather just those that leak information.


Why do apps take this information?

  • Lazy; in the demo video, the wallpaper app seems to use the IMEI just as a ready made unique ID
  • Overzealous; the developer might think they *need* the data for something, but actually don't
  • Ads; advertisers do seem a little presumptuous in their data collection
  • Spying; bosses or spouses
  • Malicious;


QA Period

Q: How do we prevent a malicious app from removing a taint attribute on a file?

A: TD operates at too low a level for this to be a problem; TD assumes that the native code is trusted.


Q: It seems like you had a lot of false positives

A: The point of this tool was to identify privacy sensitive information as having left the phone, not whether or not a privacy violation has taken place.


Q: Now that TD is released, couldn't malicious apps use some of the methods described in the paper to get around it?

A: Well, yes, but it is not just about maliciousness; it could just be laziness or over-zealous ad stuff.

Other Information

Hey guys, thought I would just post a generalized paragraph about our essay.

In today’s society, smartphones are the new big thing. To me, that’s what makes this paper so interesting. This paper focuses on private information on Android phones and the misuse of this information. The private information at risk includes the SIM card ID, the device ID, and the phone number. TaintDroid gives smartphones an efficient taint tracking and analysis system. It can track sensitive data from multiple sources and identify potential misuse of that data. In their study of 30 popular third-party applications, TaintDroid found 68 instances of potential misuse of users’ private data across 20 applications. This tool is great for knowing which applications are safe and which are not, so your private data can remain private.

Also, we should really think of splitting up the work in some way. If some people have specific sections they would like to do, let's figure that out now so we can divide the workload and get it done over the next couple of days. I don't personally care what part I'm going to have to do, so let's get this going. Any other information people wanna post, feel free; the more the better, even if we don't end up using it.

Trevor Malone