Talk:COMP 3000 Essay 2 2010 Question 8

From Soma-notes

Revision as of 00:32, 20 November 2010

Group Members

Trevor Bonesaw Malone - tmalone@connect.carleton.ca //FIRST POST!

Qi Zhang - qzhang13@connect.carleton.ca

Gregory Bint - gbint@connect.carleton.ca

Gautam Akiwate - gakiwate@connect.carleton.ca


Some Notes from the Video

Tracking of privacy-sensitive data through Dynamic Taint Analysis (aka Taint Tracking). The trick is to mark private data as it is sourced, and then follow those marks until (unless) they leave the phone.

Android phones run Java apps, which are compiled into DEX (Dalvik Executable) bytecode and then run on top of the Dalvik VM. TaintDroid modifies this VM so that it can store and track taint tags.

Taint sources

  • Low-bandwidth sensors
    • Location
    • Accelerometer
  • High-bandwidth sensors
    • Mic
    • Camera
  • Information DB
    • Address book
    • SMS storage
  • Device ID
    • IMEI
    • IMSI (not actually tracked, because it caused false positives)
    • ICC-ID (SIM card identifier)
    • Phone Number
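As we understand it, TaintDroid stores each tag as a 32-bit vector with one bit per source, so a single value can carry several sources at once. Here is a minimal sketch of that encoding, assuming one bit per source listed above. The `TaintSource` names, bit positions, and the `sources_in` helper are hypothetical and are not TaintDroid's own constants.

```python
from enum import IntFlag


class TaintSource(IntFlag):
    """One bit per taint source (hypothetical names and bit positions)."""
    LOCATION = 1 << 0
    ACCELEROMETER = 1 << 1
    MIC = 1 << 2
    CAMERA = 1 << 3
    ADDRESS_BOOK = 1 << 4
    SMS = 1 << 5
    IMEI = 1 << 6
    ICC_ID = 1 << 7
    PHONE_NUMBER = 1 << 8
    # IMSI is deliberately left out, matching the note that it is not tracked.


CLEAN = TaintSource(0)  # no bits set: untainted


def sources_in(tag: TaintSource) -> list[str]:
    """Decode a tag back into the names of the sources it carries."""
    return [src.name for src in TaintSource if src & tag]


# A value derived from both the location and the IMEI carries both bits.
tag = TaintSource.LOCATION | TaintSource.IMEI
print(sources_in(tag))    # ['LOCATION', 'IMEI']
print(sources_in(CLEAN))  # []
```

Using bits instead of a single label means combining two tagged values is a single OR, and the result still records every source involved.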

Taint sink (where marked data can leave the phone)

  • Network Taint Sink
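To see sources and the sink working together, here is a minimal end-to-end sketch: data is marked when it is read, the mark rides along as the data is combined, and it is checked when the data is about to leave. It is written as a Python wrapper type rather than inside a VM. `Tainted`, `read_imei`, `concat`, and `network_send` are hypothetical names, and the IMEI string is a placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tainted:
    """Hypothetical wrapper pairing a value with the set of sources it came from."""
    value: str
    labels: frozenset = frozenset()


def read_imei() -> Tainted:
    """Hypothetical taint source: the value is marked the moment it is read."""
    return Tainted("000000000000000", frozenset({"IMEI"}))


def concat(a: Tainted, b: Tainted) -> Tainted:
    """Explicit flow: the result carries the labels of both inputs."""
    return Tainted(a.value + b.value, a.labels | b.labels)


alerts: list[str] = []


def network_send(host: str, data: Tainted) -> None:
    """Hypothetical network taint sink: report tainted data before it leaves."""
    if data.labels:
        alerts.append(f"{host}: {sorted(data.labels)}")
    # A real sink would now hand data.value to a socket; this sketch stops here.


payload = concat(Tainted("id="), read_imei())
network_send("ads.example.com", payload)        # flagged: carries IMEI
network_send("ads.example.com", Tainted("hi"))  # untainted, not flagged
print(alerts)  # ["ads.example.com: ['IMEI']"]
```

The check happens only at the sink. This matches the point of the tool: it reports that sensitive data left the phone, not whether that was a privacy violation.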

Taint propagation

  • ???

Taint tags are stored in memory interleaved with the variables they track, so each tag sits right next to its variable.
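Here is a minimal sketch of interleaved storage for a register frame. It assumes a layout in which register vN's value sits at slot 2N and its tag at slot 2N+1; this is our reading of the "interleaved" design, and the `Frame` class and its methods are hypothetical.

```python
class Frame:
    """Register frame where every value slot is immediately followed by its tag slot.

    Hypothetical layout: register vN's value is at slots[2*N], its tag at slots[2*N + 1].
    """

    def __init__(self, num_registers: int) -> None:
        self.slots = [0] * (2 * num_registers)

    def read(self, reg: int) -> int:
        return self.slots[2 * reg]

    def read_tag(self, reg: int) -> int:
        return self.slots[2 * reg + 1]

    def write(self, reg: int, value: int, tag: int = 0) -> None:
        self.slots[2 * reg] = value
        self.slots[2 * reg + 1] = tag


frame = Frame(num_registers=3)
frame.write(1, 42, tag=0b1)  # v1 = 42, tagged with source bit 0
print(frame.slots)           # [0, 0, 42, 1, 0, 0]
```

Keeping the tag adjacent means an instruction that touches a register finds its tag at a fixed offset, with no separate lookup table. That helps keep the tracking cheap enough to run in real time.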

A standard data-flow technique propagates these tags: when a tainted variable is assigned to another variable, the destination inherits the tag and must now be tracked as well.
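Here is a minimal sketch of these propagation rules for three kinds of instruction: a constant clears the tag, a move copies it, and a binary operation takes the union of both operand tags. Tags are modelled as integer bit vectors, so the union is a bitwise OR. The register dictionary, the instruction functions, and the tag bits are hypothetical.

```python
import operator
from typing import Callable

# Each register holds (value, tag); a tag is an int bit vector, 0 = untainted.
Regs = dict[str, tuple[int, int]]


def const(regs: Regs, dst: str, value: int) -> None:
    """dst = literal: constants carry no taint, so the tag is cleared."""
    regs[dst] = (value, 0)


def move(regs: Regs, dst: str, src: str) -> None:
    """dst = src: the destination inherits the source's tag."""
    regs[dst] = regs[src]


def binop(regs: Regs, dst: str, a: str, b: str,
          op: Callable[[int, int], int]) -> None:
    """dst = a op b: the result's tag is the union (bitwise OR) of both tags."""
    (va, ta), (vb, tb) = regs[a], regs[b]
    regs[dst] = (op(va, vb), ta | tb)


LOCATION, IMEI = 0b01, 0b10  # hypothetical tag bits
regs: Regs = {"lat": (45, LOCATION), "id": (123, IMEI)}
const(regs, "x", 7)
move(regs, "y", "lat")                     # y now needs tracking too
binop(regs, "z", "y", "id", operator.add)  # z carries LOCATION | IMEI
print(regs["x"], regs["y"], regs["z"])     # (7, 0) (45, 1) (168, 3)
```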

TaintDroid tracks explicit flows of data, not implicit ones.

  • Implicit flows arise through conditionals in the code; tracking them would need a "taint scope", i.e. tainting whatever is assigned inside a branch whose condition depends on tainted data
  • Fully capturing implicit flows requires static analysis, which is hard with closed-source apps and cannot be done in real time
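Here is a minimal sketch of why explicit-only tracking misses implicit flows, and how an app could launder tags this way. The function rebuilds a tainted byte bit by bit using only branches and untainted constants. Under the explicit propagation rule the copy ends up clean. The function name, its `scope_tag` parameter, and the simple way the "taint scope" is modelled are hypothetical.

```python
CLEAN = 0


def copy_via_branches(secret: int, scope_tag: int = CLEAN, width: int = 8) -> tuple[int, int]:
    """Rebuild `secret` bit by bit, writing only untainted constants.

    `scope_tag` models a simple "taint scope": the condition's tag is applied to
    everything assigned inside the branch. Explicit-only tracking is scope_tag = CLEAN.
    """
    out, out_tag = 0, CLEAN
    for i in range(width):
        if (secret >> i) & 1:                # the branch depends on the secret...
            bit, bit_tag = 1 << i, CLEAN     # ...but the value written is a constant
            out = out | bit
            out_tag = out_tag | bit_tag | scope_tag  # explicit rule, plus scope if modelled
    return out, out_tag


secret, secret_tag = 0b1011_0010, 0b1  # a tainted byte (hypothetical tag bit)
print(copy_via_branches(secret))              # (178, 0): value copied, tag lost
print(copy_via_branches(secret, secret_tag))  # (178, 1): taint scope keeps the tag
```

The second call shows what tracking a taint scope would buy. It also shows the cost: every branch on tainted data would spread its tag, which is one source of the overhead and false positives that a lightweight, real-time tracker tries to avoid.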


Performance

The goal is a real-time tracking system, so TaintDroid's (TD's) performance impact matters.

  • CPU overhead: 14%
  • Memory overhead: 4.4%

Macrobenchmarks (to get a feel for how usable the phone is with TD running):

  • App load: 3% (2 ms)
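To see what the two app-load figures imply together, define overhead as the relative slowdown. Here $T_{\mathrm{TD}}$ and $T_{\mathrm{base}}$ (load time with and without TD) are our own symbols. Reading the 2 ms as the absolute increase and taking the rounded 3% at face value gives a rough baseline load time:

```latex
\text{overhead} = \frac{T_{\mathrm{TD}} - T_{\mathrm{base}}}{T_{\mathrm{base}}}
\quad\Longrightarrow\quad
T_{\mathrm{base}} = \frac{T_{\mathrm{TD}} - T_{\mathrm{base}}}{\text{overhead}}
\approx \frac{2\ \mathrm{ms}}{0.03} \approx 67\ \mathrm{ms}
```

Both inputs are rounded, so this is only a ballpark figure. It does suggest that the slowdown at app launch is on the order of a couple of milliseconds, which a user is unlikely to notice.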


Findings

20 of the 30 tested applications shared data in ways the user would not expect.

67 of 105 flagged pieces of data leaving the device had no obviously legitimate purpose (verified by the authors).

Many apps sent location data and other unique identifiers to advertising servers.

Most apps did not tell the user about any of this.


Limitations

Tracks only explicit data flows.

An application *could* launder the tags off the data if its developers really wanted to hide this sort of thing from TaintDroid.

There are countermeasures, but they conflict with the goal of a lightweight, real-time tracking system. TD is not really about catching truly malicious programs, but about exposing programs that leak information.


Why do apps take this information?

  • Lazy: in the demo video, the wallpaper app seems to use the IMEI simply as a ready-made unique ID
  • Overzealous: the developer might think they *need* the data for something, but actually don't
  • Ads: advertisers do seem a little presumptuous in their data collection
  • Spying: by bosses or spouses
  • Malicious


QA Period

Q: How do we prevent a malicious app from removing a taint attribute on a file?
A: TD operates at too low a level for this to be a problem; TD assumes that the native code is trusted.


Q: It seems like you had a lot of false positives.
A: The point of this tool was to identify privacy-sensitive information as having left the phone, not whether or not a privacy violation has taken place.


Q: Now that TD is released, couldn't malicious apps use some of the methods described in the paper to get around it?
A: Well, yes, but it is not just about maliciousness; it could just be laziness or overzealous ad stuff.