Talk:COMP 3000 Essay 2 2010 Question 8

From Soma-notes


Revision as of 15:33, 22 November 2010

Group Members

Trevor Bonesaw Malone - tmalone@connect.carleton.ca //FIRST POST!

Qi Zhang - qzhang13@connect.carleton.ca

Gregory Bint - gbint@connect.carleton.ca

Gautam Akiwate - gakiwate@connect.carleton.ca


Some Notes from the Video

Tracking of privacy-sensitive data through Dynamic Taint Analysis (a.k.a. Taint Tracking). The trick is to mark private data as it is sourced, and then follow those marks until (unless) they leave the phone.

Android phones run Java apps, which are compiled into DEX and then run on top of the Dalvik VM. It is this VM that TaintDroid modifies to support the storage and tracking of taint tags.

Taint sources

  • Low-bandwidth sensors
    • Location
    • Accelerometer
  • High-bandwidth sensors
    • Mic
    • Camera
  • Information DB
    • Address book
    • SMS storage
  • Device ID
    • IMEI
    • IMSI (don't actually track this one because of false positives)
    • ICC_ID
    • Phone Number
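One way to picture the tags is as a bit mask with one bit per source, so a single value can carry marks from several sources at once. The sketch below is only an illustration: the `Taint` class, the bit positions, and the `describe` helper are made up for these notes and are not taken from TaintDroid's code.

```python
from enum import IntFlag


class Taint(IntFlag):
    """Hypothetical tag bits, one per taint source listed in the notes."""
    NONE = 0
    LOCATION = 1 << 0
    ACCELEROMETER = 1 << 1
    MIC = 1 << 2
    CAMERA = 1 << 3
    CONTACTS = 1 << 4       # address book
    SMS = 1 << 5
    IMEI = 1 << 6
    IMSI = 1 << 7
    ICC_ID = 1 << 8
    PHONE_NUMBER = 1 << 9


def describe(tag: Taint) -> list[str]:
    """Return the names of the sources whose bits are set in the tag."""
    # NONE has no bits set, so `member & tag` is falsy for it and it is skipped.
    return [member.name for member in Taint if member & tag]


# A value derived from both the location and the IMEI carries both bits.
combined = Taint.LOCATION | Taint.IMEI
print(describe(combined))
```

Combining tags with a bitwise OR keeps the per-value cost to a single integer, which fits the goal of a lightweight tracker.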

Taint sink (where marked data can leave the phone)

  • Network Taint Sink
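A rough sketch of what a network sink check could look like, assuming the sink only reports tainted data and does not block it. `NetworkSink`, its `send` method, and the example hostname are hypothetical names for illustration, not TaintDroid's API.

```python
from dataclasses import dataclass, field


@dataclass
class NetworkSink:
    """Hypothetical stand-in for the network taint sink.

    It lets every payload through but records any outgoing payload
    whose taint tag is non-zero.
    """
    alerts: list = field(default_factory=list)

    def send(self, destination: str, payload: bytes, tag: int) -> None:
        if tag != 0:
            # Marked data is about to leave the phone: log it, do not block it.
            self.alerts.append((destination, tag))
        # A real sink would hand the payload to the socket here.


sink = NetworkSink()
sink.send("ads.example.com", b"hello", tag=0)        # untainted, no alert
sink.send("ads.example.com", b"device-id", tag=64)   # tainted, recorded
print(sink.alerts)
```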

Taint propagation

  • ???

Taint tags are stored in memory interleaved with the variables they are tracking
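To make "interleaved" concrete, here is a toy layout where each register slot is followed directly by the slot holding its tag. The `Frame` class and its even/odd indexing are assumptions chosen for illustration; the actual layout inside the modified Dalvik VM may differ.

```python
class Frame:
    """Toy register frame: slots[2*i] is register i, slots[2*i + 1] is its tag."""

    def __init__(self, num_registers: int) -> None:
        # Every register gets a neighbouring tag slot, so storage doubles.
        self.slots = [0] * (2 * num_registers)

    def write(self, reg: int, value: int, tag: int = 0) -> None:
        self.slots[2 * reg] = value
        self.slots[2 * reg + 1] = tag

    def read(self, reg: int) -> tuple[int, int]:
        """Return (value, tag) for the given register."""
        return self.slots[2 * reg], self.slots[2 * reg + 1]


frame = Frame(num_registers=2)
frame.write(1, value=42, tag=0b100)
print(frame.slots)
```

Keeping a tag next to its variable means a lookup needs no separate table, at the cost of extra memory per variable.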

A standard data-flow technique is used to propagate these tags: when a marked variable is assigned to another variable, that second variable picks up the mark and needs to be tracked as well.
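The propagation idea can be sketched with two rules: a copy carries the source's tag, and an operation on two values carries the union of their tags. `Tainted`, `move`, and `add` are hypothetical names for this sketch, not operations from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tainted:
    """A value paired with a taint tag (bit mask; 0 means untainted)."""
    value: int
    tag: int = 0


def move(src: Tainted) -> Tainted:
    """Assignment: the destination inherits the source's tag."""
    return Tainted(src.value, src.tag)


def add(a: Tainted, b: Tainted) -> Tainted:
    """Binary operation: the result carries the union of both tags."""
    return Tainted(a.value + b.value, a.tag | b.tag)


location = Tainted(45, tag=0b01)   # marked at a source
offset = Tainted(3)                # ordinary untainted value
copy = move(location)              # now also marked
shifted = add(copy, offset)        # still marked after arithmetic
print(shifted)
```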

Tracks explicit flows of data, not implicit. To fully capture implicit flows, you need to do static analysis, which is hard with closed-source apps and cannot be done in real time.

Implicit flows are not tracked

  • Implicit flows can involve "taint-scope", tracking based on conditionals in code
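A small illustration of why implicit flows slip past an explicit-flow tracker: the secret only ever appears in branch conditions, and the copy is built purely from constants. The function name and the way tags are modelled here are assumptions for the sketch.

```python
def leak_via_branches(secret: int, secret_tag: int, bits: int = 8) -> tuple[int, int]:
    """Copy `secret` bit by bit using only control flow.

    Returns (copy, copy_tag) as an explicit-flow tracker would see them.
    `secret_tag` is accepted to show that it never reaches the copy.
    """
    copy, copy_tag = 0, 0
    for i in range(bits):
        if secret & (1 << i):      # the branch depends on the secret...
            copy |= 1 << i         # ...but the value assigned is a constant
            copy_tag |= 0          # so only the constant's (empty) tag propagates
    return copy, copy_tag


value, tag = leak_via_branches(secret=0xA5, secret_tag=0b1)
print(value == 0xA5, tag)
```

The copy equals the secret but its tag is empty, which is the kind of tag laundering a "taint-scope" on conditionals would be needed to catch.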


Performance

The goal is to create a real-time tracking system, so TaintDroid's performance impact matters.

14% CPU overhead, 4.4% memory overhead

Macro benchmarks (to get a feel for what the phone's usability is like with TD running)

  • App load: 3% (2ms)


Findings

20 out of 30 tested applications share data in a way that is not expected.

67 of 105 flagged pieces of data leaving the device had no obviously legitimate purpose (verified by the authors).

Many apps sent location data and other unique identifiers to advertising servers.

Most apps do not mention anything to the user.


Limitations

Tracks only explicit data flows.

An application *could* launder the tags off of the data, if it really wanted to hide this sort of thing from TaintDroid.

There are methods that could be used to protect against this, but they go against the goal of a light-weight, real-time tracking system. TD is not necessarily about catching truly malicious programs, but rather just those that leak information.


Why do apps take this information?

  • Lazy; in the demo video, the wallpaper app seems to use the IMEI just as a ready-made unique ID
  • Overzealous; the developer might think they *need* the data for something, but actually do not
  • Ads; advertisers do seem a little presumptuous in their data collection
  • Spying; bosses or spouses
  • Malicious;


QA Period

Q: How do we prevent a malicious app from removing a taint attribute on a file?

A: TD operates at too low a level for this to be a problem; TD assumes that the native code is trusted.


Q: It seems like you had a lot of false positives?

A: The point of this tool was to identify privacy sensitive information as having left the phone, not whether or not a privacy violation has taken place.


Q: Now that TD is released, couldn't malicious apps use some of the methods described in the paper to get around it?

A: Well, yes, but it is not just about maliciousness; it could just be laziness or over-zealous ad stuff.

Other Information

Hey guys, thought I would just post a generalized paragraph about our essay.

Smartphones are the new big thing, and to me that’s what makes this paper so interesting. The paper focuses on private information on Android phones and the misuse of that information; the data at risk includes SIM card identifiers, the device ID, and the phone number. TaintDroid is an efficient taint tracking and analysis system for smartphones. It can track sensitive data from multiple sources at once and flags where that data leaves the phone. In their study of 30 popular third-party applications, TaintDroid found 68 instances of potential misuse of users’ private data across 20 applications. This tool is great for knowing which applications are safe and which are not, so your private data can remain private.

Also, we should really think of splitting up the work in some way. If some people have specific sections they would like to do, let's figure that out now so we can divide the workload and get it done over the next couple of days. I don't personally care what part I end up doing, so let's get this going. If anyone has other information to post, feel free; the more the better, even if we don't end up using it.