Talk:COMP 3000 Essay 2 2010 Question 8: Difference between revisions

From Soma-notes
Relevant Sources

  • Newsome, J., and Song, D. Dynamic Taint Analysis for Automatic Detection, Analysis, and Signature Generation of Exploits on Commodity Software. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.83.2141&rep=rep1&type=pdf

Seems to be THE Dynamic Taint Analysis paper. Talks about implementation on TaintCheck. Could also be useful for the critique section. -[Gautam]

Latest revision as of 06:09, 2 December 2010

Group Members

Trevor Bonesaw Malone - tmalone@connect.carleton.ca //FIRST POST!

Qi Zhang - qzhang13@connect.carleton.ca

Gregory Bint - gbint@connect.carleton.ca

Gautam Akiwate - gakiwate@connect.carleton.ca

Corey Ling - cling@connect.carleton.ca

Sarah Liske


Work Plan

As Trevor intimated, we should have a clear division of work going forward. This is sort of the breakdown as I see it. Please edit as you think of new ideas!

  • Background Concepts
    • Information Flow Theory. (Implicit and Explicit Flows.) --Done[--Gautam 03:54, 28 November 2010 (UTC)]
    • What is dynamic taint analysis --Done[--Gautam 05:07, 28 November 2010 (UTC)]
    • What is the difference between dynamic and static analysis --Done[--Gautam 03:54, 30 November 2010 (UTC)]
  • Research Problem
    • How do we build a DTA engine for a phone? - done, but by who?
    • Why do we want to? (information misuse) - done, but by who?
  • Contribution
    • How did they implement their DTA engine (Done: --Cling 04:50, 26 November 2010 (UTC))
    • What did they find about information misuse (Done: --Cling 04:50, 26 November 2010 (UTC))
    • Compared to the existing taint tracking approaches. Zhangqi 07:11, 27 November 2010 (UTC)
    • (What else should be in the contributions? Anything need fleshing out?) (Working on that now :) ) sliske
  • Critique
    • Added two paragraphs at the end of the present critique. Please incorporate it into your content as you deem fit.--Gautam 09:07, 30 November 2010 (UTC)
    • ^ done. Fleshed out the critique, and added a bit about how TaintDroid doesn't track implicit flows. Also reworded (the entire essay) for clarity where necessary and checked spelling. It would be a good idea for everyone to read it over once for spelling/clarity before Thursday, just in case something doesn't make sense - sliske
  • References
    • The article has 61 references! We can probably use some of them
    • whee! reading papers and sticking in information as need be.
    • References added, and citations -taken care of- were removed/reworked, as it says in the assignment guidelines they're not allowed. Will go over and fill in a few places where information may be lacking after class - sliske
    • Referencing is a little askew. The numbers don't match the papers as listed in the referencing. Also the papers are usually cited with a number and enclosed in "[]"
    • thanks for giving the paper a read over/noticing that :)

List of information we need to find external sources for:

  • History of taint analysis
  • History of privacy research relating to smart phones

Work In Progress

Log what you are working on *right now* so that other people don't try to do the same thing. Make sure to clear your name from here when you are done.

  • Gregory Bint: Research Problem
  • Gautam Akiwate: Background Concepts
    • Any resources on Dynamic Taint Analysis would be appreciated!
  • Qi Zhang, Corey Ling: Contributions
  • Trevor Malone: Critique
  • Sarah Liske: References and Questions, Clarity/Spelling.

Some Notes from the Video

Tracking of privacy-sensitive data through Dynamic Taint Analysis (aka Taint Tracking). The trick is to mark private data as it is sourced, and then follow those marks until (unless) they leave the phone.

Android phones run Java apps, which are compiled into DEX, and then run on top of the Dalvik VM. It is this VM that we modify so that we can support the storage and tracking of taint tags.

Taint sources

  • Low-bandwidth sensors
    • Location
    • Accelerometer
  • High-bandwidth sensors
    • Mic
    • Camera
  • Information DB
    • Address book
    • SMS storage
  • Device ID
    • IMEI
    • IMSI (don't actually track this one because of false positives)
    • ICC_ID
    • Phone Number
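One way to picture the source list above is as a bit vector, with one bit per source, so a single integer tag can say which kinds of private data a value came from. This is only a sketch: the names and bit positions are made up for illustration and are not taken from TaintDroid's code. IMSI is left out because the notes say it is not tracked.

```python
# Hypothetical source names, one bit each (positions are arbitrary).
SOURCES = [
    "LOCATION", "ACCELEROMETER",   # low-bandwidth sensors
    "MIC", "CAMERA",               # high-bandwidth sensors
    "ADDRESS_BOOK", "SMS",         # information databases
    "IMEI", "ICC_ID", "PHONE_NUMBER",  # device identifiers
]
TAINT = {name: 1 << i for i, name in enumerate(SOURCES)}
TAINT_CLEAR = 0  # no bits set means the value is untainted


def describe(tag):
    """Return the names of every source whose bit is set in the tag."""
    return [name for name in SOURCES if tag & TAINT[name]]


# A value derived from both the location and the IMEI carries both bits.
combined = TAINT["LOCATION"] | TAINT["IMEI"]
print(describe(combined))
```

Combining tags with bitwise OR means a value derived from several sources remembers all of them.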

Taint sink (where marked data can leave the phone)

  • Network Taint Sink
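To make the idea of a sink concrete, here is a rough sketch of a network send routine that checks the taint tag of whatever it is asked to transmit and records a leak if the tag is non-zero. The `Tainted` wrapper, the `network_send` function and the log format are all hypothetical; the real system hooks the platform's network interface rather than wrapping values like this.

```python
class Tainted:
    """A value paired with an integer taint tag (0 means untainted)."""

    def __init__(self, value, tag=0):
        self.value = value
        self.tag = tag


def network_send(payload, destination, leak_log):
    """Hypothetical network sink: log tainted payloads, then 'send' them."""
    tag = payload.tag if isinstance(payload, Tainted) else 0
    if tag != 0:
        # Marked data is about to leave the phone: record where it goes.
        leak_log.append({"destination": destination, "tag": tag})
    value = payload.value if isinstance(payload, Tainted) else payload
    return str(value).encode("utf-8")  # stand-in for the bytes on the wire


log = []
network_send(Tainted("356938035643809", tag=0b1), "ads.example.com", log)
network_send("hello", "example.com", log)  # untainted, so nothing is logged
print(log)
```

Note that the sink only reports; it does not block the transmission, which matches the notes' point that the tool identifies data leaving the phone rather than judging it.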

Taint propagation

  • ???

Taint tags are stored in memory interleaved with the variables they are tracking.
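A toy picture of the interleaved layout: each register slot is immediately followed by its tag, so the storage looks like value, tag, value, tag, and so on. The class below is a hypothetical illustration of that layout only, not the modified Dalvik interpreter.

```python
class InterleavedFrame:
    """Register storage laid out as [value0, tag0, value1, tag1, ...]."""

    def __init__(self, n_registers):
        self.slots = [0] * (2 * n_registers)

    def set(self, reg, value, tag=0):
        self.slots[2 * reg] = value      # the variable itself
        self.slots[2 * reg + 1] = tag    # its taint tag sits right next to it

    def get_value(self, reg):
        return self.slots[2 * reg]

    def get_tag(self, reg):
        return self.slots[2 * reg + 1]


frame = InterleavedFrame(3)
frame.set(1, 42, tag=0b100)
print(frame.slots)
```

Keeping the tag adjacent to the value means both can be found from the same register index, with no separate lookup table.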

A standard data-flow technique is used to propagate these tags: when a marked variable is assigned to another variable, the destination variable needs to be tracked as well.
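The propagation rule can be sketched roughly as follows: a copy keeps the source's tag, and an operation on two values gives the result the union of both tags. The `Tainted` class and helper names are hypothetical, and only addition is shown.

```python
def _unwrap(x):
    """Return (value, tag) for tainted values, or (x, 0) for plain ones."""
    if isinstance(x, Tainted):
        return x.value, x.tag
    return x, 0


class Tainted:
    """A value paired with an integer taint tag (0 means untainted)."""

    def __init__(self, value, tag=0):
        self.value = value
        self.tag = tag

    def __add__(self, other):
        other_value, other_tag = _unwrap(other)
        # The result depends on both operands, so it inherits both tags.
        return Tainted(self.value + other_value, self.tag | other_tag)


def assign(src):
    """Model 'dst = src': the destination takes the source's tag."""
    return Tainted(src.value, src.tag)


a = Tainted(5, tag=0b01)
b = Tainted(7, tag=0b10)
c = a + b          # tainted by both sources
d = assign(c) + 1  # adding an untainted constant does not clear the tag
print(c.value, bin(c.tag), d.value, bin(d.tag))
```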

Tracks explicit flows of data, not implicit ones. To fully capture implicit flows, you need to do static analysis, which is hard with closed-source apps and cannot be done in real time.

Implicit flows are not tracked

  • Implicit flows can involve "taint-scope", tracking based on conditionals in code
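A small example of why the explicit/implicit distinction matters. Both functions below return the same value as the secret, but the second one never assigns tainted data directly: it only branches on it and writes untainted constants. A tracker that follows explicit flows alone would hand back a clean tag for it. The functions are hypothetical and track the tag by hand to keep the sketch short.

```python
def copy_explicit(secret_value, secret_tag):
    """Direct assignment: the tag travels with the value."""
    copied_value, copied_tag = secret_value, secret_tag
    return copied_value, copied_tag


def copy_implicit(secret_value, secret_tag):
    """Rebuild an 8-bit secret using only branches on it."""
    leaked = 0
    for bit in range(8):
        if secret_value & (1 << bit):  # control flow depends on the secret
            leaked |= 1 << bit         # but only a constant is assigned
    # Explicit-only tracking sees no tainted assignment, so the tag is 0.
    return leaked, 0


print(copy_explicit(0xA5, 0b1))
print(copy_implicit(0xA5, 0b1))
```

Catching the second case would mean tainting everything assigned inside a branch whose condition is tainted, which is the "taint-scope" idea mentioned above.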


Performance

The goal is to create a real-time tracking system, so TaintDroid's performance impact is of some importance.

14% CPU overhead; 4.4% memory overhead.

Macro benchmarks (to get a feel for what the phone's usability is like with TD running)

  • App load: 3% (2ms)
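As a quick sanity check on how a figure like "3% (2ms)" fits together, the relative overhead is the extra time divided by the baseline time. The helpers below are a hypothetical sketch; since both inputs in the notes are rounded, the implied baseline is only a rough estimate.

```python
def overhead_percent(baseline, instrumented):
    """Relative overhead of the instrumented run, as a percentage."""
    return 100.0 * (instrumented - baseline) / baseline


def implied_baseline(delta, percent):
    """Baseline that would make an absolute delta equal the given percent."""
    return delta / (percent / 100.0)


# A 2 ms slowdown that amounts to about 3% implies a baseline in the
# neighbourhood of 67 ms (approximate, because the inputs are rounded).
print(round(implied_baseline(2.0, 3.0), 1))
```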


Findings

20 out of 30 tested applications share data in a way that is not expected.

68 of 105 flagged pieces of data leaving the device had no obviously legitimate purpose (verified by the authors).

Many apps sent location data and other unique identifiers to advertising servers.

Most apps do not mention anything to the user.


Limitations

Tracks only explicit data flows.

An application *could* launder the tags off of the data, if they really wanted to hide this sort of thing from TaintDroid.

There are methods that could be used to protect against this, but they go against the goal of a light-weight, real-time tracking system. TD is not necessarily about catching truly malicious programs, but rather just those that leak information.


Why do apps take this information?

  • Lazy; in the demo video, the wallpaper app seems to use the IMEI just as a ready made unique ID
  • Overzealous; the developer might think they *need* the data for something, but actually don't
  • Ads; advertisers do seem a little presumptuous in their data collection
  • Spying; bosses or spouses
  • Malicious;


QA Period

Q: how do we prevent a malicious app from removing a taint attribute on a file

A: TD operates at too low a level for this to be a problem; TD assumes that the native code is trusted.


Q: It seems like you had a lot of false positives

A: The point of this tool was to identify privacy sensitive information as having left the phone, not whether or not a privacy violation has taken place.


Q: Now that TD is released; couldn't malicious apps use some of the methods described in the paper to get around it?

A: Well, yes, but it is not just about maliciousness; it could just be laziness or over-zealous ad stuff.

Other Information

Hey guys, thought I would just post a generalized paragraph about our essay.

In today’s society, smartphones are the new big thing. To me that’s what makes this paper so interesting. This paper focuses on private information on Android phones and the misuse of this information. The private information at risk includes SIM card identifiers, the device ID, and the phone number. TaintDroid is an efficient taint tracking and analysis system for smartphones. It can track sensitive data from multiple sources and identify potential misuse of such data. In their study of 30 popular third-party applications, TaintDroid found 68 instances of potential misuse of users’ private data across 20 applications. This tool is great for knowing which applications are safe and which are not, so your private data can remain private.

Also, we should really think of splitting up the work in some way. If some people have specific sections they would like to do, let's figure that out now so we can divide the workload and get it done over the next couple of days. I don't personally care what part I'm going to have to do, so let's get this going. Any other information people wanna post, feel free; the more the better, even if we don't end up using it.

Trevor Malone

Hey guys! Anything else we need to get done? Let me know and I can help in any way possible.

Trevor Malone
