Distributed OS: Winter 2011

Evaluation

Grades in this class will be determined based on the following criteria.

Undergraduate Students:

20% Class participation
20% Wiki participation
10% Group project oral presentation (April 5th in class)
30% Group project written report (Due April 11th)
20% Implementation report (Due March 1st)

Graduate Students:

15% Class participation
20% Wiki participation
10% Group project oral presentation (April 5th in class)
30% Group project written report (Due April 11th)
25% Literature review paper (Due March 1st)

Proposals for Implementation reports & Literature reviews should be emailed to Prof. Somayaji by February 1st.

Using the Wiki

All of the standard Mediawiki functions are available on this wiki in addition to the following extensions:

Cite: for easier references/endnotes
GraphViz: for inline graph drawing
SyntaxHighlight: for source code syntax highlighting (be sure to use the "source" tag)

Implementation reports (undergrads)

An implementation report is a 5-10 page paper that either

describes in detail one existing software system with distributed OS-like properties,
compares and contrasts an important characteristic of 3 or more software systems with distributed OS-like properties, or
reports on experiences setting up and using a software system with distributed OS-like properties.

Topics for an implementation report must be approved by Prof. Somayaji.

Implementation reports for Winter 2011:

Students: please add your report above following the template.

Literature review paper (graduate students)

The literature review paper should be an 8-12 page paper that reviews research and well-known commercial work in an area of distributed operating systems research or a closely related area.

Literature Review papers for Winter 2011:

Students: please add your paper above.

Group Projects

Observability & Contracts: How do I observe the acts of other agents, particularly "public" acts? How can we make contracts between computers (promises to exchange actions in the present for actions in the future)?
Attribution: How do we know who did what?
Reputation: How do we remember and disseminate knowledge of past actions?
Justice: Given that we can gather evidence of misbehavior, how can that evidence be assembled, judged, and the resulting decision enforced?
Public Goods: How can we build and maintain public goods (e.g., indices, caches)?

Readings

January 13, 2011

CCR (two papers)

Optional readings:

O'Donnell (2009), Prolog to A Survey of BGP Security Issues and Solutions
Butler et al. (2009), A Survey of BGP Security Issues and Solutions

February 10, 2011

Savage et al. (2000), Practical Network Support For IP Traceback.

February 15, 2011

Satyanarayanan et al. (1990), Coda: a highly available file system for a distributed workstation environment.
Ghemawat et al. (2003), The Google File System.

February 17, 2011

Weil et al. (2006), Ceph: A Scalable, High-Performance Distributed File System.

March 1, 2011

Oda et al. (2008), SOMA: Mutual Approval for Included Content in Web Pages.
Oda & Somayaji (2008), Content Provider Conflict on the Modern Web.

Problems to Solve

Attack computers with almost no consequences
- DDoS
- botnets
- capture and analyze private traffic
- distribute malware
- tampering with traffic
- Unauthorized access to data and resources
- Impersonate computers, individuals, applications
- Fraud, theft
- regulate behavior

Design Principles

subjects of governance: programs and computers
bind programs and computers to humans & human organizations, but recognize binding is imperfect
recognize that "bad" behavior is always possible. "good" behavior is enforced through incentives and sanctions.
rules will change. Even rules for rule changes will change. Need a "living document" governing how rules are chosen and enforced.

Scenarios

1: Stopping DDoS

Group members: Seyyed, Andrew Schoenrock, Thomas McMahon, Lester Mundt, AbdelRahman, Rakhim Davletkaliyev

Have the machine routing packets (possibly the ISP) detect suspicious packets. If the packets are signed, the suspicious ones could be blocked and the sender put on a blacklist.

(AS) Stopping DDoS against files, services, programs, etc
- (AS) Have file replication built into the system (similar to OceanStore) so that files are always available from different servers
- (AS) If files are not replicated then we could have a tiered messaging system (at the top level would be OS messages) and servers could then prioritize the incoming traffic. If a given server is experiencing an overload, it could send out a distress signal to its neighbours and then distribute what it has to them. The system should have a built-in mechanism to re-balance the overall load after something like this happens. This would then mean that any DDoS attack would result in the service being more available.
  - I like this idea of having service failover.
  - Expanding on the idea of file replication and sending distress signals to its neighbours, I could envision a group of servers that would learn to help each other out, lending processing and storage when they are underutilized. They would sort of form a collective, club or gang. Members who didn't contribute (always fully utilized) would eventually be identified and banned. It would be these other computers that the targeted server would rely on for help in this situation. However cool this is, it isn't really a solution, because one could suppose the attackers might use the same strategy to recruit additional help in their attack.
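
The distress-signal idea above can be sketched roughly as follows. This is only an illustration: the `Server` class, its fields, and the rule "shed the lowest-priority work to the neighbour with the most spare capacity" are assumptions made for the example, not part of the proposal.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical server that sheds load to neighbours when overloaded."""
    name: str
    capacity: int  # number of queued requests it can hold comfortably
    queue: list = field(default_factory=list)       # (priority, request); 0 = OS-level
    neighbours: list = field(default_factory=list)  # other Server objects

    def overloaded(self):
        return len(self.queue) > self.capacity

    def spare(self):
        return max(0, self.capacity - len(self.queue))

    def accept(self, priority, request):
        self.queue.append((priority, request))
        if self.overloaded():
            self.send_distress()

    def send_distress(self):
        # Keep the highest-priority work (lowest number) locally, shed the rest.
        self.queue.sort(key=lambda item: item[0])
        while self.overloaded():
            helper = max(self.neighbours, key=lambda s: s.spare(), default=None)
            if helper is None or helper.spare() == 0:
                break  # no neighbour can help right now
            helper.queue.append(self.queue.pop())


a = Server("a", capacity=2)
b = Server("b", capacity=5)
a.neighbours.append(b)
for priority, request in [(0, "os"), (2, "bulk1"), (1, "web"), (2, "bulk2")]:
    a.accept(priority, request)
```

In this run server `a` keeps the OS-level and web requests and hands both bulk requests to `b`. A real system would also need the later re-balancing step mentioned above, which this sketch leaves out.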

(AS) Stopping DDoS against specific machines
- (AS) I don't think that this should be specifically addressed. I think measures introduced to guard against this will ultimately negatively impact the overall system in terms of performance.
  - I don't like the idea of sacrificing the one for the many though.
    - (AS) The main thing with what I've proposed is that the motivation behind doing a DDoS attack is completely gone (by doing one, a service would either maintain or increase its overall availability). I think that eliminating the main result of a DDoS attack would mean that there would be no reason to guard against DDoS attacks on a specific machine.

Stopping DDoS
- Many DDoS attacks rely on anonymity. These services serve anyone who requests their service. Many DDoS attacks then generate enough traffic that the computer behind the service can no longer cope. If we remove anonymity and only serve 'known' parties, the spurious requests would be ignored. So we need to 'know' who our friends are.
  - This of course requires a form of unspoofable authentication unlike IP.
    - (RD) Serving only 'known' parties reduces the distribution of information, or at least its rate. I was thinking of removing anonymity on a lower level, so that any party that's not anonymous while sending a packet to your machine is considered 'known', and anything unknown (unsigned, unrepresented in some way) is blocked. So, we don't really need to 'know' who our friends are, we just need to know who aren't.
    - (RD) Another thing I had in mind is punishment in case a 'known' party participates in a DDoS attack: not punishing the owner of that machine (who probably is a victim as well), but the software or hardware in some sense.

Stopping DDoS
- (RD) How about developing a network topology and protocols that make DDoS attacks less efficient or harder to perform? Some sort of CAPTCHA, but for machines and protocols, to distinguish them from bots, maybe?

Stopping DDoS
- I'm not sure what is meant by stopping. I don't think we can stop DDoS given the way things are currently run; we can only block it. To my knowledge, most software that stops DDoS does so by blocking, or even by complete shutdown, as with McColo.

Stopping DDoS
- One method is to reuse the approach used against DoS, rejecting requests beyond a specific rate, but apply it to requests arriving from many unrelated sources.
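
As a rough illustration of rejecting requests beyond a rate regardless of which source they come from, here is a minimal token-bucket sketch. The token bucket is one common rate-limiting technique chosen for the example; the class name, parameters, and the choice to pass the clock in as `now` are assumptions.

```python
class TokenBucket:
    """Aggregate rate limiter: one bucket for the whole service, not per source."""

    def __init__(self, rate, burst):
        self.rate = rate            # tokens added per second
        self.burst = burst          # maximum tokens the bucket can hold
        self.tokens = float(burst)  # start full
        self.last = 0.0             # time of the previous request

    def allow(self, now):
        """Return True if a request arriving at time `now` (seconds) is served."""
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(rate=1, burst=2)
decisions = [bucket.allow(0.0), bucket.allow(0.0), bucket.allow(0.0), bucket.allow(1.0)]
```

Because the bucket is shared, a flood spread over many sources is still capped, at the cost of also rejecting legitimate requests that arrive during the flood.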

One way to stop DDoS would be to have each connection to the internet assigned to a particular identity. This identity would be used to verify who is attempting connections. The reason DDoS works is that, currently, IP addresses can be spoofed. The only way to verify an identity is to request a response, but by then the damage is done. With a verified identity, connection attempts can be verified while they are being routed, so that the request may not necessarily even reach the destination host.

Basically, we need some encryption system using keys so that as the packets are being routed, the identity of the packet's sender can be verified. Ideally the decryption would be trivial so as to prevent noticeable latency. Because identities are verified, spoofed packets would be dropped during routing. If all the identities are verified and the senders still attempt a DDoS attack, the attack can be traced back to the attacker.
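
A minimal sketch of checking a sender's identity at the router might look like the following. It uses an HMAC with a shared secret per identity purely for brevity; that is an assumption of the example, and the proposal above would more plausibly use public-key signatures so that routers never hold sender secrets. The `KEYS` registry and function names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical registry mapping each verified identity to its secret key.
KEYS = {"alice": b"alice-secret"}


def sign(sender, payload, key):
    """Compute the tag a sender attaches to a packet."""
    return hmac.new(key, sender.encode() + b"|" + payload, hashlib.sha256).digest()


def router_accepts(sender, payload, tag):
    """Forward the packet only if the tag matches the claimed identity."""
    key = KEYS.get(sender)
    if key is None:
        return False  # unknown identity: drop during routing
    expected = sign(sender, payload, key)
    return hmac.compare_digest(expected, tag)


good_tag = sign("alice", b"hello", KEYS["alice"])
forged_tag = sign("alice", b"hello", b"wrong-key")
```

Here `router_accepts("alice", b"hello", good_tag)` passes, while the forged tag and any packet claiming an unregistered identity are dropped before reaching the destination.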

(RD) (I think we're not looking low enough. We're trying to find a solution for this problem assuming the system that made that problem possible is still unchanged. We enforce more security by identification, encryption, etc., but the system is still problem-prone. This will allow us to identify an attacker, but only after the attack has started (or even finished). It's like trying to eliminate theft from a society of poor, unemployed, uneducated people by enforcing more security and punishment, which will help to reduce the rate and motivation, but can't stop the possible attack. It is a pretty stupid analogy, but rather than policing that society, I want to make them rich, employed and educated, so that theft is just not an efficient way of getting goods for them. So, rather than protecting machines from attacks, I want to make a system where DDoS attacks are just inappropriate.)

2: Stopping phishing

Group members: Waheed Ahmed, Nicolas Lessard, Raghad Al-Awwad, Tarjit Komal

A way of automatically checking the signature of a message to make sure it really is from a trusted source.
- ie: "Nation of Banks, did your member TD send me a message to reset my password?"

There should be filters to verify where the message is coming from. If the message is coming from an unknown source, it should be blocked.
Don't use the links in an email to get to any web page if you suspect the message might not be authentic.
Avoid filling out forms in email messages that ask for personal financial information. Phishers can make exact copies of the forms you would find on a financial institution's site.
Make it so a machine needs to be authorized to use your information -- a machine that you don't own can't use your information to do anything, regardless of whether it has it or not.
Ensure that any website that requires the filling of personal information be a secure website which can be traced to the original organisation.
Ensure that whatever browser you are using is up to date with the most recent security patches applied.
Obviously, report any suspected phishing to the appropriate authorities so that proper action can be taken.
"three strikes and you're out"
- Each machine is responsible for the messages it releases. When a machine is a repeat offender, it loses access privileges.
Revamp the security login process to something similar to:
- User enters username and clicks next.
- Server returns a user predefined image to the User.
- If image is the right image then user enters password to logon.
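
The "three strikes and you're out" rule from the list above could be tracked with something as simple as the sketch below. The `StrikeRegistry` name, the machine identifiers, and the default limit of three are illustrative assumptions; how offences are detected and reported is left open.

```python
from collections import Counter


class StrikeRegistry:
    """Hypothetical record of reported phishing messages per machine."""

    def __init__(self, limit=3):
        self.limit = limit
        self.strikes = Counter()

    def report(self, machine_id):
        """Record one confirmed offending message from this machine."""
        self.strikes[machine_id] += 1

    def has_access(self, machine_id):
        """A machine keeps its privileges until it reaches the strike limit."""
        return self.strikes[machine_id] < self.limit


registry = StrikeRegistry()
for _ in range(2):
    registry.report("machine-17")
access_after_two = registry.has_access("machine-17")
registry.report("machine-17")
access_after_three = registry.has_access("machine-17")
```

After two reports the machine still has access; the third report revokes it, while machines that were never reported are unaffected.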

3: Limiting the spread of malware

Group members: keith, Andrew Luczak, David Barrera, Trevor Gelowsky, Scott Lyons

(KM) Heterogeneous systems - it is much easier to write code to attack a single type of system
(KM) Individualized security policies
- (AL) A baseline security level would help prevent malware spreading to/from a system with "individual non-security"
(KM) Identify all programs through digital signatures
(KM) Peer rating system for programs, customize security policies based on peer ratings
- (SL) Need some way to keep rating system from being "gamed"
  - (AL) Maybe a program gets flagged if it experiences a rapid approval increase?
- (AL) Need to protect against benign programs with good ratings being updated into malware
(KM) System level forensics on program execution and resource/file modification
(KM) Customizable user and program blacklists
(SL) Sandboxing with breach management - know what files have been modified by a process
(SL) Trending - what does the application spend most of its time doing?

(DB) Multiple control/chokepoints where malware is looked for. This way, it's more difficult for attackers to take over several control points and for malware to remain unnoticed.
(DB) Heterogeneous systems help limit the spread of malware too. There are two points here. (1) If we're designing this system where we're all masters of our own domains, then we're likely to have different system configurations. However (2), if we want to communicate and interact with other domains, we need some standardized communication layer or mechanism. Standardization is very closely tied to homogeneity.
(DB) There should be consequences if you harbor malware or if malware originates from within your domain. This could be an incentive to help people be more proactive in terms of security.
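
The suggestion above that a program be flagged when it experiences a rapid approval increase could be checked along these lines. The window length, the multiplier, and the use of daily approval counts are all assumptions made for the sketch; a real rating system would need to tune or replace them.

```python
def flag_rapid_approval(daily_approvals, window=3, factor=5.0):
    """Flag a program whose recent approvals far exceed its earlier average.

    daily_approvals: approval counts per day, oldest first.
    window: number of most recent days treated as "recent".
    factor: how many times the earlier average counts as suspicious.
    """
    if len(daily_approvals) <= window:
        return False  # not enough history to compare against
    recent = daily_approvals[-window:]
    earlier = daily_approvals[:-window]
    baseline = sum(earlier) / len(earlier)
    recent_average = sum(recent) / window
    # Use a floor of 1.0 so brand-new programs with a near-zero baseline
    # are not flagged by a handful of approvals.
    return recent_average > factor * max(baseline, 1.0)


steady = [2, 3, 2, 3, 2, 3, 2, 3]
spiking = [2, 3, 2, 3, 2, 40, 60, 80]
```

With these numbers `flag_rapid_approval(spiking)` flags the program and `flag_rapid_approval(steady)` does not. A flag would only trigger closer inspection; a genuinely popular release can also spike.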

4: Bandwidth hogs

Group members: Mike Preston, Fahim Rahman, Michael Du Plessis, Matthew Chou, Ahmad Yafawi

limit bandwidth for each user
if user has significant bandwidth demands for a certain period of time
- add them to a watch list
- monitor their behaviour
- divert communication to other hosts that can satisfy requests.
  - if there are no other hosts that can satisfy the request, then distribute data to other idle and capable hosts. Load is now reduced on the one link.
QoS
Tiered Bandwidth Distribution
- The main idea is that the more you give back to the community, the more bandwidth your machine gets.
  - It's similar to some trackers and darknet programs, which won't increase your download speed unless you contribute X bytes back to your peers.
- Tier 1, Basic privileges i.e. all machines have minimal bandwidth.
- Tier n, we define some requirements to be met then we increase bandwidth accordingly.
  - Drop a Tier if machine doesn't maintain the specified requirements of that specific tier.
  - Advantage, monitoring bandwidth on the network is cheap while implementing what is stated above is not.
As a metaphor to our "real world society", bandwidth control can be treated as we do speed for cars.
- Certain areas need more free flowing traffic, so speed limits are increased. Others require a slower pace which is enforced. These "areas" can be translated to users or programs in our distributed OS model
- There are repercussions to breaking any of these imposed limits
- Throttling provides one possible implementation of these constraints
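
To make the tiered bandwidth idea above concrete, here is a small sketch. The tier thresholds, the bandwidth caps, and measuring contribution in bytes given back to peers are invented for illustration; the notes only say that Tier 1 is the minimum and that higher tiers have requirements.

```python
GB = 10**9

# Hypothetical tiers: (minimum bytes contributed, bandwidth cap in Mbit/s).
TIERS = [
    (0, 1.0),         # Tier 1: basic privileges for every machine
    (10 * GB, 5.0),   # Tier 2
    (100 * GB, 20.0), # Tier 3
]


def tier_for(contributed_bytes):
    """Return (tier number, bandwidth cap) for a machine's contribution."""
    tier, cap = 1, TIERS[0][1]
    for number, (required, bandwidth) in enumerate(TIERS, start=1):
        if contributed_bytes >= required:
            tier, cap = number, bandwidth
    return tier, cap


newcomer = tier_for(0)
contributor = tier_for(15 * GB)
lapsed = tier_for(5 * GB)
```

If `contributed_bytes` is measured over a recent period rather than over all time, a machine that stops contributing drops back down automatically the next time its tier is evaluated.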

Bandwidth Hog Additional Sources and Information

1. A Solution to Bandwidth Hogs in a Cable Network

Starting at page 120 of this thesis is a proposed solution to bandwidth hogs on a cable network. In general, the proposal suggests a solution essentially equal to throttling; however, I did find the description of the solution to be helpful. I feel it may go well with our tiered suggestion if we were to keep the "earned trust" approach to bandwidth access but at the same time allow users in low congestion times to go above their tier. For example, if congestion is low, why not allow the people on the network to occupy much larger bandwidths? The network would include some form of monitoring protocol which can decide how much access a user is allowed. If more bandwidth is available, let them have it if it is needed for their request. On the other hand, if congestion is high, the user will be capped at the upper limit of their bandwidth capacity if they are doing something that requires a large amount of bandwidth. In this manner each user will be guaranteed the amount they have earned at their tier; however, if they do not want to earn a higher level for high usage timeframes, they can instead opt to make use of low congestion timeframes and run their bandwidth-heavy applications at that time. The network could also include live data regarding the current bandwidth usage levels as well as trending data so that people can plan when to start bandwidth-heavy applications.
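
One way to picture the "go above your tier when congestion is low" rule is the sketch below. It is not taken from the thesis: the 50% utilisation threshold, the function name, and the decision to offer all spare capacity to the requesting user are assumptions chosen to keep the example short.

```python
def allowed_rate(tier_rate, link_capacity, other_usage, low_congestion=0.5):
    """Return the rate a user may use right now.

    tier_rate: bandwidth the user has earned at their tier.
    link_capacity: total capacity of the shared link.
    other_usage: bandwidth currently used by everyone else.
    low_congestion: utilisation below which bursting is allowed.
    All rates share one unit, e.g. Mbit/s.
    """
    utilisation = other_usage / link_capacity
    if utilisation < low_congestion:
        spare = link_capacity - other_usage
        return max(tier_rate, spare)  # burst into the unused capacity
    return tier_rate  # congested: capped at the earned tier


quiet = allowed_rate(tier_rate=5, link_capacity=100, other_usage=20)
busy = allowed_rate(tier_rate=5, link_capacity=100, other_usage=90)
```

On a quiet link the user may use up to 80 units; on a busy one they are held to the 5 units their tier guarantees.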

2. Why Flow-Completion Time is the Right Metric for Congestion Control

This is a short article which raises an interesting question related to our topic: how should we determine what is considered "bandwidth hogging"? For example, do we look at the strain on the network in some capacity (i.e. dropped packets, usage level of the capacity of the pipe, etc.), which is important information for those who build the network, or do we use the time it takes for some transaction to complete when a user requests it? This article argues that users do not care how much bandwidth they get as long as the task they are requesting is completed as quickly as possible. In our discussion in class we had talked about how the majority of people currently do not need large amounts of bandwidth for normal transactions ( email, web searching, wikis ;-) ), and a much smaller percentage of the population are the ones who actually eat up the larger bandwidth through hog-like applications. Maybe instead of focusing on the bandwidth as the main issue, we should think about how long it takes to complete tasks. Maybe our tiered system would also incorporate some aspect of this train of thought, i.e. people who only send email and surf the web are at tier 1, people who use online storage and FTP are at tier 2, people who stream movies and other data are at tier 3, etc. Then, we could have each tier cost a separate amount and apply some form of control on the technologies available at each tier so that the restrictions of a tier are adhered to.

3. Who’s Hogging The Bandwidth?: The Consequences Of Revealing The Invisible In The Home

This article is from Microsoft Research and it is an interesting look into controlling bandwidth usage by providing people with a tool to monitor the usage and alter how bandwidth is allocated. This tool essentially boils down to the social control idea that we discussed in class. If you know that your neighbours are hogging the bandwidth for very low priority issues, then should you not be able to appeal to their conscience in order to gain usage of resources you need? The article provides some examples of homes they provided this control to and how the household politics factored into the usage of the bandwidth. When usage was no longer hidden, it seems as though it became easier to openly discuss how to divide the finite amount of bandwidth. Initial concerns revolved around people just hogging the bandwidth for themselves or playing practical jokes on others in the house by reducing their usage when they were in the middle of some task. Another issue that this type of control brings up is how to prioritize which tasks are "more important". One example given was whether a Skype call to family and friends is more important than watching YouTube videos for a work-related task. Interestingly, the field studies provided some other examples of a "bandwidth etiquette" that emerged. For example, it was considered very rude to limit someone's bandwidth when he/she was on a Skype call due to the immediate and negative effect, but it was deemed acceptable to limit bandwidth during a file transfer as it just meant a few extra minutes for the transfer to complete.