Notes
Lecture 13: Cassandra & Dynamo
------------------------------
- haven't graded midterm, will do so by next week
- I had to submit a grant app on Monday
- will have grades on midterm before proposal is due
- you have until Nov 9th
Algorithm for making a proposal (for a lit review)
- find a paper you like/find interesting that is related to the course
- make sure it is a good one, published somewhere reputable
(being published by ACM/IEEE isn't sufficient on its own)
- preferably with reasonable number of citations
From that paper, find a related set of papers
- related to *one aspect* of the paper
- see citations (who the paper cites, who cites the paper)
- follow graph and search keywords to expand out
I prefer using a CS-standard citation format (like the papers we read use)
I prefer individual projects. If you want to do pairs it is possible -
but there has to be a clear division of responsibilities
(so assume no unless it makes sense for the topic and you ask me)
(Partners make more sense if you're building something.)
The topic has to be related to this class: it should relate to both "distributed" and "operating system"
- can't be just distributed-related or just OS-related
Try not to start with papers we've covered in class
- branch out, search!
- use ones related to your other interests
You should look for patterns among the papers you identify as being related. Your paper's argument should show that the pattern exists and how it connects the papers you found.
- don't just list summaries of papers, that's not a lit review
On to the papers
relational databases were inspired by the needs of airline reservation systems
dynamo was inspired by the needs of e-commerce shopping carts
- so writes (a user selecting something to buy) always get saved
other systems will refuse writes when they'll make the system inconsistent
- so client has to retry
dynamo always accepts writes, even when they'll make things inconsistent
- so how do they deal with inconsistency?
- conflicts are resolved when data is read
- use application-specific semantics to reconcile conflicts in the client
(like with source code version control - the programmer
has to manually figure out how to reconcile conflicts)
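
A minimal sketch of read-time reconciliation for a shopping cart, assuming a cart is a dict of item -> quantity and that the application's merge rule is "keep every item, take the larger quantity". The function name reconcile_carts and the merge rule are illustrative assumptions, not taken from the papers.

```python
def reconcile_carts(versions):
    """Merge conflicting cart versions (hypothetical rule: max quantity per item)."""
    merged = {}
    for cart in versions:
        for item, qty in cart.items():
            # Keep every item seen in any version; never drop an accepted write.
            merged[item] = max(merged.get(item, 0), qty)
    return merged


# Two replicas accepted different writes while they could not reach each other.
version_a = {"book": 1, "lamp": 2}
version_b = {"book": 1, "socks": 3}

# The client sees both versions on read and reconciles them itself.
merged = reconcile_carts([version_a, version_b])
print(merged)  # {'book': 1, 'lamp': 2, 'socks': 3}
```

One consequence of a union-style rule like this: an item removed on one replica can reappear after the merge, because the other version still contains it.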
A hard problem in any distributed system is determining a canonical order of events
- in principle this cannot exist, but we may need an ordering anyway
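
To make the ordering problem concrete, here is a rough sketch using per-node update counters (version vectors), one common way to track which events are ordered and which are not. The notes above do not name a mechanism, so the representation (a dict of node name -> counter) and the compare function are assumptions for illustration.

```python
def compare(a, b):
    """Compare two version vectors (dicts of node -> counter; missing means 0)."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    # Each side has seen an update the other has not: no order exists.
    return "concurrent"


v1 = {"A": 1}
v2 = {"A": 2}           # node A updated again, having seen v1
v3 = {"A": 1, "B": 1}   # node B updated on top of v1, without seeing v2

print(compare(v1, v2))  # before
print(compare(v2, v3))  # concurrent
```

When two versions come out as concurrent, any single order the system picks (by wall-clock timestamp, by node id, and so on) is a rule it imposes, not something it observed.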
A ring is the simplest topology that remains connected when you lose a node
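
A small check of that claim, treating the topology as an undirected graph. The helper names (ring, line, remove_node, is_connected) are made up for this sketch. It compares a ring against a plain line of nodes when one node is lost.

```python
from collections import deque


def ring(n):
    """n nodes, each linked to its two neighbours, wrapping around."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def line(n):
    """n nodes in a row, with no wrap-around link."""
    return {i: {j for j in (i - 1, i + 1) if 0 <= j < n} for i in range(n)}


def remove_node(graph, dead):
    """Return a copy of the graph with one node and its links removed."""
    return {node: nbrs - {dead} for node, nbrs in graph.items() if node != dead}


def is_connected(graph):
    """Breadth-first search from any node; connected if every node is reached."""
    if not graph:
        return True
    start = next(iter(graph))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return len(seen) == len(graph)


print(is_connected(remove_node(ring(5), 2)))  # True
print(is_connected(remove_node(line(5), 2)))  # False
```

Losing any one node turns the ring into a line, which is still connected; losing a second node can then split it.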