DistOS-2011W BigTable: Difference between revisions
Revision as of 23:50, 28 February 2011
Introduction
Bigtable is a distributed storage system used at Google. Its main purpose is to store an enormous amount of data and scale reliably across a large number of computers. Distribution transparencies, such as location and failure transparency, are of key importance in the overall framework.
The design of Bigtable is of considerable interest; however, it is not open source and is not accessible to the general public. This is a problem for an implementation report, so we will use Apache Hadoop, an open-source project modeled on Google's published infrastructure, to understand how to configure, run, and deploy applications across such a system. Throughout the rest of the paper, we will compare and contrast the two platforms.
Real World Implementations
Bigtable is proprietary to Google, so it is currently used and implemented only there. Other implementations exist in the open-source world, notably Apache Hadoop. Hadoop is based on the papers Google released on MapReduce and the Google File System (GFS), and HBase, which runs on top of Hadoop, is modeled on Bigtable itself. The framework allows applications to run on large clusters of commodity hardware. Organizations that make significant use of Hadoop include Yahoo, eBay, Facebook, Twitter, and IBM.
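To make the MapReduce model mentioned above more concrete, the following is a minimal single-process sketch of a word count in Python. It does not use Hadoop or any Google API; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only for illustration. On a real cluster the framework would run the map and reduce steps on many machines and perform the grouping between them.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group intermediate values by key (done by the framework on a cluster)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence inside one process."""
    intermediate = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data big clusters", "commodity clusters"]
    print(word_count(docs))
```

Because each call to map_phase depends only on its own document and each call to reduce_phase depends only on its own key, both steps can in principle be spread across many machines, which is the property that lets such frameworks run on large clusters of commodity hardware.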
In the next two sections, we discuss the features and underpinnings of the two frameworks.
Google’s Implementation
Apache’s Implementation – Hadoop
Evaluated Systems/Programs
Describe the systems individually here - their key properties, etc. Use subsections to describe different implementations if you wish. Briefly explain why you made the selections you did.
Experiences/Comparison (multiple sections)
In multiple sections, describe what you learned.
Discussion
What was interesting? What was surprising? Here you can go out on tangents relating to your work.
Conclusion
Summarize the report, point to future work.
References
[Practical Problem Solving With Apache-Hadoop] http://www.slideshare.net/hadoop/practical-problem-solving-with-apache-hadoop-pig (February 11, 2011)
[Wikipedia - BigTable] http://en.wikipedia.org/wiki/BigTable (February 15, 2011)
[Google Code University - Distributed Systems] http://code.google.com/edu/parallel/ (February 23, 2011)
[Apache Hadoop] http://hadoop.apache.org/
[Apache Hadoop Wiki] http://wiki.apache.org/hadoop/
[Wikipedia - Hadoop] http://en.wikipedia.org/wiki/Hadoop