Difference between revisions of "DistOS-2011W BigTable"

Line 1: Line 1:
=Introduction=


Bigtable is a distributed storage system used at Google. Its main purpose is to have an enormous amount of data scale reliably over a large number of computers. Distributed transparencies are of key importance in the overall framework. For example,
Bigtable is a distributed storage system used at Google. Its main purpose is to have an enormous amount of data scale reliably over a large number of computers. Distributed transparencies are of key importance in the overall framework.
 
* Location Transparency
* Access Transparency
* Performance Transparency
* Scalability Transparency
* Concurrency Transparency
* Failure Transparency
* Migration Transparency
* Replication Transparency


The design of Bigtable is of considerable interest; however, it is not open source and is not accessible to the general public. This is a problem for an implementation report, so we will use Apache’s open-source Hadoop platform to understand how to configure, run, and deploy applications across such a system. Throughout the rest of the paper, we will comment on the similarities and differences between the two platforms.
Line 43: Line 34:


[http://www.slideshare.net/hadoop/practical-problem-solving-with-apache-hadoop-pig Practical Problem Solving With Apache-Hadoop] (February 11, 2011)
[http://en.wikipedia.org/wiki/BigTable Wikipedia - BigTable] (February 15, 2011)
[http://code.google.com/edu/parallel/ Google Code University - Distributed Systems] (February 23, 2011)
[http://hadoop.apache.org/ Apache Hadoop]
[http://wiki.apache.org/hadoop/ Apache Hadoop Wiki]
[http://en.wikipedia.org/wiki/Hadoop Wikipedia - Hadoop]

Revision as of 19:50, 28 February 2011

Introduction

Bigtable is a distributed storage system used at Google. Its main purpose is to let an enormous amount of data scale reliably across a large number of computers. Distribution transparencies, such as location, access, failure, and replication transparency, are of key importance in the overall framework.

The design of Bigtable is of considerable interest; however, it is not open source and is not accessible to the general public. This is a problem for an implementation report, so we will use Apache’s open-source Hadoop platform to understand how to configure, run, and deploy applications across such a system. Throughout the rest of the paper, we will comment on the similarities and differences between the two platforms.

Real World Implementations

Bigtable is a proprietary system at Google, so it is currently used and implemented only by them. There are, however, comparable systems in the open-source world, notably Apache Hadoop. Hadoop is based on the papers released by Google on MapReduce and the Google File System (GFS); Apache HBase, which runs on top of Hadoop, is the component modeled on Bigtable itself. The framework allows applications to be run on large clusters of commodity hardware. Organizations that make significant use of Hadoop include Yahoo, eBay, Facebook, Twitter, and IBM.
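
To make the MapReduce model mentioned above more concrete, the following is a minimal single-process sketch of a word count in Python. It is not Hadoop's API and involves no cluster; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only for illustration. It shows the three steps a MapReduce framework coordinates: mapping records to key/value pairs, grouping the pairs by key, and reducing each group to a result.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Combine all values collected for one key into a single result."""
    return word, sum(counts)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence on a list of strings."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())
```

In a real deployment the map and reduce calls would run on many machines and the grouping would happen over the network; here everything runs in one process so that only the data flow is visible.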

In the next two sections, we discuss the features and underpinnings of the two frameworks.

Google’s Implementation

Apache’s Implementation – Hadoop

Evaluated Systems/Programs

Describe the systems individually here - their key properties, etc. Use subsections to describe different implementations if you wish. Briefly explain why you made the selections you did.

Experiences/Comparison (multiple sections)

In multiple sections, describe what you learned.

Discussion

What was interesting? What was surprising? Here you can go out on tangents relating to your work.

Conclusion

Summarize the report, point to future work.

References

Practical Problem Solving With Apache-Hadoop: http://www.slideshare.net/hadoop/practical-problem-solving-with-apache-hadoop-pig (February 11, 2011)

Wikipedia - BigTable: http://en.wikipedia.org/wiki/BigTable (February 15, 2011)

Google Code University - Distributed Systems: http://code.google.com/edu/parallel/ (February 23, 2011)

Apache Hadoop: http://hadoop.apache.org/

Apache Hadoop Wiki: http://wiki.apache.org/hadoop/

Wikipedia - Hadoop: http://en.wikipedia.org/wiki/Hadoop