Difference between revisions of "DistOS-2011W BigTable"

Revision as of 19:49, 28 February 2011

Introduction

Bigtable is a distributed storage system used at Google. Its main purpose is to scale enormous amounts of data reliably across a large number of computers. Distribution transparencies are of key importance in the overall framework, for example:

  • Location Transparency
  • Access Transparency
  • Performance Transparency
  • Scalability Transparency
  • Concurrency Transparency
  • Failure Transparency
  • Migration Transparency
  • Replication Transparency

The design and framework of Bigtable are of considerable interest; however, Bigtable is not open source and is not accessible to the general public. This is a problem for an implementation report, so we will use Apache's open-source Hadoop platform to understand how to configure, run, and deploy applications across such a system. Throughout the rest of the paper, we will compare and contrast the two platforms.

Real World Implementations

Bigtable is a proprietary system, so it is currently implemented and used only by Google. There are, however, comparable implementations in the open-source world, notably Apache Hadoop. Hadoop is based on the papers Google published on MapReduce and the Google File System (GFS), and HBase, which runs on top of Hadoop, is modeled on Bigtable itself. The framework allows applications to run on large clusters of commodity hardware. Organizations that make significant use of Hadoop include Yahoo, eBay, Facebook, Twitter, and IBM.
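To make the MapReduce model mentioned above more concrete, the following is a minimal single-process sketch of a word count in Python. It is not Hadoop's API and does not distribute any work; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only for illustration. It shows the three conceptual steps: mapping each input to key/value pairs, grouping values by key, and reducing each group to a result.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values emitted for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence inside one process."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data big clusters", "commodity clusters"]
    print(word_count(docs))
```

In a real cluster, the map and reduce calls would run on many machines, with the input and output stored in a distributed file system; here everything runs locally so the data flow is easy to follow.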

In the next two sections, we discuss the features and underpinnings of the two frameworks.

Google’s Implementation

Apache’s Implementation – Hadoop

Evaluated Systems/Programs

Describe the systems individually here - their key properties, etc. Use subsections to describe different implementations if you wish. Briefly explain why you made the selections you did.

Experiences/Comparison (multiple sections)

In multiple sections, describe what you learned.

Discussion

What was interesting? What was surprising? Here you can go out on tangents relating to your work.

Conclusion

Summarize the report, point to future work.

References

[Practical Problem Solving With Apache-Hadoop] http://www.slideshare.net/hadoop/practical-problem-solving-with-apache-hadoop-pig (February 11, 2011)
[Wikipedia - BigTable] http://en.wikipedia.org/wiki/BigTable (February 15, 2011)
[Google Code University - Distributed Systems] http://code.google.com/edu/parallel/ (February 23, 2011)
[Apache Hadoop] http://hadoop.apache.org/
[Apache Hadoop Wiki] http://wiki.apache.org/hadoop/
[Wikipedia - Hadoop] http://en.wikipedia.org/wiki/Hadoop