DistOS-2011W Globus
Revision as of 06:19, 24 February 2011
Matthew Chou
Introduction
The system I attempted to implement is the Globus Toolkit, and from my current knowledge and understanding it seems quite difficult to create software that can be used on a grid system. While searching for distributed systems, I came across several that are used in various fields of research, such as BOINC, Folding@home and many other @home projects, as well as Condor (a list can be found here). Since many of these systems are in use around the world, I thought it would be interesting to see what steps it would take to implement such a system and whether I could get something to run on it.
Background
The basis of grid computing has been around for quite some time, starting with supercomputers, which have many processors and vast amounts of memory working within one machine. The idea behind a supercomputer, pooling the power of many machines into one machine, gives that one "single" machine a large amount of computational power, and that same idea is what grid computing sets out to implement: multiple machines across various geographic distances working together to give power to any one machine that requires it.
Such a system may seem straightforward, but it gives rise to several implementation issues that current operating systems also have to deal with, such as heterogeneity, scalability, and adaptability[1]. Heterogeneity refers to having standards for usage and data that the grid can follow, so that when other domains join the grid there are no problems with integration and usage. Scalability is necessary because the grid grows as more applications use it, and the organization and management of the jobs distributed across it must keep up with that growth. Adaptability is also necessary because at any given point a node on the grid may go down and the job it had been doing must be done by another node; with more nodes available, the grid can use them to compensate for dropped nodes as well as for malicious nodes that might attempt to return incorrect results. The role of the Globus Toolkit is to be the middleware, providing the services that allow grid computing to work successfully. Other well-known middleware implementations are gLite and UNICORE.
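The idea of coping with malicious nodes by running the same job on several nodes can be illustrated with a simple majority vote. The shell functions below are a minimal sketch under assumed conditions: each node's result arrives as one line in a hypothetical `node_name result` format. They are not taken from the Globus Toolkit; the function names are illustrative only.

```shell
#!/usr/bin/env bash
# Minimal sketch of redundant execution with majority voting.
# Input format (hypothetical): one "node_name result" pair per line on stdin.

# Prints the result reported by a strict majority of the nodes.
# Returns 1 (and prints nothing) if no result has a strict majority.
majority_result() {
  awk '
    { count[$2]++; total++ }
    END {
      for (r in count)
        if (count[r] * 2 > total) { print r; exit 0 }
      exit 1
    }'
}

# Prints the names of nodes whose result differs from the accepted one ($1);
# these are the nodes a scheduler might mark as suspect.
suspect_nodes() {
  local accepted=$1
  # Appending "" forces a string comparison, so "42" and "42.0" stay distinct.
  awk -v accepted="$accepted" '($2 "") != accepted { print $1 }'
}
```

For example, with `results=$'node1 42\nnode2 42\nnode3 17'`, running `majority_result <<< "$results"` prints `42`, and `suspect_nodes 42 <<< "$results"` prints `node3`. Requiring a strict majority means a tie produces no accepted result, so the job would have to be run on more nodes.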
Installation
Environment Setup
For this installation I decided on the latest version available at the time, Globus Toolkit 5.0.3.
This grid implementation must be installed on a UNIX-based OS; I chose Ubuntu 10.10 running in a virtual machine under Oracle VM VirtualBox. The installation documentation offers a Quickstart tutorial and an Admin Guide, and I read both because some of the instructions are explained more simply in the Quickstart than in the Admin Guide. If I were to suggest how to set up one's environment, I would recommend following the Admin Guide and using the Quickstart as a reference. The Quickstart gives a list of required software whose versions/existence can be checked with a few simple terminal commands: openssl, libssl, zlib, gcc, g++, tar, sed, and make. There are also platform-specific instructions to follow before installing the toolkit, so check whether your platform has any additional steps or known compatibility issues. Some additional packages are needed for implementing the hello world application, along with some optional software such as a relational database; for this I installed PostgreSQL and psqlODBC, the ODBC driver for PostgreSQL. Setting up the environment was easy enough that I would say anyone should be able to follow the tutorial up to this point.
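The prerequisite check from the Quickstart can be scripted so that it reports only what is missing. This is a minimal sketch, not the Quickstart's own commands: it assumes a Debian/Ubuntu system, and because libssl and zlib are libraries rather than commands, it checks them as packages with `dpkg -s`. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of a prerequisite check for the Globus Toolkit 5.0.3 build.

# Prints each command name (from the arguments) that is not on PATH.
# Returns 1 if at least one command is missing.
missing_tools() {
  local tool status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      status=1
    fi
  done
  return "$status"
}

# libssl and zlib are libraries rather than commands, so they are checked
# with dpkg -s instead. Prints each package dpkg does not know about (or
# every package, if dpkg itself is unavailable); returns 1 if any.
missing_packages() {
  local pkg status=0
  for pkg in "$@"; do
    if ! dpkg -s "$pkg" >/dev/null 2>&1; then
      echo "$pkg"
      status=1
    fi
  done
  return "$status"
}
```

Typical use would be `missing_tools openssl gcc g++ tar sed make` followed by `missing_packages libssl-dev zlib1g-dev`; those package names are an assumption for Ubuntu, chosen because building the toolkit presumably needs the development headers and not just the runtime libraries. Empty output from both means nothing is missing.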
Globus Toolkit Installation
Before installing the toolkit itself, it's good to understand the model that the Globus Toolkit follows. There is a host (node A), a client (node B), and a Certificate Authority (node C); certificates are explained in the Certification section. Clients connect to the host and ask for jobs to be completed, and the host manages those jobs, much like a thread scheduler in an operating system manages the threads it is given. The difference is that the host also has to manage the resources being shared, and it provides services for Computing/Processing Power (GRAM), Data Management (GridFTP, DAI, RLS), Monitoring/Discovery (MDS), and Authorization/Security (CAS). The grid system is meant to run on multiple machines, but for testing/simulation it can be set up on a single computer and later extended to other nodes if desired.
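The scheduler analogy can be made concrete with a toy dispatcher that hands jobs to nodes in round-robin order and skips nodes known to be down. This is a hypothetical model assuming jobs and node names are plain strings; it does not reproduce how GRAM actually schedules work, and `assign_jobs` is an illustrative name.

```shell
#!/usr/bin/env bash
# Toy model of a host assigning jobs to nodes (not GRAM's behaviour).
# Usage: assign_jobs "NODE..." "DOWN_NODE..." JOB...
# Prints one "job node" line per job; returns 1 if every node is down.
assign_jobs() {
  local all_nodes=$1 down=$2
  shift 2
  local node up=()
  # Keep only the nodes that are not listed as down.
  for node in $all_nodes; do
    case " $down " in
      *" $node "*) ;;              # node is down: skip it
      *) up+=("$node") ;;
    esac
  done
  if [ "${#up[@]}" -eq 0 ]; then
    echo "no nodes available" >&2
    return 1
  fi
  # Round-robin over the nodes that are still up.
  local i=0 job
  for job in "$@"; do
    echo "$job ${up[i % ${#up[@]}]}"
    i=$((i + 1))
  done
}
```

For instance, `assign_jobs "nodeA nodeB nodeC" "nodeB" job1 job2 job3` sends job1 and job3 to nodeA and job2 to nodeC, so the work of the dropped nodeB is absorbed by the remaining nodes. A real host would also weigh the resources each node offers, which is the extra responsibility that distinguishes it from a thread scheduler.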
A couple of recurring steps are needed for the installation instructions to work. The first is the creation of a non-privileged user named "globus", who performs the administrative tasks and deploys services; the "globus" user therefore acts as the host node. I then made the directory where I was going to install the toolkit and gave the "globus" user read/write permissions on it. Next, I signed in as the globus user and set the GLOBUS_LOCATION variable to that directory using the
globus@globus:export GLOBUS_LOCATION=/usr/local/globus-5.0.3
command in the terminal. The directory should contain the contents of the Globus Toolkit installation tarball, which can be downloaded from the installer source listed in the References. The GLOBUS_LOCATION variable should always be set, since the tutorial uses it very often. Then I ran
globus@globus:./configure --prefix=$GLOBUS_LOCATION
globus@globus:make
and it took approximately 30 minutes on my computer. Then I ran
globus@globus:make install
If all of the previous steps were followed correctly, there shouldn't be any errors; if there are, they will most likely be of the "missing library" kind, which can be fixed by installing the missing package. When the installation completed I was relieved, and it did not seem so hard to install after all; at this point it seemed that using the Toolkit would not be so difficult either.
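Because the tutorial relies on GLOBUS_LOCATION throughout, it helps to set it permanently for the globus user rather than re-exporting it in every new shell. The sketch below assumes the globus user's shell is bash and reads `~/.bashrc`; `persist_export` is a hypothetical helper. The one-time account and directory setup described earlier is shown in the comments for reference, using standard Ubuntu commands.

```shell
#!/usr/bin/env bash
# One-time setup as root (shown for reference, assuming Ubuntu):
#   sudo adduser globus
#   sudo mkdir -p /usr/local/globus-5.0.3
#   sudo chown globus:globus /usr/local/globus-5.0.3

# Appends "export NAME=VALUE" to a shell startup file ($3) unless that exact
# line is already present, so repeated runs do not add duplicates.
persist_export() {
  local name=$1 value=$2 rcfile=$3
  local line="export ${name}=${value}"
  touch "$rcfile" || return 1
  grep -qxF "$line" "$rcfile" || printf '%s\n' "$line" >> "$rcfile"
}
```

Run as the globus user, `persist_export GLOBUS_LOCATION /usr/local/globus-5.0.3 ~/.bashrc` followed by `source ~/.bashrc` makes the variable available in the current shell and in every later one.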
Certification
The certification process is used in a Globus grid because a scalable system needs security and authentication in order to operate. If any random node in the world could participate in the grid and use, or even abuse, it without being accountable, it would not only risk harming the work being done on the grid but would also require the grid to be closely managed for the resulting inconsistencies in service and bad data. The system already tries to keep nodes honest by running multiple instances of the same operation on many nodes and comparing the results, but identifying and marking potentially malicious nodes would be a nightmare without a certification process. Certification lets the host know that a user is indeed trusted and that some organization is responsible for vouching for that user. In real-world deployments the Certificate Authority (CA), the party that signs the certificates, would normally be a trusted organization such as VeriSign, but for my testing I simply let the globus user act as the CA. To do so I used SimpleCA, a tool supported by the toolkit that provides a wrapper around the OpenSSL CA functionality and is sufficient for simple Grid services.
SimpleCA
SimpleCA allows a user to become the CA for a grid and to sign the certificates issued for the grid host.
Figure: SimpleCA setup (screenshot)
Setting up SimpleCA was easy, since it provides a script that prompts for the required fields to be filled in.
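Since SimpleCA wraps OpenSSL's CA functionality, the underlying steps can be sketched by calling OpenSSL directly: the CA creates a key and a self-signed certificate, the user creates a key and a signing request, the CA signs that request, and the host later checks the user's certificate against the CA certificate. This is not SimpleCA's own script; the directory layout, subject names, key size and validity period are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Sketch of the certificate operations a minimal CA performs with OpenSSL.

# Creates, inside directory $1, a CA key/certificate and a user certificate
# signed by that CA. Returns non-zero if any OpenSSL step fails.
make_demo_ca() {
  local dir=$1
  mkdir -p "$dir" || return 1
  # 1. The CA's private key and self-signed certificate (the root of trust).
  openssl req -x509 -newkey rsa:2048 -nodes -days 30 \
    -subj "/O=Demo Grid/CN=Demo CA" \
    -keyout "$dir/ca.key" -out "$dir/ca.pem" 2>/dev/null || return 1
  # 2. The user's private key and a certificate signing request (CSR).
  openssl req -new -newkey rsa:2048 -nodes \
    -subj "/O=Demo Grid/CN=globus" \
    -keyout "$dir/user.key" -out "$dir/user.csr" 2>/dev/null || return 1
  # 3. The CA signs the request, vouching for the user.
  openssl x509 -req -in "$dir/user.csr" -days 30 \
    -CA "$dir/ca.pem" -CAkey "$dir/ca.key" -CAcreateserial \
    -out "$dir/user.pem" 2>/dev/null || return 1
}

# Succeeds only if certificate $2 was issued by the CA certificate $1,
# by looking for the explicit "OK" result line printed by openssl verify.
verify_user_cert() {
  openssl verify -CAfile "$1" "$2" 2>/dev/null | grep -q ': OK$'
}
```

After `make_demo_ca demo`, `verify_user_cert demo/ca.pem demo/user.pem` succeeds, while checking the same user certificate against a different CA's certificate fails. That check is the essence of what the host relies on: it trusts the user only because it trusts the CA that signed the certificate.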
Evaluation of Installation
Hello World Implementation
Implementation Overview
Discussion of Experience
Conclusion
Summarize the report, point to future work.
References
Give references in proper form (not just URLs if possible, give dates of access).
Globus Toolkit Admin Guide, installation: http://www.globus.org/toolkit/docs/latest-stable/admin/install/#gtadmin
Globus Toolkit 5.0.3 Quickstart: http://www.globus.org/toolkit/docs/5.0/5.0.3/admin/quickstart/
Globus Toolkit 5.0.3 installer source: http://www.globus.org/toolkit/downloads/5.0.3/