COMP 3000 Essay 2 2010 Question 10

'''mClock: Handling Throughput Variability for Hypervisor I/O Scheduling'''
==Paper==

[http://www.usenix.org/events/osdi10/tech/full_papers/Gulati.pdf mClock: Handling Throughput Variability for Hypervisor I/O Scheduling]


'''Authors''':

Ajay Gulati, VMware Inc., Palo Alto, CA 94304, agulati@vmware.com

Arif Merchant, HP Labs, Palo Alto, CA 94304, arif.merchant@acm.org

Peter J. Varman, Rice University, Houston, TX 77005, pjv@rice.edu


==Background Concepts==


'''Virtual machines''' (VMs) are becoming increasingly significant as they are used by everyone from university students to large gaming firms. One of the key issues with virtual machines is ensuring that all shared resources on the machine are utilized equitably. In order to do this, and to provide the illusion that the virtual machine is running on its own hardware, a hypervisor is required.


'''Hypervisors''' are responsible for multiplexing hardware resources between virtual machines while providing a degree of performance isolation, using resource management. Three controls are used by the hypervisor: reservations, which set a minimum bound on the allocation (in absolute units); limits, which set a maximum upper bound on the allocation (again in absolute units); and shares, which allocate resources proportionally according to the weight of each VM. These three controls have been supported for CPU and memory allocation since 2003; the open issue is I/O allocation. Currently, when more VMs are added to a host, contention for input/output (I/O) resources can suddenly lower a VM’s allocation. Also, the available throughput can change over time, so adjustments to allocations must be made dynamically.
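
To make the three controls concrete, here is a minimal, hypothetical sketch (not from the paper; all names are ours) of how a share-proportional allocation could be clamped by per-VM reservations and limits:

<pre>
# Hypothetical illustration of the three controls (not the paper's code).
from dataclasses import dataclass

@dataclass(frozen=True)
class VMControls:
    name: str
    reservation: float  # minimum allocation, absolute units (e.g. IOPS)
    limit: float        # maximum allocation, absolute units
    shares: float       # relative weight for dividing capacity

def allocate(vms, capacity):
    """Naive one-shot allocation: divide capacity by shares, then clamp
    each VM into [reservation, limit]. Real schedulers (mClock included)
    enforce these controls per request, not in a single pass."""
    total = sum(vm.shares for vm in vms)
    return {vm.name: min(max(capacity * vm.shares / total, vm.reservation),
                         vm.limit)
            for vm in vms}

# Example: with 1000 IOPS of capacity, equal shares would give 500 each,
# but the clamps pull VM "a" up to its floor and VM "b" down to its cap.
print(allocate([VMControls("a", 600, 900, 1.0),
                VMControls("b", 100, 400, 1.0)], 1000))  # {'a': 600, 'b': 400}
</pre>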


'''Throughput''' is the average rate of job completion; for example, a disk that completes 500 I/O requests in two seconds has a throughput of 250 IOPS. In general, a system is considered more efficient if it has a higher throughput <sup>[[#Foot6|6]]</sup>. The term matters in this paper because both the available throughput of a shared storage device and the demand placed on it vary over time, so I/O scheduling must take throughput variability into account.


'''SFQ''', or Start-time Fair Queuing, is the traditional scheduler used to allocate resources. It follows a proportional-sharing algorithm that divides the total throughput between the VMs in proportion to their assigned shares. Its shortcoming is that it considers neither reservations nor limits in its allocation.
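
As a rough illustration of why SFQ enforces only shares, here is a minimal sketch of start-time fair queuing tag assignment (simplified from Goyal et al. <sup>[[#Foot3|3]]</sup>; the variable names are ours):

<pre>
# Simplified start-time fair queuing (SFQ) tag assignment.
import heapq

class Flow:
    def __init__(self, weight):
        self.weight = weight
        self.finish = 0.0  # finish tag of this flow's previous request

def submit(queue, flow, cost, virtual_time):
    # Start tag: continue from the flow's previous finish tag, but a flow
    # that was idle is brought forward to the current virtual time so it
    # cannot accumulate credit while inactive.
    start = max(virtual_time, flow.finish)
    flow.finish = start + cost / flow.weight  # finish tag
    heapq.heappush(queue, (start, id(flow), flow))

# Requests are dispatched in increasing start-tag order, so each flow
# receives throughput proportional to its weight. Nothing in the tags
# encodes a minimum (reservation) or a maximum (limit), which is exactly
# the limitation described above.
queue = []
a, b = Flow(2.0), Flow(1.0)
for _ in range(3):
    submit(queue, a, 1.0, 0.0)
    submit(queue, b, 1.0, 0.0)
print([("a" if f is a else "b") for _, _, f in sorted(queue)])
</pre>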


==Research problem==


====Problem====
Today, I/O resource allocation in modern hypervisors is simple and somewhat primitive. Currently, an algorithm called PARDA (Proportional Allocation of Resources for Distributed Storage Access) <sup>[[#Foot1|1]]</sup> allocates I/O resources to each VM running on a particular storage device, while each host schedules I/O locally using a fair scheduler, SFQ(D), a depth-controlled variant of Start-time Fair Queuing <sup>[[#Foot2|2]]</sup> <sup>[[#Foot3|3]]</sup>. SFQ(D) performs fairly well for low-intensity workloads, but as VM workloads multiply and intensify, the need for stronger performance guarantees grows.
 
PARDA allocates I/O resources to VMs in proportion to their I/O shares at the host level, but each host's fair scheduler divides the I/O shares amongst its VMs equally. This leads to the main problem: whenever another VM is added, or a background application is started on one of the VMs, all other VMs suffer a large performance loss (roughly a 40% drop in the paper's motivating experiment). This is unacceptable when applications have minimum performance requirements to run effectively. An application with minimum resource requirements can run fine on a given VM, but as soon as the load on the shared storage device increases, the application may fail to run smoothly or, worse, crash.
 
====Solution====
Therefore, hypervisors require a better resource-allocation algorithm to meet the needs of high-performance VMs running concurrently. The mClock algorithm is the solution that Gulati, Merchant, and Varman propose in this paper.


==Contribution==


This paper addresses the current limitations of I/O resource allocation for hypervisors by proposing a new, more efficient allocation algorithm. Older methods such as SFQ were limited to proportional shares; the proposed algorithm, mClock, supports proportional shares together with a minimum reservation and a maximum limit on I/O resources.

Older methods of I/O resource allocation also had a significant performance disadvantage: whenever the load on the shared storage device increased, or another VM was added, the performance of every VM on the host dropped significantly, making the hypervisor's I/O management unreliable. Conversely, mClock presents each VM with a guaranteed minimum reservation of I/O resources, so application performance never drops below a set point. This provides much better application stability on each VM, and better overall performance and efficiency than older methods such as SFQ.


mClock is a resource-allocation algorithm that helps hypervisors manage I/O requests from multiple virtual machines simultaneously. It is a better alternative to SFQ because it supports all three controls in a single algorithm, handles variable and unknown capacity, and is fast to compute. Its guarantees do not weaken as VMs are added, and mClock's reservations are still met. Essentially, mClock dynamically adjusts the proportion of resources each VM receives based on each VM's current activity. While mClock constantly changes the physical resource allocation of each VM, it lets each VM keep the illusion that it has full control of all system resources; as a result, performance can be increased for VMs that need it without the others noticing that “their” resources are being redistributed. What mClock achieves is a combination of a constraint-based scheduler and a weight-based scheduler: the constraint-based scheduler makes sure each VM's minimum I/O reservation is consistently met and its upper limit is not exceeded, while the weight-based scheduler distributes the remaining I/O throughput among the VMs in proportion to their shares.
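
Our reading of this two-phase behaviour, as a self-contained sketch (the VMState record and its tag values below are invented for illustration; the paper's Algorithm 1 gives the real logic):

<pre>
# Hedged sketch of mClock's two-phase dispatch decision.
from dataclasses import dataclass

@dataclass
class VMState:
    name: str
    r_tag: float  # reservation tag of the VM's oldest pending request
    l_tag: float  # limit tag of that request
    p_tag: float  # proportional-share tag of that request

def pick_next(vms, now):
    # Constraint-based phase: a reservation tag that is already due means
    # some VM is behind its minimum, so reservations win over shares.
    due = [vm for vm in vms if vm.r_tag <= now]
    if due:
        return min(due, key=lambda vm: vm.r_tag)
    # Weight-based phase: among VMs not capped by their limit, spare
    # throughput goes to the smallest proportional-share tag.
    under_limit = [vm for vm in vms if vm.l_tag <= now]
    if under_limit:
        return min(under_limit, key=lambda vm: vm.p_tag)
    return None  # every VM is at its limit: idle rather than overshoot

# VM "a" is behind its reservation, so it is served first even though
# "b" has the smaller proportional-share tag.
vms = [VMState("a", 0.9, 0.5, 3.0), VMState("b", 2.0, 0.5, 1.0)]
print(pick_next(vms, now=1.0).name)  # -> a
</pre>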


: - "dmClock (used for cluster-based storage systems) runs a modified version of mClock at each server. There is only one modification to the algorithm to account for the distributed model in the Tag-Assignment component." - from the paper [[User:tpham3|Tuan Pham]]
The mClock algorithm uses a tag-based scheduler with some modifications; like other tag-based schedulers all I/O requests are assigned tags and scheduled in order of their tag values. The modifications include the ability to use multiple tags based on the three controls (shares, reservations and limits) and to dynamically decide which tag to use for scheduling, while still synchronizing idle VMs <sup>[[#Foot2|2]]</sup>.
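
A sketch of how such per-request tags might be assigned (simplified from the paper's description; the field names are ours, and the paper's extra adjustments for VMs becoming active are omitted):

<pre>
# Simplified mClock-style tag assignment on request arrival.
from dataclasses import dataclass

@dataclass
class VM:
    reservation: float  # r_i, e.g. in IOPS
    limit: float        # l_i
    shares: float       # w_i, dimensionless weight
    r_tag: float = 0.0
    l_tag: float = 0.0
    p_tag: float = 0.0

def tag_request(vm, now):
    # Each tag advances by the reciprocal of its control, so tag spacing
    # encodes the target rate (1/r seconds between reservation tags means
    # r requests per second). Taking max(..., now) re-syncs a VM that was
    # idle so it cannot bank credit; this is the idle-VM synchronization
    # mentioned above.
    vm.r_tag = max(vm.r_tag + 1.0 / vm.reservation, now)
    vm.l_tag = max(vm.l_tag + 1.0 / vm.limit, now)
    vm.p_tag = max(vm.p_tag + 1.0 / vm.shares, now)

vm = VM(reservation=250, limit=500, shares=2)
tag_request(vm, now=10.0)  # first request after idling: all tags -> 10.0
tag_request(vm, now=10.0)  # back-to-back request: tags advance by 1/control
print(vm.r_tag, vm.l_tag, vm.p_tag)  # 10.004 10.002 10.5
</pre>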
 
mClock was implemented in a modified version of the VMware ESX Server hypervisor <sup>[[#Foot4|4]]</sup> <sup>[[#Foot5|5]]</sup>. The modification took only around 200 lines of C code, showing that mClock can be implemented effectively in an existing product without much difficulty. The algorithm should be relatively portable and easy to implement in other hypervisors.
 
Another contribution is Distributed mClock, or '''dmClock''', which runs an altered version of mClock at each server. dmClock is aimed at cluster-based storage systems, which are emerging as a cheaper alternative to centralized disk arrays. The reservation component of the modified algorithm gives higher preference to non-idle VMs to attain greater performance. dmClock proved effective with only a small modification to the mClock algorithm and requires no complex synchronization between servers.
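
A hedged sketch of the distributed variant's tag adjustment, reusing the VM record from the previous sketch (our reading of the paper; the client-side bookkeeping of rho and delta is omitted):

<pre>
# Simplified dmClock tag assignment (our reading of the paper). Each
# request piggybacks two counters from the client: rho, the number of its
# requests served in constraint-based phases anywhere in the cluster since
# its last request to this server, and delta, the total served since then.
def dm_tag_request(vm, now, rho, delta):
    # Advancing the reservation tag by rho/r (instead of 1/r) credits the
    # reservation service this VM already received at other servers, so
    # the cluster meets reservations without any server-to-server
    # synchronization or shared state.
    vm.r_tag = max(vm.r_tag + rho / vm.reservation, now)
    vm.l_tag = max(vm.l_tag + delta / vm.limit, now)
    vm.p_tag = max(vm.p_tag + delta / vm.shares, now)
</pre>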


==Critique==


The article introduces the mClock algorithm, which handles I/O resource allocation for multiple VMs in a variable-throughput environment. The Quality of Service (QoS) requirements of a VM are expressed as a minimum reservation, a maximum limit, and a proportional share. A strength of the algorithm is that it enforces those controls even as capacity varies. It is also significant that the algorithm was shown to be more efficient than existing methods at allocating I/O resources in clustered architectures while providing greater isolation between VMs. mClock lets users run multiple VMs on one host comfortably, without constantly worrying about performance as more VMs are added.


The paper thus proposes a better and more effective alternative to SFQ and other older methods, and the mClock algorithm efficiently handles multiple VMs in a variable-throughput environment. One minor weakness is that some terms, such as LUN and PARDA, are used without being properly introduced.


A negative aspect of this paper is the writing style used for calculations. In many sections, calculations are embedded in a sentence, which makes them difficult to read and understand. One example is the following line: "For a small reference I/O size of 8KB and using typical values for mechanical delay T<sub>m</sub> = 5ms and peak transfer rate, B<sub>peak</sub> = 60 MB/s, the numerator = Lat<sub>1</sub>*(1 + 8/300) &asymp; Lat<sub>1</sub>" <sup>[[#Foot2|2]]</sup>. The calculations mostly show how the algorithm would compute resource allocations, but they were not strictly necessary; the paper is understandable without them.


In general, however, the paper is clear and not difficult to understand, and the case for the algorithm is well presented and convincing.


==References==

<span id="Foot1"><sup>1</sup> A. Gulati, I. Ahmad, and C. Waldspurger. PARDA: Proportional Allocation of Resources for Distributed Storage Access. In Proceedings of the Seventh USENIX Conference on File and Storage Technologies (FAST ’09), February 2009.</span>

<span id="Foot2"><sup>2</sup> W. Jin, J. S. Chase, and J. Kaur. Interposed Proportional Sharing for a Storage Service Utility. In ACM SIGMETRICS, 2004. [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.72.7012&rep=rep1&type=pdf Interposed proportional sharing for a storage service utility]</span>

<span id="Foot3"><sup>3</sup> P. Goyal, H. M. Vin, and H. Cheng. Start-Time Fair Queuing: A Scheduling Algorithm for Integrated Services Packet Switching Networks. Technical Report CS-TR-96-02, UT Austin, January 1996.</span>

<span id="Foot4"><sup>4</sup> VMware, Inc. VMware ESX Server User Manual, December 2007.</span>

<span id="Foot5"><sup>5</sup> VMware, Inc. Introduction to VMware Infrastructure, 2007. http://www.vmware.com/support/pubs/.</span>

<span id="Foot6"><sup>6</sup> A. S. Tanenbaum. Modern Operating Systems, Third Edition. Pearson Education, 2008.</span>
