COMP 3000 Essay 2 2010 Question 10

mClock: Handling Throughput Variability for Hypervisor IO Scheduling

Paper

mClock: Handling Throughput Variability for Hypervisor IO Scheduling

Authors:

Ajay Gulati VMware Inc. Palo Alto, CA, 94304 agulati@vmware.com

Arif Merchant HP Labs Palo Alto, CA 94304 arif.merchant@acm.org

Peter J. Varman Rice University Houston, TX, 77005 pjv@rice.edu

Background Concepts

Hypervisors are responsible for multiplexing hardware resources between virtual machines while providing isolation to an extent, using resource management. The three controls used are reservation where the minimum bounds are set, the limit where the maximum upper bound on the allocation is set, and shares which proportionally allocate the resources according to the certain weight each VM has, and also depending on the reservation and upper bound limits. This is interesting because virtualization has been very successful; people are comfortable with putting multiple VM on one HOST without worrying about the performance of each VM on another. However the contention for I/O resources can suddenly lower a VM’s allocation; the available throughput can change with time, and adjustments to allocations must be made dynamically. mClock is a better alternate because it supports all controls in a single algorithm, handles variable and unknown capacity, and fast to compute. This is interesting because there is a limit control on VM allocation, it does not weaken as each VM gets added on, and mClock reservations are met.

more about mclock here

mClock is a resource-allocation algorithm that helps hypervisors manage I/O requests from multiple virtual machines simultaneously. Essentially, mClock dynamically adjusts the proportions of resources each VM receives based on how active each VM currently is. While mClock constantly changes the physical resource allocation to each VM, it lets each VM hold onto the illusion that it has full control of all system resources. As a result, performance can be increased for VMs that need it, without letting the others know that “their” resources are being distributed to other machines.

Research problem

We use today, a very primitive kind of IO resource allocation in modern hypervisors. Currently an algorithm called PARDA (Proportional Allocation of Resources in Distributed storage Access) ¹ is used to allocate IO resources to each VM running on a particular storage device. Unfortunately, the IO resource allocation algorithm of the hosts use a fair-scheduler called SFQ (Start-time Fair Queuing) ². What this means is that PARDA allocates IO resources to VMs proportional to the number of IO shares on the host, but each host uses a fair scheduler which divides the IO shares amongst the VMs equally. This leads to the problem that whenever another VM is added or another background application is run on one of the VMs, all the other VMs suffer a huge performance lose. In the case of adding another VM, there is a 40% performance drop. This is completely unacceptable when applications have minimum performance requirements to run effectively. An application with minimum resource requirements can be running fine on any given VM, but as soon as the load on the shared storage device increases, the application would run poorly, or could potentially crash.

Contribution

This paper addresses the current limitations of IO resource allocation for hypervisors. The paper has proposed a new and more efficient algorithm to allocate IO resources. Older methods were limited solely by providing proportional shares. mClock incorporates proportional shares, as well as a minimum reservation of IO resources, and a maximum reservation.

Older methods of IO resource allocation had a terrible performance lose. Whenever the load on the shared storage device was increased, or when another VM was added, the performance of all hosts would drop considerably. Older methods provided unreliable IO management of hypervisors

mClock was able to present VMs with a guaranteed minimum reservation of IO resources. This means that application performance will never drop below a certain point. This provides much better application stability on each of the VMs, and better overall performance.

"dmClock (used for cluster-based storage systems) runs a modified version of mClock at each server. There is only one modification to the algorithm to account for the distributed model in the Tag-Assignment component." - from the paper

Critique

The article introduces the mClock algorithm which handles multiple VM in a variable throughput environment. The Quality of Service (QoS) requirements for a VM are expressed as a minimum reservation, a maximum limit, and a proportional share. This algorithm, mClock, is able to meet those controls in varying capacity. The good thing about this is that the algorithm proves efficient in clustered architectures. Moreover, it provides greater isolation between VMs.

In this paper there were many terms that were used but never explained, such as orders (used in the graphs), LUN, PARDA, etc. Also, I did not like the way the calculations were written in sentences, "For a small reference IO size of 8KB and using typical values for mechanical delay T_m = 5ms and peak transfer rate, B_peak = 60 MB/s, the numerator = Lat₁*(1 + 8/300) ≈ Lat₁". To me this was very messy and made me skip through the calculations part of the sentence.

References

¹ A. Gulati, I. Ahmad, and C. Waldspurger. PARDA: Proportional Allocation of Resources in Distributed Storage Access. In (FAST ’09) Proceedings of the Seventh Usenix Conference on File and Storage Technologies, Feb 2009.

² W. Jin, J. S. Chase, and J. Kaur. Interposed proportional sharing for a storage service utility. In ACM SIGMET- RICS, 2004. Interposed proportional sharing for a storage service utility

COMP 3000 Essay 2 2010 Question 10

Contents

Paper

Background Concepts

Research problem

Contribution

Critique

References

Navigation menu

COMP 3000 Essay 2 2010 Question 10

Paper

Background Concepts

Research problem

Contribution

Critique

References

Navigation menu

Search