An Analysis of Linux Scalability to Many Cores

Authors: Silas Boyd-Wickizer, Austin T. Clements, Yandong Mao, Aleksey Pesterev, M. Frans Kaashoek, Robert Morris and Nickolai Zeldovich.

Affiliates: MIT CSAIL

Background Concepts

memcached: Section 3.2

memcached is an in-memory hash table server. One instance of memcached running on many different cores is bottlenecked by an internal lock, which is avoided by the MIT team by running one instance per core. Clients each connect to a single instance of memcached, allowing the server to simulate parallelism without needing to make major changes to the application or kernel. With few requests, memcached spends 80% of its time in the kernel on one core, mostly processing packets.¹

Apache: Section 3.3

Apache is a web server that has been used in previous Linux scalability studies. In the case of this study, Apache has been configured to run a separate process on each core. Each process, in turn, has multiple threads (making it a perfect example of parallel programming). Each process uses one of their threads to accepting incoming connections and others are used to process these connections. On a single core processor, Apache spends 60% of its execution time in the kernel.¹

gmake: Section 3.5

gmake is an unofficial default benchmark in the Linux community which is used in this paper to build the Linux kernel. gmake takes a file called a makefile and processes its recipes for the requisite files to determine how and when to remake or recompile code. With a simple command -j or --jobs, gmake can process many of these recipes in parallel. Since gmake creates more processes than cores, it can make proper use of multiple cores to process the recipes.⁵ Since gmake involves much reading and writing, in order to prevent bottlenecks due to the filesystem or hardware, the test cases use an in-memory filesystem tmpfs, which gives them a backdoor around the bottlenecks for testing purposes. In addition to this, gmake is limited in scalability due to the serial processes that run at the beginning and end of its execution, which limits its scalability to a small degree. gmake spends much of its execution time with its compiler, processing the recipes and recompiling code, but still spend 7.6% of its time in system time.¹

Research problem

As technology progresses, the number of core a main processor can have is increasing at an impressive rate. Soon personal computers will have so many cores that scalability will be an issue. There has to be a way that standard user level Linux kernel will scale with a 48-core system¹. The problem with a standard Linux OS is that they are not designed for massive scalability, which will soon prove to be a problem. The issue with scalability is that a solo core will perform much more work compared to a single core working with 47 other cores. Although traditional logic states that the situation makes sense because there are 48 cores dividing the work, the information should be processed as fast as possible with each core doing as much work as possible.

To fix those scalability issues, it is necessary to focus on three major areas: the Linux kernel, user level design and how applications use kernel services. The Linux kernel can be improved by optimizing sharing and use the current advantages of recent improvement to scalability features. On the user level design, applications can be improved so that there is more focus on parallelism since some programs have not implemented those improved features. The final aspect of improving scalability is how an application uses kernel services to better share resources so that different aspects of the program are not conflicting over the same services. All of the bottlenecks are found easily and actually only take simple changes to correct or avoid.¹

This research uses a foundation of previous research discovered during the development of scalability in UNIX systems. The major developments from shared memory machines² and wait-free synchronization to fast message passing ended up creating a base set of techniques, which can be used to improve scalability. These techniques have been incorporated in all major operating system including Linux, Mac OS X and Windows. Linux has been improved with kernel subsystems, such as Read-Copy-Update, which is an algorithm that is used to avoid locks and atomic instructions which affect scalability.³ There is an excellent base of research on Linux scalability studies that have already been written, on which this research paper can model its testing standards. These papers include research on improving scalability on a 32-core machine.⁴ In addition, the base of studies can be used to improve the results of these experiments by learning from the previous results. This research may also aid in identifying bottlenecks which speed up creating solutions for those problems.

Contribution

What are the research contribution(s) of this work? Specifically, what are the key research results, and what do they mean? (What was implemented? Why is it any better than what came before?)

Critique

What is good and not-so-good about this paper? You may discuss both the style and content; be sure to ground your discussion with specific references. Simple assertions that something is good or bad is not enough - you must explain why.

References

¹Silas Boyd-Wickizer, Austin T. Clements, Yandong Mao, Aleksey Pesterev, M. Frans Kaashoek, Robert Morris, and Nickolai Zeldovich. An Analysis of Linux Scalability to Many Cores. MIT CSAIL, 2010. http://www.usenix.org/events/osdi10/tech/full_papers/Boyd-Wickizer.pdf

²J. Kuskin, D. Ofelt, M. Heinrich, J. Heinlein, R. Simoni, K. Gharachorloo, J. Chapin, D. Nakahira, J. Baxter, M. Horowitz, A. Gupta, M. Rosenblum, and J. Hennessy. The Stanford FLASH multiprocessor. In Proc. of the 21st ISCA, pages 302–313,1994.

³P. E. McKenney, D. Sarma, A. Arcangeli, A. Kleen, O. Krieger, and R. Russell. Read-copy-update. In Proceedings of the Linux Symposium 2002, pages 338-367, Ottawa Ontario, June 2002

⁴C. Yan, Y. Chen, and S. Yuanchun. OSMark: A benchmark suite for understanding parallel scalability of operating systems on large scale multi-cores. In 2009 2nd International Conference on Computer Science and Information Technology, pages 313–317, 2009

⁵GNU.org. GNU Make Manual. [Online] http://www.gnu.org/software/make/manual/