DistOS-2011W Akamai and CDN

From Soma-notes
Revision as of 06:38, 8 March 2011 by Frahman (talk | contribs) (→‎Conclusion)

Fahim Rahman

Introduction

Content Distribution Networks (CDNs) have become a vital class of distributed system for the internet as demand for web applications and media streaming grows. A CDN is a distributed system of computers holding copies of data at various points within a network. This increases the bandwidth available to users, who can fetch the data from different points in the network. Latency also improves for applications and data transfers, because these servers tend to sit closer to any given group of users. At its simplest, a CDN is a mirroring mechanism that solves part of the last-mile problem to ensure a positive user experience when streaming media or using a web-based application.

Which content and applications are available on the internet is largely dictated by the cost shouldered by the publisher. An individual publisher, or service provider, can reach a large audience only with the funding to run a combination of load-balanced servers and fast network connections. This can be a significant barrier to entry for a smaller, unfunded service provider. Popularity on the internet tends to arrive in waves. A small website can experience the so-called “Slashdot” effect, where a surge of traffic arrives all at once as the wave of popularity comes in. Many small content providers are not prepared for this kind of popularity and suffer unintended downtime, because the traffic levels are simply unsustainable for them. A content distribution network is a plausible solution to this problem.

Mirroring has presented itself as a natural way to serve static content. It depends on voluntary effort, with people lending their own servers and networks to a provider whose content they value and want to support. Peer-to-peer networks and file sharing also display the effort individual users are willing to put forth to distribute valued content. The sustainability of this effort is questionable, given that it relies on the value proposition being high enough among a user base with resources to spare. This paper explores two specific content distribution mechanisms in detail: CoralCDN, a publicly available resource, and Akamai Technologies, a commercial solution. Many other CDNs exist, but only these two are examined here. The paper covers their general approach (section 2), their technical approach (section 3), and the user experience (section 4). A discussion follows, highlighting issues that each solution faces and needs to consider for the future (section 5), along with a conclusion on the CDN space as it relates to the use of the internet (section 6).

CDN Origins

The foundations of CDNs lie in the mirroring of websites. A mirror site is a separate site set up as an identical copy of another site, commonly used to widen access to the same information. This is the most straightforward way to address latency: a website based in Canada could be mirrored in Australia to give Australian users quicker access to it. The approach has many problems, however, including synchronization, load requirements and BGP concerns.

Local clustering is another approach that offers improved scalability. Its weakness is that a single point of failure remains: if the ISP fails, the entire site suffers. It also demands significant planning from the site or content provider, as the servers resident at any one location have to be able to handle peak loads.

Evaluated Systems/Programs

CoralCDN

CoralCDN offers a structure that leads to the democratization of content publication. It is structured in part as a peer-to-peer network, which uses volunteered aggregate bandwidth to absorb the effect of a sudden surge of traffic to a particular website. CoralCDN lets any user take advantage of the service by appending a simple string to a URL.

The process is as follows:

1. A client sends a DNS request for the Coral form of the hostname to its local resolver. The Coral form is produced by appending “.nyud.net:8090” to the hostname in the URL, for example “http://www.x.com.nyud.net:8090”.

2. The client’s resolver attempts to resolve the hostname using a Coral DNS server. A likely starting point is one of the Coral nameservers registered under the .net domain.

3. A Coral DNS server determines a round-trip time by probing the client.

4. The probe results allow the DNS server to check Coral for any known nameservers and/or HTTP proxies close to the client.

5. If any servers were found via Coral, the DNS server returns them. If none were found, it returns a random set of nameservers and proxies. In either case, if the DNS server is close to the client, it returns only nodes that are close to itself.

6. The client’s resolver returns the address of a Coral HTTP proxy for www.x.com.nyud.net.

7. The client sends its HTTP request to the specified proxy. If the proxy has the file cached locally, it returns the file and the process stops. Otherwise, the process continues.

8. The object’s URL is looked up in Coral by the proxy.

9. The proxy gets the object from the node if Coral returns the address of a node with the object cached. If this is not the case, the proxy downloads the object from the originating server.

10. The client browser gets the web object from the proxy.

11. The proxy stores a reference to itself in Coral, recording that it is now caching the URL.
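Steps 7 to 11 can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions: the class names `CoralIndex` and `Proxy`, the `fetch_origin` callback and the example URL are hypothetical, and a plain dictionary stands in for Coral's distributed index. It is not CoralCDN's actual code.

```python
from typing import Callable, Dict, List


class CoralIndex:
    """Toy stand-in for Coral's index: maps a URL to the proxies caching it."""

    def __init__(self) -> None:
        self._holders: Dict[str, List["Proxy"]] = {}

    def lookup(self, url: str) -> List["Proxy"]:
        return list(self._holders.get(url, []))

    def insert(self, url: str, proxy: "Proxy") -> None:
        holders = self._holders.setdefault(url, [])
        if proxy not in holders:
            holders.append(proxy)


class Proxy:
    """Hypothetical Coral HTTP proxy with a local cache."""

    def __init__(self, name: str, index: CoralIndex,
                 fetch_origin: Callable[[str], bytes]) -> None:
        self.name = name
        self.index = index
        self.fetch_origin = fetch_origin
        self.cache: Dict[str, bytes] = {}

    def handle(self, url: str) -> bytes:
        # Step 7: a locally cached copy ends the process immediately.
        if url in self.cache:
            return self.cache[url]
        # Step 8: look the URL up in Coral.
        body = None
        for peer in self.index.lookup(url):
            if peer is not self and url in peer.cache:
                body = peer.cache[url]  # Step 9: fetch from a caching node.
                break
        if body is None:
            body = self.fetch_origin(url)  # Step 9: fall back to the origin.
        self.cache[url] = body
        # Step 11: advertise that this proxy now caches the URL
        # (done before returning, since a function must return last).
        self.index.insert(url, self)
        # Step 10: hand the object back to the client.
        return body


origin_hits: List[str] = []


def fetch_origin(url: str) -> bytes:
    """Pretend origin server that records every request it receives."""
    origin_hits.append(url)
    return b"<html>hello</html>"


index = CoralIndex()
proxy_a = Proxy("a", index, fetch_origin)
proxy_b = Proxy("b", index, fetch_origin)
url = "http://www.x.com.nyud.net:8090/"
proxy_a.handle(url)  # goes to the origin
proxy_b.handle(url)  # served from proxy_a's cache instead
print(len(origin_hits))
```

The point of the sketch is that the second proxy never contacts the origin server, which is how the load of a traffic surge is kept off a small site.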

Akamai

Akamai began as an academic exercise at MIT in the late 1990s. Within a decade it has grown to serve many high-profile content providers. Broadcast networks and application service providers have benefited handsomely from the distributed design on which Akamai is built. In simple terms, Akamai installs thousands of servers within thousands of networks to deliver the service that its clients demand.

Akamai works as a preconfigured proxy to serve content. A content provider writes its HTML so that it references content already held on the Akamai servers. In the figure, the process is as simple as a series of redirects. In the example, (1) a web user requests a page containing a video clip from Canada’s ctv.ca. (2) The HTML that displays and positions this content links to an Akamai hostname, and Akamai’s internal DNS resolution then selects an actual machine to serve the content from. This server will be one of thousands, likely within miles of the user. (3) The content is then streamed to the user from this server.

Akamai reports a current deployment of 84 000 servers distributed across 72 countries and 1100 networks. It also claims to handle 15 to 20% of overall web traffic daily.

Experiences/Comparison

To test out the implementation of both the CoralCDN process and Akamai, sites were browsed employing the two methods while performing a packet capture. In the case of CoralCDN, a heavily trafficked site experiencing the so-called “Slashdot” effect was accessed conventionally and via the CoralCDN method. For Akamai, a video was streamed off a broadcast network’s site. The packet captures revealed where the servers were located in proximity to the test run performed in Ottawa, Ontario.

CoralCDN

Consider the use case of an internet user (browser) seeking content from a specific provider. Either the provider or the user can call on the Coral system (resolver) to retrieve the content in question, simply by appending “.nyud.net:8090” to the hostname in the URL. In this example, a specific image file is requested from a website through CoralCDN.
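The URL rewrite itself is mechanical and can be sketched in a few lines of Python. The helper name `coralize` is hypothetical; the suffix and port are the ones given in the text. The sketch assumes a plain HTTP URL and discards any port present on the original hostname.

```python
from urllib.parse import urlsplit, urlunsplit

CORAL_SUFFIX = ".nyud.net"  # suffix named in the text
CORAL_PORT = 8090           # port named in the text


def coralize(url: str) -> str:
    """Return the CoralCDN form of an HTTP URL (hypothetical helper)."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"URL has no hostname: {url!r}")
    if parts.hostname.endswith(CORAL_SUFFIX):
        return url  # already in Coral form, leave untouched
    # Append the suffix to the hostname and force the Coral port.
    netloc = f"{parts.hostname}{CORAL_SUFFIX}:{CORAL_PORT}"
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )


print(coralize("http://livethesheendream.com/images/sheen.jpg"))
# prints http://livethesheendream.com.nyud.net:8090/images/sheen.jpg
```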

The example used was http://www.livethesheendream.com, a website that was experiencing the “Slashdot” effect at the time of study (http://www.torontosun.com/entertainment/celebrities/2011/03/02/17469286.html).

From packet captures, when accessing the base image http://livethesheendream.com/images/sheen.jpg directly, the server IP was traced to 64.207.144.170, resolving to a server in Culver City, California, United States. When accessing the appended URL http://livethesheendream.com.nyud.net:8090/images/sheen.jpg, the packet trace revealed that the content came from 130.127.39.152, resolving to Anderson City, South Carolina, United States. This server is geographically closer to the user in Ottawa, which suggests that the selection process is working as intended. The user experience was also much more positive, as the site loaded quickly compared with accessing it directly. CoralCDN was helpful in accessing a site that was experiencing more traffic than it could handle.
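The “geographically closer” claim can be checked with a rough great-circle calculation. The coordinates below are approximate city-centre values supplied here as assumptions; they are not taken from the packet captures, and IP geolocation is itself only approximate. Geographic distance is also only a loose proxy for network latency.

```python
import math


def great_circle_km(lat1: float, lon1: float,
                    lat2: float, lon2: float) -> float:
    """Haversine distance in kilometres between two (lat, lon) points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


# Approximate city-centre coordinates (assumed, for illustration only).
OTTAWA = (45.42, -75.70)
CULVER_CITY = (34.02, -118.40)   # origin server location per the text
ANDERSON_SC = (34.50, -82.65)    # Coral proxy location per the text

d_origin = great_circle_km(*OTTAWA, *CULVER_CITY)
d_proxy = great_circle_km(*OTTAWA, *ANDERSON_SC)
print(f"Ottawa to origin server: {d_origin:.0f} km")
print(f"Ottawa to Coral proxy:   {d_proxy:.0f} km")
```

Under these assumed coordinates the Coral proxy is well under half the distance of the origin server from the test location.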

Akamai

The use case scenario for an Akamai example involved streaming video from Canada’s CTV network. CTV is a listed customer of Akamai and makes use of their services to distribute their content. CTV makes available much of the programming they air on the network on their website. Usually, content is posted on the website within a few hours of it airing on the station. In looking at the HTML code for the player page, the redirects requesting specific Akamai resources are revealed. As the earlier figure illustrates, a subdomain is used to refer to the Akamai server (hdtoken.ctvdigital.net).

The user experience was positive in viewing the content. The video loaded within a reasonable amount of time on a 16 Mbps DSL connection. The packet capture revealed that the server was based in Montreal, roughly 200 kilometres from the testing area. This is somewhat surprising, as Akamai does have servers installed in the Ottawa region. It may indicate, however, that the closest server with enough spare resources was the one 200 kilometres away.
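The hypothesis that the nearest server lacked resources can be illustrated with a toy selection rule: choose the closest server whose load is below a threshold. This is not Akamai's actual algorithm, which is not described in this paper. The server names, the load figures and the 0.8 threshold are all made-up values for illustration; only the 200 kilometre figure comes from the observation above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EdgeServer:
    name: str
    distance_km: float  # distance from the client
    load: float         # fraction of capacity in use, 0.0 to 1.0


def pick_server(servers: List[EdgeServer],
                max_load: float = 0.8) -> Optional[EdgeServer]:
    """Return the nearest server with spare capacity, or None if all are busy."""
    candidates = [s for s in servers if s.load < max_load]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.distance_km)


# Hypothetical snapshot: the local server is close but saturated.
servers = [
    EdgeServer("ottawa-edge", 5.0, 0.95),
    EdgeServer("montreal-edge", 200.0, 0.40),
    EdgeServer("toronto-edge", 450.0, 0.10),
]

chosen = pick_server(servers)
print(chosen.name if chosen else "no server available")
```

With these made-up numbers the rule skips the saturated local server and picks the Montreal one, matching the behaviour seen in the packet capture.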

Discussion

Challenges

CoralCDN

Among many challenges, security will remain the highest-priority issue for CoralCDN. The integrity of cached data is a major concern that will continue to evolve over time. As with spam, the caching servers may be flooded with unwanted content, especially given the voluntary nature of the mirroring process.

Suggested solutions have included “self-certifying pathnames” within the Coral URLs. The drawback of this approach is that it requires the cooperation of the origin server. Logging and tracking known “bad” systems could help, but would require constant monitoring and administration. Maintaining such a list could, however, prevent clients from inadvertently accessing a malicious proxy.
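The integrity idea can be pictured with a simplified content-hash check: the path carries a digest of the object, and the client recomputes the digest on whatever a proxy returns. This is a stand-in chosen for brevity and is an assumption, not the scheme proposed for Coral. The path layout `/<sha256>/<name>` and the helper names are hypothetical. It also shows why the origin must cooperate, since only the publisher can issue paths containing the correct digest.

```python
import hashlib


def make_path(name: str, body: bytes) -> str:
    """Publisher side: build a path that embeds the SHA-256 of the content."""
    digest = hashlib.sha256(body).hexdigest()
    return f"/{digest}/{name}"


def verify(path: str, body: bytes) -> bool:
    """Client side: check that the received body matches the digest in the path."""
    parts = path.split("/")
    # Expect exactly ["", "<digest>", "<name>"].
    if len(parts) != 3 or parts[0] != "":
        return False
    expected = parts[1]
    actual = hashlib.sha256(body).hexdigest()
    return expected == actual


original = b"image bytes from the origin server"
path = make_path("sheen.jpg", original)

print(verify(path, original))                   # an honest proxy
print(verify(path, b"bytes altered by proxy"))  # a tampering proxy
```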

Bandwidth hogs also have the potential to wreak havoc on the overall system by crowding out access to valid servers and proxies.

Akamai

Akamai’s implementation faces a number of constant challenges. On the administration side, the company has to ensure system scalability, which involves controlling and monitoring thousands of distributed servers. Mechanisms need to be in place for handling incomplete or out-of-date information, reacting to varying loads and measuring overall internet connectivity. Troubleshooting in this setup takes significant effort and requires many protocols.

Overall system reliability has to be ensured given the amount of traffic that relies on the system. The system depends on multiple caches, which inherently require some form of consistency. Dynamic data is tricky to deal with because it cannot be cached. For applications of this type, Akamai mirrors only the static content while keeping a connection back to a single processing server. This allows much of the traffic to be offloaded, but the mechanism will need to keep evolving in order to provide real-time consistency.
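The split between cached static content and pass-through dynamic content can be sketched as follows. This is a minimal illustration under assumptions: the `Edge` class, the extension list and the example paths are hypothetical, and deciding by file extension is a simplification (real deployments typically rely on HTTP caching headers instead).

```python
import posixpath
from typing import Callable, Dict, List

# Assumed set of extensions treated as static, for illustration only.
STATIC_EXTENSIONS = {".jpg", ".png", ".css", ".js", ".flv"}


class Edge:
    """Hypothetical edge server: caches static objects, forwards the rest."""

    def __init__(self, origin: Callable[[str], bytes]) -> None:
        self.origin = origin
        self.cache: Dict[str, bytes] = {}

    def get(self, path: str) -> bytes:
        # Ignore any query string when inspecting the extension.
        ext = posixpath.splitext(path.split("?", 1)[0])[1].lower()
        if ext in STATIC_EXTENSIONS:
            if path not in self.cache:
                self.cache[path] = self.origin(path)  # first request only
            return self.cache[path]
        # Dynamic content always goes back to the processing server.
        return self.origin(path)


origin_requests: List[str] = []


def origin(path: str) -> bytes:
    """Pretend origin server that records every request it receives."""
    origin_requests.append(path)
    return f"response for {path}".encode()


edge = Edge(origin)
for _ in range(3):
    edge.get("/video/clip.flv")       # static: origin contacted once
    edge.get("/account/balance?u=1")  # dynamic: origin contacted every time

print(origin_requests.count("/video/clip.flv"))
print(origin_requests.count("/account/balance?u=1"))
```

In this sketch the origin sees one request for the static clip but three for the dynamic page, which is the offloading effect described above.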

“Attack traffic” and DDoS attempts will always be an issue for this type of system. Given the vastness of Akamai’s system, the company is able to observe trends and signatures of this type of attack. With these trends monitored, it is easier to trace the source of such traffic or to filter it out based on these patterns. Research must continue constantly, as attack patterns themselves change and evolve.

Conclusion

Content Distribution Networks are an excellent example of the power of a distributed system. The traffic off-loaded and diverted by these networks relieves considerable strain on individual content and service providers, making the internet user’s experience much more positive. Such networks cement themselves into the structure of the internet as the demand for complex applications and streaming media increases.

CoralCDN provides a decent open environment for users and providers alike to distribute and access content whose publisher lacks the funding for the up-front infrastructure that a commercial solution like Akamai provides. In the realm of openness, CoralCDN delivers a quality solution that lives up to its slogan of democratizing content publication. The peer-to-peer model makes it ideal for a contributory, voluntary and open system. It is the user community that works to distribute the content that its members themselves find useful.

Akamai Technologies controls a massive system that has spread around the world at a steady rate. The extent of this distribution raises significant monitoring and administration challenges, but it also allows the company to provide a robust commercial solution. Any provider expecting a significant amount of traffic for an application or media stream is a good candidate to benefit from the service.

Future work, as highlighted in the discussion, will be to keep pace with security, reliability, administration and malicious-traffic challenges. As with any distributed system, the issues will also revolve around DDoS prevention, bandwidth hogs and the spread of malware. Constant, efficient monitoring of all traffic is required to improve on this distribution network foundation.

References

Globally distributed content delivery (Akamai) - Accessed Feb. 10, 2011

Coral-CDN paper - Accessed Feb. 18, 2011

The Slashdot Effect (Wikipedia) - Accessed Feb. 24, 2011

Akamai - Why the Edge? - Accessed Feb. 25, 2011

How to build your own CDN... - Accessed Feb. 24, 2011

The Design of CoralCDN - Accessed Feb. 27, 2011

Akamai - State of the Internet - Accessed Jan. 25, 2011

Akamai - Online Video Publishers - Accessed Feb. 28, 2011

Sheen tribute site 'just explodes' - Accessed Mar. 2, 2011