It is now common for individual applications such as search engines and social networks to serve a billion users. That growth has been enabled by a relentless pursuit of scalability in both hardware and software, culminating in expensive and power-hungry data centers. Yet the delivered performance of these systems lags behind their potential by an order of magnitude or more.
In this talk, I will describe two projects focused on improving data center resource efficiency. The first is a MapReduce implementation built upon an I/O-optimized distributed sorting system called TritonSort. When applied to the 100 TB GraySort benchmark, it improved upon the absolute performance of the previous world record holder by 25% while using 66 times fewer servers. As a result, it has attained the 100 TB JouleSort record for energy-efficient data processing. The second project focuses on the design of the underlying data center network. Existing scalable data center network designs promise full bisection bandwidth between all servers, but at significant cost, complexity, and power consumption. Instead, we propose a hybrid electrical/optical switch architecture that can deliver a nearly 3x reduction in cost and a 6x reduction in power consumption relative to the state of the art.
George Porter is a Research Scientist at UCSD and the Associate Director of UCSD’s Center for Networked Systems. His research interests include data-intensive computing and data center networking. He has received a Google Focused Research Award and a NetApp Faculty Fellowship. He received his B.S. from the University of Texas at Austin and his Ph.D. from the University of California, Berkeley.