The increasing energy costs of large-scale server systems have led to a demand for
innovative methods for optimizing resource utilization in these systems. Such
methods aim to reduce server energy consumption, cooling requirements, carbon
footprint, and so on, thereby improving the holistic sustainability of
the overall server infrastructure. At the core of many of these methods lie
reliable workload-prediction techniques that help identify the servers,
time intervals, and other parameters needed to build
sustainability solutions based on techniques such as virtualization and server
consolidation. Many workload-prediction methods have been
proposed in recent literature, but they do not adequately address
the issues that arise specifically in large-scale server
systems, namely, the extensive non-stationarity of server workloads and the massive
volume of online streaming data. In this paper, we fill this gap by proposing two
online ensemble learning methods for workload prediction, which address these
issues in large-scale server systems. The proposed algorithms are motivated
by the Weighted Majority and Simulatable Experts approaches, which we extend
and adapt to the large-scale workload prediction problem. We demonstrate the
effectiveness of our algorithms on real and synthetic data sets, and show
that, with the proposed algorithms, the workloads of approximately 91% of the
servers in a real data center can be predicted with accuracy greater than 89%,
whereas with baseline approaches, the workloads of only 13--24% of the
servers can be predicted with similar accuracy.
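The abstract does not describe the two proposed algorithms themselves. As a rough illustration of the Weighted Majority idea they build on, the following Python sketch combines several simple forecasters ("experts") online, shrinking each expert's weight exponentially in its absolute prediction error. It assumes workloads are normalized to [0, 1] (e.g., CPU utilization fractions); the experts `last_value` and `moving_average`, the class `WeightedExpertEnsemble`, and the learning rate `eta` are hypothetical illustrations, not the paper's method.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical "expert": maps the history of observed workloads to a forecast.
# Assumes workloads are normalized to [0, 1], so absolute errors are at most 1.
Expert = Callable[[Sequence[float]], float]


def last_value(history: Sequence[float]) -> float:
    """Naive expert: predict the most recent observation (0.0 if none)."""
    return history[-1] if history else 0.0


def moving_average(window: int) -> Expert:
    """Build an expert that predicts the mean of the last `window` observations."""
    def predict(history: Sequence[float]) -> float:
        if not history:
            return 0.0
        recent = history[-window:]
        return sum(recent) / len(recent)
    return predict


class WeightedExpertEnsemble:
    """Exponentially weighted average of expert forecasts, a real-valued
    variant of Weighted Majority. Illustrative only, not the paper's algorithm."""

    def __init__(self, experts: List[Expert], eta: float = 0.5) -> None:
        self.experts = experts
        self.eta = eta  # learning rate: how sharply large errors shrink a weight
        self.weights = [1.0 / len(experts)] * len(experts)

    def predict(self, history: Sequence[float]) -> float:
        # Weighted average of every expert's forecast for the next step.
        forecasts = [expert(history) for expert in self.experts]
        total = sum(self.weights)
        return sum(w * f for w, f in zip(self.weights, forecasts)) / total

    def update(self, history: Sequence[float], actual: float) -> None:
        # Multiplicatively penalize each expert by its absolute error.
        for i, expert in enumerate(self.experts):
            loss = abs(expert(history) - actual)
            self.weights[i] *= math.exp(-self.eta * loss)
        # Renormalize so the weights sum to 1 and stay numerically safe.
        total = sum(self.weights)
        self.weights = [w / total for w in self.weights]


# Usage: stream a steadily rising utilization trace through the ensemble.
ensemble = WeightedExpertEnsemble([last_value, moving_average(3)], eta=0.5)
history: List[float] = []
for observed in [0.2, 0.4, 0.6, 0.8, 1.0]:
    forecast = ensemble.predict(history)  # forecast made before seeing `observed`
    ensemble.update(history, observed)
    history.append(observed)

# On a rising trend, last_value errs less, so it now carries more weight.
next_forecast = ensemble.predict(history)
```

The multiplicative update lets the ensemble shift weight toward whichever expert currently tracks the workload best, which suits streaming data because each step costs time linear in the number of experts. Coping with strong non-stationarity usually needs extra machinery (for example, keeping a floor on each weight so a previously poor expert can recover), which this sketch omits.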