Datacenter Cooling Optimization using Deep Reinforcement Learning
Washington University in St. Louis, Fall 2024
Developed a comprehensive deep reinforcement learning (DRL) solution for optimizing datacenter cooling systems using EnergyPlus simulations. The project was completed as part of Washington University in St. Louis’s CSE 510A: Deep Reinforcement Learning course, and focused on implementing and comparing multiple DRL algorithms for energy-efficiency optimization.
Research Context
With the growing popularity of deep learning and big data, data centers have become essential to modern infrastructure, driving up energy consumption for both computing operations and cooling. While large-scale data centers have been studied extensively, small and mid-sized data centers remain understudied despite holding 42.5% and 19.5% of the market share, respectively. Our project addresses this gap by optimizing cooling efficiency in these smaller facilities.
As machines within a data center complete their tasks, they generate heat, creating a complex spatial cooling problem. Our approach leverages DRL to dynamically adapt to changing conditions such as machine workload and external temperature.
Key Features
- Multiple DRL algorithm implementations:
  - Dueling Deep Q-Network (DDQN) - our novel contribution to datacenter cooling
  - Proximal Policy Optimization (PPO) with Generalized Advantage Estimation
  - Soft Actor-Critic (SAC) with automatic entropy tuning
- Integration with EnergyPlus for accurate building energy simulation
- Custom environment for datacenter cooling optimization
- Performance metrics and energy efficiency tracking
- Configurable hyperparameters for training
- Baselines for comparison (Random, Rules-Based Controller, Rules-Based Incremental Controller)
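
As a concrete reference point, a rules-based incremental controller simply nudges the cooling setpoint toward a comfort target each timestep. A minimal sketch of that idea (the step size, target temperature, deadband, and setpoint bounds are illustrative assumptions, not the project's exact values):

```python
# Illustrative sketch of a rules-based incremental baseline.
# The step size, target, deadband, and setpoint bounds are assumed values
# for demonstration, not the project's configuration.

class RulesBasedIncrementalController:
    def __init__(self, step=0.5, setpoint_bounds=(22.0, 30.0)):
        self.step = step                       # setpoint change per timestep (degC)
        self.low, self.high = setpoint_bounds  # allowed cooling setpoint range (degC)

    def act(self, zone_temp, cooling_setpoint, target=24.0, deadband=1.0):
        """Nudge the cooling setpoint so the zone drifts back toward the target."""
        if zone_temp > target + deadband:
            cooling_setpoint -= self.step  # zone too warm: cool harder
        elif zone_temp < target - deadband:
            cooling_setpoint += self.step  # zone cool enough: relax cooling, save energy
        return min(max(cooling_setpoint, self.low), self.high)
```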
Implementation Details
The project consists of three main DRL implementations, each serving different purposes:
- DDQN for discretized action spaces (our best performer, with a 35.8% improvement over baseline; a dueling-head sketch follows this list)
- PPO for stable training performance (achieved 10.9% improvement over baseline)
- SAC for continuous action spaces with exploration (showed limited improvement at 0.284%)
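
The dueling architecture behind our DDQN splits the Q-network into a state-value stream and an advantage stream that are recombined into Q-values. A minimal PyTorch sketch of such a head (the hidden size and the observation/action dimensions are placeholders rather than our actual hyperparameters):

```python
import torch
import torch.nn as nn

class DuelingQNetwork(nn.Module):
    """Dueling head: Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)              # state-value stream V(s)
        self.advantage = nn.Linear(hidden, n_actions)  # advantage stream A(s, .)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        h = self.feature(obs)
        v = self.value(h)
        a = self.advantage(h)
        # Subtracting the mean advantage keeps V and A identifiable.
        return v + a - a.mean(dim=-1, keepdim=True)

# Example: q_net = DuelingQNetwork(obs_dim=20, n_actions=10)
```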
The system integrates with EnergyPlus for realistic building energy simulations, considering:
- Dynamic thermal conditions
- Energy consumption patterns
- Cooling system efficiency
- Environmental impact
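
The signal the agent optimizes has to balance cooling energy against keeping zone temperatures within a safe band. A hedged sketch of a linear energy/comfort reward in that spirit (the energy weight and comfort range below are illustrative assumptions, not the project's exact reward definition):

```python
# Illustrative linear reward trading off HVAC energy against thermal comfort.
# The energy weight and comfort band are assumptions for demonstration only.

def cooling_reward(total_energy_wh, zone_temps, comfort_range=(18.0, 27.0),
                   energy_weight=0.5):
    low, high = comfort_range
    # Degrees (degC) by which each zone strays outside the comfort band.
    comfort_violation = sum(max(low - t, 0.0) + max(t - high, 0.0) for t in zone_temps)
    # Both terms penalize the agent, so lower energy use and fewer comfort
    # violations yield a higher (less negative) reward.
    return -(energy_weight * total_energy_wh
             + (1.0 - energy_weight) * comfort_violation)
```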
Experimental Setup
We used the Sinergym Python package to simulate a small datacenter through the Eplus-datacenter-mixed-continuous-stochastic-v1 environment. The environment simulates:
- A 491.3 m² building divided into two asymmetrical zones (west and east)
- Each zone equipped with an HVAC system
- Hosted servers as primary heat sources
- Stochastic weather conditions, with noise amplified to 1.5 standard deviations
- Training period from June 1st to August 31st
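
Sinergym exposes this environment through the standard Gymnasium interface, so an episode rollout follows the usual reset/step loop. A minimal sketch (the random-action policy here is only a placeholder for an actual agent):

```python
import gymnasium as gym
import sinergym  # importing sinergym registers the Eplus-* environments

# Build the continuous-action, stochastic datacenter environment used in the project.
env = gym.make('Eplus-datacenter-mixed-continuous-stochastic-v1')

obs, info = env.reset()
terminated = truncated = False
episode_return = 0.0
while not (terminated or truncated):
    action = env.action_space.sample()  # placeholder policy: random actions
    obs, reward, terminated, truncated, info = env.step(action)
    episode_return += reward

env.close()
print(f"Episode return: {episode_return:.2f}")
```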
Key Findings
Our experiments revealed:
- DDQN significantly outperformed other approaches, showing a 35.8% improvement over the Rules-Based Incremental Controller
- PPO achieved a 10.9% improvement over the baseline
- SAC showed limited improvement (0.284%) compared to the baseline
- Weather forecasting data generally reduced model performance across configurations
- Model-free approaches like DDQN offer promising results for small to mid-sized data centers with limited computational resources
Technologies Used
- Python 3.12
- PyTorch 2.5.1
- EnergyPlus 24.2.0
- Gymnasium 1.0.0
- Sinergym 3.7.0
- NumPy
- TensorBoard 2.18.0
- Version Control (Git)