Juan Pérez, Gustavo Lobos, Milena Bonacic
Abstract
Deep Reinforcement Learning (DRL) shows strong performance for optimizing the coordinated operation of battery energy storage systems (BESS) with photovoltaic (PV) plants, yet most studies rely on simulations. Bridging the gap to practical application requires validation on real-world operational data. This paper provides such empirical evidence by developing and rigorously evaluating an integrated forecast-and-control framework on three distinct utility-scale PV-BESS assets in Chile. The framework couples a Sequence-to-Sequence (Seq2Seq) LSTM point forecaster, embedded in a probabilistic scenario-generation pipeline for PV generation and nodal prices, with DRL agents (Proximal Policy Optimization, PPO, and Soft Actor-Critic, SAC) trained on 1,000 generated scenarios per site. Using two years (2022–2023) of operational plant data, meteorology, and market prices, we benchmark the DRL policies against theoretical limits (Oracle), a deterministic predict-then-optimize baseline, a scenario-based model predictive control (MPC), and a random Dummy policy over 14-day horizons, using a 900/100 train–test split. The Seq2Seq forecaster improves accuracy (e.g., a 34.5% reduction in RMSE for prices versus SARIMAX). The DRL agents consistently outperform the predict-then-optimize baseline, achieving mean 14-day profits near USD 55k, and exhibit robust, adaptive countercyclical behavior without excessive cycling. Our study provides a reproducible blueprint and empirical validation for data-driven BESS control, demonstrating its practical viability and economic benefits under real-world operating conditions.
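As a minimal illustration of the scenario-generation step mentioned in the abstract, the sketch below samples a set of PV-output scenarios around a point forecast. The Gaussian perturbation model, the `sigma` parameter, and the toy forecast values are illustrative assumptions, not the paper's actual probabilistic pipeline.

```python
import numpy as np

def generate_scenarios(point_forecast, sigma, n_scenarios=1000, seed=0):
    """Sample scenarios around a point forecast.

    Hypothetical sketch: i.i.d. Gaussian noise is added to the forecast and
    the result is clipped at zero, since PV output is non-negative. The
    paper's pipeline may use a different (e.g., learned) error model.
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma, size=(n_scenarios, len(point_forecast)))
    return np.maximum(point_forecast + noise, 0.0)

# Toy hourly PV point forecast in MW (hypothetical values).
pv_forecast = np.array([0.0, 5.0, 20.0, 35.0, 30.0, 10.0, 0.0])
scens = generate_scenarios(pv_forecast, sigma=3.0)
print(scens.shape)  # (1000, 7): one row per scenario
```

In the paper's setting, 1,000 such scenarios per site (for both PV generation and nodal prices) form the training distribution for the DRL agents.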