Hi Databricks Community,
I built a retail sales forecasting system on Databricks Free Edition using the Rossmann Store Sales dataset — about 1,115 stores with daily sales over two and a half years. The goal was a 48-day forecast, the same horizon as the original Kaggle competition.
The problem is one every retailer has: how much will we sell over the next few weeks? That number drives ordering, staffing, and logistics, so a decent short-horizon forecast improves all of those decisions. We deal with this in our own enterprise right now, and it is a common need across companies rather than anything novel. What has changed is the cost of solving it. Not long ago it took dedicated infrastructure, a data engineering team, and long build cycles. With managed Spark, serverless compute, and Delta Lake, the same system is far easier and faster to put together, which is part of what I wanted to show with this project.
I started with a seasonal-naive baseline: predict each day from the same weekday a year earlier. That gave an RMSPE (root mean squared percentage error, the competition metric; lower is better) of 0.2147, and I made it a rule that nothing ships unless it beats that. I then trained one global LightGBM model across all stores and validated it with walk-forward cross-validation instead of random splits, which would let the model train on days that come after the ones it is scored on. It came out at 0.1099, about 49% better than the baseline. An XGBoost model landed at 0.1121, which suggested the result was real and not a quirk of one library. A late Kaggle submission scored 0.1203 on the hidden test set, close enough to my own held-out number that I trusted the validation.
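For anyone who wants to reproduce the evaluation setup, here is a minimal sketch of the metric, the baseline, and the walk-forward folds in plain Python. It is not the project code. The function names, the 364-day lag (52 weeks, so the weekday lines up), and the three-fold default are assumptions for illustration. Zero-sales days are skipped in the metric, as in the Kaggle scoring.

```python
import math
from datetime import date, timedelta


def rmspe(actual, predicted):
    """Root mean squared percentage error, skipping zero-sales days."""
    ratios = [((a - p) / a) ** 2 for a, p in zip(actual, predicted) if a != 0]
    if not ratios:
        raise ValueError("no non-zero actuals to score")
    return math.sqrt(sum(ratios) / len(ratios))


def seasonal_naive(history, day, lag_days=364):
    """Forecast `day` as the sales on the same weekday `lag_days` earlier.

    `history` maps date -> sales for one store. Returns None when the
    lagged day is missing. lag_days=364 is an assumption, not a value
    taken from the project.
    """
    return history.get(day - timedelta(days=lag_days))


def walk_forward_folds(first_day, last_day, horizon_days=48, n_folds=3):
    """Return (train_end, val_start, val_end) tuples, oldest fold first.

    Each validation window is `horizon_days` long and starts the day after
    its training data ends, so training never sees later dates.
    """
    folds = []
    val_end = last_day
    for _ in range(n_folds):
        val_start = val_end - timedelta(days=horizon_days - 1)
        train_end = val_start - timedelta(days=1)
        if train_end < first_day:
            break  # not enough history left for another fold
        folds.append((train_end, val_start, val_end))
        val_end = train_end  # next (older) window ends where training ended
    return list(reversed(folds))
```

Calling `walk_forward_folds(date(2013, 1, 1), date(2015, 7, 31))` covers the Rossmann training period with three back-to-back 48-day validation windows, each trained only on the days before it.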


After that I built the production side: a retraining pipeline that only promotes a new model if it still beats the baseline, basic drift monitoring, and confidence flags for stores that were recently closed or have thin history. The whole flow is coded as one end-to-end pipeline — new data comes in from Google Drive (via a service account), the history extends, features rebuild, the model retrains, and a new model is only deployed if it passes the quality gate. In a normal setup this would be event-driven: a new file lands and the pipeline fires automatically.
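The quality gate and the confidence flags can be sketched roughly like this. It is an illustration under assumptions, not the pipeline's actual code: `quality_gate`, `confidence_flag`, the optional comparison against the current production model, and both thresholds (365 days of history, 30 days since a closure) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateDecision:
    promote: bool
    reason: str


def quality_gate(candidate_rmspe, baseline_rmspe, production_rmspe=None):
    """Decide whether a retrained model may be promoted (lower RMSPE wins).

    The candidate must beat the baseline. The extra check against the
    current production model is an assumption added for illustration.
    """
    if candidate_rmspe >= baseline_rmspe:
        return GateDecision(False, "candidate does not beat the baseline")
    if production_rmspe is not None and candidate_rmspe >= production_rmspe:
        return GateDecision(False, "candidate does not beat the production model")
    return GateDecision(True, "candidate passes the gate")


def confidence_flag(history_days, days_since_closure=None,
                    min_history_days=365, closure_window_days=30):
    """Return 'low' for thin history or a recent closure, else 'normal'.

    Both thresholds are hypothetical placeholders.
    """
    if history_days < min_history_days:
        return "low"
    if days_since_closure is not None and days_since_closure < closure_window_days:
        return "low"
    return "normal"
```

With the numbers from this post, `quality_gate(0.1099, 0.2147)` promotes the LightGBM model, while a candidate at 0.25 would be rejected and the existing model would stay in place.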

Free Edition has no scheduled jobs or file triggers on serverless, so I had to simulate that: once started, the pipeline runs end to end without intervention, but I start it manually from one notebook instead of on a file-arrival event. The logic for "new data flows through and only a better model ships" is all there; only the trigger is manual.
The interesting part was working around Free Edition. No MLflow Model Registry, no AutoML, no scheduled jobs on serverless. So I logged experiments to a Delta table instead of MLflow, versioned models as files in a Volume with a registry table, and used XGBoost where I'd have used AutoML. The biggest recurring headache was stale session state on serverless — it looked like bugs but was usually just cells running out of order.
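To give an idea of the "model files plus a registry table" pattern, here is a small stand-in that runs anywhere. It is not the Databricks code: a local directory stands in for the Volume, a JSON-lines file stands in for the Delta registry table, and the function names and column names are hypothetical.

```python
import json
import pickle
from datetime import datetime, timezone
from pathlib import Path


def load_registry(root):
    """Return all registry rows, oldest first (empty list if none yet)."""
    registry = Path(root) / "registry.jsonl"
    if not registry.exists():
        return []
    with open(registry, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def register_model(model, rmspe, root):
    """Pickle `model` as the next version under `root` and log one row.

    The JSON-lines file is a stand-in for a registry table; on Databricks
    `root` would be a Volume path instead of a local directory.
    """
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    version = len(load_registry(root)) + 1
    model_path = root / f"model_v{version}.pkl"
    with open(model_path, "wb") as f:
        pickle.dump(model, f)
    row = {
        "version": version,
        "path": str(model_path),
        "rmspe": rmspe,
        "registered_at": datetime.now(timezone.utc).isoformat(),
    }
    with open(root / "registry.jsonl", "a", encoding="utf-8") as f:
        f.write(json.dumps(row) + "\n")
    return row


def best_version(root):
    """Return the registry row with the lowest RMSPE, or None if empty."""
    rows = load_registry(root)
    return min(rows, key=lambda r: r["rmspe"]) if rows else None
```

The point of keeping the registry append-only is that every retraining run leaves a row behind, whether or not its model was promoted, so the history of scores stays auditable.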
The main thing I took away: most of the work isn't the model, it's the discipline around it — a baseline you have to beat, honest validation, gated retraining, and being upfront about which forecasts are shaky. A single model you can trust beats a complex one you can't maintain.
Thanks,
Daniil