Hey everyone, hope you're all doing great!
A friend of mine recently went through a data engineering interview at Walmart and shared his experience with me. I thought it would be really useful to pass along what he encountered. The interview had some pretty interesting questions, so let's dive into one of them, break it down, and figure out how to tackle it. Here we go!
The Interview Story
My friend mentioned that the interview was a mix of technical and practical questions, focusing on how well you can handle real-life data scenarios. One question he found really interesting was about processing sales data, which is pretty common for companies like Walmart. They wanted to see if he could work with big datasets and write efficient PySpark code to get useful insights.
The Question
Here's the main question he was asked; a quick sketch of one way to approach it follows the list below.
You have a big dataset with daily sales information from different Walmart stores. Each row in the dataset shows details like store ID, product ID, sale date, quantity sold, and total sales amount. You need to write a PySpark program that calculates the following:
- Total sales for each store on a daily basis.
- Top 5 products with the highest sales across all stores for a given day.
- Weekly moving average of sales for each store.
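Here is a minimal PySpark sketch of one way to tackle the three tasks. The column names (store_id, product_id, sale_date, quantity_sold, total_sales), the input path, and the example date are all assumptions for illustration, not part of the original question; adjust them to the real schema.

```python
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("walmart_sales_exercise").getOrCreate()

# Assumed input: one row per store/product/day (hypothetical path and schema).
sales = spark.read.parquet("/data/walmart_sales")

# 1. Total sales for each store on a daily basis.
daily_store_sales = (
    sales.groupBy("store_id", "sale_date")
         .agg(F.sum("total_sales").alias("daily_sales"))
)

# 2. Top 5 products by total sales across all stores for a given day.
target_day = "2024-06-01"  # example date, purely illustrative
top_products = (
    sales.filter(F.col("sale_date") == target_day)
         .groupBy("product_id")
         .agg(F.sum("total_sales").alias("product_sales"))
         .orderBy(F.desc("product_sales"))
         .limit(5)
)

# 3. Weekly (7-day) moving average of daily sales per store.
# rangeBetween needs a numeric ordering column, so convert the date to a day count.
days = F.datediff(F.col("sale_date"), F.lit("1970-01-01"))
week_window = (
    Window.partitionBy("store_id")
          .orderBy(days)
          .rangeBetween(-6, 0)  # current day plus the 6 preceding days
)
moving_avg = daily_store_sales.withColumn(
    "weekly_moving_avg", F.avg("daily_sales").over(week_window)
)

daily_store_sales.show()
top_products.show()
moving_avg.show()
```

Grouping by store and date handles the first two parts with plain aggregations; the moving average is the part interviewers usually probe, since it needs a window partitioned by store and ordered by date rather than another groupBy.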
Read the full story at https://medium.com/towards-data-engineering/pyspark-interview-question-inside-a-walmart-data-enginee...
If you enjoy reading my posts and would like to encourage me to write many more, consider clicking 'Kudos'!
Thanks and happy data engineering!