Hey everyone, hope you're all doing great!
A friend of mine recently went through a data engineering interview at Walmart and shared his experience with me. I thought it would be really useful to pass along what he encountered. The interview had some pretty interesting questions, so let's dive into one of them, break it down, and figure out how to tackle it. Here we go!
The Interview Story
My friend mentioned that the interview was a mix of technical and practical questions, focusing on how well you can handle real-life data scenarios. One question he found really interesting was about processing sales data, which is pretty common for companies like Walmart. They wanted to see if he could work with big datasets and write efficient PySpark code to get useful insights.
The Question
Here's the main question he was asked; a quick sketch of one way to approach it follows the list below.
You have a big dataset with daily sales information from different Walmart stores. Each row in the dataset shows details like store ID, product ID, sale date, quantity sold, and total sales amount. You need to write a PySpark program that calculates the following:
- Total sales for each store on a daily basis.
- Top 5 products with the highest sales across all stores for a given day.
- Weekly moving average of sales for each store.
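Here is a minimal PySpark sketch of one way to tackle the three tasks. The column names (store_id, product_id, sale_date, quantity_sold, total_sales), the input path, and the example date are all assumptions for illustration, not part of the original question; adjust them to the real schema.

```python
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("walmart_sales_exercise").getOrCreate()

# Assumed input: one row per store/product/day (hypothetical path and schema).
sales = spark.read.parquet("/data/walmart_sales")

# 1. Total sales for each store on a daily basis.
daily_store_sales = (
    sales.groupBy("store_id", "sale_date")
         .agg(F.sum("total_sales").alias("daily_sales"))
)

# 2. Top 5 products by total sales across all stores for a given day.
target_day = "2024-06-01"  # example date, purely illustrative
top_products = (
    sales.filter(F.col("sale_date") == target_day)
         .groupBy("product_id")
         .agg(F.sum("total_sales").alias("product_sales"))
         .orderBy(F.desc("product_sales"))
         .limit(5)
)

# 3. Weekly (7-day) moving average of daily sales per store.
# rangeBetween needs a numeric ordering column, so convert the date to a day count.
days = F.datediff(F.col("sale_date"), F.lit("1970-01-01"))
week_window = (
    Window.partitionBy("store_id")
          .orderBy(days)
          .rangeBetween(-6, 0)  # current day plus the 6 preceding days
)
moving_avg = daily_store_sales.withColumn(
    "weekly_moving_avg", F.avg("daily_sales").over(week_window)
)

daily_store_sales.show()
top_products.show()
moving_avg.show()
```

Grouping by store and date handles the first two parts with plain aggregations; the moving average is the part interviewers usually probe, since it needs a window partitioned by store and ordered by date rather than another groupBy.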
Read the full story at https://medium.com/towards-data-engineering/pyspark-interview-question-inside-a-walmart-data-enginee...
If you enjoy reading my posts and would like to encourage me to write many more, consider clicking 'Kudos'!
Thanks and happy data engineering!