PySpark Interview Question: Inside a Walmart Data Engineering Interview

Brahmareddy — Tue, 20 Aug 2024 19:45:40 GMT

Hey everyone, hope you’re all doing great!

A friend of mine recently went through a data engineering interview at Walmart and shared his experience with me. I thought it would be really useful to pass along what he encountered. The interview had some pretty interesting questions, so let’s dive into one of them, break it down, and figure out how to tackle it. Here we go!

The Interview Story

My friend mentioned that the interview was a mix of technical and practical questions, focusing on how well you can handle real-life data scenarios. One question he found really interesting was about processing sales data, which is pretty common for companies like Walmart. They wanted to see if he could work with big datasets and write efficient PySpark code to get useful insights.

The Question

Here’s the main question he was asked:

You have a big dataset with daily sales information from different Walmart stores. Each row in the dataset shows details like store ID, product ID, sale date, quantity sold, and total sales amount. You need to write a PySpark program that calculates the following:

Total sales for each store on a daily basis.
Top 5 products with the highest sales across all stores for a given day.
Weekly moving average of sales for each store.

Read full story at - https://medium.com/towards-data-engineering/pyspark-interview-question-inside-a-walmart-data-engineering-interview-defe9c730aa5?sk=6989d7692d538af93a8fbbf53651d56f

If you enjoy reading my posts and would like to encourage me to write more, consider clicking 'Kudos'!

Thanks and happy data engineering!