yesterday (last edited 7 hours ago by Sujitha)
Hello Community,
Let me start off with a quick question:
Have you ever...
migrated your workloads from on-prem Spark to Databricks, hit a bug, and thought, “I wish I could repro this locally and debug it without burning cluster hours”?
If yes, this project is literally built for you.
------------------------------------------------------------------------------------------
I’ve open-sourced a project called Spark with Hadoop Anywhere – a production-like Spark + Hadoop + Hive stack on Docker, specifically designed for:
On-prem open-source Spark and Databricks users (Support, SREs, Data Engineering, etc.)
🔗 Project page (docs + overview):
https://anudeepkonaboina.github.io/spark-with-hadoop-anywhere/
🔗 GitHub repo (please ⭐ and fork!):
https://github.com/AnudeepKonaboina/spark-with-hadoop-anywhere
Each Git branch is pinned to a specific Spark / Scala / Java combo, aligned with the OSS Spark versions used by Databricks Runtime (DBR).
More details here: https://anudeepkonaboina.github.io/spark-with-hadoop-anywhere/#dbr-underlying-spark-oss-compatible-b...
You don’t just get a Spark binary thrown in a container.
You get a single-node analytics stack:
Spark (standalone: master + worker in one container)
The entire stack is orchestrated through a single script, setup-spark.sh: run one command and you have a single-node, cluster-like setup on your laptop.
This project was born out of exactly the kind of pain that users working with OSS Spark or Databricks face:
Support / SRE / PS engineers needing fast, realistic repros
If that sounds like you, I’d honestly love it if you:
Fork the repo
---------------------------------
👉 Fork it, explore it, and star the repo if you like it:
https://github.com/AnudeepKonaboina/spark-with-hadoop-anywhere
👉 Docs/overview (easier to share inside your team):
https://anudeepkonaboina.github.io/spark-with-hadoop-anywhere/
If you end up using this to debug a challenging Spark/HDFS/Hive or DBR issue, please leave a comment or open an issue in the repository – I’d love to hear about your experience and what would make the stack even more useful for the Databricks community.
yesterday
This is amazing, @K_Anudeep. Users will benefit from this.
yesterday
Fantastic, @K_Anudeep! This is truly amazing; it will help a lot by providing such a lightweight environment.
yesterday
This is fantastic, @K_Anudeep, and really helpful.
8 hours ago
Hey, I tried this out on my laptop, and it takes barely 3 minutes to set up a cluster-like environment locally. This is really helpful. Great work, @K_Anudeep 👏👏