Hello Community,
Let me start off with a quick question:
Have you ever...
If yes, this project is literally built for you.
------------------------------------------------------------------------------------------
Meet Spark with Hadoop Anywhere
I’ve open-sourced a project called Spark with Hadoop Anywhere – a production-like Spark + Hadoop + Hive stack on Docker, specifically designed for:
🔗 Project page (docs + overview):
https://anudeepkonaboina.github.io/spark-with-hadoop-anywhere/
🔗 GitHub repo (please ⭐ and fork!):
https://github.com/AnudeepKonaboina/spark-with-hadoop-anywhere
Key features (why you should at least fork it 👇)
1. DBR-aligned version matrix
Each Git branch is pinned to a specific Spark / Scala / Java combo, aligned with the OSS Spark versions used by Databricks Runtime (DBR).
More details here: https://anudeepkonaboina.github.io/spark-with-hadoop-anywhere/#dbr-underlying-spark-oss-compatible-b...
2. Full analytics node, not just Spark
You don’t just get a Spark binary thrown in a container.
You get a single-node analytics stack:
3. One-command setup via setup-spark.sh
The entire stack is orchestrated through a single script: run one command and you get a single-node, cluster-like setup on your laptop.
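As a rough sketch of how the one-command setup could be scripted from Python: the repo URL and the script name setup-spark.sh come from this post, but the branch name is a placeholder and the helper names (build_commands, run) are made up for illustration. I'm also assuming the script is launched from the repo root without arguments, so please check the docs for the exact invocation and any required flags.

```python
import subprocess

# Taken from this post.
REPO_URL = "https://github.com/AnudeepKonaboina/spark-with-hadoop-anywhere"
SETUP_SCRIPT = "setup-spark.sh"


def build_commands(branch, workdir="spark-with-hadoop-anywhere"):
    """Return the two commands: clone one branch, then launch the setup script."""
    clone_cmd = ["git", "clone", "--branch", branch, "--single-branch", REPO_URL, workdir]
    # Assumption: the script runs from the repo root with no arguments.
    # The real script may need flags; see the project docs.
    setup_cmd = ["bash", SETUP_SCRIPT]
    return [clone_cmd, setup_cmd]


def run(branch, workdir="spark-with-hadoop-anywhere", dry_run=True):
    """Print the commands (dry run) or execute them in order."""
    clone_cmd, setup_cmd = build_commands(branch, workdir)
    if dry_run:
        print(" ".join(clone_cmd))
        print(f"(in {workdir}) " + " ".join(setup_cmd))
        return
    subprocess.run(clone_cmd, check=True)
    # Run the setup script inside the freshly cloned repository.
    subprocess.run(setup_cmd, cwd=workdir, check=True)


if __name__ == "__main__":
    # "<branch-name>" is a placeholder: pick the branch for your Spark version.
    run("<branch-name>", dry_run=True)
```

With dry_run=True nothing is cloned or started; the commands are only printed so you can review them first.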
Why I’m posting this on Databricks Community
This project was born out of exactly the kind of pain that users working with OSS Spark or Databricks face:
If that sounds like you, I’d honestly love it if you:
- Fork the repo
- Spin up the stack for the Spark version you care about
- Try reproducing one of your current/old issues
- Share feedback, issues, or PRs
---------------------------------
Interested?
👉 Fork and explore the repo, and star it if you like it:
https://github.com/AnudeepKonaboina/spark-with-hadoop-anywhere
👉 Docs/overview (easier to share inside your team):
https://anudeepkonaboina.github.io/spark-with-hadoop-anywhere/
If you end up using this to debug a challenging Spark/HDFS/Hive or DBR issue, please leave a comment or open an issue in the repository – I’d love to hear about your experience and what would make the stack even more useful for the Databricks community.
Anudeep