Episode 1: Getting Data In
Learning Databricks one brick at a time, using the Free Edition.

Project Intro
Welcome to everyone reading. My name’s Ben, a.k.a. BS_THE_ANALYST, and I’m going to share my experiences as I explore the world of Databricks. My objective is to master Data Engineering on Databricks, weave in AI & ML, and perform analyses.

I hope this content serves two purposes. Firstly, as motivation for people who want to explore and hone their skills. Secondly, as a chance to learn from seasoned Databricks users: I’m keen to pick up best practices, so I encourage you to reach out or give feedback on what you’d do differently. That’s the beauty of learning.

Today’s Objectives
So let’s begin with Getting Data Into Databricks. This step is no small feat; it could be a series in itself, as there’s much to consider depending on project requirements.

In today’s world, Unity Catalog is the recommended way to manage and store data in Databricks. For users not yet leveraging Unity Catalog, DBFS or mounted cloud storage are still valid approaches. For those wondering, DBFS was the default storage location for Unity Catalog’s predecessor, the hive_metastore. If you’d like a solid breakdown of Unity Catalog, here’s a great read:
👉Understanding Unity Catalog

Project
Picture this: I’m a user wanting to get some data into Databricks. On my computer, I’ve got some flat files (CSVs, maybe an XLSX). The first question I ask myself:

Do I want to manually upload the data, or do it programmatically?

Typically, manual is the route to prove a concept works. If I need automation later, I’ll convert that process into something programmatic.

Manually Uploading CSV (Data) to a Unity Catalog Volume
Uploading manually is straightforward:

  1. Press New
  2. Select Add or Upload Data
  3. Select Upload files to Volume
    • Think of a volume as a storage bucket inside Unity Catalog where you can drop files.
    • There’s also a Create or Modify Table option, which can be a good starting point. However, if you check the supported file types, you’ll notice .xlsx is not an option. That’s a topic we’ll dive deeper into in a future episode.



In the GIF below, you can see me selecting a catalog, selecting (or creating) a schema, and selecting (or creating) a volume, then choosing the file to upload.
[GIF: selecting a catalog, schema, and volume, then uploading a file]

Just like that, your file will be available to use in Databricks. You can view your data in a notebook or in the SQL editor, as I am below:
[Screenshot: previewing the uploaded file in the SQL editor]

Programmatically Uploading CSV (data) to a Unity Catalog Volume

DISCUSSION
We’ve got a few choices for uploading data programmatically from our local machine to Databricks: the CLI, the SDK, or the REST API. Here’s a great explanation of what each of them offers: https://alexott.blogspot.com/2024/09/databricks-sdks-vs-cli-vs-rest-apis-vs.html

For today, we’ll be using the Databricks CLI. There’s fantastic documentation on how to do this: https://docs.databricks.com/aws/en/dev-tools/cli/tutorial.

Let me just point out a few useful parts:
In the navigation pane on the left-hand side, you’ll find the tutorial that helps you install and set up the CLI to access your Databricks environment. Make sure to select the correct operating system, as seen in the picture below. The Command Reference section contains EVERYTHING; it’s the gold mine for CLI commands.

[Screenshot: the CLI documentation navigation pane, with the OS-specific install tutorial and the Command Reference section]
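If you prefer exploring from the terminal, the CLI’s built-in help mirrors that Command Reference. A minimal sketch (the output and available groups vary by CLI version):

databricks --help           # list every command group the CLI offers
databricks schemas --help   # commands for working with Unity Catalog schemas
databricks fs cp --help     # flags and arguments for a specific command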


AUTHENTICATION
If you’re curious, I authenticated with a Personal Access Token:
Generate a Personal Access Token 👇
https://docs.databricks.com/aws/en/dev-tools/auth/pat
[Screenshot: generating a personal access token in the Databricks UI]
EXAMPLE OF ME USING THE CLI TO UPLOAD A CSV INTO A VOLUME
Installing
1. I’m on Windows and needed to install the CLI. I used winget from the Command Prompt to install it:
[Screenshot: installing the CLI with winget]
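For anyone following along, the commands looked roughly like this. The winget package ID is the one the official docs list at the time of writing; if it doesn’t resolve for you, run the search first to find the right ID:

winget search databricks                    # find the Databricks CLI package ID
winget install Databricks.DatabricksCLI     # install the CLI
databricks -v                               # confirm the install by printing the version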

Authenticating
2. Enter your Databricks host and personal access token to authenticate.
[Screenshot: configuring the CLI with the workspace host and token]
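For reference, the PAT flow is roughly the following. The CLI prompts for the workspace host and token and saves them as a profile in .databrickscfg (the values here are placeholders, not real credentials):

databricks configure
# Databricks host: https://<your-workspace-url>
# Personal access token: dapi****************
databricks auth profiles    # optional: confirm the profile was saved and is valid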

Creating a Schema in a Catalog
3. Using the CLI command reference in the docs (link below), I want to create a schema (database) and a volume within it. My catalog is called "workspace". If you want a fresh catalog, you can create one with the CLI too. Recall, Unity Catalog has a three-level namespace: Catalog > Schema > Volume/Table/Model.
https://docs.databricks.com/aws/en/dev-tools/cli/reference/   

Action 👉 I'm creating a schema called "ZW_Bootcamp" in my catalog, which is called "workspace".
[Screenshot: creating the schema with the CLI]
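In case the screenshot is hard to read, the command I ran was along these lines. I’m assuming the current positional order of schema name then catalog name; double-check with databricks schemas create --help on your version:

databricks schemas create ZW_Bootcamp workspace    # schema name, then the catalog it lives in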

Creating a Volume in a Schema
4. Create a volume called "datadumps" in the "ZW_Bootcamp" schema.
[Screenshot: creating the volume with the CLI]
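The equivalent command for the volume follows the three-level namespace plus a volume type. I used MANAGED so Unity Catalog looks after the underlying storage; again, confirm the argument order with --help:

databricks volumes create workspace ZW_Bootcamp datadumps MANAGED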

Uploading CSV from local machine into Volume
5. Upload the CSV into the volume you created. For this, we'll use the fs (file system) command group, which lets us perform file-system operations. Documentation here: https://docs.databricks.com/aws/en/dev-tools/cli/reference/fs-commands
Note: I got caught out on this part. Make sure you prefix your volume path with dbfs:/ as seen in the pictures below. The top line is my command:

databricks fs cp "{PATH TO CSV HERE}" "dbfs:/Volumes/{catalog}/{schema}/{volume}"
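
Filled in with the names from this walkthrough, it looks like this (the local path and file name are made up for illustration):

databricks fs cp "C:\Users\ben\Downloads\sales_data.csv" "dbfs:/Volumes/workspace/ZW_Bootcamp/datadumps/"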

[Screenshots: the fs cp command and the confirmation output in the terminal]

Verify your upload 🙌🥳🍾
6. Check the Databricks UI to see that the upload was successful, and voilà!

[Screenshot: the uploaded CSV visible in the volume in the Databricks UI]
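You can also verify straight from the CLI without opening the UI. The fs command group has ls and cat for this (the file name below is the made-up example from earlier):

databricks fs ls "dbfs:/Volumes/workspace/ZW_Bootcamp/datadumps/"                    # list the files in the volume
databricks fs cat "dbfs:/Volumes/workspace/ZW_Bootcamp/datadumps/sales_data.csv"     # print the file contents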

Till next time
That’s all for now, folks! There’s still plenty to uncover. What about hitting data sources that aren’t on our machines? How do we interact with APIs? Where do we store the credentials in our pipelines? How do we interact with databases? How do we automate everything? There’s still so much in store for connecting to data. So much to learn, so little time.

[GIF: This is fine]
All the best,
BS
