Episode 1: Getting Data In
Learning Databricks one brick at a time, using the Free Edition.

Project Intro
Welcome to everyone reading. My name’s Ben, a.k.a. BS_THE_ANALYST, and I’m going to share my experiences as I explore the world of Databricks. My objective is to master Data Engineering on Databricks, weave in AI & ML, and perform analyses.

I hope this content serves two purposes. Firstly, as motivation for people who want to explore and hone their skills. Secondly, as a chance to learn from seasoned Databricks users: I’m keen to pick up best practices, so I encourage you to reach out or give feedback on what you’d do differently. That’s the beauty of learning.

Today’s Objectives
So let’s begin with Getting Data Into Databricks. This step is no small feat; it could be a series in itself, as there’s much to consider depending on project requirements.

In today’s world, Unity Catalog is the recommended way to manage and store data in Databricks. For users not yet leveraging Unity Catalog, DBFS or mounted cloud storage are still valid approaches. For those wondering, DBFS was the default storage location for Unity Catalog’s predecessor, the hive_metastore. If you’d like a solid breakdown of Unity Catalog, here’s a great read:
👉Understanding Unity Catalog

Project
Picture this: I’m a user wanting to get some data into Databricks. On my computer, I’ve got some flat files (CSVs, maybe an XLSX). The first question I ask myself:

Do I want to manually upload the data, or do it programmatically?

Typically, manual is the route to prove a concept works. If I need automation later, I’ll convert that process into something programmatic.

Manually Uploading CSV (Data) to a Unity Catalog Volume
Uploading manually is straightforward:

  1. Press New
  2. Select Add or Upload Data
  3. Select Upload files to Volume
    • Think of a volume as a storage bucket inside Unity Catalog where you can drop files.
    • There’s also a Create or Modify Table option, which can be a good starting point. However, if you check the supported file types, you’ll notice .xlsx is not an option. That’s a topic we’ll dive deeper into in a future episode.



In the GIF below, you can see me selecting a catalog, selecting (or creating) a schema, and selecting (or creating) a volume, then choosing the file to upload.
[GIF: selecting a catalog, schema, and volume, then uploading a file]

Just like that, your file will be available to use in Databricks. You can view your data in a notebook or in the SQL editor, as I am below:
[Screenshot: previewing the uploaded file in the SQL editor]

Programmatically Uploading CSV (data) to a Unity Catalog Volume

DISCUSSION
We’ve got a few choices for uploading data programmatically from our local machine to Databricks: the CLI, the SDK, or the REST API. Here’s a great explanation of what each of them offers: https://alexott.blogspot.com/2024/09/databricks-sdks-vs-cli-vs-rest-apis-vs.html

For today, we’ll be using the Databricks CLI. There’s fantastic documentation on how to do this: https://docs.databricks.com/aws/en/dev-tools/cli/tutorial.

Let me just point out a few useful parts:
In the navigation pane on the left-hand side, you’ll find the tutorial that helps you install and set up the CLI to access your Databricks environment. Make sure to select the correct operating system, as seen in the picture below. The Command Reference section contains EVERYTHING; it’s the gold mine for CLI commands.

[Screenshot: the CLI documentation navigation pane, with the OS-specific install tutorial and the Command Reference section]
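If you prefer exploring from the terminal, the CLI’s built-in help mirrors that Command Reference. A minimal sketch (the output and available groups vary by CLI version):

databricks --help           # list every command group the CLI offers
databricks schemas --help   # commands for working with Unity Catalog schemas
databricks fs cp --help     # flags and arguments for a specific command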


AUTHENTICATION
If you’re curious, I authenticated with a Personal Access Token:
Generate a Personal Access Token 👇
https://docs.databricks.com/aws/en/dev-tools/auth/pat
[Screenshot: generating a personal access token in the Databricks UI]
EXAMPLE OF ME USING THE CLI TO UPLOAD A CSV INTO A VOLUME
Installing
1. I’m on Windows and needed to install the CLI. I used winget from the Command Prompt to install it:
[Screenshot: installing the CLI with winget]
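For anyone following along, the commands looked roughly like this. The winget package ID is the one the official docs list at the time of writing; if it doesn’t resolve for you, run the search first to find the right ID:

winget search databricks                    # find the Databricks CLI package ID
winget install Databricks.DatabricksCLI     # install the CLI
databricks -v                               # confirm the install by printing the version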

Authenticating
2. Enter your Databricks host and personal access token to authenticate.
[Screenshot: configuring the CLI with the workspace host and token]
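For reference, the PAT flow is roughly the following. The CLI prompts for the workspace host and token and saves them as a profile in .databrickscfg (the values here are placeholders, not real credentials):

databricks configure
# Databricks host: https://<your-workspace-url>
# Personal access token: dapi****************
databricks auth profiles    # optional: confirm the profile was saved and is valid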

Creating a Schema in a Catalog
3. Using the CLI command reference in the docs (link below), I want to create a schema (database) and a volume within it. My catalog is called "workspace". If you want a fresh catalog, you can create one with the CLI too. Recall, Unity Catalog has a three-level namespace: Catalog > Schema > Volume/Table/Model.
https://docs.databricks.com/aws/en/dev-tools/cli/reference/   

Action 👉 I'm creating a schema called "ZW_Bootcamp" in my catalog, which is called "workspace".
[Screenshot: creating the schema with the CLI]
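In case the screenshot is hard to read, the command I ran was along these lines. I’m assuming the current positional order of schema name then catalog name; double-check with databricks schemas create --help on your version:

databricks schemas create ZW_Bootcamp workspace    # schema name, then the catalog it lives in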

Creating a Volume in a Schema
4. Create a volume called "datadumps" in the "ZW_Bootcamp" schema.
[Screenshot: creating the volume with the CLI]
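The equivalent command for the volume follows the three-level namespace plus a volume type. I used MANAGED so Unity Catalog looks after the underlying storage; again, confirm the argument order with --help:

databricks volumes create workspace ZW_Bootcamp datadumps MANAGED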

Uploading CSV from local machine into Volume
5. Upload the CSV into the volume you created. For this, we'll use the fs (file system) command group, which lets us perform file-system operations. Documentation here: https://docs.databricks.com/aws/en/dev-tools/cli/reference/fs-commands
Note: I got caught out on this part. Make sure you prefix your volume path with dbfs:/ as seen in the pictures below. The top line is my command:

databricks fs cp "{PATH TO CSV HERE}" "dbfs:/Volumes/{catalog}/{schema}/{volume}"
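
Filled in with the names from this walkthrough, it looks like this (the local path and file name are made up for illustration):

databricks fs cp "C:\Users\ben\Downloads\sales_data.csv" "dbfs:/Volumes/workspace/ZW_Bootcamp/datadumps/"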

[Screenshots: the fs cp command and the confirmation output in the terminal]

Verify your upload 🙌🥳🍾
6. Check the Databricks UI to see that the upload was successful, and voilà!

[Screenshot: the uploaded CSV visible in the volume in the Databricks UI]
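You can also verify straight from the CLI without opening the UI. The fs command group has ls and cat for this (the file name below is the made-up example from earlier):

databricks fs ls "dbfs:/Volumes/workspace/ZW_Bootcamp/datadumps/"                    # list the files in the volume
databricks fs cat "dbfs:/Volumes/workspace/ZW_Bootcamp/datadumps/sales_data.csv"     # print the file contents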

Till next time
That’s all for now, folks! There’s still plenty to uncover. What about hitting data sources that aren’t on our machines? How do we interact with APIs? Where do we store the credentials in our pipelines? How do we interact with databases? How do we automate everything? There’s still so much in store for connecting to data. So much to learn, so little time.

[GIF: This is fine]
All the best,
BS
