Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Is it possible to load data only using Databricks SDK?

alwaysmoredata
New Contributor

Is it possible to load data only using Databricks SDK?

I have custom library that has to load data to a table, and I know about other features like autoloader, COPY INTO, notebook with spark dataframe... but I wonder if it is possible to load data directly to a table just using the Databricks SDK using files from local disk.

Thanks

8 REPLIES

Walter_C
Databricks Employee

It is not possible to load data directly into a table using the Databricks SDK by reading files from the local disk. The Databricks SDK primarily focuses on managing and interacting with Databricks resources such as clusters, jobs, and libraries; it does not provide direct functionality for loading data from local disk files into tables.

However, you can use other methods to achieve this. One approach is to use the Databricks file system utilities (dbutils.fs) to move files from the local disk to DBFS (Databricks File System) and then use Spark to load the data into a table. Here is a general outline of the steps (a short sketch follows the list):

  1. Move Files to DBFS: Use dbutils.fs.cp or %fs cp to copy files from the local disk to DBFS.
  2. Load Data with Spark: Use Spark to read the files from DBFS and load them into a table.
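For example, a minimal sketch of those two steps, assuming it runs in a notebook where dbutils and spark are already defined; the paths and table name are placeholders:

# 1. Copy the file from the driver's local disk to DBFS (placeholder paths)
dbutils.fs.cp("file:/tmp/sample.json", "dbfs:/tmp/staging/sample.json")

# 2. Read the file with Spark and load it into a table (illustrative table name)
df = spark.read.json("dbfs:/tmp/staging/sample.json")
df.write.mode("append").saveAsTable("my_company.test.event_sample")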
     

Thanks for the incredibly quick reply - you're faster than any AI assistant on the market!

I had a feeling this was the case, but it's great to have it confirmed.

What options do I have for loading data without using Spark? I'm working with a custom library that I'd like data scientists to use within Databricks notebooks to upload data in a standardized way. My library doesn't use Spark; it currently runs "COPY INTO from s3", but I wonder if I can upload files from a notebook without having to configure something like an S3 stage.

Would it be a bad idea to upload the files to a Volume and then execute a COPY INTO command to load the data from the Volume? Is this the only option available? I tried to run COPY INTO from the workspace file system, but it failed with a forbidden error.

Additionally, can I use the following snippet to upload data to DBFS?

from databricks.sdk import WorkspaceClient
w = WorkspaceClient()
dbutils = w.dbutils
# e.g. copy a local file to DBFS (placeholder paths)
dbutils.fs.cp("file:/tmp/sample.json", "dbfs:/tmp/sample.json")

Walter_C
Databricks Employee

Can you provide the specific error message when you tried to load the data from the workspace file system?

Regarding your SDK code, it looks correct for loading data to DBFS.

SparkConnectGrpcException: (java.lang.SecurityException) Cannot use com.databricks.backend.daemon.driver.WorkspaceLocalFileSystem - local filesystem access is forbidden

COPY INTO my_company.test.event_sample
FROM 'file:/Workspace/Users/alwaysmoredata@mycompany.com/sample.json'
FILEFORMAT = JSON;

Walter_C
Databricks Employee

Are you using a shared access mode cluster to run this? If yes, can you try it with single user mode?

Currently using serverless compute, but ideally my custom library shouldn't limit the cluster choice.

It feels like I only have a few options:

- upload data to a Volume and run COPY INTO

- upload data to DBFS and run COPY INTO

- or leverage the pre-configured Spark session and use Spark in my custom library (see the sketch after this list)

I am not a Databricks expert - please correct me if I am wrong.
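For illustration, a minimal sketch of that third option, assuming the library simply accepts the Spark session that is already available in the caller's notebook; the function name, path, and table name are hypothetical:

def load_json_to_table(spark, path: str, table: str) -> None:
    # 'spark' is the session object already defined in the Databricks notebook
    df = spark.read.json(path)
    df.write.mode("append").saveAsTable(table)

# In a notebook:
# load_json_to_table(spark, "/Volumes/my_company/test/staging/sample.json",
#                    "my_company.test.event_sample")

Passing the session in keeps the library agnostic to how the session was created, so it works the same on classic clusters and serverless compute.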

Hi @alwaysmoredata ,

Uploading data to DBFS and then running COPY INTO won't work if you want to use a cluster with shared access mode. This is because in shared access mode, the driver node's local file system is not accessible.

So, based on your requirements, the best way is to upload the data to a Volume and run the COPY INTO command.
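For reference, a minimal sketch of that approach driven entirely by the Databricks SDK, assuming a Unity Catalog Volume and a SQL warehouse are available; the Volume path, warehouse ID, and table name below are placeholders:

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Upload a local file into a Unity Catalog Volume (placeholder path)
volume_path = "/Volumes/my_company/test/staging/sample.json"
with open("sample.json", "rb") as f:
    w.files.upload(volume_path, f, overwrite=True)

# Run COPY INTO through a SQL warehouse (placeholder warehouse ID)
w.statement_execution.execute_statement(
    warehouse_id="<sql-warehouse-id>",
    statement=f"""
        COPY INTO my_company.test.event_sample
        FROM '{volume_path}'
        FILEFORMAT = JSON
    """,
)

This keeps the whole flow inside the SDK and avoids configuring an external stage such as S3.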

 

Walter_C
Databricks Employee

Got it. The reason I asked about the cluster is that, with a shared access cluster, access to the local file system is more restricted than with a single user cluster due to security constraints. Since you are using serverless, it behaves like a shared cluster, so in that case your statements above are correct.
