Data Engineering

Deploying Data Source API code

Rjdudley
Contributor II

This might be a stupid question, but there's just no mention of what to do here. I'm looking at the blog (https://www.databricks.com/blog/simplify-data-ingestion-new-python-data-source-api) and documentation (https://learn.microsoft.com/en-us/azure/databricks/pyspark/datasources) for the Python Data Source API, and I don't see how to deploy the custom library. Do we need to create a wheel file and upload it? Do we use regular .py files in our workspace and %run them? Any guidance would be appreciated.

1 ACCEPTED SOLUTION

Alberto_Umana
Databricks Employee

Hi @Rjdudley,

Thanks for your question! You can create regular .py files in your workspace and use the %run magic command to include them in your notebooks. This method is straightforward and works well for development and testing:

%run /path/to/your/custom_datasource_file
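For reference, the file you %run would define a data source class using the pyspark.sql.datasource API from the linked docs. Here is a minimal sketch; the FakeDataSource name, the "fake" format name, and the schema are illustrative, not from this thread:

from pyspark.sql.datasource import DataSource, DataSourceReader
from pyspark.sql.types import StructType

class FakeDataSourceReader(DataSourceReader):
    def __init__(self, schema, options):
        self.schema = schema
        self.options = options

    def read(self, partition):
        # Yield rows as tuples matching the schema declared below
        yield ("Alice", 20)
        yield ("Bob", 30)

class FakeDataSource(DataSource):
    @classmethod
    def name(cls):
        # Short name used in spark.read.format("fake")
        return "fake"

    def schema(self):
        # The schema can be returned as a DDL string
        return "name string, age int"

    def reader(self, schema: StructType):
        return FakeDataSourceReader(schema, self.options)

After the %run, you register the class in your notebook with spark.dataSource.register(FakeDataSource) and read from it with spark.read.format("fake").load().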

For a more production-ready approach, you can create a wheel file of your custom data source implementation and upload it to your cluster or workspace. This method is preferred for sharing across multiple notebooks or jobs:

• Package your code into a wheel file

• Upload the wheel file to your Databricks workspace or an accessible location (e.g., DBFS)

• Install the wheel file on your cluster using init scripts or %pip install commands (see the sketch after this list)
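For example, assuming the wheel packages the FakeDataSource class above in a module named fake_datasource and has been uploaded to DBFS (the path, version, and module name below are hypothetical), the notebook side might look like:

# Cell 1 -- install the uploaded wheel on the cluster
%pip install /dbfs/FileStore/libs/fake_datasource-0.1.0-py3-none-any.whl

# Cell 2 -- import the packaged class, register it, and read from it
from fake_datasource import FakeDataSource  # module name inside your wheel

spark.dataSource.register(FakeDataSource)
df = spark.read.format("fake").load()
df.show()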

You can also package your custom data source as a library and install it directly on your cluster from the cluster's Libraries tab.


3 REPLIES


Rjdudley
Contributor II

@Alberto_Umana Brilliant, thank you!

Alberto_Umana
Databricks Employee

You're very welcome!
