Data Engineering

Deploying Data Source API code

Rjdudley
Contributor II

This might be a stupid question, but there's just no mention of what to do here. I'm looking at the blog (https://www.databricks.com/blog/simplify-data-ingestion-new-python-data-source-api) and documentation (https://learn.microsoft.com/en-us/azure/databricks/pyspark/datasources) for the Python Data Source API, and I don't see how to deploy the custom library. Do we need to create a wheel file and upload it? Do we use regular .py files in our workspace and %run them? Any guidance would be appreciated.

1 ACCEPTED SOLUTION

Alberto_Umana
Databricks Employee

Hi @Rjdudley,

Thanks for your question! You can create regular .py files in your workspace and use the %run magic command to include them in your notebooks. This method is straightforward and works well for development and testing:

%run /path/to/your/custom_datasource_file
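For reference, the file you %run would define a data source class using the pyspark.sql.datasource API from the linked docs. Here is a minimal sketch; the FakeDataSource name, the "fake" format name, and the schema are illustrative, not from this thread:

from pyspark.sql.datasource import DataSource, DataSourceReader
from pyspark.sql.types import StructType

class FakeDataSourceReader(DataSourceReader):
    def __init__(self, schema, options):
        self.schema = schema
        self.options = options

    def read(self, partition):
        # Yield rows as tuples matching the schema declared below
        yield ("Alice", 20)
        yield ("Bob", 30)

class FakeDataSource(DataSource):
    @classmethod
    def name(cls):
        # Short name used in spark.read.format("fake")
        return "fake"

    def schema(self):
        # The schema can be returned as a DDL string
        return "name string, age int"

    def reader(self, schema: StructType):
        return FakeDataSourceReader(schema, self.options)

After the %run, you register the class in your notebook with spark.dataSource.register(FakeDataSource) and read from it with spark.read.format("fake").load().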

For a more production-ready approach, you can create a wheel file of your custom data source implementation and upload it to your cluster or workspace. This method is preferred for sharing across multiple notebooks or jobs:

• Package your code into a wheel file

• Upload the wheel file to your Databricks workspace or an accessible location (e.g., DBFS)

• Install the wheel file on your cluster using init scripts or %pip install commands (see the sketch after this list)
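For example, assuming the wheel packages the FakeDataSource class above in a module named fake_datasource and has been uploaded to DBFS (the path, version, and module name below are hypothetical), the notebook side might look like:

# Cell 1 -- install the uploaded wheel on the cluster
%pip install /dbfs/FileStore/libs/fake_datasource-0.1.0-py3-none-any.whl

# Cell 2 -- import the packaged class, register it, and read from it
from fake_datasource import FakeDataSource  # module name inside your wheel

spark.dataSource.register(FakeDataSource)
df = spark.read.format("fake").load()
df.show()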

You can also package your custom data source as a library and install it directly on your cluster from the cluster's Libraries tab.


3 REPLIES


Rjdudley
Contributor II

@Alberto_Umana Brilliant, thank you!

Alberto_Umana
Databricks Employee

You're very welcome!
