How does Databricks handle registration and discovery of custom PySpark data sources in SDPs?

mnissen1337 · 06-30-2026

I'm working with Databricks declarative pipelines and have defined a custom PySpark data source (CDS) in its own standalone Python module. I include this module as part of the pipeline resources.

What I find interesting is that, even without explicitly importing this module in my pipeline code, the custom data source is registered and available when I reference it with spark.read.format("my_custom_source").

I’m trying to understand how Databricks manages the registration and discovery of custom data sources in this scenario. Specifically:

Does Databricks automatically scan and execute code from modules included as pipeline resources for data source registration when a custom format is referenced?
Is there any documentation or explanation for this behavior?

Any insights or pointers to relevant documentation would be greatly appreciated!

Thanks in advance!
