yesterday - last edited yesterday
I'm working with Databricks declarative pipelines and have defined a custom PySpark data source (CDS) in its own standalone Python module. I include this module as part of the pipeline resources.
What I find interesting is that, even without explicitly importing this module in my pipeline code, the custom data source is registered and available when I reference it with spark.read.format("my_custom_source").
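For context, here is a minimal sketch of the kind of module I mean. It assumes the Python Data Source API from PySpark 4.0 (pyspark.sql.datasource). The class names, the schema, the two sample rows, and the register_source helper are hypothetical placeholders, not my actual code. Normally I would expect to have to call the registration step myself before the format name resolves:

```python
# Hypothetical standalone module, e.g. my_custom_source.py (illustrative names only).
try:
    from pyspark.sql.datasource import DataSource, DataSourceReader
except ImportError:
    # Fallback so the sketch can still be imported where PySpark 4.0+ is absent.
    DataSource = DataSourceReader = object


class MyCustomReader(DataSourceReader):
    """Produces a couple of hard-coded rows in place of a real read."""

    def read(self, partition):
        # Each yielded tuple must match the schema declared by the data source.
        yield (1, "alpha")
        yield (2, "beta")


class MyCustomSource(DataSource):
    """Custom data source exposed under the short name 'my_custom_source'."""

    @classmethod
    def name(cls):
        # The string passed to spark.read.format(...).
        return "my_custom_source"

    def schema(self):
        # DDL-formatted schema string.
        return "id INT, label STRING"

    def reader(self, schema):
        return MyCustomReader()


def register_source(spark):
    """Explicit registration step I assumed was required (needs a live SparkSession)."""
    spark.dataSource.register(MyCustomSource)
```

With explicit registration I would call register_source(spark) once and then spark.read.format("my_custom_source").load(). In my pipeline the format resolves without my code ever making that call or importing the module.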
I'm trying to understand how Databricks manages the registration and discovery of custom data sources in this scenario. Specifically:
- Does Databricks automatically scan and execute code from modules included as pipeline resources for data source registration when a custom format is referenced?
- Is there any documentation or explanation for this behavior?
Any insights or pointers to relevant documentation would be greatly appreciated!
Thanks in advance!