01-18-2022 01:11 PM
I think the solution to your problem is to read the data with an Auto Loader stream, since it supports schema hints. If you don't want it to run as a continuous stream, it is enough to specify a trigger-once trigger, so the job finishes once all the JSON files have been loaded.
Here is the documentation on loading JSON:
https://docs.databricks.com/spark/latest/structured-streaming/auto-loader-json.html
Then you can specify schema hints:
https://docs.databricks.com/spark/latest/structured-streaming/auto-loader-schema.html#schema-hints
Additionally, you can experiment with the different schema evolution options for the stream.
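To make this concrete, here is a minimal sketch of an Auto Loader read with schema hints and a trigger-once write. It assumes a Databricks runtime where the `cloudFiles` source is available and where `spark` is the notebook's session. The helper names, the paths, and the hinted columns are hypothetical placeholders, not something from your setup, so adjust them to your data.

```python
def autoloader_json_options(schema_location, schema_hints,
                            evolution_mode="addNewColumns"):
    """Build the Auto Loader options for JSON input (hypothetical helper)."""
    return {
        "cloudFiles.format": "json",
        # Where Auto Loader stores the inferred schema between runs.
        "cloudFiles.schemaLocation": schema_location,
        # Override inferred types for selected columns, e.g. "id BIGINT".
        "cloudFiles.schemaHints": schema_hints,
        # Other modes to experiment with: "rescue", "failOnNewColumns", "none".
        "cloudFiles.schemaEvolutionMode": evolution_mode,
    }


def load_json_once(spark, source_path, target_path, checkpoint_path,
                   schema_hints):
    """Read all pending JSON files and stop (hypothetical helper)."""
    options = autoloader_json_options(checkpoint_path, schema_hints)
    stream = (
        spark.readStream
        .format("cloudFiles")
        .options(**options)
        .load(source_path)
    )
    # trigger(once=True) processes the available files and then ends the
    # query, so it behaves like a batch job.
    return (
        stream.writeStream
        .format("delta")
        .option("checkpointLocation", checkpoint_path)
        .trigger(once=True)
        .start(target_path)
    )
```

In a notebook you would call it along the lines of `load_json_once(spark, "/mnt/raw/json", "/mnt/bronze/events", "/mnt/chk/events", "id BIGINT, ts TIMESTAMP").awaitTermination()`, where all of the paths and column names are made up for illustration.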
My blog: https://databrickster.medium.com/