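When Spark Structured Streaming writes to a file sink, it generally commits each micro-batch in phases: the tasks write the data files to the output directory, and the driver then records those files in the _spark_metadata log under the same directory. Only files listed in that log count as committed, so both the data files and the _spark_metadata directory have to survive for the output to be usable.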
First, check the Docker side for volume mount misconfigurations. Some Docker setups back the path with a temporary filesystem that gets cleaned up, or a background process removes the files. In either case the files are written but then immediately deleted.
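One way to test the "written but then deleted" theory is to compare what the sink says it committed with what is actually on disk. The sketch below is only an illustration: the function name is made up, and it assumes the _spark_metadata batch files consist of a version header line followed by one JSON object per file with "path" and "action" fields, which is how the log commonly looks but may differ in your Spark version.

```python
import json
from pathlib import Path
from urllib.parse import urlparse, unquote


def missing_committed_files(output_dir):
    """Return committed file paths (per _spark_metadata) that are not on disk.

    Assumption: each batch file holds a header line (e.g. "v1") followed by
    one JSON object per line with at least "path" and "action" keys.
    """
    log_dir = Path(output_dir) / "_spark_metadata"
    if not log_dir.is_dir():
        return []

    missing = set()
    for log_file in sorted(log_dir.iterdir()):
        # Skip hidden temp/checksum files and anything that is not a file.
        if log_file.name.startswith(".") or not log_file.is_file():
            continue
        for line in log_file.read_text().splitlines():
            line = line.strip()
            if not line.startswith("{"):
                continue  # header line, not a file entry
            entry = json.loads(line)
            if entry.get("action") == "delete":
                continue
            # Entries are usually URIs such as file:///opt/spark/app/data/part-...
            local_path = Path(unquote(urlparse(entry["path"]).path))
            if not local_path.exists():
                missing.add(str(local_path))
    return sorted(missing)
```

Run it inside the container against the sink path, for example missing_committed_files("/opt/spark/app/data"). A non-empty result would suggest that Spark did commit the files and something else removed them afterwards.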
Second, verify that /opt/spark/app/data is actually mounted to the host, and make sure the _spark_metadata directory and the other output directories keep read/write permissions for the user the Spark process runs as, so Spark can perform all of its operations.
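A quick way to run these checks from inside the container is a small script like the one below. It is a minimal sketch using only the Python standard library; the function name and the report keys are made up for illustration. Note that os.path.ismount is only true when the path itself is the mount point, so a subdirectory of a mounted volume reports False even though it is on the mount.

```python
import os


def check_output_dir(path):
    """Report whether `path` exists, is a mount point, and is usable by this user."""
    meta = os.path.join(path, "_spark_metadata")
    meta_exists = os.path.isdir(meta)
    return {
        "exists": os.path.isdir(path),
        # True only if `path` itself is where a filesystem is mounted.
        "is_mount_point": os.path.ismount(path),
        # Directories need the execute bit as well to be listed or written into.
        "readable": os.access(path, os.R_OK | os.X_OK),
        "writable": os.access(path, os.W_OK | os.X_OK),
        "metadata_dir_exists": meta_exists,
        "metadata_writable": meta_exists and os.access(meta, os.W_OK | os.X_OK),
    }
```

Call check_output_dir("/opt/spark/app/data") as the same user that runs the Spark job, since os.access evaluates permissions for the calling process.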
As a test, you can also change the code to write to a path where Spark definitely has read/write access, then confirm whether the files persist there.