Re: PySpark AnalysisException: Ambiguous reference...

balajij8 · yesterday

Spark Structured Streaming writes to file sinks using what is generally a phased commit. Data files are first written to the output directory, then metadata that references them is written under _spark_metadata, and the batch is committed when that metadata entry is finalized by renaming a temporary file to its final name.
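One way to tell whether files are being written and then removed is to compare the _spark_metadata log with what is actually on disk. The sketch below assumes the file sink's log layout: one file per batch (named like 0, 1 or 9.compact), a first line holding a version marker such as v1, and every other line a JSON object with a path URI and an action field. The function name missing_sink_files is hypothetical.

```python
import json
from pathlib import Path
from urllib.parse import unquote, urlparse


def missing_sink_files(output_dir: str) -> list[str]:
    """Return local paths that _spark_metadata lists as live but that are missing on disk.

    Assumption (not verified against every Spark version): one log file per batch
    ("0", "1", "9.compact"), first line is a version marker such as "v1", and each
    other line is a JSON object with a "path" URI and an "action" ("add"/"delete").
    """
    meta_dir = Path(output_dir) / "_spark_metadata"
    # Skip hidden helper files such as checksums (".0.crc") or temporary files.
    batch_files = [
        p for p in meta_dir.iterdir()
        if not p.name.startswith(".") and p.name.split(".")[0].isdigit()
    ]
    # Replay batches in numeric order so a later "delete" overrides an earlier "add".
    batch_files.sort(key=lambda p: int(p.name.split(".")[0]))

    live: dict[str, bool] = {}
    for log_file in batch_files:
        for line in log_file.read_text(encoding="utf-8").splitlines()[1:]:
            if not line.strip():
                continue
            entry = json.loads(line)
            # "file:///opt/..." or "file:/opt/..." -> "/opt/..."
            local_path = unquote(urlparse(entry["path"]).path)
            live[local_path] = entry.get("action", "add") != "delete"

    return sorted(p for p, alive in live.items() if alive and not Path(p).exists())
```

A non-empty result means Spark committed files that have since disappeared, which points at something outside Spark (mount, cleanup job) rather than at the commit itself.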

Check the Docker side for volume mount misconfiguration. Some Docker configurations back the directory with a temporary filesystem (such as tmpfs) that gets cleaned up, or a background process removes the files, so the files are written but deleted right afterwards.
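Inside a Linux container, /proc/mounts shows which filesystem actually backs a path. A minimal sketch, assuming a Linux container; mount_for is a hypothetical helper that picks the longest mount point containing the path.

```python
import os


def mount_for(path: str, mounts_text: str) -> tuple[str, str]:
    """Return (mount_point, fs_type) of the mount that contains `path`.

    `mounts_text` is the content of /proc/mounts, where each line reads
    "device mount_point fs_type options dump pass".
    """
    target = os.path.abspath(path)
    best = ("", "")
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        # /proc/mounts escapes spaces in mount points as \040.
        mount_point = fields[1].replace("\\040", " ")
        fs_type = fields[2]
        inside = target == mount_point or target.startswith(mount_point.rstrip("/") + "/")
        # Longest prefix wins; ">=" lets a later (stacked) mount shadow an earlier one.
        if inside and len(mount_point) >= len(best[0]):
            best = (mount_point, fs_type)
    return best
```

Usage: mount_for("/opt/spark/app/data", open("/proc/mounts").read()). A result of tmpfs, or a mount point of / with type overlay, typically means the data lives in memory or in the container's own writable layer rather than on a host bind mount.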

Also verify that /opt/spark/app/data is actually mounted from the host, and that the output directory, its _spark_metadata directory and the other directories Spark writes to keep read/write permissions for the user Spark runs as, so that Spark can create, rename and delete files there.
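A quick permission check can be run inside the container as the same user Spark runs as. This sketch uses os.access and assumes directories need read, write and execute (traverse) access while files under _spark_metadata only need to be readable; permission_report is a hypothetical name. Running it as root will hide most permission problems, since os.access grants root read/write regardless of mode bits.

```python
import os
import stat
from pathlib import Path


def permission_report(output_dir: str) -> dict[str, str]:
    """Map each problematic path under output_dir to a short description.

    Assumption: directories need rwx so entries can be created, renamed and
    deleted in them; files under _spark_metadata only need to be readable.
    An empty dict means no problem was found.
    """
    root = Path(output_dir)
    meta = root / "_spark_metadata"
    candidates = [root]
    if meta.is_dir():
        candidates.append(meta)
        candidates.extend(sorted(meta.rglob("*")))

    problems: dict[str, str] = {}
    for p in candidates:
        if not p.exists():
            problems[str(p)] = "does not exist"
            continue
        needed = (os.R_OK | os.W_OK | os.X_OK) if p.is_dir() else os.R_OK
        if not os.access(p, needed):
            st = p.stat()
            problems[str(p)] = (
                f"insufficient access: mode {stat.filemode(st.st_mode)}, owner uid {st.st_uid}"
            )
    return problems
```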

You can also change the code to write to a path where Spark has read/write access, then validate that the output files and their _spark_metadata entries appear and persist.
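Before pointing the stream at a new path, a small probe can confirm that the path supports the operations a file sink relies on: creating the directory, writing a temporary file, renaming it within the same directory, reading it back and deleting it. A sketch with the hypothetical name probe_output_path; it only shows the operations work from inside the container, so whether files survive on the host still needs the mount check.

```python
import os
import uuid


def probe_output_path(path: str) -> None:
    """Exercise create, write, rename, read and delete on `path`.

    Raises OSError (e.g. PermissionError) at the first step that fails and
    leaves no probe file behind on success.
    """
    os.makedirs(path, exist_ok=True)
    name = uuid.uuid4().hex
    tmp = os.path.join(path, f".{name}.tmp")
    final = os.path.join(path, name)
    with open(tmp, "wb") as f:
        f.write(b"probe")
        f.flush()
        os.fsync(f.fileno())
    # Same-directory rename, like a temp-then-rename commit step.
    os.rename(tmp, final)
    with open(final, "rb") as f:
        if f.read() != b"probe":
            raise OSError(f"read-back mismatch in {path}")
    os.remove(final)
```

Run it as the Spark user against both the sink path and the checkpoint location before restarting the query.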