Re: PySpark AnalysisException: Ambiguous reference...

VikasM · a month ago

Hello Balajij8,

I just wanted to let you know that the issue I posted regarding Spark not writing Parquet files was actually due to my own mistake.

I had mounted the data volume only in my Spark job (driver/scheduler) container and not in the Spark worker container. Because the worker executes the tasks and writes the output, the Parquet files were being written to the worker container's own filesystem. I was checking the job container, which only had the checkpoint and metadata directories because those were the only volumes I had mounted in my Docker Compose configuration.
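
For anyone hitting the same symptom, here is a rough sketch of a check that might catch this kind of mount mismatch early. The driver drops a marker file into the output directory, then each executor task reports whether it can see that file. The helper names (`write_marker`, `sees_marker`, `check_shared_mount`) and the example path are hypothetical, not part of Spark, and the sketch assumes the functions are defined in the driver script itself so PySpark can ship them to the workers.

```python
import os
import socket
import uuid


def write_marker(path):
    """Create a uniquely named marker file in `path` on the local host (the driver)."""
    os.makedirs(path, exist_ok=True)
    token = uuid.uuid4().hex
    marker = os.path.join(path, f".mount_probe_{token}")
    with open(marker, "w") as f:
        f.write(token)
    return marker


def sees_marker(marker):
    """Report whether the marker file is visible from the host running this call."""
    return {"host": socket.gethostname(), "visible": os.path.exists(marker)}


def check_shared_mount(spark, path, num_tasks=4):
    """Write a marker on the driver, then look for it from executor tasks.

    `spark` is an existing SparkSession; `num_tasks` is an arbitrary choice.
    """
    marker = write_marker(path)
    try:
        rdd = spark.sparkContext.parallelize(range(num_tasks), num_tasks)
        return rdd.map(lambda _: sees_marker(marker)).collect()
    finally:
        # Clean up the marker on the driver side.
        os.remove(marker)
```

Calling something like `check_shared_mount(spark, "/data/output")` (the path is only an example) before the real write returns one report per task. If any report shows `visible` as `False`, that executor is not looking at the same volume as the driver, which matches the situation described above. Tasks are not guaranteed to land on every worker, so a clean result is a hint rather than proof.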

Thank you for your time and for helping me investigate the issue. I really appreciate your guidance.