Since yesterday, reading a file that I copied onto the cluster no longer works.
This code worked until yesterday; now the last line fails:
blob = gcs_bucket.get_blob("dev/data.ndjson")  # works
blob.download_to_filename("/tmp/data-copy.ndjson")  # works
df = spark.read.json("/tmp/data-copy.ndjson")  # fails
Calling os.listdir('/tmp') lists the file as expected.
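One thing that may be worth checking (an assumption, not a confirmed cause): os.listdir only inspects the driver's local filesystem, while spark.read.json resolves a path without a scheme against the cluster's default filesystem, which is not necessarily local disk. The sketch below reports what the driver sees and builds an explicit file:// URI to try instead. The helper name describe_local_path is hypothetical, and on a multi-node cluster the executors may still be unable to see a file that exists only on the driver.

```python
from pathlib import Path


def describe_local_path(path: str) -> dict:
    """Report what the driver process sees for a local path."""
    p = Path(path)
    exists = p.is_file()
    return {
        "exists_on_driver": exists,
        "size_bytes": p.stat().st_size if exists else None,
        # Explicit file:// URI, so the path is not resolved against
        # the cluster's default filesystem.
        "file_uri": p.resolve().as_uri(),
    }


if __name__ == "__main__":
    info = describe_local_path("/tmp/data-copy.ndjson")
    print(info)
    # Hypothetical follow-up, assuming `spark` is the active SparkSession:
    # df = spark.read.json(info["file_uri"])
```

If the read succeeds with the file:// URI but fails with the bare path, that would suggest the default filesystem or path resolution changed rather than the download step.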
This worked yesterday. Has something changed?