Avoiding metadata information when sending data to GCS

aswinvishnu · 05-17-2025

Hi all,

I have a use case where I need to push table data to a GCS bucket:

query = "${QUERY}"
df = spark.sql(query)
gcs_path = "${GCS_PATH}"
df.write.option("maxRecordsPerFile", int("${MAX_RECORDS_PER_FILE}")).mode("${MODE}").json(gcs_path)

This pushes the query results to GCS, but it also generates metadata files in the target location:

'_started_...'
'_committed_..'

I want to avoid these files because I can't easily post-process the contents of the bucket. Any help is appreciated.

Thanks,

Aswin Vishnu
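
One possible direction, offered as a sketch rather than a confirmed fix: on Databricks, the '_started_...' and '_committed_..' files are usually written by the DBIO transactional commit protocol, and switching the session back to the stock Spark commit protocol before the write may suppress them. The helper name disable_commit_markers is hypothetical, and the second setting (which targets the separate _SUCCESS marker) is an assumption about what else counts as unwanted metadata here.

```python
# Assumption: the _started_/_committed_ markers come from the DBIO
# transactional commit protocol used by Databricks clusters.
DIRECT_COMMIT_PROTOCOL = (
    "org.apache.spark.sql.execution.datasources."
    "SQLHadoopMapReduceCommitProtocol"
)


def disable_commit_markers(spark):
    """Hypothetical helper: apply session settings meant to suppress marker files."""
    # Use the stock Spark commit protocol instead of DBIO for this session.
    spark.conf.set("spark.sql.sources.commitProtocolClass", DIRECT_COMMIT_PROTOCOL)
    # Assumption: the _SUCCESS marker is also unwanted, so ask the Hadoop
    # output committer not to write it.
    spark.conf.set("mapreduce.fileoutputcommitter.marksuccessfuljobs", "false")
    return spark
```

In a notebook this would be called once, as disable_commit_markers(spark), before the df.write line in the question. The trade-off is that writes are no longer transactional, so a failed or concurrent job can leave partial output in the bucket.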
