Hi all,
I have a use case where I need to push table data to a GCS bucket:
query = "${QUERY}"
df = spark.sql(query)
gcs_path = "${GCS_PATH}"
df.write \
    .option("maxRecordsPerFile", int("${MAX_RECORDS_PER_FILE}")) \
    .mode("${MODE}") \
    .json(gcs_path)
This pushes the query results to GCS, but it also generates metadata files in the same location:
'_started_...'
'_committed_..'
I want to avoid generating these, as I can't easily post-process the files in the bucket. Any help is appreciated.
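(For context, skipping the markers by name on the consumer side does work, but it is exactly the kind of post-processing I'd like to avoid. A rough sketch of that workaround, with hypothetical object names; the marker files all start with an underscore, so a prefix check on the basename is enough:)

```python
def data_files(object_names):
    """Keep only data files from a bucket listing, skipping Spark's
    marker files (_SUCCESS, _started_<id>, _committed_<id>), which
    all begin with an underscore."""
    return [name for name in object_names
            if not name.rsplit("/", 1)[-1].startswith("_")]

# Hypothetical listing of the GCS output prefix:
listing = [
    "out/part-00000-abc.json",
    "out/part-00001-abc.json",
    "out/_started_1234567890",
    "out/_committed_1234567890",
    "out/_SUCCESS",
]
print(data_files(listing))
# → ['out/part-00000-abc.json', 'out/part-00001-abc.json']
```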
Thanks,
Aswin Vishnu