Resolved! DataFrame rows missing after write_to_delta and read_from_delta
Hi, I am trying to load data from MongoDB into S3 using PySpark 3.1.1 and write it out as Parquet. My code snippet looks like:

df = spark \
    .read \
    .format("mongo") \
    .options(**read_options) \
    .load(schema=schema)
df = df.coalesce(64)
write_df_to_del...
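For readers landing here, a minimal runnable sketch of the pipeline described above, assuming the Spark MongoDB connector is on the classpath. The connection options, schema, S3 path, and the final write (standing in for the truncated write_df_to_del... helper) are placeholders, not the poster's actual code:

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

spark = (
    SparkSession.builder
    .appName("mongo-to-delta")
    .getOrCreate()
)

# Placeholder schema and connection options.
schema = StructType([
    StructField("_id", StringType(), True),
    StructField("payload", StringType(), True),
])
read_options = {
    "uri": "mongodb://host:27017",   # placeholder connection string
    "database": "mydb",              # placeholder database
    "collection": "mycollection",    # placeholder collection
}

# Read the MongoDB collection with an explicit schema.
df = (
    spark.read
    .format("mongo")
    .options(**read_options)
    .load(schema=schema)
)

# Reduce the number of output files before writing.
df = df.coalesce(64)

# Stand-in for the truncated helper: an overwrite write to a Delta
# table on S3.
df.write.format("delta").mode("overwrite").save("s3://my-bucket/my-table")
```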
Latest Reply
So I think I have solved the mystery here: it was to do with the retention config. By setting retentionEnabled to True with the retention hours being 0, we lose a few rows in the first file, as they were mistaken for files from the last session and ...
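For anyone hitting the same issue: retentionEnabled and the retention hours appear to be parameters of the poster's own helper, but the underlying mechanism is Delta's VACUUM retention window. A hedged sketch of the safe configuration, using the standard delta-spark Python API (the table path and session setup are assumptions):

```python
from delta import configure_spark_with_delta_pip
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

# Build a Delta-enabled session (on Databricks, `spark` already exists).
builder = (
    SparkSession.builder
    .appName("delta-retention-fix")
    .config("spark.sql.extensions",
            "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()

# Placeholder: the Delta table written by the job in the question.
path = "s3://my-bucket/my-table"
dt = DeltaTable.forPath(spark, path)

# With a retention of 0 hours, VACUUM can delete files committed moments
# earlier by the same job, mistaking them for stale files from a previous
# session -- which matches the missing rows described above. Keeping the
# default 168-hour (7-day) window avoids this:
dt.vacuum(168)
```

Note that Delta refuses retention windows under 168 hours unless spark.databricks.delta.retentionDurationCheck.enabled is explicitly set to false; leaving that safety check on is exactly what stops a 0-hour retention from silently deleting freshly written files.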