Hi, I am trying to load MongoDB data into S3 using PySpark 3.1.1 by reading it into a DataFrame and writing it out as Parquet. My code snippet looks like:

df = spark \
    .read \
    .format("mongo") \
    .options(**read_options) \
    .load(schema=schema)

df = df.coalesce(64)

write_df_to_del...
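Roughly, the full flow looks like this; the read_options values, the S3 path, and the final write step are placeholders I am filling in for illustration, since the actual write_df_to_del... helper is defined elsewhere:

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

spark = SparkSession.builder.appName("mongo-to-s3").getOrCreate()

# Placeholder schema and connection options, not the real ones
schema = StructType([StructField("_id", StringType(), True)])
read_options = {
    "uri": "mongodb://<host>:27017",
    "database": "<db>",
    "collection": "<collection>",
}

df = (
    spark.read
    .format("mongo")
    .options(**read_options)
    .load(schema=schema)
)

# Reduce the number of output files before writing to S3
df = df.coalesce(64)

# Hypothetical stand-in for the truncated write helper
(
    df.write
    .format("delta")
    .mode("overwrite")
    .save("s3a://<bucket>/<prefix>/my_table")  # placeholder path
)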
So I think I have solved the mystery here: it was to do with the retention config. By setting retentionEnabled to True with the retention hours set to 0, we lose a few rows in the first file because they were mistaken for files from the last session and...
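For anyone who runs into the same thing, a rough sketch of what was going on, assuming the retentionEnabled / retention-hours settings end up driving a Delta vacuum (the helper itself is not shown here, so treat this as an illustration):

from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()
path = "s3a://<bucket>/<prefix>/my_table"  # placeholder path

# Vacuuming with a 0-hour retention window requires disabling the safety check.
# With 0 hours, VACUUM can delete files that were written very recently but are
# not yet referenced by the latest committed version, which is exactly what
# "losing a few rows in the first file" looks like.
spark.conf.set("spark.databricks.delta.retentionDurationCheck.enabled", "false")
DeltaTable.forPath(spark, path).vacuum(0)    # risky: may remove freshly written files

# Safer: keep the check on and use the default 7-day retention window.
spark.conf.set("spark.databricks.delta.retentionDurationCheck.enabled", "true")
DeltaTable.forPath(spark, path).vacuum(168)  # hours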
Hi @Hubert Dudek, thanks for the reply. Yes, that's maybe worth trying. I am also considering removing format("delta") to see if the issue persists, to diagnose whether this is a Delta-related issue.
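Something along these lines is what I have in mind for the diagnosis (the paths are placeholders, and df is the DataFrame read from Mongo in the original snippet): write the same DataFrame once as plain Parquet and once as Delta, then compare the counts read back from each location.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

parquet_path = "s3a://<bucket>/<prefix>/diag_parquet"  # placeholder
delta_path = "s3a://<bucket>/<prefix>/diag_delta"      # placeholder

# df is the DataFrame produced by the Mongo read above
df.write.mode("overwrite").parquet(parquet_path)
df.write.format("delta").mode("overwrite").save(delta_path)

print("source rows: ", df.count())
print("parquet rows:", spark.read.parquet(parquet_path).count())
print("delta rows:  ", spark.read.format("delta").load(delta_path).count())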
Hi @May Olszewski, thanks for replying. The mode I used was already "overwrite"; sorry, I forgot to put it in the demo code above as it is predefined. Any other suggestions? I also did vacuum that directory before writing the new delta table i...
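One check that may help narrow this down (the path is a placeholder, and I am assuming operationMetrics is populated for the write, as it is on recent Delta versions) is to look at the Delta history for the overwrite commit; if numOutputRows already matches the expected count, the rows were committed and something removed the underlying files afterwards, e.g. an aggressive vacuum:

from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()
delta_table = DeltaTable.forPath(spark, "s3a://<bucket>/<prefix>/my_table")  # placeholder

# Show the last few commits with their operation metrics (numOutputRows etc.)
(
    delta_table.history(5)
    .select("version", "timestamp", "operation", "operationMetrics")
    .show(truncate=False)
)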