cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

Forum Posts

mimezzz
by Contributor
  • 2642 Views
  • 8 replies
  • 10 kudos

Resolved! Dataframe rows missing after write_to_delta and read_from_delta

Hi, i am trying to load mongo into s3 using pyspark 3.1.1 by reading them into a parquet. My code snippets are like:df = spark \ .read \ .format("mongo") \ .options(**read_options) \ .load(schema=schema)df = df.coalesce(64)write_df_to_del...

  • 2642 Views
  • 8 replies
  • 10 kudos
Latest Reply
mimezzz
Contributor
  • 10 kudos

So i think i have solved the mystery here it was to do with the retention config. By setting the retentionEnabled to True and rention hours being 0, we somewhat loses a few rows in the first file as they were mistaken as files from last session and ...

  • 10 kudos
7 More Replies
SRK
by Contributor III
  • 4059 Views
  • 2 replies
  • 0 kudos

How to get the count of dataframe rows when reading through spark.readstream using batch jobs?

I am trying to read messages from kafka topic using spark.readstream, I am using the following code to read it.My CODE:df = spark.readStream .format("kafka") .option("kafka.bootstrap.servers", "192.1xx.1.1xx:9xx") .option("subscr...

  • 4059 Views
  • 2 replies
  • 0 kudos
Latest Reply
daniel_sahal
Honored Contributor III
  • 0 kudos

You can try this approach:https://stackoverflow.com/questions/57568038/how-to-see-the-dataframe-in-the-console-equivalent-of-show-for-structured-st/62161733#62161733ReadStream is running a thread in background so there's no easy way like df.show().

  • 0 kudos
1 More Replies
Labels