Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

gdoron
by New Contributor
  • 1085 Views
  • 2 replies
  • 0 kudos

Using PySpark, can I write to an S3 path I don't have GetObject permission for?

After Spark finishes writing the DataFrame to S3, it seems to check the validity of the files it wrote with `getFileStatus`, which is `HeadObject` behind the scenes. What if I'm only granted write and list objects permissions but not GetObject? I...

Latest Reply
Lakshay
Esteemed Contributor
  • 0 kudos

It is not possible in my opinion.

1 More Replies
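
For readers weighing the question above, here is a minimal sketch of why the check fails. It rests on one fact from the S3 API (a `HeadObject` request is authorized by the `s3:GetObject` action) and several assumptions: the helper `denied_calls`, the `REQUIRED_ACTION` table, and the call sequence are hypothetical and are not taken from Spark or from the original post. It also ignores wildcards, explicit denies, and bucket policies.

```python
# Hypothetical lookup: the IAM action that authorizes each S3 API call.
REQUIRED_ACTION = {
    "PutObject": "s3:PutObject",
    "ListObjectsV2": "s3:ListBucket",
    "HeadObject": "s3:GetObject",  # HEAD requests are authorized by s3:GetObject
    "GetObject": "s3:GetObject",
}


def denied_calls(granted_actions, calls):
    """Return the calls in `calls` that the granted IAM actions do not cover."""
    granted = set(granted_actions)
    return [call for call in calls if REQUIRED_ACTION[call] not in granted]


# Write and list permissions only, as described in the question.
granted = ["s3:PutObject", "s3:ListBucket"]

# Assumed call sequence: write the files, list the prefix, then check file status.
write_path_calls = ["PutObject", "ListObjectsV2", "HeadObject"]

blocked = denied_calls(granted, write_path_calls)
print(blocked)
```

Under these assumptions the only blocked call is the status check, which is consistent with the reply that the write cannot complete without GetObject.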
mimezzz
by Contributor
  • 4460 Views
  • 8 replies
  • 10 kudos

Resolved! Dataframe rows missing after write_to_delta and read_from_delta

Hi, I am trying to load MongoDB data into S3 using PySpark 3.1.1 by reading it into Parquet. My code snippets look like:

    df = spark \
        .read \
        .format("mongo") \
        .options(**read_options) \
        .load(schema=schema)
    df = df.coalesce(64)
    write_df_to_del...

Latest Reply
mimezzz
Contributor
  • 10 kudos

So I think I have solved the mystery here: it was to do with the retention config. With retentionEnabled set to True and retention hours set to 0, we lose a few rows in the first file, as they were mistaken for files from the last session and ...

7 More Replies
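
The accepted reply above is truncated, so the following is only one plausible reading of it: a toy model of retention-based cleanup, not Delta Lake's actual implementation. The function `vacuum_candidates`, the file names, and the timestamps are all hypothetical. The 168-hour value is Delta Lake's default retention threshold (7 days).

```python
from datetime import datetime, timedelta


def vacuum_candidates(files, referenced, retention_hours, now):
    """Toy model of retention-based cleanup (not Delta Lake's real code).

    files: dict mapping file name -> last-modified time
    referenced: set of file names the committed table version points to
    Returns the unreferenced files older than the retention cutoff, sorted.
    """
    cutoff = now - timedelta(hours=retention_hours)
    return sorted(
        name
        for name, modified in files.items()
        if name not in referenced and modified < cutoff
    )


now = datetime(2023, 1, 1, 12, 0, 0)
files = {
    "part-000.parquet": now - timedelta(days=10),    # committed and still in use
    "part-001.parquet": now - timedelta(days=9),     # replaced by a later commit
    "part-002.parquet": now - timedelta(minutes=5),  # written but not yet committed
}
referenced = {"part-000.parquet"}

default_retention = vacuum_candidates(files, referenced, 168, now)
zero_retention = vacuum_candidates(files, referenced, 0, now)
print(default_retention)
print(zero_retention)
```

In this model the default threshold only removes the long-stale file, while a 0-hour threshold also marks the freshly written, not-yet-committed file for deletion, which is how rows from a recent write could go missing.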