Data Engineering

by gdoron • New Contributor

05-29-2023 4:16:47 PM

2183 Views
2 replies
0 kudos

Using PySpark, can I write to an S3 path I don't have GetObject permission to?

After Spark finishes writing the DataFrame to S3, it seems to check the validity of the files it wrote with `getFileStatus`, which is `HeadObject` behind the scenes. What if I'm only granted write and list-objects permissions, but not GetObject? I...
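For context, S3 authorizes the `HeadObject` call with the `s3:GetObject` permission, and `ListObjectsV2` with `s3:ListBucket`. A principal that holds only write and list permissions therefore cannot perform the `HeadObject` check described above. The toy Python sketch below models the permission set from the question. The bucket name, prefix, `POLICY`, `API_TO_ACTION`, and helper functions are hypothetical, and the evaluator deliberately ignores Deny statements, wildcards, and conditions, so it is only an illustration and not a substitute for the IAM policy simulator.

```python
# Hypothetical least-privilege policy matching the question: write and list,
# but no s3:GetObject. Bucket and prefix names are placeholders.
POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:PutObject"],
            "Resource": "arn:aws:s3:::example-bucket/output/*",
        },
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": "arn:aws:s3:::example-bucket",
        },
    ],
}

# IAM action that authorizes each S3 API call (hypothetical lookup table).
# HeadObject, the call behind the getFileStatus check, needs s3:GetObject.
API_TO_ACTION = {
    "PutObject": "s3:PutObject",
    "ListObjectsV2": "s3:ListBucket",
    "HeadObject": "s3:GetObject",
    "GetObject": "s3:GetObject",
}


def allowed_actions(policy):
    """Collect actions granted by Allow statements.

    Toy evaluator: ignores Deny statements, wildcards, resources and conditions.
    """
    actions = set()
    for stmt in policy["Statement"]:
        if stmt["Effect"] == "Allow":
            acts = stmt["Action"]
            actions.update([acts] if isinstance(acts, str) else acts)
    return actions


def is_permitted(api_call, policy=POLICY):
    """Return True if the policy grants the IAM action behind api_call."""
    return API_TO_ACTION[api_call] in allowed_actions(policy)


for call in ("PutObject", "ListObjectsV2", "HeadObject"):
    print(call, "->", "allowed" if is_permitted(call) else "denied")
```

Running it prints `allowed` for `PutObject` and `ListObjectsV2` and `denied` for `HeadObject`, which is the situation the question describes.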

Latest Reply

Lakshay
Databricks Employee

05-30-2023 4:00:29 AM

0 kudos

It is not possible in my opinion.

by mimezzz • Contributor

11-02-2022 6:46:44 PM

12177 Views
8 replies
10 kudos

Resolved! Dataframe rows missing after write_to_delta and read_from_delta

Hi, I am trying to load Mongo data into S3 using PySpark 3.1.1 by reading it into Parquet. My code snippets are like:

df = spark \
    .read \
    .format("mongo") \
    .options(**read_options) \
    .load(schema=schema)
df = df.coalesce(64)
write_df_to_del...

Latest Reply

mimezzz
Contributor

01-26-2023 9:45:26 PM

10 kudos

So I think I have solved the mystery: it had to do with the retention config. By setting retentionEnabled to True and the retention hours to 0, we somehow lost a few rows in the first file, as they were mistaken for files from the last session and ...
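For context on this resolution: Delta Lake's `VACUUM` deletes data files that the current table version no longer references and that are older than the retention threshold (7 days by default). The Delta documentation warns that a very short retention can remove files that concurrent writers have written but not yet committed. The toy model below illustrates why a 0-hour retention can delete a freshly written file. `DataFile`, `vacuum_candidates`, and the file names are hypothetical, and this is a simplification, not Delta's actual implementation or the exact settings used in the thread.

```python
from dataclasses import dataclass


# Hypothetical record for a data file in the table directory.
@dataclass
class DataFile:
    path: str
    age_hours: float   # time since the file was last modified
    referenced: bool   # True if the current table version's log references it


def vacuum_candidates(files, retention_hours):
    """Toy model of VACUUM: select files that the current table version does
    not reference and that are older than the retention window."""
    return [f.path for f in files
            if not f.referenced and f.age_hours > retention_hours]


files = [
    # Written a few minutes ago by a job whose commit has not landed yet.
    DataFile("part-00000-new.parquet", age_hours=0.05, referenced=False),
    # Logically removed by an earlier overwrite, long ago.
    DataFile("part-00000-old.parquet", age_hours=200.0, referenced=False),
    # Part of the current table version.
    DataFile("part-00001-cur.parquet", age_hours=1.0, referenced=True),
]

print(vacuum_candidates(files, retention_hours=168))  # 7-day window
print(vacuum_candidates(files, retention_hours=0))    # 0-hour window
```

With the 168-hour window only the stale file is selected; with 0 hours the uncommitted file from the in-progress write is selected too, which is consistent with rows going missing from the first file written.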
