Topics with Label: Parquet file writes

Forum Posts

by Nazar • New Contributor II

09-23-2021 3:06:15 PM

5858 Views
3 replies
4 kudos

Resolved! Incremental write

Hi all, I have a daily Spark job that reads and joins 3-4 source tables and writes the DataFrame in Parquet format. This DataFrame consists of 100+ columns. As this job runs daily, our deduplication logic identifies the latest record from each of source t...

Data Engineering

Latest Reply

Nazar
New Contributor II

09-27-2021 2:55:33 PM

4 kudos

Thanks werners

2 More Replies

by rami1 • New Contributor II

07-29-2021 7:56:45 AM

889 Views
0 replies
0 kudos

Databricks Write Performance

I have a requirement to replay ingestion from landing data and build a silver table. I am trying to write Delta files from raw Avro files in the landing zone. The raw files are organized in folders by date. I am currently using streaming to read d...

Data Engineering

by prakharjain • New Contributor

03-02-2020 10:34:16 AM

21367 Views
2 replies
0 kudos

Resolved! I need to edit my Parquet files and change field names, replacing spaces with underscores

Hello, I am facing the problem described in the following Stack Overflow topics: https://stackoverflow.com/questions/45804534/pyspark-org-apache-spark-sql-analysisexception-attribute-name-contains-inv https://stackoverflow.com/questions/38191157/spark-...

Data Engineering

Latest Reply

DimitriBlyumin
New Contributor III

05-21-2020 4:48:22 AM

0 kudos

One option is to use something other than Spark to read the problematic file, e.g. Pandas, if your file is small enough to fit on the driver node (Pandas will only run on the driver). If you have multiple files, you can loop through them and fix on...

1 More Replies
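
For readers landing on this thread, the renaming step itself can be sketched in plain Python. This is only an illustration, not the accepted answer from the thread: the helper name `sanitize_column_names` is hypothetical, and the set of rejected characters is an assumption taken from the wording of Spark's "Attribute name contains invalid character(s)" error, so check it against your Spark version.

```python
import re

# Characters assumed to be rejected in Parquet column names by Spark
# (taken from the AnalysisException message; verify for your version).
_INVALID = re.compile(r"[ ,;{}()\n\t=]")


def sanitize_column_names(names):
    """Return new names with each invalid character replaced by an underscore.

    Raises ValueError if the cleaned names are not unique, so a rename
    never silently merges two columns.
    """
    cleaned = [_INVALID.sub("_", name) for name in names]
    if len(set(cleaned)) != len(cleaned):
        raise ValueError(f"renaming would create duplicate columns: {cleaned}")
    return cleaned


columns = ["first name", "last name", "age"]
print(sanitize_column_names(columns))  # ['first_name', 'last_name', 'age']
```

With the Pandas route suggested in the reply, the cleaned list could be assigned back with `df.columns = sanitize_column_names(df.columns)` before rewriting the file. In PySpark, `df.toDF(*sanitize_column_names(df.columns))` does the same once the data can be loaded.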

by 1stcommander • New Contributor II

11-11-2019 6:10:40 AM

8747 Views
2 replies
0 kudos

Parquet partitionBy - date column to nested folders

Hi, when writing a DataFrame to parquet using partitionBy(<date column>), the resulting folder structure looks like this:

root
|----------------- day1
|----------------- day2
|----------------- day3

Is it possible to create a structure like to foll...

Data Engineering

Latest Reply

Saphira
New Contributor II

11-13-2019 6:09:41 AM

0 kudos

Hey @1stcommander, you'll have to create those columns yourself. If it's something you will have to do often, you could always write a function. In any case, imho it's not that much work. I'm not sure what your problem is with the partition pruning. It...

1 More Replies
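
The advice to "create those columns yourself" can be sketched roughly as follows. This is a plain-Python illustration of the idea, not code from the thread: the helper names `partition_columns` and `partition_path` are hypothetical, and the year/month/day split is an assumption about the nested layout the question was asking for, since the original post is truncated.

```python
from datetime import date


def partition_columns(d):
    """Split a date into the year/month/day values used as partition columns."""
    return {"year": d.year, "month": d.month, "day": d.day}


def partition_path(root, d):
    """Build the Hive-style nested folder path that partitioning on
    year, month and day (in that order) would produce for one date."""
    parts = partition_columns(d)
    return "/".join([root] + [f"{name}={value}" for name, value in parts.items()])


print(partition_path("root", date(2019, 11, 11)))  # root/year=2019/month=11/day=11
```

In PySpark the same effect would likely come from adding three columns with `year`, `month` and `dayofmonth` from `pyspark.sql.functions` and then calling `partitionBy("year", "month", "day")` on the writer. Note that filters then need to reference these derived columns for partition pruning to apply.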

Databricks Community
