- 8661 Views
- 1 replies
- 4 kudos
So I'm querying data from parquet files that have a couple of billion records (table 1, or t1), then have to filter and join with other parquet files with another couple of billion records (t2). This takes quite a long time to run (like 10h...
Latest Reply
Your intuition about views is correct. Views are not materialized, so they are basically just a saved query. Every time you access a view, it will have to be recomputed. This is certainly not ideal if it takes a long time (like 10hrs) to materialize a ...
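If the join result is reused, one option is to materialize it once as a table instead of a view; a minimal sketch with placeholder table names, filter, and join key:

# Persist the expensive filter + join once; later queries read the stored
# result instead of recomputing the saved query behind a view.
(spark.table("t1")
    .filter("event_date >= '2022-01-01'")   # placeholder filter
    .join(spark.table("t2"), "id")          # placeholder join key
    .write.format("delta")
    .saveAsTable("t1_t2_joined"))           # placeholder target table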
by JEAG • New Contributor III
- 24141 Views
- 15 replies
- 5 kudos
Hi, we are having this chain of errors every day in different files and processes: An error occurred while calling o11255.parquet.: org.apache.spark.SparkException: Job aborted. Caused by: org.apache.spark.SparkException: Job aborted due to stage failu...
Latest Reply
Hi @Jose Eliseo Aznarte Garcia, this is expected behaviour when you update some rows in the table and immediately query the table. From the error message: It is possible the underlying files have been updated. You can explicitly invalidate the cache...
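A minimal sketch of explicitly invalidating Spark's cached metadata for the table before re-querying (the table name is a placeholder):

# Drop the cached file listing/data for the table so the next read picks up
# the files written by the recent update.
spark.catalog.refreshTable("my_db.my_table")
# equivalently, in SQL:
spark.sql("REFRESH TABLE my_db.my_table")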
14 More Replies
- 731 Views
- 1 replies
- 1 kudos
org.apache.spark.SparkException: Job aborted due to stage failure: Task 19 in stage 26.0 failed 4 times, most recent failure: Lost task 19.3 in stage 26.0 (TID 4205, 10.66.225.154, executor 0): com.databricks.sql.io.FileReadException: Error while rea...
Latest Reply
Hello, @Lili Ehrlich. Welcome! My name is Piper, and I'm a moderator for Databricks. Thank you for bringing your question to us. Let's give it a while for the community to respond first. Thanks in advance for your patience.
- 2767 Views
- 8 replies
- 4 kudos
Truncate False is not working in a Delta table: df_delta.show(df_delta.count(), False). Compute size: Single Node - Standard_F4S - 8 GB memory, 4 cores. How much data can we persist in a Delta table as parquet files, and how fast can we retrieve it?
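A minimal sketch of displaying full (untruncated) column values; passing the full row count to show() on a single node can overwhelm the driver, so a fixed limit is usually safer:

# Show a bounded number of rows with full column values instead of the
# default 20-character truncation.
df_delta.show(n=1000, truncate=False)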
by Gapy • New Contributor II
- 1032 Views
- 1 replies
- 1 kudos
Dear all, will (and when will) Auto Loader also support schema inference and evolution for parquet files? At this point it is only supported for JSON and CSV, if I am not mistaken. Thanks and regards, Gapy
Latest Reply
@Gasper Zerak, this will be available in the near future (DBR 10.3 or later). Unfortunately, we don't have an SLA at this moment.
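For reference, a sketch of what that could look like once parquet support lands, following the existing JSON/CSV pattern (paths, schema location, and table name are placeholders):

# Incrementally ingest parquet files with Auto Loader, letting it infer the
# schema and evolve it as new columns appear.
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "parquet")
      .option("cloudFiles.schemaLocation", "/mnt/schemas/events")
      .option("cloudFiles.schemaEvolutionMode", "addNewColumns")
      .load("/mnt/raw/events"))

(df.writeStream
   .option("checkpointLocation", "/mnt/checkpoints/events")
   .toTable("bronze_events"))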
- 975 Views
- 2 replies
- 0 kudos
I am already storing my data as parquet files and have registered them as a table in Databricks. If I want to convert the table to a Delta table, do I have to do a full read of the data and rewrite it in the Delta format?
Latest Reply
More details and programmatic options can be found in the Porting Guide.
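In short, an in-place conversion is possible with CONVERT TO DELTA, which only writes the transaction log and leaves the existing parquet data files where they are; a minimal sketch with placeholder names:

# Convert a directory of parquet files to Delta in place.
spark.sql("CONVERT TO DELTA parquet.`/mnt/data/events`")
# Or convert a parquet table registered in the metastore.
spark.sql("CONVERT TO DELTA my_db.my_parquet_table")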
1 More Replies
- 711 Views
- 1 replies
- 1 kudos
Many times there is a need to convert Delta tables from the Delta format to plain parquet format, for a number of reasons. What is the best way to do that?
Latest Reply
You can easily convert a Delta table back to a parquet table using the following steps: If you have performed Delta Lake operations that can change the data files (for example, delete or merge), run vacuum with a retention of 0 hours to delete all data f...
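A minimal sketch of those steps in a notebook (the table path is a placeholder; VACUUM with a 0-hour retention normally requires disabling the retention duration check):

# 1. Remove data files that are not part of the latest table version.
spark.conf.set("spark.databricks.delta.retentionDurationCheck.enabled", "false")
spark.sql("VACUUM delta.`/mnt/data/my_table` RETAIN 0 HOURS")

# 2. Delete the transaction log; what remains is a plain parquet directory.
dbutils.fs.rm("/mnt/data/my_table/_delta_log", True)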
- 15926 Views
- 3 replies
- 0 kudos
How can I read a DataFrame from a parquet file, do transformations, and write the modified DataFrame back to the same parquet file?
If I attempt to do so, I get an error, understandably because Spark reads from the source and one cannot writ...
Latest Reply
Hi,
You can use insertInto instead of save. It will overwrite the target, so there is no need to cache or persist your DataFrame:
df.write.format("parquet").mode("overwrite").insertInto("/file_path")
~Saravanan
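For reference, insertInto takes the name of a table registered in the metastore rather than a file path; a minimal sketch with a hypothetical table name:

# Overwrite the contents of an existing metastore table with the
# transformed DataFrame.
df.write.mode("overwrite").insertInto("my_db.target_table")  # hypothetical table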
2 More Replies
- 5562 Views
- 3 replies
- 0 kudos
Hi,
I'm using the Parquet format to store raw data. The part files are stored on S3.
I would like to control the file size of each parquet part file.
I tried this:
sqlContext.setConf("spark.parquet.block.size", SIZE.toString)
sqlContext.setCon...
Latest Reply
Hi all, can anyone tell me what the default row group size is when writing via Spark SQL?
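As an aside, the parquet-mr property that controls the row group size is parquet.block.size (its default is commonly cited as 128 MB, though it depends on the parquet library version in your runtime); a minimal sketch of passing it per write, with a placeholder path and size:

# Forward the row group size (in bytes) to the parquet writer for this write.
(df.write
   .option("parquet.block.size", 128 * 1024 * 1024)
   .mode("overwrite")
   .parquet("s3://my-bucket/raw/"))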
2 More Replies