- 8661 Views
- 1 replies
- 4 kudos
So I'm querying data from parquet files that have a couple of billion records (table 1, or t1), then have to filter and join with other parquet files with another couple of billion records (t2). This takes quite a long time to run (like 10h...
Latest Reply
Your intuition about views is correct. Views are not materialized, so they are basically just a saved query. Every time you access a view, it will have to be recomputed. This is certainly not ideal if it takes a long time (like 10hrs) to materialize a ...
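If the join result is reused, one option is to materialize it once as a table instead of a view; a minimal sketch with placeholder table names, filter, and join key:

# Persist the expensive filter + join once; later queries read the stored
# result instead of recomputing the saved query behind a view.
(spark.table("t1")
    .filter("event_date >= '2022-01-01'")   # placeholder filter
    .join(spark.table("t2"), "id")          # placeholder join key
    .write.format("delta")
    .saveAsTable("t1_t2_joined"))           # placeholder target table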
by JEAG • New Contributor III
- 24141 Views
- 15 replies
- 5 kudos
Hi, we are having this chain of errors every day in different files and processes: An error occurred while calling o11255.parquet.: org.apache.spark.SparkException: Job aborted. Caused by: org.apache.spark.SparkException: Job aborted due to stage failu...
Latest Reply
Hi @Jose Eliseo Aznarte Garcia, this is expected behaviour when you update some rows in the table and immediately query the table. From the error message: It is possible the underlying files have been updated. You can explicitly invalidate the cache...
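A minimal sketch of explicitly invalidating Spark's cached metadata for the table before re-querying (the table name is a placeholder):

# Drop the cached file listing/data for the table so the next read picks up
# the files written by the recent update.
spark.catalog.refreshTable("my_db.my_table")
# equivalently, in SQL:
spark.sql("REFRESH TABLE my_db.my_table")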
14 More Replies
- 731 Views
- 1 replies
- 1 kudos
org.apache.spark.SparkException: Job aborted due to stage failure: Task 19 in stage 26.0 failed 4 times, most recent failure: Lost task 19.3 in stage 26.0 (TID 4205, 10.66.225.154, executor 0): com.databricks.sql.io.FileReadException: Error while rea...
Latest Reply
Hello, @Lili Ehrlich. Welcome! My name is Piper, and I'm a moderator for Databricks. Thank you for bringing your question to us. Let's give it a while for the community to respond first. Thanks in advance for your patience.
- 2767 Views
- 8 replies
- 4 kudos
Truncate False is not working in a Delta table: df_delta.show(df_delta.count(), False). Compute size: Single Node - Standard_F4S - 8 GB memory, 4 cores. How much data can we persist in a Delta table as parquet files, and how fast can we retrieve it?
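A minimal sketch of displaying full (untruncated) column values; passing the full row count to show() on a single node can overwhelm the driver, so a fixed limit is usually safer:

# Show a bounded number of rows with full column values instead of the
# default 20-character truncation.
df_delta.show(n=1000, truncate=False)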
by Gapy • New Contributor II
- 1032 Views
- 1 replies
- 1 kudos
Dear all, will (and when will) Auto Loader also support schema inference and evolution for parquet files? At this point it is only supported for JSON and CSV, if I am not mistaken. Thanks and regards, Gapy
Latest Reply
@Gasper Zerak, this will be available in the near future (DBR 10.3 or later). Unfortunately, we don't have an SLA at this moment.
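For reference, a sketch of what that could look like once parquet support lands, following the existing JSON/CSV pattern (paths, schema location, and table name are placeholders):

# Incrementally ingest parquet files with Auto Loader, letting it infer the
# schema and evolve it as new columns appear.
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "parquet")
      .option("cloudFiles.schemaLocation", "/mnt/schemas/events")
      .option("cloudFiles.schemaEvolutionMode", "addNewColumns")
      .load("/mnt/raw/events"))

(df.writeStream
   .option("checkpointLocation", "/mnt/checkpoints/events")
   .toTable("bronze_events"))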
- 975 Views
- 2 replies
- 0 kudos
I am already storing my data as parquet files and have registered them as a table in Databricks. If I want to convert the table to a Delta table, do I have to do a full read of the data and rewrite it in the Delta format?
Latest Reply
More details and programmatic options can be found in the Porting Guide.
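In short, an in-place conversion is possible with CONVERT TO DELTA, which only writes the transaction log and leaves the existing parquet data files where they are; a minimal sketch with placeholder names:

# Convert a directory of parquet files to Delta in place.
spark.sql("CONVERT TO DELTA parquet.`/mnt/data/events`")
# Or convert a parquet table registered in the metastore.
spark.sql("CONVERT TO DELTA my_db.my_parquet_table")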
1 More Replies
- 711 Views
- 1 replies
- 1 kudos
Many times there is a need to convert Delta tables from the Delta format to plain parquet format, for a number of reasons. What is the best way to do that?
Latest Reply
You can easily convert a Delta table back to a parquet table using the following steps: If you have performed Delta Lake operations that can change the data files (for example, delete or merge), run vacuum with a retention of 0 hours to delete all data f...
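A minimal sketch of those steps in a notebook (the table path is a placeholder; VACUUM with a 0-hour retention normally requires disabling the retention duration check):

# 1. Remove data files that are not part of the latest table version.
spark.conf.set("spark.databricks.delta.retentionDurationCheck.enabled", "false")
spark.sql("VACUUM delta.`/mnt/data/my_table` RETAIN 0 HOURS")

# 2. Delete the transaction log; what remains is a plain parquet directory.
dbutils.fs.rm("/mnt/data/my_table/_delta_log", True)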
- 15926 Views
- 3 replies
- 0 kudos
How can I read a DataFrame from a parquet file, do transformations, and write the modified DataFrame back to the same parquet file?
If I attempt to do so, I get an error, understandably because Spark reads from the source and one cannot writ...
Latest Reply
Hi,
You can use insertInto instead of save. It will overwrite the target, so there is no need to cache or persist your DataFrame:
df.write.format("parquet").mode("overwrite").insertInto("/file_path")
~Saravanan
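For reference, insertInto takes the name of a table registered in the metastore rather than a file path; a minimal sketch with a hypothetical table name:

# Overwrite the contents of an existing metastore table with the
# transformed DataFrame.
df.write.mode("overwrite").insertInto("my_db.target_table")  # hypothetical table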
2 More Replies
- 5562 Views
- 3 replies
- 0 kudos
Hi,
I'm using the Parquet format to store raw data. The part files are stored on S3.
I would like to control the file size of each parquet part file.
I tried this:
sqlContext.setConf("spark.parquet.block.size", SIZE.toString)
sqlContext.setCon...
Latest Reply
Hi all, can anyone tell me what the default row group size is when writing via Spark SQL?
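As an aside, the parquet-mr property that controls the row group size is parquet.block.size (its default is commonly cited as 128 MB, though it depends on the parquet library version in your runtime); a minimal sketch of passing it per write, with a placeholder path and size:

# Forward the row group size (in bytes) to the parquet writer for this write.
(df.write
   .option("parquet.block.size", 128 * 1024 * 1024)
   .mode("overwrite")
   .parquet("s3://my-bucket/raw/"))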
2 More Replies