Data Engineering

Forum Posts

gbalboa
by New Contributor
  • 8661 Views
  • 1 reply
  • 4 kudos

Resolved! How do temp views actually work?

So I'm querying data from parquet files that have a couple of billion records (table 1 or t1), and then have to filter and join with other parquet files with another couple of billion records (t2). This takes quite a long time to run (like 10h...

Latest Reply
PeteStern
New Contributor III
  • 4 kudos

Your intuition about views is correct. Views are not materialized, so they are basically just a saved query. Every time you access a view it will have to be recomputed. This is certainly not ideal if it takes a long time (like 10hrs) to materialize a ...

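If the filtered join is reused repeatedly, one option is to materialize the intermediate result instead of (or alongside) a temp view. A minimal PySpark sketch, assuming hypothetical parquet paths, a hypothetical filter, and a join key named id:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical paths, filter, and join key, for illustration only.
    t1 = spark.read.parquet("/data/t1").filter("event_date >= '2022-01-01'")
    t2 = spark.read.parquet("/data/t2")
    joined = t1.join(t2, "id")

    # Cache so repeated queries in this session reuse the computed result...
    joined.cache()
    joined.count()  # action that populates the cache

    # ...or persist it once as a table and query that instead of the view.
    joined.write.mode("overwrite").saveAsTable("joined_t1_t2")

Caching only helps within the current session/cluster; writing the result out once avoids recomputing the multi-hour join on every access.
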
JEAG
by New Contributor III
  • 24141 Views
  • 15 replies
  • 5 kudos

Resolved! Error writing parquet files

Hi, we are having this chain of errors every day in different files and processes: An error occurred while calling o11255.parquet.: org.apache.spark.SparkException: Job aborted. Caused by: org.apache.spark.SparkException: Job aborted due to stage failu...

Latest Reply
Kaniz
Community Manager
  • 5 kudos

Hi @Jose Eliseo Aznarte Garcia, this is expected behaviour when you update some rows in the table and immediately query the table. From the error message: it is possible the underlying files have been updated. You can explicitly invalidate the cache...

14 More Replies
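As the full error message suggests, the usual remedy is to explicitly invalidate Spark's cached file metadata before re-querying. A short sketch, assuming a Databricks notebook where spark is already defined, and a hypothetical table my_db.events and path /mnt/raw/events:

    # Refresh a metastore table so Spark drops its cached file listing and re-reads it.
    spark.catalog.refreshTable("my_db.events")   # SQL equivalent: REFRESH TABLE my_db.events
    spark.table("my_db.events").count()

    # For data read directly from a path, refresh by path instead.
    spark.catalog.refreshByPath("/mnt/raw/events")
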
fff_ds
by New Contributor
  • 731 Views
  • 1 reply
  • 1 kudos

Manual overwrite in s3 console of a collection of parquet files and now we can't read them.

org.apache.spark.SparkException: Job aborted due to stage failure: Task 19 in stage 26.0 failed 4 times, most recent failure: Lost task 19.3 in stage 26.0 (TID 4205, 10.66.225.154, executor 0): com.databricks.sql.io.FileReadException: Error while rea...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hello, @Lili Ehrlich. Welcome! My name is Piper, and I'm a moderator for Databricks. Thank you for bringing your question to us. Let's give it a while for the community to respond first. Thanks in advance for your patience.

AzureDatabricks
by New Contributor III
  • 2767 Views
  • 8 replies
  • 4 kudos

Resolved! Need to see all the records in DeltaTable. Exception - java.lang.OutOfMemoryError: GC overhead limit exceeded

Truncate False is not working on the Delta table: df_delta.show(df_delta.count(), False)
Compute size: Single Node - Standard_F4S - 8 GB memory, 4 cores.
How much data can we persist in a Delta table as Parquet files, and how fast can we retrieve it?

Latest Reply
AzureDatabricks
New Contributor III
  • 4 kudos

thank you !!!

7 More Replies
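df_delta.show(df_delta.count(), False) asks the driver to collect and render every row, which is what exhausts the 8 GB single-node heap. A hedged sketch of less memory-hungry ways to inspect or extract the data, assuming a hypothetical Delta path and key column:

    # Hypothetical Delta location, for illustration only.
    df_delta = spark.read.format("delta").load("/mnt/delta/events")

    # Look at a bounded number of rows instead of the whole table.
    df_delta.show(100, truncate=False)

    # Or page through the table on a key column (here a hypothetical "id").
    df_delta.where("id BETWEEN 0 AND 9999").show(truncate=False)

    # For a full extract, write it out rather than collecting to the driver.
    df_delta.write.mode("overwrite").parquet("/mnt/exports/events")
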
Gapy
by New Contributor II
  • 1032 Views
  • 1 reply
  • 1 kudos

Auto Loader Schema-Inference and Evolution for parquet files

Dear all, will (and when will) Auto Loader also support schema inference and evolution for parquet files? At this point it is only supported for JSON and CSV, if I am not mistaken. Thanks and regards, Gapy

Latest Reply
Sandeep
Contributor III
  • 1 kudos

@Gasper Zerak, this will be available in the near future (DBR 10.3 or later). Unfortunately, we don't have an SLA at this moment.

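Once parquet support is available in the runtime, the call shape would presumably mirror today's JSON/CSV usage of Auto Loader. The sketch below is an assumption: it reuses the cloudFiles options that exist for JSON/CSV, with hypothetical paths and a hypothetical target table, and is not confirmed syntax for the future release:

    # Hypothetical paths/table; assumes the same cloudFiles options used for JSON/CSV today.
    df = (spark.readStream
          .format("cloudFiles")
          .option("cloudFiles.format", "parquet")
          .option("cloudFiles.schemaLocation", "/mnt/checkpoints/events_schema")  # where the inferred schema is tracked
          .option("cloudFiles.schemaEvolutionMode", "addNewColumns")
          .load("/mnt/raw/parquet_events"))

    (df.writeStream
       .option("checkpointLocation", "/mnt/checkpoints/events")
       .toTable("bronze.events"))
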
User16826992666
by Valued Contributor
  • 975 Views
  • 2 replies
  • 0 kudos

Resolved! Can I convert parquet files to Delta?

I am already storing my data as parquet files and have registered them as a table in Databricks. If I want to convert the table to be a Delta table, do I have to do a full read of the data and rewrite it in the Delta format?

Latest Reply
User16752244127
Contributor
  • 0 kudos

More details and programmatic options can be found in the Porting Guide.

1 More Replies
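For the question above, Delta Lake's CONVERT TO DELTA performs an in-place conversion: it writes a transaction log next to the existing parquet files instead of rewriting the data. A short sketch with hypothetical paths and table names:

    # In-place conversion of a path-based parquet table (files are kept, a _delta_log is added).
    spark.sql("CONVERT TO DELTA parquet.`/mnt/data/my_table`")

    # Partitioned tables must declare their partition columns.
    spark.sql("CONVERT TO DELTA parquet.`/mnt/data/my_table` PARTITIONED BY (event_date DATE)")

    # A parquet table already registered in the metastore can be converted by name.
    spark.sql("CONVERT TO DELTA my_db.my_table")
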
User16783853501
by New Contributor II
  • 711 Views
  • 1 reply
  • 1 kudos

Converting data that is in Delta format to plain parquet format

There is often a need to convert Delta tables from Delta format back to plain parquet format, for a number of reasons. What is the best way to do that?

Latest Reply
User16826994223
Honored Contributor III
  • 1 kudos

You can easily convert a Delta table back to a Parquet table using the following steps: if you have performed Delta Lake operations that can change the data files (for example, delete or merge), run vacuum with a retention of 0 hours to delete all data f...

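A sketch of those steps, assuming a Databricks notebook (spark and dbutils available) and a hypothetical table path. Note that vacuuming with 0-hour retention permanently removes old file versions, so this is illustrative only:

    # Step 1: remove data files that are not part of the latest table version.
    # A 0-hour retention requires disabling the safety check first.
    spark.conf.set("spark.databricks.delta.retentionDurationCheck.enabled", "false")
    spark.sql("VACUUM delta.`/mnt/data/my_table` RETAIN 0 HOURS")

    # Step 2: delete the transaction log; what remains is a plain parquet directory.
    dbutils.fs.rm("/mnt/data/my_table/_delta_log", True)  # True = recursive
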
olisch
by New Contributor
  • 15926 Views
  • 3 replies
  • 0 kudos

Spark: How to simultaneously read from and write to the same parquet file

How can I read a DataFrame from a parquet file, do transformations, and write the modified DataFrame back to the same parquet file? If I attempt to do so, I get an error, understandably because Spark reads from the source and one cannot writ...

Latest Reply
saravananraju
New Contributor II
  • 0 kudos

Hi, you can use insertInto instead of save. It will overwrite the target table; no need to cache or persist your dataframe:
    df.write.format("parquet").mode("overwrite").insertInto("target_table")
~Saravanan

2 More Replies
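An alternative, hedged pattern for the same problem is to stage the output somewhere else first, since Spark cannot safely overwrite a path while it is still reading from it. A sketch with hypothetical paths and an example transformation, assuming a Databricks notebook for the file moves:

    src = "/mnt/data/events"          # hypothetical source path
    staging = "/mnt/data/events_tmp"  # hypothetical staging path

    transformed = spark.read.parquet(src).filter("status = 'active'")  # example transformation

    # Write to the staging location first, then swap it in place of the original.
    transformed.write.mode("overwrite").parquet(staging)
    dbutils.fs.rm(src, True)           # True = recursive
    dbutils.fs.mv(staging, src, True)
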
richard1_558848
by New Contributor II
  • 5562 Views
  • 3 replies
  • 0 kudos

How to set size of Parquet output files ?

Hi, I'm using Parquet as the format to store raw data. The part files are stored on S3. I would like to control the file size of each parquet part file. I tried this: sqlContext.setConf("spark.parquet.block.size", SIZE.toString) sqlContext.setCon...

Latest Reply
manjeet_chandho
New Contributor II
  • 0 kudos

Hi all, can anyone tell me what the default row group size is when writing via Spark SQL?

2 More Replies
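For context on the thread above: the parquet row group size is a Hadoop writer setting (parquet.block.size, commonly 128 MB by default), while the size of each output part file is mostly a function of how the DataFrame is partitioned when it is written. A hedged sketch with hypothetical paths and sizes:

    # Cap the number of rows written per output file (Spark 2.2+).
    spark.conf.set("spark.sql.files.maxRecordsPerFile", 5000000)

    df = spark.read.parquet("/mnt/raw/events")      # hypothetical input
    (df.repartition(32)                             # roughly 32 output files before the row cap applies
       .write.mode("overwrite")
       .parquet("/mnt/out/events"))

    # The row group size itself is passed through the Hadoop config, e.g. in the cluster Spark config:
    #   spark.hadoop.parquet.block.size  67108864
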