Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

apiury
by New Contributor III
  • 3944 Views
  • 4 replies
  • 2 kudos

Delta file question

Hi! I'm using Auto Loader to ingest binary files into Delta format. I have 7 binary files, but Delta generates 3 files named part-0000, part-0001... Why does it generate these files with the format part-000...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Alejandro Piury Pinzón, we haven't heard from you since the last response from @Lakshay Goel, and I was checking back to see if the suggestions helped you. Otherwise, if you have found a solution, please share it with the community, as it can be hel...

3 More Replies
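A minimal sketch of the ingestion pattern in the question, assuming a Databricks notebook where spark is predefined; the paths are hypothetical. Spark writes one part-NNNN file per write task, so seven input files can land in fewer Delta data files; the part-* naming is normal Spark/Delta output, not an error:

```python
# Auto Loader reading binary files into a Delta table (hypothetical paths).
stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "binaryFile")
    .load("/mnt/raw/binary-input")
)

(
    stream.writeStream.format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/binary-ingest")
    .trigger(availableNow=True)  # recent DBR: process all pending files, then stop
    .start("/mnt/delta/binary-table")
)
```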
Ovi
by New Contributor III
  • 2024 Views
  • 1 reply
  • 0 kudos

Spark Dataframe write to Delta format doesn't create a _delta_log

Hello everyone, I have an intermittent issue when trying to create a Delta table for the first time in Databricks: all the data gets converted into parquet at the specified location but the _delta_log is not created or, if created, it's left empty, t...

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Can you list (display) the folder location "deltaLocation"? What files do you see there? Have you tried using a new location for testing? Do you get the same behavior?

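A quick way to act on this reply and inspect what actually landed at the table path; a sketch with a hypothetical path, assuming a Databricks notebook where dbutils is predefined:

```python
delta_location = "/mnt/delta/my_table"  # stand-in for "deltaLocation" from the question

# A healthy Delta table shows part-*.parquet data files plus a _delta_log/
# directory containing at least one JSON commit file.
for f in dbutils.fs.ls(delta_location):
    print(f.name, f.size)

for f in dbutils.fs.ls(delta_location + "/_delta_log"):
    print(f.name, f.size)
```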
pvignesh92
by Honored Contributor
  • 1005 Views
  • 0 replies
  • 0 kudos

Very often, we need to know how many files a table path contains and the overall size of the path for various optimizations. In the past, I had to wr...

Very often, we need to know how many files a table path contains and the overall size of the path for various optimizations. In the past, I had to write my own logic to accomplish this. Delta Lake is making life easier. See how simple it is to obtain...

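The post is truncated, but the built-in it most likely alludes to is DESCRIBE DETAIL, which returns the file count and total size in a single row; a sketch with a hypothetical path:

```python
# DESCRIBE DETAIL returns one row per table, including numFiles and
# sizeInBytes -- no hand-rolled file-listing logic required.
detail = spark.sql("DESCRIBE DETAIL delta.`/mnt/delta/my_table`")
detail.select("numFiles", "sizeInBytes").show()
```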
wim_schmitz_per
by New Contributor II
  • 3635 Views
  • 2 replies
  • 2 kudos

Transforming/Saving Python Class Instances to Delta Rows

I'm trying to reuse a Python package to do a very complex series of parsing binary files into workable data in Delta format. I have made the first part (binary file parsing) work with a UDF: asffileparser = F.udf(File()._parseBytes, AsfFileDelta.getSch...

Latest Reply
Debayan
Databricks Employee
  • 2 kudos

Hi, did you try to follow "Fix it by registering a custom IObjectConstructor for this class."? Also, could you please provide us with the full error?

1 More Replies
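A sketch of the wiring the question describes, assuming a Databricks notebook; File, _parseBytes, and the struct schema from the asker's package are replaced with hypothetical stand-ins. Note the UDF here wraps a plain top-level function, which sidesteps the pickling problem the error message points at:

```python
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, LongType

# Hypothetical stand-in for AsfFileDelta.getSchema().
parsed_schema = StructType([
    StructField("title", StringType()),
    StructField("duration_ms", LongType()),
])

# Hypothetical stand-in for File()._parseBytes; a top-level function
# pickles cleanly, unlike a bound method on a class instance.
def parse_bytes(content):
    return ("example", 0)

parse_udf = F.udf(parse_bytes, parsed_schema)

df = spark.read.format("binaryFile").load("/mnt/raw/asf")  # hypothetical path
(
    df.withColumn("parsed", parse_udf("content"))
      .select("path", "parsed.*")
      .write.format("delta").mode("append")
      .save("/mnt/delta/asf_parsed")  # hypothetical path
)
```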
joakon
by New Contributor III
  • 10620 Views
  • 6 replies
  • 6 kudos
Latest Reply
huyd
New Contributor III
  • 6 kudos

Check the "delimiter" option in your read cell.

5 More Replies
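The original question is truncated above, but going by the reply, the usual fix is to set the delimiter explicitly on the reader; a sketch with a hypothetical path and separator:

```python
# Spark's CSV reader defaults to a comma; a mismatched delimiter
# typically collapses every column into one.
df = (
    spark.read.format("csv")
    .option("header", "true")
    .option("delimiter", "|")    # match the file's actual separator
    .load("/mnt/raw/input.csv")  # hypothetical path
)
```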
Biber
by New Contributor III
  • 3294 Views
  • 5 replies
  • 8 kudos

Resolved! Change schema when writing to the Delta format

Is it possible to reapply a schema in Delta files? For example, we have a history with a string field, but from some point on we need to replace the string with a struct. In my case, neither the merge option nor overwriteSchema works.

Latest Reply
Biber
New Contributor III
  • 8 kudos

Hi guys! Definitely, thank you for your support.

4 More Replies
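When a column's type changes from string to struct, schema merging cannot express it; the pattern that usually resolves threads like this is a full rewrite with overwriteSchema. A sketch with a hypothetical path and struct definition; this replaces the table's schema, so test on a copy first:

```python
from pyspark.sql import functions as F

df = spark.read.format("delta").load("/mnt/delta/events")  # hypothetical path

# Parse the old string column into the new struct shape (hypothetical fields).
migrated = df.withColumn(
    "payload", F.from_json("payload", "struct<id:string,value:double>")
)

(
    migrated.write.format("delta")
    .mode("overwrite")
    .option("overwriteSchema", "true")  # allow the incompatible schema change
    .save("/mnt/delta/events")
)
```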
Anonymous
by Not applicable
  • 8662 Views
  • 8 replies
  • 7 kudos

Resolved! data frame takes unusually long time to write for small data sets

We have configured the workspace with our own VPC. We need to extract data from DB2 and write it in Delta format. We tried 550k records with 230 columns, and it took 50 minutes to complete the task; 15M records take more than 18 hours. Not sure why this takes suc...

Latest Reply
elgeo
Valued Contributor II
  • 7 kudos

Hello. We face exactly the same issue. Reading is quick but writing takes a long time. Just to clarify, it is about a table with only 700k rows. Any suggestions please? Thank you. remote_table = spark.read.format("jdbc") \ .option("driver", "com...

7 More Replies
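A common cause of this symptom is that a JDBC read without partitioning options runs as a single task, so the "slow write" is really a single-threaded fetch upstream. A sketch of a partitioned DB2 read; the URL, table, partition column, and bounds are hypothetical:

```python
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:db2://host:50000/MYDB")
    .option("driver", "com.ibm.db2.jcc.DB2Driver")
    .option("dbtable", "SCHEMA.MY_TABLE")
    .option("user", "<user>")
    .option("password", "<password>")
    .option("partitionColumn", "ID")   # numeric column to split on
    .option("lowerBound", "1")
    .option("upperBound", "15000000")
    .option("numPartitions", "16")     # parallel fetch tasks
    .option("fetchsize", "10000")      # rows per network round trip
    .load()
)

df.write.format("delta").mode("overwrite").save("/mnt/delta/db2_table")
```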
StephanieAlba
by Databricks Employee
  • 2193 Views
  • 3 replies
  • 6 kudos
Latest Reply
jose_gonzalez
Databricks Employee
  • 6 kudos

Hi @Stephanie Rivera, just a friendly follow-up. Did any of the responses help you resolve your question? If so, please mark it as best. Otherwise, please let us know if you still need help.

2 More Replies
KKo
by Contributor III
  • 5102 Views
  • 3 replies
  • 4 kudos

Resolved! Reading multiple parquet files from same _delta_log under a path

I have a path that contains _delta_log and 3 snappy.parquet files. I am trying to read all those .parquet files using spark.read.format('delta').load(path), but I always get data from only one of the files. Can't I read from all these files? If s...

Latest Reply
KKo
Contributor III
  • 4 kudos

@Werner Stinckens Thanks for the reply and explanation; that helped me understand the Delta feature.

2 More Replies
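For context on the accepted explanation: a Delta read goes through _delta_log, which lists only the files belonging to the current table version; extra snappy.parquet files in the folder are typically older versions retained for time travel. A sketch with a hypothetical path:

```python
# Reads the current version as defined by _delta_log.
df = spark.read.format("delta").load("/mnt/delta/my_table")

# Inspect the commit history, then read an older version if needed.
spark.sql("DESCRIBE HISTORY delta.`/mnt/delta/my_table`").show()
old = (
    spark.read.format("delta")
    .option("versionAsOf", 0)
    .load("/mnt/delta/my_table")
)
```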
Frankooo
by New Contributor III
  • 6947 Views
  • 8 replies
  • 7 kudos

How to optimize exporting dataframe to delta file?

Scenario: I have a dataframe that has 5 billion records/rows and 100+ columns. Is there a way to write this in Delta format efficiently? I tried to export it but cancelled after 2 hours (the write didn't finish), as this processing time is not ...

Latest Reply
jose_gonzalez
Databricks Employee
  • 7 kudos

Hi @Franco Sia, I would recommend avoiding repartition(50); instead, enable optimized writes on your Delta table. You can find more details here. Enable optimized writes and auto compaction on your Delta table. Use AQE (docs here) to have eno...

7 More Replies
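The reply's suggestions expressed as settings; a sketch with a hypothetical path, using the standard Databricks Delta property names (worth verifying against your runtime's docs):

```python
# Enable Adaptive Query Execution for better partition sizing.
spark.conf.set("spark.sql.adaptive.enabled", "true")

# Enable optimized writes and auto compaction on the target table.
spark.sql("""
    ALTER TABLE delta.`/mnt/delta/big_table`
    SET TBLPROPERTIES (
        'delta.autoOptimize.optimizeWrite' = 'true',
        'delta.autoOptimize.autoCompact' = 'true'
    )
""")

# Then write without a manual repartition(); optimized writes size the files.
df = spark.read.format("delta").load("/mnt/delta/source")  # stand-in for the 5B-row dataframe
df.write.format("delta").mode("append").save("/mnt/delta/big_table")
```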
StephanieAlba
by Databricks Employee
  • 1826 Views
  • 1 reply
  • 6 kudos
Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 6 kudos

Hi, as these are transactional tables (there are history commits and snapshots), I would not store images or videos there: the same data can be saved several times, you will have high storage costs, and it can also be slow when the data is big. I would definitely store images,...

User16783853501
by Databricks Employee
  • 2050 Views
  • 1 reply
  • 1 kudos

Converting data that is in Delta format to plain parquet format

There is often a need to convert tables from Delta format to plain Parquet format, for a number of reasons. What is the best way to do that?

Latest Reply
User16826994223
Honored Contributor III
  • 1 kudos

You can easily convert a Delta table back to a Parquet table using the following steps: if you have performed Delta Lake operations that can change the data files (for example, delete or merge), run vacuum with a retention of 0 hours to delete all data f...

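The truncated steps above match the documented procedure: vacuum away non-current files, then drop the transaction log. A sketch with a hypothetical path; retention 0 requires disabling a safety check, so only do this when nothing else is writing to the table:

```python
# 1) Remove data files that are not part of the current version.
spark.conf.set("spark.databricks.delta.retentionDurationCheck.enabled", "false")
spark.sql("VACUUM delta.`/mnt/delta/my_table` RETAIN 0 HOURS")

# 2) Delete the transaction log; the directory is now plain Parquet.
dbutils.fs.rm("/mnt/delta/my_table/_delta_log", recurse=True)

df = spark.read.format("parquet").load("/mnt/delta/my_table")
```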