Data Engineering

Forum Posts

apiury
by New Contributor III
  • 1901 Views
  • 4 replies
  • 2 kudos

Delta file question

Hi! I'm using Autoloader to ingest binary files into Delta format. I have 7 binary files, but Delta generates 3 files named part-0000, part-0001... Why does it generate these files with the format part-000...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Alejandro Piury Pinzón, we haven't heard from you since the last response from @Lakshay Goel, and I was checking back to see if her suggestions helped you. Or else, if you have any solution, please share it with the community, as it can be hel...

3 More Replies
Ovi
by New Contributor III
  • 1151 Views
  • 1 replies
  • 0 kudos

Spark Dataframe write to Delta format doesn't create a _delta_log

Hello everyone, I have an intermittent issue when trying to create a Delta table for the first time in Databricks: all the data gets converted into parquet at the specified location but the _delta_log is not created or, if created, it's left empty, t...

Latest Reply
jose_gonzalez
Moderator
  • 0 kudos

Can you list (display) the folder location "deltaLocation"? What files do you see there? Have you tried using a new location for testing? Do you get the same behavior?

pvignesh92
by Honored Contributor
  • 484 Views
  • 0 replies
  • 0 kudos

Very often, we need to know how many files my table path contains and the overall size of the path for various optimizations. In the past, I had to wr...

Very often, we need to know how many files my table path contains and the overall size of the path for various optimizations. In the past, I had to write my own logic to accomplish this. Delta Lake is making life easier. See how simple it is to obtain...
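
The preview is cut off before it shows the Delta Lake approach, so that part is left as-is. For comparison, here is a minimal sketch of the kind of hand-written logic the post mentions. It assumes the table path is reachable as a local or mounted directory; the function name `summarize_path` is hypothetical and not from the post.

```python
import os


def summarize_path(table_path):
    """Return (file_count, total_bytes) for the files under table_path.

    Assumption: table_path is a local or mounted directory.
    The _delta_log folder is skipped so only data files are counted.
    """
    file_count = 0
    total_bytes = 0
    for root, dirs, files in os.walk(table_path):
        # Prune the transaction log directory from the walk.
        dirs[:] = [d for d in dirs if d != "_delta_log"]
        for name in files:
            file_count += 1
            total_bytes += os.path.getsize(os.path.join(root, name))
    return file_count, total_bytes
```

Note that a walk like this counts every file physically present, including data files that are no longer part of the current table version, so the numbers can be higher than what the table itself reports.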

wim_schmitz_per
by New Contributor II
  • 1692 Views
  • 2 replies
  • 2 kudos

Transforming/Saving Python Class Instances to Delta Rows

I'm trying to reuse a Python package to do a very complex series of parsing steps that turn binary files into workable data in Delta format. I have made the first part (binary file parsing) work with a UDF: asffileparser = F.udf(File()._parseBytes, AsfFileDelta.getSch...

Latest Reply
Debayan
Esteemed Contributor III
  • 2 kudos

Hi, did you try to follow "Fix it by registering a custom IObjectConstructor for this class."? Also, could you please provide us the full error?

1 More Replies
joakon
by New Contributor III
  • 4300 Views
  • 7 replies
  • 6 kudos
Latest Reply
huyd
New Contributor III
  • 6 kudos

check your read cell, "Delimeter"

6 More Replies
Biber
by New Contributor III
  • 1482 Views
  • 5 replies
  • 8 kudos

Resolved! Change schema when writing to the Delta format

Is it possible to reapply a schema in Delta files? For example, we have history with a string field, but from some point on we need to replace the string with a struct. In my case, the merge option and overwrite schema don't work.

Latest Reply
Biber
New Contributor III
  • 8 kudos

Hi guys! Definitely, thank you for your support.

4 More Replies
Anonymous
by Not applicable
  • 5129 Views
  • 9 replies
  • 6 kudos

Resolved! data frame takes unusually long time to write for small data sets

We have configured a workspace with our own VPC. We need to extract data from DB2 and write it in Delta format. We tried it with 550k records and 230 columns, and it took 50 minutes to complete the task. 15 million records take more than 18 hours. Not sure why this takes suc...

Latest Reply
elgeo
Valued Contributor II
  • 6 kudos

Hello. We face exactly the same issue. Reading is quick, but writing takes a long time. Just to clarify, this is about a table with only 700k rows. Any suggestions, please? Thank you. remote_table = spark.read.format("jdbc").option("driver", "com...

8 More Replies
StephanieRivera
by Valued Contributor II
  • 1021 Views
  • 3 replies
  • 6 kudos
Latest Reply
jose_gonzalez
Moderator
  • 6 kudos

Hi @Stephanie Rivera, just a friendly follow-up. Did any of the responses help you resolve your question? If so, please mark it as best. Otherwise, please let us know if you still need help.

2 More Replies
Frankooo
by New Contributor III
  • 3211 Views
  • 9 replies
  • 7 kudos

How to optimize exporting dataframe to delta file?

Scenario: I have a dataframe that has 5 billion records/rows and 100+ columns. Is there a way to write this in Delta format efficiently? I tried to export it but cancelled it after 2 hours (the write didn't finish), as this processing time is not ...

Latest Reply
Kaniz
Community Manager
  • 7 kudos

Hi @Franco Sia, just a friendly follow-up. Do you still need help, or did the above responses help you find the solution? Please let us know.

8 More Replies
KKo
by Contributor III
  • 3094 Views
  • 5 replies
  • 4 kudos

Resolved! Reading multiple parquet files from same _delta_log under a path

I have a path containing a _delta_log and 3 snappy.parquet files. I am trying to read all of those .parquet files using spark.read.format('delta').load(path), but I am getting data from only one file every time. Can't I read from all these files? If s...

Latest Reply
KKo
Contributor III
  • 4 kudos

@Werner Stinckens Thanks for the reply and explanation; it helped me understand the Delta feature.

4 More Replies
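
The accepted explanation is not visible in this preview. One likely reason for the behavior in the question is that a Delta read only uses the data files that the transaction log marks as part of the current table version, not every .parquet file sitting in the folder. The sketch below replays the JSON commits in `_delta_log` to list those files. It is a simplified illustration, not how Spark does it internally: it assumes a local or mounted path, assumes every JSON commit is still present, and ignores checkpoint files. The function name `active_files` is hypothetical.

```python
import json
import os


def active_files(table_path):
    """Return the set of data file paths in the latest table version.

    Simplified log replay: reads every JSON commit in order and applies
    its "add" and "remove" actions. Checkpoints are ignored (assumption).
    """
    log_dir = os.path.join(table_path, "_delta_log")
    # Commit files are zero-padded, so a lexical sort is version order.
    commits = sorted(f for f in os.listdir(log_dir) if f.endswith(".json"))
    live = set()
    for commit in commits:
        with open(os.path.join(log_dir, commit)) as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue
                action = json.loads(line)
                if "add" in action:
                    live.add(action["add"]["path"])
                elif "remove" in action:
                    live.discard(action["remove"]["path"])
    return live
```

If the table was overwritten twice, for example, the folder would hold three .parquet files while `active_files(path)` would return only the newest one, which matches what the question describes.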
StephanieRivera
by Valued Contributor II
  • 952 Views
  • 1 replies
  • 5 kudos
Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 5 kudos

Hi, as these are transactional tables (there are history commits and snapshots), I would not store images or videos there, as they can be saved several times and you will have high storage costs. It can also be slow when the data is big. I would definitely store images,...

User16783853501
by New Contributor II
  • 638 Views
  • 1 replies
  • 1 kudos

Converting data that is in Delta format to plain parquet format

There is often a need to convert Delta tables from Delta format to plain Parquet format for a number of reasons. What is the best way to do that?

Latest Reply
User16826994223
Honored Contributor III
  • 1 kudos

You can easily convert a Delta table back to a Parquet table using the following steps: If you have performed Delta Lake operations that can change the data files (for example, delete or merge), run vacuum with a retention of 0 hours to delete all data f...
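
The reply is cut off after the vacuum step. The Delta Lake documentation describes one further step, deleting the `_delta_log` directory, after which the remaining files are plain Parquet. A minimal sketch of that last step follows, assuming the table lives on a local or mounted path and that the vacuum step has already been run through Spark. The helper name `drop_delta_log` is hypothetical.

```python
import os
import shutil


def drop_delta_log(table_path):
    """Delete the _delta_log directory under table_path.

    Returns True if the directory was removed, False if it was absent.
    Assumption: vacuum with 0 hours retention has already been run, so
    only data files of the latest table version remain in the folder.
    Skipping that step would leave stale rows in the Parquet output.
    """
    log_dir = os.path.join(table_path, "_delta_log")
    if not os.path.isdir(log_dir):
        return False
    shutil.rmtree(log_dir)
    return True
```

This is irreversible: once the log is gone, table history and time travel are no longer available, so it may be safer to try it on a copy of the table first.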
