Data Engineering
Forum Posts

Jingalls
by New Contributor II
  • 363 Views
  • 1 reply
  • 2 kudos

The Data + AI summit is a blast so far. There are so many new technologies being released, such as Delta Lake 2.0 being open-sourced.

Latest Reply
Zzof
New Contributor II
  • 2 kudos

Agreed! You should check out the Azure booth if you haven't already; they have a really cool demo.

Braxx
by Contributor II
  • 2564 Views
  • 2 replies
  • 1 kudos

Resolved! delta table storage

I couldn't find this clearly explained anywhere, so I hope somebody here can shed some light on it. A few questions: 1) Where are Delta tables stored? The docs say: "Delta Lake uses versioned Parquet files to store your data in your cloud storage". So where exactly i...

Latest Reply
Braxx
Contributor II
  • 1 kudos

thanks, very helpful

1 More Replies
Development
by New Contributor III
  • 2614 Views
  • 8 replies
  • 5 kudos

Delta Table with 130 columns taking time

Hi all, we are facing one unusual issue while loading data into a Delta table using Spark SQL. We have one Delta table which has around 135 columns and is also PARTITIONED BY. We are trying to load around 15 million rows into it, but it's not loading ...

Latest Reply
Development
New Contributor III
  • 5 kudos

@Kaniz Fatma @Parker Temple I found the root cause: it's because of serialization. We are using a UDF to derive a column on the dataframe, and when we try to load data into the Delta table or write data into a Parquet file, we face a serialization issue ...

7 More Replies
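The reply above pins the slow Delta write on serialization overhead from a Python UDF: every row a Python UDF touches must be serialized out of the engine, processed in Python, and serialized back. A minimal pure-Python sketch of that per-row round-trip (illustrative only — this is not Spark itself, and the `amount`/`discount` columns are made-up examples):

```python
import pickle

def per_row_roundtrip(rows):
    """Simulate what a Python UDF forces the engine to do: serialize
    each row out, apply the function in Python, and rebuild the row."""
    out = []
    for row in rows:
        wire = pickle.dumps(row)        # row crosses the process boundary
        row = pickle.loads(wire)        # ...and is rebuilt in Python
        row["discount"] = row["amount"] * 0.1  # the "UDF" logic
        out.append(row)
    return out

def engine_native(rows):
    """The same logic as one bulk column operation, with no per-row
    serialization boundary (what built-in column expressions avoid)."""
    return [dict(r, discount=r["amount"] * 0.1) for r in rows]

rows = [{"id": i, "amount": float(i)} for i in range(5)]
assert per_row_roundtrip(rows) == engine_native(rows)
```

In PySpark the practical fix is usually rewriting the UDF as built-in `pyspark.sql.functions` column expressions, or as a pandas (vectorized) UDF, which serializes whole batches via Arrow rather than row by row.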
Taha_Hussain
by Valued Contributor II
  • 779 Views
  • 1 reply
  • 6 kudos

Databricks Office Hours

Our next Office Hours session is scheduled for May 18th from 8:00 am - 9:00 am PT. Do you have questions about how to set up or use Databricks? Do you want to learn more about best practices for deploying your use case or tip...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 6 kudos

Just registered!

WojtekJ
by New Contributor
  • 4100 Views
  • 2 replies
  • 3 kudos

Is it possible to use Iceberg instead of DeltaLake?

Hi. Do you know if it is possible to use the Iceberg table format instead of Delta Lake? Ideally, I would like the tables in Databricks to be stored as Iceberg and to use them as usual in notebooks. I read that there is also an option to link an external metasto...

Latest Reply
Kaniz
Community Manager
  • 3 kudos

Hi @Wojtek J, here's a thorough comparison of Delta Lake, Iceberg and Hudi. This talk shares the research we did comparing the key features and designs these table formats hold, the maturity of features such as the APIs exposed to end u...

1 More Replies
AmanSehgal
by Honored Contributor III
  • 1637 Views
  • 4 replies
  • 10 kudos

Migrating data from delta lake to RDS MySQL and ElasticSearch

There are mechanisms (like DMS) to get data from RDS into the delta lake and store it in Parquet format, but is it possible to do the reverse in AWS? I want to send data from the data lake to MySQL RDS tables in batch mode. And the next step is to send th...

Latest Reply
AmanSehgal
Honored Contributor III
  • 10 kudos

@Kaniz Fatma and @Hubert Dudek - writing to MySQL RDS is relatively simple. I'm still looking for ways to export data into Elasticsearch.

3 More Replies
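As the reply notes, the MySQL RDS leg is the simple part: Spark's built-in JDBC writer handles it. A small helper that assembles the options you would pass to `df.write.format("jdbc")` (the host, database, and table names here are hypothetical placeholders):

```python
def mysql_jdbc_options(host, port, database, table, user, password):
    """Build the option dict for Spark's JDBC writer targeting MySQL RDS."""
    return {
        "url": f"jdbc:mysql://{host}:{port}/{database}",
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "com.mysql.cj.jdbc.Driver",
    }

opts = mysql_jdbc_options("my-rds.example.com", 3306, "sales", "orders",
                          "etl_user", "secret")
# In a notebook this would be used roughly as:
#   df.write.format("jdbc").options(**opts).mode("append").save()
```

For the Elasticsearch leg, the usual route is the elasticsearch-hadoop connector (the `org.elasticsearch.spark.sql` data source), configured similarly with `es.nodes` and `es.resource` options.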
hari
by Contributor
  • 3525 Views
  • 8 replies
  • 4 kudos

Resolved! How to write Change Data from Delta Lake to aws dynamodb

Is there a direct way to write data from Delta Lake to AWS DynamoDB? If not, is there any other way to do the same?

Latest Reply
jose_gonzalez
Moderator
  • 4 kudos

Hi @Harikrishnan P H, did @Werner Stinckens's reply help you resolve your issue? If yes, please mark it as best; if not, please let us know.

7 More Replies
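One common pattern for this (a sketch, not necessarily the accepted answer, which is truncated above) is to read the table's Change Data Feed and translate each change row into a DynamoDB write request. A pure-Python sketch of the translation step — the actual CDF read (`spark.read.option("readChangeFeed", "true")`) and the boto3 batch write are left out, and the `id` key column is a hypothetical example:

```python
def cdf_row_to_request(row, key_col="id"):
    """Map a Delta Change Data Feed row to a DynamoDB write request.

    Delta CDF tags every row with _change_type: 'insert', 'delete',
    'update_preimage' (old values) or 'update_postimage' (new values).
    Pre-images carry stale data, so they are skipped (returns None).
    """
    change = row["_change_type"]
    if change == "update_preimage":
        return None
    item = {k: v for k, v in row.items() if not k.startswith("_")}
    if change == "delete":
        return {"DeleteRequest": {"Key": {key_col: item[key_col]}}}
    # 'insert' and 'update_postimage' both become upserts
    return {"PutRequest": {"Item": item}}

changes = [
    {"id": 1, "name": "a", "_change_type": "insert"},
    {"id": 1, "name": "a", "_change_type": "update_preimage"},
    {"id": 1, "name": "a2", "_change_type": "update_postimage"},
    {"id": 2, "name": "b", "_change_type": "delete"},
]
requests = [r for r in map(cdf_row_to_request, changes) if r is not None]
```

With boto3's high-level `Table.batch_writer()`, plain Python values like these are accepted; the low-level `batch_write_item` client call would additionally require DynamoDB-typed attribute values.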
alejandrofm
by Valued Contributor
  • 2829 Views
  • 3 replies
  • 3 kudos

Resolved! Delta, the specified key does not exist error

Hi, I'm getting this error too frequently on a few tables. I checked on S3: the partition exists and the file is there in the partition. Error: Spectrum Scan Error: DeltaManifest code: 15005 context: Error fetching Delta Lake manifest delta/product/sub_...

Latest Reply
alejandrofm
Valued Contributor
  • 3 kudos

@Hubert Dudek, I'll add that sometimes just running: GENERATE symlink_format_manifest FOR TABLE schema.table solves it. But how can the symlink get broken? Thanks!

2 More Replies
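For context on what GENERATE produces: the symlink manifest is just a text file named `manifest` under `_symlink_format_manifest/`, listing the absolute paths of the Parquet files in the current table snapshot (one per line) so engines like Redshift Spectrum know which files to scan. A simplified, unpartitioned sketch of writing one (illustrative only, not Delta's implementation):

```python
import os

def write_symlink_manifest(table_root, data_files):
    """Write a minimal symlink-format manifest: one absolute data-file
    path per line under <table_root>/_symlink_format_manifest/manifest."""
    manifest_dir = os.path.join(table_root, "_symlink_format_manifest")
    os.makedirs(manifest_dir, exist_ok=True)
    manifest_path = os.path.join(manifest_dir, "manifest")
    with open(manifest_path, "w") as f:
        for path in data_files:
            f.write(path + "\n")
    return manifest_path
```

This also suggests how the "symlink gets broken": if writes add or remove Parquet files after the manifest was generated, the listed paths go stale until GENERATE is re-run. Setting the table property `delta.compatibility.symlinkFormatManifest.enabled = true` makes Delta regenerate the manifest automatically on write.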
pjp94
by Contributor
  • 7013 Views
  • 5 replies
  • 4 kudos

Resolved! Difference between DBFS and Delta Lake?

I would like a deeper dive/explanation into the difference. When I write to a table with the following code: spark_df.write.mode("overwrite").saveAsTable("db.table") the table is created and can be viewed in the Data tab. It can also be found in some DBF...

Latest Reply
-werners-
Esteemed Contributor III
  • 4 kudos

Tables in Spark, delta-lake-backed or not, are basically just semantic views on top of the actual data. On Databricks, the data itself is stored in DBFS, which is an abstraction layer on top of the actual storage (like S3, ADLS etc.). This can be parq...

4 More Replies
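To make the abstraction in the reply concrete: a `dbfs:/` path is essentially an alias that Databricks resolves to an object-store location (for mounts, via the mount table). A toy resolver — the mount points and bucket names are made up, and the real resolution happens inside the platform:

```python
def resolve_dbfs_path(dbfs_path, mounts):
    """Translate a dbfs:/ path to the cloud-storage URI behind it,
    using a mount-point table. Unmounted paths are returned as-is
    (they live in the workspace's managed DBFS root bucket)."""
    path = dbfs_path.removeprefix("dbfs:")
    # check longer mount points first so /mnt/raw/archive wins over /mnt/raw
    for mount_point in sorted(mounts, key=len, reverse=True):
        if path.startswith(mount_point):
            return mounts[mount_point] + path[len(mount_point):]
    return dbfs_path

mounts = {"/mnt/raw": "s3://acme-datalake/raw"}
resolve_dbfs_path("dbfs:/mnt/raw/db/table/part-0.parquet", mounts)
```

So a `saveAsTable` call writes Parquet (or Delta) files to one of these underlying locations and registers the table name in the metastore; the Data tab and the DBFS path are two views of the same files.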
Hubert-Dudek
by Esteemed Contributor III
  • 760 Views
  • 1 reply
  • 15 kudos

Resolved! Write to Azure Delta Lake - optimization request

The Databricks/Delta team could optimize some commands that write to Azure Blob Storage, as Azure displays this message:

[image attachment]
Latest Reply
Anonymous
Not applicable
  • 15 kudos

Hey there. Thank you for your suggestion. I'll pass this up to the team.

Disney
by New Contributor II
  • 742 Views
  • 1 reply
  • 5 kudos

Resolved! We have hundreds of ETL processes (Informatica) with a lot of logic pulling various data from applications into a relational DB (target DB). Can we use Delta Lake as the target DB?

Hi DB Support, can we use Databricks' Delta Lake as our target DB? Here's our situation: we have hundreds of ETL jobs pulling from these sources (SAP, Siebel/Oracle, Cognos, Postgres). Our ETL process has all of the logic, and our target DB is an MPP syst...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 5 kudos

Hi, yes you can. The best approach is to create a SQL endpoint in a premium workspace and just write to Delta Lake as you would to SQL. This is a community forum, not support; you can contact Databricks via https://databricks.com/company/contact or via AWS/Azure if you have su...

MadelynM
by New Contributor III
  • 338 Views
  • 0 replies
  • 1 kudos

vimeo.com

Auto Loader provides Python and Scala methods to ingest new data from a folder location into a Delta Lake table by using directory listing or file notifications. Here's a quick video (7:00) on how to use Auto Loader for Databricks on AWS with Databri...

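The gist of Auto Loader's directory-listing mode described above is incremental discovery: it tracks which files it has already ingested, so each trigger only processes new arrivals. A toy model of that bookkeeping (the real source checkpoints this state internally; this is only an illustration):

```python
class DirectoryListingLoader:
    """Toy model of Auto Loader's directory-listing discovery:
    remember processed files, return only new ones on each poll."""

    def __init__(self):
        self._seen = set()

    def poll(self, listing):
        new_files = sorted(f for f in listing if f not in self._seen)
        self._seen.update(new_files)
        return new_files

loader = DirectoryListingLoader()
first = loader.poll(["a.json", "b.json"])             # both are new
second = loader.poll(["a.json", "b.json", "c.json"])  # only c.json is new
```

In a notebook the equivalent is `spark.readStream.format("cloudFiles").option("cloudFiles.format", "json").load(path)`, with the processed-file state kept in the stream's checkpoint.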
FemiAnthony
by New Contributor III
  • 1875 Views
  • 5 replies
  • 3 kudos

Resolved! Location of customer_t1 dataset

Can anyone tell me how I can access the customer_t1 dataset that is referenced in the book "Delta Lake: The Definitive Guide"? I am trying to follow along with one of the examples.

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 3 kudos

Some files are visualized here: https://github.com/vinijaiswal/delta_time_travel/blob/main/Delta%20Time%20Travel.ipynb, but it is quite strange that there is no source in the repository. I think the only way is to write to Vini Jaiswal on GitHub.

4 More Replies
prasadvaze
by Valued Contributor
  • 1676 Views
  • 2 replies
  • 1 kudos

Delta RUST API (not REST)

@dennylee The Delta Rust API seems a good option to query a Delta table without spinning up a Spark cluster, so I am trying it out - https://databricks.com/blog/2020/12/22/natively-query-your-delta-lake-with-scala-java-and-python.html - using a Python app. "Read...

Latest Reply
prasadvaze
Valued Contributor
  • 1 kudos

https://github.com/delta-io/delta-rs/issues/392 - this issue is being actively worked on.

1 More Replies
JigaoLuo
by New Contributor
  • 4141 Views
  • 3 replies
  • 0 kudos

OPTIMIZE error: org.apache.spark.sql.catalyst.parser.ParseException: mismatched input 'OPTIMIZE'

Hi everyone. I am trying to learn the OPTIMIZE keyword from this blog using Scala: https://docs.databricks.com/delta/optimizations/optimization-examples.html#delta-lake-on-databricks-optimizations-scala-notebook. But my local Spark seems not able t...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi Jigao, OPTIMIZE isn't in the open-source Delta API, so it won't run on your local Spark instance - https://docs.delta.io/latest/api/scala/io/delta/tables/index.html?search=optimize

2 More Replies
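For intuition about what the missing command does: on Databricks, OPTIMIZE bin-packs many small data files into files of roughly 1 GB. A toy planner for that grouping, which could be paired with a manual repartition-and-rewrite on open-source Spark (the greedy strategy and the 1 GB target are illustrative, not Delta's exact algorithm; later open-source Delta releases also added OPTIMIZE):

```python
TARGET_BYTES = 1024 ** 3  # ~1 GB, the usual OPTIMIZE output file size

def plan_compaction(file_sizes, target=TARGET_BYTES):
    """Greedy first-fit-decreasing bin-packing: group small files into
    batches whose combined size stays within the target output size."""
    bins = []  # each bin: [total_size, [file names]]
    for name, size in sorted(file_sizes.items(), key=lambda kv: -kv[1]):
        for b in bins:
            if b[0] + size <= target:
                b[0] += size
                b[1].append(name)
                break
        else:
            bins.append([size, [name]])
    return [b[1] for b in bins]
```

Each resulting batch would then be read, coalesced, and rewritten as one file; on Databricks the single statement `OPTIMIZE my_table` does all of this (optionally with ZORDER BY).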