Short version: I need a way to take only the most recent record from a variable number of tables in a stream. This is a relatively easy problem in SQL or Python pandas (group by and take the newest), but in a stream I keep hitting roadblocks. I could do i...
Did you try storing it all to a DELTA table with a MERGE INTO [1]? You can optionally specify a condition on "WHEN MATCHED" such that you only update if the timestamp is newer. [1] https://docs.databricks.com/spark/latest/spark-sql/language-manual/del...
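For illustration, a minimal foreachBatch sketch of that pattern, assuming a hypothetical key column id, timestamp column event_ts, and a hypothetical target path, and assuming stream_df is the incoming stream and spark is the notebook session (not the poster's actual code):

from pyspark.sql import functions as F, Window
from delta.tables import DeltaTable

def upsert_latest(microbatch_df, batch_id):
    # Keep only the newest row per key within this micro-batch.
    w = Window.partitionBy("id").orderBy(F.col("event_ts").desc())
    latest = (microbatch_df
              .withColumn("rn", F.row_number().over(w))
              .filter("rn = 1")
              .drop("rn"))
    # Merge into the target, overwriting a matched key only when the incoming row is newer.
    target = DeltaTable.forPath(spark, "/mnt/delta/latest_records")  # hypothetical target path
    (target.alias("t")
           .merge(latest.alias("s"), "t.id = s.id")
           .whenMatchedUpdateAll(condition="s.event_ts > t.event_ts")
           .whenNotMatchedInsertAll()
           .execute())

(stream_df.writeStream
          .foreachBatch(upsert_latest)
          .option("checkpointLocation", "/mnt/checkpoints/latest_records")  # hypothetical path
          .start())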
In streaming flows I periodically get a "Detected a data update" error. This error generally seems to indicate that something has changed in the source table schema, but it's not immediately apparent what. In one case yesterday I pulled the source tab...
@Kaniz Fatma, Thanks, that helps. I was assuming this warning indicated a schema evolution, and based on what you say it likely wasn't, so I just have to turn on ignoreChanges any time I have a stream from a table that receives updates/upserts. To b...
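For reference, enabling the option on the read side looks roughly like this (the path is hypothetical); note that ignoreChanges can re-deliver rewritten rows downstream, so deduplicate later if that matters:

# Stream from a Delta table that receives updates/upserts upstream.
stream_df = (spark.readStream
                  .format("delta")
                  .option("ignoreChanges", "true")  # tolerate rewritten files instead of failing with "Detected a data update"
                  .load("/mnt/delta/source_table"))  # hypothetical path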
Hi All, we are facing an unusual issue while loading data into a Delta table using Spark SQL. We have one Delta table which has around 135 columns and is also PARTITIONED BY. We are trying to load about 15 million records into it, but it is not loading ...
@Kaniz Fatma @Parker Temple I found the root cause: it's because of serialization. We are using a UDF to derive a column on the dataframe, and when we try to load data into the Delta table or write data into a parquet file we hit a serialization issue ....
Getting the error below while creating buckets on a Delta table. Error in SQL statement: AnalysisException: Delta bucketed tables are not supported. We have fallen back to parquet tables for some use cases due to this. Is there any alternative for this? I have...
Hi @Rahul Samant, we checked internally on this: due to certain limitations, bucketing is not supported on Delta tables. The only alternative to bucketing is to leverage Z-ordering; below is the link for reference: https://docs.databricks.com/de...
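As a rough sketch of the Z-ordering alternative (table and column names are hypothetical):

# Compact the table and co-locate rows by a frequently filtered, high-cardinality column.
spark.sql("OPTIMIZE my_db.my_delta_table ZORDER BY (customer_id)")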
Hi Community, let's take a scenario where data from S3 is read to create Delta tables that are then stored on DBFS. To query these Delta tables we use a SQL endpoint, from which all the Delta tables are visible, but we need to control which ...
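If table access control is enabled on the workspace, limiting what the endpoint users can see could look roughly like this sketch (schema, table, and group names are hypothetical):

# Allow a group to use the schema but SELECT only from specific tables.
spark.sql("GRANT USAGE ON SCHEMA analytics TO `readonly_group`")
spark.sql("GRANT SELECT ON TABLE analytics.sales_delta TO `readonly_group`")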
Hey @Athlestan Jain, just checking in. Do you think you were able to find a solution to your problem from the above answers? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? Thank you!
I have seen the following documentation that details how you can work with the OPTIMIZE function to improve storage and querying efficiency. However, most of the documentation focuses on big data, 10 GB or larger. I am working with a ~7 million row ...
Thank you @Hubert Dudek!! So I gather from your response that it's totally fine to have a Delta table that lives in a single file that's roughly 211 MB, and I can use OPTIMIZE in conjunction with ZORDER to filter on a frequently filtered, high-cardina...
I write data to S3 like data.write.format("delta").mode("append").option("mergeSchema", "true").save(s3_location) and create a partitioned table like CREATE TABLE IF NOT EXISTS demo_table
USING DELTA
PARTITIONED BY (column_a)
LOCATION {s3_location}; whi...
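Since the question is cut off, here is only a sketch of the pattern it describes, reusing the names from the post (the partitionBy call is an assumption about intent):

(data.write
     .format("delta")
     .mode("append")
     .option("mergeSchema", "true")
     .partitionBy("column_a")   # write files partitioned the same way the table declares
     .save(s3_location))

spark.sql(f"""
    CREATE TABLE IF NOT EXISTS demo_table
    USING DELTA
    LOCATION '{s3_location}'
""")

As far as I know, when Delta data already exists at the location, the table definition picks up (and must not contradict) the partitioning recorded in the Delta log, so the PARTITIONED BY clause can be omitted.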
I have data in a Spark DataFrame and I write it to an S3 location. It has some complex datatypes like structs, etc. When I create the table on top of the S3 location by using CREATE TABLE IF NOT EXISTS table_name
USING DELTA
LOCATION 's3://.../...'; Th...
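A minimal sketch of that pattern with a struct column (column names and the path are hypothetical); the table created on the location picks up the nested schema from the Delta log:

from pyspark.sql import functions as F

df_nested = df.withColumn("address", F.struct("street", "city", "zip"))  # hypothetical nested column
df_nested.write.format("delta").mode("overwrite").save("s3://my-bucket/tables/customers")  # hypothetical path

spark.sql("""
    CREATE TABLE IF NOT EXISTS customers
    USING DELTA
    LOCATION 's3://my-bucket/tables/customers'
""")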
Hi Team, to access SQL tables we use tools like TOAD and SQL Server Management Studio (SSMS). Is there any tool to connect to and access Databricks Delta tables? Please let us know. Thank you
I'm getting the attached error when accessing Delta Lake tables in the Databricks workspace. Summary of error: Could not connect to md1n4trqmokgnhr.csnrqwqko4ho.ap-southeast-1.rds.amazonaws.com:3306 : Connection reset. Detailed error attached.
Hi all, I need to create service account users who can only query some Delta tables. I guess I do that by creating the user and granting SELECT rights on the desired tables. But Databricks requires a mail account for these users. Is there a way to cr...
Hi @Kaniz Fatma, I've checked the link, but the standard method requires a mailbox and user creation using the SCIM API looks too complicated. I solved the issue: I created a mailbox for the service account and created the user using that mailbox....
We have many Delta tables with string columns as the unique key (PK in a traditional relational DB), and we don't want to insert a new row when the key value only differs in case. It's a lot of code change to use the upper/lower function on every column value comparison (in ...
Well, the unintended benefit is that I am now using int/bigint as surrogate keys for all tables (preferred in a DW). All joins are made on integer data types. Query efficiency is also improved. The string matching using upper() is done only in ETL when com...
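For completeness, the upper() comparison mentioned above can be confined to the ETL MERGE condition rather than scattered through the code (table and column names are hypothetical):

spark.sql("""
    MERGE INTO dim_customer AS t
    USING staged_updates AS s
      ON upper(t.business_key) = upper(s.business_key)   -- case-insensitive key match
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")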
How can we persist 300 million records? What is the best option to persist the data: the Databricks Hive metastore, Azure storage, or a Delta table? What limitations do Databricks Delta tables have in terms of data? We have a use case where testers should be...
You can certainly store 300 million records without any problem. The best option kinda depends on the use case. If you want to do a lot of online querying on the table, I suggest using Delta Lake, which is optimized (using Z-order, bloom filter, par...
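As a rough sketch of the bloom filter option mentioned there, assuming the feature is available on your runtime (table, column, and option values are hypothetical):

# Build a bloom filter index on a column that is frequently used for point lookups.
spark.sql("""
    CREATE BLOOMFILTER INDEX ON TABLE events
    FOR COLUMNS (device_id OPTIONS (fpp = 0.1, numItems = 300000000))
""")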
Hello, has anyone tried to create an incremental backup of Delta tables? What I mean is loading into the backup storage only the latest parquet files that are part of the Delta table and refreshing the _delta_log folder, instead of copying all the files aga...
Hi @Stefan Stegaru, you can use Delta time travel to query the data that was just added in a specific version. Then, like @Hubert Dudek mentioned, you can copy this subset of data over to a new table or a new location. You will need to do a deep...
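A minimal sketch of reading one version with time travel and copying that snapshot out (paths and version number are hypothetical):

# Read the table exactly as it was at a given version...
snapshot = (spark.read
                 .format("delta")
                 .option("versionAsOf", 42)
                 .load("/mnt/delta/source_table"))

# ...and copy that snapshot to the backup location.
snapshot.write.format("delta").mode("overwrite").save("/mnt/backup/source_table_v42")

As far as I know, re-running DEEP CLONE against the same target is another incremental-friendly option, since subsequent runs only copy new or changed files.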
The most important aspect is that your experiment can track the version of the data table, so during audits you will be able to trace back why a specific prediction was made.