Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
We created a library in Databricks to ingest Ganglia metrics for all jobs into our Delta tables. However, endpoint 8652 is no longer available on DBR 13.0. Is there any other endpoint available? We need to log all metrics for all executed jobs, not on...
I'm trying to find the best strategy for handling big data sets. In this case I have something that is 450 million records. I'm pulling the data from SQL Server very quickly, but when I try to push the data to the Delta table or an Azure container the...
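A pattern that often helps here (a sketch under assumptions, not a definitive fix): parallelize the JDBC read so the pipeline is not bottlenecked on a single connection, then control file counts on the Delta write. Hostname, table name, partition bounds, and the target path below are all placeholders.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Without partitioning options, Spark pulls all 450M rows through one JDBC
# connection, which often makes the downstream write look like the problem.
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://<host>:1433;databaseName=<db>")  # placeholder
    .option("dbtable", "dbo.big_table")    # hypothetical source table
    .option("partitionColumn", "id")       # a numeric, evenly distributed key
    .option("lowerBound", "1")
    .option("upperBound", "450000000")
    .option("numPartitions", "64")         # parallel read slices
    .load()
)

# Append to Delta; repartition first to avoid thousands of tiny files.
df.repartition(64).write.format("delta").mode("append").save(
    "abfss://container@account.dfs.core.windows.net/big_table"  # placeholder
)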
Let's say I create a table like CREATE TABLE IF NOT EXISTS new_db.data_table (
key STRING,
value STRING,
last_updated_time TIMESTAMP
) USING DELTA LOCATION 's3://......'; Now when I insert into this table, I insert data which has, say, 20 columns a...
I tried running "REFRESH TABLE tablename;" but I still do not see the added columns under Columns in the Data Explorer, while I do see the added columns in the sample data.
We are looking into enabling Predictive I/O on our Delta tables. In the ingest process we are using Auto Loader, and I am wondering if Auto Loader will get a flag to enable deletion vectors at table creation? Deletion vectors are a requirement for Predic...
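As far as I can tell, deletion vectors are a Delta table property rather than an Auto Loader option, so one workaround is to create (or alter) the target table yourself before the stream starts. A minimal sketch, with a hypothetical table name:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Enable deletion vectors at table creation time.
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.ingest.events (
        id STRING,
        payload STRING
    )
    TBLPROPERTIES ('delta.enableDeletionVectors' = true)
""")

# Or turn them on for an existing Auto Loader target table.
spark.sql("""
    ALTER TABLE main.ingest.events
    SET TBLPROPERTIES ('delta.enableDeletionVectors' = true)
""")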
Hi, I'm creating some views to be queried by Power BI. In our Delta tables we have a column called databaseName, which contains the source system's database name. What I'm doing is using this to filter data: WHERE databaseName = current_database(). No...
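For reference, the pattern being described looks roughly like the sketch below (view, table, and column names are hypothetical). Note that current_database() is evaluated in the session that runs the query, so a Power BI connection that lands in the default schema can end up filtering out every row:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# The view keeps only rows whose stored source-system name matches the
# querying session's current database.
spark.sql("""
    CREATE OR REPLACE VIEW sales_db.v_orders AS
    SELECT *
    FROM sales_db.orders
    WHERE databaseName = current_database()
""")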
When using Delta tables with DBR jobs or even with DLT pipelines, the upserts (especially updates, on key and timestamp) are taking much longer than expected to update the files/table data (~2 mins even for a 1-record poll) (Inserts are lightni...
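Slow MERGE updates are often dominated by file rewrites across the whole table; one common mitigation (sketched below with hypothetical names) is to add a pruning column such as a date to the join condition so Delta can skip files that cannot match:

from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

target = DeltaTable.forName(spark, "main.db.events")   # hypothetical target
updates = spark.table("main.db.events_updates")        # hypothetical source

# The extra event_date predicate narrows the set of files the merge touches.
(
    target.alias("t")
    .merge(updates.alias("s"), "t.key = s.key AND t.event_date = s.event_date")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)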
Hi @Surya Agarwal, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so...
Hi team, if we kill clusters every time, will the connection details change? If yes, is there a way we can mask this so that end users are not impacted due to any changes in clusters? Also, if I want to call a Delta table from an API using JDBC - s...
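One way to get stable connection details is to point the API at a SQL warehouse instead of an ephemeral cluster; its hostname and HTTP path survive restarts. A minimal sketch using the Databricks SQL Connector for Python (pip install databricks-sql-connector); the hostname, HTTP path, token, and table name are placeholders:

from databricks import sql

with sql.connect(
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",  # placeholder
    http_path="/sql/1.0/warehouses/abcdef1234567890",              # placeholder
    access_token="<personal-access-token>",                        # placeholder
) as conn:
    with conn.cursor() as cursor:
        cursor.execute("SELECT * FROM main.db.my_delta_table LIMIT 10")
        for row in cursor.fetchall():
            print(row)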
Hi @Siddharth Krishna, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell u...
I know Delta tables support the ACID properties, and my understanding is that MERGE, INSERT, DELETE, etc. run inside a transaction by default, and if any error occurs during these operations, that transaction will be rolled back. I hope this unders...
@Thushar R Yes, you are right. As a Delta table keeps a transaction log and maintains the version history of your data, it can easily roll back your transaction in case of a failure; i.e., only once a transaction is successfully committed, that is when the ...
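The version history is easy to inspect and act on; a short sketch with a hypothetical table name:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Every committed write shows up as a numbered version in the history;
# failed operations never become a committed version.
spark.sql("DESCRIBE HISTORY main.db.orders").show()

# A committed change can also be rolled back explicitly.
spark.sql("RESTORE TABLE main.db.orders TO VERSION AS OF 12")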
Hi, I am trying to understand how to take a backup of Databricks Delta tables/workspace and how to restore in case of any failure, or suggest any alternative solution to revert if data is corrupted. Regards, Sanjay
Hi @Sanjay Jain, here are some of the ways:
Deep Clone: https://www.databricks.com/wp-content/uploads/notebooks/using-deep-clone-disaster-recovery-delta-lake-databricks.html
Repos for notebooks and code: https://docs.databricks.com/repos/index.html
ht...
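For the Deep Clone option, the backup itself is one SQL statement; a sketch with hypothetical database/table names:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Creates a full, independent copy of the table's data and metadata.
spark.sql("""
    CREATE OR REPLACE TABLE backup_db.sales_orders
    DEEP CLONE prod_db.sales_orders
""")

Re-running the same statement later copies only incremental changes, which keeps periodic backup jobs cheap.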
Hello @Ovidiu Eremia, to filter which folders on S3 contain Delta tables, you can look for the specific files that are associated with Delta tables. Delta Lake stores its metadata in a hidden folder named _delta_log, which is located at the root of ...
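A quick way to automate that check from a notebook (a sketch; the bucket path is a placeholder, and dbutils is assumed to be available as in any Databricks notebook):

from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# List candidate folders, then keep those with a valid _delta_log underneath.
candidates = [f.path for f in dbutils.fs.ls("s3://my-bucket/tables/") if f.isDir()]
delta_paths = [p for p in candidates if DeltaTable.isDeltaTable(spark, p)]
print(delta_paths)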
Hello, I think the main way of handling this is to specify the schema within the job through a schema file; the other way is to restart the job so it infers the new schema automatically.
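One way to implement the schema-file approach (a sketch; the paths and source location are hypothetical, and the file is assumed to hold the JSON produced by df.schema.json()):

import json

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType

spark = SparkSession.builder.getOrCreate()

# Rebuild the schema from the saved JSON and pin the read to it.
with open("/dbfs/schemas/events.json") as f:
    schema = StructType.fromJson(json.load(f))

df = spark.read.schema(schema).json("s3://my-bucket/raw/events/")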
Hi all, I have some data in a Delta table with multiple columns, and each record has a unique identifier. I want to update some columns with the new values coming in for each of these unique records. However, updating one record at a time is taking a lot...
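Instead of row-by-row updates, a single MERGE applies all incoming values in one transaction; a sketch with hypothetical table and column names:

from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

target = DeltaTable.forName(spark, "main.db.records")    # hypothetical
incoming = spark.table("main.db.records_incoming")       # hypothetical

# Update only the columns that change, matched on the unique identifier.
(
    target.alias("t")
    .merge(incoming.alias("s"), "t.record_id = s.record_id")
    .whenMatchedUpdate(set={
        "status": "s.status",
        "updated_at": "s.updated_at",
    })
    .execute()
)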
I am using only an INSERT SQL query to insert the historical load, but the previous data is getting deleted. Tried with a Python query also, but the same issue persists. Reading the data from a GCP bucket (Parquet file), writing the data into a GCP bucket (Delta file). The deleted f...
I would like to connect Power BI to the Delta tables I have created, to use them for reporting. Is it possible to do this with Databricks, or do I have to write my data to some other serving layer?
If you want to read your Delta Lake table directly from storage without the need of having a Databricks cluster up and running, you can also use the official Power BI connector for Delta Lake: https://github.com/delta-io/connectors/tree/m...
I am trying to do a streaming merge between Delta tables using this guide: https://docs.delta.io/latest/delta-update.html#upsert-from-streaming-queries-using-foreachbatch
Our code sample (Java):
Dataset<Row> sourceDf = sparkSession
...
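For comparison, the foreachBatch upsert pattern from that guide looks roughly like this in PySpark (a sketch; table names and the checkpoint path are placeholders):

from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def upsert_to_delta(micro_batch_df, batch_id):
    # Called once per micro-batch; inside, the merge is an ordinary batch MERGE.
    target = DeltaTable.forName(spark, "main.db.target")  # hypothetical
    (
        target.alias("t")
        .merge(micro_batch_df.alias("s"), "t.key = s.key")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )

(
    spark.readStream.table("main.db.source")              # hypothetical
    .writeStream
    .foreachBatch(upsert_to_delta)
    .option("checkpointLocation", "/tmp/checkpoints/upsert")  # placeholder
    .start()
)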