Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Forum Posts

by Chris_Shehu (Valued Contributor III)
  • 12214 Views
  • 5 replies
  • 5 kudos

Resolved! What is the best way to handle big data sets?

I'm trying to find the best strategy for handling big data sets. In this case I have a data set of 450 million records. I'm pulling the data from SQL Server very quickly, but when I try to push the data to the Delta table or an Azure container the...

Latest reply by Wilynan (New Contributor II)
  • 5 kudos

I think you should consult big data experts for advice on this issue.

by Constantine (Contributor III)
  • 5128 Views
  • 2 replies
  • 4 kudos

Resolved! How does merge schema work?

Let's say I create a table like CREATE TABLE IF NOT EXISTS new_db.data_table (key STRING, value STRING, last_updated_time TIMESTAMP) USING DELTA LOCATION 's3://......'; Now, when I insert into this table, I insert data that has, say, 20 columns a...

Latest reply by timdriscoll22 (New Contributor II)
  • 4 kudos

I tried running "REFRESH TABLE tablename;", but I still do not see the added columns in the Data Explorer column list, although they do appear in the sample data.

by martindlarsson (New Contributor III)
  • 696 Views
  • 0 replies
  • 0 kudos

Autoloader and deletion vectors (Predictive IO)

We are looking into enabling Predictive IO on our Delta tables. In the ingest process we are using Auto Loader, and I am wondering whether Auto Loader will get a flag to enable deletion vectors at table creation. Deletion vectors are a requirement for Predic...

by DrK (New Contributor III)
  • 1158 Views
  • 2 replies
  • 2 kudos

Resolved! current_database() function returns unexpected results when queried from Power BI

Hi, I'm creating some views to be queried by Power BI. In our Delta tables we have a database name column (databaseName) that contains the source system's database name. I'm using it to filter data with WHERE databaseName = current_database(). No...

Latest reply by Kaniz_Fatma (Community Manager)
  • 2 kudos

Hi @Andy Skinner, thank you for contacting us regarding the issue you're experiencing when querying views using the current_database() function in Power BI. I understand that the views work as expected within a worksheet, but no data is returned whe...

by HamidHamid_Mora (New Contributor II)
  • 1836 Views
  • 2 replies
  • 2 kudos

Ganglia is unavailable on DBR 13.0

We created a library in Databricks to ingest Ganglia metrics for all jobs into our Delta tables. However, endpoint 8652 is no longer available on DBR 13.0. Is there any other endpoint available? We need to log all metrics for all executed jobs, not on...

Latest reply by Hubert-Dudek (Esteemed Contributor III)
  • 2 kudos

Ganglia is only supported on Databricks Runtime versions 12 and below. From Databricks Runtime 13, Ganglia is replaced by a new Databricks metrics system offering more features and integrations. To export metrics to external services, you can use Dat...

by SS0201 (New Contributor II)
  • 2510 Views
  • 4 replies
  • 0 kudos

Slow updates/upserts in Delta tables

When using Delta tables with DBR jobs, or even with DLT pipelines, upserts (especially updates on key and timestamp) take much longer than expected to update the file/table data (~2 minutes even for a single-record poll). (Inserts are lightni...

Latest reply by Anonymous
  • 0 kudos

Hi @Surya Agarwal, hope everything is going great. Just wanted to check in to see whether you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so...

by Sid1805 (New Contributor II)
  • 5149 Views
  • 5 replies
  • 0 kudos

Resolved! Calling Delta Tables using JDBC

Hi team, if we kill clusters every time, will the connection details change? If yes, is there a way to mask this so that end users are not impacted by any changes in clusters? Also, if I want to call a Delta table from an API using JDBC - s...

Latest reply by Anonymous
  • 0 kudos

Hi @Siddharth Krishna, hope everything is going great. Just wanted to check in to see whether you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell u...

by thushar (Contributor)
  • 6028 Views
  • 3 replies
  • 3 kudos

Resolved! Explicit transaction blocks

I know Delta tables support ACID properties, and my understanding is that MERGE, INSERT, DELETE, etc. run inside a transaction by default, and if any error occurs during these operations, that transaction is rolled back. I hope this unders...

Latest reply by Kaniz_Fatma (Community Manager)
  • 3 kudos

Hi @Thushar R, we can build a thriving community of shared knowledge and insights. Please come back and mark the best answers to contribute to our ongoing pursuit of excellence.

by sanjay (Valued Contributor II)
  • 6598 Views
  • 4 replies
  • 2 kudos

Resolved! How to back up Databricks Delta tables or workspace

Hi, I am trying to understand how to back up Databricks Delta tables/workspaces and how to restore them in case of a failure. Alternatively, please suggest any other way to revert if data is corrupted. Regards, Sanjay

Latest reply by NandiniN (Honored Contributor)
  • 2 kudos

Hi @Sanjay Jain, here are some of the ways:
Deep Clone: https://www.databricks.com/wp-content/uploads/notebooks/using-deep-clone-disaster-recovery-delta-lake-databricks.html
Repos for notebooks and code: https://docs.databricks.com/repos/index.html
ht...
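A minimal sketch of the Deep Clone route, assuming the tables to protect and a backup schema are known up front. The helper and every table/schema name in it (`deep_clone_statements`, `prod_db.orders`, `backup_db`) are hypothetical; it only assembles Databricks SQL `CREATE OR REPLACE TABLE ... DEEP CLONE ...` statements. A deep clone copies the source table's data files as well as its metadata, so the copy survives damage to the original.

```python
import re

# Plain identifiers only (letters, digits, underscores); all names are hypothetical examples.
_PART = r"[A-Za-z_][A-Za-z0-9_]*"
_TABLE = re.compile(rf"^{_PART}(\.{_PART}){{0,2}}$")  # table, schema.table or catalog.schema.table
_SCHEMA = re.compile(rf"^{_PART}(\.{_PART})?$")       # schema or catalog.schema


def deep_clone_statements(tables, backup_schema):
    """Return one CREATE OR REPLACE ... DEEP CLONE statement per source table.

    tables: qualified source table names, e.g. "prod_db.orders" (hypothetical).
    backup_schema: schema that receives the copies, e.g. "backup_db" (hypothetical).
    """
    if not _SCHEMA.match(backup_schema):
        raise ValueError(f"invalid schema name: {backup_schema!r}")
    statements = []
    for table in tables:
        if not _TABLE.match(table):
            raise ValueError(f"invalid table name: {table!r}")
        short_name = table.split(".")[-1]  # keep the table name, drop its schema/catalog
        statements.append(
            f"CREATE OR REPLACE TABLE {backup_schema}.{short_name} DEEP CLONE {table}"
        )
    return statements
```

In a notebook, each statement could then be run on a schedule with `spark.sql(stmt)`. This naive naming keeps only the table name, so same-named tables from different source schemas would collide in the backup schema.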

by Ovi (New Contributor III)
  • 3571 Views
  • 5 replies
  • 3 kudos

Resolved! Filter only Delta tables from a list of S3 folders

Hello everyone, from a list of folders on S3, how can I determine which ones are Delta tables without trying to read each one in turn? Thanks, Ovi

Latest reply by NandiniN (Honored Contributor)
  • 3 kudos

Hello @Ovidiu Eremia, to find which folders on S3 contain Delta tables, you can look for the specific files associated with Delta tables. Delta Lake stores its metadata in a hidden folder named _delta_log, which is located at the root of ...
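Following the `_delta_log` hint in that reply, here is a minimal sketch that classifies folders from a single bucket listing without opening any table. It assumes the object keys are already available as strings, and it uses a simple heuristic rather than an official check: a folder counts as a Delta table if its `_delta_log/` holds at least one `.json` commit file. `delta_table_roots` is a hypothetical helper, not a Delta Lake API.

```python
def delta_table_roots(keys):
    """Return the folder prefixes that look like Delta table roots.

    keys: object keys from a flat S3 listing (no bucket name), e.g.
    "sales/orders/_delta_log/00000000000000000000.json".
    Heuristic (an assumption): a folder is a Delta table if its _delta_log/
    directly contains at least one .json commit file.
    """
    roots = set()
    for key in keys:
        parts = key.split("/")
        if len(parts) >= 2 and parts[-2] == "_delta_log" and parts[-1].endswith(".json"):
            roots.add("/".join(parts[:-2]))  # everything before _delta_log is the table root
    return sorted(roots)
```

With boto3, the keys could come from `boto3.client("s3").get_paginator("list_objects_v2")`, e.g. `[obj["Key"] for page in paginator.paginate(Bucket=bucket, Prefix="sales/") for obj in page.get("Contents", [])]`. Only object names are listed; no Parquet or log contents are read.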

by Leszek (Contributor)
  • 794 Views
  • 1 replies
  • 3 kudos

How to handle schema changes in streaming Delta Tables?

I'm using Structured Streaming to move data from one Delta table to another. How should I handle schema changes in those tables (e.g., adding a new column)?

Latest reply by Murthy1 (Contributor II)
  • 3 kudos

Hello, I think one way to handle this is to specify the schema within the job through a schema file. The other way is to restart the job so that it infers the new schema automatically.

by Dipesh (New Contributor II)
  • 1412 Views
  • 1 replies
  • 1 kudos

Resolved! Bulk updating Delta tables in Databricks

Hi all, I have some data in a Delta table with multiple columns, and each record has a unique identifier. I want to update some columns with the new values coming in for each of these records. However, updating one record at a time is taking a lot...

Latest reply by Hubert-Dudek (Esteemed Contributor III)
  • 1 kudos

Yes, by using a MERGE statement.
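A minimal sketch of the MERGE approach, assuming the new values have been staged in a table or temp view with one row per unique identifier. `build_merge` and every table/column name are hypothetical; the helper only assembles a single `MERGE INTO ... WHEN MATCHED THEN UPDATE SET ...` statement, so all matching records are updated in one pass instead of one at a time.

```python
import re

# Identifiers are validated so they can be interpolated into SQL; names are hypothetical.
_COL = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
_TABLE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*){0,2}$")


def build_merge(target, source, key, update_cols):
    """Build one MERGE that updates every matching row of `target` from `source`.

    target: Delta table to update, e.g. "sales.orders" (hypothetical).
    source: table or temp view holding the new values, e.g. "updates" (hypothetical).
    key: unique identifier column present in both; update_cols: columns to overwrite.
    """
    for name in (target, source):
        if not _TABLE.match(name):
            raise ValueError(f"invalid table name: {name!r}")
    if not update_cols:
        raise ValueError("update_cols must not be empty")
    for col in (key, *update_cols):
        if not _COL.match(col):
            raise ValueError(f"invalid column name: {col!r}")
    # Target columns are left unqualified in SET; values come from the source alias s.
    set_clause = ", ".join(f"{c} = s.{c}" for c in update_cols)
    return (
        f"MERGE INTO {target} AS t "
        f"USING {source} AS s "
        f"ON t.{key} = s.{key} "
        f"WHEN MATCHED THEN UPDATE SET {set_clause}"
    )
```

For example, after `updates_df.createOrReplaceTempView("updates")`, a notebook could run `spark.sql(build_merge("sales.orders", "updates", "order_id", ["status", "amount"]))`. Appending `WHEN NOT MATCHED THEN INSERT *` would turn it into an upsert. If the source holds several rows for the same key, Delta rejects the MERGE, so deduplicate the staged values first.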

by naveen123 (New Contributor II)
  • 1258 Views
  • 3 replies
  • 3 kudos

Previous data is getting wiped off for delta tables

I am using only an INSERT SQL query to load the historical data, but the previous data is getting deleted. I tried with Python too, but the same issue persists. I'm reading the data from a GCP bucket (Parquet files) and writing it into a GCP bucket (Delta files)... the deleted f...

Latest reply by jose_gonzalez (Moderator)
  • 3 kudos

Share your query and also look for any error messages in the driver logs. This might help us better understand what is happening.

by User16826992666 (Valued Contributor)
  • 13921 Views
  • 2 replies
  • 2 kudos

Can I query my Delta tables with Power BI?

I would like to connect Power BI to the Delta tables I have created and use them for reporting. Is it possible to do this with Databricks, or do I have to write my data to some other serving layer?

Latest reply by gbrueckl (Contributor II)
  • 2 kudos

If you want to read your Delta Lake table directly from storage, without needing a running Databricks cluster, you can also use the official Power BI connector for Delta Lake: https://github.com/delta-io/connectors/tree/m...

by gauthamchettiar (New Contributor II)
  • 1350 Views
  • 0 replies
  • 1 kudos

Spark always performs broadcasts, irrespective of spark.sql.autoBroadcastJoinThreshold, during a streaming merge operation with DeltaTable

I am trying to do a streaming merge between Delta tables using this guide: https://docs.delta.io/latest/delta-update.html#upsert-from-streaming-queries-using-foreachbatch
Our code sample (Java): Dataset<Row> sourceDf = sparkSession ...
