Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Hamid_Mora
by New Contributor II
  • 3214 Views
  • 4 replies
  • 3 kudos

Ganglia is unavailable on DBR 13.0

We created a library in Databricks to ingest Ganglia metrics for all jobs into our Delta tables. However, endpoint 8652 is no longer available on DBR 13.0. Is there any other endpoint available? We need to log all metrics for all executed jobs, not on...

Latest Reply
h_h_ak
Contributor
  • 3 kudos

You should have a look here: https://community.databricks.com/t5/data-engineering/azure-databricks-metrics-to-prometheus/td-p/71569

3 More Replies
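Since Ganglia (and its 8652 endpoint) was dropped in DBR 13.0, the linked thread points toward Spark's native Prometheus metrics instead. The sketch below is an assumption rather than something posted here: it presumes the cluster's Spark config sets spark.ui.prometheus.enabled to true and that the driver UI is reachable from the notebook.

    # Sketch: read executor metrics in Prometheus text format from the Spark UI,
    # as a replacement for scraping Ganglia's endpoint 8652.
    # Assumes "spark.ui.prometheus.enabled true" was set in the cluster Spark config.
    import requests

    ui_url = spark.sparkContext.uiWebUrl            # driver UI base URL
    resp = requests.get(f"{ui_url}/metrics/executors/prometheus/")
    resp.raise_for_status()
    print(resp.text[:1000])                         # parse and persist to Delta as needed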
Chris_Shehu
by Valued Contributor III
  • 20981 Views
  • 5 replies
  • 5 kudos

Resolved! What is the best way to handle big data sets?

I'm trying to find the best strategy for handling big data sets. In this case I have something that is 450 million records. I'm pulling the data from SQL Server very quickly, but when I try to push the data to the Delta table or an Azure container the...

Latest Reply
Wilynan
New Contributor II
  • 5 kudos

I think you should consult experts in Big Data for advice on this issue.

4 More Replies
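Not advice from this thread, but a common pattern for loads of this size is to parallelise the JDBC read from SQL Server and write straight to Delta. All names, bounds, and paths below are placeholders.

    # Sketch: partitioned JDBC read so executors pull from SQL Server in parallel,
    # then a direct write to a Delta location (or saveAsTable).
    df = (spark.read.format("jdbc")
          .option("url", "jdbc:sqlserver://<host>:1433;databaseName=<db>")
          .option("dbtable", "dbo.big_table")
          .option("user", "<user>")
          .option("password", "<password>")
          .option("partitionColumn", "id")          # numeric, roughly evenly distributed
          .option("lowerBound", "1")
          .option("upperBound", "450000000")
          .option("numPartitions", "64")            # 64 concurrent reads
          .load())

    (df.write.format("delta")
       .mode("overwrite")
       .save("abfss://container@account.dfs.core.windows.net/tables/big_table"))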
Constantine
by Contributor III
  • 6768 Views
  • 2 replies
  • 4 kudos

Resolved! How does merge schema work

Let's say I create a table like CREATE TABLE IF NOT EXISTS new_db.data_table ( key STRING, value STRING, last_updated_time TIMESTAMP ) USING DELTA LOCATION 's3://......'; Now when I insert into this table, I insert data which has, say, 20 columns a...

Latest Reply
timdriscoll22
New Contributor II
  • 4 kudos

I tried running "REFRESH TABLE tablename;" but I still do not see the added columns in the Data Explorer column list, while I do see the added columns in the sample data.

1 More Replies
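For reference, a minimal sketch of how schema evolution is usually enabled when the incoming data has more columns than the table definition; the option and config names are standard Delta Lake, and incoming_df is a placeholder for the wider DataFrame from the question.

    # Sketch: let an append add the extra columns to new_db.data_table.
    spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true")  # session-wide

    (incoming_df.write.format("delta")          # incoming_df: placeholder DataFrame
        .mode("append")
        .option("mergeSchema", "true")          # per-write alternative to the config above
        .saveAsTable("new_db.data_table"))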
martindlarsson
by New Contributor III
  • 965 Views
  • 0 replies
  • 0 kudos

Autoloader and deletion vectors (Predictive IO)

We are looking into enabling Predictive I/O on our Delta tables. In the ingest process we are using Auto Loader, and I am wondering if Auto Loader will get a flag to enable deletion vectors at table creation. Deletion vectors are a requirement for Predic...

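The thread has no replies. Based on the Delta documentation (an assumption, not an answer posted here), deletion vectors are a table property rather than an Auto Loader flag, so one approach is to create the target table with the property set and let the Auto Loader stream write into it. Names, schema, and paths below are placeholders.

    # Sketch: enable deletion vectors on the target table, then stream into it.
    spark.sql("""
      CREATE TABLE IF NOT EXISTS my_schema.events (   -- placeholder name and schema
        id STRING,
        ts TIMESTAMP
      )
      USING DELTA
      TBLPROPERTIES ('delta.enableDeletionVectors' = 'true')
    """)

    (spark.readStream.format("cloudFiles")            # Auto Loader
       .option("cloudFiles.format", "json")
       .option("cloudFiles.schemaLocation", "s3://my-bucket/_schemas/events/")
       .load("s3://my-bucket/raw/events/")
       .writeStream
       .option("checkpointLocation", "s3://my-bucket/_checkpoints/events/")
       .toTable("my_schema.events"))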
SS0201
by New Contributor II
  • 3722 Views
  • 4 replies
  • 0 kudos

Slow updates/upserts in Delta tables

When using Delta tables with DBR jobs or even with DLT pipelines, the upserts (especially updates, on key and timestamp) are taking much longer than expected to update the files/table data (~2 minutes even for a single-record poll). (Inserts are lightni...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Surya Agarwal, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so...

3 More Replies
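No fix appears in the visible replies. Two mitigations that are commonly tried (an assumption on my part, with placeholder names) are adding a pruning predicate to the MERGE condition so only recent files are rewritten, and keeping the target compacted and clustered on the merge key.

    # Sketch: prune the MERGE to a recent window, then compact/cluster the target.
    spark.sql("""
      MERGE INTO target t
      USING updates u
      ON t.key = u.key
         AND t.event_date >= current_date() - INTERVAL 7 DAYS   -- limits files touched
      WHEN MATCHED THEN UPDATE SET *
      WHEN NOT MATCHED THEN INSERT *
    """)

    spark.sql("OPTIMIZE target ZORDER BY (key)")   # co-locate rows on the merge key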
Sid1805
by New Contributor II
  • 6734 Views
  • 5 replies
  • 0 kudos

Resolved! Calling Delta Tables using JDBC

Hi team, if we kill clusters every time, will the connection details change? If yes, is there a way we can mask this so that the end users are not impacted due to any changes in clusters? Also, if I want to call a Delta table from an API using JDBC - s...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Siddharth Krishna, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell u...

4 More Replies
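One way this is often handled (not confirmed in the thread) is to point JDBC/ODBC consumers at a SQL warehouse, whose hostname and HTTP path stay stable across restarts, instead of at an interactive cluster. Below is a sketch with the databricks-sql-connector package; every value is a placeholder.

    # Sketch: query a Delta table over a stable SQL endpoint.
    # pip install databricks-sql-connector
    from databricks import sql

    with sql.connect(server_hostname="adb-1234567890123456.7.azuredatabricks.net",
                     http_path="/sql/1.0/warehouses/abcdef1234567890",
                     access_token="<personal-access-token>") as conn:
        with conn.cursor() as cur:
            cur.execute("SELECT * FROM my_db.my_delta_table LIMIT 10")
            for row in cur.fetchall():
                print(row)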
thushar
by Contributor
  • 13426 Views
  • 2 replies
  • 2 kudos

Resolved! Explicit transaction blocks

I know Delta tables support the ACID properties, and my understanding is that MERGE, INSERT, DELETE, etc. run inside a transaction by default, and if any error occurs during these operations, that transaction will be rolled back. I hope this unders...

Latest Reply
pvignesh92
Honored Contributor
  • 2 kudos

@Thushar R, yes, you are right. As a Delta table keeps a transaction log and maintains version history of your data, it can easily roll back your transaction in case of a failure, i.e. once a transaction is successfully committed, that is when the ...

1 More Replies
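A small sketch of the point in the reply: every successfully committed operation adds exactly one version to the table's history, and a failed one adds none, which you can check with DESCRIBE HISTORY. The table name is a placeholder.

    # Sketch: compare the latest version before and after an operation.
    before = spark.sql("DESCRIBE HISTORY my_db.data_table LIMIT 1").collect()[0]["version"]

    try:
        spark.sql("DELETE FROM my_db.data_table WHERE key = 'obsolete'")
    except Exception as err:
        print("operation failed; nothing was committed:", err)

    after = spark.sql("DESCRIBE HISTORY my_db.data_table LIMIT 1").collect()[0]["version"]
    print(f"latest version went from {before} to {after}")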
sanjay
by Valued Contributor II
  • 14625 Views
  • 4 replies
  • 2 kudos

Resolved! How to back up Databricks Delta tables or workspace

Hi, I am trying to understand how to take a backup of Databricks Delta tables/workspace and how to restore it in case of any failure, or suggest an alternative solution to revert if data is corrupted. Regards, Sanjay

Latest Reply
NandiniN
Databricks Employee
  • 2 kudos

Hi @Sanjay Jain, here are some of the ways. Deep Clone: https://www.databricks.com/wp-content/uploads/notebooks/using-deep-clone-disaster-recovery-delta-lake-databricks.html Repos for notebooks and code: https://docs.databricks.com/repos/index.html ht...

3 More Replies
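A sketch of the Deep Clone option from the reply, using standard Delta SQL with placeholder table names; re-running the CREATE OR REPLACE form refreshes the backup incrementally.

    # Sketch: full, independent backup of a table via DEEP CLONE.
    spark.sql("CREATE TABLE IF NOT EXISTS backup_db.sales_backup DEEP CLONE prod_db.sales")

    # Refresh the backup later (only changed files are copied):
    spark.sql("CREATE OR REPLACE TABLE backup_db.sales_backup DEEP CLONE prod_db.sales")

    # Restore by cloning back (or by repointing consumers at the backup):
    spark.sql("CREATE OR REPLACE TABLE prod_db.sales DEEP CLONE backup_db.sales_backup")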
Ovi
by New Contributor III
  • 4946 Views
  • 5 replies
  • 3 kudos

Resolved! Filter only Delta tables from an S3 folders list

Hello everyone, from a list of folders on S3, how can I filter which ones are Delta tables without trying to read each one at a time? Thanks, Ovi

Latest Reply
NandiniN
Databricks Employee
  • 3 kudos

Hello @Ovidiu Eremia, to filter which folders on S3 contain Delta tables, you can look for the specific files that are associated with Delta tables. Delta Lake stores its metadata in a hidden folder named _delta_log, which is located at the root of ...

4 More Replies
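A sketch of the _delta_log check described in the reply, using dbutils.fs so no table is actually read; the bucket path is a placeholder.

    # Sketch: a folder is treated as a Delta table if it contains _delta_log/.
    def is_delta_table(path):
        try:
            return any(f.name.rstrip("/") == "_delta_log" for f in dbutils.fs.ls(path))
        except Exception:                      # path missing or not listable
            return False

    base = "s3://my-bucket/warehouse/"
    delta_paths = [f.path for f in dbutils.fs.ls(base) if is_delta_table(f.path)]
    print(delta_paths)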
Leszek
by Contributor
  • 1132 Views
  • 1 replies
  • 3 kudos

How to handle schema changes in streaming Delta Tables?

I'm using Structured Streaming when moving data from one Delta table to another. How do I handle schema changes in those tables (e.g. adding a new column)?

Latest Reply
Murthy1
Contributor II
  • 3 kudos

Hello, I think the only ways of handling it are to specify the schema within the job through a schema file, or to restart the job so it infers the new schema automatically.

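A sketch along the lines of the reply, assuming a Delta-to-Delta stream (paths are placeholders): the sink is allowed to evolve via mergeSchema, but the stream still has to be restarted after the source schema changes.

    # Sketch: Delta-to-Delta stream whose sink accepts newly added columns.
    (spark.readStream.format("delta")
       .load("/delta/source_table")
       .writeStream
       .format("delta")
       .option("mergeSchema", "true")                    # let the sink add new columns
       .option("checkpointLocation", "/chk/source_to_target")
       .start("/delta/target_table"))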
Dipesh
by New Contributor II
  • 2160 Views
  • 1 replies
  • 1 kudos

Resolved! Bulk updating Delta tables in Databricks

Hi all, I have some data in a Delta table with multiple columns, and each record has a unique identifier. I want to update some columns as per the new values coming in for each of these unique records. However, updating one record at a time is taking a lot...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

Yes, by using the MERGE statement.

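The reply's MERGE suggestion as a short sketch using the Python DeltaTable API; the table name, key column, and updates_df are placeholders.

    # Sketch: apply all incoming values in one MERGE instead of row-by-row updates.
    from delta.tables import DeltaTable

    target = DeltaTable.forName(spark, "my_db.records")
    (target.alias("t")
       .merge(updates_df.alias("u"), "t.record_id = u.record_id")   # updates_df: new values
       .whenMatchedUpdateAll()
       .whenNotMatchedInsertAll()
       .execute())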
naveen123
by New Contributor II
  • 1771 Views
  • 3 replies
  • 3 kudos

Previous data is getting wiped off for delta tables

I am using only an INSERT SQL query to insert the historical load, but the previous data is getting deleted. Tried with a Python query as well, but the same issue persists. Reading the data from a GCP bucket (Parquet file), writing the data into a GCP bucket (Delta file). The deleted f...

Latest Reply
jose_gonzalez
Databricks Employee
  • 3 kudos

Share your query and also look for any error messages in the driver logs. This might help to understand better what is happening.

2 More Replies
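A frequent cause of this symptom (an assumption, not confirmed in the thread) is writing with overwrite semantics instead of append; here is a sketch of the append pattern with placeholder GCS paths.

    # Sketch: keep prior loads by appending; mode("overwrite") or INSERT OVERWRITE
    # would replace the existing Delta data.
    hist_df = spark.read.parquet("gs://my-bucket/raw/historical/")

    (hist_df.write.format("delta")
        .mode("append")                       # not "overwrite"
        .save("gs://my-bucket/curated/my_table/"))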
User16826992666
by Valued Contributor
  • 17688 Views
  • 2 replies
  • 2 kudos

Can I query my Delta tables with PowerBI?

I would like to connect Power BI to the Delta tables I have created, to use for reporting. Is it possible to do this with Databricks, or do I have to write my data to some other serving layer?

Latest Reply
gbrueckl
Contributor II
  • 2 kudos

If you want to read your Delta Lake table directly from storage without needing a Databricks cluster up and running, you can also use the official Power BI connector for Delta Lake: https://github.com/delta-io/connectors/tree/m...

1 More Replies
gauthamchettiar
by New Contributor II
  • 1818 Views
  • 0 replies
  • 1 kudos

Spark always performs broadcasts irrespective of spark.sql.autoBroadcastJoinThreshold during a streaming merge operation with DeltaTable

I am trying to do a streaming merge between Delta tables using this guide: https://docs.delta.io/latest/delta-update.html#upsert-from-streaming-queries-using-foreachbatch Our code sample (Java): Dataset<Row> sourceDf = sparkSession ...

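The post has no replies. One thing commonly tried (an assumption, with placeholder names, and in Python rather than the question's Java) is disabling size-based auto-broadcast for the session running the foreachBatch merge and then checking the SQL plan, since Delta's MERGE performs internal joins of its own.

    # Sketch: turn off auto-broadcasting before the foreachBatch upsert and verify
    # in the Spark UI whether the broadcast actually disappears.
    from delta.tables import DeltaTable

    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")

    def upsert_to_delta(micro_batch_df, batch_id):
        target = DeltaTable.forPath(spark, "/delta/target_table")   # placeholder path
        (target.alias("t")
           .merge(micro_batch_df.alias("s"), "t.key = s.key")
           .whenMatchedUpdateAll()
           .whenNotMatchedInsertAll()
           .execute())

    (source_df.writeStream                    # source_df: streaming DataFrame (placeholder)
       .foreachBatch(upsert_to_delta)
       .option("checkpointLocation", "/chk/streaming_merge")
       .start())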