Data Engineering

Forum Posts

Chris_Shehu
by Valued Contributor III
  • 8919 Views
  • 5 replies
  • 5 kudos

Resolved! What is the best way to handle big data sets?

I'm trying to find the best strategy for handling big data sets. In this case I have a data set of 450 million records. I'm pulling the data from SQL Server very quickly, but when I try to push the data to a Delta table or an Azure container the...

Latest Reply
Wilynan
New Contributor II
  • 5 kudos

I think you should consult experts in Big Data for advice on this issue

  • 5 kudos
4 More Replies
Constantine
by Contributor III
  • 4451 Views
  • 2 replies
  • 4 kudos

Resolved! How does merge schema work

Let's say I create a table like CREATE TABLE IF NOT EXISTS new_db.data_table ( key STRING, value STRING, last_updated_time TIMESTAMP ) USING DELTA LOCATION 's3://......'; Now when I insert into this table, I insert data which has, say, 20 columns a...

Latest Reply
timdriscoll22
New Contributor II
  • 4 kudos

I tried running "REFRESH TABLE tablename;" but I still do not see the added columns in the Data Explorer columns, although I do see them in the sample data.

  • 4 kudos
1 More Replies
martindlarsson
by New Contributor III
  • 411 Views
  • 0 replies
  • 0 kudos

Autoloader and deletion vectors (Predictive IO)

We are looking into enabling Predictive IO on our Delta tables. In the ingest process we are using Auto Loader, and I am wondering whether Auto Loader will get a flag to enable deletion vectors at table creation. Deletion vectors are a requirement for Predic...

DrK
by New Contributor III
  • 822 Views
  • 2 replies
  • 2 kudos

Resolved! Current_Database() function unexpected results when queried with PowerBI

Hi, I'm creating some views to be queried by Power BI. In our Delta tables we have a column called database name, which contains the source system's database name. What I'm doing is using this to filter data WHERE databaseName = current_database(). No...

Latest Reply
Kaniz
Community Manager
  • 2 kudos

Hi @Andy Skinner, Thank you for contacting us regarding the issue you're experiencing when querying views using the current_database() function in Power BI. I understand that the views work as expected within a worksheet, but no data is returned whe...

  • 2 kudos
1 More Replies
HamidHamid_Mora
by New Contributor II
  • 1213 Views
  • 2 replies
  • 2 kudos

Ganglia is unavailable on DBR 13.0

We created a library in Databricks to ingest Ganglia metrics for all jobs in our Delta tables. However, endpoint 8652 is no longer available on DBR 13.0. Is there any other endpoint available? We need to log all metrics for all executed jobs, not on...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

Ganglia is only supported on Databricks Runtime versions 12 and below. From Databricks Runtime 13, Ganglia is replaced by a new Databricks metrics system offering more features and integrations. To export metrics to external services, you can use Dat...

  • 2 kudos
1 More Replies
SS0201
by New Contributor II
  • 1807 Views
  • 4 replies
  • 0 kudos

Slow updates/upserts in Delta tables

When using Delta tables with DBR jobs or even with DLT pipelines, upserts (especially updates) on key and timestamp take much longer than expected to update the files/table data (~2 minutes even for a 1-record poll). (Inserts are lightni...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Surya Agarwal, hope everything is going great. Just wanted to check in to see whether you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so...

  • 0 kudos
3 More Replies
Sid1805
by New Contributor II
  • 1943 Views
  • 5 replies
  • 0 kudos

Resolved! Calling Delta Tables using JDBC

Hi team, if we kill clusters every time, will the connection details change? If yes, is there a way we can mask this so that the end users are not impacted due to any changes in clusters? Also, if I want to call a Delta table from an API using JDBC - s...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Siddharth Krishna, hope everything is going great. Just wanted to check in to see whether you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell u...

  • 0 kudos
4 More Replies
thushar
by Contributor
  • 2458 Views
  • 3 replies
  • 3 kudos

Resolved! Explicit transaction blocks

I know Delta tables support the ACID properties, and my understanding is that MERGE, INSERT, DELETE, etc. run inside a transaction by default, and if any error occurs during these operations, that transaction will be rolled back. I hope this unders...

Latest Reply
Kaniz
Community Manager
  • 3 kudos

Hi @Thushar R, We can build a thriving shared knowledge and insights community. Return and mark the best answers to contribute to our ongoing pursuit of excellence.

  • 3 kudos
2 More Replies
sanjay
by Valued Contributor II
  • 3684 Views
  • 4 replies
  • 2 kudos

Resolved! How to back up Databricks Delta tables or workspace

Hi, I am trying to understand how to back up Databricks Delta tables/workspace and how to restore them in case of any failure. Alternatively, please suggest another solution to revert if data is corrupted. Regards, Sanjay

Latest Reply
NandiniN
Valued Contributor II
  • 2 kudos

Hi @Sanjay Jain, here are some of the ways. Deep Clone: https://www.databricks.com/wp-content/uploads/notebooks/using-deep-clone-disaster-recovery-delta-lake-databricks.html Repos for notebooks and code: https://docs.databricks.com/repos/index.html ht...

  • 2 kudos
3 More Replies
Ovi
by New Contributor III
  • 2456 Views
  • 5 replies
  • 3 kudos

Resolved! Filter only Delta tables from an S3 folders list

Hello everyone, from a list of folders on S3, how can I filter which ones are Delta tables, without trying to read each one at a time? Thanks, Ovi

Latest Reply
NandiniN
Valued Contributor II
  • 3 kudos

Hello @Ovidiu Eremia, to filter which folders on S3 contain Delta tables, you can look for the specific files that are associated with Delta tables. Delta Lake stores its metadata in a hidden folder named _delta_log, which is located at the root of ...

  • 3 kudos
4 More Replies
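
The _delta_log check described in the reply above can be sketched in plain Python. This is a minimal sketch, not the poster's solution: it assumes you have already collected the object keys under the parent prefix in one listing pass (for example with a boto3 list_objects_v2 paginator), and the function name and the sample keys are hypothetical.

```python
from typing import Iterable, List

# Delta Lake keeps its transaction log in this folder at the table root.
DELTA_LOG_DIR = "_delta_log"


def find_delta_table_roots(object_keys: Iterable[str]) -> List[str]:
    """Return the sorted folder paths that contain a _delta_log folder.

    `object_keys` is assumed to be a flat list of S3 object keys
    (hypothetical input; obtain it from whatever listing call you use).
    """
    roots = set()
    for key in object_keys:
        parts = key.strip("/").split("/")
        if DELTA_LOG_DIR in parts:
            # Everything before the first _delta_log component is the table root.
            idx = parts.index(DELTA_LOG_DIR)
            roots.add("/".join(parts[:idx]))
    return sorted(roots)


# Hypothetical keys for illustration only.
sample_keys = [
    "lake/orders/_delta_log/00000000000000000000.json",
    "lake/orders/part-0000.snappy.parquet",
    "lake/raw_csv/2023/file.csv",
    "lake/customers/_delta_log/00000000000000000000.json",
    "lake/customers/_delta_log/00000000000000000001.json",
]
print(find_delta_table_roots(sample_keys))
```

Because the check only inspects key names, no table data is read; folders without a _delta_log entry (such as lake/raw_csv here) are simply left out of the result.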
Leszek
by Contributor
  • 580 Views
  • 1 reply
  • 3 kudos

How to handle schema changes in streaming Delta Tables?

I'm using Structured Streaming to move data from one Delta table to another. How do I handle schema changes in those tables (e.g. adding a new column)?

Latest Reply
Murthy1
Contributor II
  • 3 kudos

Hello, I think one way to handle this is to specify the schema within the job through a schema file. The other way is to restart the job so that it infers the new schema automatically.

  • 3 kudos
Dipesh
by New Contributor II
  • 938 Views
  • 1 reply
  • 1 kudos

Resolved! Bulk updating Delta tables in Databricks

Hi all, I have some data in a Delta table with multiple columns, and each record has a unique identifier. I want to update some columns with the new values coming in for each of these unique records. However, updating one record at a time is taking a lot...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

Yes, by using the MERGE statement.

  • 1 kudos
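
To illustrate the MERGE suggestion above, here is a minimal sketch that builds one MERGE INTO statement to update many rows in a single pass instead of one record at a time. The helper name, table names, and column names are hypothetical, and the sketch assumes the incoming values are already available as a table or view; on Databricks the resulting string would typically be passed to spark.sql.

```python
from typing import Sequence


def build_merge_update_sql(
    target: str, source: str, key: str, columns: Sequence[str]
) -> str:
    """Build a MERGE INTO statement that updates `columns` on rows matching `key`.

    All names are interpolated as-is, so pass only trusted identifiers.
    """
    if not columns:
        raise ValueError("at least one column to update is required")
    # One assignment per column: target alias t takes the value from source alias s.
    assignments = ", ".join(f"t.{c} = s.{c}" for c in columns)
    return (
        f"MERGE INTO {target} AS t "
        f"USING {source} AS s "
        f"ON t.{key} = s.{key} "
        f"WHEN MATCHED THEN UPDATE SET {assignments}"
    )


# Hypothetical table and column names for illustration only.
merge_sql = build_merge_update_sql("events", "updates", "id", ["status", "amount"])
print(merge_sql)
# In a Databricks notebook (assumption: `spark` is the active SparkSession):
# spark.sql(merge_sql)
```

The statement only updates matched rows; a WHEN NOT MATCHED THEN INSERT clause could be appended if new identifiers should also be inserted.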
naveen123
by New Contributor II
  • 804 Views
  • 3 replies
  • 3 kudos

Previous data is getting wiped off for delta tables

I am using only an INSERT SQL query to insert the historical load, but the previous data is getting deleted. I tried with a Python query too, but the same issue persists. I am reading the data from a GCP bucket (Parquet file) and writing the data into a GCP bucket (Delta file). The deleted f...

Latest Reply
jose_gonzalez
Moderator
  • 3 kudos

Share your query and also look for any error messages in the driver logs. This might help us understand better what is happening.

  • 3 kudos
2 More Replies
User16826992666
by Valued Contributor
  • 12046 Views
  • 2 replies
  • 2 kudos

Can I query my Delta tables with PowerBI?

I would like to connect to the Delta tables I have created with PowerBI to use for reporting. Is it possible to do this with Databricks or do I have to write my data to some other serving layer?

Latest Reply
gbrueckl
Contributor II
  • 2 kudos

If you want to read your Delta Lake table directly from storage without needing a Databricks cluster up and running, you can also use the official Power BI connector for Delta Lake: https://github.com/delta-io/connectors/tree/m...

  • 2 kudos
1 More Replies
gauthamchettiar
by New Contributor II
  • 819 Views
  • 0 replies
  • 1 kudos

Spark always performs broadcasts irrespective of spark.sql.autoBroadcastJoinThreshold during streaming merge operation with DeltaTable.

I am trying to do a streaming merge between Delta tables using this guide - https://docs.delta.io/latest/delta-update.html#upsert-from-streaming-queries-using-foreachbatch Our code sample (Java): Dataset<Row> sourceDf = sparkSession ...

BroadCastJoin 1M