Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Oliver_Angelil
by Valued Contributor II
  • 6244 Views
  • 9 replies
  • 6 kudos

Resolved! Confusion about Data storage: Data Asset within Databricks vs Hive Metastore vs Delta Lake vs Lakehouse vs DBFS vs Unity Catalogue vs Azure Blob

Hi there. It seems there are many different ways to store / manage data in Databricks. This is the Data asset in Databricks. However, data can also be stored (hyperlinks included to relevant pages): in a Lakehouse, in Delta Lake, on Azure Blob storage, in the D...

Latest Reply
Rahul_S
New Contributor
  • 6 kudos

Informative.

8 More Replies
bgerhardi
by New Contributor III
  • 6455 Views
  • 12 replies
  • 13 kudos

Surrogate Keys with Delta Live

We are considering moving to Delta Live Tables from a traditional SQL-based data warehouse. What worries me is this FAQ on identity columns (Delta Live Tables frequently asked questions | Databricks on AWS), which seems to suggest that we basically can't cre...
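For context, one workaround that often comes up when identity columns are limited in DLT is a deterministic surrogate key hashed from the business key. A minimal sketch, assuming a DLT pipeline notebook; the table and column names (stg_customers, customer_id) are illustrative only:

import dlt
from pyspark.sql import functions as F

# Hypothetical dimension table: derive the surrogate key by hashing the business key
# instead of relying on an identity column.
@dlt.table(name="dim_customer")
def dim_customer():
    return (
        dlt.read("stg_customers")                               # hypothetical upstream table
        .withColumn("customer_sk", F.xxhash64("customer_id"))   # deterministic 64-bit key
    )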

Latest Reply
Anonymous
Not applicable
  • 13 kudos

Hi @Brett Gerhardi, hope all is well! Just wanted to check in to see whether you were able to resolve your issue; if so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Tha...

11 More Replies
amartinez
by New Contributor III
  • 3445 Views
  • 6 replies
  • 4 kudos

Workaround for GraphFrames not working on Delta Live Table?

According to this page, the GraphFrames package is included in the Databricks Runtime since at least 11.0. However, trying to run a connected-components algorithm inside a Delta Live Tables notebook yields the error java.lang.ClassNotFoundException: or...

Latest Reply
lprevost
New Contributor III
  • 4 kudos

I'm also trying to use GraphFrames inside a DLT pipeline. I get an error that GraphFrames is not installed on the cluster. I'm using it successfully in test notebooks on the ML version of the cluster. Is there a way to use this inside a DLT job?

5 More Replies
AkasBala
by New Contributor III
  • 1754 Views
  • 3 replies
  • 0 kudos

Primary Key not working as expected on Unity Catalog delta tables

Hi @Chetan Kardekar. I noticed that you had commented on primary keys on Delta tables. Is that feature already released in Databricks Premium? I have a Unity Catalog and I created a table with a primary key, though it doesn't act like a primary key...
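For reference, primary key constraints on Unity Catalog tables are informational only (they document the key but are not enforced), which matches the behaviour described above. A minimal sketch; catalog, schema, and table names are hypothetical:

# Requires Unity Catalog; the primary-key column must be declared NOT NULL.
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.sales.customers (
        customer_id BIGINT NOT NULL,
        customer_name STRING,
        CONSTRAINT customers_pk PRIMARY KEY (customer_id)
    )
""")

# Both inserts succeed: the constraint is not enforced, so duplicates are not rejected.
spark.sql("INSERT INTO main.sales.customers VALUES (1, 'first')")
spark.sql("INSERT INTO main.sales.customers VALUES (1, 'duplicate')")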

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Bala Akas, hope all is well! Just wanted to check in to see whether you were able to resolve your issue; if so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks!

2 More Replies
kskistad
by New Contributor III
  • 3442 Views
  • 3 replies
  • 4 kudos

Resolved! Streaming Delta Live Tables

I'm a little confused about how streaming works with DLT. My first question is: what is the difference in behavior if you set the pipeline mode to "Continuous" but in your notebook you don't use the "streaming" prefix on table statements, and simila...

Latest Reply
Harsh141220
New Contributor II
  • 4 kudos

Is it possible to have custom upserts into streaming tables in a Delta Live Tables pipeline? Use case: I am trying to maintain a valid session based on a timestamp column and want to upsert into the target table. I tried going through the documentation, but dl...
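For reference, the documented route for upserts into a DLT streaming table is APPLY CHANGES. A minimal sketch, assuming a source of session events keyed by session_id and sequenced by an event timestamp; all table and column names are hypothetical:

import dlt
from pyspark.sql import functions as F

# Hypothetical change feed read as a streaming view.
@dlt.view
def session_updates():
    return spark.readStream.table("bronze.session_events")

# Target streaming table that receives the upserts.
dlt.create_streaming_table("sessions")

dlt.apply_changes(
    target="sessions",
    source="session_updates",
    keys=["session_id"],
    sequence_by=F.col("event_ts"),  # latest timestamp wins, giving upsert semantics
)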

2 More Replies
Chris_Konsur
by New Contributor III
  • 12936 Views
  • 4 replies
  • 6 kudos

Resolved! Error: The associated location ... is not empty but it's not a Delta table

I'm trying to create a table, but I get this error: AnalysisException: Cannot create table ('`spark_catalog`.`default`.`citation_all_tenants`'). The associated location ('dbfs:/user/hive/warehouse/citation_all_tenants') is not empty but it's not a Delta t...
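One way this error is commonly resolved, assuming the leftover files at the reported location are disposable (the sketch below deletes them permanently), is to drop any stale metastore entry, clear the location, and recreate the table as Delta:

# Path and schema below are illustrative; the path comes from the error message.
table_location = "dbfs:/user/hive/warehouse/citation_all_tenants"

spark.sql("DROP TABLE IF EXISTS spark_catalog.default.citation_all_tenants")
dbutils.fs.rm(table_location, recurse=True)  # permanently removes the non-Delta files

spark.sql("""
    CREATE TABLE spark_catalog.default.citation_all_tenants (id BIGINT, citation STRING)
    USING DELTA
""")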

Latest Reply
sachin_tirth
New Contributor II
  • 6 kudos

Hi team, I am facing the same issue. When we try to load data into the table in the production batch, we get an error that the table is not in Delta format. There is no recent change to the table, and we are not running any CREATE OR REPLACE TABLE. This is an existing table in pr...

3 More Replies
MartinH
by New Contributor II
  • 3900 Views
  • 7 replies
  • 5 kudos

Resolved! Azure Data Factory and Photon

Hello, we have Databricks Python notebooks accessing Delta tables. These notebooks are scheduled/invoked by Azure Data Factory. How can I enable Photon on the linked services that are used to call Databricks? If I specify a new job cluster, there does n...

Latest Reply
CharlesReily
New Contributor III
  • 5 kudos

When you create a cluster on Databricks, you can enable Photon by selecting the "Photon" option in the cluster configuration settings. This is typically done when creating a new cluster, and you would find the option in the advanced cluster configura...
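For reference, a sketch of the kind of new-cluster spec where Photon is switched on; the runtime string, node type, and whether your ADF linked service surfaces these exact fields are assumptions to verify against your workspace:

# Hypothetical job-cluster spec (the shape submitted to the Clusters/Jobs APIs).
new_cluster = {
    "spark_version": "13.3.x-photon-scala2.12",  # a Photon-enabled runtime string
    "runtime_engine": "PHOTON",                  # Clusters API flag for Photon
    "node_type_id": "Standard_DS3_v2",
    "num_workers": 2,
}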

6 More Replies
Deepak_Kandpal
by New Contributor III
  • 8978 Views
  • 4 replies
  • 3 kudos

Resolved! Invalid configuration value detected for fs.azure.account.key with com.crealytics:spark-excel

I have set up my Databricks notebook to use a service principal to access ADLS with the configuration below:
service_credential = dbutils.secrets.get(scope="<scope>", key="<service-credential-key>")
spark.conf.set("fs.azure.account.auth.type.<storage-accou...

Latest Reply
Harsha_Dbrs
New Contributor II
  • 3 kudos

Below is the implementation of the same code in Scala:
spark.sparkContext.hadoopConfiguration.set("fs.azure.account.key.<accountName>.dfs.core.windows.net", <accountKey>)
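For reference, the service-principal OAuth configuration the question truncates generally looks like the sketch below (placeholders kept as placeholders). Some third-party connectors read the Hadoop configuration rather than the Spark session conf, which may be why setting it there, as the Scala reply does, helps with spark-excel:

service_credential = dbutils.secrets.get(scope="<scope>", key="<service-credential-key>")

account = "<storage-account>"
spark.conf.set(f"fs.azure.account.auth.type.{account}.dfs.core.windows.net", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{account}.dfs.core.windows.net",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{account}.dfs.core.windows.net", "<application-id>")
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{account}.dfs.core.windows.net", service_credential)
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{account}.dfs.core.windows.net",
               "https://login.microsoftonline.com/<directory-id>/oauth2/token")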

3 More Replies
labromb
by Contributor
  • 10430 Views
  • 8 replies
  • 4 kudos

How to pass configuration values to a Delta Live Tables job through the Delta Live Tables API

Hi Community, I have successfully run a job through the API, but I need to be able to pass parameters (configuration) to the DLT workflow via the API. I have tried passing JSON in this format: { "full_refresh": "true", "configuration": [ ...
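One common pattern here (not necessarily the exact solution referenced later in the thread) is that configuration belongs to the pipeline settings as a key/value map rather than a list, so it is updated via the pipeline spec and then an update is started. A sketch with placeholder host, token, and pipeline id:

import requests

host = "https://<workspace-host>"
headers = {"Authorization": "Bearer <pat-token>"}
pipeline_id = "<pipeline-id>"

# 1) Merge new configuration values into the existing pipeline spec.
spec = requests.get(f"{host}/api/2.0/pipelines/{pipeline_id}", headers=headers).json()["spec"]
spec.setdefault("configuration", {})["my_param"] = "my_value"  # hypothetical key/value
requests.put(f"{host}/api/2.0/pipelines/{pipeline_id}", headers=headers, json=spec)

# 2) Start an update, optionally as a full refresh.
requests.post(f"{host}/api/2.0/pipelines/{pipeline_id}/updates",
              headers=headers, json={"full_refresh": True})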

Latest Reply
Manjula_Ganesap
Contributor
  • 4 kudos

@Mo - it worked. Thank you so much.

7 More Replies
pokus
by New Contributor III
  • 4099 Views
  • 3 replies
  • 2 kudos

Resolved! use DeltaLog class in databricks cluster

I need to use the DeltaLog class in my code to get the AddFiles dataset. I have to keep the implemented code in a repo and run it on a Databricks cluster. Some docs say to use the org.apache.spark.sql.delta.DeltaLog class, but it seems Databricks gets rid of ...
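For reference, one workaround that avoids the internal DeltaLog class altogether is reading the transaction log commits directly and keeping the add actions. This is only a rough sketch (it ignores checkpoint files and remove actions, so it is not a faithful reimplementation of the snapshot logic); the table path is a placeholder:

from pyspark.sql import functions as F

table_path = "abfss://<container>@<account>.dfs.core.windows.net/tables/my_table"

add_files = (
    spark.read.json(f"{table_path}/_delta_log/*.json")  # JSON commit files only
    .where(F.col("add").isNotNull())                    # keep the AddFile actions
    .select("add.path", "add.size", "add.modificationTime")
)
add_files.show(truncate=False)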

Latest Reply
dbal
New Contributor III
  • 2 kudos

Thanks for providing a solution, @pokus. What I don't understand is why Databricks cannot provide the DeltaLog at runtime. How can this be the official solution? We need a better solution for this instead of depending on reflection.

2 More Replies
Ajay-Pandey
by Esteemed Contributor III
  • 1390 Views
  • 3 replies
  • 7 kudos

docs.databricks.com

Rename and drop columns with Delta Lake column mapping. Hi all, Databricks now supports column rename and drop. Column mapping requires the following Delta protocol versions: Reader version 2 or above; Writer version 5 or above. Blog URL ## Available in D...
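For reference, a minimal sketch of the upgrade described above followed by a rename and a drop; catalog, table, and column names are hypothetical:

# Enable column mapping by raising the protocol versions and setting the mapping mode.
spark.sql("""
    ALTER TABLE my_catalog.my_schema.events SET TBLPROPERTIES (
        'delta.columnMapping.mode' = 'name',
        'delta.minReaderVersion'   = '2',
        'delta.minWriterVersion'   = '5'
    )
""")

spark.sql("ALTER TABLE my_catalog.my_schema.events RENAME COLUMN event_ts TO event_timestamp")
spark.sql("ALTER TABLE my_catalog.my_schema.events DROP COLUMN legacy_flag")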

Latest Reply
Poovarasan
New Contributor III
  • 7 kudos

The above-mentioned feature is not working in the DLT pipeline if the script has more than 4 columns.

2 More Replies
brickster_2018
by Esteemed Contributor
  • 5837 Views
  • 2 replies
  • 0 kudos

Resolved! How does Delta solve the large number of small file problems?

Delta creates more small files during merge and update operations.

Latest Reply
brickster_2018
Esteemed Contributor
  • 0 kudos

Delta solves the problem of a large number of small files using the operations below, available for a Delta table. Optimized writes help optimize the write operation by adding an additional shuffle step and reducing the number of output files. By defau...
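For reference, a minimal sketch of those operations on a hypothetical table:

# Turn on optimized writes and auto compaction for future writes.
spark.sql("""
    ALTER TABLE main.sales.orders SET TBLPROPERTIES (
        'delta.autoOptimize.optimizeWrite' = 'true',
        'delta.autoOptimize.autoCompact'   = 'true'
    )
""")

# Compact the small files that already exist, optionally co-locating by a common filter column.
spark.sql("OPTIMIZE main.sales.orders ZORDER BY (order_date)")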

1 More Replies
tinai_long
by New Contributor III
  • 6399 Views
  • 10 replies
  • 4 kudos

Resolved! How to refresh a single table in Delta Live Tables?

Suppose I have a Delta Live Tables framework with 2 tables: Table 1 ingests from a JSON source, Table 2 reads from Table 1 and runs some transformation. In other words, the data flow is JSON source -> Table 1 -> Table 2. Now if I find some bugs in the...

Latest Reply
cpayne_vax
New Contributor III
  • 4 kudos

Answering my own question: nowadays (February 2024) this can all be done via the UI. When viewing your DLT pipeline, there is a "Select tables for refresh" button in the header. If you click this, you can select individual tables, and then in the botto...
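For reference, the API equivalent appears to be the refresh_selection field on the pipeline updates endpoint; a sketch with placeholder host, token, pipeline id, and table name:

import requests

host = "https://<workspace-host>"
headers = {"Authorization": "Bearer <pat-token>"}
pipeline_id = "<pipeline-id>"

# Start an update that refreshes only the listed tables.
requests.post(f"{host}/api/2.0/pipelines/{pipeline_id}/updates",
              headers=headers, json={"refresh_selection": ["table_2"]})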

9 More Replies
Rishabh-Pandey
by Honored Contributor II
  • 2562 Views
  • 3 replies
  • 3 kudos

www.linkedin.com

Woahhh, an #Excel plug-in for #DeltaSharing. Now I can import Delta tables directly into my spreadsheet using Delta Sharing. It puts the power of #DeltaLake into the hands of millions of business users. What does this mean? Imagine a data provider delivering...

Latest Reply
udit02
New Contributor II
  • 3 kudos

If you have any uncertainties, feel free to inquire here or connect with me on my LinkedIn profile for further assistance. https://whatsgbpro.org/

2 More Replies