Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

tommyhmt
by New Contributor II
  • 2423 Views
  • 2 replies
  • 0 kudos

Add CreatedDate to Delta Live Table

Hi all, I have a very simple DLT set up using the following code: @dlt.view( name="view_name", comment="comments" ) def vw_DLT(): return spark.readStream.format("cloudFiles").option("cloudFiles.format", "csv").load(file_location) dlt.create_stre...

Latest Reply
Mounika_Tarigop
Databricks Employee
  • 0 kudos

To add a CreatedDate column that captures the timestamp when a record is first inserted into the table, you can modify your Delta Live Tables (DLT) pipeline setup as follows: 1) Define the schema for your streaming table to include the CreatedDate co...
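
A minimal sketch of that approach, assuming an Auto Loader CSV source at a hypothetical file_location; the CreatedDate column is added with current_timestamp() so it is stamped when a record is first ingested:

    import dlt
    from pyspark.sql.functions import current_timestamp

    file_location = "s3://my-bucket/landing/"  # hypothetical source path

    @dlt.table(
        name="my_streaming_table",
        comment="Raw CSV files with a CreatedDate ingestion timestamp",
    )
    def my_streaming_table():
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "csv")
            .load(file_location)
            # Stamp each record as it is processed; streaming rows are only read once,
            # so the value is not rewritten on later pipeline updates.
            .withColumn("CreatedDate", current_timestamp())
        )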

1 More Replies
bantarobugs
by New Contributor
  • 1885 Views
  • 1 reply
  • 0 kudos

Job Run failure - Azure Container does not exist

Hello, I have an ETL pipeline in Databricks that works perfectly when I execute it manually in the notebook using an all-purpose cluster. However, when I try to schedule it using a job cluster, it fails immediately with the error message: 'Azure conta...

Latest Reply
PiotrMi
Contributor
  • 0 kudos

Hey @bantarobugs There might be a problem with the permissions or roles assigned to the user or service principal trying to access the Azure container. Please check who/what is assigned and its role/permissions:
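
A common cause is that the all-purpose cluster has storage credentials configured (or runs under your user identity) while the job cluster does not. A hedged sketch of setting OAuth credentials for a service principal in the job cluster's Spark config or notebook, assuming hypothetical storage account, secret scope, and tenant values:

    # Hypothetical values - replace with your storage account, secret scope, and tenant.
    storage_account = "mystorageaccount"
    client_id = dbutils.secrets.get("my-scope", "sp-client-id")
    client_secret = dbutils.secrets.get("my-scope", "sp-client-secret")
    tenant_id = dbutils.secrets.get("my-scope", "tenant-id")

    # Standard ABFS OAuth settings; the service principal must also hold a
    # Storage Blob Data role (e.g. Contributor/Reader) on the container.
    spark.conf.set(f"fs.azure.account.auth.type.{storage_account}.dfs.core.windows.net", "OAuth")
    spark.conf.set(f"fs.azure.account.oauth.provider.type.{storage_account}.dfs.core.windows.net",
                   "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
    spark.conf.set(f"fs.azure.account.oauth2.client.id.{storage_account}.dfs.core.windows.net", client_id)
    spark.conf.set(f"fs.azure.account.oauth2.client.secret.{storage_account}.dfs.core.windows.net", client_secret)
    spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{storage_account}.dfs.core.windows.net",
                   f"https://login.microsoftonline.com/{tenant_id}/oauth2/token")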

DBUser2
by New Contributor III
  • 8920 Views
  • 1 reply
  • 0 kudos

Simba ODBC batch queries

I'm using the Simba ODBC driver to connect to Databricks. Since this driver doesn't support transactions, I was trying to run a DELETE and then an INSERT query from within a single execute, but I get an error. Is there an alternate way to perform a batch ...

Latest Reply
PiotrMi
Contributor
  • 0 kudos

Hey @DBUser2 It looks like DELETE is not supported at all based on the documentation: Write-back: The Simba Apache Spark ODBC Connector supports translation for the following syntax when connecting to a Spark Thrift Server instance that is running Spark 1.3...

Eren_DE
by New Contributor
  • 2268 Views
  • 2 replies
  • 0 kudos

legacy Git integration has been removed from Notebook

How to integrate a notebook saved in workspace folder with git repos with feature branch? 

Latest Reply
FierceSkirtsist
New Contributor II
  • 0 kudos

Integrating a Databricks notebook with a Git repository using a feature branch sounds like a clean workflow for version control. The step-by-step process makes it straightforward to collaborate and track changes effectively. It's great that Databrick...

1 More Replies
Leigh_Turner
by New Contributor
  • 2169 Views
  • 1 reply
  • 0 kudos

dataframe checkpoint when checkpoint location on abfss

I'm trying to switch checkpoint locations from DBFS to ABFSS and I have noticed the following behaviour. The spark.sparkContext.setCheckpointDir call will fail unless I call dbutils.fs.mkdirs(checkpoint_dir) in the same cell. On top of this, the df = df....

Latest Reply
Mounika_Tarigop
Databricks Employee
  • 0 kudos

In DBFS, the checkpoint directory is automatically created when you set it using spark.sparkContext.setCheckpointDir(checkpoint_dir). This means that you do not need to explicitly create the directory beforehand using dbutils.fs.mkdirs(checkpoint_dir...
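
A minimal sketch of that workaround, assuming a hypothetical container and path; on abfss the directory is created explicitly before it is set as the checkpoint location:

    # Hypothetical ABFSS path - replace with your container and storage account.
    checkpoint_dir = "abfss://mycontainer@mystorageaccount.dfs.core.windows.net/checkpoints/my_job"

    # Unlike DBFS, the abfss directory is not created implicitly, so create it first.
    dbutils.fs.mkdirs(checkpoint_dir)
    spark.sparkContext.setCheckpointDir(checkpoint_dir)

    df = spark.range(1_000_000)
    # checkpoint() materializes the DataFrame and truncates its lineage at the configured location.
    df = df.checkpoint(eager=True)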

SagarJi
by New Contributor II
  • 1812 Views
  • 2 replies
  • 0 kudos

Data skipping statistics column datatype constraint

Is there any column datatype constraint for the first 32 columns used for the stats that help data skipping?

Latest Reply
Mounika_Tarigop
Databricks Employee
  • 0 kudos

There are no specific column datatype constraints for the first 32 columns used for the statistics that help with data skipping in Databricks. However, data skipping is not supported for partition columns.
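
If wide columns such as long strings fall inside those first 32 positions, a common tuning step is to reduce how many columns get statistics. A hedged sketch using the documented Delta table property, with a hypothetical table name:

    # Collect file-level statistics only on the first 8 columns instead of the default 32;
    # statistics on long string columns are costly to compute and rarely help skipping.
    spark.sql("""
        ALTER TABLE main.default.events
        SET TBLPROPERTIES ('delta.dataSkippingNumIndexedCols' = '8')
    """)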

1 More Replies
pinaki1
by New Contributor III
  • 1870 Views
  • 2 replies
  • 0 kudos

Databricks Delta Sharing

Does caching work with Delta Sharing in Databricks?

Latest Reply
Mounika_Tarigop
Databricks Employee
  • 0 kudos

Delta Sharing in Databricks does support caching, specifically through the use of the Delta cache. The Delta cache is enabled by default on the recipient cluster. The Delta cache helps improve the performance of data reads by storing a copy of the data on th...
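
A minimal sketch, assuming the recipient reads the shared data on a classic cluster; the disk (Delta) cache can be checked or enabled explicitly via the documented Spark conf:

    # Check whether the disk cache is active on the recipient cluster...
    print(spark.conf.get("spark.databricks.io.cache.enabled", "not set"))

    # ...and enable it explicitly if needed (it is on by default on supported node types).
    spark.conf.set("spark.databricks.io.cache.enabled", "true")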

1 More Replies
6502
by New Contributor III
  • 1326 Views
  • 1 reply
  • 0 kudos

UDF already defined error when using it into a DLT pipeline

I'm using Unity Catalog and defined some UDFs in my catalog.database, as reported by SHOW FUNCTIONS IN main.default: main.default.getgender, main.default.tointlist, main.default.tostrlist. I can use them from a SQL warehouse (Pro): SELECT main.default.get_...

Latest Reply
Mounika_Tarigop
Databricks Employee
  • 0 kudos

Python UDF support is now in public preview, and it is supported in the current channel as well. Could you please try to re-run the code? https://docs.databricks.com/en/delta-live-tables/unity-catalog.html

elgeo
by Valued Contributor II
  • 3432 Views
  • 1 reply
  • 0 kudos

Interactive Notebook with widgets

Dear experts, we need to create a notebook so that users can insert/update values in a specific table. We have created one using widgets. However, the code performed per selected action is visible to the users. Is there a way to have in the dropd...

Latest Reply
Mounika_Tarigop
Databricks Employee
  • 0 kudos

You can separate the user interface (UI) from the code logic by using two notebooks. 1) The first notebook will contain the code that performs the actions based on the widget inputs. 2) The second notebook will contain the widgets that users interact...
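
A minimal sketch of that two-notebook pattern, assuming a hypothetical worker notebook named actions stored next to the UI notebook, so the logic never appears in the notebook users open:

    # --- UI notebook (the only one users open) ---
    dbutils.widgets.dropdown("action", "insert", ["insert", "update"])
    dbutils.widgets.text("value", "")

    # Pass the widget selections to the hidden worker notebook; users only see this call.
    result = dbutils.notebook.run(
        "./actions",  # hypothetical path of the logic notebook
        300,
        {
            "action": dbutils.widgets.get("action"),
            "value": dbutils.widgets.get("value"),
        },
    )
    print(result)

    # --- actions notebook (workspace permissions can deny users view access to it) ---
    # action = dbutils.widgets.get("action")
    # value = dbutils.widgets.get("value")
    # ... run the INSERT/UPDATE against the table here ...
    # dbutils.notebook.exit("done")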

nggianno
by New Contributor III
  • 7018 Views
  • 3 replies
  • 10 kudos

How can I activate enzyme for delta live tables (or dlt serverless) ?

Hi! I am using Delta Live Tables, especially materialized views, and I want to run a DLT pipeline without recomputing the whole view, which costs time, but rather only update and add the values that have changed. I saw that Enzyme does this job...

Latest Reply
Mounika_Tarigop
Databricks Employee
  • 10 kudos

To achieve "PARTITION OVERWRITE" instead of "COMPLETE RECOMPUTE" when running a Delta Live Table (DLT) materialized view, you need to configure your pipeline to use incremental refresh. This can be done by setting up your DLT pipeline to use serverle...

2 More Replies
zero234
by New Contributor III
  • 2806 Views
  • 1 reply
  • 0 kudos

Data is not loaded when creating two different streaming tables from one Delta Live Table pipeline

I am trying to create 2 streaming tables in one DLT pipeline; both read JSON data from different locations and both have different schemas. The pipeline executes but no data is inserted into either table, whereas when I try to run each table indiv...

Data Engineering
dlt
spark
STREAMINGTABLE
Latest Reply
Mounika_Tarigop
Databricks Employee
  • 0 kudos

Delta Live Tables (DLT) can indeed process multiple streaming tables within a single pipeline. Here are a few things to check: 1) Verify that each streaming table has a unique checkpoint location. Checkpointing is crucial for maintaining the state of s...
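
A minimal sketch of two independent streaming tables in one pipeline, assuming hypothetical JSON landing paths; each table gets its own function, name, and source path so neither definition shadows the other:

    import dlt

    orders_path = "abfss://landing@account.dfs.core.windows.net/orders/"        # hypothetical
    customers_path = "abfss://landing@account.dfs.core.windows.net/customers/"  # hypothetical

    @dlt.table(name="orders_bronze", comment="Raw orders JSON")
    def orders_bronze():
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load(orders_path)
        )

    @dlt.table(name="customers_bronze", comment="Raw customers JSON")
    def customers_bronze():
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load(customers_path)
        )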

Jyo777
by Contributor
  • 7462 Views
  • 7 replies
  • 4 kudos

need help with Azure Databricks questions on CTE and SQL syntax within notebooks

Hi amazing community folks, feel free to share your experience or knowledge regarding the questions below: 1) Can we pass a CTE SQL statement into Spark JDBC? I tried to do it and couldn't, but I can pass normal SQL (SELECT * FROM) and it works. I heard th...

Latest Reply
Rjdudley
Honored Contributor
  • 4 kudos

Not a comparison, but there is a DB-SQL cheatsheet at https://www.databricks.com/sites/default/files/2023-09/databricks-sql-cheatsheet.pdf/
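
On the CTE question, a common workaround is to rewrite the CTE as a derived table and hand it to the JDBC reader, since many databases reject a WITH clause once Spark wraps the query in a subselect. A hedged sketch against a hypothetical SQL Server source:

    jdbc_url = "jdbc:sqlserver://myserver.database.windows.net:1433;database=mydb"  # hypothetical

    # Equivalent of: WITH recent AS (SELECT * FROM dbo.orders WHERE order_date > '2024-01-01')
    #                SELECT customer_id, SUM(amount) FROM recent GROUP BY customer_id
    pushdown_query = """
        (SELECT customer_id, SUM(amount) AS total
         FROM (SELECT * FROM dbo.orders WHERE order_date > '2024-01-01') recent
         GROUP BY customer_id) AS q
    """

    df = (spark.read.format("jdbc")
          .option("url", jdbc_url)
          .option("dbtable", pushdown_query)  # derived table instead of a CTE
          .option("user", "svc_user")          # hypothetical credentials
          .option("password", dbutils.secrets.get("my-scope", "sql-password"))
          .load())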

6 More Replies
hukel
by Contributor
  • 599 Views
  • 1 reply
  • 0 kudos

Python function using Splunk SDK works in Python notebook but not in SQL notebook

Background: I've created a small function in a notebook that uses Splunk's splunk-sdk package. The original intention was to call Splunk to execute a search/query, but for the sake of simplicity while testing this issue, the function only prints pr...

Latest Reply
Mounika_Tarigop
Databricks Employee
  • 0 kudos

When you run a Python function in a Python cell, it executes in the local Python environment of the notebook. However, when you call a Python function from a SQL cell, it runs as a UDF within the Spark execution environment.  You need to define the f...
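
A minimal sketch of that pattern, assuming a hypothetical search_splunk helper; the function is registered as a Spark UDF in a Python cell and only then referenced from SQL:

    # Python cell: define the helper and register it as a Spark SQL UDF.
    def search_splunk(query: str) -> str:
        # Hypothetical stand-in for the real splunk-sdk call; the real version would
        # need the splunk-sdk package installed on the cluster, not just the notebook.
        return f"would run: {query}"

    spark.udf.register("search_splunk", search_splunk)

    # A SQL cell (%sql) can now resolve the registered name:
    #   SELECT search_splunk('index=main error') AS result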

Miguel_Salas
by New Contributor II
  • 768 Views
  • 1 reply
  • 1 kudos

Last file in S3 folder using autoloader

Nowadays we already use Auto Loader with a checkpoint location, but I still wanted to know if it is possible to read only the last updated file within a folder. I know it somewhat defeats the purpose of the checkpoint location. Another question: is it possibl...

Latest Reply
cgrant
Databricks Employee
  • 1 kudos

Auto Loader's scope is limited to incrementally loading files from storage, and there is no such functionality to just load the latest file from a group of files; you'd likely want to have this kind of "last updated" logic in a different layer or in ...
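
If the goal is only to read the single most recently modified file outside of Auto Loader, a hedged sketch using the file-metadata column, assuming a hypothetical JSON folder:

    from pyspark.sql import functions as F

    path = "s3://my-bucket/landing/"  # hypothetical folder

    # Batch-read the folder and keep only rows from the newest file,
    # using the hidden _metadata column exposed for file-based sources.
    df = spark.read.format("json").load(path).select("*", "_metadata")
    latest = df.agg(F.max("_metadata.file_modification_time")).collect()[0][0]
    latest_df = df.where(F.col("_metadata.file_modification_time") == latest).drop("_metadata")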

smit_tw
by New Contributor III
  • 613 Views
  • 2 replies
  • 1 kudos

APPLY AS DELETE without operation

We are performing a full load of API data into the Bronze table (append only), and then using the APPLY CHANGES INTO query to move data from Bronze to Silver using Stream to get only new records. How can we also utilize the APPLY AS DELETE functional...

Latest Reply
smit_tw
New Contributor III
  • 1 kudos

@szymon_dybczak is it possible to do this directly in the Silver layer? We do not have the option to go for a Gold layer. I tried to create a TEMP VIEW in the Silver DLT pipeline but it gives a circular dependency error, as I am comparing data from Silver itself and a...
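
For reference on the original question, a hedged sketch of APPLY AS DELETE in the DLT Python API; the table and column names here are hypothetical, and the assumption is that the append-only Bronze feed carries an operation column flagging delete events rather than physically removing rows:

    import dlt
    from pyspark.sql.functions import expr

    dlt.create_streaming_table("silver_customers")

    dlt.apply_changes(
        target="silver_customers",
        source="bronze_customers",  # hypothetical append-only Bronze table
        keys=["customer_id"],
        sequence_by="ingest_ts",
        # Rows whose operation column is 'DELETE' are applied as deletes in Silver.
        apply_as_deletes=expr("operation = 'DELETE'"),
        except_column_list=["operation"],
    )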

1 More Replies
