Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

anshi_t_k
by New Contributor III
  • 501 Views
  • 3 replies
  • 0 kudos

Practice question for data engineer exam

A data engineer, User A, has promoted a pipeline to production by using the REST API to programmatically create several jobs. A DevOps engineer, User B, has configured an external orchestration tool to trigger job runs through the REST API. Both user...

Latest Reply
brickster
New Contributor II
  • 0 kudos

Option C would be the right answer, in my view. Even if job ownership is transferred to another user, the creator_user field will never change. In fact, the "Creator User" field in the job details panel is a non-editable field. If I am wrong, clarification wi...
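
A minimal sketch (not from the thread) for checking this yourself: the Jobs API's jobs/get response carries creator_user_name alongside the changeable run-as identity. The host, token, and job ID below are placeholders.

    import requests

    host = "https://<workspace-host>"
    token = "<personal-access-token>"

    resp = requests.get(
        f"{host}/api/2.1/jobs/get",
        headers={"Authorization": f"Bearer {token}"},
        params={"job_id": 123},  # placeholder job ID
    )
    resp.raise_for_status()
    job = resp.json()

    # Set once at creation time, even if job ownership is later transferred:
    print(job.get("creator_user_name"))
    # The identity the job runs as, which can change:
    print(job.get("run_as_user_name"))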

2 More Replies
cosmicwhoop
by Visitor
  • 14 Views
  • 0 replies
  • 0 kudos

Delta Live Tables UI - missing EVENTS

I am new to Databricks, and my setup uses Microsoft Azure (Premium Tier) + Databricks. I am trying to build Delta Live Tables but don't see events; without them I am finding it hard to understand the reason for a job failure. Attached are 2 screenshots: 1) ...

Mithos
by Visitor
  • 13 Views
  • 0 replies
  • 0 kudos

ZCube Tags not present in Databricks Delta Tables

The design doc for Liquid Clustering for Delta refers to Z-Cubes to enable incremental clustering in batches. This is the link: https://docs.google.com/document/d/1FWR3odjOw4v4-hjFy_hVaNdxHVs4WuK1asfB6M6XEMw/edit?pli=1&tab=t.0. It is also mentioned th...

bantarobugs
by New Contributor
  • 273 Views
  • 1 replies
  • 0 kudos

Job Run failure - Azure Container does not exist

Hello, I have an ETL pipeline in Databricks that works perfectly when I execute it manually in the notebook using an all-purpose cluster. However, when I try to schedule it using a job cluster, it fails immediately with the error message: 'Azure conta...

Latest Reply
PiotrMi
New Contributor
  • 0 kudos

Hey @bantarobugs, there might be a problem with the permissions or roles assigned to the user or service principal trying to access the Azure container. Please check who/what is assigned and its role/permissions:
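
A common cause is that the job cluster lacks the credentials the all-purpose cluster had. Below is a minimal sketch of giving a cluster service-principal (OAuth) access to ADLS Gen2; the storage account, tenant, application ID, and secret scope are placeholders, and the principal still needs an RBAC role such as Storage Blob Data Contributor on the container.

    storage_account = "<storage-account>"  # placeholder

    spark.conf.set(
        f"fs.azure.account.auth.type.{storage_account}.dfs.core.windows.net", "OAuth"
    )
    spark.conf.set(
        f"fs.azure.account.oauth.provider.type.{storage_account}.dfs.core.windows.net",
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    )
    spark.conf.set(
        f"fs.azure.account.oauth2.client.id.{storage_account}.dfs.core.windows.net",
        "<application-id>",
    )
    spark.conf.set(
        f"fs.azure.account.oauth2.client.secret.{storage_account}.dfs.core.windows.net",
        dbutils.secrets.get(scope="<scope>", key="<secret-key>"),
    )
    spark.conf.set(
        f"fs.azure.account.oauth2.client.endpoint.{storage_account}.dfs.core.windows.net",
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
    )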

DBUser2
by New Contributor III
  • 256 Views
  • 1 replies
  • 0 kudos

Simba ODBC batch queries

I'm using the Simba ODBC driver to connect to Databricks. Since this driver doesn't support transactions, I was trying to run a DELETE and then an INSERT query from within a single execute, but I get an error. Is there an alternate way to perform a batch ...

Latest Reply
PiotrMi
New Contributor
  • 0 kudos

Hey @DBUser2, it looks like DELETE is not supported at all, based on the documentation: "Write-back: The Simba Apache Spark ODBC Connector supports translation for the following syntax when connecting to a Spark Thrift Server instance that is running Spark 1.3...
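
Since the driver offers neither transactions nor multi-statement batches, one workaround is to issue the statements as separate executes. A sketch using pyodbc, assuming a configured DSN named "Databricks" and a placeholder table; note the two statements are not atomic, so a failure between them leaves the delete applied but not the insert. A single MERGE statement can often replace a delete-then-insert pair atomically.

    import pyodbc

    # autocommit=True because the driver does not support transactions
    conn = pyodbc.connect("DSN=Databricks", autocommit=True)
    cur = conn.cursor()

    cur.execute("DELETE FROM my_catalog.my_schema.my_table WHERE id = ?", 42)
    cur.execute(
        "INSERT INTO my_catalog.my_schema.my_table (id, name) VALUES (?, ?)",
        42, "new-value",
    )

    cur.close()
    conn.close()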

Eren_DE
by New Contributor
  • 640 Views
  • 2 replies
  • 0 kudos

legacy Git integration has been removed from Notebook

How do I integrate a notebook saved in a workspace folder with a Git repo using a feature branch?

Latest Reply
FierceSkirtsist
  • 0 kudos

Integrating a Databricks notebook with a Git repository using a feature branch sounds like a clean workflow for version control. The step-by-step process makes it straightforward to collaborate and track changes effectively. It's great that Databrick...

1 More Replies
tgburrin-afs
by New Contributor
  • 254 Views
  • 2 replies
  • 0 kudos

Limiting concurrent tasks in a job

I have a job with more than 10 tasks in it that interact with an external system outside of Databricks. At the moment that external system cannot handle more than 3 of the tasks executing concurrently. How can I limit the number of tasks that concurrently...

Latest Reply
filipniziol
Contributor III
  • 0 kudos

Hi @tgburrin-afs, @Mounika_Tarigop, as I understand it, the question is about running concurrent tasks within a single job rather than running concurrent jobs. max_concurrent_runs controls how many times a whole job can run simultaneously, not the concur...
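
One workaround, given that there is no per-job task-concurrency setting: collapse the external-system calls into a single task and throttle them there. A sketch with a placeholder call_external_system; alternatively, depends_on chains between tasks can force groups of at most three to run at a time.

    from concurrent.futures import ThreadPoolExecutor

    def call_external_system(item):
        # placeholder for the real call to the external system
        return item

    items = ["a", "b", "c", "d", "e"]

    # at most 3 calls in flight at any moment
    with ThreadPoolExecutor(max_workers=3) as pool:
        results = list(pool.map(call_external_system, items))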

1 More Replies
tommyhmt
by New Contributor
  • 331 Views
  • 1 replies
  • 0 kudos

Add CreatedDate to Delta Live Table

Hi all, I have a very simple DLT set up using the following code:

    @dlt.view(
        name="view_name",
        comment="comments"
    )
    def vw_DLT():
        return spark.readStream.format("cloudFiles").option("cloudFiles.format", "csv").load(file_location)

    dlt.create_stre...

Latest Reply
Mounika_Tarigop
Databricks Employee
  • 0 kudos

To add a CreatedDate column that captures the timestamp when a record is first inserted into the table, you can modify your Delta Live Tables (DLT) pipeline setup as follows: 1) Define the schema for your streaming table to include the CreatedDate co...
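
Applied to the code from the question, a sketch of one way to do this (CreatedDate is stamped with the processing time at ingestion; file_location comes from the original post):

    import dlt
    from pyspark.sql import functions as F

    @dlt.view(name="view_name", comment="comments")
    def vw_DLT():
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "csv")
            .load(file_location)
            # stamped once, when the record is first processed
            .withColumn("CreatedDate", F.current_timestamp())
        )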

Leigh_Turner
by New Contributor
  • 387 Views
  • 1 replies
  • 0 kudos

dataframe checkpoint when checkpoint location on abfss

I'm trying to switch checkpoint locations from DBFS to ABFSS and I have noticed the following behaviour. The spark.sparkContext.setCheckpointDir call will fail unless I call dbutils.fs.mkdirs(checkpoint_dir) in the same cell. On top of this, the df = df....

Latest Reply
Mounika_Tarigop
Databricks Employee
  • 0 kudos

In DBFS, the checkpoint directory is automatically created when you set it using spark.sparkContext.setCheckpointDir(checkpoint_dir). This means that you do not need to explicitly create the directory beforehand using dbutils.fs.mkdirs(checkpoint_dir...
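
A minimal sketch of the resulting pattern on ABFSS, with placeholder container and account names:

    checkpoint_dir = "abfss://<container>@<account>.dfs.core.windows.net/checkpoints/my_job"

    # Unlike DBFS, the directory is not auto-created, so make it explicitly:
    dbutils.fs.mkdirs(checkpoint_dir)
    spark.sparkContext.setCheckpointDir(checkpoint_dir)

    # df is the DataFrame from the original post; eager=True materializes it now
    df = df.checkpoint(eager=True)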

SagarJi
by New Contributor II
  • 188 Views
  • 2 replies
  • 0 kudos

Data skipping statistics column datatype constraint

Is there any column datatype constraint for the first 32 columns used for the stats that help data skipping?

Latest Reply
Mounika_Tarigop
Databricks Employee
  • 0 kudos

There are no specific column datatype constraints for the first 32 columns used for the statistics that help with data skipping in Databricks. However, data skipping is not supported for partition columns.
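
Relatedly, how many leading columns get statistics is tunable per table through a Delta table property; a sketch with a placeholder table name:

    # Collect stats on the first 32 columns (the default); lower it for wide tables
    spark.sql("""
        ALTER TABLE my_catalog.my_schema.my_table
        SET TBLPROPERTIES ('delta.dataSkippingNumIndexedCols' = '32')
    """)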

1 More Replies
pinaki1
by New Contributor III
  • 179 Views
  • 2 replies
  • 0 kudos

Databricks Delta Sharing

Does caching work with Delta Sharing in Databricks?

Latest Reply
Mounika_Tarigop
Databricks Employee
  • 0 kudos

Delta Sharing in Databricks does support caching, specifically through the use of the Delta cache. The Delta cache will be enabled by default on the recipient cluster. It helps improve the performance of data reads by storing a copy of the data on th...
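
For reference, a minimal sketch of checking and enabling the disk (Delta) cache on the recipient cluster:

    # Disk cache; enabled by default on supported instance types
    spark.conf.set("spark.databricks.io.cache.enabled", "true")
    print(spark.conf.get("spark.databricks.io.cache.enabled"))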

1 More Replies
6502
by New Contributor III
  • 1002 Views
  • 1 replies
  • 0 kudos

UDF already defined error when using it into a DLT pipeline

I'm using Unity Catalog and have defined some UDFs in my catalog.database, as reported by SHOW FUNCTIONS IN main.default: main.default.getgender, main.default.tointlist, main.default.tostrlist. I can use them from a starter warehouse (pro): SELECT main.default.get_...

Latest Reply
Mounika_Tarigop
Databricks Employee
  • 0 kudos

Python UDF support is currently in public preview, and it is supported in the current channel as well. Could you please try to re-run the code? https://docs.databricks.com/en/delta-live-tables/unity-catalog.html
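
If the error persists, a sketch of how a Unity Catalog UDF is typically invoked from a DLT dataset (main.default.getgender comes from the post; the source table is a placeholder):

    import dlt

    @dlt.table(name="people_with_gender")
    def people_with_gender():
        return spark.sql(
            "SELECT *, main.default.getgender(name) AS gender "
            "FROM my_catalog.my_schema.people"
        )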

elgeo
by Valued Contributor II
  • 1433 Views
  • 1 replies
  • 0 kudos

Interactive Notebook with widgets

Dear experts, we need to create a notebook so that users can insert/update values in a specific table. We have created one using widgets. However, the code performed per selected action is visible to the users. Is there a way to have in the dropd...

Latest Reply
Mounika_Tarigop
Databricks Employee
  • 0 kudos

You can separate the user interface (UI) from the code logic by using two notebooks. 1) The first notebook will contain the code that performs the actions based on the widget inputs. 2) The second notebook will contain the widgets that users interact...
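
A sketch of that split with placeholder paths: the UI notebook only reads widgets and delegates to the logic notebook, which can be locked down with workspace ACLs so users never see its code.

    # UI notebook: visible to users, contains no business logic
    dbutils.widgets.dropdown("action", "insert", ["insert", "update"])
    dbutils.widgets.text("value", "")

    result = dbutils.notebook.run(
        "/Workspace/Shared/table_actions_logic",  # hidden logic notebook
        300,                                      # timeout in seconds
        {
            "action": dbutils.widgets.get("action"),
            "value": dbutils.widgets.get("value"),
        },
    )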

nggianno
by New Contributor III
  • 3198 Views
  • 3 replies
  • 10 kudos

How can I activate enzyme for delta live tables (or dlt serverless) ?

Hi! I am using Delta Live Tables, especially materialized views, and I want to run a DLT pipeline without rerunning the whole view, which costs time, but rather update and add only the values that have changed. I saw that Enzyme does this job...

Latest Reply
Mounika_Tarigop
Databricks Employee
  • 10 kudos

To achieve "PARTITION OVERWRITE" instead of "COMPLETE RECOMPUTE" when running a Delta Live Table (DLT) materialized view, you need to configure your pipeline to use incremental refresh. This can be done by setting up your DLT pipeline to use serverle...

2 More Replies
zero234
by New Contributor III
  • 1247 Views
  • 1 replies
  • 0 kudos

Data is not loaded when creating two different streaming table from one delta live table pipeline

I am trying to create 2 streaming tables in one DLT pipeline; both read JSON data from different locations and both have different schemas. The pipeline executes, but no data is inserted into either table, whereas when I try to run each table indiv...

Data Engineering
dlt
spark
STREAMINGTABLE
Latest Reply
Mounika_Tarigop
Databricks Employee
  • 0 kudos

Delta Live Tables (DLT) can indeed process multiple streaming tables within a single pipeline.Here are a few things to check:1) Verify that each streaming table has a unique checkpoint location. Checkpointing is crucial for maintaining the state of s...
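
For comparison, a sketch of the intended shape: two independent streaming tables in one pipeline, each reading its own source path (DLT manages a separate checkpoint per table automatically; paths are placeholders).

    import dlt

    @dlt.table(name="events_a")
    def events_a():
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("/Volumes/main/default/raw/events_a")  # placeholder path
        )

    @dlt.table(name="events_b")
    def events_b():
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("/Volumes/main/default/raw/events_b")  # placeholder path
        )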


Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.
