Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Monsem
by New Contributor III
  • 8803 Views
  • 8 replies
  • 5 kudos

Resolved! No Course Materials Widget below Lesson

Hello everyone, In my Databricks Partner Academy account, there is no course material where it should be, below the lesson video. How can I resolve this problem? Does anyone else face the same problem? I had submitted a ticket to ask the Databricks team bu...

Latest Reply
TheManOfSteele
New Contributor III
  • 5 kudos

I am still having this problem; I can't find the slides and DLC for Data Engineering with Databricks.

7 More Replies
jeremy98
by New Contributor III
  • 31 Views
  • 3 replies
  • 0 kudos

How to deploy unique workflows running in production

Hello, community! I have a question about deploying workflows in a production environment. Specifically, how can we deploy a group of workflows to production so that they are created only once and cannot be duplicated by others? Currently, if someone d...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

About your second question: you can use the UI to add the Can_Manage permission on a workflow job for a group. https://docs.databricks.com/en/jobs/privileges.html https://kb.databricks.com/en_US/security/bulk-update-workflow-permissions-for-a-group
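For the bulk-permissions route in the linked KB article, the request might be sketched like this in Python (the endpoint shape and the job ID/group name are assumptions; the HTTP call itself is omitted so the sketch stays self-contained):

```python
# Sketch: build a Permissions API request granting CAN_MANAGE on a job to a
# group. Assumes the endpoint PATCH /api/2.0/permissions/jobs/{job_id};
# job_id and group_name below are placeholders.

def build_grant_request(job_id: str, group_name: str):
    """Return (endpoint, payload) granting CAN_MANAGE on a job to a group."""
    endpoint = f"/api/2.0/permissions/jobs/{job_id}"
    payload = {
        "access_control_list": [
            {"group_name": group_name, "permission_level": "CAN_MANAGE"}
        ]
    }
    return endpoint, payload

endpoint, payload = build_grant_request("42", "data-engineers")
```

Sending the payload (e.g. with `requests` and a workspace token) is left to the reader.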

2 More Replies
pdiamond
by New Contributor II
  • 36 Views
  • 1 reply
  • 0 kudos

Resolved! Run a notebook as a different user or role

Outside of running jobs as different users, is there any way for me to run a notebook (or even better, a cell within a notebook) as either a different user or a specific role that is not my user default? I'm trying to find an easy way to test data ma...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Unfortunately there is no direct way to run a notebook as a different principal; the only option is to set up a notebook job task and, in its Run As setting, specify the principal that will run the job, which can be a user or a service principal.
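A sketch of what that job definition could look like as a Jobs API payload (names and paths are placeholders; actually submitting it to POST /api/2.1/jobs/create is omitted):

```python
# Sketch: Jobs API create-job payload with a "Run As" principal.
# The notebook path and service-principal ID are placeholders.

def build_job_spec(notebook_path: str, principal: str) -> dict:
    """Job spec running one notebook task as a service principal."""
    return {
        "name": "run-notebook-as-principal",
        "tasks": [
            {
                "task_key": "main",
                "notebook_task": {"notebook_path": notebook_path},
            }
        ],
        # For a user instead, "run_as" may be {"user_name": "..."}.
        "run_as": {"service_principal_name": principal},
    }

spec = build_job_spec("/Workspace/Users/me/test_masking", "sp-application-id")
```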

dollyb
by Contributor
  • 44 Views
  • 4 replies
  • 0 kudos

Accessing Workspace / Repo file works in notebook, but not from job

In a notebook attached to a normal personal cluster I can successfully do this: %fs ls file:/Workspace/Repos/$userName/$repoName/$folderName. When I run an init script on a UC volume that does the same thing, I'm getting this error: ls: cannot access...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @dollyb, can you try with just "ls /Workspace/Repos/my_user_name@company.com/my_repo_name/my_folder_name"? I'm not sure dbutils will be useful in an init script; I will try to test it out.

3 More Replies
rt-slowth
by Contributor
  • 1075 Views
  • 1 reply
  • 0 kudos

how to use dlt module in streaming pipeline

If anyone has example code for building a CDC live streaming pipeline generated by AWS DMS using import dlt, I'd love to see it. I'm currently able to see the parquet file starting with Load on the first full load to S3, and the cdc parquet file after ...

Latest Reply
cgrant
Databricks Employee
  • 0 kudos

There is a blogpost for this that includes example code that you can find here

Trilleo
by New Contributor III
  • 468 Views
  • 1 reply
  • 0 kudos

STATEMENT_TIMEOUT on a specific SQL Warehouse

Hi, I would like to set STATEMENT_TIMEOUT for a specific SQL warehouse and not at the global level. How would I do that? P.S. I would like to avoid the session level; just a one-time configuration for a given SQL warehouse.

Latest Reply
MoJaMa
Databricks Employee
  • 0 kudos

Unfortunately we do not support that. We only support Global and Session level settings. We have an internal feature request for this (DB-I-6556 ) but it has not been prioritized in the Roadmap.

haroon_24
by Visitor
  • 50 Views
  • 2 replies
  • 0 kudos

Error when trying to run model in staging

I am learning dbt and Apache Airflow. I am using the samples catalog and the tpch schema/database. When I try to run a SQL query in my staging folder I get this error (I am using the premium trial version of Databricks): 09:30:21 Running with dbt=1.8.8 using l...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

The error indicates that the command you are trying to run is not supported in UC. Can you please share the SQL command you are currently running?

1 More Replies
mkEngineer
by New Contributor III
  • 39 Views
  • 2 replies
  • 0 kudos

Configuring DLT _delta_logs with Log Analytics Workspace on Job Clusters

Hi, How do I configure my DLT (Delta Live Tables pipeline notebook) _delta_logs with my Azure Log Analytics workspace? I'm encountering issues because the pipeline runs on a job cluster, which doesn't allow me to specify the destination of the log file...

Latest Reply
mkEngineer
New Contributor III
  • 0 kudos

Hi @Alberto_Umana, The error I received was related to cells not being connected to the DLT pipeline, as mentioned in my other post, "Cannot run a cell when connected to the pipeline Databricks." However, after browsing the web, I realized that there...

1 More Replies
Milliman
by New Contributor
  • 1879 Views
  • 1 reply
  • 0 kudos

How can we automatically re-run the complete job if any of its associated tasks fails?

I need to re-run the complete job automatically if any of its associated tasks fails; any help would be appreciated. Thanks.

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

You could use the suggestions provided in the Community post https://community.databricks.com/t5/data-engineering/need-to-automatically-rerun-the-failed-jobs-in-databricks/td-p/89074
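As a rough illustration, the Jobs repair-run API can rerun the failed tasks of a run (and their dependents); a minimal sketch of the request body, assuming the POST /api/2.1/jobs/runs/repair endpoint and a placeholder run ID, with the actual HTTP call omitted:

```python
# Sketch: request body for the Jobs repair-run API, which reruns every
# failed task of a given run. run_id is a placeholder.

def build_repair_request(run_id: int) -> dict:
    """Body for POST /api/2.1/jobs/runs/repair rerunning all failed tasks."""
    return {"run_id": run_id, "rerun_all_failed_tasks": True}

body = build_repair_request(999)
```

This repairs a single run; triggering it automatically on failure would still need a monitoring step (e.g. a job-failure notification or a polling script).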

mkEngineer
by New Contributor III
  • 20 Views
  • 1 reply
  • 0 kudos

Resolved! DLT: "cannot run a cell when connected to pipeline databricks"

Hi, I have several different cells in my notebook that are connected to a DLT pipeline. Why are some cells skipped and others aren't? I get the message "cannot run a cell when connected to the pipeline Databricks" when I try running a cell while I'm conne...

Latest Reply
TakuyaOmi
Contributor III
  • 0 kudos

Hi @mkEngineer, When working with Delta Live Tables (DLT) in Databricks, you cannot run individual cells interactively as you would in a standard Databricks notebook. DLT pipeline behavior: Delta Live Tables notebooks are executed as part of a managed ...

minhhung0507
by New Contributor II
  • 28 Views
  • 1 reply
  • 0 kudos

Handling Dropped Records in Delta Live Tables with Watermark - Need Optimization Strategy

Hi Databricks Community, I'm encountering an issue with watermarks in Delta Live Tables that's causing data loss in my streaming pipeline. Let me explain my specific problem. Current situation: I've implemented watermarks for stateful processing in my De...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

The watermark threshold determines how late data can be before it is considered too late and dropped. A smaller threshold results in lower latency but increases the likelihood of dropping late records. Conversely, a larger threshold reduces data loss...
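The drop rule described above can be illustrated in plain Python (this is not DLT code, just the watermark arithmetic: a record is discarded once its event time falls behind the maximum event time seen by more than the delay):

```python
# Illustration of watermark semantics: the watermark trails the maximum
# observed event time by the configured delay, and records older than the
# watermark are dropped.
from datetime import datetime, timedelta

def is_dropped(event_time: datetime,
               max_event_time_seen: datetime,
               delay: timedelta) -> bool:
    """True when a late record would be discarded by the watermark."""
    watermark = max_event_time_seen - delay
    return event_time < watermark

now = datetime(2024, 1, 1, 12, 0)
# With a 10-minute delay, a record 15 minutes late is dropped...
assert is_dropped(now - timedelta(minutes=15), now, timedelta(minutes=10))
# ...while one only 5 minutes late is kept.
assert not is_dropped(now - timedelta(minutes=5), now, timedelta(minutes=10))
```

Widening the delay keeps more late records at the cost of larger state and higher latency, which is exactly the tradeoff the reply describes.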

issa
by New Contributor II
  • 257 Views
  • 9 replies
  • 5 kudos

Resolved! How to access bronze dlt in silver dlt

I have a job in Workflows that runs two DLT pipelines, one for Bronze_Transaction and one for Silver_Transaction. The reason for two DLT pipelines is that I want the tables to be created in the bronze catalog and erp schema, and the silver catalog and erp...

Data Engineering
dlt
DLT pipeline
Medallion
Workflows
Latest Reply
issa
New Contributor II
  • 5 kudos

Final solution for the Bronze:

# Define view as the source
@dlt.view
def Transactions_Bronze_View():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("inferSchema", True)
        .option(...

8 More Replies
Timmes0815
by New Contributor II
  • 114 Views
  • 3 replies
  • 0 kudos

Resolved! Set up Location using a widget

I'm struggling to use the Databricks widget to set the location in a SQL CREATE TABLE statement. I tried the following: Step 1: Creating a notebook (Notebook1) to define the variable. Location_Path = 'abfss:xxxxx@xxxx.xxx.ne...

Latest Reply
Timmes0815
New Contributor II
  • 0 kudos

I finally solved my problem by using the parameters in Python f-strings:

# read the widget value (widgets.text only creates the widget)
Location_Path = dbutils.widgets.get("Location_Path")

# create table using widget
query = f"""
CREATE OR REPLACE TABLE schema.Tabelname1
LOCATION '{Location_Path}'
AS SELECT...
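The same pattern, stripped to plain Python so it runs outside Databricks (the widget value is simulated with a placeholder variable; the table, path, and source names are hypothetical):

```python
# Illustration of parameterizing a DDL statement with an f-string.
# dbutils exists only on Databricks, so the widget value is simulated here.
location_path = "abfss://container@account.dfs.core.windows.net/folder"

query = f"""
CREATE OR REPLACE TABLE my_schema.my_table
LOCATION '{location_path}'
AS SELECT * FROM my_source_table
"""
```

On Databricks the string would then be executed with spark.sql(query).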

2 More Replies
holychs
by New Contributor II
  • 30 Views
  • 1 reply
  • 0 kudos

Running child job under parent job using run_job_task

Hi Community, I am trying to call another job from a workflow job using run_job_task. Currently I am manually providing the job_id of the child job. I want to know if there is any way to pass job_name instead of job_id. This will automate the deployment ...

Latest Reply
filipniziol
Contributor III
  • 0 kudos

Hi @holychs, It is possible using a lookup in Databricks Asset Bundles. You define a job-id variable that resolves the ID of the job from its name, and use that variable to specify job_id in the run_job_task. Here is the code: variables: my_job_id...
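The lookup described above might look roughly like this in a bundle's databricks.yml (a sketch only: the variable name, job names, and task key are placeholders):

```yaml
variables:
  my_job_id:
    description: Child job ID, resolved from the job name at deploy time
    lookup:
      job: "my_child_job_name"

resources:
  jobs:
    parent_job:
      name: parent_job
      tasks:
        - task_key: run_child
          run_job_task:
            job_id: ${var.my_job_id}
```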

holychs
by New Contributor II
  • 138 Views
  • 2 replies
  • 0 kudos

Resolved! Concurrent Workflow Jobs

Hi Community, I am trying to run a Databricks workflow job using run_job_task under a for_loop. I have set concurrent jobs to 2. I can see 2 iteration jobs getting triggered successfully, but both fail with an error: "ConnectException: Connection ...

Latest Reply
holychs
New Contributor II
  • 0 kudos

It was an internal bug, resolved by managing different parameters for each loop's jobs.

1 More Replies
