Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

korijn
by New Contributor II
  • 98 Views
  • 1 reply
  • 0 kudos

How to set environment (client) on notebook via API/Terraform provider?

I am deploying a job with a notebook task via the Terraform provider. I want to set the client version to 2. I do NOT need to install any dependencies. I just want to use the new client version for the serverless compute. How do I do this with the Te...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Unfortunately, there is no direct way to set the client version for a notebook task via the Terraform provider or the API without using the UI. The error message suggests that the %pip magic command is the recommended approach for installing dependen...

Alby091
by New Contributor
  • 105 Views
  • 1 reply
  • 0 kudos

Multiple schedules in workflow with different parameters

I have a notebook that takes a file from the landing zone, processes it, and saves a Delta table. This notebook takes a parameter (time_prm) that selects which of the file versions arriving each day to process. Specifically, for eac...

Data Engineering
parameters
Workflows
Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Right now, jobs support only one schedule per job. As you mentioned, you will need to create a separate job for each schedule you require; you can use the clone capability to make this easier.
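
For illustration only, here is a minimal sketch of scripting one job per schedule with the Databricks Python SDK; the job names, notebook path, cron expressions, and time_prm values are all hypothetical:

```python
# Hypothetical sketch: create one job per schedule with the databricks-sdk.
# Job names, notebook path, cron expressions, and time_prm values are placeholders.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

schedules = {"morning": ("0 0 6 * * ?", "06:00"), "evening": ("0 0 18 * * ?", "18:00")}

for label, (cron, time_prm) in schedules.items():
    w.jobs.create(
        name=f"ingest-{label}",
        tasks=[
            jobs.Task(
                task_key="process_file",
                notebook_task=jobs.NotebookTask(
                    notebook_path="/Workspace/etl/process_landing_file",
                    base_parameters={"time_prm": time_prm},
                ),
            )
        ],
        schedule=jobs.CronSchedule(quartz_cron_expression=cron, timezone_id="UTC"),
    )
```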

LGABI
by New Contributor
  • 139 Views
  • 2 replies
  • 0 kudos

How to connect to Tableau Server FROM within Databricks Notebooks and publish data to Tableau Server?

My company is having trouble connecting Databricks to Tableau Server. We need to be able to publish Hyper Files that are developed using Python on Databricks Notebooks to our Tableau Server, but it seems impossible to get a connection established des...

Latest Reply
pgo
New Contributor II
  • 0 kudos

Please use the netcat command to test the connection.
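
If netcat is not available on the compute, a minimal equivalent check can be run from a notebook with Python's standard library; the host and port below are placeholders:

```python
# Minimal TCP reachability check, equivalent to `nc -zv host port`.
# Replace host/port with your Tableau Server address (placeholders here).
import socket

def check_port(host: str, port: int, timeout: float = 5.0) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError as exc:
        print(f"Connection to {host}:{port} failed: {exc}")
        return False

print(check_port("tableau.example.com", 443))
```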

1 More Replies
kazinahian
by New Contributor III
  • 2535 Views
  • 2 replies
  • 0 kudos

Low-code ETL in Databricks

Hello everyone, I work as a Business Intelligence practitioner, employing tools like Alteryx or various low-code solutions to construct ETL processes and develop data pipelines for my dashboards and reports. Currently, I'm delving into Azure Databrick...

Latest Reply
Nam_Nguyen
Databricks Employee
  • 0 kudos

Hello @kazinahian, Azure Databricks offers several options for building ETL (Extract, Transform, Load) data pipelines, ranging from low-code to more code-centric approaches. Delta Live Tables (DLT) is a declarative framework for bu...
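
To give a flavor of the declarative style, here is a minimal DLT sketch; the source path and table names are illustrative, not from the reply:

```python
# Minimal DLT pipeline sketch: bronze ingest with Auto Loader, silver cleanup.
# The source path and table names are illustrative placeholders.
import dlt
from pyspark.sql.functions import col

@dlt.table(comment="Raw orders ingested incrementally with Auto Loader")
def bronze_orders():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/Volumes/landing/orders")
    )

@dlt.table(comment="Orders with basic quality filtering")
def silver_orders():
    return dlt.read_stream("bronze_orders").where(col("order_id").isNotNull())
```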

1 More Replies
NathanSundarara
by Contributor
  • 1806 Views
  • 1 reply
  • 0 kudos

Lakehouse federation bringing data from SQL Server

Has anyone tried bringing data in using the newly announced Lakehouse Federation and ingesting it with DELTA LIVE TABLES? I'm currently testing using materialized views. First I loaded the full data, and now I'm loading the last 3 days daily and recomputing using Mate...

Data Engineering
dlt
Lake house federation
Latest Reply
Nam_Nguyen
Databricks Employee
  • 0 kudos

Hi @NathanSundarara, regarding your current approach, here are the potential solutions and considerations. Deduplication: implement deduplication strategies within your DLT pipeline, for example: clicksDedupDf = ( spark.readStream.table("LIVE.rawCl...
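
The snippet above is truncated; a common shape for streaming deduplication in a DLT pipeline is sketched below, with the table, column names, and watermark window all assumed for illustration:

```python
# Hedged sketch of streaming deduplication in a DLT pipeline.
# Table, column names, and watermark window are assumptions, not from the reply.
import dlt

@dlt.table(comment="Clicks deduplicated within a watermark window")
def clicks_deduped():
    return (
        spark.readStream.table("LIVE.rawClicks")
        .withWatermark("eventTime", "10 minutes")
        .dropDuplicates(["clickId", "eventTime"])
    )
```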

RotemBar
by New Contributor II
  • 174 Views
  • 3 replies
  • 1 kudos

Incremental refresh - non-serverless compute

Hey, I read the page about incremental refresh. Will you make it available on more than just serverless compute? If so, when? Thanks. Reference: https://docs.databricks.com/en/optimizations/incremental-refresh.html

Latest Reply
Alberto_Umana
Databricks Employee
  • 1 kudos

Sure thing, will keep you posted in a DM.

2 More Replies
martindlarsson
by New Contributor III
  • 346 Views
  • 2 replies
  • 0 kudos

Jobs pending indefinitely when installing libraries

I think I found a bug where jobs get stuck Pending indefinitely when the job has a library requirement and the job's user does not have Manage permission on the cluster. In my case I was trying to start a dbt job with dbt-databricks=1.8.5 as a library. Th...

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

Thanks for your feedback! Just checking: is this still an issue for you? Would you share more details, for example so I can try to reproduce it?

1 More Replies
sakuraDev
by New Contributor II
  • 362 Views
  • 1 reply
  • 0 kudos

Why does soda not initialize?

Hey everyone, I'm using Auto Loader with Soda. I'm new to both. The idea is to ingest with quality checks into my silver table for every batch in a continuous ingestion. I tried to configure Soda as a str just like the docs show, but it seems that it keeps on t...

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

@sakuraDev is this still an ongoing issue? If so, could you please share the error stacktrace as a file attachment? Thanks.
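
For reference while the stack trace is pending, the usual Auto Loader plus Soda pattern looks roughly like the sketch below, assuming the soda-core-spark-df package; the source path, check YAML, and target table are illustrative:

```python
# Hedged sketch: run a Soda scan on each Auto Loader micro-batch via foreachBatch.
# Assumes soda-core-spark-df is installed; paths, checks, and table are placeholders.
from soda.scan import Scan

def run_quality_checks(batch_df, batch_id):
    batch_df.createOrReplaceTempView("incoming_batch")
    scan = Scan()
    scan.set_scan_definition_name(f"silver_batch_{batch_id}")
    scan.set_data_source_name("spark_df")
    scan.add_spark_session(spark, data_source_name="spark_df")
    scan.add_sodacl_yaml_str("checks for incoming_batch:\n  - row_count > 0\n")
    scan.execute()
    if scan.has_check_fails():
        raise ValueError(f"Soda checks failed: {scan.get_logs_text()}")
    batch_df.write.format("delta").mode("append").saveAsTable("silver.events")

(
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .load("/Volumes/landing/events")
    .writeStream.foreachBatch(run_quality_checks)
    .start()
)
```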

robbe
by New Contributor III
  • 1784 Views
  • 3 replies
  • 1 kudos

Resolved! Get job ID from Asset Bundles

When using Asset Bundles to deploy jobs, how does one get the job ID of the resources that are created? I would like to deploy some jobs through asset bundles, get the job IDs, and then trigger these jobs programmatically outside the CI/CD pipeline us...

Latest Reply
nvashisth
New Contributor III
  • 1 kudos

Refer to this answer; it can be a solution to the above scenario: https://community.databricks.com/t5/data-engineering/getting-job-id-dynamically-to-create-another-job-to-refer-as-job/m-p/102860/highlight/true#M41252
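
Complementing the linked answer, here is a hedged sketch of resolving a bundle-deployed job's ID by name and triggering it with the Databricks Python SDK; the job name is a placeholder (bundles typically prefix job names per target):

```python
# Hedged sketch: resolve a job's ID by its (bundle-deployed) name, then run it.
# The job name is a placeholder; adjust to your bundle's target prefix.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

matches = list(w.jobs.list(name="[dev] my_bundle_job"))
if not matches:
    raise ValueError("Job not found")

run = w.jobs.run_now(job_id=matches[0].job_id).result()  # blocks until the run finishes
print(f"Run {run.run_id} finished with state {run.state.result_state}")
```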

2 More Replies
stevewb
by New Contributor II
  • 378 Views
  • 2 replies
  • 1 kudos

Resolved! databricks bundle deploy fails when job includes dbt task and git_source

I am trying to deploy a dbt task as part of a Databricks job using Databricks Asset Bundles. However, a clash seems to occur when specifying a job that includes a dbt task, which causes a bizarre failure. I am using v0.237.0 of the CLI. Min...

Latest Reply
madams
Contributor
  • 1 kudos

Thanks for providing that whole example, it was really easy to fiddle with.  I think I've found your solution.  Update the original two tasks on the job (if you want to keep them) like this: tasks: - task_key: notebook_task job...

1 More Replies
holychs
by New Contributor III
  • 125 Views
  • 1 reply
  • 0 kudos

Running child job under parent job using run_job_task

Hi Community, I am trying to call another job under a workflow job using run_job_task. Currently I am manually providing the job_id of the child job. I want to know if there is any way to pass job_name instead of job_id. This will automate the deployment ...

Latest Reply
filipniziol
Contributor III
  • 0 kudos

Hi @holychs, it is possible using a lookup in Databricks Asset Bundles. You define a job ID variable that looks up the job's ID by its name, and use that variable to specify job_id in the run_job_task. Here is the code: variables: my_job_id...

holychs
by New Contributor III
  • 224 Views
  • 2 replies
  • 0 kudos

Resolved! Concurrent Workflow Jobs

Hi Community, I am trying to run a Databricks workflow job using run_job_task under a for_loop. I have set the concurrent jobs as 2. I can see 2 iteration jobs getting triggered successfully. But both fail with an error:"ConnectException: Connection ...

Latest Reply
holychs
New Contributor III
  • 0 kudos

It was an internal bug, resolved by managing different parameters for each loop's jobs.

1 More Replies
GS_S
by New Contributor III
  • 389 Views
  • 7 replies
  • 0 kudos

Resolved! Error during merge operation: 'NoneType' object has no attribute 'collect'

Why does merge.collect() not return results in access mode: SINGLE_USER, but it does in USER_ISOLATION? I need to log the affected rows (inserted and updated) and can’t find a simple way to get this data in SINGLE_USER mode. Is there a solution or an...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

15.4 does not directly require serverless, but fine-grained access control does indeed require it to run on Single User mode, as mentioned: "This data filtering is performed behind the scenes using serverless compute." In terms of costs: customers are charged for ...
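
As a possible workaround not spelled out in the thread, the affected-row counts a MERGE produces can also be read from the Delta table history rather than from merge(...).collect(); the table name below is a placeholder:

```python
# Hedged workaround sketch: read MERGE metrics from Delta history instead of
# merge(...).collect(). The table name is a placeholder.
from delta.tables import DeltaTable

last_op = DeltaTable.forName(spark, "main.silver.target").history(1).collect()[0]
metrics = last_op["operationMetrics"]
print(
    "inserted:", metrics.get("numTargetRowsInserted"),
    "updated:", metrics.get("numTargetRowsUpdated"),
)
```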

6 More Replies
eballinger
by New Contributor III
  • 227 Views
  • 2 replies
  • 1 kudos

Resolved! Any way to ignore DLT tables in pipeline

Hello, in our testing environment we would like to be able to update only the DLT tables we are testing for our pipeline. This would help speed up the testing. We currently have the pipeline code being generated dynamically based on how many tables th...

Latest Reply
Alberto_Umana
Databricks Employee
  • 1 kudos

Hi @eballinger. To address your requirement of updating only specific Delta Live Tables (DLT) in your testing environment without removing the others, you can leverage the @dlt.table decorator and the temporary parameter in your Python code. This app...
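
For illustration, the decorator usage described above looks roughly like this minimal sketch; the table body is a placeholder:

```python
# Hedged sketch of the temporary parameter on @dlt.table: the table is computed
# by the pipeline but not published to the catalog. Names are placeholders.
import dlt

@dlt.table(temporary=True, comment="Intermediate table, not published")
def staging_customers():
    return spark.read.table("samples.tpch.customer")
```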

1 More Replies
ynskrbn
by New Contributor II
  • 240 Views
  • 4 replies
  • 0 kudos

"Databricks Bundle Deploy -t prod" command deletes log of historical runs

I'm using Databricks Asset Bundles with Azure DevOps CI/CD for workflow deployment. While the initial deployment to production works fine, I encounter an issue when updating the workflow in the development environment and redeploying it to production...

Latest Reply
PabloCSD
Valued Contributor
  • 0 kudos

When you re-deploy your job, do you bump the version (e.g., 4.3.0 -> 4.3.1)? I have been through this when I change a definition in databricks.yml, for example when changing the bundle name, because it is then detected as a new workflow. Can you explain ...

3 More Replies