Data Engineering

Forum Posts

timothy_uk
by New Contributor III
  • 593 Views
  • 1 reply
  • 1 kudos

Mysterious simultaneous long-running Databricks Workflows

Hi, this happened across four seemingly unrelated workflows at the same time of day - all four eventually completed successfully. All of the workflows appeared to sit idle despite being triggered via the Jobs API. The two symptoms I have observed...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Timothy Lin, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer. Thanks.

Oliver_Angelil
by Valued Contributor II
  • 826 Views
  • 2 replies
  • 2 kudos

Automated CI code checks using workflows when PR is raised

I'm familiar with GitHub Actions workflows for automating code checks whenever a PR is raised against a specified branch. For Python code, for example, it's very useful if unit tests (e.g. pytest), syntax checks (flake8), code formatting (the black formatter), and type h...
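For reference, a minimal sketch of the kind of check script such a CI workflow could run on each PR. The pytest/flake8/black tools are the ones named above; the tests/ and src/ paths are assumptions.

```python
# Sketch of a pre-merge check runner; a CI workflow would invoke this on
# every PR and fail the check on any non-zero exit code.
import subprocess
import sys

CHECKS = [
    ["pytest", "tests/"],            # unit tests
    ["flake8", "src/"],              # syntax / lint
    ["black", "--check", "src/"],    # formatting (verify only, no rewrite)
]

def main() -> int:
    for cmd in CHECKS:
        print("running:", " ".join(cmd))
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return result.returncode  # first failure fails the whole check
    return 0

if __name__ == "__main__":
    sys.exit(main())
```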

Latest Reply
Priyag1
Honored Contributor II
  • 2 kudos

In a typical software development workflow (e.g. GitHub flow), a feature branch is created based on the master branch for feature development. A notebook can be synced to the feature branch via GitHub integration, or a notebook can be exported from D...

1 More Replies
gdev
by New Contributor
  • 4095 Views
  • 7 replies
  • 3 kudos

Resolved! Migrate notebooks, workflows, and other assets

I want to move notebooks, workflows, and data from one user to another in Azure Databricks. We have access to that Databricks workspace. Is it possible? If yes, how do we move them?
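A sketch of one way to do this for notebooks with the Workspace API; the host, token, and paths are placeholders, and jobs and permissions would need similar treatment.

```python
# Sketch: copy a notebook between user folders via the Workspace API.
import requests

HOST = "https://<workspace-instance>"
TOKEN = "<personal-access-token>"
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

# Export the notebook (SOURCE format) from the old user's folder.
exported = requests.get(
    f"{HOST}/api/2.0/workspace/export",
    headers=HEADERS,
    params={"path": "/Users/old.user@example.com/my_notebook", "format": "SOURCE"},
).json()

# Import it into the new user's folder; content is already base64-encoded.
requests.post(
    f"{HOST}/api/2.0/workspace/import",
    headers=HEADERS,
    json={
        "path": "/Users/new.user@example.com/my_notebook",
        "format": "SOURCE",
        "language": "PYTHON",
        "content": exported["content"],
    },
).raise_for_status()
```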

Latest Reply
deedstoke
New Contributor II
  • 3 kudos

Hope all is well!

6 More Replies
Tjadi
by New Contributor III
  • 914 Views
  • 2 replies
  • 4 kudos

Specifying cluster on running a job

Hi, let's say that I am starting jobs with different parameters at a certain time each day, in the following manner: response = requests.post( "https://%s/api/2.0/jobs/run-now" % (DOMAIN), headers={"Authorization": "Bearer %s" % TOKEN}, json={ ...
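If the goal is to pick the cluster per run, one option is the one-time run endpoint, which accepts a new_cluster spec. A sketch, reusing the DOMAIN/TOKEN placeholders from the question; the node type and notebook path are assumptions.

```python
import requests

DOMAIN = "<workspace-instance>"       # as in the snippet above
TOKEN = "<personal-access-token>"

response = requests.post(
    "https://%s/api/2.0/jobs/runs/submit" % DOMAIN,
    headers={"Authorization": "Bearer %s" % TOKEN},
    json={
        "run_name": "daily-run",
        "new_cluster": {                           # cluster defined per run
            "spark_version": "11.3.x-scala2.12",
            "node_type_id": "Standard_DS3_v2",     # illustrative Azure node type
            "num_workers": 2,
        },
        "notebook_task": {"notebook_path": "/Jobs/daily_run"},  # illustrative
    },
)
response.raise_for_status()
print(response.json()["run_id"])
```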

Latest Reply
karthik_p
Esteemed Contributor
  • 4 kudos

@Tjadi Peeters You can select the Autoscaling/Enhanced Autoscaling option in workflows, which will scale based on the workload.

1 More Replies
User16835756816
by Valued Contributor
  • 1568 Views
  • 3 replies
  • 1 kudos

How can I optimize my data pipeline?

Delta Lake provides optimizations that can help you accelerate your data lake operations. Here's how you can improve query speed by optimizing the layout of data in storage. There are two ways you can optimize your data pipeline: 1) Notebook Optimizat...
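As a concrete example of the storage-layout side, a sketch to run in a Databricks notebook (where spark is predefined); the table and column names are illustrative.

```python
# Compact small files and co-locate rows on a frequently filtered column.
spark.sql("OPTIMIZE events ZORDER BY (eventTime)")

# Optionally clean up files no longer referenced by the Delta table
# (subject to the default 7-day retention check).
spark.sql("VACUUM events")
```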

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

Some tips from me: look for data skew; some partitions can be huge and some small because of incorrect partitioning. You can use the Spark UI to spot that, but also debug your code a bit (e.g. getNumPartitions()); SQL especially can divide data unequally across parti...
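A minimal sketch of the partition check suggested above, runnable in a notebook; spark.range stands in for the DataFrame under inspection.

```python
from pyspark.sql.functions import spark_partition_id

df = spark.range(0, 1_000_000)      # stand-in for your DataFrame

print(df.rdd.getNumPartitions())    # total number of partitions

# Row count per partition; large outliers indicate skew.
df.groupBy(spark_partition_id().alias("pid")).count().orderBy("count").show()
```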

2 More Replies
espenol
by New Contributor III
  • 1494 Views
  • 3 replies
  • 0 kudos

How to debug Workflow Jobs timing out and DLT pipelines running forever?

So I'm the designated data engineer for a proof of concept we're running. I'm working with one infrastructure guy who's setting up everything in Terraform (company policy). He's got the setup down for Databricks, so we can configure clusters and run n...

Latest Reply
shan_chandra
Honored Contributor III
  • 0 kudos

@Espen Solvang - just checking in: could you please let us know if you require further assistance on this?

2 More Replies
joakon
by New Contributor III
  • 1493 Views
  • 4 replies
  • 3 kudos

Resolved! Databricks - Workflow - Jobs - Script to automate

Hi - I have created a Databricks job under Workflows and it's running fine without any issues. I would like to promote this job to other workspaces using a script. Is there a way to script the job definition and deploy it across multiple workspaces? I ...
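One way to script this is to read the job's settings from the source workspace and recreate them in the target; a sketch in which the hosts, tokens, and job_id are placeholders. Cluster types, instance pools, and notebook paths may still need per-workspace tweaking.

```python
import requests

SRC = {"host": "https://<dev-workspace>", "token": "<dev-token>"}
DST = {"host": "https://<prod-workspace>", "token": "<prod-token>"}

# Fetch the job definition from the source workspace.
job = requests.get(
    f"{SRC['host']}/api/2.0/jobs/get",
    headers={"Authorization": f"Bearer {SRC['token']}"},
    params={"job_id": 123},          # placeholder job_id
).json()

# Recreate the job in the target workspace from its settings.
created = requests.post(
    f"{DST['host']}/api/2.0/jobs/create",
    headers={"Authorization": f"Bearer {DST['token']}"},
    json=job["settings"],
).json()
print(created["job_id"])
```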

Latest Reply
joakon
New Contributor III
  • 3 kudos

Thank you @Landan George

3 More Replies
Vsleg
by Contributor
  • 1469 Views
  • 2 replies
  • 1 kudos

Resolved! Deploying Databricks Workflows and Delta Live Table pipelines across workspaces

Hello, I was wondering if there is a way to deploy Databricks Workflows and Delta Live Table pipelines across workspaces (DEV/UAT/PROD) using Azure DevOps.

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

Yes, for sure, using REST API calls to https://docs.databricks.com/workflows/delta-live-tables/delta-live-tables-api-guide.html. You can create the DLT pipeline manually from the GUI, take its JSON representation, and tweak it (so it uses your env variables, for examp...
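A sketch of that approach; the host, token, and all names and paths are placeholders.

```python
import requests

HOST = "https://<workspace-instance>"
TOKEN = "<personal-access-token>"

pipeline_spec = {                     # the tweaked JSON representation
    "name": "my-dlt-pipeline",
    "storage": "/mnt/dlt/my-pipeline",
    "target": "my_database",
    "libraries": [{"notebook": {"path": "/Repos/project/pipelines/bronze"}}],
    "continuous": False,
}

resp = requests.post(
    f"{HOST}/api/2.0/pipelines",      # Delta Live Tables pipelines API
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=pipeline_spec,
)
resp.raise_for_status()
print(resp.json()["pipeline_id"])
```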

1 More Replies
jeremy1
by New Contributor II
  • 5331 Views
  • 10 replies
  • 7 kudos

DLT and Modularity (best practices?)

I have [very] recently started using DLT for the first time. One of the challenges I have run into is how to include other "modules" within my pipelines. I missed the documentation where magic commands (with the exception of %pip) are ignored and was...

Latest Reply
Greg_Galloway
New Contributor III
  • 7 kudos

I like the approach @Arvind Ravish​ shared since you can't currently use %run in DLT pipelines. However, it took a little testing to be clear on how exactly to make it work. First, ensure in the Admin Console that the repos feature is configured as f...
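The pattern being described appears to be importing shared code from a repo instead of %run; a sketch under that assumption, with an illustrative repo path and module name.

```python
# Inside the DLT pipeline notebook: %run is not supported, so make the
# repo's files importable and use a plain Python import instead.
import sys

sys.path.append("/Workspace/Repos/my_org/my_project")  # illustrative path

from shared import transforms  # a regular .py module living in the repo
```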

9 More Replies
FranPérez
by New Contributor III
  • 5668 Views
  • 7 replies
  • 4 kudos

Set PYTHONPATH when executing workflows

I set up a workflow using 2 tasks. Just for demo purposes, I'm using an interactive cluster for running the workflow. { "task_key": "prepare", "spark_python_task": { "python_file": "file...
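One possible approach, not confirmed in the visible replies: set PYTHONPATH through the cluster spec's spark_env_vars, so it is visible to every spark_python_task on that cluster. The path is illustrative.

```python
# Fragment of a job/cluster spec: spark_env_vars is applied at cluster
# start, so PYTHONPATH is set before any task's Python file runs.
new_cluster = {
    "spark_version": "11.3.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    "num_workers": 1,
    "spark_env_vars": {
        "PYTHONPATH": "/dbfs/libs/my_project",   # illustrative module root
    },
}
```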

Latest Reply
jose_gonzalez
Moderator
  • 4 kudos

Hi @Fran Pérez, just a friendly follow-up. Did any of the responses help you resolve your question? If so, please mark that answer as best. Otherwise, please let us know if you still need help.

6 More Replies
Sujitha
by Community Manager
  • 720 Views
  • 1 reply
  • 4 kudos

Documentation Update

Documentation Update Databricks documentation provides how-to guidance and reference information for data analysts, data scientists, and data engineers working in the Databricks Data Science & Engineering, Databricks Machine Learning, and Databricks ...

Latest Reply
Harun
Honored Contributor
  • 4 kudos

Thanks for sharing @Sujitha Ramamoorthy​ 

Netty
by New Contributor III
  • 2200 Views
  • 1 reply
  • 2 kudos

What's the crontab notation for every other week for Databricks Workflow scheduling?

Hello, I need to schedule some of my jobs within Databricks Workflows every other week (or every four weeks). I've scoured a few forums to find what this notation would be, but my searches have been unfruitful. Is this scheduling possible in crontab? I...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

For every seven days starting from Monday, you need to use 2/7 in the day-of-week field. From my experience, that generator works best with Databricks: https://www.freeformatter.com/cron-expression-generator-quartz.html
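Worth noting: Quartz cron has no week-of-year interval, so a strict every-other-week schedule cannot be expressed in the expression alone. A common workaround, sketched here rather than taken from the thread, is to schedule the job weekly and exit early on off weeks.

```python
# First cell of the job: do real work only on even ISO weeks, with the
# job itself scheduled weekly (e.g. every Monday).
import datetime

week = datetime.date.today().isocalendar()[1]
if week % 2 != 0:
    dbutils.notebook.exit("off week - skipping run")  # Databricks notebook utility
```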

MadelynM
by New Contributor III
  • 3229 Views
  • 3 replies
  • 3 kudos

How do I move existing workflows and jobs running on an all-purpose cluster to a shared jobs cluster?

A Databricks cluster is a set of computation resources that performs the heavy lifting of all of the data workloads you run in Databricks. Databricks provides a number of options when you create and configure clusters to help you get the best perform...

[Screenshots: left navigation with Data Science & Engineering selected, then Workflows selected]
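For reference, a sketch of a Jobs 2.1 definition in which two tasks share a single jobs cluster instead of an all-purpose cluster; names and sizes are illustrative.

```python
# POST this payload to /api/2.1/jobs/create (or /api/2.1/jobs/reset to
# overwrite an existing job's settings).
job_spec = {
    "name": "shared-cluster-job",
    "job_clusters": [
        {
            "job_cluster_key": "shared",
            "new_cluster": {
                "spark_version": "11.3.x-scala2.12",
                "node_type_id": "Standard_DS3_v2",
                "num_workers": 2,
            },
        }
    ],
    "tasks": [
        {
            "task_key": "ingest",
            "job_cluster_key": "shared",            # both tasks reuse one cluster
            "notebook_task": {"notebook_path": "/Jobs/ingest"},
        },
        {
            "task_key": "transform",
            "depends_on": [{"task_key": "ingest"}],
            "job_cluster_key": "shared",
            "notebook_task": {"notebook_path": "/Jobs/transform"},
        },
    ],
}
```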
Latest Reply
Kaniz
Community Manager
  • 3 kudos

Hi @Madelyn Mullen, thank you for sharing such an excellent and informative post. We hope to see these often.

2 More Replies
Michael_Galli
by Contributor II
  • 2275 Views
  • 1 reply
  • 1 kudos

Resolved! Pipelines with a lot of Spark caching - best practices for cleanup?

We have a situation where many concurrent Azure Data Factory notebooks are running on one single Databricks interactive cluster (Azure E8-series driver, 1-10 E4-series workers, autoscaling). Each notebook reads data and does a dataframe.cache(), just to ...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

This cache is dynamically saved to disk if there is no room in memory, so I don't see it as an issue. However, best practice is to call the unpersist() method in your code after caching. As in the example below in my answer, the cache/persist method ...
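The pattern described, as a minimal sketch for a notebook (spark is predefined there; the path is illustrative):

```python
df = spark.read.format("delta").load("/mnt/data/events")

df.cache()      # mark for caching; spills to disk when memory is full
df.count()      # an action materializes the cache
# ... reuse df several times ...
df.unpersist()  # explicitly release the cached blocks when done
```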
