Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
I have a workflow that runs on a job cluster and contains a task that requires the prophet library from PyPI:
{
  "task_key": "my_task",
  "depends_on": [
    {
      "task_key": "<...>...
Hey @Eugene Bikkinin Thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your feed...
Hello, This is a question about our platform with `Databricks Runtime 11.3 LTS`. I'm running a Job with multiple tasks in parallel using a shared cluster. Each task runs a dedicated Scala class within a JAR library attached as a dependency. One of the tasks fails (c...
Hi, This actually should not be marked as solved. We are having the same problem: whenever a Shared Job Cluster crashes for some reason (generally OOM), all tasks keep failing indefinitely, with the error message described above. This is ac...
Hi @Naga Vaibhav Elluru We haven't heard from you since the last response from @Debayan Mukherjee, and I was checking back to see if their suggestions helped you. Otherwise, if you have any solution, please share it with the community, as it can be he...
Using Databricks Container Services, we have created two custom Docker images: one based on nvidia/cuda:11.8.0-runtime-ubuntu22.04 and another based on databricksruntime/standard:12.2-LTS. In both cases, we got this error with no specific diagnostics...
I managed to get databricksruntime/standard:12.2-LTS to run in Databricks. However, for the CUDA image (nvidia/cuda:11.8.0-runtime-ubuntu22.04), I have only managed to get it to run with Databricks runtime 10.4 LTS. Does anyone know if Databricks run...
Hi everyone, I've been using my all-purpose cluster for scheduled jobs, and I've been told that this is suboptimal and that using a job cluster for the scheduled jobs cuts costs by half. Unfortunately, when I tried to switch clusters on my ex...
@Bassem Jaber If you are seeing the same error, then you need to increase the quota. For that, your Azure plan should be changed from pay-as-you-go to another plan, as the pay-as-you-go Azure model has limitations on quota increases.
Environment: Azure. I have a workflow that takes approximately a minute to execute, and I want to run the job every 2 minutes. All-purpose cluster: on attaching an all-purpose cluster to the job, it takes approx. 60 seconds to execute. Using a job cluster: on at...
When we run a Databricks job, it takes some time for the job cluster to become active. I also created a pool and attached it to the job cluster, but it still takes time for the cluster to attach and for the job cluster to become active so the job run can start. Is there any way we can run d...
If you want instant processing, you will have to have a cluster running all the time. As mentioned above, Databricks is testing serverless compute for data engineering workloads (comparable to serverless SQL). This fires up a cluster in a few seconds...
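Until then, an instance pool with warm idle nodes is the usual way to trim job cluster start-up time. A hedged sketch of a job's new_cluster block drawing from a pool, assuming the pool already exists and keeps min_idle_instances above zero; the pool ID, runtime version, and worker count below are placeholders.

# Sketch: job cluster spec that takes driver and workers from a pre-warmed instance pool.
new_cluster = {
    "spark_version": "11.3.x-scala2.12",                 # placeholder runtime
    "num_workers": 2,                                     # placeholder size
    "instance_pool_id": "pool-0123456789abcdef",          # hypothetical pool with warm idle instances
    "driver_instance_pool_id": "pool-0123456789abcdef",   # driver drawn from the same pool
}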
How can we parameterize the key of the spark-config in the job cluster linked service from Azure Data Factory? We can parameterize the values, but is there any way to parameterize the key so that when deploying to another environment it takes the PROD/QA v...
@KVNARK You can use Databricks Secrets (create a secret scope from AKV: https://learn.microsoft.com/en-us/azure/databricks/security/secrets/secret-scopes) and then reference a secret in the Spark configuration (https://learn.microsoft.com/en-us/azure/d...
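For illustration, a hedged sketch of the value side of that pattern: a spark_conf block in the cluster spec whose values resolve from an AKV-backed secret scope when the cluster starts. The scope, key, and config names are placeholders; this does not parameterize the key itself, only the value.

# Sketch: spark_conf values pulled from a secret scope at cluster start (placeholder names).
spark_conf = {
    "spark.myapp.storage.key": "{{secrets/my-scope/storage-key}}",  # resolved by Databricks, never stored in plain text
}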
Currently we are using an all-purpose compute cluster. When we tried to allocate the scheduled jobs to a job cluster, we were blocked by the following error: SUBNET_EXHAUSTED_FAILURE(CLOUD_FAILURE): azure_error_code:SubnetIsFull, azure_error_message:No mo...
Answering your question: yes, your VNet/subnet has run out of free IPs, and this can be fixed by allocating more IPs to your network address space. Each cluster needs its own IP addresses, so if there are none available, it simply cannot start.
Hi Team, I am trying to configure access to ADLS through a service principal via the Spark config on a Databricks job cluster, like: fs.azure.account.oauth2.client.id.<adls_account_name>.dfs.core.windows.net {{secrets/scopeName/clientID}} The above stateme...
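For context, the documented shape of that service-principal OAuth configuration, set here from a notebook with spark.conf.set and dbutils.secrets.get (both only available inside Databricks); the storage account, secret scope, and tenant ID below are placeholders.

# Sketch: OAuth (service principal) access to ADLS Gen2; account, scope, and tenant are placeholders.
storage_account = "mystorageaccount"
spark.conf.set(f"fs.azure.account.auth.type.{storage_account}.dfs.core.windows.net", "OAuth")
spark.conf.set(
    f"fs.azure.account.oauth.provider.type.{storage_account}.dfs.core.windows.net",
    "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
)
spark.conf.set(
    f"fs.azure.account.oauth2.client.id.{storage_account}.dfs.core.windows.net",
    dbutils.secrets.get(scope="scopeName", key="clientID"),
)
spark.conf.set(
    f"fs.azure.account.oauth2.client.secret.{storage_account}.dfs.core.windows.net",
    dbutils.secrets.get(scope="scopeName", key="clientSecret"),
)
spark.conf.set(
    f"fs.azure.account.oauth2.client.endpoint.{storage_account}.dfs.core.windows.net",
    "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
)

The same keys can instead go into the cluster's Spark config with {{secrets/scopeName/clientID}} references, as attempted in the question.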
Does it still make sense to run this job on a cluster with Photon enabled when I am receiving the following? This is the code I ran:
CREATE OR REPLACE TABLE ${tbl_name}_dups
SELECT src.*,
  ROW_NUMBER() OVER (
    PARTITION BY src.id
    ...
I have several delta live table notebooks that are tied to different delta live table jobs so that I can use multiple target schema names. I know it's possible to reuse a cluster for job segments but is it possible for these delta live table jobs (w...
The same DLT job (workflow) will use the same cluster in development mode (shutdown after 2 h) and a new one in production (shutdown delay 0). However, in the pipeline JSON you can adjust that value:
{
  "configuration": {
    "pipelines.clusterShutdown.delay": "60s"
  }
}
Yo...
I am using a Databricks job cluster for multi-task jobs. When my job fails or succeeds, I can't see any logs. Do I need to add a location under Advanced Options > Cluster Logging to see the logs for failed/succeeded jobs? What is it and how does it work...
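Yes: log delivery is configured on the job cluster itself (Advanced Options > Logging in the UI, or cluster_log_conf in the job's cluster spec); driver, executor, and init-script logs are then copied to the chosen path every few minutes. A hedged sketch of the spec as a Python dict; the node type and DBFS destination are placeholders.

# Sketch: job cluster spec with log delivery to DBFS; logs land under <destination>/<cluster-id>/.
new_cluster = {
    "spark_version": "11.3.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",   # placeholder Azure node type
    "num_workers": 2,
    "cluster_log_conf": {
        "dbfs": {"destination": "dbfs:/cluster-logs/my-job"}  # placeholder path
    },
}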
Hi @swetha kadiyala Hope all is well! Just wanted to check in on whether you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Th...
Hi All, I am trying to add a new workflow which requires credential passthrough, but when I try to create a new job cluster from Workflows -> Jobs -> My Job, the option to enable credential passthrough is not available. Is there any other way t...
Assuming your Excel file is located on ADLS, you can add a service principal to the cluster configuration. See: https://docs.microsoft.com/en-us/azure/databricks/data/data-sources/azure/azure-storage#--access-azure-data-lake-storage-gen2-or-blob-stora...
Hi, As of now, we already know that our application will eventually run 24/7, streaming constantly incoming data. The streaming pipeline is very basic; however, for now it's enough to run this pipeline once per day (to save the costs of a constantly running clu...
Use the .trigger(once=True) or .trigger(availableNow=True) option, which will pick up only the new files: https://docs.databricks.com/structured-streaming/triggers.html#configuring-incremental-batch-processing
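A minimal sketch of that pattern with Auto Loader: scheduled once per day from a job, the query processes only the files that arrived since the last run and then stops, so no cluster runs between invocations. The source path, checkpoint locations, and target table below are placeholders.

# Sketch: incremental batch run of a streaming pipeline, triggered once per day by a job.
# Paths and table names are placeholders; availableNow=True drains the backlog of new files, then stops.
(
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "dbfs:/checkpoints/demo/schema")
    .load("abfss://landing@mystorageaccount.dfs.core.windows.net/events/")
    .writeStream
    .option("checkpointLocation", "dbfs:/checkpoints/demo/events")
    .trigger(availableNow=True)
    .toTable("bronze.events")
)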