Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

xneg
by Contributor
  • 13378 Views
  • 12 replies
  • 9 kudos

PyPI library sometimes doesn't install during workflow execution

I have a workflow that runs on a job cluster and contains a task that requires the prophet library from PyPI:{ "task_key": "my_task", "depends_on": [ { "task_key": "<...>...
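For reference, a minimal sketch of how a PyPI dependency is typically attached to a job task in a Jobs API payload, written here as a Python dict (the task key, notebook path, and cluster key are placeholders, not taken from this thread):

# Hypothetical task entry for a Jobs API 2.1 job spec; names are illustrative only.
task_with_pypi_dep = {
    "task_key": "my_task",
    "notebook_task": {"notebook_path": "/Repos/example/forecasting"},  # placeholder path
    "job_cluster_key": "shared_job_cluster",                           # placeholder cluster key
    "libraries": [
        {"pypi": {"package": "prophet"}}  # installed on the job cluster before the task starts
    ],
}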

Latest Reply
Vartika
Databricks Employee
  • 9 kudos

Hey @Eugene Bikkinin, thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your feed...

11 More Replies
Ludo
by New Contributor III
  • 5929 Views
  • 7 replies
  • 2 kudos

Resolved! Jobs with multi-tasking are failing to retry; how to fix this issue?

Hello, this is a question about our platform on `Databricks Runtime 11.3 LTS`. I'm running a job with multiple tasks in parallel using a shared cluster. Each task runs a dedicated Scala class within a JAR library attached as a dependency. One of the tasks fails (c...

Latest Reply
YoshiCoppens61
New Contributor II
  • 2 kudos

Hi, this actually should not be marked as solved. We are having the same problem: whenever a shared job cluster crashes for some reason (generally OOM), all tasks keep failing indefinitely, with the error message described above. This is ac...

6 More Replies
naga_databricks
by Contributor
  • 3136 Views
  • 2 replies
  • 1 kudos

Using init scripts with DBX

I specify init scripts in my deployment.conf, as below:
basic-static-cluster: &basic-static-cluster
  new_cluster:
    spark_version: "13.0.x-scala2.12"
    num_workers: 1
    node_type_id: "n2-highmem-2"
    init_scripts:
      - worksp...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Naga Vaibhav Elluru, we haven't heard from you since the last response from @Debayan Mukherjee, and I was checking back to see if her suggestions helped you. Otherwise, if you have any solution, please share it with the community, as it can be he...

1 More Replies
ppang
by New Contributor III
  • 6068 Views
  • 1 reply
  • 0 kudos

Resolved! Job cluster failed to start with custom docker image

Using Databricks Container Services, we have created two custom Docker images: one based on nvidia/cuda:11.8.0-runtime-ubuntu22.04 and another based on databricksruntime/standard:12.2-LTS. In either case, we got this error with no specific diagnostics...

Latest Reply
ppang
New Contributor III
  • 0 kudos

I managed to get databricksruntime/standard:12.2-LTS to run in Databricks. However, for the CUDA image (nvidia/cuda:11.8.0-runtime-ubuntu22.04), I have only managed to get it to run with Databricks runtime 10.4 LTS. Does anyone know if Databricks run...

B_J_Innov
by New Contributor III
  • 7377 Views
  • 12 replies
  • 0 kudos

Resolved! Can't use job cluster for scheduled jobs ADD_NODES_FAILED : Failed to add 9 containers to the cluster. Will attempt retry: false. Reason: Azure Quota Exceeded Exception

Hi everyone, I've been using my all-purpose cluster for scheduled jobs, and I've been told that it's suboptimal and that using a job cluster for scheduled jobs cuts costs by half. Unfortunately, when I tried to switch clusters on my ex...

Latest Reply
karthik_p
Esteemed Contributor
  • 0 kudos

@Bassem Jaber If you are seeing the same error, you need to increase the quota. For that, your Azure plan should be changed from pay-as-you-go to another plan, as the pay-as-you-go Azure model has limitations on quota increases.

11 More Replies
AmanSehgal
by Honored Contributor III
  • 17579 Views
  • 6 replies
  • 15 kudos

Job cluster vs All purpose cluster

Environment: Azure. I have a workflow that takes approximately a minute to execute and I want to run the job every 2 minutes. All-purpose cluster: on attaching an all-purpose cluster to the job, it takes approx. 60 seconds to execute. Job cluster: on at...

Latest Reply
Priyag1
Honored Contributor II
  • 15 kudos

Thanks for sharing

5 More Replies
vinaykumar
by New Contributor III
  • 4345 Views
  • 3 replies
  • 1 kudos

Resolved! Run a Databricks job instantly without waiting for the job cluster to become active

When we run a Databricks job, it takes some time for the job cluster to become active. I also created a pool and attached it to the job cluster, but it still takes time to attach the cluster and for the job cluster to become active before the job run starts. Is there any way we can run d...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

If you want instant processing, you will have to have a cluster running all the time. As mentioned above, Databricks is testing serverless compute for data engineering workloads (comparable to serverless SQL). This fires up a cluster in a few seconds...

2 More Replies
KVNARK
by Honored Contributor II
  • 3244 Views
  • 4 replies
  • 6 kudos

Resolved! How to parameterize the key of a Spark config in the job cluster linked service from ADF

How can we parameterize the key of the Spark config in the job cluster linked service from Azure Data Factory? We can parameterize the values, but is there any way to parameterize the key so that when deploying to another environment it takes the PROD/QA v...

Latest Reply
daniel_sahal
Esteemed Contributor
  • 6 kudos

@KVNARK You can use Databricks Secrets (create a secret scope from AKV: https://learn.microsoft.com/en-us/azure/databricks/security/secrets/secret-scopes) and then reference a secret in the Spark configuration (https://learn.microsoft.com/en-us/azure/d...
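For illustration, a secret reference in a job cluster's Spark configuration usually looks like the sketch below, written as a Python dict of the cluster spec (the scope, key, and config name are placeholders, not from this thread):

# Hypothetical new_cluster fragment; Databricks resolves the {{secrets/...}} reference at cluster start.
new_cluster = {
    "spark_version": "11.3.x-scala2.12",
    "num_workers": 2,
    "spark_conf": {
        "fs.azure.account.oauth2.client.secret.mystorage.dfs.core.windows.net":
            "{{secrets/myscope/sp-client-secret}}",  # value pulled from the secret scope
    },
}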

3 More Replies
Phani1
by Valued Contributor II
  • 2816 Views
  • 2 replies
  • 0 kudos

SUBNET_EXHAUSTED_FAILURE(CLOUD_FAILURE): or No more address space to create NIC within injected virtual network

Currently we are using an all-purpose compute cluster. When we tried to allocate the scheduled jobs to a job cluster, we were blocked by the following error: SUBNET_EXHAUSTED_FAILURE(CLOUD_FAILURE): azure_error_code: SubnetIsFull, azure_error_message: No mo...

Latest Reply
daniel_sahal
Esteemed Contributor
  • 0 kudos

Answering your questions: yes, your vnet/subnet is out of unoccupied IPs, and this can be fixed by allocating more IPs to your network address space. Each cluster requires its own IP, so if there are none available, it simply cannot start.

1 More Replies
debanjan89
by New Contributor II
  • 2705 Views
  • 3 replies
  • 2 kudos

How do we concatenate some fixed string with a secret value in Spark Config in Databricks Job Cluster?

Hi Team, I am trying to configure access to ADLS through a service principal via the Spark config in a Databricks job cluster, like: fs.azure.account.oauth2.client.id.<adls_account_name>.dfs.core.windows.net {{secrets/scopeName/clientID}} The above stateme...
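One common workaround outside the cluster-level Spark config (a hedged sketch, not an answer given in this thread) is to build the value at runtime in notebook code, where a secret retrieved with dbutils.secrets.get can be concatenated like any other string:

# Sketch only: set the OAuth client id at session level; scope, key, and account names are placeholders.
client_id = dbutils.secrets.get(scope="scopeName", key="clientID")
spark.conf.set(
    "fs.azure.account.oauth2.client.id.<adls_account_name>.dfs.core.windows.net",
    client_id,  # a plain string, so prefixes/suffixes can be added with normal concatenation
)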

Latest Reply
Manimkm08
New Contributor III
  • 2 kudos

@Kaniz Fatma We are blocked on this issue. Can you please look into the thread and give your suggestion for a workaround?

2 More Replies
lawrence009
by Contributor
  • 2940 Views
  • 3 replies
  • 7 kudos

Photon does not fully support the query because of dynamic pruning

Does it still make sense to run this job on a cluster with Photon enabled when I am receiving the following? This is the code I ran: CREATE OR REPLACE TABLE ${tbl_name}_dups SELECT src.*, ROW_NUMBER() OVER ( PARTITION BY src.id ...

Latest Reply
PriyaAnanthram
Contributor III
  • 7 kudos

Hmm, could you show us what your query is?

2 More Replies
John_BardessGro
by New Contributor II
  • 6036 Views
  • 2 replies
  • 4 kudos

Cluster Reuse for delta live tables

I have several delta live table notebooks that are tied to different delta live table jobs so that I can use multiple target schema names. I know it's possible to reuse a cluster for job segments but is it possible for these delta live table jobs (w...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 4 kudos

The same DLT job (workflow) will reuse the same cluster in development mode (shutdown after 2 h) and spin up a new one in production (shutdown 0). In JSON you can manipulate that value: { "configuration": { "pipelines.clusterShutdown.delay": "60s" } } Yo...

1 More Replies
swetha
by New Contributor III
  • 5780 Views
  • 2 replies
  • 2 kudos

Databricks job cluster logs

I am using a Databricks job cluster for multi-task jobs. When my job fails or succeeds, I can't see any logs. Do I need to add a location under advanced options, cluster logging, to see the logs for failed/succeeded jobs, or what is it and how does it work...
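For context (a hedged sketch, not from this thread), cluster log delivery is normally enabled on the cluster spec itself via cluster_log_conf; the destination path below is a placeholder:

# Hypothetical job cluster fragment enabling log delivery; driver and executor logs are copied here.
new_cluster = {
    "spark_version": "11.3.x-scala2.12",
    "num_workers": 1,
    "cluster_log_conf": {
        "dbfs": {"destination": "dbfs:/cluster-logs/my-multitask-job"}  # placeholder location
    },
}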

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @swetha kadiyala, hope all is well! Just wanted to check in to see if you were able to resolve your issue. Would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Th...

1 More Replies
Deepak_Kandpal
by New Contributor III
  • 5100 Views
  • 3 replies
  • 2 kudos

Resolved! Enable credential passthrough Option is not available in new UI for Job Cluster

Hi all, I am trying to add a new workflow which requires credential passthrough, but when I try to create a new job cluster from Workflows -> Jobs -> My Job, the "Enable credential passthrough" option is not available. Is there any other way t...

Latest Reply
Rostislaw
New Contributor III
  • 2 kudos

Assuming your Excel file is located on ADLS, you can add a service principal to the cluster configuration. See: https://docs.microsoft.com/en-us/azure/databricks/data/data-sources/azure/azure-storage#--access-azure-data-lake-storage-gen2-or-blob-stora...
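A hedged sketch of the service-principal settings the linked docs describe, applied at session level in a notebook (storage account, tenant id, and secret scope/key names are placeholders):

# Sketch only: OAuth (service principal) access to ADLS Gen2; replace every placeholder before use.
storage_account = "mystorageaccount"
tenant_id = "<tenant-id>"
configs = {
    f"fs.azure.account.auth.type.{storage_account}.dfs.core.windows.net": "OAuth",
    f"fs.azure.account.oauth.provider.type.{storage_account}.dfs.core.windows.net":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    f"fs.azure.account.oauth2.client.id.{storage_account}.dfs.core.windows.net":
        dbutils.secrets.get(scope="myscope", key="sp-client-id"),
    f"fs.azure.account.oauth2.client.secret.{storage_account}.dfs.core.windows.net":
        dbutils.secrets.get(scope="myscope", key="sp-client-secret"),
    f"fs.azure.account.oauth2.client.endpoint.{storage_account}.dfs.core.windows.net":
        f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
}
for key, value in configs.items():
    spark.conf.set(key, value)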

2 More Replies
pawelmitrus
by Contributor
  • 1464 Views
  • 1 reply
  • 1 kudos

Resolved! Shutting down a job cluster, when streaming is over

Hi, as of now we already know that our application will be running 24/7, streaming constantly incoming data. The stream pipeline is very basic; however, for now it's enough to run this pipeline once per day (to save the cost of a constantly running clu...

Latest Reply
Shasidhar_ES
Databricks Employee
  • 1 kudos

Use the .trigger(once=True) or .trigger(availableNow=True) option, which will pick up only the new files: https://docs.databricks.com/structured-streaming/triggers.html#configuring-incremental-batch-processing
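A minimal sketch of that pattern for a daily run (the Auto Loader source path, checkpoint location, and target table are placeholders): the query processes whatever is new and then stops on its own.

# Sketch: incremental batch run of a streaming pipeline using availableNow.
df = (
    spark.readStream
    .format("cloudFiles")                 # Auto Loader
    .option("cloudFiles.format", "json")  # assumed input format
    .load("abfss://container@account.dfs.core.windows.net/input/")  # placeholder path
)

(
    df.writeStream
    .option("checkpointLocation", "dbfs:/checkpoints/daily_ingest")  # placeholder checkpoint
    .trigger(availableNow=True)           # pick up only new files, then shut down
    .toTable("bronze.daily_ingest")       # placeholder target table
)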
