Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
I have a workflow that runs on a job cluster and contains a task that requires the prophet library from PyPI:
{
  "task_key": "my_task",
  "depends_on": [
    {
      "task_key": "<...>...
Hey @Eugene Bikkinin Thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your feed...
Hello, This is a question about our platform with `Databricks Runtime 11.3 LTS`. I'm running a Job with multiple tasks in parallel using a shared cluster. Each task runs a dedicated Scala class within a JAR library attached as a dependency. One of the tasks fails (c...
Hi, This actually should not be marked as solved. We are having the same problem: whenever a Shared Job Cluster crashes for some reason (generally OOM), all tasks keep failing indefinitely, with the error message described above. This is ac...
Hi @Naga Vaibhav Elluru We haven't heard from you since the last response from @Debayan Mukherjee, and I was checking back to see if their suggestions helped you. Otherwise, if you have any solution, please share it with the community, as it can be he...
Using Databricks Container Services, we have created two custom Docker images: one based on nvidia/cuda:11.8.0-runtime-ubuntu22.04 and another based on databricksruntime/standard:12.2-LTS. In both cases, we got this error with no specific diagnostics...
I managed to get databricksruntime/standard:12.2-LTS to run in Databricks. However, for the CUDA image (nvidia/cuda:11.8.0-runtime-ubuntu22.04), I have only managed to get it to run with Databricks runtime 10.4 LTS. Does anyone know if Databricks run...
Hi everyone, I've been using my all-purpose cluster for scheduled jobs, and I've been told that this is suboptimal and that using a job cluster for the scheduled jobs cuts costs by half. Unfortunately, when I tried to switch clusters on my ex...
@Bassem Jaber If you are seeing the same error, then you need to increase the quota. For that, your Azure plan should be changed from pay-as-you-go to another plan, as the pay-as-you-go Azure model has limitations on quota increases.
Environment: Azure. I have a workflow that takes approximately a minute to execute, and I want to run the job every 2 minutes. All-purpose cluster: on attaching an all-purpose cluster to the job, it takes approx. 60 seconds to execute. Using a job cluster: on at...
When we run a Databricks job, it takes some time for the job cluster to become active. I also created a pool and attached it to the job cluster, but it still takes time for the cluster to attach and for the job cluster to become active so the job run can start. Is there any way we can run d...
If you want instant processing, you will have to have a cluster running all the time. As mentioned above, Databricks is testing serverless compute for data engineering workloads (comparable to serverless SQL). This fires up a cluster in a few seconds...
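Until then, an instance pool with warm idle nodes is the usual way to trim job cluster start-up time. A hedged sketch of a job's new_cluster block drawing from a pool, assuming the pool already exists and keeps min_idle_instances above zero; the pool ID, runtime version, and worker count below are placeholders.

# Sketch: job cluster spec that takes driver and workers from a pre-warmed instance pool.
new_cluster = {
    "spark_version": "11.3.x-scala2.12",                 # placeholder runtime
    "num_workers": 2,                                     # placeholder size
    "instance_pool_id": "pool-0123456789abcdef",          # hypothetical pool with warm idle instances
    "driver_instance_pool_id": "pool-0123456789abcdef",   # driver drawn from the same pool
}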
How can we parameterize the key of the spark-config in the job cluster linked service from Azure Data Factory? We can parameterize the values, but is there any way to parameterize the key so that when deploying to another environment it takes the PROD/QA v...
@KVNARK You can use Databricks Secrets (create a secret scope from AKV: https://learn.microsoft.com/en-us/azure/databricks/security/secrets/secret-scopes) and then reference a secret in the Spark configuration (https://learn.microsoft.com/en-us/azure/d...
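For illustration, a hedged sketch of the value side of that pattern: a spark_conf block in the cluster spec whose values resolve from an AKV-backed secret scope when the cluster starts. The scope, key, and config names are placeholders; this does not parameterize the key itself, only the value.

# Sketch: spark_conf values pulled from a secret scope at cluster start (placeholder names).
spark_conf = {
    "spark.myapp.storage.key": "{{secrets/my-scope/storage-key}}",  # resolved by Databricks, never stored in plain text
}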
Currently we are using an all-purpose compute cluster. When we tried to allocate the scheduled jobs to a job cluster, we were blocked by the following error: SUBNET_EXHAUSTED_FAILURE(CLOUD_FAILURE): azure_error_code:SubnetIsFull, azure_error_message:No mo...
Answering your question: yes, your VNet/subnet has run out of free IPs, and this can be fixed by allocating more IPs to your network address space. Each cluster needs its own IP addresses, so if there are none available, it simply cannot start.
Hi Team, I am trying to configure access to ADLS through a service principal via the Spark config on a Databricks job cluster, like: fs.azure.account.oauth2.client.id.<adls_account_name>.dfs.core.windows.net {{secrets/scopeName/clientID}} The above stateme...
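For context, the documented shape of that service-principal OAuth configuration, set here from a notebook with spark.conf.set and dbutils.secrets.get (both only available inside Databricks); the storage account, secret scope, and tenant ID below are placeholders.

# Sketch: OAuth (service principal) access to ADLS Gen2; account, scope, and tenant are placeholders.
storage_account = "mystorageaccount"
spark.conf.set(f"fs.azure.account.auth.type.{storage_account}.dfs.core.windows.net", "OAuth")
spark.conf.set(
    f"fs.azure.account.oauth.provider.type.{storage_account}.dfs.core.windows.net",
    "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
)
spark.conf.set(
    f"fs.azure.account.oauth2.client.id.{storage_account}.dfs.core.windows.net",
    dbutils.secrets.get(scope="scopeName", key="clientID"),
)
spark.conf.set(
    f"fs.azure.account.oauth2.client.secret.{storage_account}.dfs.core.windows.net",
    dbutils.secrets.get(scope="scopeName", key="clientSecret"),
)
spark.conf.set(
    f"fs.azure.account.oauth2.client.endpoint.{storage_account}.dfs.core.windows.net",
    "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
)

The same keys can instead go into the cluster's Spark config with {{secrets/scopeName/clientID}} references, as attempted in the question.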
Does it still make sense to run this job on a cluster with Photon enabled when I am receiving the following? This is the code I ran:
CREATE OR REPLACE TABLE ${tbl_name}_dups
SELECT src.*,
  ROW_NUMBER() OVER (
    PARTITION BY src.id
    ...
I have several delta live table notebooks that are tied to different delta live table jobs so that I can use multiple target schema names. I know it's possible to reuse a cluster for job segments but is it possible for these delta live table jobs (w...
The same DLT job (workflow) will use the same cluster in development mode (shutdown after 2 h) and a new one in production (shutdown delay 0). However, in the pipeline JSON you can adjust that value:
{
  "configuration": {
    "pipelines.clusterShutdown.delay": "60s"
  }
}
Yo...
I am using a Databricks job cluster for multi-task jobs. When my job fails or succeeds, I can't see any logs. Do I need to add a location under Advanced Options > Cluster Logging to see the logs for failed/succeeded jobs? What is it and how does it work...
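Yes: log delivery is configured on the job cluster itself (Advanced Options > Logging in the UI, or cluster_log_conf in the job's cluster spec); driver, executor, and init-script logs are then copied to the chosen path every few minutes. A hedged sketch of the spec as a Python dict; the node type and DBFS destination are placeholders.

# Sketch: job cluster spec with log delivery to DBFS; logs land under <destination>/<cluster-id>/.
new_cluster = {
    "spark_version": "11.3.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",   # placeholder Azure node type
    "num_workers": 2,
    "cluster_log_conf": {
        "dbfs": {"destination": "dbfs:/cluster-logs/my-job"}  # placeholder path
    },
}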
Hi @swetha kadiyala Hope all is well! Just wanted to check in on whether you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Th...
Hi All, I am trying to add a new workflow which requires credential passthrough, but when I try to create a new job cluster from Workflows -> Jobs -> My Job, the option to enable credential passthrough is not available. Is there any other way t...
Assuming your Excel file is located on ADLS, you can add a service principal to the cluster configuration. See: https://docs.microsoft.com/en-us/azure/databricks/data/data-sources/azure/azure-storage#--access-azure-data-lake-storage-gen2-or-blob-stora...
Hi, As of now, we already know that our application will eventually run 24/7, streaming constantly incoming data. The streaming pipeline is very basic; however, for now it's enough to run this pipeline once per day (to save the costs of a constantly running clu...
Use the .trigger(once=True) or .trigger(availableNow=True) option, which will pick up only the new files: https://docs.databricks.com/structured-streaming/triggers.html#configuring-incremental-batch-processing
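A minimal sketch of that pattern with Auto Loader: scheduled once per day from a job, the query processes only the files that arrived since the last run and then stops, so no cluster runs between invocations. The source path, checkpoint locations, and target table below are placeholders.

# Sketch: incremental batch run of a streaming pipeline, triggered once per day by a job.
# Paths and table names are placeholders; availableNow=True drains the backlog of new files, then stops.
(
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "dbfs:/checkpoints/demo/schema")
    .load("abfss://landing@mystorageaccount.dfs.core.windows.net/events/")
    .writeStream
    .option("checkpointLocation", "dbfs:/checkpoints/demo/events")
    .trigger(availableNow=True)
    .toTable("bronze.events")
)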