I am running this Delta Live Tables PoC from databricks-industry-solutions/industry-solutions-blueprints
https://github.com/databricks-industry-solutions/pos-dlt
I am using Standard_DS4_v2 nodes (28 GB RAM, 8 cores each) with 2 workers, so 16 worker cores in total. This is more than Databricks recommends for this PoC (Standard_DS3_v2 with 4 cores x 2 workers).
There are 5 notebooks in the PoC. I have run Notebooks 1 and 2 individually without errors.
However, when I execute the entire PoC from the RUNME.py orchestrator, it fails on Notebook 2 with this error:
Unexpected failure while waiting for the cluster (0608-160733-23lp3pel) to be ready: Cluster 0608-160733-23lp3pel is in unexpected state Terminated: AZURE_QUOTA_EXCEEDED_EXCEPTION (CLIENT_ERROR): azure_error_code:QuotaExceeded, azure_error_message: Operation could not be completed as it results in exceeding approved standardDSv2Family Cores quota. Additional details - Deployment Model: Resource Manager, Location: eastus, Current Limit: 32, Current Usage: 24, Additional Required: 16, (Minimum) New Limit Required: 40. Submit a request for Quota increase at https://aka.ms/ProdportalCRP/#blade/Microsoft_Azure_Capacity/UsageAndQuota.ReactView/Parameters/%7B%...<......long url parameters list>...................
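If I read the error right, the arithmetic itself is consistent. A minimal sanity check with the numbers copied straight from the message (my own sketch, just to make sure I understand what Azure is counting):

```python
# Quota figures reported in the Azure error message above
limit = 32          # approved standardDSv2Family Cores quota in eastus
current_usage = 24  # DSv2 cores already in use in the region
additional = 16     # cores the new job cluster asks for (2 x DS4_v2 workers = 16)

new_total = current_usage + additional  # 40
print(new_total, "cores needed, limit is", limit)
print("rejected:", new_total > limit)   # True -> Azure refuses the deployment
```

So the orchestrator seems to be requesting a second, separate 16-core cluster on top of whatever is already running, which is what pushes me past the limit.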
I have tried increasing the quota, but I get the same error, so I don't think a quota increase alone is the fix. There always seems to be a need for more and more vCPUs. This is the inner exception:
AssertionError: Job Run failed:
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
File <command-2270064480735889>:3
1 dbutils.widgets.dropdown("run_job", "False", ["True", "False"])
2 run_job = dbutils.widgets.get("run_job") == "True"
----> 3 NotebookSolutionCompanion().deploy_compute(job_json, run_job=run_job)
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-18c26647-04cb-4378-9a4a-92c4877cbe84/lib/python3.10/site-packages/solacc/companion/__init__.py:232, in NotebookSolutionCompanion.deploy_compute(self, input_json, run_job, wait)
230 self.install_libraries(jcid, jcl)
231 else:
--> 232 self.run_job()
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-18c26647-04cb-4378-9a4a-92c4877cbe84/lib/python3.10/site-packages/solacc/companion/__init__.py:349, in NotebookSolutionCompanion.run_job(self)
347 print("-" * 80)
348 print(f"#job/{self.job_id}/run/{self.run_id} is {self.life_cycle_state} - {self.test_result_state}")
--> 349 assert self.test_result_state == "SUCCESS", f"Job Run failed: please investigate at: {self.workspace_url}#job/{self.job_id}/run/{self.run_id}"
---------------------------------------------------------------------------
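One workaround I am considering (this is my own assumption, not something from the repo's docs) is to downsize the job cluster that RUNME.py's job_json defines, so the extra cluster it spins up fits inside the 32-core DSv2 quota. A rough sketch, assuming job_json follows the standard Jobs API 2.1 "job_clusters" layout (the key names here are illustrative):

```python
# Sample job_json fragment in the Jobs API 2.1 shape; the real one is
# built inside RUNME.py, and "pos_dlt_cluster" is a hypothetical key.
job_json = {
    "job_clusters": [
        {
            "job_cluster_key": "pos_dlt_cluster",
            "new_cluster": {
                "node_type_id": "Standard_DS4_v2",  # 8 cores per node
                "num_workers": 2,
            },
        }
    ]
}

# Downsize every job cluster before handing job_json to deploy_compute()
for jc in job_json.get("job_clusters", []):
    jc["new_cluster"]["node_type_id"] = "Standard_DS3_v2"  # 4 cores per node
    jc["new_cluster"]["num_workers"] = 2                   # 8 worker cores + driver

print(job_json["job_clusters"][0]["new_cluster"])
# Then, as in the orchestrator:
# NotebookSolutionCompanion().deploy_compute(job_json, run_job=run_job)
```

I have not verified this keeps the DLT pipeline performant, and I'd rather understand why the orchestrator needs a second full-size cluster in the first place.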
Any clues? Could it be one of the reasons listed here?
https://kb.databricks.com/clusters/cluster-failed-launch