06-30-2025 09:07 AM - edited 06-30-2025 09:26 AM
I'm using the Databricks Lakehouse Monitoring API to enable monitoring across every table in a catalog. I wrote a script that loops through all schemas and tables and calls the create_monitor API for each one. However, when running the script from a notebook, I consistently get a Timed out after 0:05:00 error.
It seems like enabling monitors sequentially for a large number of tables is exceeding the execution timeout, especially if each API call takes a few seconds.
Questions:
Is there a recommended way to avoid this timeout when enabling monitors at scale?
Should I implement parallelism or batching in the script?
Is there a way to increase the execution timeout in a Databricks notebook?
Any guidance or best practices would be appreciated!
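For context, the sequential pattern described above looks roughly like this. This is a minimal sketch, not the poster's actual script: `create_monitor` is a hypothetical stand-in for the real Lakehouse Monitoring call (e.g. via the Databricks SDK), injected as a parameter so the loop itself is easy to reason about.

```python
# Hypothetical sketch of a sequential "enable monitors for all tables" loop.
# `create_monitor` stands in for the real Lakehouse Monitoring API call.

def enable_all_monitors(tables, create_monitor):
    """Enable a monitor for every fully-qualified table name, one at a time."""
    results = {}
    for table in tables:
        try:
            results[table] = create_monitor(table)
        except Exception as exc:  # record failures instead of aborting the run
            results[table] = exc
    return results

# With hundreds of tables and a few seconds of latency per API call,
# a loop like this easily exceeds a 5-minute notebook cell timeout.
```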
07-01-2025 05:10 AM
Here are some best practices for enabling monitors at scale:

- **Parallelize the calls.** Use Python's `ThreadPoolExecutor` to enable multiple monitors at once instead of looping sequentially.
- **Increase the execution timeout where possible:**

```python
spark.conf.set("spark.databricks.execution.timeout", "18000")  # seconds
```

  - Not all compute types honor this setting; serverless and job clusters are the most likely to support overrides.
  - For SQL Warehouses, adjust the `STATEMENT_TIMEOUT` parameter instead.
- **Handle API errors.** Watch for `PENDING_UPDATE` or rate-limit responses and implement retry logic if you encounter them. Increase concurrency cautiously, watching for API throttling or backend queuing.

| Problem Area | Solution/Best Practice |
|---|---|
| Timeouts (notebook/script) | Increase execution timeout if possible |
| Bulk/slow enablement | Use batching & parallelism (reasonable limits) |
| Operational scale | Run as a Job, or orchestrate externally |
| API throttling/errors | Implement retry/error handling |
| Efficiency | Enable CDF, use proper profiles, latest SDK |

_Adjust `max_workers` based on observed performance and API constraints._
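The parallelism-plus-retry approach described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: `create_monitor` is a hypothetical stand-in for the real monitor-creation call (e.g. through the Databricks SDK), and the retry/backoff parameters are arbitrary starting points.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def create_with_retry(table, create_monitor, retries=3, backoff=2.0):
    """Call create_monitor(table), retrying transient failures (e.g. rate
    limits or PENDING_UPDATE responses) with exponential backoff."""
    for attempt in range(retries):
        try:
            return create_monitor(table)
        except Exception:
            if attempt == retries - 1:
                raise  # out of retries: surface the error to the caller
            time.sleep(backoff * 2 ** attempt)

def enable_monitors_parallel(tables, create_monitor, max_workers=8):
    """Enable monitors for many tables concurrently, collecting per-table
    results (or the exception that ultimately failed)."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(create_with_retry, t, create_monitor): t
                   for t in tables}
        for fut in as_completed(futures):
            table = futures[fut]
            try:
                results[table] = fut.result()
            except Exception as exc:
                results[table] = exc
    return results
```

Start with a modest `max_workers` (e.g. 4–8) and raise it only if you see no throttling; the bottleneck is usually the backend, not the client.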
07-02-2025 02:41 AM
@BigRoux, thank you, this is very well explained and really helpful.
07-01-2025 06:54 AM
Thank you, this is really helpful!