04-24-2025 02:39 PM - edited 04-24-2025 02:42 PM
More information is needed for effective troubleshooting 😉
How did you establish that the issue is not the cluster start-up time but delays in a Pub/Sub subscription?
What is your ingestion schedule?
What are your Pub/Sub connector options?
Please share your code that configures a read from Pub/Sub, if you can.
Have you checked out the streaming metrics?
Are there any errors or warnings related to the pipeline or Pub/Sub in Google Cloud Logging?
GCP Audit Logs for Pub/Sub can be configured to log timestamps of read operations, which can then be cross-correlated with Databricks logs if need be.
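If you do end up cross-correlating timestamps, here is a rough sketch in plain Python of how the lag between publish time and processing time could be computed. It assumes you can export pairs of (publish timestamp in epoch milliseconds, time the record was processed on the Databricks side). The function names and the sample values are made up for illustration; they are not taken from your pipeline or from any library.

```python
from datetime import datetime, timedelta, timezone
from statistics import median


def to_millis(dt):
    """Convert a timezone-aware datetime to epoch milliseconds."""
    return int(dt.timestamp() * 1000)


def ingestion_lags_seconds(records):
    """Return the lag in seconds for each (publish_ms, processed_at) pair.

    publish_ms   -- publish time in epoch milliseconds (assumed input format)
    processed_at -- timezone-aware datetime taken from the Databricks side
    """
    lags = []
    for publish_ms, processed_at in records:
        published_at = datetime.fromtimestamp(publish_ms / 1000, tz=timezone.utc)
        lags.append((processed_at - published_at).total_seconds())
    return lags


def summarize(lags):
    """Collapse a list of lags into min / median / max."""
    return {"min": min(lags), "median": median(lags), "max": max(lags)}


# Hypothetical sample data: two messages picked up quickly, one after 15 minutes.
base = datetime(2025, 4, 24, 14, 0, 0, tzinfo=timezone.utc)
sample = [
    (to_millis(base), base + timedelta(seconds=5)),
    (to_millis(base + timedelta(seconds=10)), base + timedelta(seconds=30)),
    (to_millis(base + timedelta(seconds=20)), base + timedelta(seconds=920)),
]

lags = ingestion_lags_seconds(sample)
print(summarize(lags))  # {'min': 5.0, 'median': 20.0, 'max': 900.0}
```

If the lags turn out small and steady while the pipeline still finishes late, that would point away from the subscription and towards something else, such as compute provisioning.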
---
The last time I ran a DLT pipeline on a schedule on my GCP infra, it took about 15 minutes to provision a Databricks compute cluster (a GKE NodePool, effectively).
I hope this helps!