How to use Delta Live Tables with Google Cloud Storage
11-08-2022 10:31 AM
Hi Team
I have been working on a POC exploring Delta Live Tables with a GCS location.
I have some doubts:
- How do we access the GCS bucket? We have a connection established using a Databricks service account. For a normal cluster, we go to the cluster page and provide the Databricks service account email under `Advanced Options`. For Delta Live Tables, cluster creation is not under our control, so how do we add this email to the cluster to make the bucket accessible?
- The GCP documentation needs to be updated; the paths referenced are for Azure: https://docs.gcp.databricks.com/workflows/delta-live-tables/delta-live-tables-cookbook.html. Please share any documentation you may have for accessing GCP paths.
- Essentially, I want to store the Delta Live Tables data in GCS, i.e. make the storage location a GCS bucket path. Is this possible? Right now, with default settings, the data is stored under a DBFS location.
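To make the first question concrete: if the DLT pipeline settings JSON accepts per-cluster overrides the way the clusters API does, I imagine something like the following would attach the service account email (an untested sketch; the label, email, and project below are placeholders):

```json
{
  "clusters": [
    {
      "label": "default",
      "gcp_attributes": {
        "google_service_account": "databricks-sa@my-project.iam.gserviceaccount.com"
      }
    }
  ]
}
```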
Thanks
11-08-2022 01:24 PM
@Shruti S please try this QuickStart: Delta Live Tables QuickStart | Databricks on Google Cloud
When creating your DLT pipeline in Workflows, provide a storage location for the output data.
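As a minimal sketch, the pipeline settings JSON could look like the following, with `storage` pointing at a GCS path (the pipeline name, bucket, and notebook path are placeholders):

```json
{
  "name": "my-dlt-pipeline",
  "storage": "gs://my-bucket/dlt-storage",
  "libraries": [
    {
      "notebook": {
        "path": "/Repos/user/dlt_notebook"
      }
    }
  ]
}
```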
11-08-2022 08:48 PM
I have tried that, but I received this error:
```
DataPlaneException: Failed to start the DLT service on cluster <cluster_id>. Please check the stack trace below or driver logs for more details.
com.databricks.pipelines.execution.service.EventLogInitializationException: Failed to initialize event log
java.io.IOException: Error accessing gs://<path>
shaded.databricks.com.google.api.client.googleapis.json.GoogleJsonResponseException: 403 Forbidden
GET https://storage.googleapis.com/storage/v1/b/<path>?fields=bucket,name,timeCreated,updated,generation...
{
  "code" : 403,
  "errors" : [ {
    "domain" : "global",
    "message" : "Caller does not have storage.objects.get access to the Google Cloud Storage object. Permission 'storage.objects.get' denied on resource (or it may not exist).",
    "reason" : "forbidden"
  } ],
  "message" : "Caller does not have storage.objects.get access to the Google Cloud Storage object. Permission 'storage.objects.get' denied on resource (or it may not exist)."
}
```
Alternatively, I also tried to edit the Delta Live Tables cluster from the UI by adding the service account under the Google Service Account block. Save Cluster failed with:
`Error: Dlt prefixed spark images cannot be used outside of Delta live tables service`
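For anyone hitting the same 403: it usually means the service account attached to the pipeline's cluster lacks object access on the bucket. A hedged sketch of granting it via the Cloud SDK (the bucket and service account names are placeholders; `roles/storage.objectAdmin` is one role choice that covers both reads and the event-log writes):

```shell
# Placeholders -- replace with your bucket and Databricks service account.
BUCKET="gs://my-bucket"
SA="databricks-sa@my-project.iam.gserviceaccount.com"

# Grant object-level access on the bucket to the service account.
gsutil iam ch "serviceAccount:${SA}:roles/storage.objectAdmin" "${BUCKET}"
```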
11-13-2022 09:10 AM
Can I get any update?
Also, a follow-up question: we are not seeing a Permissions button on the pipeline details page. We were wondering if this has something to do with the pricing tier we are using.
11-30-2022 04:53 AM
Kindly mount a DBFS location to your GCS bucket; see:
Mounting cloud object storage on Databricks | Databricks on Google Cloud
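As a rough sketch of what that mount could look like in a notebook on a GCP workspace (the bucket and mount-point names are placeholders, and the cluster's service account must already have access to the bucket):

```python
# Run inside a Databricks notebook, where `dbutils` is predefined.
# Bucket and mount-point names below are placeholders.
bucket_name = "my-gcs-bucket"
mount_point = "/mnt/dlt-storage"

dbutils.fs.mount(
    source=f"gs://{bucket_name}",
    mount_point=mount_point,
)

# Sanity check: list the contents of the mounted bucket.
display(dbutils.fs.ls(mount_point))
```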

