How to use Delta Live Tables with Google Cloud Storage

shrutis23
New Contributor III

Hi Team

I have been working on a POC exploring Delta Live Tables with a GCS location.

I have a few questions:

  1. How do we access the GCS bucket? We have a connection established using a Databricks service account. In normal cluster creation, we go to the cluster page and provide the Databricks service account email under `Advanced Options`. For Delta Live Tables, since cluster creation is not under our control, how do we add this email to the cluster to make the bucket accessible?
  2. The GCP documentation needs to be updated: the paths referenced at https://docs.gcp.databricks.com/workflows/delta-live-tables/delta-live-tables-cookbook.html are for Azure. Please share any documentation you may have for accessing GCP paths.
  3. Essentially, I want to store the Delta Live Tables data in a GCS bucket, i.e. make the pipeline's storage location a GCS bucket path. Is this possible? Right now, with default settings, the data lands under a DBFS location. A minimal sketch of what I'm trying is below.
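For context, here is a rough sketch of the kind of pipeline the POC defines (the bucket, paths, and table names are hypothetical placeholders):

```python
# Minimal DLT sketch: ingest raw JSON from a GCS bucket and derive a second
# live table. This only works once the pipeline's cluster can authenticate
# to the bucket, which is exactly the open question above.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Raw events ingested from a GCS bucket")
def raw_events():
    # `spark` is injected by the DLT runtime; the gs:// path is a placeholder
    return spark.read.format("json").load("gs://my-example-bucket/raw/events/")

@dlt.table(comment="Raw events plus an ingestion timestamp")
def events_with_ts():
    return dlt.read("raw_events").withColumn("ingested_at", F.current_timestamp())
```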

Thanks

5 REPLIES

karthik_p
Esteemed Contributor

@Shruti S please try the QuickStart below: Delta Live Tables QuickStart | Databricks on Google Cloud

When creating your DLT pipeline in Workflows, provide a storage location for the output data.
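The same storage location can also be set outside the UI. As a rough sketch, assuming the Pipelines REST API (the workspace host, token, bucket, and notebook path below are placeholders, not values from this thread):

```python
# Sketch: create a DLT pipeline whose output data and event log are stored
# under a GCS prefix instead of the default DBFS location.
import requests

settings = {
    "name": "gcs-poc-pipeline",
    # The `storage` field is what points the pipeline output at GCS
    "storage": "gs://my-example-bucket/dlt/gcs-poc",
    "libraries": [{"notebook": {"path": "/Repos/me/dlt_poc_notebook"}}],
}

resp = requests.post(
    "https://<workspace-host>/api/2.0/pipelines",
    headers={"Authorization": "Bearer <personal-access-token>"},
    json=settings,
)
resp.raise_for_status()
print(resp.json())  # contains the new pipeline_id
```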

shrutis23
New Contributor III

I have tried that, but I received this error:

DataPlaneException: Failed to start the DLT service on cluster <cluster_id>. Please check the stack trace below or driver logs for more details.
com.databricks.pipelines.execution.service.EventLogInitializationException: Failed to initialize event log
java.io.IOException: Error accessing gs://<path>
shaded.databricks.com.google.api.client.googleapis.json.GoogleJsonResponseException: 403 Forbidden
GET https://storage.googleapis.com/storage/v1/b/<path>?fields=bucket,name,timeCreated,updated,generation...
{
  "code" : 403,
  "errors" : [ {
    "domain" : "global",
    "message" : "Caller does not have storage.objects.get access to the Google Cloud Storage object. Permission 'storage.objects.get' denied on resource (or it may not exist).",
    "reason" : "forbidden"
  } ],
  "message" : "Caller does not have storage.objects.get access to the Google Cloud Storage object. Permission 'storage.objects.get' denied on resource (or it may not exist)."
}

Alternatively, I also tried to edit the Delta Live Tables cluster from the UI by adding the service account under the Google Service Account block. Saving the cluster failed with:

Error : Dlt prefixed spark images cannot be used outside of Delta live tables service
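Since the DLT cluster cannot be edited directly, my understanding is that the service account would have to go into the pipeline's own cluster settings instead. A sketch of what that might look like, assuming pipeline cluster specs accept the same `gcp_attributes` block as the Clusters API (all names are placeholders):

```python
# Sketch: attach the service account via the pipeline settings rather than
# the (locked) cluster UI. `gcp_attributes.google_service_account` mirrors
# the "Google Service Account" box on the cluster page; names are placeholders.
settings = {
    "name": "gcs-poc-pipeline",
    "storage": "gs://my-example-bucket/dlt/gcs-poc",
    "clusters": [
        {
            "label": "default",
            "gcp_attributes": {
                "google_service_account": "databricks-sa@my-project.iam.gserviceaccount.com"
            },
        }
    ],
    "libraries": [{"notebook": {"path": "/Repos/me/dlt_poc_notebook"}}],
}
# This dict could be sent to POST /api/2.0/pipelines as in the earlier sketch,
# or entered via the pipeline's JSON settings editor in the UI.
```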

Kaniz
Community Manager

Hi @Shruti S, I hope this Stack Overflow thread helps resolve your issue. Please let me know if it helps.

shrutis23
New Contributor III

Can I get an update?

Also, a follow-up question: we are not seeing the Permissions button on the pipeline details page. We were wondering if this has something to do with the pricing tier we are using.

Senthil1
Contributor

Kindly mount the GCS bucket to a DBFS location; see below:

Mounting cloud object storage on Databricks | Databricks on Google Cloud
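A rough sketch of such a mount (the bucket and mount names are placeholders; `dbutils` and `display` are notebook built-ins, and the service account attached to the cluster must already have access to the bucket):

```python
# Sketch: mount a GCS bucket at a DBFS path. On GCP, access is governed by
# the cluster's service account, so no credentials are passed to the mount.
bucket_name = "my-example-bucket"
mount_name = "gcs_poc"

dbutils.fs.mount(f"gs://{bucket_name}", f"/mnt/{mount_name}")

# The pipeline's storage location could then reference /mnt/gcs_poc; note the
# mount must be created outside the DLT pipeline (e.g. in a regular notebook).
display(dbutils.fs.ls(f"/mnt/{mount_name}"))
```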
