Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to use Delta Live Tables with Google Cloud Storage

shrutis23
New Contributor III

Hi Team

I have been working on a POC exploring Delta Live Tables with a GCS location.

I have a few questions:

  1. How do we access the GCS bucket? We have a connection established using a Databricks service account: when creating a normal cluster, we go to the cluster page and provide the Databricks service account email under `Advanced Options`. For Delta Live Tables the cluster creation is not under our control, so how do we add this email to the cluster to make the bucket accessible?
  2. The GCP documentation needs to be updated; the paths referenced in https://docs.gcp.databricks.com/workflows/delta-live-tables/delta-live-tables-cookbook.html are for Azure. Please share any documentation you may have for accessing GCP paths.
  3. Essentially, I want to store the Delta Live Tables data in a GCS bucket, i.e. make the storage location a GCS bucket path. Is this possible? Right now, with the default settings, the data is stored under a DBFS location. (A sketch of my pipeline notebook follows below for context.)
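
For reference, the pipeline notebook itself is just a minimal DLT definition along these lines (a sketch; the bucket and path are placeholders):

```python
import dlt

@dlt.table(comment="Raw events ingested from GCS")
def raw_events():
    # Placeholder GCS path; this is the read the pipeline cluster
    # needs the service account for.
    return spark.read.format("json").load("gs://<bucket-name>/raw/")
```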

Thanks

4 REPLIES

karthik_p
Esteemed Contributor

@Shruti S, please try this quickstart: Delta Live Tables QuickStart | Databricks on Google Cloud

When creating your DLT pipeline under Workflows, provide a storage location for the output data, as in the sketch below.
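
The pipeline settings carry both the storage location and the service account. A minimal sketch using the Pipelines REST API (workspace URL, token, bucket, notebook path, and service account email are all placeholders; as far as I know the pipeline's cluster settings accept the same gcp_attributes block as the Clusters API):

```python
import requests

# All values below are placeholders for illustration.
HOST = "https://<workspace-host>"
TOKEN = "<personal-access-token>"

pipeline_settings = {
    "name": "gcs-dlt-poc",
    # Question 3: write pipeline output to GCS instead of the DBFS root.
    "storage": "gs://<bucket-name>/dlt",
    "clusters": [
        {
            "label": "default",
            # Question 1: DLT creates the cluster itself, so the service
            # account email goes in the pipeline's cluster settings rather
            # than the cluster UI's Advanced Options.
            "gcp_attributes": {
                "google_service_account": "<sa>@<project>.iam.gserviceaccount.com"
            },
        }
    ],
    "libraries": [{"notebook": {"path": "/Users/<user>/dlt_poc"}}],
}

resp = requests.post(
    f"{HOST}/api/2.0/pipelines",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=pipeline_settings,
)
resp.raise_for_status()
print(resp.json())
```

If you prefer the UI, the same clusters/gcp_attributes block can be pasted into the JSON editor on the pipeline settings page.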

shrutis23
New Contributor III

I tried that, but I received the following error:

DataPlaneException: Failed to start the DLT service on cluster <cluster_id>. Please check the stack trace below or driver logs for more details.
com.databricks.pipelines.execution.service.EventLogInitializationException: Failed to initialize event log
java.io.IOException: Error accessing gs://<path>
shaded.databricks.com.google.api.client.googleapis.json.GoogleJsonResponseException: 403 Forbidden
GET https://storage.googleapis.com/storage/v1/b/<path>?fields=bucket,name,timeCreated,updated,generation...
{
  "code" : 403,
  "errors" : [ {
    "domain" : "global",
    "message" : "Caller does not have storage.objects.get access to the Google Cloud Storage object. Permission 'storage.objects.get' denied on resource (or it may not exist).",
    "reason" : "forbidden"
  } ],
  "message" : "Caller does not have storage.objects.get access to the Google Cloud Storage object. Permission 'storage.objects.get' denied on resource (or it may not exist)."
}

Alternatively, I also tried to edit the Delta Live Tables cluster from the UI by adding the service account under the Google Service Account block. Saving the cluster failed with:

Error: Dlt prefixed spark images cannot be used outside of Delta live tables service
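
For what it's worth, the 403 suggests the identity the pipeline cluster runs as is missing storage.objects.get on the bucket. A quick way to sanity-check the service account's permissions from a notebook (a sketch assuming the google-cloud-storage client is available; the bucket name is a placeholder):

```python
from google.cloud import storage  # assumes google-cloud-storage is installed

# Placeholder bucket name; the client picks up the cluster's service
# account via application default credentials.
bucket = storage.Client().bucket("<bucket-name>")

# Returns the subset of these permissions the caller actually holds;
# anything missing from the result needs an IAM grant on the bucket.
granted = bucket.test_iam_permissions(
    ["storage.objects.get", "storage.objects.list", "storage.objects.create"]
)
print(granted)
```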

shrutis23
New Contributor III

Can I get an update?

I also had a follow-up question: we are not seeing the Permissions button on the pipeline details page. We were wondering if this has something to do with the pricing tier we are using.

Senthil1
Contributor

Kindly mount the GCS bucket to a DBFS location; see below:

Mounting cloud object storage on Databricks | Databricks on Google Cloud
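
Something like this from a notebook should work (a minimal sketch; the bucket and mount names are placeholders, and the cluster must already run as a service account with access to the bucket):

```python
# Mount the GCS bucket under DBFS (placeholder names). Authentication
# uses the Google service account attached to the cluster.
dbutils.fs.mount("gs://<bucket-name>", "/mnt/<mount-name>")

# The pipeline can then reference the data through the mount point.
display(dbutils.fs.ls("/mnt/<mount-name>"))
```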
