
Is it possible to use other databases within Delta Live Tables (DLT)?

477061
Contributor

I have set up a DLT pipeline with "testing" set as the target database. I need to join with data that exists in a "keys" table in my "beta" database, but I get an AccessDeniedException, despite having full access to both databases from a normal notebook.

A snippet demonstrating the issue is below (note that the code snippet button doesn't appear to be available for me in Chrome):

The following would be data from the keys table:

```
key_df = spark.createDataFrame([[0, "State0"]], ['key', 'name'])
```

Then the DLT:

```
import dlt

@dlt.table()
def dlt_test_data():
    df = spark.createDataFrame([["40873", 0]], ['version', 'key'])
    return df

@dlt.table()
def dlt_test_silver():
    data_df = dlt.read("dlt_test_data")
    key_df = spark.table("beta.keys")  # table in the "beta" database
    joined_df = data_df.join(key_df, "key")
    return joined_df
```

I also tried importing the data in a view and reading from that in dlt_test_silver, but this also failed.
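For reference, that attempt looked roughly like this (a sketch; keys_view is just an illustrative name):

```
@dlt.view()
def keys_view():
    # Same external read, wrapped in a DLT view
    return spark.table("beta.keys")

@dlt.table()
def dlt_test_silver():
    # Read the view from within the pipeline instead of calling spark.table() directly
    return dlt.read("dlt_test_data").join(dlt.read("keys_view"), "key")
```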

Is it possible to use other databases within a DLT pipeline? If so, how?

1 ACCEPTED SOLUTION

477061
Contributor

As an update to this issue: I was running the DLT pipeline on a personal cluster that had an instance profile defined (as per Databricks best practices). As a result, the pipeline did not have permission to access other S3 resources (e.g., other databases). The solution was to explicitly add the AWS permissions to the pipeline. To do this, you need to edit the JSON settings of the pipeline and add the "aws_attributes" key to the "clusters" definition (discussed here). The value for the "aws_attributes" key can be found by going to the cluster definition menu, clicking edit, and viewing the JSON of the cluster definition.
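For example, after editing, the relevant part of the pipeline settings looked roughly like this (the instance profile ARN below is just a placeholder; copy the real block from your cluster's JSON definition):

```
{
  "clusters": [
    {
      "label": "default",
      "aws_attributes": {
        "instance_profile_arn": "arn:aws:iam::123456789012:instance-profile/example-profile"
      }
    }
  ]
}
```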

Apparently this issue would not have occurred had I been using a job cluster.


12 REPLIES

-werners-
Esteemed Contributor III

What exactly do you mean by 'database'? Actual RDBMS systems, or databases in the Hive metastore (or Unity Catalog)?

477061
Contributor

Apologies, I'm new to Databricks. I mean databases in the Hive metastore.

-werners-
Esteemed Contributor III

No problem.

It should work; Hive databases are only a semantic view on top of the actual files residing in your data lake.
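For example, you can see where a metastore table's files actually live (a quick sketch using the table from your post):

```
# Show the storage path backing the metastore table
spark.sql("DESCRIBE TABLE EXTENDED beta.keys") \
    .filter("col_name = 'Location'") \
    .show(truncate=False)
```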

Have you enabled table access control?

477061
Contributor

Table access control is currently disabled, and the admin is refusing to enable it. Does this need to be enabled for the DLT pipeline to use other Hive metastore databases?

-werners-
Esteemed Contributor III

No, but it could be the issue if it were enabled.

What happens if you read the parquet (or Delta Lake, or CSV, ...) files directly, without going through the Hive metastore?
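Something like this, for example (the path is hypothetical; use the table's actual storage location):

```
# Hypothetical S3 path; substitute the actual location of the keys table
key_df = spark.read.parquet("s3://my-bucket/warehouse/beta.db/keys/")
```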

477061
Contributor

I have just tried to read from the parquet file directly, and it fails with the same error.

-werners-
Esteemed Contributor III

I guess it's one of those things that still have to be sorted out in DLT.

Anonymous
Not applicable

Right now, yes: DLT only supports one target database. So if you need to push into different databases, you may need two DLT pipelines.

https://docs.databricks.com/workflows/delta-live-tables/delta-live-tables-quickstart.html#requiremen...

Hope this helps. Cheers!

477061
Contributor

Thanks, yes, I knew about the limitation on the target database. There does not appear to be any documentation on a limitation for source databases (for clarity, I want to pull from two source databases but push to one target database), but I assume it comes to the same thing.

Kaniz
Community Manager

Hi @julie.holtzhausen, we haven't heard from you since the last responses from @Werner Stinckens and @Alex 1283, and I was checking back to see if their suggestions helped you.

Otherwise, if you have found a solution, please share it with the community, as it can be helpful to others.

Also, please don't forget to click the "Select As Best" button whenever the information provided helps resolve your question.

477061
Contributor

Thanks, but no, I have not yet received an answer. Other customers have tried to help, but there is still no resolution.

