11-24-2022 12:49 AM
I have set up a DLT pipeline with "testing" as the target database. I need to join against data in a "keys" table in my "beta" database, but I get an AccessDeniedException, even though I have full access to both databases from a normal notebook.
A snippet demonstrating the issue is below (Note that I don't appear to have the code snippet button available in Chrome):
The following would be data from the keys table:
```
key_df = spark.createDataFrame([[0, "State0"]], ['key', 'name'])
```
Then the DLT:
```
import dlt

@dlt.table()
def dlt_test_data():
    # Sample source data for the pipeline
    return spark.createDataFrame([["40873", 0]], ['version', 'key'])

@dlt.table()
def dlt_test_silver():
    data_df = dlt.read("dlt_test_data")
    # Reading from a database other than the pipeline's target
    key_df = spark.table("beta.keys")
    return data_df.join(key_df, "key")
```
I also tried importing the data in a view and reading from that in dlt_test_silver, but this also failed.
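The view-based attempt looked roughly like this (a sketch only: `dlt` is available only inside a Databricks pipeline, and the view name `keys_view` is an assumption):

```
import dlt

@dlt.view()
def keys_view():
    # Hypothetical view wrapping the table from the other database
    return spark.table("beta.keys")

@dlt.table()
def dlt_test_silver():
    data_df = dlt.read("dlt_test_data")
    # Join against the view instead of reading beta.keys directly
    return data_df.join(dlt.read("keys_view"), "key")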
Is it possible to use other databases within a DLT pipeline? If so, how?
01-18-2023 07:54 AM
As an update to this issue: I was running the DLT pipeline on a personal cluster that had an instance profile defined (as per Databricks best practices). As a result, the pipeline did not have permission to access other S3 resources (e.g. other databases). The solution was to explicitly add the AWS permissions to the pipeline. To do this, you need to edit the JSON settings of the pipeline and add the "aws_attributes" key to the "clusters" definition (discussed here). The value for the "aws_attributes" key can be found by going to the cluster definition menu, clicking edit, and viewing the JSON of the cluster definition.
Apparently this issue would not have occurred had I been using a job cluster.
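For anyone hitting the same problem, the edited pipeline settings end up looking roughly like this (a sketch: the instance-profile ARN is a placeholder, and the exact contents of "aws_attributes" should be copied from your cluster's JSON as described above):

```
{
  "clusters": [
    {
      "label": "default",
      "aws_attributes": {
        "instance_profile_arn": "arn:aws:iam::123456789012:instance-profile/my-profile"
      }
    }
  ]
}
```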
11-24-2022 01:49 AM
what exactly do you mean by 'database'? Actual RDBMS systems or databases in the Hive metastore (or Unity)?
11-24-2022 01:55 AM
Apologies, I'm new to Databricks. I mean databases in the Hive metastore
11-24-2022 02:01 AM
no problem.
It should work; Hive databases are only a semantic view on top of the actual files residing in your data lake.
Have you enabled table access control?
11-24-2022 06:05 AM
Table access control is currently disabled, and admin is refusing to enable it. Does this need to be enabled for the DLT pipeline to use other hive metastore databases?
11-24-2022 06:21 AM
No, but it could be the issue if it were enabled.
What happens if you read the parquet (or Delta Lake or CSV or ...) files directly, without going through the Hive metastore?
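A direct read, bypassing the metastore, would look something like this (a sketch: the S3 path is a placeholder for wherever the "keys" table's files live):

```
# Sketch: read the underlying files directly instead of via the
# Hive metastore. The S3 path below is a placeholder.
key_df = (
    spark.read
    .format("parquet")   # or "delta", "csv", ...
    .load("s3://my-bucket/path/to/keys/")
)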
11-28-2022 12:40 AM
I have just tried to read from the parquet file directly and it fails with the same error.
11-28-2022 12:42 AM
I guess it's one of those things that still has to be sorted out in DLT.
11-25-2022 08:27 PM
Right now, yes: DLT only supports one target database. So if you need to push into different databases, you may need two DLT pipelines.
Hope this helps. Cheers!
11-28-2022 12:39 AM
Thanks, yes, I knew about the limitation on the target database. There does not appear to be any documentation of a limitation on source databases (for clarity: I want to pull from two source databases but push to one target database), but I assume it comes to the same thing.
11-30-2022 04:56 AM
Thanks, but no, I have not yet received an answer. Other customers have tried to help, but there is still no resolution.