
Is it possible to use other databases within Delta Live Tables (DLT)?

477061
Contributor

I have set up a DLT pipeline with "testing" set as the target database. I need to join with data that exists in a "keys" table in my "beta" database, but I get an AccessDeniedException, despite having full access to both databases from a normal notebook.

A snippet demonstrating the issue is below (note that the code snippet button doesn't appear to be available for me in Chrome):

The following would be data from the keys table:

```
key_df = spark.createDataFrame([[0, "State0"]], ['key', 'name'])
```

Then the DLT:

```
import dlt

@dlt.table()
def dlt_test_data():
    df = spark.createDataFrame([["40873", 0]], ['version', 'key'])
    return df

@dlt.table()
def dlt_test_silver():
    data_df = dlt.read("dlt_test_data")
    key_df = spark.table("beta.keys")  # table in the "beta" database
    joined_df = data_df.join(key_df, "key")
    return joined_df
```

I also tried importing the data in a view and reading from that in dlt_test_silver, but this also failed.
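For reference, that attempt looked roughly like this (a sketch; keys_view is just an illustrative name):

```
@dlt.view()
def keys_view():
    # Same external read, wrapped in a DLT view
    return spark.table("beta.keys")

@dlt.table()
def dlt_test_silver():
    # Read the view from within the pipeline instead of calling spark.table() directly
    return dlt.read("dlt_test_data").join(dlt.read("keys_view"), "key")
```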

Is it possible to use other databases within a DLT pipeline? If so, how?

1 ACCEPTED SOLUTION

477061
Contributor

As an update to this issue: I was running the DLT pipeline on a personal cluster that had an instance profile defined (as per Databricks best practices). As a result, the pipeline did not have permission to access other S3 resources (e.g., other databases). The solution was to explicitly add the AWS permissions to the pipeline. To do this, you need to edit the JSON settings of the pipeline and add the "aws_attributes" key to the "clusters" definition (discussed here). The value for the "aws_attributes" key can be found by going to the cluster definition menu, clicking edit, and viewing the JSON of the cluster definition.
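For example, after editing, the relevant part of the pipeline settings looked roughly like this (the instance profile ARN below is just a placeholder; copy the real block from your cluster's JSON definition):

```
{
  "clusters": [
    {
      "label": "default",
      "aws_attributes": {
        "instance_profile_arn": "arn:aws:iam::123456789012:instance-profile/example-profile"
      }
    }
  ]
}
```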

Apparently this issue would not have occurred had I been using a job cluster.


12 REPLIES

-werners-
Esteemed Contributor III

What exactly do you mean by 'database'? Actual RDBMS systems, or databases in the Hive metastore (or Unity Catalog)?

477061
Contributor

Apologies, I'm new to Databricks. I mean databases in the Hive metastore.

-werners-
Esteemed Contributor III

No problem.

It should work; Hive databases are only a semantic view on top of the actual files residing in your data lake.
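For example, you can see where a metastore table's files actually live (a quick sketch using the table from your post):

```
# Show the storage path backing the metastore table
spark.sql("DESCRIBE TABLE EXTENDED beta.keys") \
    .filter("col_name = 'Location'") \
    .show(truncate=False)
```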

Have you enabled table access control?

477061
Contributor

Table access control is currently disabled, and the admin is refusing to enable it. Does this need to be enabled for the DLT pipeline to use other Hive metastore databases?

-werners-
Esteemed Contributor III

No, but it could be the issue if it were enabled.

What happens if you read the parquet (or Delta Lake, or CSV, ...) files directly, without going through the Hive metastore?
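Something like this, for example (the path is hypothetical; use the table's actual storage location):

```
# Hypothetical S3 path; substitute the actual location of the keys table
key_df = spark.read.parquet("s3://my-bucket/warehouse/beta.db/keys/")
```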

477061
Contributor

I have just tried to read from the parquet file directly, and it fails with the same error.

-werners-
Esteemed Contributor III

I guess it's one of those things that still have to be sorted out in DLT.

Anonymous
Not applicable

Right now, yes: DLT only supports one target database. So if you need to push into different databases, you may need two DLT pipelines.

https://docs.databricks.com/workflows/delta-live-tables/delta-live-tables-quickstart.html#requiremen...

Hope this helps. Cheers!

477061
Contributor

Thanks, yes, I knew about the limitation on the target database. There does not appear to be any documentation on a limitation for source databases (for clarity, I want to pull from two source databases but push to one target database), but I assume it comes to the same thing.

Kaniz
Community Manager

Hi @julie.holtzhausen, we haven't heard from you since the last responses from @Werner Stinckens and @Alex 1283, and I was checking back to see if their suggestions helped you.

Otherwise, if you have found a solution, please share it with the community, as it can be helpful to others.

Also, please don't forget to click the "Select As Best" button whenever the information provided helps resolve your question.

477061
Contributor

Thanks, but no, I have not yet received an answer. Other customers have tried to help, but there is still no resolution.

