Data Engineering

DLT pipelines with UC

ws4100e
New Contributor III

I'm trying to run a (very simple) DLT pipeline whose resulting materialized table is published to a UC schema that has a managed storage location defined (within an existing EXTERNAL LOCATION). According to the documentation:

  • Publishing to schemas that specify a managed storage location is supported only in the preview channel.

So, unsurprisingly, running this pipeline on the current channel fails with:

com.databricks.pipelines.common.CustomException: [DLT ERROR CODE: EXECUTION_SERVICE_STARTUP_FAILURE] 
Schemas with specified storage locations are not currently supported for UC enabled pipelines.

But when I switch to the preview channel it still does not work, this time with a different error:

com.databricks.pipelines.common.CustomException: [DLT ERROR CODE: EXECUTION_SERVICE_STARTUP_FAILURE] 
Schema tstexternal has a specified storage location that does not match the pipeline's root location: None

Is there any way to make this work? Can I set up "the pipeline's root location" somehow?
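
For reference, the pipeline's JSON settings look roughly like this (just a sketch - the pipeline name, catalog, and notebook path are placeholders; tstexternal is the schema from the error above):

{
  "name": "tst_dlt_pipeline",
  "channel": "PREVIEW",
  "catalog": "my_catalog",
  "target": "tstexternal",
  "libraries": [
    { "notebook": { "path": "/path/to/dlt_notebook" } }
  ]
}

Nothing in there looks like a root location setting to me.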

Any help welcome.

Thx

Pawel

9 REPLIES

data-engineer-d
Contributor

@ws4100e Did you select the target Catalog and Schema from the pipeline settings?
To persist to UC managed schemas, you currently need to select the catalog and specify the schema.

ws4100e
New Contributor III

Yes, this is exactly what I did - I selected a catalog and, within it, a schema (with a storage location assigned).

data-engineer-d
Contributor

I was receiving the same error; it was resolved after selecting the right schema and the right permissions while creating a fresh pipeline.
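
For what it's worth, the grants on the target catalog and schema looked roughly like this (catalog, schema, and principal names are placeholders):

GRANT USE CATALOG ON CATALOG my_catalog TO `user@example.com`;
GRANT USE SCHEMA ON SCHEMA my_catalog.my_schema TO `user@example.com`;
GRANT CREATE TABLE ON SCHEMA my_catalog.my_schema TO `user@example.com`;
GRANT CREATE MATERIALIZED VIEW ON SCHEMA my_catalog.my_schema TO `user@example.com`;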

Can you please share the code that writes the table?

ws4100e
New Contributor III

I'm trying this with the simplest possible pipeline - I use Auto Loader to read CSV files from Azure storage. I have sufficient privileges on the selected schema.

 

-- Ingest raw CSV files incrementally from cloud storage with Auto Loader (cloud_files)
CREATE OR REFRESH STREAMING LIVE TABLE employees_raw_el
AS SELECT *
FROM cloud_files(
  "abfss://container1@sa12345678.dfs.core.windows.net/csvs/",
  "csv",
  map(
    "header", "true",    
    "delimiter", ";",
    "inferSchema", "true"
  )
);

-- Materialize a bronze table on top of the raw streaming table
CREATE OR REFRESH LIVE TABLE employees_bronze_el
COMMENT "Test external location"
TBLPROPERTIES ("table.usage" = "tests")
AS SELECT *
  FROM live.employees_raw_el;

 

DataGeek_JT
New Contributor II

Did this get resolved? I am getting the same issue.

ws4100e
New Contributor III

Unfortunately not :(. 

I was facing the same issue and was able to solve it by selecting the 'preview' channel instead of 'current'.
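
In the pipeline's JSON settings this shows up as the channel field - as far as I know the accepted values are CURRENT and PREVIEW:

{
  "channel": "PREVIEW"
}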

ws4100e
New Contributor III

As I mentioned above, I use the preview channel, so it must be something else 😕

Imran_A
New Contributor II

What is the difference between the preview and current channels?
