How to apply Primary Key constraint in Delta Live Table?
12-08-2022 06:29 AM
In this blog, I can see that the primary key constraint has been applied to the dimension and fact tables. Here is the example:
-- Store dimension
CREATE OR REPLACE TABLE dim_store (
  store_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  business_key STRING,
  name STRING,
  email STRING,
  city STRING,
  address STRING,
  phone_number STRING,
  created_date TIMESTAMP,
  updated_date TIMESTAMP,
  start_at TIMESTAMP,
  end_at TIMESTAMP
);
I want to apply the same for Delta Live Tables. Something like this:
CREATE OR REFRESH STREAMING LIVE TABLE dim_store (
  store_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  business_key STRING,
  name STRING,
  email STRING,
  city STRING,
  address STRING,
  phone_number STRING,
  created_date TIMESTAMP,
  updated_date TIMESTAMP,
  start_at TIMESTAMP,
  end_at TIMESTAMP
);
However, when I run the Delta Live Tables pipeline, it throws the following error:
Unsupported SQL statement for table 'dim_store': Missing query is not supported.
Can anyone please tell me how to apply a primary key constraint to a Delta Live Table? I know Databricks does not support enforcement of PK/FK relationships; however, I want the PK/FK constraints to be informational only.
Kindly help here.
12-08-2022 06:54 AM
I don't think that is possible (yet).
AFAIK you can only have expectations.
But DLT is pretty new, so this might be added later on.
12-08-2022 07:56 AM
Thanks for the reply, Werners.
12-08-2022 12:59 PM
I agree. Only expectations are currently available in Delta Live Tables to maintain data quality. We may see other constraints in future releases.
12-09-2022 01:41 AM
Delta tables on Unity Catalog have PK/FK information (not enforced). Since DLT will soon support UC, I guess they will add this feature.
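For reference, on a regular Unity Catalog Delta table you can already declare such an informational PK today. A sketch, reusing dim_store from the question (the key column must first be made NOT NULL, and the constraint name dim_store_pk is my own choice):

```sql
-- A PK column must be non-nullable before the constraint can be added
ALTER TABLE dim_store ALTER COLUMN store_id SET NOT NULL;
-- Informational only: Unity Catalog records but does not enforce the PK
ALTER TABLE dim_store ADD CONSTRAINT dim_store_pk PRIMARY KEY (store_id);
```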
10-24-2023 06:25 AM
@SRK The documentation shows an example of how you can apply a PK constraint as an Expectation in DLT:
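Roughly, that looks like the following in DLT SQL (a sketch, not verbatim from the docs; store_raw is a hypothetical source table, and note that a per-row expectation can check NOT NULL but not cross-row uniqueness):

```sql
CREATE OR REFRESH STREAMING LIVE TABLE dim_store (
  -- Rows with a NULL key are dropped and the violation is recorded
  CONSTRAINT valid_store_id EXPECT (store_id IS NOT NULL) ON VIOLATION DROP ROW
)
AS SELECT * FROM STREAM(LIVE.store_raw);
```

For actual uniqueness you still need an aggregation-based check like the one in the Python answer below.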
10-24-2023 08:52 AM
@SRK Please see a copy of this answer on stackoverflow here.
You can use DLT Expectations to have this check (see my previous answer if you're using SQL and not Python):
import dlt
from pyspark.sql import types as T

@dlt.table(name="table1")
def create_table1():
    schema = T.StructType([
        T.StructField("id", T.IntegerType(), True),
        T.StructField("name", T.StringType(), True),
        T.StructField("age", T.IntegerType(), True)])
    # id 1 appears twice on purpose, to trigger the expectation below
    data = [(1, "Alice", 25),
            (1, "Bob", 30),
            (3, "Charlie", 40)]
    return spark.createDataFrame(data, schema)  # spark is provided by the DLT runtime

@dlt.table(name="table2")
@dlt.expect("unique_pk", "num_entries = 1")  # flags any id appearing more than once
def create_table2():
    df = dlt.read("table1")
    return df.groupBy("id").count().withColumnRenamed("count", "num_entries")
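If you want the pipeline update to fail on a duplicate key rather than just record the violation, the same check can use expect_or_fail instead of expect (a sketch based on the code above, using the DLT Python decorator API):

```python
@dlt.table(name="table2")
@dlt.expect_or_fail("unique_pk", "num_entries = 1")  # stops the update on violation
def create_table2():
    df = dlt.read("table1")
    return df.groupBy("id").count().withColumnRenamed("count", "num_entries")
```

There is also expect_or_drop, which silently drops the offending rows instead.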

