12-08-2022 06:29 AM
In this blog I can see that a primary key constraint has been applied to the dimension and fact tables. Here is an example:
-- Store dimension
CREATE OR REPLACE TABLE dim_store(
  store_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  business_key STRING,
  name STRING,
  email STRING,
  city STRING,
  address STRING,
  phone_number STRING,
  created_date TIMESTAMP,
  updated_date TIMESTAMP,
  start_at TIMESTAMP,
  end_at TIMESTAMP
);
I want to apply the same for Delta Live Tables. Something like this:
CREATE OR REFRESH STREAMING LIVE TABLE dim_store(
  store_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  business_key STRING,
  name STRING,
  email STRING,
  city STRING,
  address STRING,
  phone_number STRING,
  created_date TIMESTAMP,
  updated_date TIMESTAMP,
  start_at TIMESTAMP,
  end_at TIMESTAMP
);
However, when I run the Delta Live Tables pipeline, it throws the following error:
Unsupported SQL statement for table 'dim_store': Missing query is not supported.
Can anyone please tell me how to apply a primary key constraint to a Delta Live Table? I know Databricks does not enforce PK/FK relationships; I only want the PK/FK constraints to be informational.
Kindly help here.
12-08-2022 06:54 AM
I don't think that is possible (yet).
AFAIK you can only use expectations.
But DLT is pretty new, so this might be added later on.
12-08-2022 07:56 AM
Thanks for the reply Werners.
12-08-2022 12:59 PM
I second that. Only expectations are currently available in Delta Live Tables to maintain data quality. We may see other constraints in future releases.
12-09-2022 01:41 AM
Delta tables on Unity Catalog have PK/FK information (not enforced). Since DLT will soon support UC, I expect this feature will be added.
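For reference, here is a sketch of what the informational (NOT ENFORCED) primary key looks like on a Unity Catalog managed table today; the table and constraint names are illustrative, and this requires Unity Catalog (PK columns must also be NOT NULL):

```sql
-- Sketch: informational PK on a Unity Catalog table (not enforced).
-- Assumes a UC-enabled catalog/schema; names are illustrative.
CREATE TABLE IF NOT EXISTS dim_store (
  store_id BIGINT NOT NULL,
  business_key STRING,
  name STRING,
  CONSTRAINT dim_store_pk PRIMARY KEY (store_id)
);
```

The constraint is metadata only: Databricks records it for tools and the optimizer but does not reject duplicate `store_id` values on write.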
10-24-2023 06:25 AM
@SRK The documentation shows an example of how you can apply a PK constraint as an Expectation in DLT:
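The documented pattern, roughly, is to declare a downstream table that aggregates counts per key and attach an expectation to it. A sketch in DLT SQL, reusing the `dim_store` table from the question (the derived table and constraint names are illustrative):

```sql
-- Sketch: check PK uniqueness of dim_store via a DLT expectation.
-- The expectation fires for any store_id that appears more than once.
CREATE OR REFRESH LIVE TABLE dim_store_pk_check (
  CONSTRAINT unique_pk EXPECT (num_entries = 1)
)
AS SELECT
  store_id,
  count(*) AS num_entries
FROM LIVE.dim_store
GROUP BY store_id;
```

This does not enforce the key either; it records violations in the pipeline's data quality metrics (and can drop or fail on violation if you add `ON VIOLATION DROP ROW` or `ON VIOLATION FAIL UPDATE`).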
10-24-2023 08:52 AM
@SRK Please see a copy of this answer on stackoverflow here.
You can use DLT Expectations to have this check (see my previous answer if you're using SQL and not Python):
import dlt
from pyspark.sql import types as T

@dlt.table(name="table1")
def create_table1():
    schema = T.StructType([
        T.StructField("id", T.IntegerType(), True),
        T.StructField("name", T.StringType(), True),
        T.StructField("age", T.IntegerType(), True)])
    data = [(1, "Alice", 25),
            (1, "Bob", 30),  # duplicate id, will trip the expectation
            (3, "Charlie", 40)]
    return spark.createDataFrame(data, schema)

@dlt.table(name="table2")
@dlt.expect("unique_pk", "num_entries = 1")
def create_table2():
    df = dlt.read("table1")
    return df.groupBy("id").count().withColumnRenamed("count", "num_entries")