12-01-2023 09:13 AM
We are able to test identity columns using SQL/Python, but when we try the same using DLT, we do not see any values under the identity column. It is always empty for the column we created: "id BIGINT GENERATED ALWAYS AS IDENTITY".
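For context, a minimal sketch of the kind of non-DLT setup that works as described above, run from a Databricks notebook where spark is available (the table name and columns here are illustrative, not the original ones):

# Outside DLT, an identity column declared in the table DDL is populated on insert.
spark.sql("""
    CREATE TABLE IF NOT EXISTS demo_identity (
        id BIGINT GENERATED ALWAYS AS IDENTITY,
        name STRING
    ) USING DELTA
""")

# Do not supply 'id' values; Delta assigns them automatically.
spark.sql("INSERT INTO demo_identity (name) VALUES ('a'), ('b')")
spark.sql("SELECT * FROM demo_identity").show()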
12-03-2023 07:52 AM - edited 12-03-2023 07:52 AM
Hi @karthik_p, when working with Delta Live Tables (DLT), implementing an identity column (similar to an auto-incrementing primary key) can be achieved using Python syntax.
Let's dive into how you can set up an identity column in DLT using Python:
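For example, a minimal sketch of a streaming table with an identity column declared in the Python schema string (the source table raw_events and its columns are placeholders, not from the original post):

import dlt

@dlt.table(
    comment="Streaming table with an identity column",
    schema="""
        id BIGINT GENERATED ALWAYS AS IDENTITY,
        event_name STRING,
        event_ts TIMESTAMP
    """
)
def events():
    # Returning a streaming DataFrame makes this a streaming table;
    # 'id' values are assigned by Delta as rows are appended.
    return spark.readStream.table("raw_events")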
You can find more details and examples in the official Databricks documentation.
Remember to consider whether you're dealing with a streaming table or not, as this can impact how th...d. Happy DLT-ing!
12-03-2023 08:19 AM
Hi @Kaniz_Fatma, the above example is for a materialized view, and it also uses a STRING type. As per the DLT documentation, identity columns only support:
1. streaming use cases
2. SQL syntax streaming is not supported
I am not seeing any Python example to add a column on the fly or to update one. Also, streaming uses schema inference; I tried to use mergeSchema but still hit the same issue. If we are able to change the schema on the fly in Python, that should confirm the identity column behaviour, please.
12-06-2023 12:20 AM - edited 12-06-2023 12:23 AM
Hi @karthik_p, I apologize for any confusion. Let's address your concerns regarding Delta Lake and identity columns in the context of Python.
Identity Columns in Delta Lake:
Python Example for Identity Columns in Delta Lake:
# Example schema definition with an identity column
import dlt

@dlt.table(
    comment="Raw data on sales",
    schema="""
        id BIGINT GENERATED ALWAYS AS IDENTITY,
        customer_id STRING,
        customer_name STRING,
        number_of_line_items STRING,
        order_datetime STRING,
        order_number LONG,
        order_day_of_week STRING GENERATED ALWAYS AS (dayofweek(order_datetime))
    """,
    partition_cols=["order_day_of_week"]
)
def sales():
    return "..."  # Your data loading logic here (return a DataFrame)

# New rows written to 'sales' get their 'id' values assigned automatically by the
# identity column; note that 'order_day_of_week' is a generated column computed
# from 'order_datetime', not an identity column.
Remember that identity columns provide a convenient way to generate surrogate keys, but their usage should align with the specific requirements of your data processing workflows.
Feel free to ask if you encounter any further issues or need additional assistance!
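After the pipeline has run, one hedged sanity check (the catalog and schema names below are placeholders) is to query the published table and confirm that the identity values were actually assigned:

# Query the published table outside the pipeline to verify the assigned ids.
spark.sql("SELECT id, order_number FROM my_catalog.my_schema.sales LIMIT 5").show()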
01-03-2024 11:19 AM
Hi Kaniz,
I tried the solution above with streaming tables. I defined sales as returning a streaming table, but the identity columns are not populated. They always show as null. Any help is much appreciated.
12-06-2023 08:22 AM
@Kaniz_Fatma thank you for the quick response. We are able to generate identity values for both streaming tables and materialized views. The only confusion I am seeing is with the limitation mentioned in the DLT documentation:
Identity columns are not supported with tables that are the target of APPLY CHANGES INTO and might be recomputed during updates for materialized views. For this reason, Databricks recommends only using identity columns with streaming tables in Delta Live Tables. See Use identity columns in Delta Lake.
Based on the above limitation, does this mean we need to use identity columns only for streaming use cases? When we tested creating materialized views it also worked (data from DataFrames was passed into the MV).
12-08-2023 01:37 AM
Hi @karthik_p, Let's dive into the nuances of Delta Live Tables (DLT) and its limitations regarding identity columns.
Identity Columns and DLT:
Materialized Views and Identity Columns:
Considerations:
In summary, DLT's recommendation to use identity columns primarily with streaming tables doesn't mean you can't create materialized views. However, be aware of the limitations and potential implications when using identity columns in materialized views. Always test thoroughly to ensure the desired behaviour in your specific scenarios.
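To make the distinction concrete, here is a minimal sketch of both shapes in a DLT Python pipeline (the source table raw_orders and its columns are placeholders): the streaming table appends rows incrementally, so assigned identity values persist, while the materialized view may be fully recomputed on an update, so its identity values are not guaranteed to stay stable.

import dlt

IDENTITY_SCHEMA = """
    id BIGINT GENERATED ALWAYS AS IDENTITY,
    customer_id STRING,
    amount DOUBLE
"""

# Streaming table: rows are appended incrementally, so identity values persist.
@dlt.table(comment="Streaming table with an identity column", schema=IDENTITY_SCHEMA)
def orders_stream():
    return spark.readStream.table("raw_orders")

# Materialized view: the result can be recomputed on each update, which is why
# the documentation cautions against relying on identity values here.
@dlt.table(comment="Materialized view with an identity column", schema=IDENTITY_SCHEMA)
def orders_mv():
    return spark.read.table("raw_orders")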