<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Unable to create a record_id column via DLT - Autoloader in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/unable-to-create-a-record-id-column-via-dlt-autoloader/m-p/65997#M32982</link>
    <description>&lt;P&gt;Hi Community,&lt;/P&gt;&lt;P&gt;I'm trying to load data from the landing zone to the bronze layer via DLT- Autoloader, I want to add a column record_id to the bronze table while I fetch my data. I'm also using file arrival trigger in the workflow to update my table incrementally.&lt;/P&gt;&lt;P&gt;I've followed a post in the community forum using monotonically_increasing_id() but it's not supported in the streaming use case. I also do not have any event-time based column to enable watermarking.&lt;/P&gt;&lt;P&gt;Attaching code snippet for reference:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;def create_bronze_tables(
    source_file_path=None,
    target_table_name=None,
    primary_keys=None
):
    &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/97035"&gt;@Dlt&lt;/a&gt;.table(name=f"{target_table_name}")
    def create_sample_tables():
        table = (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "csv")
            .option("inferColumnTypes", True)
            .option('header','true')
            .option("delimiter", "\t")
            .option("mergeSchema", "true")
            .load(f"{source_file_path}")
            .withColumn("record_id", row_number().over(
                Window.orderBy(monotonically_increasing_id())))
        )
        return table&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;the error I face is attached.&lt;/P&gt;&lt;P&gt;Any help in how to create a record-id in this setting is appreciated.&lt;/P&gt;</description>
    <pubDate>Wed, 10 Apr 2024 11:22:19 GMT</pubDate>
    <dc:creator>RiyazAliM</dc:creator>
    <dc:date>2024-04-10T11:22:19Z</dc:date>
    <item>
      <title>Unable to create a record_id column via DLT - Autoloader</title>
      <link>https://community.databricks.com/t5/data-engineering/unable-to-create-a-record-id-column-via-dlt-autoloader/m-p/65997#M32982</link>
      <description>&lt;P&gt;Hi Community,&lt;/P&gt;&lt;P&gt;I'm trying to load data from the landing zone to the bronze layer via DLT- Autoloader, I want to add a column record_id to the bronze table while I fetch my data. I'm also using file arrival trigger in the workflow to update my table incrementally.&lt;/P&gt;&lt;P&gt;I've followed a post in the community forum using monotonically_increasing_id() but it's not supported in the streaming use case. I also do not have any event-time based column to enable watermarking.&lt;/P&gt;&lt;P&gt;Attaching code snippet for reference:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;def create_bronze_tables(
    source_file_path=None,
    target_table_name=None,
    primary_keys=None
):
    &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/97035"&gt;@Dlt&lt;/a&gt;.table(name=f"{target_table_name}")
    def create_sample_tables():
        table = (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "csv")
            .option("inferColumnTypes", True)
            .option('header','true')
            .option("delimiter", "\t")
            .option("mergeSchema", "true")
            .load(f"{source_file_path}")
            .withColumn("record_id", row_number().over(
                Window.orderBy(monotonically_increasing_id())))
        )
        return table&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;the error I face is attached.&lt;/P&gt;&lt;P&gt;Any help in how to create a record-id in this setting is appreciated.&lt;/P&gt;</description>
      <pubDate>Wed, 10 Apr 2024 11:22:19 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unable-to-create-a-record-id-column-via-dlt-autoloader/m-p/65997#M32982</guid>
      <dc:creator>RiyazAliM</dc:creator>
      <dc:date>2024-04-10T11:22:19Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to create a record_id column via DLT - Autoloader</title>
      <link>https://community.databricks.com/t5/data-engineering/unable-to-create-a-record-id-column-via-dlt-autoloader/m-p/66576#M33153</link>
      <description>&lt;P&gt;Hey&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;&amp;nbsp; - could you or anybody from the community team help me here, please? I've been stuck for quite some time now.&lt;/P&gt;</description>
      <pubDate>Thu, 18 Apr 2024 09:36:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unable-to-create-a-record-id-column-via-dlt-autoloader/m-p/66576#M33153</guid>
      <dc:creator>RiyazAliM</dc:creator>
      <dc:date>2024-04-18T09:36:18Z</dc:date>
    </item>
  </channel>
</rss>

