Re: In Python, Streaming read by DLT from Hive Tab...

MetaRossiVinli · ‎04-28-2023

The code below is a solution. I was missing that I could read from an existing table with `spark.readStream.format("delta").table("...")`. Simple, I just missed it. This is different from `dlt.read_stream()`, which appears in the examples a lot and reads from a dataset defined in the same pipeline.

This pattern appears as an example in the docs on CDC: https://docs.databricks.com/delta-live-tables/cdc.html

import dlt

@dlt.table(
    table_properties={"quality": "silver"}
)
def silver_1():
    # Read the changes as a stream from the existing Hive metastore table
    df = spark.readStream.format("delta").table("hive_metastore.dev.bronze_raw")

    # Return the entire dataframe with all columns
    return df
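If several tables in a pipeline read from existing metastore tables, the read can be factored into a small helper. This is only a sketch: `stream_external_table` is a name I made up for illustration, not part of the `dlt` or Spark API, and it assumes the `spark` session that the pipeline provides is passed in explicitly.

```python
def stream_external_table(spark, table_name, fmt="delta"):
    """Return a streaming DataFrame over an existing metastore table.

    Hypothetical helper: `spark` is the active SparkSession and
    `table_name` is a table name such as
    "hive_metastore.dev.bronze_raw".
    """
    # Same call chain as in silver_1 above, with the table name as a parameter
    return spark.readStream.format(fmt).table(table_name)
```

Inside a `@dlt.table` function the body would then be `return stream_external_table(spark, "hive_metastore.dev.bronze_raw")`.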

Reading from a table like this is not explicitly given as an example in the Python ref: https://docs.databricks.com/delta-live-tables/python-ref.html. I think a section called "Reading from sources", with examples of the various ways to read, would save people some time. I will send some feedback on that.
