<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic In Python, Streaming read by DLT from Hive Table in Warehousing &amp; Analytics</title>
    <link>https://community.databricks.com/t5/warehousing-analytics/in-python-streaming-read-by-dlt-from-hive-table/m-p/5073#M82</link>
    <description>&lt;P&gt;I am pulling data from Google BigQuery and writing it to a bronze table on an interval. I do this in a separate continuous job because DLT did not like the BigQuery connector calling collect on a dataframe inside of DLT.&lt;/P&gt;
&lt;P&gt;In Python, I would like to read that bronze table into DLT as a stream and create a silver table with some complex dataframe logic and functions. I can accomplish this with the SQL below, but most of our pipeline is in Python and I'd like to know how to do it there.&lt;/P&gt;
&lt;P&gt;I am probably missing something rather small. I do NOT want to use the absolute path if possible. I would rather reference the table.&lt;/P&gt;
&lt;P&gt;How do I convert the below SQL to Python? Can I use a table reference in Python? Where is this explained in the docs?&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;CREATE STREAMING LIVE VIEW silver_1  -- create a new STREAMING LIVE view called silver_1
AS SELECT *
FROM STREAM(dev.bronze_raw)
-- catalog = hive_metastore
-- schema = dev
-- table = bronze_raw
-- path would be something like = dbfs:/user/hive/warehouse/dev.db/bronze_raw&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Python please...&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;import dlt
???&lt;/CODE&gt;&lt;/PRE&gt;
</description>
    <pubDate>Fri, 21 Mar 2025 12:59:52 GMT</pubDate>
    <dc:creator>MetaRossiVinli</dc:creator>
    <dc:date>2025-03-21T12:59:52Z</dc:date>
    <item>
      <title>In Python, Streaming read by DLT from Hive Table</title>
      <link>https://community.databricks.com/t5/warehousing-analytics/in-python-streaming-read-by-dlt-from-hive-table/m-p/5073#M82</link>
      <description>&lt;P&gt;I am pulling data from Google BigQuery and writing it to a bronze table on an interval. I do this in a separate continuous job because DLT did not like the BigQuery connector calling collect on a dataframe inside of DLT.&lt;/P&gt;
&lt;P&gt;In Python, I would like to read that bronze table into DLT as a stream and create a silver table with some complex dataframe logic and functions. I can accomplish this with the SQL below, but most of our pipeline is in Python and I'd like to know how to do it there.&lt;/P&gt;
&lt;P&gt;I am probably missing something rather small. I do NOT want to use the absolute path if possible. I would rather reference the table.&lt;/P&gt;
&lt;P&gt;How do I convert the below SQL to Python? Can I use a table reference in Python? Where is this explained in the docs?&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;CREATE STREAMING LIVE VIEW silver_1  -- create a new STREAMING LIVE view called silver_1
AS SELECT *
FROM STREAM(dev.bronze_raw)
-- catalog = hive_metastore
-- schema = dev
-- table = bronze_raw
-- path would be something like = dbfs:/user/hive/warehouse/dev.db/bronze_raw&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Python please...&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;import dlt
???&lt;/CODE&gt;&lt;/PRE&gt;
</description>
      <pubDate>Fri, 21 Mar 2025 12:59:52 GMT</pubDate>
      <guid>https://community.databricks.com/t5/warehousing-analytics/in-python-streaming-read-by-dlt-from-hive-table/m-p/5073#M82</guid>
      <dc:creator>MetaRossiVinli</dc:creator>
      <dc:date>2025-03-21T12:59:52Z</dc:date>
    </item>
    <item>
      <title>Re: In Python, Streaming read by DLT from Hive Table</title>
      <link>https://community.databricks.com/t5/warehousing-analytics/in-python-streaming-read-by-dlt-from-hive-table/m-p/5074#M83</link>
      <description>&lt;P&gt;The code below is a solution. I had missed that I could read from a table with `spark.readStream.format("delta").table("...")`. Simple, I just missed it. This is different from `dlt.read_stream()`, which appears in many of the examples and reads a dataset defined in the same pipeline.&lt;/P&gt;&lt;P&gt;This is referenced as an example in the docs on CDC: &lt;A href="https://docs.databricks.com/delta-live-tables/cdc.html" alt="https://docs.databricks.com/delta-live-tables/cdc.html" target="_blank"&gt;https://docs.databricks.com/delta-live-tables/cdc.html&lt;/A&gt;.&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;import dlt
@dlt.table(
    table_properties={"quality": "silver"}
)
def silver_1():
    # Read the changes as a stream from the table
    df = spark.readStream.format("delta").table("hive_metastore.dev.bronze_raw")
    
    # Return the entire dataframe with all columns
    return df&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Reading from a table like this is not given as an explicit example in the Python reference: &lt;A href="https://docs.databricks.com/delta-live-tables/python-ref.html" target="test_blank"&gt;https://docs.databricks.com/delta-live-tables/python-ref.html&lt;/A&gt;. I think a section called "Reading from sources", with examples of the various ways to read, would save people some time. I will send feedback on that.&lt;/P&gt;</description>
      <pubDate>Fri, 28 Apr 2023 22:24:14 GMT</pubDate>
      <guid>https://community.databricks.com/t5/warehousing-analytics/in-python-streaming-read-by-dlt-from-hive-table/m-p/5074#M83</guid>
      <dc:creator>MetaRossiVinli</dc:creator>
      <dc:date>2023-04-28T22:24:14Z</dc:date>
    </item>
  </channel>
</rss>

