<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Set and use variables in DLT pipeline notebooks in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/set-and-use-variables-in-dlt-pipeline-notebooks/m-p/16826#M10928</link>
    <description>&lt;P&gt;Using DLT, I have two streaming sources coming from autoloader. Source1 contains a single row of data in the file and Source2 has thousands of rows. There is a common key column between the two sources to join them together. So far, so good.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I have a function that does a lookup in a CosmosDB database for an item. This function takes an ItemId as a parameter. I use the return value as a filter in a later step in my pipeline. I can hard-code an ItemId in this function and the pipeline runs without issues and pulls back the proper data from Cosmos and filters the DLT table correctly. However, I don't want to hard-code the ItemId but get it's value from Source1 after a file lands. Source1 will always only contain 1 row per file and the ItemId is a column in that row.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I have tried something like this:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;ItemId = spark.sql("SELECT ItemId from STREAM(LIVE.Source1)")
FilterInfo = LookupInCosmosdb(ItemId)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;... but that doesn't work. I've tried several other things and none have worked.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;How can I:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Set a variable from a DLT streaming table&lt;/LI&gt;&lt;LI&gt;Use that variable in another DLT table as a filter clause&lt;/LI&gt;&lt;LI&gt;OR, use a sub-select directly in the WHERE clause of Source2 like this:&lt;OL&gt;&lt;LI&gt;SELECT * FROM Source2 WHERE Source2.ItemId IN (SELECT LookupInCosmosdb(Source1.KeyCol) FROM Source1)&lt;/LI&gt;&lt;/OL&gt;&lt;/LI&gt;&lt;/OL&gt;</description>
    <pubDate>Thu, 15 Dec 2022 01:14:00 GMT</pubDate>
    <dc:creator>kskistad</dc:creator>
    <dc:date>2022-12-15T01:14:00Z</dc:date>
    <item>
      <title>Set and use variables in DLT pipeline notebooks</title>
      <link>https://community.databricks.com/t5/data-engineering/set-and-use-variables-in-dlt-pipeline-notebooks/m-p/16826#M10928</link>
      <description>&lt;P&gt;Using DLT, I have two streaming sources coming from autoloader. Source1 contains a single row of data in the file and Source2 has thousands of rows. There is a common key column between the two sources to join them together. So far, so good.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I have a function that does a lookup in a CosmosDB database for an item. This function takes an ItemId as a parameter. I use the return value as a filter in a later step in my pipeline. I can hard-code an ItemId in this function and the pipeline runs without issues and pulls back the proper data from Cosmos and filters the DLT table correctly. However, I don't want to hard-code the ItemId but get it's value from Source1 after a file lands. Source1 will always only contain 1 row per file and the ItemId is a column in that row.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I have tried something like this:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;ItemId = spark.sql("SELECT ItemId from STREAM(LIVE.Source1)")
FilterInfo = LookupInCosmosdb(ItemId)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;... but that doesn't work. I've tried several other things and none have worked.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;How can I:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Set a variable from a DLT streaming table&lt;/LI&gt;&lt;LI&gt;Use that variable in another DLT table as a filter clause&lt;/LI&gt;&lt;LI&gt;OR, use a sub-select directly in the WHERE clause of Source2 like this:&lt;OL&gt;&lt;LI&gt;SELECT * FROM Source2 WHERE Source2.ItemId IN (SELECT LookupInCosmosdb(Source1.KeyCol) FROM Source1)&lt;/LI&gt;&lt;/OL&gt;&lt;/LI&gt;&lt;/OL&gt;</description>
      <pubDate>Thu, 15 Dec 2022 01:14:00 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/set-and-use-variables-in-dlt-pipeline-notebooks/m-p/16826#M10928</guid>
      <dc:creator>kskistad</dc:creator>
      <dc:date>2022-12-15T01:14:00Z</dc:date>
    </item>
  </channel>
</rss>

