cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Set and use variables in DLT pipeline notebooks

kskistad
New Contributor III

Using DLT, I have two streaming sources coming from autoloader. Source1 contains a single row of data in the file and Source2 has thousands of rows. There is a common key column between the two sources to join them together. So far, so good.

I have a function that does a lookup in a CosmosDB database for an item. This function takes an ItemId as a parameter. I use the return value as a filter in a later step in my pipeline. I can hard-code an ItemId in this function and the pipeline runs without issues and pulls back the proper data from Cosmos and filters the DLT table correctly. However, I don't want to hard-code the ItemId but get it's value from Source1 after a file lands. Source1 will always only contain 1 row per file and the ItemId is a column in that row.

I have tried something like this:

ItemId = spark.sql("SELECT ItemId from STREAM(LIVE.Source1)")
FilterInfo = LookupInCosmosdb(ItemId)

... but that doesn't work. I've tried several other things and none have worked.

How can I:

  1. Set a variable from a DLT streaming table
  2. Use that variable in another DLT table as a filter clause
  3. OR, use a sub-select directly in the WHERE clause of Source2 like this:
    1. SELECT * FROM Source2 WHERE Source2.ItemId IN (SELECT LookupInCosmosdb(Source1.KeyCol) FROM Source1)
0 REPLIES 0

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group