<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Correlated column exception in SQL UDF when using UDF parameters. in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/correlated-column-exception-in-sql-udf-when-using-udf-parameters/m-p/33821#M24742</link>
    <description>&lt;P&gt;Thanks for your suggestion. The fact that I want to do this in SparkSQL is because there is no underlying SQLServer.&lt;/P&gt;</description>
    <pubDate>Fri, 27 Jan 2023 10:52:23 GMT</pubDate>
    <dc:creator>creastysomp</dc:creator>
    <dc:date>2023-01-27T10:52:23Z</dc:date>
    <item>
      <title>Correlated column exception in SQL UDF when using UDF parameters.</title>
      <link>https://community.databricks.com/t5/data-engineering/correlated-column-exception-in-sql-udf-when-using-udf-parameters/m-p/33802#M24723</link>
      <description>&lt;P&gt;&lt;B&gt;Environment&lt;/B&gt;&lt;/P&gt;&lt;P&gt;Azure Databricks 10.1, including Spark 3.2.0&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;Scenario&lt;/B&gt;&lt;/P&gt;&lt;P&gt;I want to retrieve the average of a series of values between two timestamps, using a SQL UDF.&lt;/P&gt;&lt;P&gt;The average is obviously just an example. In a real scenario, I would like to hide some additional querying complexity behind a SQL UDF.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;Tried (and working)&lt;/B&gt;&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;%sql
SELECT avg(temperature) as averageTemperature 
    from oventemperatures 
    where ovenTimestamp
          between to_timestamp("1999-01-01") 
          and to_timestamp("2021-12-31")&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;And this seems possible in a UDF as well.&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;CREATE FUNCTION
averageTemperatureUDF(ovenID STRING, startTime TIMESTAMP, endTime TIMESTAMP) 
   RETURNS FLOAT
   READS SQL DATA SQL SECURITY DEFINER
   RETURN SELECT avg(ovenTemperature) as averageTemperature 
          from oventemperatures 
          where ovenTimestamp
                between to_timestamp("1999-01-01") 
                and to_timestamp("2021-12-31")&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;B&gt;Tried (and failed)&lt;/B&gt;&lt;/P&gt;&lt;P&gt;When I want to use the UDF's parameters in the filter condition, the function definition fails.&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;CREATE FUNCTION
averageTemperatureUDF(ovenID STRING, startTime TIMESTAMP, endTime TIMESTAMP) 
   RETURNS FLOAT
   READS SQL DATA SQL SECURITY DEFINER
   RETURN SELECT avg(ovenTemperature) as averageTemperature 
          from oventemperatures 
          where ovenTimestamp
                between startTime
                and endTime&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;The error message complains on "correlated column".&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;Error in SQL statement: AnalysisException: 
Correlated column is not allowed in predicate 
(spark_catalog.default.oventemperatures.ovenTimestamp &amp;gt;= outer(averageTemperatureUDF.startTime))
(spark_catalog.default.oventemperatures.ovenTimestamp &amp;lt;= outer(averageTemperatureUDF.endTime)):
Aggregate [avg(ovenTemperature#275) AS averageTemperature#9299]
+- Filter ((ovenTimestamp#273 &amp;gt;= outer(startTime#9301)) AND (ovenTimestamp#273 &amp;lt;= outer(endTime#9302)))
   +- SubqueryAlias spark_catalog.default.oventemperatures
      +- Relation default.oventemperatures[ovenTimestamp#273,(...),ovenTemperature#275,(...)] 
	     JDBCRelation(OvenTemperatures) [numPartitions=1]&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;B&gt;Question(s)&lt;/B&gt;&lt;/P&gt;&lt;P&gt;It seems that it is not accepted to use the UDF's parameters inside the expressions.&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Is that a correct conclusion?&lt;/LI&gt;&lt;LI&gt;Any reason for this limitation?&lt;/LI&gt;&lt;LI&gt;Any alternatives to work around this?&lt;/LI&gt;&lt;/UL&gt;</description>
      <pubDate>Mon, 06 Dec 2021 12:38:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/correlated-column-exception-in-sql-udf-when-using-udf-parameters/m-p/33802#M24723</guid>
      <dc:creator>Johan_Van_Noten</dc:creator>
      <dc:date>2021-12-06T12:38:56Z</dc:date>
    </item>
    <item>
      <title>Re: Correlated column exception in SQL UDF when using UDF parameters.</title>
      <link>https://community.databricks.com/t5/data-engineering/correlated-column-exception-in-sql-udf-when-using-udf-parameters/m-p/33803#M24724</link>
      <description>&lt;P&gt;Hello there, @Johan Van Noten​! &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;My name is Piper and I'm one of the moderators here. Welcome to the community and thanks for your questions. Let's give it a while longer to see how the community responds. We can circle back to this later if we need to. &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 06 Dec 2021 23:45:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/correlated-column-exception-in-sql-udf-when-using-udf-parameters/m-p/33803#M24724</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2021-12-06T23:45:42Z</dc:date>
    </item>
    <item>
      <title>Re: Correlated column exception in SQL UDF when using UDF parameters.</title>
      <link>https://community.databricks.com/t5/data-engineering/correlated-column-exception-in-sql-udf-when-using-udf-parameters/m-p/33804#M24725</link>
      <description>&lt;P&gt;@Johan Van Noten​&amp;nbsp;- I'm sorry about the delay. Our SMEs are running behind. &lt;span class="lia-unicode-emoji" title=":disappointed_face:"&gt;😞&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I'll bump this back to them. Hang in there.&lt;/P&gt;</description>
      <pubDate>Tue, 28 Dec 2021 16:13:17 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/correlated-column-exception-in-sql-udf-when-using-udf-parameters/m-p/33804#M24725</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2021-12-28T16:13:17Z</dc:date>
    </item>
    <item>
      <title>Re: Correlated column exception in SQL UDF when using UDF parameters.</title>
      <link>https://community.databricks.com/t5/data-engineering/correlated-column-exception-in-sql-udf-when-using-udf-parameters/m-p/33805#M24726</link>
      <description>&lt;P&gt;Hi @Johan Van Noten​&amp;nbsp;,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;It seems like there is an issue with the columns that are part of your "where" clause. These columns are part of the UDF itself. I would like to recommend to the check the following docs &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;1) &lt;A href="https://docs.databricks.com/spark/latest/spark-sql/language-manual/sql-ref-syntax-ddl-create-function.html" target="test_blank"&gt;https://docs.databricks.com/spark/latest/spark-sql/language-manual/sql-ref-syntax-ddl-create-function.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;2) &lt;A href="https://docs.databricks.com/spark/latest/spark-sql/language-manual/sql-ref-functions-udf-scalar.html" target="test_blank"&gt;https://docs.databricks.com/spark/latest/spark-sql/language-manual/sql-ref-functions-udf-scalar.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;3) &lt;A href="https://docs.databricks.com/spark/latest/spark-sql/language-manual/sql-ref-functions-udf-aggregate.html" target="test_blank"&gt;https://docs.databricks.com/spark/latest/spark-sql/language-manual/sql-ref-functions-udf-aggregate.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Hopefully, these link could be helpful to solve this issue.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thank you.&lt;/P&gt;</description>
      <pubDate>Wed, 12 Jan 2022 00:54:40 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/correlated-column-exception-in-sql-udf-when-using-udf-parameters/m-p/33805#M24726</guid>
      <dc:creator>jose_gonzalez</dc:creator>
      <dc:date>2022-01-12T00:54:40Z</dc:date>
    </item>
    <item>
      <title>Re: Correlated column exception in SQL UDF when using UDF parameters.</title>
      <link>https://community.databricks.com/t5/data-engineering/correlated-column-exception-in-sql-udf-when-using-udf-parameters/m-p/33806#M24727</link>
      <description>&lt;P&gt;HI @Johan Van Noten​&amp;nbsp;&lt;/P&gt;&lt;P&gt;Were you able to resolve this issue? I am stuck on a same issue. Please if you can guide me if you are able to resolve this issue.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thank You.&lt;/P&gt;</description>
      <pubDate>Tue, 05 Jul 2022 05:21:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/correlated-column-exception-in-sql-udf-when-using-udf-parameters/m-p/33806#M24727</guid>
      <dc:creator>lav</dc:creator>
      <dc:date>2022-07-05T05:21:42Z</dc:date>
    </item>
    <item>
      <title>Re: Correlated column exception in SQL UDF when using UDF parameters.</title>
      <link>https://community.databricks.com/t5/data-engineering/correlated-column-exception-in-sql-udf-when-using-udf-parameters/m-p/33807#M24728</link>
      <description>&lt;P&gt;Hi Iav&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Unfortunately, I've not been able to solve this or work around it.&lt;/P&gt;&lt;P&gt;Based on @Jose Gonzalez​&amp;nbsp;' reply, I've been reading the documents he pointed to and I concluded that this is just not possible in SparkSQL. It significantly disables the usability of UDFs, but I can understand why it doesn't work.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Johan&lt;/P&gt;</description>
      <pubDate>Tue, 05 Jul 2022 07:09:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/correlated-column-exception-in-sql-udf-when-using-udf-parameters/m-p/33807#M24728</guid>
      <dc:creator>Johan_Van_Noten</dc:creator>
      <dc:date>2022-07-05T07:09:18Z</dc:date>
    </item>
    <item>
      <title>Re: Correlated column exception in SQL UDF when using UDF parameters.</title>
      <link>https://community.databricks.com/t5/data-engineering/correlated-column-exception-in-sql-udf-when-using-udf-parameters/m-p/33808#M24729</link>
      <description>&lt;P&gt;This UDF takes two parameters startDate and tillDate, and returns an integer number. Create a udf in sql server with return type integer.&lt;/P&gt;</description>
      <pubDate>Tue, 05 Jul 2022 09:16:19 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/correlated-column-exception-in-sql-udf-when-using-udf-parameters/m-p/33808#M24729</guid>
      <dc:creator>Robert_Smith</dc:creator>
      <dc:date>2022-07-05T09:16:19Z</dc:date>
    </item>
    <item>
      <title>Re: Correlated column exception in SQL UDF when using UDF parameters.</title>
      <link>https://community.databricks.com/t5/data-engineering/correlated-column-exception-in-sql-udf-when-using-udf-parameters/m-p/33809#M24730</link>
      <description>&lt;P&gt;Hi Robert,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks for your suggestion. The fact that I want to do this in SparkSQL is because there is no underlying SQLServer. Well, there might be, but often there isn't. We make use of the possibility of SparkSQL to abstract the underlying data stores (Parquet files, CSV files or SQL data) into a single SQL-alike layer. That is a kind of virtual curation.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;In that context, I do not see an option to delegate the UDF to a lower SQL server. Am I overlooking something?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Best regards,&lt;/P&gt;&lt;P&gt;Johan&lt;/P&gt;</description>
      <pubDate>Tue, 05 Jul 2022 13:38:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/correlated-column-exception-in-sql-udf-when-using-udf-parameters/m-p/33809#M24730</guid>
      <dc:creator>Johan_Van_Noten</dc:creator>
      <dc:date>2022-07-05T13:38:36Z</dc:date>
    </item>
    <item>
      <title>Re: Correlated column exception in SQL UDF when using UDF parameters.</title>
      <link>https://community.databricks.com/t5/data-engineering/correlated-column-exception-in-sql-udf-when-using-udf-parameters/m-p/33810#M24731</link>
      <description>&lt;P&gt;Hi @Johan Van Noten​&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I got a work around it. If this helps you. Below is the query I wrote&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Target Query:&lt;/P&gt;&lt;P&gt;create or replace function TestAverage(DateBudget Date) RETURNS FLOAT&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;return select Avg(pd.Amount) as Amount&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;from TestTable1 as pd&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;left join TestTable2 er&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;on er.PK = pd.Exchange_Rate_PK&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;where sign(DATEDIFF(DateBudget, date_add(to_date(pd.From_Date), -1))) = 1&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;and sign(DATEDIFF(DateBudget, date_add(to_date(pd.To_Date), 1))) = -1;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Original Query:&lt;/P&gt;&lt;P&gt;select AVG(pd.Amount) as Amount&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;from TestTable1 as pd&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;left join TestTable2 er&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;on er.PK = pd.Exchange_Rate_PK&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;where fci.DATEBUDGET &amp;gt;= pd.From_Date&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;and fci.DATEBUDGET &amp;lt;= pd.To_Date&lt;/P&gt;</description>
      <pubDate>Tue, 05 Jul 2022 16:40:38 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/correlated-column-exception-in-sql-udf-when-using-udf-parameters/m-p/33810#M24731</guid>
      <dc:creator>lav</dc:creator>
      <dc:date>2022-07-05T16:40:38Z</dc:date>
    </item>
    <item>
      <title>Re: Correlated column exception in SQL UDF when using UDF parameters.</title>
      <link>https://community.databricks.com/t5/data-engineering/correlated-column-exception-in-sql-udf-when-using-udf-parameters/m-p/33811#M24732</link>
      <description>&lt;P&gt;I have used sign function to determine the date diff.&lt;/P&gt;</description>
      <pubDate>Tue, 05 Jul 2022 16:41:01 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/correlated-column-exception-in-sql-udf-when-using-udf-parameters/m-p/33811#M24732</guid>
      <dc:creator>lav</dc:creator>
      <dc:date>2022-07-05T16:41:01Z</dc:date>
    </item>
    <item>
      <title>Re: Correlated column exception in SQL UDF when using UDF parameters.</title>
      <link>https://community.databricks.com/t5/data-engineering/correlated-column-exception-in-sql-udf-when-using-udf-parameters/m-p/33812#M24733</link>
      <description>&lt;P&gt;Hi @Johan Van Noten​&amp;nbsp;,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Were you able to work around this issue. The UDF does not mind an equivalent operator but does not like it when the predicate has a non equivalent operator like &amp;gt; or &amp;lt;. &lt;/P&gt;</description>
      <pubDate>Thu, 01 Dec 2022 19:57:54 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/correlated-column-exception-in-sql-udf-when-using-udf-parameters/m-p/33812#M24733</guid>
      <dc:creator>Raghu_Bindingan</dc:creator>
      <dc:date>2022-12-01T19:57:54Z</dc:date>
    </item>
    <item>
      <title>Re: Correlated column exception in SQL UDF when using UDF parameters.</title>
      <link>https://community.databricks.com/t5/data-engineering/correlated-column-exception-in-sql-udf-when-using-udf-parameters/m-p/33813#M24734</link>
      <description>&lt;P&gt;Hi @Raghu Bindinganavale​&amp;nbsp;,&lt;/P&gt;&lt;P&gt;I wasn't able to work around this and considered it a restriction of SparkSQL. We are coping with it on a higher layer now, which is unfortunate.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Johan&lt;/P&gt;</description>
      <pubDate>Fri, 02 Dec 2022 08:39:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/correlated-column-exception-in-sql-udf-when-using-udf-parameters/m-p/33813#M24734</guid>
      <dc:creator>Johan_Van_Noten</dc:creator>
      <dc:date>2022-12-02T08:39:55Z</dc:date>
    </item>
    <item>
      <title>Re: Correlated column exception in SQL UDF when using UDF parameters.</title>
      <link>https://community.databricks.com/t5/data-engineering/correlated-column-exception-in-sql-udf-when-using-udf-parameters/m-p/33814#M24735</link>
      <description>&lt;P&gt;Iav's solution works.​&lt;/P&gt;</description>
      <pubDate>Fri, 02 Dec 2022 12:04:11 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/correlated-column-exception-in-sql-udf-when-using-udf-parameters/m-p/33814#M24735</guid>
      <dc:creator>Own</dc:creator>
      <dc:date>2022-12-02T12:04:11Z</dc:date>
    </item>
    <item>
      <title>Re: Correlated column exception in SQL UDF when using UDF parameters.</title>
      <link>https://community.databricks.com/t5/data-engineering/correlated-column-exception-in-sql-udf-when-using-udf-parameters/m-p/33815#M24736</link>
      <description>&lt;P&gt;In case of Azure Databricks you can leverage ADF and run this function using SQL Integration runtime while ingesting without having any dependency on lower SQL environment.&lt;/P&gt;</description>
      <pubDate>Fri, 02 Dec 2022 12:12:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/correlated-column-exception-in-sql-udf-when-using-udf-parameters/m-p/33815#M24736</guid>
      <dc:creator>Own</dc:creator>
      <dc:date>2022-12-02T12:12:39Z</dc:date>
    </item>
    <item>
      <title>Re: Correlated column exception in SQL UDF when using UDF parameters.</title>
      <link>https://community.databricks.com/t5/data-engineering/correlated-column-exception-in-sql-udf-when-using-udf-parameters/m-p/33816#M24737</link>
      <description>&lt;P&gt;Hi @Johan Van Noten​&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I got it to work, its a very different kind of SQL then what i am used to but it works, you could try the below and tweak it to your needs. Basically spark sql does not like predicates that are not equal, so you will need to use a self join to the same table and create a column that equates to your conditions and then join that back to your main table to get the desired result. Very different from our daily SQL &lt;span class="lia-unicode-emoji" title=":grinning_face:"&gt;😀&lt;/span&gt; &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;CREATE FUNCTION&lt;/P&gt;&lt;P&gt;averageTemperatureUDF(ovenID STRING, startTime TIMESTAMP, endTime TIMESTAMP)&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;RETURNS FLOAT&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;READS SQL DATA SQL SECURITY DEFINER&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;RETURN SELECT avg(ovenTemperature) as averageTemperature&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;from oventemperatures base&lt;/P&gt;&lt;P&gt;		&amp;nbsp;INNER JOIN (SELECT ovenTimestamp BETWEEN startTime AND endTime AS label, ovenTimestamp FROM oventemperatures) *****&lt;/P&gt;&lt;P&gt;			ON base.ovenTimestamp = *****.ovenTimestamp&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;where *****.label IS true&lt;/P&gt;</description>
      <pubDate>Fri, 02 Dec 2022 13:11:15 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/correlated-column-exception-in-sql-udf-when-using-udf-parameters/m-p/33816#M24737</guid>
      <dc:creator>Raghu_Bindingan</dc:creator>
      <dc:date>2022-12-02T13:11:15Z</dc:date>
    </item>
    <item>
      <title>Re: Correlated column exception in SQL UDF when using UDF parameters.</title>
      <link>https://community.databricks.com/t5/data-engineering/correlated-column-exception-in-sql-udf-when-using-udf-parameters/m-p/33817#M24738</link>
      <description>&lt;P&gt;Hi @Raghu Bindinganavale​&amp;nbsp;,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks for your suggestion.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I'm not sure how the SQL engine will optimize this query, but you are essentially calculating the condition-label over all oventemperatures in the full storage (millions) and only afterwards filtering the good from the bad ones in the where clause.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;While the oventemperatures (virtual) table is partitioned on ovenTimestamp and should therefore be able to resolve the BETWEEN extremely quickly, I guess that your solution may not run with acceptable performance. Do you think the query optimizer will be able to work its way through this efficiently?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;In my scenario description, I indicated that this is just a simplified example to clarify the issue. I guess though that in our general cases, isolating the condition in the way you propose may be possible. Although I remain sceptical about the performance. I'll need to try your proposal once I have the chance to refactor the application.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Best regards,&lt;/P&gt;&lt;P&gt;Johan&lt;/P&gt;</description>
      <pubDate>Fri, 02 Dec 2022 14:14:07 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/correlated-column-exception-in-sql-udf-when-using-udf-parameters/m-p/33817#M24738</guid>
      <dc:creator>Johan_Van_Noten</dc:creator>
      <dc:date>2022-12-02T14:14:07Z</dc:date>
    </item>
    <item>
      <title>Re: Correlated column exception in SQL UDF when using UDF parameters.</title>
      <link>https://community.databricks.com/t5/data-engineering/correlated-column-exception-in-sql-udf-when-using-udf-parameters/m-p/33818#M24739</link>
      <description>&lt;P&gt;Thanks guys, &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;@Lav Chandaria​&amp;nbsp;and @Raghu Bindinganavale​&amp;nbsp;'s solutions both work, but as indicated in my reply above, I'm worried about the performance of evaluating the datediff (Lav) or the label (Raghu) over the full dataset without the engine being able to just cut away "half" of the potential outcomes based on the single &amp;lt; and information from an index / partition. (see my comment above to Raghu).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I'll try it on a significant dataset once I get the opportunity.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Johan&lt;/P&gt;</description>
      <pubDate>Fri, 02 Dec 2022 14:26:07 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/correlated-column-exception-in-sql-udf-when-using-udf-parameters/m-p/33818#M24739</guid>
      <dc:creator>Johan_Van_Noten</dc:creator>
      <dc:date>2022-12-02T14:26:07Z</dc:date>
    </item>
    <item>
      <title>Re: Correlated column exception in SQL UDF when using UDF parameters.</title>
      <link>https://community.databricks.com/t5/data-engineering/correlated-column-exception-in-sql-udf-when-using-udf-parameters/m-p/33819#M24740</link>
      <description>&lt;P&gt;HI @Johan Van Noten​&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;In the instance that i had which was quite simple it did perform ok, but you are right, about performance, this is something that needs to be monitored. I am sure SQL server would go crazy with this approach.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Raghu&lt;/P&gt;</description>
      <pubDate>Fri, 02 Dec 2022 14:28:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/correlated-column-exception-in-sql-udf-when-using-udf-parameters/m-p/33819#M24740</guid>
      <dc:creator>Raghu_Bindingan</dc:creator>
      <dc:date>2022-12-02T14:28:45Z</dc:date>
    </item>
    <item>
      <title>Re: Correlated column exception in SQL UDF when using UDF parameters.</title>
      <link>https://community.databricks.com/t5/data-engineering/correlated-column-exception-in-sql-udf-when-using-udf-parameters/m-p/33820#M24741</link>
      <description>&lt;P&gt;Hi Iav,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;With sign I still get Correlated column bug, any thoughts?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;CREATE FUNCTION IF NOT EXISTS rw_weekday_diff(start_date DATE, end_date DATE)&lt;/P&gt;&lt;P&gt;RETURNS INT&lt;/P&gt;&lt;P&gt;COMMENT 'Calculated days between two dates excluding weekend'&lt;/P&gt;&lt;P&gt;RETURN DATEDIFF(DAY, start_date, end_date)&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;- (DATEDIFF(WEEK, start_date, end_date) * 2)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;- (CASE WHEN dayofweek(start_date) = 1 THEN 1 ELSE 0 END)&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;+ (CASE WHEN dayofweek(start_date) = 7 THEN 1 ELSE 0 END)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;- (SELECT COUNT(*) FROM bank_holidays AS h WHERE&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;sign(DATEDIFF(h.date, date_add(to_date(start_date), -1))) = 1 AND&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;sign(DATEDIFF(h.date, date_add(to_date(end_date), 1))) = -1);&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;--&amp;nbsp;&amp;nbsp;&amp;nbsp;original where clause - (SELECT COUNT(*) FROM bank_holidays.uk_bank_holidays AS h WHERE h.date &amp;gt;= start_date AND h.date = end_date);&lt;/P&gt;</description>
      <pubDate>Tue, 24 Jan 2023 17:14:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/correlated-column-exception-in-sql-udf-when-using-udf-parameters/m-p/33820#M24741</guid>
      <dc:creator>Rhys</dc:creator>
      <dc:date>2023-01-24T17:14:04Z</dc:date>
    </item>
    <item>
      <title>Re: Correlated column exception in SQL UDF when using UDF parameters.</title>
      <link>https://community.databricks.com/t5/data-engineering/correlated-column-exception-in-sql-udf-when-using-udf-parameters/m-p/33821#M24742</link>
      <description>&lt;P&gt;Thanks for your suggestion. The fact that I want to do this in SparkSQL is because there is no underlying SQLServer.&lt;/P&gt;</description>
      <pubDate>Fri, 27 Jan 2023 10:52:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/correlated-column-exception-in-sql-udf-when-using-udf-parameters/m-p/33821#M24742</guid>
      <dc:creator>creastysomp</dc:creator>
      <dc:date>2023-01-27T10:52:23Z</dc:date>
    </item>
  </channel>
</rss>

