<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: ApproxQuantile does not seem to be working with delta live tables (DLT) in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/approdxquantile-does-not-seem-to-be-working-with-delta-live/m-p/10483#M5653</link>
    <description>&lt;P&gt;Try reading the table with the standard df = spark.read.table("customer_order_silver") to calculate approxQuantile, and test it in a separate notebook first.&lt;/P&gt;&lt;P&gt;Of course, customer_order_silver must have a target location set in the catalog so that a read using regular spark.read will work.&lt;/P&gt;</description>
    <pubDate>Mon, 30 Jan 2023 18:15:00 GMT</pubDate>
    <dc:creator>Hubert-Dudek</dc:creator>
    <dc:date>2023-01-30T18:15:00Z</dc:date>
    <item>
      <title>ApproxQuantile does not seem to be working with delta live tables (DLT)</title>
      <link>https://community.databricks.com/t5/data-engineering/approdxquantile-does-not-seem-to-be-working-with-delta-live/m-p/10482#M5652</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I am trying to use the approxQuantile() function to populate a list, yet whenever I run the code it is as if the list is empty and has no values in it.&lt;/P&gt;&lt;P&gt;The code is written as below:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;import dlt
from pyspark.sql import functions as F
from pyspark.sql.functions import col

@dlt.table(name = "customer_order_silver_v2")
def capping_unitPrice_Qt():
    df =  dlt.read("customer_order_silver")
    boundary_unit = [0,0]
    boundary_qty = [0,0]
    boundary_unit = df.select(col("UnitPrice")).approxQuantile('UnitPrice',[0.05,0.95], 0.25)
    boundary_qty = df.select(col("Quantity")).approxQuantile('Quantity',[0.05,0.95], 0.25)
    df = df.withColumn('UnitPrice', F.when(col('UnitPrice') &amp;gt; boundary_unit[1], boundary_unit[1])
                                       .when(col('UnitPrice') &amp;lt; boundary_unit[0], boundary_unit[0])
                                       .otherwise(col('UnitPrice')))
    
    df = df.withColumn('Quantity', F.when(col('Quantity') &amp;gt; boundary_qty[1], boundary_qty[1])
                                       .when(col('Quantity') &amp;lt; boundary_qty[0], boundary_qty[0])
                                       .otherwise(col('Quantity')))
                                          
    return df&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;The output that I get when running it is below:&lt;/P&gt;&lt;P&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/751i104EB80533CF7FB6/image-size/large?v=v2&amp;amp;px=999" alt="Screenshot_20230130_053953" /&gt;&lt;/P&gt;&lt;P&gt;Am I missing something somewhere? Any advice or ideas are welcome.&lt;/P&gt;</description>
      <pubDate>Mon, 30 Jan 2023 16:41:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/approdxquantile-does-not-seem-to-be-working-with-delta-live/m-p/10482#M5652</guid>
      <dc:creator>Trodenn</dc:creator>
      <dc:date>2023-01-30T16:41:20Z</dc:date>
    </item>
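<!--
The capping step in the question can be sketched without Spark. This is a minimal plain-Python illustration, assuming made-up sample prices and using the standard library's statistics.quantiles, which computes the cut points exactly, unlike Spark's approximate approxQuantile. The names percentile_bounds and cap are hypothetical helpers, not part of any Databricks or PySpark API.

```python
import statistics


def percentile_bounds(values):
    """Return the 5th and 95th percentile cut points of values."""
    # n=20 splits the data into 20 equal-probability groups and yields
    # 19 cut points; the first is the 5th percentile, the last the 95th.
    cuts = statistics.quantiles(values, n=20, method="inclusive")
    return cuts[0], cuts[-1]


def cap(values, low, high):
    """Clamp every value into [low, high], like the when/otherwise chain."""
    return [min(max(v, low), high) for v in values]


# Hypothetical sample data with one extreme outlier.
prices = [1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 6.0, 500.0]
low, high = percentile_bounds(prices)
capped = cap(prices, low, high)
print(low, high)
print(capped)
```

One thing worth checking in the Spark version: the third argument to approxQuantile is the relative error, and a value of 0.25 allows the returned value's rank to be off by up to a quarter of the rows, so the 0.05 and 0.95 bounds may land far from the true quantiles. A smaller value such as 0.01 is likely closer to what is intended, at a higher computation cost.
-->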
    <item>
      <title>Re: ApproxQuantile does not seem to be working with delta live tables (DLT)</title>
      <link>https://community.databricks.com/t5/data-engineering/approdxquantile-does-not-seem-to-be-working-with-delta-live/m-p/10483#M5653</link>
      <description>&lt;P&gt;Try reading the table with the standard df = spark.read.table("customer_order_silver") to calculate approxQuantile, and test it in a separate notebook first.&lt;/P&gt;&lt;P&gt;Of course, customer_order_silver must have a target location set in the catalog so that a read using regular spark.read will work.&lt;/P&gt;</description>
      <pubDate>Mon, 30 Jan 2023 18:15:00 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/approdxquantile-does-not-seem-to-be-working-with-delta-live/m-p/10483#M5653</guid>
      <dc:creator>Hubert-Dudek</dc:creator>
      <dc:date>2023-01-30T18:15:00Z</dc:date>
    </item>
    <item>
      <title>Re: ApproxQuantile does not seem to be working with delta live tables (DLT)</title>
      <link>https://community.databricks.com/t5/data-engineering/approdxquantile-does-not-seem-to-be-working-with-delta-live/m-p/10484#M5654</link>
      <description>&lt;P&gt;I see what you are suggesting. If I were to run it in the same notebook but in a different cell that is not a @dlt.table, would it work? I ask because I need to determine the quantiles and then use them to make changes to the table.&lt;/P&gt;&lt;P&gt;To read a delta live table, do I just use spark.read.table("customer_order_silver")?&lt;/P&gt;</description>
      <pubDate>Mon, 30 Jan 2023 18:18:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/approdxquantile-does-not-seem-to-be-working-with-delta-live/m-p/10484#M5654</guid>
      <dc:creator>Trodenn</dc:creator>
      <dc:date>2023-01-30T18:18:53Z</dc:date>
    </item>
    <item>
      <title>Re: ApproxQuantile does not seem to be working with delta live tables (DLT)</title>
      <link>https://community.databricks.com/t5/data-engineering/approdxquantile-does-not-seem-to-be-working-with-delta-live/m-p/10485#M5655</link>
      <description>&lt;P&gt;It will work inside def capping_unitPrice_Qt(); I am using precisely the same approach.&lt;/P&gt;&lt;P&gt;&lt;I&gt;To read a delta live table do I just use spark.read.table("customer_order_silver")?&lt;/I&gt;&lt;/P&gt;&lt;P&gt;Yes, if the table is registered in the metastore. Usually, you prefix it with a database/schema name (so database.customer_order_silver). The name of the database is specified in the DLT settings.&lt;/P&gt;</description>
      <pubDate>Mon, 30 Jan 2023 18:22:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/approdxquantile-does-not-seem-to-be-working-with-delta-live/m-p/10485#M5655</guid>
      <dc:creator>Hubert-Dudek</dc:creator>
      <dc:date>2023-01-30T18:22:04Z</dc:date>
    </item>
    <item>
      <title>Re: ApproxQuantile does not seem to be working with delta live tables (DLT)</title>
      <link>https://community.databricks.com/t5/data-engineering/approdxquantile-does-not-seem-to-be-working-with-delta-live/m-p/10486#M5656</link>
      <description>&lt;P&gt;What if this is not a database but another delta live table? Do correct me if it's the same thing. I have only just started learning this tool and Spark.&lt;/P&gt;</description>
      <pubDate>Mon, 30 Jan 2023 18:25:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/approdxquantile-does-not-seem-to-be-working-with-delta-live/m-p/10486#M5656</guid>
      <dc:creator>Trodenn</dc:creator>
      <dc:date>2023-01-30T18:25:04Z</dc:date>
    </item>
    <item>
      <title>Re: ApproxQuantile does not seem to be working with delta live tables (DLT)</title>
      <link>https://community.databricks.com/t5/data-engineering/approdxquantile-does-not-seem-to-be-working-with-delta-live/m-p/10487#M5657</link>
      <description>&lt;P&gt;So I tried running the code inside the dlt function, and now it tells me that it cannot find the table. Do I need to do anything to let it know where the table is, such as adding the path to it?&lt;/P&gt;</description>
      <pubDate>Mon, 30 Jan 2023 18:34:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/approdxquantile-does-not-seem-to-be-working-with-delta-live/m-p/10487#M5657</guid>
      <dc:creator>Trodenn</dc:creator>
      <dc:date>2023-01-30T18:34:43Z</dc:date>
    </item>
  </channel>
</rss>

