approxQuantile does not seem to be working with Delta Live Tables (DLT)

Trodenn
New Contributor III

Hi,

I am trying to use the approxQuantile() function to populate a list that I made, yet somehow, whenever I run the code, it's as if the list is empty and there are no values in it.

The code is written as follows:

import dlt
from pyspark.sql import functions as F
from pyspark.sql.functions import col

@dlt.table(name="customer_order_silver_v2")
def capping_unitPrice_Qt():
    df = dlt.read("customer_order_silver")

    # 5th/95th percentiles; the third argument is the relative error,
    # and 0.25 is quite coarse (0.0 would compute exact quantiles).
    boundary_unit = df.approxQuantile("UnitPrice", [0.05, 0.95], 0.25)
    boundary_qty = df.approxQuantile("Quantity", [0.05, 0.95], 0.25)

    # Cap UnitPrice to the [5th, 95th] percentile range
    df = df.withColumn("UnitPrice", F.when(col("UnitPrice") > boundary_unit[1], boundary_unit[1])
                                     .when(col("UnitPrice") < boundary_unit[0], boundary_unit[0])
                                     .otherwise(col("UnitPrice")))

    # Cap Quantity the same way
    df = df.withColumn("Quantity", F.when(col("Quantity") > boundary_qty[1], boundary_qty[1])
                                    .when(col("Quantity") < boundary_qty[0], boundary_qty[0])
                                    .otherwise(col("Quantity")))

    return df

The output that I get when running is below:

(screenshot of the pipeline run output, omitted)

Am I missing something somewhere? Any advice or ideas are welcome.


5 REPLIES

Hubert-Dudek
Esteemed Contributor III (Accepted Solution)

Maybe try using the standard df = spark.read.table("customer_order_silver") to calculate approxQuantile (and first test it in a separate notebook).

Of course, you need to set a target location in the catalog for customer_order_silver, so that reading with regular spark.read will work.
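A minimal sketch of that suggestion, assuming the pipeline already publishes customer_order_silver to a target database (the name my_db below is a placeholder, not from the original thread):

# `spark` is the SparkSession provided by the Databricks runtime.
# Read the published table through the metastore instead of dlt.read();
# "my_db" is a hypothetical target database configured in the pipeline settings.
src = spark.read.table("my_db.customer_order_silver")

# Compute the capping boundaries exactly as before.
boundary_unit = src.approxQuantile("UnitPrice", [0.05, 0.95], 0.25)
boundary_qty = src.approxQuantile("Quantity", [0.05, 0.95], 0.25)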

Trodenn
New Contributor III

I see what you are suggesting. If I were to run it in the same notebook but in a different cell that is not a @dlt.table, would it work? I need to determine the quantiles and then use them to make changes to the table, which is why I ask.

To read a Delta Live Table, do I just use spark.read.table("customer_order_silver")?

Hubert-Dudek
Esteemed Contributor III

It will work inside def capping_unitPrice_Qt(); I am using precisely the same approach.

To read a Delta Live Table, do I just use spark.read.table("customer_order_silver")?

Yes, if the table is registered in the metastore. Usually you prefix it with the database/schema name (so database.customer_order_silver). The name of that database is specified in the DLT pipeline settings.
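For example (my_db being a placeholder for whatever database the pipeline is configured to publish to):

df = spark.read.table("my_db.customer_order_silver")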

Trodenn
New Contributor III

What if this is not a database but another Delta Live Table? Do correct me if it's the same thing. I really just started learning this tool and Spark.

Trodenn
New Contributor III

So I tried running the code inside the DLT function, and it tells me that it cannot find the table now. Do I need to do anything to make it know where the table is, like adding the path to it?
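(Editor's note: the publishing step mentioned above is configured in the DLT pipeline settings. As a sketch, the relevant field looks roughly like this, with the pipeline name and database being hypothetical:)

{
  "name": "customer_orders_pipeline",
  "target": "my_db"
}

Once the pipeline has run with a target set, the tables it creates become readable via spark.read.table("my_db.<table_name>").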
