Data Engineering

approxQuantile does not seem to be working with Delta Live Tables (DLT)

Trodenn
New Contributor III

Hi,

I am trying to use the approxQuantile() function to populate a list that I made, yet whenever I run the code it behaves as if the list is empty and there are no values in it.

The code is written below:

import dlt
from pyspark.sql import functions as F
from pyspark.sql.functions import col

@dlt.table(name="customer_order_silver_v2")
def capping_unitPrice_Qt():
    df = dlt.read("customer_order_silver")

    # Lists meant to hold the 5th/95th percentile boundaries
    boundary_unit = [0, 0]
    boundary_qty = [0, 0]
    boundary_unit = df.select(col("UnitPrice")).approxQuantile("UnitPrice", [0.05, 0.95], 0.25)
    boundary_qty = df.select(col("Quantity")).approxQuantile("Quantity", [0.05, 0.95], 0.25)

    # Cap UnitPrice at the computed boundaries
    df = df.withColumn("UnitPrice", F.when(col("UnitPrice") > boundary_unit[1], boundary_unit[1])
                                     .when(col("UnitPrice") < boundary_unit[0], boundary_unit[0])
                                     .otherwise(col("UnitPrice")))

    # Cap Quantity at the computed boundaries
    df = df.withColumn("Quantity", F.when(col("Quantity") > boundary_qty[1], boundary_qty[1])
                                    .when(col("Quantity") < boundary_qty[0], boundary_qty[0])
                                    .otherwise(col("Quantity")))

    return df

The output that I get when running is below:

[Screenshot of the output: Screenshot_20230130_053953]

Am I missing something somewhere? Any advice or ideas are welcome.


5 REPLIES

Hubert-Dudek
Esteemed Contributor III
(Accepted Solution)

Maybe try using the standard df = spark.read.table("customer_order_silver") to calculate approxQuantile (and test it first in a separate notebook).

Of course, you need to set a target location for customer_order_silver in the catalog, so that reading with regular spark.read will work.
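
A minimal sketch of what that suggestion could look like inside the DLT function, assuming the pipeline's target database is named mydb (a placeholder; use whatever database is configured in your pipeline settings):

import dlt
from pyspark.sql import functions as F
from pyspark.sql.functions import col

@dlt.table(name="customer_order_silver_v2")
def capping_unitPrice_Qt():
    # Read the materialized table through the metastore instead of dlt.read();
    # "mydb" is a placeholder for the target database set in the DLT pipeline.
    df = spark.read.table("mydb.customer_order_silver")

    # approxQuantile() now runs against a regular DataFrame and returns values
    boundary_unit = df.approxQuantile("UnitPrice", [0.05, 0.95], 0.25)

    df = df.withColumn("UnitPrice", F.when(col("UnitPrice") > boundary_unit[1], boundary_unit[1])
                                     .when(col("UnitPrice") < boundary_unit[0], boundary_unit[0])
                                     .otherwise(col("UnitPrice")))
    return df

(spark here is the SparkSession that Databricks provides in the notebook context.)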

Trodenn
New Contributor III

I see what you are suggesting. If I were to run it in the same notebook but in a different cell that is not a @dlt.table, would it work? I need to determine the quantiles first and then use them to make changes to the table; that is why I ask.

To read a delta live table do I just use spark.read.table("customer_order_silver")?

Hubert-Dudek
Esteemed Contributor III

It will work inside def capping_unitPrice_Qt(); I am using precisely the same approach.

To read a delta live table do I just use spark.read.table("customer_order_silver")?

Yes, if the table is registered in the metastore. Usually you prefix it with a database/schema name (so database.customer_order_silver). The name of the database is specified in the DLT pipeline settings.
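
For example, if the pipeline's target database were set to mydb (a hypothetical name), the read would be:

df = spark.read.table("mydb.customer_order_silver")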

Trodenn
New Contributor III

What if it is not a database but another Delta Live Table? Do correct me if it's the same thing. I really just started learning this tool and Spark.

Trodenn
New Contributor III

So I tried running the code inside the DLT function, and it tells me that it cannot find the table now. Do I need to do anything to make it know where the table is, like adding the path to it?
