Hi, I have started getting the following error while running jobs in Databricks. The jobs started failing a few days ago. I recently migrated to Unity Catalog; no other change was made. I am running on DBR 13.3 LTS. com.google.common.util.co...
Hi, I am trying to deploy an MLflow model in SageMaker. My MLflow model is registered in Databricks. I followed the URL below to deploy, and it needs ECR for deployment. For ECR, either I can create a custom image and push it to ECR, or it's mentioned in the URL below to get...
Hi, I am trying to delete duplicate records found by key, but it's very slow. It's a continuously running pipeline, so the data is not that large, but this command still takes time to execute. df = df.dropDuplicates(["fileName"]) Is there any better approach to d...
Hi, I am running Auto Loader with a continuous trigger. How can I stop this trigger during a specific time window, but only if no data is pending and the current batch has finished processing? How can I check how many records are pending in the queue and what the current state is? Regards, Sanjay
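A minimal sketch of the "only stop when idle" check asked about above, not a definitive answer. It assumes the dictionaries come from a running query's `status` and `lastProgress` attributes, and that the Auto Loader source reports a `numFilesOutstanding` metric as a string under `sources[].metrics`, as shown in the Databricks monitoring docs. The function names `in_window` and `safe_to_stop` and the sample values are hypothetical, and the window is assumed not to cross midnight.

```python
from datetime import time


def in_window(now: time, start: time, end: time) -> bool:
    """True if `now` falls inside [start, end); assumes the window does not cross midnight."""
    return start <= now < end


def safe_to_stop(status: dict, last_progress) -> bool:
    """Decide whether a streaming query looks idle enough to stop.

    `status` mirrors query.status and `last_progress` mirrors query.lastProgress
    (both assumptions about where the caller gets these dictionaries).
    """
    # A batch is running or new data has already been detected: do not stop.
    if status.get("isTriggerActive") or status.get("isDataAvailable"):
        return False
    # No progress reported yet: stay conservative and keep running.
    if not last_progress:
        return False
    # Auto Loader backlog: any outstanding files mean work is still pending.
    for source in last_progress.get("sources", []):
        metrics = source.get("metrics") or {}
        if int(metrics.get("numFilesOutstanding", 0)) > 0:
            return False
    return True


# Hypothetical snapshots of an idle query and a query with a backlog.
idle_status = {"message": "Waiting for data to arrive",
               "isDataAvailable": False, "isTriggerActive": False}
idle_progress = {"sources": [{"metrics": {"numFilesOutstanding": "0"}}]}
busy_progress = {"sources": [{"metrics": {"numFilesOutstanding": "238"}}]}

stop_now = (in_window(time(2, 30), time(2, 0), time(4, 0))
            and safe_to_stop(idle_status, idle_progress))
print(stop_now)                                  # True
print(safe_to_stop(idle_status, busy_progress))  # False
```

In a job, a check like this would typically run on a timer, and the caller would call `query.stop()` only when both conditions hold.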
Hi, I have a PySpark DataFrame and a PySpark UDF which calls an MLflow model for each row, but its performance is too slow. Here is sample code: def myfunc(input_text): result = mlflowmodel.predict(input_text) return result myfuncUDF = udf(myfunc,StringType(...
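One common reason a row-at-a-time UDF is slow is that the model is invoked once per row. The sketch below illustrates the batching idea only, in plain pandas, under the assumption that the model's `predict` accepts a list of texts. `StubModel` is a hypothetical stand-in for `mlflowmodel`, and `predict_batch` is an illustrative name; in Spark, a function of this shape (Series in, Series out) is what would usually be wrapped with `pyspark.sql.functions.pandas_udf`.

```python
import pandas as pd


class StubModel:
    """Hypothetical stand-in for the loaded MLflow model; counts predict() calls."""

    def __init__(self):
        self.calls = 0

    def predict(self, batch):
        self.calls += 1
        return [text.upper() for text in batch]


def predict_batch(texts: pd.Series, model) -> pd.Series:
    """Score a whole batch of rows with a single predict() call."""
    predictions = model.predict(texts.tolist())
    # Keep the input index so results stay aligned with their rows.
    return pd.Series(predictions, index=texts.index, dtype="object")


model = StubModel()
texts = pd.Series(["first row", "second row", "third row"])
result = predict_batch(texts, model)

print(result.tolist())  # ['FIRST ROW', 'SECOND ROW', 'THIRD ROW']
print(model.calls)      # 1 (one call for three rows)
```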
Thank you @Kaniz_Fatma. As I am trying to remove duplicates on a single column only, I am specifying the column name in dropDuplicates. It is still very slow. Can you provide more context on the last point, i.e. Streamlining Your Data with Grouping and Aggregatio...
Hi Kaniz, I started getting the following error after using myfunc_udf with 2 parameters. pythonException: 'ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all() Regards, Sanjay
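This error usually means that scalar-style logic (for example `if param2:`) is being applied to a whole pandas Series, which is what a vectorized UDF receives for each argument. The sketch below reproduces the error in plain pandas and shows one way around it, applying the per-row logic element by element. The names `score_one` and `score_batch` and the sample values are hypothetical, since the original function body is not shown.

```python
import pandas as pd


def score_one(input_text, param2):
    """Hypothetical per-row logic written for scalar inputs."""
    if param2:  # fine for a scalar, ambiguous for a Series
        return f"{input_text}:{param2}"
    return input_text


def score_batch(input_text: pd.Series, param2: pd.Series) -> pd.Series:
    """Apply the scalar logic row by row so no Series is ever used as a boolean."""
    values = [score_one(t, p) for t, p in zip(input_text, param2)]
    return pd.Series(values, index=input_text.index, dtype="object")


texts = pd.Series(["a", "b"])
params = pd.Series(["x", ""])

# Passing whole Series to the scalar function reproduces the error.
raised = False
try:
    score_one(texts, params)
except ValueError:
    raised = True

print(raised)                              # True
print(score_batch(texts, params).tolist())  # ['a:x', 'b']
```

If the second parameter is the same constant for every row, another option is to bind it outside the function (for example with a closure) instead of passing it as a column.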
Thank you @Kaniz_Fatma, it's really helpful and it worked. Another quick question: I have to pass 2 parameters as input to myfunc. Please help with how to pass multiple parameters. def myfunc(input_text, param2): # Assuming mlflowmodel is defined elsewh...