Data Engineering

Forum Posts

famous_jt33
by New Contributor
  • 775 Views
  • 2 replies
  • 2 kudos

SQL UDFs for DLT pipelines

I am trying to implement a UDF for a DLT pipeline. I have seen the documentation stating that it is possible but I am getting an error after adding an SQL UDF to a cell in the notebook attached to the pipeline. The aim is to have the UDF in a separat...

Latest Reply
6502
New Contributor III
  • 2 kudos

You can't. SQL support on a DLT pipeline cluster is limited compared to a normal notebook. You can still define a UDF in Python using, of course, a Python notebook. In this case, you can use the spark.sql() function to execute your original SQL cod...

1 More Replies
Christine
by Contributor
  • 18517 Views
  • 4 replies
  • 1 kudos

Resolved! Is it possible to import functions from a module in Workspace/Shared instead of Repos?

Hi, I am considering creating libraries for my Databricks notebooks, and found that it is possible to import functions from modules saved in Repos. Is it possible to move the .py files with the functions to Workspace/Shared and still import functions ...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Christine Pedersen, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell ...

3 More Replies
KarthikeyanB
by New Contributor II
  • 1041 Views
  • 3 replies
  • 4 kudos

Resolved! Window function + Multiple simultaneous aggregations

Hi team, why is there no support for performing multiple aggregations together with a single window spec? That is, I don't want to specify each aggregation separately, and I don't want each aggregation to be performed as a separate piece of work. Or if there is ind...

Latest Reply
KarthikeyanB
New Contributor II
  • 4 kudos

Hi @Kaniz Fatma, firstly, thank you very much for responding. Thank you for confirming that performing multiple aggregations using a single window spec does NOT evaluate the window spec separately each time. My bad for the earlier misunderstanding.

2 More Replies
Bbren
by New Contributor
  • 1554 Views
  • 2 replies
  • 1 kudos

Resolved! Handling of millions of XML in JSON files

Hi all, I have some questions related to the handling of many small files and possible improvements and augmentations. We have many small XML files. These files are previously processed by another system that puts them in our data lake, but as an add...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Bauke Brenninkmeijer, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best ...

1 More Replies
Tsar
by New Contributor III
  • 7001 Views
  • 12 replies
  • 12 kudos

Limitations with UDFs wrapping modules imported via Repos files?

We have been importing custom module wheel files from our AzDevOps repository. We are pushing to use the Databricks Repos arbitrary files to simplify this, but it is breaking our Spark UDF that wraps one of the functions in the library with a ModuleNo...

Latest Reply
Scott_B
New Contributor III
  • 12 kudos

If your notebook is in the same Repo as the module, this should work without any modifications to the sys path. If your notebook is not in the same Repo as the module, you may need to ensure that the sys path is correct on all nodes in your cluster th...

11 More Replies
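
As a rough illustration of the sys path point in the reply above, here is a minimal sketch in plain Python. The folder and the module name `my_helpers` are hypothetical stand-ins for a Repo directory and its module; nothing here comes from the original thread. It only shows that a module becomes importable once its folder is on `sys.path` for the current Python process.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for a Repo folder that holds a helper module.
repo_dir = Path(tempfile.mkdtemp())
(repo_dir / "my_helpers.py").write_text(
    "def double(x):\n"
    "    return 2 * x\n"
)

# Make the folder importable for this Python process only.
if str(repo_dir) not in sys.path:
    sys.path.append(str(repo_dir))

# Refresh import caches because the file was created after startup.
importlib.invalidate_caches()
my_helpers = importlib.import_module("my_helpers")

result = my_helpers.double(21)
print(result)
```

Because `sys.path` is per process, a change like this on the driver does not by itself reach the Python workers that execute a UDF, which is consistent with the advice to make sure the path is correct on all nodes.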
Hubert-Dudek
by Esteemed Contributor III
  • 1384 Views
  • 2 replies
  • 9 kudos

Databricks Photon is a next-generation engine on the Databricks Lakehouse Platform that provides speedy query performance at a low cost. Its function...

Databricks Photon is a next-generation engine on the Databricks Lakehouse Platform that provides speedy query performance at a low cost. Its function coverage is growing, and UDF under Photon is coming, which can bring significant improvements in us...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 9 kudos

 

1 More Replies
rbelidrv
by New Contributor II
  • 3955 Views
  • 3 replies
  • 1 kudos

How to apply a UDF to a property in an array of structs

I have a column that contains an array of structs as follows: "column" : [ { "struct_field1": "struct_value", "struct_field2": "struct_value" }, { "struct_field1": "struct_value", "struct_field2": "struct_value" } ] I want to apply a UDF to each f...

Latest Reply
Kaniz
Community Manager
  • 1 kudos

Hi @Richard Belihomji, it looks like you are trying to apply a UDF to each field of the structs in an array column in a Spark DataFrame. However, it seems you are encountering an issue with the UDF not receiving the context. To nest a UDF inside a tr...

2 More Replies
Pawan1
by New Contributor II
  • 1010 Views
  • 1 replies
  • 2 kudos

Your administrator has forbidden Scala UDFs from being run on this cluster. How to enable access to Scala UDFs on an Azure Databricks cluster?

Hi all, when I ran a Scala UDF on an Azure Databricks 10.1 (includes Apache Spark 3.2.0, Scala 2.12) cluster, I was able to run the UDF. However, when I tried to run the same notebook on a 10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12) cluster, I ha...

Latest Reply
Debayan
Esteemed Contributor III
  • 2 kudos

Hi, are you trying this with High Concurrency clusters? Also, please tag @Debayan Mukherjee with your next response so that I will get notified.

tytytyc26
by New Contributor II
  • 1161 Views
  • 3 replies
  • 0 kudos

Resolved! Problem with accessing element using Pandas UDF in Image Processing

Hi everyone, I have been stuck on this for a very long time. I am not very familiar with using Spark for image processing. I was trying to resize images that are loaded into a Spark DataFrame. However, it keeps throwing an error that I am not able to access the element...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Yan Chong Tan: The error you are facing occurs because you are trying to access the attribute "width" of a string object in the resize_image function. Specifically, input_dim is a string object, but you are trying to access its width attr...

2 More Replies
sanjay
by Valued Contributor II
  • 4432 Views
  • 4 replies
  • 4 kudos

Resolved! PySpark UDF is taking long to process

Hi, I have a UDF that runs for each Spark DataFrame row, does some complex processing, and returns a string output. But it takes very long when the data is 15000 rows. I have configured the cluster with autoscaling, but it's not spinning up more servers. Please suggest h...

Latest Reply
Kaniz
Community Manager
  • 4 kudos

Hi @Sanjay Jain, we haven't heard from you since the last response from @Lakshay Goel, @rishabh and @Vigneshraja Palaniraj, and I was checking back to see if their suggestions helped you. Otherwise, if you have any solution, please share it with ...

3 More Replies
MikeJohnsonZa
by New Contributor
  • 1183 Views
  • 3 replies
  • 0 kudos

Resolved! Importing irregularly formatted json files

Hi, I'm importing a large collection of JSON files. The problem is that they are not what I would expect a well-formatted JSON file to be (although probably still valid): each file consists of only a single record that looks something like this (this i...

Latest Reply
jose_gonzalez
Moderator
  • 0 kudos

Hi @Michael Johnson, I would like to share the following notebook, which contains examples of how to process complex data types, like JSON. Please check the following link and let us know if you still need help: https://docs.databricks.com/optimization...

2 More Replies
Johan_Van_Noten
by New Contributor III
  • 6619 Views
  • 19 replies
  • 10 kudos

Resolved! Correlated column exception in SQL UDF when using UDF parameters.

Environment: Azure Databricks 10.1, including Spark 3.2.0. Scenario: I want to retrieve the average of a series of values between two timestamps, using a SQL UDF. The average is obviously just an example. In a real scenario, I would like to hide some additi...

Latest Reply
creastysomp
New Contributor II
  • 10 kudos

Thanks for your suggestion. I want to do this in Spark SQL because there is no underlying SQL Server.

18 More Replies
Ancil
by Contributor II
  • 877 Views
  • 1 replies
  • 1 kudos

PythonException: 'RuntimeError: The length of output in Scalar iterator pandas UDF should be the same with the input's; however, the length of output was 1 and the length of input was 2.'.

I have a pandas_udf that works for 4 rows, but when I tried with more than 4 rows I got the error below. PythonException: 'RuntimeError: The length of output in Scalar iterator pandas UDF should be the same with the input's; however, the length of output was...

Latest Reply
Ancil
Contributor II
  • 1 kudos

@Kaniz Fatma, can you please help me with pandas_udf? In the above scenario I used regular expressions, for which there is a Spark method, but I have other pandas_udf functions with the same issue.

Gim
by Contributor
  • 2730 Views
  • 2 replies
  • 1 kudos

Resolved! How to use SQL UDFs for Delta Live Table pipelines?

I've been searching for a way to use a SQL UDF for our DLT pipeline. In this case it is to convert a time duration string into INT seconds. How exactly do we use/apply UDFs in this case?

Latest Reply
daniel_sahal
Esteemed Contributor
  • 1 kudos

@Gim You can create a Python UDF and then use it in SQL. https://docs.databricks.com/workflows/delta-live-tables/delta-live-tables-cookbook.html#use-python-udfs-in-sql

1 More Replies
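
To make the suggestion above a little more concrete, here is a minimal sketch of the conversion logic in plain Python. The thread does not say what the duration strings look like, so the `HH:MM:SS` format and the function name `duration_to_seconds` are assumptions for illustration only.

```python
def duration_to_seconds(duration: str) -> int:
    """Convert an 'HH:MM:SS' string to total seconds (format is assumed)."""
    hours, minutes, seconds = (int(part) for part in duration.strip().split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Hypothetical sample values, not taken from the original post.
examples = ["00:00:45", "01:30:00", "10:05:09"]
converted = [duration_to_seconds(d) for d in examples]
print(converted)

# In a Python notebook with a SparkSession named `spark` (an assumption),
# the function could then be exposed to SQL with something like:
#   spark.udf.register("duration_to_seconds", duration_to_seconds, "int")
```

Keeping the parsing in an ordinary function makes it easy to check outside a pipeline before registering it as a UDF.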
Ancil
by Contributor II
  • 1309 Views
  • 3 replies
  • 1 kudos

Resolved! PythonException: 'RuntimeError: The length of output in Scalar iterator pandas UDF should be the same with the input's; however, the length of output was 1 and the length of input was 2.'.

I have a pandas_udf that works for 1 row, but when I tried with more than one row I got the error below. PythonException: 'RuntimeError: The length of output in Scalar iterator pandas UDF should be the same with the input's; however, the length of output w...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

I was testing, and your function is correct. So the error must be in the inputData type (is it all strings?) or in result_json. Please also check the runtime version. I was using 11 LTS.

2 More Replies
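
The error message in this thread describes a length contract: for each input batch, a scalar iterator pandas UDF has to yield an output of the same length. The sketch below imitates that contract with plain Python lists standing in for pandas Series, so it is not the poster's function and does not use Spark at all; `lengths_ok`, `lengths_broken`, and `check_contract` are hypothetical names.

```python
from typing import Callable, Iterator, List


def lengths_ok(batches: Iterator[List[str]]) -> Iterator[List[int]]:
    # One output value per input value, so lengths always match.
    for batch in batches:
        yield [len(value) for value in batch]


def lengths_broken(batches: Iterator[List[str]]) -> Iterator[List[int]]:
    # Collapses each batch to a single value: output length is 1
    # even when the input batch holds 2 or more values.
    for batch in batches:
        yield [sum(len(value) for value in batch)]


def check_contract(func: Callable, batches: List[List[str]]) -> bool:
    """Return True if every output batch is as long as its input batch."""
    outputs = list(func(iter(batches)))
    return len(outputs) == len(batches) and all(
        len(out) == len(inp) for out, inp in zip(outputs, batches)
    )


input_batches = [["a", "bb"], ["ccc"]]
ok = check_contract(lengths_ok, input_batches)
broken = check_contract(lengths_broken, input_batches)
print(ok, broken)
```

A function that happens to work on a single row and then fails on larger inputs, as described above, is often one that reduces a batch to one value the way `lengths_broken` does.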