Data Engineering

Forum Posts

by maranBH • New Contributor III

10-19-2021 1:41:18 PM

27036 Views
5 replies
11 kudos

Resolved! How to import a function to another notebook using Repos without %run?

Hi all, I was reading the Repos documentation: https://docs.databricks.com/repos.html#migrate-from-run-commands It explains that one advantage of Repos is that it is no longer necessary to use the %run magic command to make functions available in one notebook to ...

Latest Reply

JakubSkibicki
Contributor

11-06-2024 2:25:26 AM

11 kudos

Due to new functionality in Runtime 16.0 regarding autoload, I came across this. I performed a practical test and it works, although I had some problems at first. As in the solution, the key was that the definitions are placed in a .py file, not a notebook.
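A minimal sketch of the idea in this reply: the function lives in a plain .py file and is loaded with a normal import instead of %run. The folder, the module name my_helpers, and the function add_one are hypothetical; a temporary directory stands in for the repo folder so the sketch runs anywhere.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for a repo folder. In a real repo the .py file
# would already exist next to the notebook.
repo_dir = Path(tempfile.mkdtemp())

# The definitions live in a regular Python file, not in a notebook.
(repo_dir / "my_helpers.py").write_text(
    "def add_one(x):\n"
    "    return x + 1\n"
)

# Make the folder importable if it is not already on sys.path.
if str(repo_dir) not in sys.path:
    sys.path.append(str(repo_dir))

# Refresh import caches because the file was created after startup.
importlib.invalidate_caches()
my_helpers = importlib.import_module("my_helpers")

result = my_helpers.add_one(41)
print(result)
```

In a notebook the last lines would normally shrink to a plain `from my_helpers import add_one`; the sys.path step is only needed when the folder is not already on the import path.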

4 More Replies

by b_1 • New Contributor II

06-20-2023 1:51:36 AM

1617 Views
2 replies
1 kudos

to_timestamp function in non-legacy mode does not parse this format: yyyyMMddHHmmssSS

I have this datetime string in my dataset: '2023061218154258' and I want to convert it to a datetime using the code below. However, the format that I expect to work, namely yyyyMMddHHmmssSS, doesn't work. This code will reproduce the issue: from pyspark.sq...

Latest Reply

b_1
New Contributor II

10-11-2023 9:20:05 AM

1 kudos

Is there anybody who has the same issue or knows that this is in fact an issue?
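One possible workaround, sketched here in plain Python rather than with to_timestamp, is to parse the string with datetime.strptime. This assumes the trailing "SS" means a two-digit fraction of a second; the helper name parse_compact_timestamp is hypothetical, and this does not explain or fix the Spark behavior described in the question.

```python
from datetime import datetime


def parse_compact_timestamp(value: str) -> datetime:
    """Parse a yyyyMMddHHmmss string followed by fractional-second digits.

    %f accepts one to six digits and pads on the right, so "58" is read
    as 0.58 seconds (580000 microseconds).
    """
    return datetime.strptime(value, "%Y%m%d%H%M%S%f")


parsed = parse_compact_timestamp("2023061218154258")
print(parsed.isoformat())
```

A function like this could be wrapped in a Spark UDF if the built-in parser cannot be made to accept the pattern, at the cost of slower row-by-row Python execution.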

1 More Replies

by MichaelO • New Contributor III

05-05-2023 2:07:40 AM

4167 Views
4 replies
2 kudos

Resolved! Call python image function in pyspark

I have a function for rotating images written in Python:

from PIL import Image

def rotate_image(image, rotation_angle):
    im = Image.open(image)
    out = im.rotate(rotation_angle, expand=True)
    return out

I now want to use this function as a pyspark ...

Latest Reply

Raluka
New Contributor III

09-23-2023 8:42:04 PM

2 kudos

Stock photos, I've come to realize, are the catalysts of imagination. This website's vast reservoir of images new york seal sparks ideas that ripple through my projects. They empower me to envision the previously unimagined, helping me breathe life i...

3 More Replies

by mjbobak • Contributor

09-08-2022 6:31:52 PM

26531 Views
5 replies
9 kudos

Resolved! How to import a helper module that uses databricks specific modules (dbutils)

I have a main databricks notebook that runs a handful of functions. In this notebook, I import a helper.py file that is in my same repo and when I execute the import everything looks fine. Inside my helper.py there's a function that leverages built-i...

Latest Reply

amitca71
Contributor II

12-11-2022 7:51:48 AM

9 kudos

Hi, I'm facing a similar issue when deploying via dbx. I have a helper notebook that works fine when executed via jobs (without any includes), but when I deploy it via dbx (to the same cluster), the helper notebook fails with: dbutils.fs.ls(path) NameEr...
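A common way to avoid a missing dbutils name inside an imported module is to pass the notebook's dbutils object into the helper as an argument. The sketch below illustrates that pattern only; it is not confirmed as the accepted answer of this thread. The function list_paths and the fake dbutils object are hypothetical, and the fake exists only so the example runs outside Databricks.

```python
from types import SimpleNamespace


def list_paths(dbutils, path):
    """Return the paths under `path`, using the dbutils handle given by the caller."""
    return [entry.path for entry in dbutils.fs.ls(path)]


# Hypothetical stand-in for the real dbutils object. In a notebook you would
# pass the built-in `dbutils` instead of this fake.
fake_fs = SimpleNamespace(
    ls=lambda path: [
        SimpleNamespace(path=f"{path}/a.csv"),
        SimpleNamespace(path=f"{path}/b.csv"),
    ]
)
fake_dbutils = SimpleNamespace(fs=fake_fs)

paths = list_paths(fake_dbutils, "/mnt/data")
print(paths)
```

Because the helper never looks up a global dbutils, it behaves the same whether it is imported from a notebook, run as a job, or exercised in a unit test with a fake.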

4 More Replies

by Orianh • Valued Contributor II

05-23-2022 3:33:10 AM

6345 Views
4 replies
3 kudos

function does not exist in JVM ERROR

Hello guys, I'm building a Python package that returns one row from a DataFrame at a time inside the Databricks environment. To improve the performance of this package I used the multiprocessing library in Python. I have a background process whose whole purpose is to p...

Latest Reply

dineshreddy
New Contributor III

06-27-2023 5:37:14 PM

3 kudos

Using threads instead of processes solved the issue for me.
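A minimal sketch of what a thread-based version could look like: a background thread fills a bounded queue while the caller consumes rows one at a time. The name prefetch_rows and the buffer size are hypothetical, and a plain list iterator stands in for the DataFrame rows; in Spark the input could be something like df.toLocalIterator(), which is an assumption about the original package.

```python
import queue
import threading


def prefetch_rows(rows, buffer_size=4):
    """Yield items from `rows` while a background thread fetches ahead."""
    buffer = queue.Queue(maxsize=buffer_size)
    sentinel = object()  # marks the end of the input

    def producer():
        for row in rows:
            buffer.put(row)  # blocks when the buffer is full
        buffer.put(sentinel)

    # A thread shares the parent's memory, so nothing has to be pickled
    # or re-created in a child process.
    threading.Thread(target=producer, daemon=True).start()

    while True:
        item = buffer.get()
        if item is sentinel:
            return
        yield item


fetched = list(prefetch_rows(iter([{"id": 1}, {"id": 2}, {"id": 3}])))
print(fetched)
```

Threads run inside the driver's Python process, which is one plausible reason they avoid errors that appear when Spark objects are used from a separately spawned process.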

3 More Replies

by giriraj01234567 • New Contributor II

06-16-2023 9:09:11 PM

8964 Views
1 reply
2 kudos

Getting an error while running the show function

I was using StringIndexer. While fitting and transforming I didn't get any error, but while running the show function I get the error below: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 45.0 failed...

Latest Reply

Anonymous
Not applicable

06-18-2023 3:20:08 AM

2 kudos

Hi @Bojja Giri Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question. Thanks.

by luiso • New Contributor

06-08-2023 6:15:33 AM

913 Views
1 reply
0 kudos

Is it possible to create sql statements (after return) with cte's in databricks scalar functions?

The function returns an integer as a result of the query

Latest Reply

Anonymous
Not applicable

06-15-2023 10:58:56 PM

0 kudos

Hi @Luis Lopez Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question. Thanks.

by g96g • New Contributor III

05-23-2023 1:26:00 AM

1121 Views
1 reply
0 kudos

Function in databricks

I'm having a hard time converting the function below from SSMS to a Databricks function. Any help would be appreciated! CREATE FUNCTION [dbo].[MaxOf5Values] (@D1 [int],@D2 [int],@D3 [int],@D4 [int],@D5 [int]) RETURNS int AS BEGIN DECLARE @Result int ...

Latest Reply

Ajay-Pandey
Esteemed Contributor III

05-23-2023 6:38:49 AM

0 kudos

Hi @Givi Salu, please refer to this link, which will help you convert this function.
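The body of the original T-SQL function is truncated in the excerpt, so its exact NULL handling is unknown. As a rough sketch of what the name MaxOf5Values suggests, the Python function below returns the largest of five values and ignores missing ones, which mirrors how the Spark SQL built-in greatest skips nulls. The name max_of_5_values and the None handling are assumptions.

```python
from typing import Optional


def max_of_5_values(
    d1: Optional[int],
    d2: Optional[int],
    d3: Optional[int],
    d4: Optional[int],
    d5: Optional[int],
) -> Optional[int]:
    """Return the largest non-missing value, or None if all five are missing."""
    values = [v for v in (d1, d2, d3, d4, d5) if v is not None]
    return max(values) if values else None


largest = max_of_5_values(3, 9, None, 4, 1)
print(largest)
```

If the original function treats NULL differently, for example by returning NULL as soon as any input is NULL, the filter step would need to change accordingly.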

by elgeo • Valued Contributor II

04-18-2023 1:04:12 AM

9195 Views
4 replies
0 kudos

Function returns UNSUPPORTED_CORRELATED_SCALAR_SUBQUERY

Hello experts. The below function in Databricks gives UNSUPPORTED_CORRELATED_SCALAR_SUBQUERY error. We didn't have this issue though in Oracle. Is this a limitation of Databricks? Just to note the final result returns only one row. Thank you in advan...

Latest Reply

TheofilosSt
New Contributor II

05-09-2023 1:29:49 AM

0 kudos

Hello @Suteja Kanuri, can we have any response on the above? Thank you.

3 More Replies

by qwerty1 • Contributor

04-24-2023 11:56:35 PM

7399 Views
3 replies
1 kudos

Is there a way to register a scala function that is available to other notebooks?

I am in a situation where I have a notebook that runs in a pipeline that creates a "live streaming table", so I cannot use a language other than SQL in the pipeline. I would like to format a certain column in the pipeline using Scala code (it's a ...

Latest Reply

-werners-
Esteemed Contributor III

04-25-2023 7:36:09 AM

1 kudos

No, DLT does not work with Scala, unfortunately. Delta Live Tables are not vanilla Spark. Is Python an option instead of Scala?

2 More Replies

by Prasann_gupta • New Contributor

03-09-2023 10:40:52 PM

10278 Views
3 replies
0 kudos

SQL CONTAINS Function is not working on Databricks

I am trying to use the SQL CONTAINS function in my SQL query, but it throws the error below: AnalysisException: Undefined function: 'CONTAINS'. This function is neither a registered temporary function nor a permanent function registered in the databa...

Latest Reply

Anonymous
Not applicable

03-31-2023 5:47:41 PM

0 kudos

Hi @Prasann Gupta Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Than...

2 More Replies

by andrew0117 • Contributor

03-26-2023 9:04:50 PM

4791 Views
4 replies
0 kudos

Resolved! Can merge() function be applied to dataframe?

If I have two dataframes, df_target and df_source, can I do df_target.as("t").merge(df_source.as("s"), "s.id=t.id").whenMatched().updateAll().whenNotMatched.insertAll.execute()? When I tried the code above, I got the error "merge is not a member of the...

Latest Reply

Anonymous
Not applicable

03-27-2023 9:10:57 PM

0 kudos

Hi @andrew li Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Thanks!

3 More Replies

by jonathan-dufaul • Valued Contributor

01-24-2023 7:28:41 AM

1689 Views
2 replies
0 kudos

Is there a function similar to display that downloads a dataframe?

I find myself constantly having to do display(df), and then "recompute with <5g records and download". I was just hoping I could skip the middleman and download from the get-go. Ideally it'd be a function like download(df,num_rows="max") where num_rows i...

Latest Reply

Hubert-Dudek
Esteemed Contributor III

01-24-2023 10:26:35 AM

0 kudos

Question: where do you want to download it to? If to a cloud location, use the regular DataFrameWriter. You can install, for example, Azure Storage Explorer on your computer. Some cloud storage can even be mounted in your system as a folder or network share.

1 More Replies

by pjp94 • Contributor

12-05-2022 12:53:47 PM

8822 Views
9 replies
7 kudos

Calling a python function (def) in databricks

Not sure if I'm missing something here, but running a task outside of a python function runs much much quicker than executing the same task inside a function. Is there something I'm missing with how spark handles functions? 1) def task(x): y = dostuf...

Latest Reply

sher
Valued Contributor II

01-05-2023 10:30:06 PM

7 kudos

Don't use a normal Python function; use a UDF in PySpark, which will be faster.

8 More Replies

by weldermartins • Honored Contributor

12-20-2022 2:20:51 PM

3650 Views
3 replies
6 kudos

Resolved! Function When + Dictionary.

Hey everyone, I'm trying to avoid repeating the when function 12 times, so I thought of using a dictionary. I don't know if it's a limitation of the Spark function or a logic error. Does the function allow this concatenation?

Latest Reply

weldermartins
Honored Contributor

12-21-2022 9:05:00 AM

6 kudos

Hello everyone, I found this alternative to reduce repeated code: custoDF = (custoDF.withColumn('month', col('Nummes').cast('string')).replace(months, subset=['month']))
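The months dictionary used in this reply is not shown in the excerpt. A minimal sketch of how such a mapping could be built, assuming Nummes holds month numbers 1 to 12 and the target values are English month names (both are assumptions, as is the helper month_label):

```python
# Hypothetical target labels; the original post does not show them.
MONTH_NAMES = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

# Keys are strings because the reply casts 'Nummes' to string before replacing.
months = {str(number): name for number, name in enumerate(MONTH_NAMES, start=1)}


def month_label(nummes):
    """Plain-Python equivalent of the lookup; unknown values pass through unchanged."""
    return months.get(str(nummes), str(nummes))


print(month_label(3))
```

One dictionary replaces twelve chained when calls, and changing a label means editing a single entry.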

2 More Replies