Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

giohappy
by New Contributor III
  • 2372 Views
  • 3 replies
  • 1 kudos

Resolved! SedonaSqlExtensions is not autoregistering types and functions

The usual way to use Apache Sedona inside PySpark is by first registering Sedona types and functions with SedonaRegistrator.registerAll(spark). We need to have these autoregistered when the cluster starts (to be able, for example, to perform geospatial q...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Giovanni Allegri, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answ...

2 More Replies
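A minimal sketch of the autoregistration approach, assuming the config keys from the Apache Sedona docs (class names may vary by Sedona version); on Databricks, the same key/value pairs can go in the cluster's Spark config so they apply at cluster start:

# Enable Sedona's SQL extension at session build time so types and functions
# register without an explicit SedonaRegistrator.registerAll(spark) call.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.sql.extensions", "org.apache.sedona.sql.SedonaSqlExtensions")
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .config("spark.kryo.registrator", "org.apache.sedona.core.serde.SedonaKryoRegistrator")
    .getOrCreate()
)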
Christine
by Contributor II
  • 27236 Views
  • 4 replies
  • 1 kudos

Resolved! Is it possible to import functions from a module in Workspace/Shared instead of Repos?

Hi, I am considering creating libraries for my Databricks notebooks, and found that it is possible to import functions from modules saved in Repos. Is it possible to move the .py files with the functions to Workspace/Shared and still import functions ...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Christine Pedersen, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell ...

3 More Replies
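A minimal sketch of the Workspace/Shared import, assuming a module at /Workspace/Shared/mylib/helpers.py (mylib and helpers are hypothetical names); on recent Databricks runtimes the workspace tree is exposed as a regular filesystem path, so extending sys.path works much as it does for Repos:

import sys

# Make the shared folder importable, then import as a normal module.
sys.path.append("/Workspace/Shared/mylib")
from helpers import my_function  # hypothetical function defined in helpers.py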
JLCDA
by New Contributor
  • 2349 Views
  • 2 replies
  • 0 kudos

databricks-connect 9.1 : StreamCorruptedException: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe

Hello, I'm using databricks-connect 9.1 and started having issues last week in all functions that call collect(). Everything was working before:

myList = df1.select("id").rdd.flatMap(lambda x: x).collect()

Here is the error: py4j.protocol.P...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Julien Larcher, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answer...

1 More Replies
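A minimal sketch of a workaround, not a confirmed fix: the failing call goes through PythonRDD (df.rdd plus flatMap), a common source of StreamCorruptedException when the databricks-connect client and cluster runtime versions drift apart. Collecting Rows directly stays on the DataFrame path:

# Same result as the rdd.flatMap version, without the Python RDD round-trip.
myList = [row["id"] for row in df1.select("id").collect()]

# Also worth verifying the client matches the cluster runtime:
#   pip install -U "databricks-connect==9.1.*"
#   databricks-connect test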
hitesh22
by New Contributor II
  • 4033 Views
  • 5 replies
  • 0 kudos
Latest Reply
Debayan
Databricks Employee
  • 0 kudos

Hi, I am not sure if this helps: https://www.databricks.com/blog/2020/12/15/python-autocomplete-improvements-for-databricks-notebooks.html. Also, please tag @Debayan with your next response, which will notify me. Thank you!

4 More Replies
RyanHager
by Contributor
  • 3074 Views
  • 5 replies
  • 2 kudos

Are there any plans to add functions on the partition-by fields of a Delta table definition, such as day()? A similar capability exists in Iceberg.

Benefit: this would help simplify the WHERE clauses of the tables' consumers: just query on the main date field when I need all the data for a day, rather than on an extra day field we had to create.

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

@Ryan Hager, yes, it is possible using auto-generated columns, available since Delta Lake 1.2. For example, you can automatically generate a date column (for partitioning the table by date) from the timestamp column; any writes into the table need only specify t...

4 More Replies
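A minimal sketch of the generated-column approach described in the reply (table and column names are hypothetical): the date column is derived from the timestamp at write time and doubles as the partition column, so consumers can filter on either field:

# Delta Lake generated column used as the partition column.
spark.sql("""
    CREATE TABLE IF NOT EXISTS events (
        id BIGINT,
        event_ts TIMESTAMP,
        event_date DATE GENERATED ALWAYS AS (CAST(event_ts AS DATE))
    )
    USING DELTA
    PARTITIONED BY (event_date)
""")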
GC-James
by Contributor II
  • 4029 Views
  • 6 replies
  • 10 kudos

Disable dbutils suggestion

I would like to turn off or suppress this message, which is returned from the dbutils library:

%r
files <- dbutils.fs.ls("/dbfs/tmp/")

For prettier results from dbutils.fs.ls(<dir>), please use `%fs ls <dir>`

How can I do this?

Latest Reply
Vidula
Honored Contributor
  • 10 kudos

Hi @James Smith, hope all is well! Just wanted to check in if you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Thanks...

5 More Replies
vk217
by Contributor
  • 2199 Views
  • 1 reply
  • 0 kudos

Access the same createOrReplaceTempView("viewname") from multiple functions

I have several functions accessing the same createOrReplaceTempView("viewname"). Does this cause any issues with multiple functions accessing it in a distributed environment?

def get_data_sql(spark_session, data_frame, data_element):
    data_fram...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 0 kudos

There are two types of views. One is a global view: it is available to the whole cluster and every notebook, but it is removed after a cluster restart. The other is a temp view: it is available only at the notebook level, and other notebooks will not be able to ...

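A minimal sketch of the two scopes described in the reply (view names hypothetical). Temp views are driver-side session metadata, so several functions replacing the same view is safe in a distributed sense, but each call overwrites the previous definition:

df.createOrReplaceTempView("viewname")      # visible only in this SparkSession/notebook
df.createGlobalTempView("viewname_global")   # visible to other notebooks on the cluster

spark.sql("SELECT * FROM viewname")
spark.sql("SELECT * FROM global_temp.viewname_global")  # global views live in global_temp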
fury88
by New Contributor II
  • 1062 Views
  • 1 reply
  • 0 kudos

Why are the get..Id() functions returning 'Some(123456)' instead of just the ID?

Hey fellow users, I've successfully retrieved the notebook context during job runs, and there are several getId calls. For some reason, when the IDs are returned, they are wrapped in a Some() instead of just the number. Does anyone know why this is the...

Latest Reply
fury88
New Contributor II
  • 0 kudos

Well, my post is irrelevant for me now!! I just stumbled across this beauty, which avoids me having to do any of this and deal with odd return values: How to get the Job ID and Run ID and save into a database (databricks.com). Are the braces {{job_id}} n...

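A minimal sketch of the parameter-based pattern the poster found, assuming a job configured with parameters such as job_id = {{job_id}} and run_id = {{run_id}} (Databricks' built-in substitution variables); this sidesteps the Some(...) wrapping, which comes from the context getters returning Scala Option values:

# Read the substituted values inside the notebook; no Option unwrapping needed.
job_id = dbutils.widgets.get("job_id")
run_id = dbutils.widgets.get("run_id")
print(job_id, run_id)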
dtabass
by New Contributor III
  • 2997 Views
  • 3 replies
  • 0 kudos

How does one access/use SparkSQL functions like array_size?

The following doesn't work for me:

%sql
SELECT user_id, array_size(education) AS edu_cnt FROM users ORDER BY edu_cnt DESC LIMIT 10;

I get an error saying: Error in SQL statement: AnalysisException: Undefined function: array_size. This function is nei...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hey there @Michael Carey, hope everything is going great! We are glad to hear that you were able to find a solution to your question. Would you be happy to mark an answer as best so that other members can find the solution more quickly? Cheers!

2 More Replies
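A minimal sketch of a fallback for older runtimes: array_size() only exists on newer ones (Spark 3.3+ / Databricks Runtime 11+), while the long-standing size() function returns the same element count for arrays (table and column names are from the post):

# size() is available on older runtimes and counts array elements the same way.
display(spark.sql("""
    SELECT user_id, size(education) AS edu_cnt
    FROM users
    ORDER BY edu_cnt DESC
    LIMIT 10
"""))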
codevisionz
by New Contributor
  • 526 Views
  • 0 replies
  • 0 kudos

Our Python Code Examples cover basic concepts, control structures, functions, lists, classes, objects, inheritance, polymorphism, file operations, da...

Our Python Code Examples cover basic concepts, control structures, functions, lists, classes, objects, inheritance, polymorphism, file operations, data structures, sorting algorithms, mathematical functions, mathematical sequences, threads, exceptio...

cdiers
by New Contributor
  • 2059 Views
  • 0 replies
  • 0 kudos

Dataframe functions not ending

Hi everyone, for a few days now, my notebook containing some Databricks functions has stopped working. The last day my notebook ran correctly was the 6th of April. Since then, jobs won't stop and keep running because some functions don't end. I figured those fun...

Barb
by New Contributor III
  • 6650 Views
  • 6 replies
  • 0 kudos

SQL charindex function?

Hi all, I need to use the SQL charindex function, but I'm getting a Databricks error that this doesn't exist. That can't be true, right? Thanks for any ideas about how to make this work! Barb

Latest Reply
Traveller
New Contributor II
  • 0 kudos

The best option I found to replace CHARINDEX was LOCATE. Examples from the Spark documentation below:

> SELECT locate('bar', 'foobarbar', 5);
7
> SELECT POSITION('bar' IN 'foobarbar');
4

5 More Replies
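A minimal sketch comparing the replacements (table and column names hypothetical): locate(substr, str[, pos]) and instr(str, substr) are both 1-based and return 0 when the substring is absent, matching CHARINDEX's contract; note that instr flips the argument order:

spark.sql("""
    SELECT
        locate('bar', col_a)    AS pos_locate,   -- CHARINDEX(substr, str)
        locate('bar', col_a, 5) AS pos_from_5,   -- CHARINDEX(substr, str, start)
        instr(col_a, 'bar')     AS pos_instr     -- arguments reversed vs. locate
    FROM my_table
""").show()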