Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

giohappy
by New Contributor III
  • 2372 Views
  • 3 replies
  • 1 kudos

Resolved! SedonaSqlExtensions is not autoregistering types and functions

The usual way to use Apache Sedona inside PySpark is by first registering Sedona types and functions with SedonaRegistrator.registerAll(spark). We need to have these autoregistered when the cluster starts (to be able, for example, to perform geospatial q...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Giovanni Allegri, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answ...

2 More Replies
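A minimal sketch of the autoregistration approach, assuming the config keys from the Apache Sedona docs (class names may vary by Sedona version); on Databricks, the same key/value pairs can go in the cluster's Spark config so they apply at cluster start:

# Enable Sedona's SQL extension at session build time so types and functions
# register without an explicit SedonaRegistrator.registerAll(spark) call.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.sql.extensions", "org.apache.sedona.sql.SedonaSqlExtensions")
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .config("spark.kryo.registrator", "org.apache.sedona.core.serde.SedonaKryoRegistrator")
    .getOrCreate()
)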
Christine
by Contributor II
  • 27236 Views
  • 4 replies
  • 1 kudos

Resolved! Is it possible to import functions from a module in Workspace/Shared instead of Repos?

Hi, I am considering creating libraries for my Databricks notebooks, and found that it is possible to import functions from modules saved in Repos. Is it possible to move the .py files with the functions to Workspace/Shared and still import functions ...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Christine Pedersen, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell ...

3 More Replies
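A minimal sketch of the Workspace/Shared import, assuming a module at /Workspace/Shared/mylib/helpers.py (mylib and helpers are hypothetical names); on recent Databricks runtimes the workspace tree is exposed as a regular filesystem path, so extending sys.path works much as it does for Repos:

import sys

# Make the shared folder importable, then import as a normal module.
sys.path.append("/Workspace/Shared/mylib")
from helpers import my_function  # hypothetical function defined in helpers.py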
JLCDA
by New Contributor
  • 2349 Views
  • 2 replies
  • 0 kudos

databricks-connect 9.1 : StreamCorruptedException: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe

Hello, I'm using databricks-connect 9.1 and started having issues last week in all functions that call collect(). Everything was working before:

myList = df1.select("id").rdd.flatMap(lambda x: x).collect()

Here is the error: py4j.protocol.P...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Julien Larcher, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answer...

1 More Replies
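A minimal sketch of a workaround, not a confirmed fix: the failing call goes through PythonRDD (df.rdd plus flatMap), a common source of StreamCorruptedException when the databricks-connect client and cluster runtime versions drift apart. Collecting Rows directly stays on the DataFrame path:

# Same result as the rdd.flatMap version, without the Python RDD round-trip.
myList = [row["id"] for row in df1.select("id").collect()]

# Also worth verifying the client matches the cluster runtime:
#   pip install -U "databricks-connect==9.1.*"
#   databricks-connect test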
hitesh22
by New Contributor II
  • 4033 Views
  • 5 replies
  • 0 kudos
Latest Reply
Debayan
Databricks Employee
  • 0 kudos

Hi, I am not sure if this helps: https://www.databricks.com/blog/2020/12/15/python-autocomplete-improvements-for-databricks-notebooks.html. Also, please tag @Debayan with your next response, which will notify me. Thank you!

4 More Replies
RyanHager
by Contributor
  • 3074 Views
  • 5 replies
  • 2 kudos

Are there any plans to add functions on the partition-by fields of a Delta table definition, such as day()? A similar capability exists in Iceberg.

Benefit: this would help simplify the WHERE clauses of the tables' consumers: just query on the main date field when I need all the data for a day, rather than on an extra day field we had to create.

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

@Ryan Hager, yes, it is possible using auto-generated columns, available since Delta Lake 1.2. For example, you can automatically generate a date column (for partitioning the table by date) from the timestamp column; any writes into the table need only specify t...

4 More Replies
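A minimal sketch of the generated-column approach described in the reply (table and column names are hypothetical): the date column is derived from the timestamp at write time and doubles as the partition column, so consumers can filter on either field:

# Delta Lake generated column used as the partition column.
spark.sql("""
    CREATE TABLE IF NOT EXISTS events (
        id BIGINT,
        event_ts TIMESTAMP,
        event_date DATE GENERATED ALWAYS AS (CAST(event_ts AS DATE))
    )
    USING DELTA
    PARTITIONED BY (event_date)
""")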
GC-James
by Contributor II
  • 4029 Views
  • 6 replies
  • 10 kudos

Disable dbutils suggestion

I would like to turn off or suppress this message, which is returned from the dbutils library:

%r
files <- dbutils.fs.ls("/dbfs/tmp/")

For prettier results from dbutils.fs.ls(<dir>), please use `%fs ls <dir>`

How can I do this?

Latest Reply
Vidula
Honored Contributor
  • 10 kudos

Hi @James Smith, hope all is well! Just wanted to check in if you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Thanks...

5 More Replies
vk217
by Contributor
  • 2199 Views
  • 1 reply
  • 0 kudos

Access the same createOrReplaceTempView("viewname") from multiple functions

I have several functions accessing the same createOrReplaceTempView("viewname"). Does this cause any issues with multiple functions accessing it in a distributed environment?

def get_data_sql(spark_session, data_frame, data_element):
    data_fram...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 0 kudos

There are two types of views. One is a global view: it is available to the whole cluster and every notebook, but it is removed after a cluster restart. The other is a temp view: it is available only at the notebook level, and other notebooks will not be able to ...

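A minimal sketch of the two scopes described in the reply (view names hypothetical). Temp views are driver-side session metadata, so several functions replacing the same view is safe in a distributed sense, but each call overwrites the previous definition:

df.createOrReplaceTempView("viewname")      # visible only in this SparkSession/notebook
df.createGlobalTempView("viewname_global")   # visible to other notebooks on the cluster

spark.sql("SELECT * FROM viewname")
spark.sql("SELECT * FROM global_temp.viewname_global")  # global views live in global_temp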
fury88
by New Contributor II
  • 1062 Views
  • 1 reply
  • 0 kudos

Why are the get..Id() functions returning 'Some(123456)' instead of just the ID?

Hey fellow users, I've successfully retrieved the notebook context during job runs, and there are several getId calls. For some reason, when the IDs are returned, they are wrapped in a Some() instead of just the number. Does anyone know why this is the...

Latest Reply
fury88
New Contributor II
  • 0 kudos

Well, my post is irrelevant for me now!! I just stumbled across this beauty, which avoids me having to do any of this and deal with odd return values: How to get the Job ID and Run ID and save into a database (databricks.com). Are the braces {{job_id}} n...

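A minimal sketch of the parameter-based pattern the poster found, assuming a job configured with parameters such as job_id = {{job_id}} and run_id = {{run_id}} (Databricks' built-in substitution variables); this sidesteps the Some(...) wrapping, which comes from the context getters returning Scala Option values:

# Read the substituted values inside the notebook; no Option unwrapping needed.
job_id = dbutils.widgets.get("job_id")
run_id = dbutils.widgets.get("run_id")
print(job_id, run_id)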
dtabass
by New Contributor III
  • 2997 Views
  • 3 replies
  • 0 kudos

How does one access/use SparkSQL functions like array_size?

The following doesn't work for me:

%sql
SELECT user_id, array_size(education) AS edu_cnt FROM users ORDER BY edu_cnt DESC LIMIT 10;

I get an error saying: Error in SQL statement: AnalysisException: Undefined function: array_size. This function is nei...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hey there @Michael Carey, hope everything is going great! We are glad to hear that you were able to find a solution to your question. Would you be happy to mark an answer as best so that other members can find the solution more quickly? Cheers!

2 More Replies
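A minimal sketch of a fallback for older runtimes: array_size() only exists on newer ones (Spark 3.3+ / Databricks Runtime 11+), while the long-standing size() function returns the same element count for arrays (table and column names are from the post):

# size() is available on older runtimes and counts array elements the same way.
display(spark.sql("""
    SELECT user_id, size(education) AS edu_cnt
    FROM users
    ORDER BY edu_cnt DESC
    LIMIT 10
"""))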
codevisionz
by New Contributor
  • 526 Views
  • 0 replies
  • 0 kudos

Our Python Code Examples cover basic concepts, control structures, functions, lists, classes, objects, inheritance, polymorphism, file operations, da...

Our Python Code Examples cover basic concepts, control structures, functions, lists, classes, objects, inheritance, polymorphism, file operations, data structures, sorting algorithms, mathematical functions, mathematical sequences, threads, exceptio...

cdiers
by New Contributor
  • 2059 Views
  • 0 replies
  • 0 kudos

Dataframe functions not ending

Hi everyone, for a few days now, my notebook containing some Databricks functions has stopped working. The last day my notebook ran correctly was the 6th of April. Since then, jobs won't stop and keep running because some functions don't end. I figured those fun...

Barb
by New Contributor III
  • 6650 Views
  • 6 replies
  • 0 kudos

SQL charindex function?

Hi all, I need to use the SQL charindex function, but I'm getting a Databricks error that this doesn't exist. That can't be true, right? Thanks for any ideas about how to make this work! Barb

Latest Reply
Traveller
New Contributor II
  • 0 kudos

The best option I found to replace CHARINDEX was LOCATE. Examples from the Spark documentation below:

> SELECT locate('bar', 'foobarbar', 5);
7
> SELECT POSITION('bar' IN 'foobarbar');
4

5 More Replies
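A minimal sketch comparing the replacements (table and column names hypothetical): locate(substr, str[, pos]) and instr(str, substr) are both 1-based and return 0 when the substring is absent, matching CHARINDEX's contract; note that instr flips the argument order:

spark.sql("""
    SELECT
        locate('bar', col_a)    AS pos_locate,   -- CHARINDEX(substr, str)
        locate('bar', col_a, 5) AS pos_from_5,   -- CHARINDEX(substr, str, start)
        instr(col_a, 'bar')     AS pos_instr     -- arguments reversed vs. locate
    FROM my_table
""").show()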