Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

maranBH
by New Contributor III
  • 26114 Views
  • 5 replies
  • 11 kudos

Resolved! How to import a function to another notebook using Repos without %run?

Hi all, I was reading the Repos documentation: https://docs.databricks.com/repos.html#migrate-from-run-commands It is explained that one advantage of Repos is that it is no longer necessary to use the %run magic command to make functions available in one notebook to ...

Latest Reply
JakubSkibicki
New Contributor III
  • 11 kudos

Due to the new functionalities in Runtime 16.0 regarding autoload, I came across this thread and performed a practical test. It works, though I had some problems at first. As in the solution, the key was that the definitions are placed in a .py file, not a notebook.

4 More Replies
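A minimal sketch of that solution, assuming the shared code lives in a hypothetical helpers.py at the repo root (Databricks adds the repo root to sys.path, so plain modules are importable without %run):

```python
# helpers.py — a plain .py file at the repo root (not a notebook):
#
#     def add_one(x: int) -> int:
#         return x + 1

# In any notebook in the same repo, import the module directly
# instead of pulling it in with %run:
from helpers import add_one

print(add_one(41))  # 42
```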
b_1
by New Contributor II
  • 1421 Views
  • 2 replies
  • 1 kudos

to_timestamp function in non-legacy mode does not parse this format: yyyyMMddHHmmssSS

I have this datetime string in my dataset: '2023061218154258' and I want to convert it to a timestamp using the code below. However, the format that I expect to work doesn't, namely yyyyMMddHHmmssSS. This code will reproduce the issue: from pyspark.sq...

Latest Reply
b_1
New Contributor II
  • 1 kudos

Is there anybody who has the same issue or knows that this is in fact an issue?

1 More Replies
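A sketch of two possible workarounds for the example string above; the legacy-parser route is an assumption worth verifying, since the old SimpleDateFormat reads SS as milliseconds rather than a fixed-width fraction:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("2023061218154258",)], ["ts_str"])

# Workaround 1: parse the fixed-width 14-character prefix and drop the
# two trailing fraction digits entirely.
df.select(
    F.to_timestamp(F.substring("ts_str", 1, 14), "yyyyMMddHHmmss").alias("ts")
).show(truncate=False)

# Workaround 2: fall back to the legacy Spark 2.x parser (validate the
# resulting fraction, as noted above).
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")
df.select(F.to_timestamp("ts_str", "yyyyMMddHHmmssSS").alias("ts")).show(truncate=False)
```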
MichaelO
by New Contributor III
  • 3896 Views
  • 4 replies
  • 2 kudos

Resolved! Call Python image function in PySpark

I have a function for rotating images written in Python: from PIL import Image / def rotate_image(image, rotation_angle): im = Image.open(image); out = im.rotate(rotation_angle, expand=True); return out. I now want to use this function as a PySpark ...

Latest Reply
Raluka
New Contributor III
  • 2 kudos

Stock photos, I've come to realize, are the catalysts of imagination. This website's vast reservoir of images new york seal sparks ideas that ripple through my projects. They empower me to envision the previously unimagined, helping me breathe life i...

3 More Replies
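Since the accepted answer is not reproduced above, here is a hedged sketch of one way to apply such a PIL function as a UDF over images read as binary files (the input path is a placeholder):

```python
import io

from PIL import Image
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import BinaryType

spark = SparkSession.builder.getOrCreate()

def rotate_image_bytes(content: bytes, angle: int = 90) -> bytes:
    """Rotate raw image bytes with PIL and return the result as PNG bytes."""
    im = Image.open(io.BytesIO(content))
    buf = io.BytesIO()
    im.rotate(angle, expand=True).save(buf, format="PNG")
    return buf.getvalue()

rotate_udf = F.udf(rotate_image_bytes, BinaryType())

# The binaryFile reader exposes the raw bytes in a "content" column.
images = spark.read.format("binaryFile").load("/mnt/images/")
rotated = images.withColumn("rotated", rotate_udf(F.col("content")))
```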
mjbobak
by New Contributor III
  • 23767 Views
  • 5 replies
  • 9 kudos

Resolved! How to import a helper module that uses databricks specific modules (dbutils)

I have a main Databricks notebook that runs a handful of functions. In this notebook, I import a helper.py file that is in the same repo, and when I execute the import everything looks fine. Inside my helper.py there's a function that leverages built-i...

Latest Reply
amitca71
Contributor II
  • 9 kudos

Hi, I'm facing a similar issue when deploying via dbx. I have a helper notebook that works fine when executed via jobs (without any includes), while when I deploy it via dbx (to the same cluster), the helper notebook fails with dbutils.fs.ls(path) NameEr...

4 More Replies
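The usual fix is to construct a dbutils handle inside the module rather than relying on the notebook-injected global; a sketch (the module name is hypothetical):

```python
# helper.py — dbutils is injected only into notebooks, so a plain module
# has to build its own handle from the active SparkSession.
from pyspark.sql import SparkSession

def get_dbutils(spark: SparkSession):
    """Return a dbutils handle usable from a .py module on a Databricks cluster."""
    from pyspark.dbutils import DBUtils
    return DBUtils(spark)

def list_files(spark: SparkSession, path: str):
    return get_dbutils(spark).fs.ls(path)
```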
Orianh
by Valued Contributor II
  • 5942 Views
  • 4 replies
  • 3 kudos

function does not exist in JVM ERROR

Hello guys, I'm building a Python package that returns one row from a DataFrame at a time inside the Databricks environment. To improve the performance of this package I used the multiprocessing library in Python; I have a background process whose whole purpose is to p...

function does not exist in JVM error.
Latest Reply
dineshreddy
New Contributor III
  • 3 kudos

Using threads instead of processes solved the issue for me.

3 More Replies
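The reason threads help: multiprocessing workers are separate Python processes that do not inherit the driver's Py4J gateway, while threads share the driver's JVM connection. A sketch of the thread-based pattern (the task body is a placeholder):

```python
from concurrent.futures import ThreadPoolExecutor

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def count_slice(i: int) -> int:
    # Placeholder work that calls into the JVM via the shared SparkSession.
    return spark.range(i * 1000, (i + 1) * 1000).count()

# Threads, unlike multiprocessing workers, can safely reuse the driver's
# existing Spark context.
with ThreadPoolExecutor(max_workers=4) as pool:
    counts = list(pool.map(count_slice, range(10)))
```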
giriraj01234567
by New Contributor II
  • 8868 Views
  • 1 reply
  • 2 kudos

Getting error while running show function

I was using StringIndexer; while fitting and transforming I didn't get any error, but while running the show function I am getting an error. I mention the error below: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 45.0 failed...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Bojja Giri, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question. Thanks.

luiso
by New Contributor
  • 812 Views
  • 1 reply
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Luis Lopez, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question. Thanks.

g96g
by New Contributor III
  • 1053 Views
  • 1 reply
  • 0 kudos

Function in Databricks

I'm having a hard time converting the function below from SSMS to a Databricks function. Any help would be appreciated! CREATE FUNCTION [dbo].[MaxOf5Values] (@D1 [int],@D2 [int],@D3 [int],@D4 [int],@D5 [int]) RETURNS int AS BEGIN DECLARE @Result int   ...

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 0 kudos

Hi @Givi Salu, please refer to this link that will help you convert this function.

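One plausible translation, assuming a runtime with SQL UDF support: Spark's GREATEST collapses the procedural comparisons of the T-SQL original into a single expression:

```python
spark.sql("""
    CREATE OR REPLACE FUNCTION MaxOf5Values(d1 INT, d2 INT, d3 INT, d4 INT, d5 INT)
    RETURNS INT
    RETURN GREATEST(d1, d2, d3, d4, d5)
""")

spark.sql("SELECT MaxOf5Values(1, 5, 3, 2, 4) AS result").show()  # result = 5
```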
elgeo
by Valued Contributor II
  • 8732 Views
  • 4 replies
  • 0 kudos

Function returns UNSUPPORTED_CORRELATED_SCALAR_SUBQUERY

Hello experts. The function below gives the UNSUPPORTED_CORRELATED_SCALAR_SUBQUERY error in Databricks. We didn't have this issue in Oracle, though. Is this a limitation of Databricks? Just to note, the final result returns only one row. Thank you in advan...

Latest Reply
TheofilosSt
New Contributor II
  • 0 kudos

Hello @Suteja Kanuri, can we have any response on the above? Thank you.

3 More Replies
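The function body is not shown above, so only a generic illustration is possible: Spark rejects correlated scalar subqueries it cannot decorrelate, and the usual rewrite computes the aggregate once and joins it back (table and column names below are hypothetical):

```python
spark.sql("""
    SELECT o.id, o.amount, m.max_amount
    FROM orders AS o
    JOIN (
        SELECT customer_id, MAX(amount) AS max_amount
        FROM orders
        GROUP BY customer_id
    ) AS m
    ON o.customer_id = m.customer_id
""").show()
```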
qwerty1
by Contributor
  • 6677 Views
  • 3 replies
  • 1 kudos

Is there a way to register a scala function that is available to other notebooks?

I am in a situation where I have a notebook that runs in a pipeline that creates a "live streaming table". So, I cannot use a language other than SQL in the pipeline. I would like to format a certain column in the pipeline using Scala code (it's a ...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

No, DLT does not work with Scala, unfortunately. Delta Live Tables are not vanilla Spark. Is Python an option instead of Scala?

2 More Replies
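If Python is acceptable, a sketch of how the formatting could move into a Python DLT definition with a UDF (the table, column, and formatting logic are placeholders):

```python
import dlt
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

# Placeholder stand-in for the Scala formatting logic.
@F.udf(StringType())
def format_value(v):
    return v.strip().upper() if v is not None else None

@dlt.table
def formatted_live_table():
    # "raw_table" and "raw_col" are hypothetical names.
    return dlt.read_stream("raw_table").withColumn("formatted", format_value("raw_col"))
```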
Prasann_gupta
by New Contributor
  • 9265 Views
  • 3 replies
  • 0 kudos

SQL CONTAINS Function is not working on Databricks

I am trying to use the SQL CONTAINS function in my SQL query, but it is throwing the error below: AnalysisException: Undefined function: 'CONTAINS'. This function is neither a registered temporary function nor a permanent function registered in the databa...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Prasann Gupta, hope all is well! Just wanted to check in if you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Than...

2 More Replies
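For context, contains() only became a built-in SQL function on newer Spark versions (3.3+); on older runtimes the same predicate can be written with LIKE or instr(). A sketch with placeholder table and column names:

```python
spark.sql("""
    SELECT *
    FROM my_table
    WHERE instr(description, 'searchterm') > 0
    -- equivalently: WHERE description LIKE '%searchterm%'
""").show()
```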
andrew0117
by Contributor
  • 4333 Views
  • 4 replies
  • 0 kudos

Resolved! Can merge() function be applied to dataframe?

If I have two dataframes df_target and df_source, can I do df_target.as("t").merge(df_source.as("s"), "s.id=t.id").whenMatched().updateAll().whenNotMatched().insertAll().execute()? When I tried the code above, I got the error "merge is not a member of the...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @andrew li, hope all is well! Just wanted to check in if you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Thanks!

3 More Replies
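The error message itself points at the explanation: merge() is defined on DeltaTable, not on DataFrame. A sketch of the Delta Lake API under that reading (the table name is a placeholder):

```python
from delta.tables import DeltaTable

# merge() lives on DeltaTable, not DataFrame — hence "merge is not a
# member of ...". Convert the target first ("target_table" is a
# placeholder; DeltaTable.forPath works the same way for paths).
target = DeltaTable.forName(spark, "target_table")

(target.alias("t")
       .merge(df_source.alias("s"), "s.id = t.id")
       .whenMatchedUpdateAll()
       .whenNotMatchedInsertAll()
       .execute())
```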
jonathan-dufaul
by Valued Contributor
  • 1598 Views
  • 2 replies
  • 0 kudos

Is there a function similar to display that downloads a dataframe?

I find myself constantly having to do display(df) and then "recompute with <5g records and download". I was just hoping I could skip the middleman and download from the get-go. Ideally it'd be a function like download(df, num_rows="max") where num_rows i...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 0 kudos

Question: where do you want to download it to? If to a cloud location, use the regular DataFrameWriter. You can install, for example, Azure Storage Explorer on your computer. Some cloud storage you can even mount on your system as a folder or network share.

1 More Replies
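A sketch of that suggestion: write with the regular DataFrameWriter to a cloud path and download the file from there (the destination URI is a placeholder):

```python
# coalesce(1) yields a single CSV part file, which is easier to download;
# drop it for large DataFrames where one file would bottleneck the write.
(df.coalesce(1)
   .write.mode("overwrite")
   .option("header", True)
   .csv("abfss://container@account.dfs.core.windows.net/exports/my_df"))
```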
pjp94
by Contributor
  • 8310 Views
  • 9 replies
  • 7 kudos

Calling a Python function (def) in Databricks

Not sure if I'm missing something here, but running a task outside of a Python function runs much, much quicker than executing the same task inside a function. Is there something I'm missing with how Spark handles functions? 1) def task(x): y = dostuf...

Latest Reply
sher
Valued Contributor II
  • 7 kudos

Don't use a normal Python function; use a UDF in PySpark so that it will be faster.

8 More Replies
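A sketch of that suggestion (df, x, and the dostuff logic are placeholders from the question): wrapping the row-level logic in a PySpark UDF moves it onto the executors instead of running it row by row on the driver. Note that a built-in Spark function covering the same logic will generally still outperform a Python UDF:

```python
from pyspark.sql import functions as F
from pyspark.sql.types import IntegerType

@F.udf(IntegerType())
def task(x):
    return x + 1  # stand-in for the original dostuff(x)

result = df.withColumn("y", task(F.col("x")))
```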
weldermartins
by Honored Contributor
  • 3443 Views
  • 3 replies
  • 6 kudos

Resolved! Function When + Dictionary.

Hey everyone, I'm avoiding repeating the when() function 12x, so I thought of a dictionary. I don't know if it's a limitation of the Spark function or a logic error. Does the function allow this concatenation?

Latest Reply
weldermartins
Honored Contributor
  • 6 kudos

Hello everyone, I found this alternative to reduce repeated code: custoDF = (custoDF.withColumn('month', col('Nummes').cast('string')).replace(months, subset=['month']))

2 More Replies
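Another way to avoid repeating when() twelve times, for comparison: fold the dictionary into a single chained expression (the months mapping is truncated here):

```python
from pyspark.sql import functions as F

months = {"1": "jan", "2": "feb", "3": "mar"}  # ... up to "12"

items = list(months.items())
expr = F.when(F.col("Nummes").cast("string") == items[0][0], items[0][1])
for k, v in items[1:]:
    expr = expr.when(F.col("Nummes").cast("string") == k, v)

custoDF = custoDF.withColumn("month", expr.otherwise(None))
```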