Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

b_1
by New Contributor II
  • 1148 Views
  • 2 replies
  • 1 kudos

to_timestamp function in non-legacy mode does not parse this format: yyyyMMddHHmmssSS

I have this datetime string in my dataset: '2023061218154258' and I want to convert it to a datetime using the code below. However, the format that I expect to work, namely yyyyMMddHHmmssSS, doesn't work. This code will reproduce the issue: from pyspark.sq...

Latest Reply
b_1
New Contributor II
  • 1 kudos

Is there anybody who has the same issue or knows that this is in fact an issue?
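One possible workaround, sketched in plain Python rather than with Spark's own pattern parser: take the first 14 characters as yyyyMMddHHmmss and treat whatever follows as a fraction of a second. This assumes the trailing '58' means hundredths of a second, which the post does not state. parse_compact_timestamp is a hypothetical helper name, and wrapping it in a UDF is left out here.

```python
from datetime import datetime


def parse_compact_timestamp(value: str) -> datetime:
    """Parse 'yyyyMMddHHmmss' followed by optional fractional-second digits."""
    base, fraction = value[:14], value[14:]
    parsed = datetime.strptime(base, "%Y%m%d%H%M%S")
    # Assumption: the extra digits are a decimal fraction of a second,
    # so "58" means 0.58 s. Pad to microseconds and drop anything finer.
    micros = int(fraction.ljust(6, "0")[:6]) if fraction else 0
    return parsed.replace(microsecond=micros)


parsed = parse_compact_timestamp("2023061218154258")
print(parsed.isoformat())
```

Slicing the string by position avoids relying on how any particular pattern parser splits adjacent numeric fields.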

  • 1 kudos
1 More Replies
MichaelO
by New Contributor III
  • 3345 Views
  • 4 replies
  • 2 kudos

Resolved! Call python image function in pyspark

I have a function for rotating images written in Python:
from PIL import Image
def rotate_image(image, rotation_angle):
    im = Image.open(image)
    out = im.rotate(rotation_angle, expand=True)
    return out
I now want to use this function as a pyspark ...

Latest Reply
Raluka
New Contributor III
  • 2 kudos

Stock photos, I've come to realize, are the catalysts of imagination. This website's vast reservoir of images new york seal sparks ideas that ripple through my projects. They empower me to envision the previously unimagined, helping me breathe life i...

  • 2 kudos
3 More Replies
mjbobak
by New Contributor III
  • 19747 Views
  • 5 replies
  • 9 kudos

Resolved! How to import a helper module that uses databricks specific modules (dbutils)

I have a main Databricks notebook that runs a handful of functions. In this notebook, I import a helper.py file that is in the same repo, and when I execute the import everything looks fine. Inside my helper.py there's a function that leverages built-i...

Latest Reply
amitca71
Contributor II
  • 9 kudos

Hi, I'm facing a similar issue when deploying via dbx. I have a helper notebook that works fine when I execute it via jobs (without any includes), while when I deploy it via dbx (to the same cluster), the helper notebook results with dbutils.fs.ls(path) NameEr...
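One pattern that avoids the NameError is to stop relying on a global dbutils inside the imported module and pass it in from the notebook instead. A minimal sketch, assuming the helper only needs dbutils.fs.ls; list_paths and fake_dbutils are hypothetical names used for illustration, and the stand-in object exists only so the snippet runs outside Databricks.

```python
from types import SimpleNamespace


def list_paths(dbutils, path):
    """Return the path of every entry under `path`.

    `dbutils` is supplied by the caller, so this module never needs
    a global `dbutils` name to exist at import time.
    """
    return [entry.path for entry in dbutils.fs.ls(path)]


# Stand-in for the real dbutils object (assumption: only fs.ls is used).
fake_dbutils = SimpleNamespace(
    fs=SimpleNamespace(
        ls=lambda path: [
            SimpleNamespace(path=f"{path}/a.csv"),
            SimpleNamespace(path=f"{path}/b.csv"),
        ]
    )
)

print(list_paths(fake_dbutils, "/mnt/data"))
```

In a notebook the call would be list_paths(dbutils, path), using the dbutils object the notebook already provides.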

  • 9 kudos
4 More Replies
Orianh
by Valued Contributor II
  • 5189 Views
  • 4 replies
  • 3 kudos

function does not exist in JVM ERROR

Hello guys, I'm building a Python package that returns one row from a DataFrame at a time inside the Databricks environment. To improve the performance of this package I used the multiprocessing library in Python. I have a background process whose whole purpose is to p...

Latest Reply
dineshreddy
New Contributor II
  • 3 kudos

Using threads instead of processes solved the issue for me.
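A minimal sketch of that swap, using the standard-library thread pool in place of multiprocessing. A likely reason it helps is that threads stay inside the driver's Python process, while child processes do not share its connection to the JVM, but the thread does not confirm this. fetch_row and prefetch are hypothetical names, and fetch_row is a placeholder for whatever call touches the DataFrame.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_row(index):
    # Placeholder for the real work (for example, pulling one row
    # from a DataFrame); here it just returns a small dict.
    return {"row": index}


def prefetch(indices, max_workers=4):
    """Run fetch_row for each index on a pool of threads, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_row, indices))


rows = prefetch(range(5))
print(rows)
```

pool.map returns results in the order of the inputs, so the caller sees rows in the same sequence it requested them.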

  • 3 kudos
3 More Replies
giriraj01234567
by New Contributor II
  • 8645 Views
  • 1 replies
  • 2 kudos

Getting error while running show function

I was using StringIndexer. While fitting and transforming I didn't get any error, but while running the show function I am getting an error. The error is below: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 45.0 failed...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Bojja Giri​ Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question. Thanks.

  • 2 kudos
luiso
by New Contributor
  • 703 Views
  • 1 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Luis Lopez​ Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question. Thanks.

  • 0 kudos
g96g
by New Contributor III
  • 940 Views
  • 1 replies
  • 0 kudos

Function in databricks

I'm having a hard time converting the function below from SSMS to a Databricks function. Any help would be appreciated! CREATE FUNCTION [dbo].[MaxOf5Values] (@D1 [int],@D2 [int],@D3 [int],@D4 [int],@D5 [int]) RETURNS int AS BEGIN DECLARE @Result int   ...

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 0 kudos

Hi @Givi Salu, please refer to this link, which will help you convert this function.

  • 0 kudos
elgeo
by Valued Contributor II
  • 7990 Views
  • 4 replies
  • 0 kudos

Function returns UNSUPPORTED_CORRELATED_SCALAR_SUBQUERY

Hello experts. The function below gives an UNSUPPORTED_CORRELATED_SCALAR_SUBQUERY error in Databricks. We didn't have this issue in Oracle, though. Is this a limitation of Databricks? Just to note, the final result returns only one row. Thank you in advan...

Latest Reply
TheofilosSt
New Contributor II
  • 0 kudos

Hello @Suteja Kanuri, can we have any response on the above? Thank you.

  • 0 kudos
3 More Replies
qwerty1
by Contributor
  • 6305 Views
  • 3 replies
  • 1 kudos

Is there a way to register a Scala function that is available to other notebooks?

I am in a situation where I have a notebook that runs in a pipeline that creates a "live streaming table". So, I cannot use a language other than SQL in the pipeline. I would like to format a certain column in the pipeline using Scala code (it's a ...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

No, DLT does not work with Scala, unfortunately. Delta Live Tables are not vanilla Spark. Is Python an option instead of Scala?

  • 1 kudos
2 More Replies
Prasann_gupta
by New Contributor
  • 7303 Views
  • 3 replies
  • 0 kudos

SQL CONTAINS Function is not working on Databricks

I am trying to use the SQL CONTAINS function in my SQL query, but it is throwing the error below: AnalysisException: Undefined function: 'CONTAINS'. This function is neither a registered temporary function nor a permanent function registered in the databa...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Prasann Gupta, hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Than...

  • 0 kudos
2 More Replies
andrew0117
by Contributor
  • 3434 Views
  • 4 replies
  • 0 kudos

Resolved! Can merge() function be applied to dataframe?

If I have two dataframes df_target and df_source, can I do df_target.as("t").merge(df_source.as("s"), "s.id=t.id").whenMatched().updateAll().whenNotMatched.insertAll.execute()? When I tried the code above, I got the error "merge is not a member of the...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @andrew li, hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Thanks!

  • 0 kudos
3 More Replies
jonathan-dufaul
by Valued Contributor
  • 1405 Views
  • 2 replies
  • 0 kudos

Is there a function similar to display that downloads a dataframe?

I find myself constantly having to do display(df) and then "recompute with <5g records and download". I was just hoping I could skip the middleman and download from the get-go. Ideally it'd be a function like download(df, num_rows="max") where num_rows i...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 0 kudos

Question: where do you want to download it to? If to a cloud location, use the regular DataFrameWriter. You can install, for example, Azure Storage Explorer on your computer. Some cloud storage can even be mounted in your system as a folder or network share.

  • 0 kudos
1 More Replies
pjp94
by Contributor
  • 6916 Views
  • 9 replies
  • 7 kudos

Calling a Python function (def) in Databricks

Not sure if I'm missing something here, but running a task outside of a Python function runs much, much quicker than executing the same task inside a function. Is there something I'm missing with how Spark handles functions? 1) def task(x): y = dostuf...

Latest Reply
sher
Valued Contributor II
  • 7 kudos

Don't use a normal Python function; use a UDF in PySpark, which will be faster.

  • 7 kudos
8 More Replies
weldermartins
by Honored Contributor
  • 2892 Views
  • 3 replies
  • 6 kudos

Resolved! Function When + Dictionary.

Hey everyone, I'm trying to avoid repeating the when function 12 times, so I thought of using a dictionary. I don't know if it's a limitation of the Spark function or a logic error. Does the function allow this concatenation?

Latest Reply
weldermartins
Honored Contributor
  • 6 kudos

Hello everyone, I found this alternative to reduce repeated code.
custoDF = (custoDF.withColumn('month', col('Nummes').cast('string'))
    .replace(months, subset=['month']))

  • 6 kudos
2 More Replies
dulu
by New Contributor III
  • 2808 Views
  • 2 replies
  • 6 kudos

Is there a function similar to split_part, json_extract_scalar?

I am using spark_sql version 3.2.1. Is there a function that can replace split_part and json_extract_scalar, or not?

Latest Reply
Ankush
New Contributor II
  • 6 kudos

pyspark.sql.functions.get_json_object(col, path): Extracts a JSON object from a JSON string based on the JSON path specified, and returns the JSON string of the extracted JSON object. It will return null if the input JSON string is invalid.

  • 6 kudos
1 More Replies