Data Engineering

Forum Posts

by nolanreilly • New Contributor II

07-22-2021 7:08:02 AM

958 Views
1 reply
1 kudos

Impossible to read a custom pipeline? (Scala)

I have created a custom transformer to be used in an ML pipeline. I was able to write the pipeline to storage by extending the transformer class with DefaultParamsWritable. Reading the pipeline back in, however, does not seem possible in Scala. I have...

Latest Reply

WarrenO
New Contributor III

03-06-2025 10:52:25 AM

1 kudos

Hi, did you ever find a solution for this?

by qwerty1 • Contributor

03-23-2023 5:46:15 AM

6075 Views
7 replies
19 kudos

Resolved! When will databricks runtime be released for Scala 2.13?

I see that Spark fully supports Scala 2.13. I wonder why there is no Databricks runtime with Scala 2.13 yet. Are there any plans to make this available? It would be super useful.

Latest Reply

guersam
New Contributor II

11-11-2024 1:13:41 AM

19 kudos

I agree with @777. As Scala 3 matures and more real use cases with Scala 3 on Spark appear, support for Scala 2.13 will be valuable to users, including us. I think the recent upgrade of Databricks runtime from JDK 8 to 17 was one of a ...

by swzzzsw • New Contributor III

01-24-2022 11:30:43 AM

5719 Views
6 replies
0 kudos

Resolved! SQLServerException: deadlock

I'm using Databricks to connect to a SQL managed instance via JDBC. The SQL operations I need to perform include DELETE, UPDATE, and simple reads and writes. Since Spark syntax only handles simple reads and writes, I had to open a SQL connection using Scala an...

Latest Reply

Panda
Valued Contributor

10-21-2024 4:56:31 AM

0 kudos

@swzzzsw Since you are performing database operations, wrap your SQL operations inside transactions using commit and rollback to reduce the chances of deadlocks. Another approach to consider is adding retry logic or using Isolation Leve...
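
The commit/rollback and retry advice in this reply can be sketched in Scala with plain JDBC. This is a minimal sketch, not the poster's code: `DeadlockRetry`, `inTransaction`, and `retryOnDeadlock` are hypothetical names, and it assumes the driver reports SQL Server's deadlock-victim error (1205) through `SQLException.getErrorCode`.

```scala
import java.sql.{Connection, SQLException}
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

object DeadlockRetry {
  // Assumption: SQL Server reports a deadlock victim as vendor error code 1205.
  val DeadlockErrorCode = 1205

  /** Runs `work` in one transaction: commit on success, rollback on failure. */
  def inTransaction[A](conn: Connection)(work: Connection => A): A = {
    val previousAutoCommit = conn.getAutoCommit
    conn.setAutoCommit(false)
    try {
      val result = work(conn)
      conn.commit()
      result
    } catch {
      case e: Throwable =>
        conn.rollback() // undo partial changes before propagating the error
        throw e
    } finally {
      conn.setAutoCommit(previousAutoCommit)
    }
  }

  /** Calls `op` until it succeeds, retrying only deadlock errors, for at most `maxAttempts` calls. */
  @tailrec
  def retryOnDeadlock[A](maxAttempts: Int, backoffMillis: Long)(op: () => A): A =
    Try(op()) match {
      case Success(value) => value
      case Failure(e: SQLException) if e.getErrorCode == DeadlockErrorCode && maxAttempts > 1 =>
        Thread.sleep(backoffMillis) // brief pause so the competing transaction can finish
        retryOnDeadlock(maxAttempts - 1, backoffMillis)(op)
      case Failure(e) => throw e
    }
}
```

A call might look like `DeadlockRetry.retryOnDeadlock(3, 200L)(() => DeadlockRetry.inTransaction(conn)(c => c.createStatement().executeUpdate(sql)))`, where `conn` and `sql` are placeholders for an open connection and a statement. Retrying the whole transaction, rather than a single statement, keeps each attempt atomic.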

by Mohit_m • Valued Contributor II

06-15-2022 5:23:13 AM

29701 Views
3 replies
4 kudos

Resolved! How to get the Job ID and Run ID and save into a database

We have a Databricks job that runs a main class from a JAR file. The JAR's code base is in Scala. When the job starts running, we need to log the Job ID and Run ID to a database for future reference. How can we achieve this?

Latest Reply

Bruno-Castro
New Contributor II

05-08-2024 1:05:13 AM

4 kudos

That article is for members only. Could the steps also be described here, for those who are not Medium members? Thanks!

by YSDPrasad • New Contributor III

03-10-2023 11:45:22 AM

6609 Views
3 replies
3 kudos

Resolved! NoClassDefFoundError: scala/Product$class

import com.microsoft.azure.sqldb.spark.config.Config
import com.microsoft.azure.sqldb.spark.connect._
import com.microsoft.azure.sqldb.spark.query._
val query = "Truncate table tablename"
val config = Config(Map( "url" -> dbutils.secrets.get(scope = ...

Latest Reply

Anonymous
Not applicable

03-13-2023 12:22:33 AM

3 kudos

@Someswara Durga Prasad Yaralgadda: The NoClassDefFoundError occurs when a class that was available at compile time is not available at runtime. This could be due to a few reasons, including a missing dependency or an incompatible ...
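
A missing `scala/Product$class` is commonly a sign that a library compiled for Scala 2.11 is running on Scala 2.12 or later, because the way traits are compiled changed in 2.12. Whether that is the cause in this thread is an assumption, since the reply is truncated. A small check of the Scala version the cluster is actually running, with `ScalaVersionCheck` as a hypothetical helper name, might look like this:

```scala
object ScalaVersionCheck {
  /** Reduces a full version such as "2.12.15" to its binary version "2.12". */
  def binaryVersion(fullVersion: String): String =
    fullVersion.split('.').take(2).mkString(".")

  /** Binary Scala version of the running JVM, read from the Scala standard library. */
  def runtimeBinaryVersion: String =
    binaryVersion(scala.util.Properties.versionNumberString)

  /** True when an artifact suffix such as "_2.11" matches the running Scala version. */
  def matchesRuntime(artifactSuffix: String): Boolean =
    artifactSuffix.stripPrefix("_") == runtimeBinaryVersion
}
```

Library artifacts usually carry the Scala binary version as a suffix (for example `_2.11` or `_2.12`), so comparing that suffix with `ScalaVersionCheck.runtimeBinaryVersion` is a quick first check before looking for other missing dependencies.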

by schnee1 • New Contributor III

10-23-2015 6:07:48 AM

9698 Views
8 replies
0 kudos

Access struct elements inside dataframe?

I have a JSON data set that contains a price in a string like "USD 5.00". I'd like to convert the numeric portion to a Double to use in an MLlib LabeledPoint, and have managed to split the price string into an array of strings. The below creates a data...

Latest Reply

goldentriangle
New Contributor II

08-10-2023 8:26:34 PM

0 kudos

Thanks, Golden Triangle Tour

by AryaMa • New Contributor III

07-12-2019 3:07:30 PM

32929 Views
13 replies
8 kudos

Resolved! reading data from url using spark

Reading data from a URL using Spark on Community Edition, I got a path-related error. Any suggestions, please?
url = "https://raw.githubusercontent.com/thomaspernet/data_csv_r/master/data/adult.csv"
from pyspark import SparkFiles
spark.sparkContext.addFil...

Latest Reply

padang
New Contributor II

03-01-2023 1:10:07 PM

8 kudos

Sorry, bringing this back up...
from pyspark import SparkFiles
url = "http://raw.githubusercontent.com/ltregan/ds-data/main/authors.csv"
spark.sparkContext.addFile(url)
df = spark.read.csv("file://"+SparkFiles.get("authors.csv"), header=True, inferSc...

by Sandesh87 • New Contributor III

06-13-2023 11:31:15 AM

1138 Views
1 reply
2 kudos

apply a function across multiple smaller dataframes created from one big dataframe in scala

The dataframe 'big_df' looks like the below:
| id| index| timestamp|
|:---- |:------:| -----:|
| abc| 1| 11:00:00|
| abc| 1| 11:00:10|
| abc| 1| 11:00:20|
| abc| 1| 11:00:30|
| abc| 1| 11:00:40|
| abc| 1| 11:00:50|
| abc| 2| 11:01:00|
| abc| 2| 11:01:10|
| abc| ...

Latest Reply

Anonymous
Not applicable

06-15-2023 11:10:06 PM

2 kudos

Hi @Sandesh Puligundla Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question. Thanks.

by swatish0395 • New Contributor III

05-11-2023 2:59:39 AM

3846 Views
3 replies
4 kudos

Resolved! how to create a scala jar using db notebook and save it in a file path inside databricks

I have the Scala function below, and I am unable to understand how to build a Scala JAR from it. Please find below the code I have used (Enforcing Column-Level Encryption - Databrick):
%scala
import com.macasaet.fernet.{Key, StringValidator, Token}
import o...

Latest Reply

swatish0395
New Contributor III

05-23-2023 3:50:49 AM

4 kudos

I finally had to create the JAR using IntelliJ and an sbt configuration on the same environment, and then installed the JAR on the cluster. It worked.

by Pawan1 • New Contributor II

08-25-2022 3:38:34 AM

2010 Views
1 reply
2 kudos

Your administrator has forbidden Scala UDFs from being run on this cluster. How to enable access to Scala UDF on Azure Databricks cluster ?

Hi All, When I tried to run a Scala UDF on an Azure Databricks 10.1 (includes Apache Spark 3.2.0, Scala 2.12) cluster, I was able to run the UDF. However, when I tried to run the same notebook on a 10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12) cluster, I ha...

Latest Reply

Debayan
Databricks Employee

04-24-2023 8:07:40 AM

2 kudos

Hi, are you trying this with High Concurrency clusters? Also, please tag @Debayan Mukherjee with your next response so that I will get notified.

by gud4eve • New Contributor III

04-10-2023 12:07:10 AM

3089 Views
1 reply
0 kudos

Resolved! Scala app getting NullPointerException while migrating from DBR 7.3 to 9.1 (and above)

We are migrating our Scala jobs from AWS EMR (6.2.1, Spark version 3.0.1) to Lakehouse, and a few of our jobs are failing due to NullPointerException. We tried Databricks Runtime 7.3 LTS, and it works fine. Because it had the same Spark version 3.0...

Latest Reply

gud4eve
New Contributor III

04-10-2023 11:33:40 PM

0 kudos

In one of my code statements, I changed a Scala Boolean to java.lang.Boolean, and it is working fine now. Maybe in newer Spark versions, null in a Scala Boolean isn't supported.
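
Some background on why this change can help: a Scala `Boolean` is a primitive value type and cannot represent null, whereas `java.lang.Boolean` is a reference type that can. The sketch below illustrates the difference in plain Scala and shows `Option[Boolean]` as the more idiomatic way to model a missing value. The record types and names are hypothetical, and whether this matches the root cause in the original job is an assumption.

```scala
object NullableBoolean {
  // Hypothetical record types for illustration; not taken from the original job.
  final case class JavaStyle(id: Int, active: java.lang.Boolean) // `active` may be null
  final case class ScalaStyle(id: Int, active: Option[Boolean])  // None marks a missing value

  /** Converts a possibly-null boxed Boolean into an Option. */
  def toOption(value: java.lang.Boolean): Option[Boolean] =
    Option(value).map(_.booleanValue)

  /** Rewrites a record with a nullable field into one with an explicit Option. */
  def convert(record: JavaStyle): ScalaStyle =
    ScalaStyle(record.id, toOption(record.active))
}
```

With this shape, code that reads the field has to handle `None` explicitly instead of failing at runtime when a null reaches a primitive `Boolean`.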

by Databrickguy • New Contributor II

01-13-2023 9:42:13 AM

1368 Views
1 reply
0 kudos

How to use Java MaskFormatter in sparksql?

I created a function based on the Java MaskFormatter class in Databricks/Scala. But when I call it from Spark SQL, I receive this error message:
Error in SQL statement: AnalysisException: Undefined function: formatAccount. This function is neither a built-in/t...

Latest Reply

Anonymous
Not applicable

04-10-2023 7:57:56 AM

0 kudos

@Tim zhang: The issue is that the formatAccount function is defined as a Scala function, but SparkSQL is looking for a SQL function. You need to register the Scala function as a SQL function so that it can be called from SparkSQL. You can register t...
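
One way to approach this, sketched under assumptions: the original `formatAccount` is not shown in the thread, so the version below is a hypothetical two-argument reconstruction built on `javax.swing.text.MaskFormatter`. The registration call appears only as a comment because it needs the notebook's predefined `spark` session.

```scala
import javax.swing.text.MaskFormatter

object AccountFormat {
  /** Applies a MaskFormatter mask to `raw`; '#' in the mask matches one digit. */
  def formatAccount(raw: String, mask: String): String = {
    val formatter = new MaskFormatter(mask)
    // The input holds only the digits; literal characters such as '-' come from the mask.
    formatter.setValueContainsLiteralCharacters(false)
    formatter.valueToString(raw)
  }
}

// In a Databricks notebook, where `spark` is predefined, the function could be
// exposed to SQL with the standard UDF registration call:
//   spark.udf.register("formatAccount", AccountFormat.formatAccount _)
// after which a query such as
//   SELECT formatAccount(account_no, '###-##-####') FROM accounts
// should resolve the name (`account_no` and `accounts` are placeholder names).
```

Keeping the formatting logic in a plain function makes it testable without a cluster; only the one-line registration depends on Spark.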

by bchaubey • Contributor II

01-15-2023 8:57:47 AM

4285 Views
1 reply
0 kudos

unable to connect with Azure Storage with Scala

Hi Team, I am unable to connect to a Storage account with Scala in Databricks, and I am getting the error below.
AbfsRestOperationException: Status code: -1 error code: null error message: Cannot resolve hostname: ptazsg5gfcivcrstrlrs.dfs.core.windows.net
Caused by: Un...

Latest Reply

Anonymous
Not applicable

04-10-2023 7:49:45 AM

0 kudos

@Bhagwan Chaubey: The error message suggests that the hostname for your Azure Storage account could not be resolved. This could happen if there is a network issue, or if the hostname is incorrect. Here are some steps you can try to resolve the issue:...

by aladda • Databricks Employee

06-19-2021 8:59:35 PM

4345 Views
2 replies
3 kudos

Resolved! Can you share variables defined in a Python based cell with Scala cells?

Latest Reply

Imtiyaz_Shaikh
New Contributor II

04-05-2023 11:34:51 AM

3 kudos

The workaround is available here. The idea is to use the spark.conf.set() and spark.conf.get() methods.

by jerry-xu-sa • New Contributor II

03-06-2023 11:45:02 PM

3082 Views
2 replies
1 kudos

Order of a dataframe is not preserved after calling cache() and limit()

Here are the simple steps to reproduce it. Note that the columns "foo" and "bar" are just redundant columns to make sure the dataframe doesn't fit into a single partition.
// generate a random df
val rand = new scala.util.Random
val df = (1 to 3000).map(i => (r...

Latest Reply

Anonymous
Not applicable

03-31-2023 5:58:05 PM

1 kudos

Hi @Jerry Xu Thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your feedback wil...
