Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
I have created a custom transformer to be used in an ML pipeline. I was able to write the pipeline to storage by extending the transformer class with DefaultParamsWritable. Reading the pipeline back in, however, does not seem possible in Scala. I have...
I see that Spark fully supports Scala 2.13. I wonder why there is no Databricks runtime with Scala 2.13 yet. Are there any plans to make this available? It would be super useful.
I agree with @777. As Scala 3 is maturing and there are more real use cases with Scala 3 on Spark now, support for Scala 2.13 will be valuable to users, including us. I think the recent upgrade of Databricks runtime from JDK 8 to 17 was one of a ...
I'm using Databricks to connect to a SQL managed instance via JDBC. The SQL operations I need to perform include DELETE, UPDATE, and simple read and write. Since Spark syntax only handles simple read and write, I had to open a SQL connection using Scala an...
@swzzzsw Since you are performing database operations, to reduce the chances of deadlocks, make sure to wrap your SQL operations inside transactions using commit and rollback. Another approach to consider is adding retry logic or using Isolation Leve...
We have a Databricks job that runs a main class from a JAR file. Our JAR code base is in Scala. When the job starts running, we need to log the Job ID and Run ID into the database for future reference. How can we achieve this?
@Someswara Durga Prasad Yaralgadda: The NoClassDefFoundError occurs when a class that was available at compile time is not available at runtime. This could be due to a few reasons, including a missing dependency or an incompatible ...
I have a JSON data set that contains a price in a string like "USD 5.00". I'd like to convert the numeric portion to a Double to use in an MLlib LabeledPoint, and have managed to split the price string into an array of strings. The below creates a data...
I am reading data from a URL using Spark on Community Edition and got a path-related error. Any suggestions, please?
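The transaction-plus-retry advice above could be sketched roughly as follows. This is only an illustration, not the poster's code: `TxRetry`, `inTransaction` and `retry` are hypothetical names, and retrying on every SQLException is an assumption made for brevity; real code would probably inspect the error code and retry only on deadlocks.

```scala
import java.sql.{Connection, SQLException}
import scala.util.{Failure, Success, Try}

object TxRetry {
  /** Runs `work` as a single transaction: commit on success, rollback on any failure. */
  def inTransaction[A](conn: Connection)(work: Connection => A): A = {
    val previousAutoCommit = conn.getAutoCommit
    conn.setAutoCommit(false)
    try {
      val result = work(conn)
      conn.commit()
      result
    } catch {
      case e: Throwable =>
        conn.rollback()
        throw e
    } finally {
      // Restore whatever auto-commit mode the caller had.
      conn.setAutoCommit(previousAutoCommit)
    }
  }

  /** Retries `op` when it throws SQLException, doubling the delay after each failed attempt. */
  def retry[A](attemptsLeft: Int, delayMs: Long)(op: => A): A =
    Try(op) match {
      case Success(value) => value
      case Failure(_: SQLException) if attemptsLeft > 1 =>
        Thread.sleep(delayMs)
        retry(attemptsLeft - 1, delayMs * 2)(op)
      case Failure(e) => throw e
    }
}
```

With an open `java.sql.Connection` named `conn`, usage might look like `TxRetry.retry(3, 200L) { TxRetry.inTransaction(conn) { c => /* DELETE / UPDATE statements */ } }`, so each retry starts a fresh transaction.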
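One way to do the string-to-Double step in plain Scala is sketched below. It assumes every price looks like a currency code, whitespace, then a number, as in "USD 5.00"; `PriceParser` and `parsePrice` are hypothetical names, not part of the original post.

```scala
import scala.util.Try

object PriceParser {
  /** Extracts the numeric part of a string such as "USD 5.00".
    * Returns None when the string does not have exactly two
    * whitespace-separated parts or the second part is not a number.
    */
  def parsePrice(raw: String): Option[Double] =
    raw.trim.split("\\s+") match {
      case Array(_, amount) => Try(amount.toDouble).toOption
      case _                => None
    }
}
```

Returning an Option keeps malformed rows from throwing inside a Spark job; they can be filtered out before building the LabeledPoint.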
url = "https://raw.githubusercontent.com/thomaspernet/data_csv_r/master/data/adult.csv"
from pyspark import SparkFiles
spark.sparkContext.addFil...
Hi @Sandesh Puligundla, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question. Thanks.
I have a Scala function, shown below, and I am unable to understand how to package it into a Scala JAR. Please find below the code I have used: Enforcing Column-Level Encryption - Databrick %scala import com.macasaet.fernet.{Key, StringValidator, Token} import o...
Hi All, when I run a Scala UDF on an Azure Databricks 10.1 (includes Apache Spark 3.2.0, Scala 2.12) cluster, the UDF runs. However, when I tried to run the same notebook on a 10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12) cluster I ha...
We are migrating our Scala jobs from AWS EMR (6.2.1 and Spark version - 3.0.1) to Lakehouse, and a few of our jobs are failing due to NullPointerException. We tried Databricks Runtime 7.3 LTS, and it is working fine. Because it had the same Spark version 3.0...
In one of my code statements, I changed a Scala Boolean to java.lang.Boolean and this is working fine now. Maybe in newer Spark versions, null in a Scala Boolean isn't supported.
I created a function based on the Java MaskFormatter class in Databricks/Scala. But when I call it from Spark SQL, I receive this error message: Error in SQL statement: AnalysisException: Undefined function: formatAccount. This function is neither a built-in/t...
@Tim zhang: The issue is that the formatAccount function is defined as a Scala function, but Spark SQL is looking for a SQL function. You need to register the Scala function as a SQL function so that it can be called from Spark SQL. You can register t...
Hi Team, I am unable to connect to a Storage account with Scala in Databricks, and I am getting the error below. AbfsRestOperationException: Status code: -1 error code: null error message: Cannot resolve hostname: ptazsg5gfcivcrstrlrs.dfs.core.windows.net Caused by: Un...
@Bhagwan Chaubey: The error message suggests that the hostname for your Azure Storage account could not be resolved. This could happen if there is a network issue, or if the hostname is incorrect. Here are some steps you can try to resolve the issue:...
Here are the simple steps to reproduce it. Note that columns "foo" and "bar" are just redundant columns to make sure the dataframe doesn't fit into a single partition.
// generate a random df
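The difference the poster ran into can be shown in plain Scala, independent of Spark. A scala.Boolean is a primitive and has no null value, while java.lang.Boolean is a reference type that can be null; wrapping it in Option makes the missing case explicit. `NullableBoolean` and `toOption` are hypothetical names for this sketch, and whether a Spark version change was the actual cause is not confirmed by the thread.

```scala
object NullableBoolean {
  /** Converts a possibly-null boxed Boolean into Option[Boolean]. */
  def toOption(flag: java.lang.Boolean): Option[Boolean] =
    Option(flag).map(_.booleanValue)

  /** Reads the flag with an explicit default for the null case. */
  def orDefault(flag: java.lang.Boolean, default: Boolean): Boolean =
    toOption(flag).getOrElse(default)
}
```

In a case class used with a Dataset, declaring the field as Option[Boolean] is another common way to represent a nullable column.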
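Registering a Scala function for use in SQL might look roughly like the sketch below. The original formatAccount body is not shown in the post, so this version, its two-argument signature, and the mask "###-###-###" are assumptions for illustration only. It needs Spark on the classpath; in a Databricks notebook the `spark` session already exists, so only the `register` and `sql` lines would be needed there.

```scala
import javax.swing.text.MaskFormatter
import org.apache.spark.sql.SparkSession

object FormatAccountUdf {
  // Hypothetical stand-in for the poster's function: applies a MaskFormatter mask to a string.
  def formatAccount(account: String, mask: String): String = {
    val formatter = new MaskFormatter(mask)
    // Treat the input as raw characters, so the mask's literal '-' is inserted rather than expected.
    formatter.setValueContainsLiteralCharacters(false)
    formatter.valueToString(account)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("format-account-udf")
      .master("local[*]")
      .getOrCreate()

    // Register under the name that SQL queries will use.
    val formatAccountFn: (String, String) => String = formatAccount
    spark.udf.register("formatAccount", formatAccountFn)

    spark.sql("SELECT formatAccount('123456789', '###-###-###') AS formatted").show()
    spark.stop()
  }
}
```

A function registered this way is visible only in the session that registered it, so the registration has to run before any SQL cell that calls it.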
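A quick way to check the hostname hypothesis from a notebook is to try resolving the name directly on the driver. This is only a diagnostic sketch: `DnsCheck` and `resolve` are hypothetical names, and it tests DNS resolution only, not credentials or firewall rules.

```scala
import java.net.InetAddress
import scala.util.Try

object DnsCheck {
  /** Returns Right(ip) if the host resolves, otherwise Left with the exception summary. */
  def resolve(host: String): Either[String, String] =
    Try(InetAddress.getByName(host)).toEither
      .map(_.getHostAddress)
      .left.map(e => s"${e.getClass.getSimpleName}: ${e.getMessage}")
}
```

Calling `DnsCheck.resolve("<storage-account>.dfs.core.windows.net")` with the real account name returns a Left when the cluster cannot resolve it, which points at DNS or network configuration, or at a misspelled account name.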
val rand = new scala.util.Random
val df = (1 to 3000).map(i => (r...
Hi @Jerry Xu, thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your feedback wil...