I see that Spark fully supports Scala 2.13. I wonder why there is no Databricks Runtime with Scala 2.13 yet. Are there any plans to make this available? It would be super useful.
I agree with @777. As Scala 3 is maturing and there are now more real use cases for Scala 3 on Spark, support for Scala 2.13 would be valuable to users, including us. I think the recent upgrade of the Databricks Runtime from JDK 8 to 17 was one of a ...
We are using a Databricks 3-node cluster with 32 GB of memory. It works fine, but sometimes it throws the error: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues.
If your job fails, follow this. According to https://docs.databricks.com/jobs.html#jar-job-tips:
"Job output, such as log output emitted to stdout, is subject to a 20MB size limit. If the total output has a larger size, the run will be canceled and ma...
I have to divide a dataframe into multiple smaller dataframes based on values in columns like gender and state; the end goal is to pick random samples from each dataframe. I am trying to implement a sample as explained below. I am quite new to th...
@raela I also have a similar use case. I am writing data to different Databricks tables based on a column value, but I am getting an insufficient disk space error and the driver is getting killed. I suspect the df.select(colName).distinct().collect() step is taki...
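A minimal sketch of both ideas, assuming a DataFrame df with hypothetical gender and state columns: sampleBy draws a random fraction per group without materializing one DataFrame per group, and partitionBy lets the writer split the output without the distinct().collect() step.

import org.apache.spark.sql.functions.concat_ws

// Combine the two grouping columns into a single key for sampleBy.
val keyed = df.withColumn("grp", concat_ws("_", df("gender"), df("state")))

// 10% random sample per (gender, state) group; sampleBy needs a fraction per key.
// This distinct() runs over one low-cardinality derived column only.
val fractions = keyed.select("grp").distinct().collect()
  .map(r => r.getString(0) -> 0.1).toMap
val sampled = keyed.stat.sampleBy("grp", fractions, seed = 42L)

// For the write-per-table use case, partitionBy avoids collecting distinct values at all.
df.write.partitionBy("gender", "state").mode("overwrite").parquet("/tmp/by_group") // hypothetical path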
I have an Excel file as the source file and I want to read the data from it into a dataframe using Databricks. I have already added the Maven dependency for the Excel file format. When I try the code below it gives an error. (Error: java.io....
To read an Excel file using Databricks, you can use the Databricks runtime, which supports multiple programming languages such as Python, Scala, and R. Here are the general steps to read an Excel file in Databricks using Python:
1. **Upload the Excel ...
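A minimal sketch in Scala, assuming the Maven dependency the poster mentions is the community com.crealytics:spark-excel connector. Option names vary across its versions (older releases use "useHeader" instead of "header"), and the path below is hypothetical:

val df = spark.read
  .format("com.crealytics.spark.excel")
  .option("header", "true")      // first row holds the column names
  .option("inferSchema", "true") // infer column types
  .load("/mnt/data/source.xlsx") // hypothetical path

df.printSchema()
df.show(5)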
I was able to parse .doc files in Java with the help of the POI libraries, but when converting the Java code to Scala I expected it to work with the same Java libraries from Scala; instead it is showing the below erro...
Hi @Ramesh Bathini On the Python side there is the python-docx module. I found that to be working perfectly fine. Can you try using that? Documentation and examples can be found online. Cheers...
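For reference, the same Java POI classes work unchanged from Scala. A minimal sketch, assuming the poi and poi-scratchpad jars are attached to the cluster and a hypothetical file path:

import java.io.FileInputStream
import org.apache.poi.hwpf.HWPFDocument
import org.apache.poi.hwpf.extractor.WordExtractor

val fis = new FileInputStream("/dbfs/tmp/sample.doc") // hypothetical path
val doc = new HWPFDocument(fis)        // binary .doc reader from poi-scratchpad
val extractor = new WordExtractor(doc)
val text = extractor.getText           // plain text of the whole document
extractor.close()
fis.close()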
Hello all, I've been tasked to convert Scala Spark code to PySpark code with minimal changes (a fairly literal translation). I've come across some code that claims to be a list comprehension. See the code snippet below:
%scala
val desiredColumn = Seq("f...
Another follow-up question, if you don't mind, @Pat Sienkiewicz. As I was trying to parse the name column into multiple columns, I came across the data below: ("James,\"A,B\", Smith", "2018", "M", 3000). In order to parse these comma-included middle na...
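One way to handle the quoted, comma-bearing middle names is from_csv (Spark 3.0+), which honours the quote character. A minimal sketch; the schema and field names are assumptions:

import org.apache.spark.sql.functions.from_csv
import org.apache.spark.sql.types.{StructType, StructField, StringType}

val data = Seq(("James,\"A,B\", Smith", "2018", "M", 3000))
  .toDF("name", "year", "gender", "salary")

val nameSchema = StructType(Seq(        // hypothetical name-part schema
  StructField("first", StringType),
  StructField("middle", StringType),
  StructField("last", StringType)))

// from_csv respects quotes, so "A,B" is kept as a single middle-name field
val parsed = data.withColumn("parts",
  from_csv(data("name"), nameSchema, Map("quote" -> "\"")))

parsed.select("parts.first", "parts.middle", "parts.last").show(false)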
Currently I am learning how to use databricks-connect to develop Scala code locally in an IDE (VS Code). The setup of databricks-connect as described at https://docs.microsoft.com/en-us/azure/databricks/dev-tools/databricks-connect was successfu...
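A minimal sketch of the local entry point, assuming databricks-connect has already been configured (databricks-connect configure) and its jars are on the project classpath; getOrCreate() then returns a session bound to the remote cluster:

import org.apache.spark.sql.SparkSession

object Main extends App {
  // hypothetical app name; with databricks-connect this session talks to the remote cluster
  val spark = SparkSession.builder
    .appName("dbconnect-local-dev")
    .getOrCreate()

  spark.range(10).show() // executes on the Databricks cluster, prints locally
}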
Since Spark 3.0, Dataset.queryExecution.debug.toFile will dump the full plan to a file, without concatenating the output into a fully materialized Java string in memory.
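A minimal usage sketch, assuming a notebook where spark is in scope and a hypothetical DBFS output path:

val ds = spark.range(1000).selectExpr("id", "id * 2 AS doubled")
ds.queryExecution.debug.toFile("/dbfs/tmp/query-plan.txt") // plan is streamed to the file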
Notebooks really aren't the best method of viewing large files. Two methods you could employ are:
- Save the file to DBFS and then use the Databricks CLI to download the file
- Use the web terminal
With the web terminal option you can do something like "cat my_lar...
I need to convert a column from decimal to date in Spark SQL when the format is not yyyy-MM-dd.
A table contains a column declared as decimal(38,0) with data in yyyyMMdd format, and I am unable to run SQL queries on it in a Databricks notebook.
...
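A minimal sketch of the usual fix, casting the decimal to string and parsing with to_date; the column name and sample values are hypothetical:

import org.apache.spark.sql.functions.{col, to_date}

// mimic the table's decimal(38,0) column holding yyyyMMdd values
val df = Seq(20230115L, 20221231L).toDF("dt_raw")
  .selectExpr("CAST(dt_raw AS DECIMAL(38,0)) AS dt_raw")

val withDate = df.withColumn("dt", to_date(col("dt_raw").cast("string"), "yyyyMMdd"))
withDate.show()

The same expression works in SQL as to_date(CAST(dt_raw AS STRING), 'yyyyMMdd').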
Hello! I'm a rookie at Spark Scala; here is my problem. Thanks in advance for your help.
My input dataframe looks like this:
index  bucket  time   ap  station  rssi
0      1       00:00  1   1        -84.0
1      1       00:00  1   3        -67.0
2      1       00:00  1   4        -82.0
3      1       00:00  1   2        -68.0
4      1       00:00  ...
I am using a script for CDC merge in Spark Streaming. I wish to pass column values to selectExpr through a parameter, since the column names change for each table. When I pass the columns and struct field through a string variable, I am getting an error as...
Hi @Swapan Swapandeep Marwaha, can you pass them as a Seq, as in the code below: keyCols = Seq("col1", "col2"), structCols = Seq("struct(offset,KAFKA_TS) as otherCols")
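A minimal sketch of that suggestion; the DataFrame and column names are hypothetical:

val df = Seq((1, 2, 100L, "2024-01-01")).toDF("col1", "col2", "offset", "KAFKA_TS")

val keyCols = Seq("col1", "col2")
val structCols = Seq("struct(offset, KAFKA_TS) AS otherCols")

// selectExpr is varargs, so expand the combined Seq with : _*
df.selectExpr((keyCols ++ structCols): _*).show(false)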
I'm writing my output (entity) dataframe to a CSV file. The statement below works well when the dataframe is non-empty.
entity.repartition(1).write.mode(SaveMode.Overwrite).format("csv").option("header", "true").save(tempLocation)
It's not working wh...
The same problem here (similar code and the same behavior with Spark 2.4.0, running with spark-submit on Windows and on Linux):
dataset.coalesce(1)
  .write()
  .option("charset", "UTF-8")
  .option("header", "true")
  .mode(SaveMod...
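One workaround sketch for the empty-DataFrame case (the helper name and path are hypothetical): when there are no rows, write the header line yourself, since the behavior reported above leaves the output empty even with header=true.

import org.apache.spark.sql.{DataFrame, SaveMode}

def writeCsvWithHeader(df: DataFrame, path: String): Unit = {
  if (df.isEmpty) {
    // no rows: emit a single-line file containing only the header
    import df.sparkSession.implicits._
    Seq(df.columns.mkString(",")).toDS()
      .coalesce(1)
      .write.mode(SaveMode.Overwrite).text(path)
  } else {
    df.coalesce(1)
      .write.mode(SaveMode.Overwrite)
      .option("header", "true")
      .csv(path)
  }
}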
Hi!
I am facing an issue when reading and parsing a CSV file. Some records have a newline symbol "escaped" by a \, and those records are not quoted. The file might look like this:
Line1field1;Line1field2.1 \
Line1field2.2;Line1field3;
Line2FIeld1;...
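Spark's CSV reader only handles embedded newlines in quoted fields (the multiLine option), so one workaround sketch is to pre-join the backslash-escaped line breaks and parse the cleaned text as CSV. This assumes a notebook where spark is in scope; the path is hypothetical and the ';' delimiter comes from the sample above:

import spark.implicits._

// read each file as one whole string, remove every backslash+newline pair,
// then split back into individual records
val whole = spark.read.option("wholetext", "true").text("/tmp/input.csv") // hypothetical path
val records = whole.as[String].flatMap(_.replace("\\\n", "").split("\n"))

// parse the cleaned records with the CSV reader
val parsed = spark.read
  .option("sep", ";")
  .option("header", "false")
  .csv(records)
parsed.show(false)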