Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

qwerty1
by Contributor
  • 6415 Views
  • 7 replies
  • 19 kudos

Resolved! When will databricks runtime be released for Scala 2.13?

I see that Spark fully supports Scala 2.13. I wonder why there is no Databricks Runtime with Scala 2.13 yet. Are there any plans to make this available? It would be super useful.

Latest Reply
guersam
New Contributor II
  • 19 kudos

I agree with @777. As Scala 3 is getting mature and there are more real use cases with Scala 3 on Spark now, support for Scala 2.13 will be valuable to users including us. I think the recent upgrade of the Databricks runtime from JDK 8 to 17 was one of a ...

6 More Replies
naveenreddy1
by New Contributor II
  • 18864 Views
  • 4 replies
  • 0 kudos

Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages. Driver stacktrace

We are using a Databricks 3-node cluster with 32 GB of memory. It works fine, but sometimes it throws the error: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues.

Latest Reply
RodrigoDe_Freit
New Contributor II
  • 0 kudos

If your job fails, follow this. According to https://docs.databricks.com/jobs.html#jar-job-tips: "Job output, such as log output emitted to stdout, is subject to a 20MB size limit. If the total output has a larger size, the run will be canceled and ma...

3 More Replies
Rani
by New Contributor
  • 9477 Views
  • 2 replies
  • 0 kudos

Divide a dataframe into multiple smaller dataframes based on values in multiple columns in Scala

I have to divide a dataframe into multiple smaller dataframes based on values in columns like gender and state; the end goal is to pick random samples from each dataframe. I am trying to implement a sample as explained below; I am quite new to th...

Latest Reply
subham0611
New Contributor II
  • 0 kudos

@raela I also have a similar use case. I am writing data to different Databricks tables based on a column value, but I am getting an insufficient disk space error and the driver is getting killed. I suspect the df.select(colName).distinct().collect() step is taki...

1 More Replies
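
A minimal Scala sketch of the approach discussed in this thread: collect the distinct (gender, state) pairs, filter the dataframe once per pair, and take a random sample from each piece. The table name and sample fraction are hypothetical, and collecting the distinct keys is exactly the step the last reply flags as expensive when the key space is large.

  import org.apache.spark.sql.functions.col

  val df = spark.table("people")   // hypothetical source table with gender and state columns

  // Distinct key pairs are collected to the driver; cheap for low-cardinality keys,
  // but this is the step that can strain the driver on a huge key space.
  val keys = df.select("gender", "state").distinct().collect()

  // One filtered dataframe per key, then a random 10% sample from each (fraction is arbitrary).
  val samples = keys.map { row =>
    val (g, s) = (row.getString(0), row.getString(1))
    df.filter(col("gender") === g && col("state") === s)
      .sample(withReplacement = false, fraction = 0.1)
  }

For a single grouping column, df.stat.sampleBy("gender", fractions, seed) gives stratified sampling without the explicit loop.
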
PraveenSaini
by New Contributor
  • 156006 Views
  • 34 replies
  • 6 kudos

How to read excel file using databricks

I have an Excel file as the source file and I want to read data from the Excel file and convert it into a dataframe using Databricks. I have already added the Maven dependency for the Excel file format. When I try the code below it gives an error. (Error: java.io....

Latest Reply
Jenish_lodha
New Contributor II
  • 6 kudos

To read an Excel file using Databricks, you can use the Databricks runtime, which supports multiple programming languages such as Python, Scala, and R. Here are the general steps to read an Excel file in Databricks using Python: 1. **Upload the Excel ...

33 More Replies
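
A short Scala sketch of the same task, assuming the com.crealytics:spark-excel library is installed on the cluster; the sheet address and DBFS path are hypothetical.

  // Assumes the com.crealytics:spark-excel library is attached to the cluster.
  val excelDf = spark.read
    .format("com.crealytics.spark.excel")
    .option("header", "true")              // first row holds the column names
    .option("inferSchema", "true")         // let the reader guess column types
    .option("dataAddress", "'Sheet1'!A1")  // hypothetical sheet and start cell
    .load("/FileStore/tables/sales.xlsx")  // hypothetical DBFS path

  excelDf.show(5)
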
rammy
by Contributor III
  • 2117 Views
  • 1 replies
  • 5 kudos

Not able to parse .doc extension file using scala in databricks notebook?

I was able to parse .doc extension files in Java with the help of the POI libraries, but when converting the Java code into Scala I expected it to work with the same Java libraries, yet it shows the below erro...

Attachments: error screenshot, jar dependencies
Latest Reply
UmaMahesh1
Honored Contributor III
  • 5 kudos

Hi @Ramesh Bathini, in PySpark we have a docx module. I found that to be working perfectly fine. Can you try using that? Documentation and examples can be found online. Cheers...

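The reply points at a Python docx module; for the original Java/POI route in Scala, a rough sketch is below. It assumes the poi and poi-scratchpad jars are on the cluster and uses a hypothetical file path; HWPF handles legacy .doc files, whereas .docx would need XWPFDocument from poi-ooxml instead.

  import java.io.FileInputStream
  import org.apache.poi.hwpf.HWPFDocument
  import org.apache.poi.hwpf.extractor.WordExtractor

  val in = new FileInputStream("/dbfs/FileStore/tables/report.doc")  // hypothetical path
  val doc = new HWPFDocument(in)
  val extractor = new WordExtractor(doc)
  val text = extractor.getText                                       // plain text of the .doc
  extractor.close()
  in.close()
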
aayrm5
by Honored Contributor
  • 7418 Views
  • 3 replies
  • 7 kudos

Resolved! Converting a transformation written in Spark Scala to PySpark

Hello all, I've been tasked with converting Scala Spark code to PySpark code with minimal changes (a near-literal translation). I've come across some code that claims to be a list comprehension. See the code snippet below: %scala val desiredColumn = Seq("f...

Latest Reply
aayrm5
Honored Contributor
  • 7 kudos

Another follow-up question, if you don't mind, @Pat Sienkiewicz. As I was trying to parse the name column into multiple columns, I came across the data below: ("James,\"A,B\", Smith", "2018", "M", 3000). In order to parse these comma-included middle na...

2 More Replies
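
The snippet in the question is truncated, so the following is only a guess at the general Scala pattern being translated: build the column list as a Seq, map it to Column objects, and splice it into select. Column and table names are hypothetical.

  import org.apache.spark.sql.functions.col

  val df = spark.table("employees")                              // hypothetical table
  val desiredColumns = Seq("first_name", "last_name", "salary")  // hypothetical column names
  val selected = df.select(desiredColumns.map(col): _*)

In PySpark the equivalent really is a list comprehension: build the same list of column names and unpack it into select.
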
Databach
by New Contributor
  • 4414 Views
  • 0 replies
  • 0 kudos

How to resolve "java.lang.ClassNotFoundException: com.databricks.spark.util.RegexBasedAWSSecretKeyRedactor" when running Scala Spark project using databricks-connect ?

Currently I am learning how to use databricks-connect to develop Scala code locally using an IDE (VS Code). The set-up of databricks-connect as described here https://docs.microsoft.com/en-us/azure/databricks/dev-tools/databricks-connect was succes...

Attachments: image, build.sbt
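
No reply was posted, but the databricks-connect docs linked in the question describe pointing sbt at the jars reported by databricks-connect get-jar-dir rather than depending on open-source Spark; mixing the two is one commonly cited cause of ClassNotFoundException for Databricks-internal classes like the one above. A build.sbt sketch along those lines, with a placeholder jar path:

  // build.sbt sketch; replace the path with whatever `databricks-connect get-jar-dir` prints.
  name := "databricks-connect-demo"
  scalaVersion := "2.12.15"

  // Use the databricks-connect jars directly rather than an open-source Spark dependency.
  unmanagedBase := new java.io.File("/path/printed/by/databricks-connect/get-jar-dir")
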
User15787040559
by Databricks Employee
  • 2492 Views
  • 1 replies
  • 1 kudos

How can I get Databricks notebooks to stop cutting off the explain plans?

(Since Spark 3.0) Dataset.queryExecution.debug.toFile will dump the full plan to a file, without concatenating the output as a fully materialized Java string in memory.

Latest Reply
dazfuller
Contributor III
  • 1 kudos

Notebooks really aren't the best method of viewing large files. Two methods you could employ are: save the file to DBFS and then use the Databricks CLI to download it, or use the web terminal. In the web terminal option you can do something like "cat my_lar...

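A small Scala sketch combining the question and the reply: write the full plan to DBFS with queryExecution.debug.toFile (available since Spark 3.0, per the question), then pull the file down with the Databricks CLI. The dataset and paths are hypothetical.

  val df = spark.table("events")                        // hypothetical dataset
  df.queryExecution.debug.toFile("dbfs:/tmp/events_plan.txt")

  // Then download it locally, e.g. with the Databricks CLI:
  //   databricks fs cp dbfs:/tmp/events_plan.txt .
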
pepevo
by New Contributor III
  • 16267 Views
  • 10 replies
  • 0 kudos

Resolved! How to convert column type from decimal to date in sparksql

I need to convert a column type from decimal to date in Spark SQL when the format is not yyyy-mm-dd. A table contains a column declared as decimal(38,0), the data is in yyyymmdd format, and I am unable to run SQL queries on it in a Databricks notebook. ...

Latest Reply
pepevo
New Contributor III
  • 0 kudos

Thank you, Tom. I made it work already.

9 More Replies
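
A sketch of one way to do the conversion, assuming the decimal(38,0) column holds values like 20240131; the table and column names are hypothetical. The same expression works in a SQL cell as to_date(CAST(col AS STRING), 'yyyyMMdd').

  import org.apache.spark.sql.functions.{col, to_date}

  val raw = spark.table("transactions")                      // hypothetical table
  val withDate = raw.withColumn(
    "txn_date",
    to_date(col("txn_yyyymmdd").cast("string"), "yyyyMMdd")  // decimal -> string -> date
  )
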
Pierrek20
by New Contributor
  • 15939 Views
  • 2 replies
  • 0 kudos

How to loop over spark dataframe with scala ?

Hello! I'm a rookie at Spark Scala; here is my problem (thanks in advance for your help). My input dataframe looks like this: index bucket time ap station rssi 0 1 00:00 1 1 -84.0 1 1 00:00 1 3 -67.0 2 1 00:00 1 4 -82.0 3 1 00:00 1 2 -68.0 4 1 00:00...

Latest Reply
Eve
New Contributor III
  • 0 kudos

Looping is not always necessary. I always use this foreach method, something like the following: aps.collect().foreach(row => <do something>)

1 More Replies
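
A sketch expanding on the reply: the collect-then-foreach pattern runs on the driver and is only safe for small results, while most per-group work can stay distributed as a normal aggregation. The table name is hypothetical; the column names follow the question.

  import org.apache.spark.sql.functions.max

  val aps = spark.table("ap_readings")                   // hypothetical table holding the question's data

  // Driver-side loop, as in the reply; fine when the collected result is small.
  aps.collect().foreach { row =>
    println(row.getAs[Double]("rssi"))                   // <do something> with each row
  }

  // Usually better: keep the work distributed, e.g. the strongest rssi per (bucket, ap, station).
  val strongest = aps.groupBy("bucket", "ap", "station").agg(max("rssi").alias("max_rssi"))
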
SwapanSwapandee
by New Contributor II
  • 8584 Views
  • 2 replies
  • 0 kudos

How to pass column names in selectExpr through one or more string parameters in spark using scala?

I am using a script for CDC merge in Spark Streaming. I wish to pass column values to selectExpr through a parameter, as the column names change for each table. When I pass the columns and struct field through a string variable, I get an error as...

Latest Reply
shyam_9
Databricks Employee
  • 0 kudos

Hi @Swapan Swapandeep Marwaha, can you pass them as a Seq as in the code below? keyCols = Seq("col1", "col2"), structCols = Seq("struct(offset,KAFKA_TS) as otherCols")

1 More Replies
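
A sketch of the reply's suggestion: since selectExpr accepts a vararg of strings, the two Seqs can be concatenated and spliced in with : _*. The source table is hypothetical; the expressions follow the reply.

  val keyCols = Seq("col1", "col2")
  val structCols = Seq("struct(offset, KAFKA_TS) as otherCols")

  val src = spark.table("cdc_source")                    // hypothetical streaming/CDC source
  val projected = src.selectExpr((keyCols ++ structCols): _*)
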
_not_provid1755
by New Contributor
  • 6903 Views
  • 3 replies
  • 0 kudos

Write empty dataframe into csv

I'm writing my output (entity) dataframe to a CSV file. The statement below works well when the dataframe is non-empty: entity.repartition(1).write.mode(SaveMode.Overwrite).format("csv").option("header", "true").save(tempLocation) It's not working wh...

Latest Reply
mrnov
New Contributor II
  • 0 kudos

The same problem here (similar code and the same behavior with Spark 2.4.0, running with spark-submit on Windows and on Linux): dataset.coalesce(1).write().option("charset", "UTF-8").option("header", "true").mode(SaveMod...

2 More Replies
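
A workaround sketch, assuming the issue is that some Spark versions write no header (or no file at all) for an empty dataframe: detect the empty case and write a header-only CSV by hand. entity and tempLocation are the names from the question; the manual write below assumes tempLocation is a local filesystem path.

  import java.nio.charset.StandardCharsets
  import java.nio.file.{Files, Paths}
  import org.apache.spark.sql.SaveMode

  if (entity.isEmpty) {
    // Write the header line ourselves, since the CSV writer may emit nothing for an empty dataframe.
    Files.createDirectories(Paths.get(tempLocation))
    Files.write(
      Paths.get(tempLocation, "part-00000.csv"),
      (entity.columns.mkString(",") + "\n").getBytes(StandardCharsets.UTF_8)
    )
  } else {
    entity.repartition(1).write.mode(SaveMode.Overwrite)
      .format("csv").option("header", "true").save(tempLocation)
  }
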
XinZodl
by New Contributor III
  • 17056 Views
  • 3 replies
  • 1 kudos

Resolved! How to parse a file with newline character, escaped with \ and not quoted

Hi! I am facing an issue when reading and parsing a CSV file. Some records have a newline symbol, "escaped" by a \, and the record is not quoted. The file might look like this: Line1field1;Line1field2.1 \ Line1field2.2;Line1field3; Line2FIeld1;...

Latest Reply
XinZodl
New Contributor III
  • 1 kudos

The solution is "sparkContext.wholeTextFiles".

2 More Replies
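
A sketch of the accepted wholeTextFiles approach: read each file as a single string so the backslash-escaped newlines can be repaired before splitting into records and ;-separated fields. The path is hypothetical; the separators follow the sample in the question.

  // Each element is the full content of one file, so escaped newlines can be fixed up first.
  val raw = spark.sparkContext
    .wholeTextFiles("/FileStore/tables/input.csv")       // hypothetical path
    .values

  val records = raw
    .flatMap(_.replace("\\\n", " ").split("\n"))         // undo "\"-escaped newlines, then split into records
    .map(_.split(";"))                                   // fields are ;-separated in the example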