cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

Forum Posts

qwerty1
by Contributor
  • 2602 Views
  • 5 replies
  • 14 kudos

Resolved! When will databricks runtime be released for Scala 2.13?

I see that spark fully supports Scala 2.13. I wonder why is there no databricks runtime with Scala 2.13 yet. Any plans on making this available? It would be super useful.

  • 2602 Views
  • 5 replies
  • 14 kudos
Latest Reply
source2sea
Contributor
  • 14 kudos

I see db runtime 14 is out, but still 2.12, when would databricks plan to support 2.13 or 3  thank you

  • 14 kudos
4 More Replies
Rani
by New Contributor
  • 7508 Views
  • 2 replies
  • 0 kudos

Divide a dataframe into multiple smaller dataframes based on values in multiple columns in Scala

I have to divide a dataframe into multiple smaller dataframes based on values in columns like - gender and state , the end goal is to pick up random samples from each dataframeI am trying to implement a sample as explained below, I am quite new to th...

  • 7508 Views
  • 2 replies
  • 0 kudos
Latest Reply
subham0611
New Contributor II
  • 0 kudos

@raela I also have similar usecase. I am writing data to different databricks tables based on colum value.But I am getting insufficient disk space error and driver is getting killed. I am suspecting df.select(colName).distinct().collect()step is taki...

  • 0 kudos
1 More Replies
PraveenSaini
by New Contributor
  • 107580 Views
  • 34 replies
  • 6 kudos

How to read excel file using databricks

0 I have a excel file as source file and i want to read data from excel file and convert data in data frame using databricks. I have already added maven dependence for Excel file format. when i a tring below code it is giving error .(Error: java.io....

  • 107580 Views
  • 34 replies
  • 6 kudos
Latest Reply
Jenish_lodha
New Contributor II
  • 6 kudos

To read an Excel file using Databricks, you can use the Databricks runtime, which supports multiple programming languages such as Python, Scala, and R. Here are the general steps to read an Excel file in Databricks using Python:1. **Upload the Excel ...

  • 6 kudos
33 More Replies
rammy
by Contributor III
  • 940 Views
  • 1 replies
  • 5 kudos

Not able to parse .doc extension file using scala in databricks notebook?

I could able to parse .doc extension files using Java programming with the help of POI libraries but when trying to convert Java code into Scala i expect it has to work with same java libraries with Scala programming but it is showing with below erro...

error screenshot Jar dependencies
  • 940 Views
  • 1 replies
  • 5 kudos
Latest Reply
UmaMahesh1
Honored Contributor III
  • 5 kudos

Hi @Ramesh Bathini​ In pyspark, we have a docx module. I found that to be working perfectly fine. Can you try using that ?Documentation and stuff could be found online. Cheers...

  • 5 kudos
RiyazAli
by Valued Contributor
  • 4441 Views
  • 3 replies
  • 7 kudos

Resolved! Converting a transformation written in Spark Scala to PySpark

Hello all,I've been tasked to convert a Scala Spark code to PySpark code with minimal changes (kinda literal translation).I've come across some code that claims to be a list comprehension. Look below for code snippet:%scala val desiredColumn = Seq("f...

  • 4441 Views
  • 3 replies
  • 7 kudos
Latest Reply
RiyazAli
Valued Contributor
  • 7 kudos

Another follow-up question, if you don't mind. @Pat Sienkiewicz​ As I was trying to parse the name column into multiple columns. I came across the data below:("James,\"A,B\", Smith", "2018", "M", 3000)In order to parse these comma-included middle na...

  • 7 kudos
2 More Replies
Databach
by New Contributor
  • 2823 Views
  • 0 replies
  • 0 kudos

How to resolve "java.lang.ClassNotFoundException: com.databricks.spark.util.RegexBasedAWSSecretKeyRedactor" when running Scala Spark project using databricks-connect ?

Currently I am learning how to use databricks-connect to develop Scala code using IDE (VS Code) locally. The set-up of the databricks-connect as described here https://docs.microsoft.com/en-us/azure/databricks/dev-tools/databricks-connect was succues...

image build.sbt
  • 2823 Views
  • 0 replies
  • 0 kudos
User15787040559
by New Contributor III
  • 1568 Views
  • 2 replies
  • 1 kudos

How can I get Databricks notebooks to stop cutting off the explain plans?

(since Spark 3.0)Dataset.queryExecution.debug.toFilewill dump the full plan to a file, without concatenating the output as a fully materialized Java string in memory.

  • 1568 Views
  • 2 replies
  • 1 kudos
Latest Reply
dazfuller
Contributor III
  • 1 kudos

Notebooks really aren't the best method of viewing large files. Two methods you could employ areSave the file to dbfs and then use databricks CLI to download the fileUse the web terminalIn the web terminal option you can do something like "cat my_lar...

  • 1 kudos
1 More Replies
pepevo
by New Contributor III
  • 10380 Views
  • 10 replies
  • 0 kudos

Resolved! How to convert column type from decimal to date in sparksql

I need to convert column type from decimal to date in sparksql when the format is not yyyy-mm-dd? A table contains column data declared as decimal (38,0) and data is in yyyymmdd format and I am unable to run sql queries on it in databrick notebook. ...

  • 10380 Views
  • 10 replies
  • 0 kudos
Latest Reply
pepevo
New Contributor III
  • 0 kudos

thank you Tom. I made it work already.

  • 0 kudos
9 More Replies
naveenreddy1
by New Contributor II
  • 16666 Views
  • 3 replies
  • 0 kudos

Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages. Driver stacktrace

We are using the databricks 3 node cluster with 32 GB memory. It is working fine but some times it automatically throwing the error: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues.

  • 16666 Views
  • 3 replies
  • 0 kudos
Latest Reply
RodrigoDe_Freit
New Contributor II
  • 0 kudos

If your job fails follow this:According to https://docs.databricks.com/jobs.html#jar-job-tips: "Job output, such as log output emitted to stdout, is subject to a 20MB size limit. If the total output has a larger size, the run will be canceled and ma...

  • 0 kudos
2 More Replies
Pierrek20
by New Contributor
  • 13032 Views
  • 2 replies
  • 0 kudos

How to loop over spark dataframe with scala ?

Hello ! I 'm rookie to spark scala, here is my problem : tk's in advance for your help my input dataframe looks like this : index bucket time ap station rssi 0 1 00:00 1 1 -84.0 1 1 00:00 1 3 -67.0 2 1 00:00 1 4 -82.0 3 1 00:00 1 2 -68.0 4 1 00:00...

  • 13032 Views
  • 2 replies
  • 0 kudos
Latest Reply
Eve
New Contributor III
  • 0 kudos

Looping is not always necessary, I always use this foreach method, something like the following: aps.collect().foreach(row => <do something>)

  • 0 kudos
1 More Replies
SwapanSwapandee
by New Contributor II
  • 6625 Views
  • 2 replies
  • 0 kudos

How to pass column names in selectExpr through one or more string parameters in spark using scala?

I am using script for CDC Merge in spark streaming. I wish to pass column values in selectExpr through a parameter as column names for each table would change. When I pass the columns and struct field through a string variable, I am getting error as...

  • 6625 Views
  • 2 replies
  • 0 kudos
Latest Reply
shyam_9
Valued Contributor
  • 0 kudos

Hi @Swapan Swapandeep Marwaha, Can you pass them as a Seq as in below code, keyCols = Seq("col1", "col2"), structCols = Seq("struct(offset,KAFKA_TS) as otherCols")

  • 0 kudos
1 More Replies
_not_provid1755
by New Contributor
  • 4861 Views
  • 3 replies
  • 0 kudos

Write empty dataframe into csv

I'm writing my output (entity) data frame into csv file. Below statement works well when the data frame is non-empty. entity.repartition(1).write.mode(SaveMode.Overwrite).format("csv").option("header", "true").save(tempLocation) It's not working wh...

  • 4861 Views
  • 3 replies
  • 0 kudos
Latest Reply
mrnov
New Contributor II
  • 0 kudos

the same problem here (similar code and the same behavior with Spark 2.4.0, running with spark submit on Win and on Lin) dataset.coalesce(1) .write() .option("charset", "UTF-8") .option("header", "true") .mode(SaveMod...

  • 0 kudos
2 More Replies
XinZodl
by New Contributor III
  • 9850 Views
  • 3 replies
  • 1 kudos

Resolved! How to parse a file with newline character, escaped with \ and not quoted

Hi! I am facing an issue when reading and parsing a CSV file. Some records have a newline symbol, "escaped" by a \, and that record not being quoted. The file might look like this: Line1field1;Line1field2.1 \ Line1field2.2;Line1field3; Line2FIeld1;...

  • 9850 Views
  • 3 replies
  • 1 kudos
Latest Reply
XinZodl
New Contributor III
  • 1 kudos

Solution is "sparkContext.wholeTextFiles"

  • 1 kudos
2 More Replies
Labels