Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

RiyazAli
by Valued Contributor
  • 5506 Views
  • 3 replies
  • 7 kudos

Resolved! Converting a transformation written in Spark Scala to PySpark

Hello all, I've been tasked with converting Scala Spark code to PySpark code with minimal changes (kind of a literal translation). I've come across some code that claims to be a list comprehension. See the code snippet below: %scala val desiredColumn = Seq("f...

Latest Reply
RiyazAli
Valued Contributor
  • 7 kudos

Another follow-up question, if you don't mind, @Pat Sienkiewicz. As I was trying to parse the name column into multiple columns, I came across the data below: ("James,\"A,B\", Smith", "2018", "M", 3000). In order to parse these comma-included middle na...

2 More Replies
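The quoted-comma parsing discussed in this reply can be handled by running Spark's CSV parser over the single name column. Below is a minimal Spark Scala sketch; the column names (name, year, gender, salary) and the three-part name schema are assumptions for illustration, not the schema from the original thread.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

val spark = SparkSession.builder.appName("name-parsing-sketch").getOrCreate()
import spark.implicits._

// Hypothetical row shaped like the example in the thread: the name field
// itself contains commas protected by quotes.
val df = Seq(
  ("James,\"A,B\", Smith", "2018", "M", 3000)
).toDF("name", "year", "gender", "salary")

// Assumed three-part split of the name; adjust to the real layout.
val nameSchema = StructType(Seq(
  StructField("first", StringType),
  StructField("middle", StringType),
  StructField("last", StringType)
))

// from_csv respects the quoting, so "A,B" stays in one field instead of
// being split on every comma.
val parsed = df.withColumn("name_parts", from_csv($"name", nameSchema, Map.empty[String, String]))

parsed
  .select($"name_parts.first", $"name_parts.middle", $"name_parts.last", $"year", $"gender", $"salary")
  .show(false)
```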
suman9872
by New Contributor II
  • 1675 Views
  • 1 replies
  • 1 kudos

How to dynamically convert a Spark DataFrame to nested JSON using Spark Scala

I want to convert the DataFrame to nested JSON. Source data: the DataFrame has values as shown in image 2. Expected output: I have to convert the DataFrame values to nested JSON as shown in image 1. Appreciate your help!

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @Suman Mishra, this article explains how to convert a flattened DataFrame to a nested structure by nesting a case class within another case class. You can use this technique to build a JSON file that can then be sent to an external API.

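A minimal Spark Scala sketch of the nested case class approach mentioned in the reply; the case classes, column names, and sample rows here are illustrative assumptions rather than the schema from the original post.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative (assumed) schema: a flat record nested into Person -> Address.
case class FlatPerson(name: String, city: String, zip: String)
case class Address(city: String, zip: String)
case class Person(name: String, address: Address)

val spark = SparkSession.builder.appName("nested-json-sketch").getOrCreate()
import spark.implicits._

// A flat DataFrame as it might arrive from the source system.
val flat = Seq(
  ("Alice", "Pune", "411001"),
  ("Bob", "Mumbai", "400001")
).toDF("name", "city", "zip")

// Map each flat row into the nested case-class structure.
val nested = flat.as[FlatPerson].map(p => Person(p.name, Address(p.city, p.zip)))

// Each row serializes as one nested JSON document.
nested.toJSON.show(false)   // {"name":"Alice","address":{"city":"Pune","zip":"411001"}}
// nested.write.json("/tmp/people_nested")   // or write the JSON out to storage
```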
sannycse
by New Contributor II
  • 1386 Views
  • 2 replies
  • 3 kudos

Resolved! Display password as shown in example using Spark Scala

Table has the following columns: First_Name, Last_Name, Department_Id, Contact_No, Hire_Date. Display the employee First_Name, count of characters in the first name, and password. Password should be the first 4 letters of the first name in lower case and the date and ...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 3 kudos

@SANJEEV BANDRU, SELECT CONCAT(substring(First_Name, 0, 2), substring(Hire_Date, 0, 2), substring(Hire_Date, 3, 2)) as password FROM table; If Hire_Date is a timestamp, you may need to add date_format().

1 More Replies
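The same idea in DataFrame form, as a hedged Spark Scala sketch: the sample row is made up, and because the original requirement is truncated, the exact date parts folded into the password (here day and month of Hire_Date) are an assumption.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder.appName("password-sketch").getOrCreate()
import spark.implicits._

// Made-up sample row matching the columns listed in the question.
val employees = Seq(
  ("Sanjeev", "Bandru", 10, "9999999999", "2022-02-15")
).toDF("First_Name", "Last_Name", "Department_Id", "Contact_No", "Hire_Date")

val withPassword = employees.select(
  $"First_Name",
  length($"First_Name").as("first_name_length"),
  concat(
    lower(substring($"First_Name", 1, 4)),        // first 4 letters of the first name, lower case
    date_format(to_date($"Hire_Date"), "ddMM")    // assumed date parts: day and month of hire
  ).as("password")
)

withPassword.show(false)
```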
paourissi
by New Contributor
  • 8653 Views
  • 2 replies
  • 1 kudos

When to persist and when to unpersist RDD in Spark

Let's say I have the following: val dataset2 = dataset1.persist(StorageLevel.MEMORY_AND_DISK) val dataset3 = dataset2.map(.....) 1) If you do a transformation on dataset2 then you have to persist it and pass it to dataset3 and unpersist ...

Latest Reply
Arun_KumarPT
New Contributor II
  • 1 kudos

It is well documented here: http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence

1 More Replies
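A minimal Spark Scala sketch of the persist/unpersist lifecycle this thread is about: persist the dataset that multiple actions reuse, and unpersist it once nothing downstream still needs it. The RDD contents are placeholders.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

val spark = SparkSession.builder.appName("persist-sketch").getOrCreate()
val sc = spark.sparkContext

val dataset1 = sc.parallelize(1 to 1000000)

// Persist the RDD that multiple downstream computations will reuse.
val dataset2 = dataset1.map(_ * 2).persist(StorageLevel.MEMORY_AND_DISK)

val dataset3 = dataset2.map(_ + 1)

// Both actions reuse the cached dataset2 instead of recomputing it from dataset1.
println(dataset2.count())
println(dataset3.count())

// Release the cached blocks once no further job depends on dataset2.
dataset2.unpersist()
```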
shampa
by New Contributor
  • 4663 Views
  • 1 replies
  • 0 kudos

How can we compare two DataFrames in Spark Scala to find the differences between these two files: which columns and which values?

I have two files and I created two DataFrames, prod1 and prod2, out of them. I need to find the records with column names and values that do not match in both DataFrames. id_sk is the primary key. All the columns are string datatype. DataFrame 1 (prod1): id_...

Latest Reply
manojlukhi
New Contributor II
  • 0 kudos

Use a full outer join in Spark SQL.

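A minimal Spark Scala sketch of the full outer join approach from this reply, joining on id_sk and flagging columns whose values differ between the two DataFrames; all column names and sample values other than id_sk are assumptions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder.appName("df-compare-sketch").getOrCreate()
import spark.implicits._

// Made-up sample data; only id_sk (the primary key) comes from the original post.
val prod1 = Seq(("1", "pen", "10"), ("2", "book", "25")).toDF("id_sk", "name", "price")
val prod2 = Seq(("1", "pen", "12"), ("3", "ink", "5")).toDF("id_sk", "name", "price")

val compareCols = Seq("name", "price")

// Full outer join on the primary key keeps rows that exist in only one file.
val joined = prod1.alias("p1").join(prod2.alias("p2"), Seq("id_sk"), "full_outer")

// For each compared column, flag a null-safe mismatch between the two sides.
val flagged = compareCols.foldLeft(joined) { (df, c) =>
  df.withColumn(s"${c}_differs", !(col(s"p1.$c") <=> col(s"p2.$c")))
}

// Keep only the rows where at least one compared column differs.
flagged.filter(compareCols.map(c => col(s"${c}_differs")).reduce(_ || _)).show(false)
```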