Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Rani
by New Contributor
  • 7868 Views
  • 2 replies
  • 0 kudos

Divide a dataframe into multiple smaller dataframes based on values in multiple columns in Scala

I have to divide a dataframe into multiple smaller dataframes based on the values in columns such as gender and state; the end goal is to pick random samples from each dataframe. I am trying to implement a sample as explained below, I am quite new to th...
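
For readers with the same goal, here is a minimal sketch of one possible approach, not necessarily what the original poster ended up using. Instead of materializing one dataframe per (gender, state) pair, it builds a single stratum key and samples every group in one pass with `DataFrame.stat.sampleBy`. The object name `StratifiedSample`, the helper `sampleByStrata`, the temporary column `__stratum`, and the uniform `fraction` are all assumptions made for illustration.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, concat_ws}

object StratifiedSample {
  // Hypothetical helper: samples roughly `fraction` of the rows in every
  // distinct combination of `groupCols` (for example gender and state).
  def sampleByStrata(df: DataFrame, groupCols: Seq[String],
                     fraction: Double, seed: Long): DataFrame = {
    import df.sparkSession.implicits._

    // Combine the grouping columns into one string key per row.
    // Note: concat_ws skips nulls, so rows with a null in a grouping
    // column get a shorter key.
    val keyed = df.withColumn("__stratum", concat_ws("|", groupCols.map(col): _*))

    // Only the distinct keys are collected to the driver, never the rows.
    // This assumes the number of distinct combinations is small.
    val strata = keyed.select("__stratum").distinct().as[String].collect()

    // Same sampling fraction for every stratum; a per-key map also works.
    val fractions: Map[String, Double] = strata.map(k => k -> fraction).toMap

    // sampleBy draws an approximate (Bernoulli) sample within each stratum.
    keyed.stat.sampleBy("__stratum", fractions, seed).drop("__stratum")
  }
}
```

Usage would look like `StratifiedSample.sampleByStrata(df, Seq("gender", "state"), 0.1, 42L)`. The fractions are approximate, so a group will not always contain exactly 10% of its rows; if exact per-group counts are needed, a window function with `row_number()` over a random ordering is another option.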

Latest Reply
subham0611
New Contributor II
  • 0 kudos

@raela I also have a similar use case. I am writing data to different Databricks tables based on a column value, but I am getting an insufficient disk space error and the driver is getting killed. I suspect the df.select(colName).distinct().collect() step is taki...

1 More Replies
Kaniz_Fatma
by Community Manager
  • 1237 Views
  • 1 reply
  • 1 kudos
Latest Reply
saipujari_spark
Valued Contributor
  • 1 kudos

Yes, we can use the built-in concat() and concat_ws() functions.
concat usage: > SELECT concat('Spark', 'SQL'); returns SparkSQL
concat_ws usage (concatenates with a separator): > SELECT concat_ws('-', 'Spark', 'SQL'); returns Spark-SQL
Reference: https://spark.apache.org/do...
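
The same two functions are also available in the Scala DataFrame API through `org.apache.spark.sql.functions`. The sketch below is only an illustration; the object name `ConcatSketch`, the helper `joinColumns`, and the column names `a`, `b`, `joined`, and `joined_with_sep` are assumptions, not part of the original reply.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, concat, concat_ws}

object ConcatSketch {
  // Hypothetical helper: expects string columns "a" and "b" and returns
  // both concatenation variants side by side.
  def joinColumns(df: DataFrame): DataFrame =
    df.select(
      // concat: plain concatenation; the result is null if any input is null.
      concat(col("a"), col("b")).as("joined"),
      // concat_ws: the first argument is the separator; null inputs are skipped.
      concat_ws("-", col("a"), col("b")).as("joined_with_sep")
    )
}
```

For a row with `a = "Spark"` and `b = "SQL"`, this yields `SparkSQL` and `Spark-SQL`, matching the SQL examples.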
