Databricks Community

pramalin · ‎01-30-2023

daniel_sahal · ‎01-30-2023

@prudhvi ramalingam

Here is an example: https://stackoverflow.com/a/61029482

Nhan_Nguyen · ‎01-31-2023

@prudhvi ramalingam you could refer to this link: https://sparkbyexamples.com/spark/spark-sql-join-on-multiple-columns/

shan_chandra · ‎01-31-2023

@prudhvi ramalingam - Please refer to the example code below.

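import org.apache.spark.sql.functions.expr
// Needed for .toDF outside a notebook; Databricks notebooks import this automatically.
import spark.implicits._

// People, each with a graduate program id and an array of Spark status ids.
val person = Seq(
    (0, "Bill Chambers", 0, Seq(100)),
    (1, "Matei Zaharia", 1, Seq(500, 250, 100)),
    (2, "Michael Armbrust", 1, Seq(250, 100)))
  .toDF("id", "name", "graduate_program", "spark_status")

// Lookup table of graduate programs (defined for completeness; not used in the join below).
val graduateProgram = Seq(
    (0, "Masters", "School of Information", "UC Berkeley"),
    (2, "Masters", "EECS", "UC Berkeley"),
    (1, "Ph.D.", "EECS", "UC Berkeley"))
  .toDF("id", "degree", "department", "school")

// Lookup table mapping status ids to status names.
val sparkStatus = Seq(
    (500, "Vice President"),
    (250, "PMC Member"),
    (100, "Contributor"))
  .toDF("id", "status")

// Rename person.id so that `id` in the join expression refers only to sparkStatus.id.
// join() without an explicit join type performs an inner join.
person
  .withColumnRenamed("id", "personId")
  .join(sparkStatus, expr("array_contains(spark_status, id)"))
  .show()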
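
For the plain key-equality case, here is a minimal sketch of an inner join between person and graduateProgram, using the same sample data (without the spark_status array). It assumes the goal is to join first and then derive a column: withColumn only adds or replaces a column and does not perform the join itself. The names `joined` and `program` and the app name are illustrative, not from the original post.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, concat_ws}

// In a Databricks notebook a session already exists and getOrCreate() reuses it;
// the local master setting is an assumption for running this outside Databricks.
val spark = SparkSession.builder()
  .appName("inner-join-sketch")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

val person = Seq(
    (0, "Bill Chambers", 0),
    (1, "Matei Zaharia", 1),
    (2, "Michael Armbrust", 1))
  .toDF("id", "name", "graduate_program")

val graduateProgram = Seq(
    (0, "Masters", "School of Information", "UC Berkeley"),
    (2, "Masters", "EECS", "UC Berkeley"),
    (1, "Ph.D.", "EECS", "UC Berkeley"))
  .toDF("id", "degree", "department", "school")

val joined = person
  // Rename first so the two "id" columns do not clash after the join.
  .withColumnRenamed("id", "personId")
  // Explicit inner join: keep only rows whose graduate_program matches a program id.
  .join(graduateProgram, col("graduate_program") === graduateProgram("id"), "inner")
  // withColumn runs on the joined result and adds a derived column.
  .withColumn("program", concat_ws(", ", col("degree"), col("department")))
  .select("personId", "name", "program", "school")

joined.show(false)
```

Passing "inner" is optional, since it is the default join type, but stating it makes the intent explicit.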

How to perform Inner join using withcolumn
