Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Select DataFrame columns from a sequence of strings

Jean-FrancoisRa
New Contributor

Is there a simple way to select columns from a DataFrame using a sequence of strings?

Something like:

val colNames = Seq("c1", "c2")
df.select(colNames)

Accepted Solution

JongKim
New Contributor III

I also had the same problem. Here's how to make it work using the Column type and varargs:

// make an example DataFrame (sc and spark are predefined in Databricks notebooks and spark-shell)
import org.apache.spark.sql.DataFrame
import spark.implicits._ // provides toDF; notebooks and spark-shell usually import this automatically
val df: DataFrame = sc.parallelize(Seq((1, 2, 3), (4, 5, 6), (7, 8, 9))).toDF("a", "b", "c")

// desired column names as strings (so the list can be built programmatically)
val column_names_str = Seq[String]("a", "b")

// convert the names to Column objects
import org.apache.spark.sql.functions.col
val column_names_col = column_names_str.map(name => col(name))
//val column_names_col = column_names_str.map(name => col(name).as(s"renamed_$name")) // rename if needed

// select the columns, expanding the Seq into varargs with : _*
val df_new = df.select(column_names_col: _*)
df_new.show()

This should yield the expected output:

+---+---+
|  a|  b|
+---+---+
|  1|  2|
|  4|  5|
|  7|  8|
+---+---+
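As an alternative, Dataset also has a String-based overload, select(col: String, cols: String*), which accepts the names directly once the Seq is split into its first element and the rest. The sketch below assumes a standalone Spark application rather than a notebook; the object SelectByNames and its helper methods are hypothetical names used only for illustration. It shows both the String overload and the Column overload used in the accepted solution.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical helper object; not part of Spark.
object SelectByNames {
  // Uses select(col: String, cols: String*).
  // Assumes colNames is non-empty: head throws on an empty Seq.
  def selectByNames(df: DataFrame, colNames: Seq[String]): DataFrame =
    df.select(colNames.head, colNames.tail: _*)

  // Uses select(cols: Column*), as in the accepted solution.
  def selectByCols(df: DataFrame, colNames: Seq[String]): DataFrame =
    df.select(colNames.map(name => col(name)): _*)

  def main(args: Array[String]): Unit = {
    // Local session for a standalone run; notebooks already provide `spark`.
    val spark = SparkSession.builder().master("local[*]").appName("select-demo").getOrCreate()
    import spark.implicits._

    val df = Seq((1, 2, 3), (4, 5, 6), (7, 8, 9)).toDF("a", "b", "c")
    val colNames = Seq("a", "b")

    selectByNames(df, colNames).show()
    selectByCols(df, colNames).show()

    spark.stop()
  }
}
```

The Column-based version is the more flexible of the two, since each element can be transformed (for example renamed with .as) before selection; the String-based version avoids the conversion step but needs a guard if the list can be empty.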


vEdwardpc
New Contributor II

Thanks. I needed to modify the final lines to use the :_* varargs syntax:

val df_new = df.select(column_names_col:_*)
df_new.show()
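The :_* in the fix above is Scala's sequence argument ascription: it passes a Seq to a method that declares a repeated (varargs) parameter, such as select(cols: Column*). A minimal plain-Scala sketch, where pick is a hypothetical method that only mirrors the shape of select:

```scala
object VarargsDemo {
  // Hypothetical method with a repeated parameter, shaped like select(cols: Column*).
  def pick(names: String*): String = names.mkString(",")

  def main(args: Array[String]): Unit = {
    val colNames = Seq("a", "b")
    // pick(colNames) would not compile: a Seq[String] is not a String.
    // The ": _*" ascription expands the Seq into the repeated parameter.
    println(pick(colNames: _*)) // prints a,b
  }
}
```

This is also why df.select(colNames) from the original question does not compile: neither select overload takes a single Seq argument.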

Edward
