Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Select DataFrame columns from a sequence of strings

Jean-FrancoisRa
New Contributor

Is there a simple way to select columns from a DataFrame using a sequence of strings?

Something like:

val colNames = Seq("c1", "c2")
df.select(colNames)

Accepted Solution

JongKim
New Contributor III

I also had the same problem. Here's how to make it work using the Column type and varargs:

// make example dataframe
import org.apache.spark.sql.DataFrame
import spark.implicits._ // provides .toDF; already imported in Databricks notebooks and spark-shell
val df: DataFrame = sc.parallelize(Seq((1, 2, 3), (4, 5, 6), (7, 8, 9))).toDF("a", "b", "c")

// desired list of column names as strings (so the list can be built programmatically)
val column_names_str = Seq[String]("a", "b")

// convert the names into Column objects
import org.apache.spark.sql.functions.col
val column_names_col = column_names_str.map(name => col(name))
//val column_names_col = column_names_str.map(name => col(name).as(s"renamed_$name")) // rename if needed

// select the columns, expanding the Seq into varargs with : _*
val df_new = df.select(column_names_col: _*)
df_new.show()

This should yield the expected result:

+---+---+
|  a|  b|
+---+---+
|  1|  2|
|  4|  5|
|  7|  8|
+---+---+
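For reference, here is a self-contained sketch of two ways to do this, assuming Spark 3.x on Scala 2.12/2.13 and a local SparkSession when run outside Databricks (in a notebook, spark already exists). The object SelectByNames and its methods viaColumns and viaStrings are hypothetical helpers for illustration. The first maps names to Columns as in the answer above; the second uses the select(col: String, cols: String*) overload of Dataset.select, which skips building Column objects but needs at least one name.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object SelectByNames {
  // Option 1: turn each name into a Column, then expand the Seq as varargs
  def viaColumns(df: DataFrame, names: Seq[String]): DataFrame =
    df.select(names.map(name => col(name)): _*)

  // Option 2: use the select(col: String, cols: String*) overload.
  // names.head throws on an empty Seq, so reject that case explicitly.
  def viaStrings(df: DataFrame, names: Seq[String]): DataFrame = {
    require(names.nonEmpty, "need at least one column name")
    df.select(names.head, names.tail: _*)
  }
}

// Local session for trying this outside Databricks (assumption: Spark is on the classpath)
val spark = SparkSession.builder().master("local[*]").appName("select-by-names").getOrCreate()
import spark.implicits._

val df = Seq((1, 2, 3), (4, 5, 6), (7, 8, 9)).toDF("a", "b", "c")
val colNames = Seq("a", "b")

SelectByNames.viaColumns(df, colNames).show()
SelectByNames.viaStrings(df, colNames).show()
```

Both calls keep only columns a and b. The Column-based version is the more flexible one when you also want to rename or transform columns, as in the commented-out line of the answer.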

vEdwardpc
New Contributor II

Thanks. I needed to modify the final lines as follows:

val df_new = df.select(column_names_col:_*)
df_new.show()
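The : _* type ascription is what makes this work: it tells the Scala compiler to pass the elements of a Seq as separate arguments to a varargs parameter, such as select(cols: Column*). A minimal Spark-free sketch, where pick is a hypothetical stand-in for a varargs method like select:

```scala
object VarargsDemo {
  // Hypothetical varargs method standing in for DataFrame.select(cols: Column*)
  def pick(names: String*): String = names.mkString(",")

  val colNames: Seq[String] = Seq("c1", "c2")

  // pick(colNames)  // does not compile: a Seq[String] is not a String
  val joined: String = pick(colNames: _*) // each element becomes its own argument
}

println(VarargsDemo.joined)
```

Without the ascription, the compiler tries to pass the whole Seq as a single argument, which is why df.select(colNames) in the question does not compile.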

Edward
