
Select DataFrame columns from a sequence of strings

Jean-FrancoisRa
New Contributor

Is there a simple way to select columns from a DataFrame using a sequence of strings?

Something like

val colNames = Seq("c1", "c2")
df.select(colNames)

1 ACCEPTED SOLUTION

JongKim
New Contributor III

I also had the same problem. Here's how to make it work using the Column type and varargs:

// make an example dataframe
import org.apache.spark.sql.DataFrame
val df: DataFrame = sc.parallelize(Seq((1, 2, 3), (4, 5, 6), (7, 8, 9))).toDF("a", "b", "c")

// desired column names as strings (so the list can be built programmatically)
val column_names_str = Seq[String]("a", "b")

// convert the column names to Column objects
import org.apache.spark.sql.functions.col
val column_names_col = column_names_str.map(name => col(name))
// val column_names_col = column_names_str.map(name => col(name).as(s"renamed_$name")) // rename if needed

// select specific columns from the dataframe using the varargs syntax `: _*`
val df_new = df.select(column_names_col: _*)
df_new.show()

This should produce the expected output:

+---+---+
|  a|  b|
+---+---+
|  1|  2|
|  4|  5|
|  7|  8|
+---+---+
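
For anyone wondering why the explicit expansion is needed: `select` takes a varargs parameter (`cols: Column*`), and Scala will not accept a `Seq` in that position unless it is expanded with `: _*`. The sketch below shows the same mechanism in plain Scala, without Spark. `selectNames` is a hypothetical stand-in for illustration, not a Spark method.

```scala
object VarargsDemo {
  // Hypothetical stand-in for a varargs method such as select(cols: Column*)
  def selectNames(cols: String*): String =
    cols.mkString("SELECT ", ", ", "")

  def main(args: Array[String]): Unit = {
    val colNames = Seq("c1", "c2")

    // selectNames(colNames) would not compile: a Seq[String] is not a String

    // `: _*` expands the sequence into individual arguments
    val query = selectNames(colNames: _*)
    println(query) // SELECT c1, c2
  }
}
```

This is also why `df.select(colNames)` from the question is rejected: the compiler finds no `select` overload that takes a `Seq[String]`.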

2 REPLIES

vEdwardpc
New Contributor II

Thanks. I needed to modify the final lines.

val df_new = df.select(column_names_col:_*)
df_new.show()

Edward
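
If the names never need to become `Column` objects, a possible alternative is the `select(col: String, cols: String*)` overload, which takes the first name separately and the rest as varargs. The following is a minimal sketch, assuming a local `SparkSession` and the column names `c1`, `c2`, `c3` from the question. `selectByNames` is a hypothetical helper, and it assumes the sequence is non-empty.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SelectByNames {
  // Hypothetical helper: select columns by name using the String overload.
  // Assumes `names` is non-empty, because `head` throws on an empty Seq.
  def selectByNames(df: DataFrame, names: Seq[String]): DataFrame =
    df.select(names.head, names.tail: _*)

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in a notebook `spark` already exists
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("select-by-names")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1, 2, 3), (4, 5, 6), (7, 8, 9)).toDF("c1", "c2", "c3")
    val colNames = Seq("c1", "c2")

    selectByNames(df, colNames).show()

    spark.stop()
  }
}
```

The `Column`-based version in the accepted solution is more flexible, since it allows renaming or other expressions per column. The String overload is just shorter when plain names are enough.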