11-25-2015 01:10 PM
Is there a simple way to select columns from a DataFrame using a sequence of strings?
Something like:
val colNames = Seq("c1", "c2")
df.select(colNames)
- Labels: Dataframe
Accepted Solutions
12-02-2015 06:54 PM
I had the same problem. Here's how to make it work by mapping the names to the Column type and passing them as varargs:
// make example dataframe (as written, this assumes spark-shell, where sc and the toDF implicits are available)
import org.apache.spark.sql.DataFrame
val df: DataFrame = sc.parallelize(Seq((1, 2, 3), (4, 5, 6), (7, 8, 9))).toDF("a", "b", "c")
// desired list of column names as strings (so the selection can be built programmatically)
val column_names_str = Seq[String]("a", "b")
// construct the list of columns as Column objects
import org.apache.spark.sql.functions.col
val column_names_col = column_names_str.map(name => col(name))
//val column_names_col = column_names_str.map(name => col(name).as(s"renamed_$name")) // rename if needed
// select specific columns from the dataframe using the varargs syntax _*
val df_new = df.select(column_names_col: _*)
df_new.show()
This should yield the expected output:
+---+---+
|  a|  b|
+---+---+
|  1|  2|
|  4|  5|
|  7|  8|
+---+---+
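As a possible shortcut, select also has an overload that takes column names directly, select(col: String, cols: String*), so the strings do not have to be mapped to Column objects first. The sketch below assumes Spark 2.0 or later (it builds its own local SparkSession instead of relying on sc) and is written as script-style statements, for example for pasting into spark-shell. The names colNames and selected are illustrative only.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: Spark 2.0+ is on the classpath; a local session is created for the demo.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("select-by-name")
  .getOrCreate()
import spark.implicits._

// Same example data as in the answer above.
val df = Seq((1, 2, 3), (4, 5, 6), (7, 8, 9)).toDF("a", "b", "c")

// Column names as plain strings; the sequence must be non-empty for head/tail to work.
val colNames = Seq("a", "b")

// First name goes in as the required String argument, the rest are expanded as varargs.
val selected = df.select(colNames.head, colNames.tail: _*)
selected.show()
```

This form fails on an empty sequence because of colNames.head, so the Column-based version above is the safer choice when the list may be empty.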
10-01-2016 01:21 PM
Thanks. I needed to modify the final lines to use :_* so that the sequence is expanded into varargs:
val df_new = df.select(column_names_col:_*)
df_new.show()
Edward
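To show why the _* matters, here is a minimal sketch in plain Scala with no Spark dependency. selectAll is a hypothetical stand-in for a varargs method such as select(cols: Column*). A varargs parameter accepts individual arguments, so a Seq has to be expanded explicitly with the : _* type ascription. The statements are script-style, as in the REPL or a Scala 3 source file.

```scala
// Hypothetical stand-in for a varargs method; not part of Spark.
def selectAll(cols: String*): String = cols.mkString("select(", ", ", ")")

val names = Seq("a", "b")

// selectAll(names) would not compile: a Seq[String] is not a String.
// The ": _*" ascription passes each element as a separate argument.
val expanded = selectAll(names: _*) // equivalent to selectAll("a", "b")

println(expanded) // prints: select(a, b)
```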

