
PySpark DataFrame: Select all but one or a set of columns

SohelKhan
New Contributor II

In some SQL implementations, you can write select -col_A to select all columns except col_A.

I tried the equivalent in Spark 1.6.0 as follows:

For a DataFrame df with three columns col_A, col_B, and col_C:

df.select('col_B', 'col_C') # it works

df.select(-'col_A') # does not work

df.select(*-'col_A') # does not work

Note: I am trying to find an alternative to df.context.sql("select col_B, col_C ...") in the script above.
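Here is a minimal, runnable version of what I tried (assuming a Spark 1.6 environment where sqlContext is predefined, as in a Databricks notebook; the sample row is made up):

from pyspark.sql import Row

# Toy DataFrame with the three columns described above (made-up data)
df = sqlContext.createDataFrame([Row(col_A=1, col_B=2, col_C=3)])

df.select('col_B', 'col_C') # works: explicit positive selection
# df.select(-'col_A') # fails with TypeError: bad operand type for unary -: 'str'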

3 REPLIES

zjffdu
New Contributor II

I don't think it is supported, since it is not part of the SQL standard.

LejlaMetohajrov
New Contributor II

cols = list(set(df.columns) - {'col_A'}) # set difference removes col_A

df.select(cols)
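The same set-difference idea generalizes to excluding a whole set of columns (a sketch; to_drop is just an illustrative name):

to_drop = {'col_A', 'col_B'} # hypothetical set of columns to exclude
cols = list(set(df.columns) - to_drop)
df.select(cols)

Since Python sets are unordered, the resulting column order is not guaranteed.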

@Sohel Khan, @zjffdu

NavitaJain
New Contributor II

@sk777, @zjffdu, @Lejla Metohajrova

If your columns are time-series ordered, or you simply want to maintain their original order, use:

cols = [c for c in df.columns if c != 'col_A'] # list comprehension preserves column order

df[cols]
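For what it's worth, DataFrame.drop (available since Spark 1.4) does the same in one call and also preserves the order of the remaining columns:

df.drop('col_A') # new DataFrame without col_A
df.drop('col_A').drop('col_B') # in Spark 1.x, chain calls to drop several columns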
