
PySpark DataFrame: Select all but one or a set of columns

SohelKhan
New Contributor II

In some SQL implementations, you can write select -col_A to select all columns except col_A.

I tried the equivalent in Spark 1.6.0 as follows:

For a DataFrame df with three columns col_A, col_B, and col_C:

df.select('col_B', 'col_C') # it works

df.select(-'col_A') # does not work

df.select(*-'col_A') # does not work

Note: I am trying to find an alternative to df.context.sql("select col_B, col_C ...") in the script above.
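Here is a minimal, runnable version of what I tried (assuming a Spark 1.6 environment where sqlContext is predefined, as in a Databricks notebook; the sample row is made up):

from pyspark.sql import Row

# Toy DataFrame with the three columns described above (made-up data)
df = sqlContext.createDataFrame([Row(col_A=1, col_B=2, col_C=3)])

df.select('col_B', 'col_C') # works: explicit positive selection
# df.select(-'col_A') # fails with TypeError: bad operand type for unary -: 'str'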

3 REPLIES

zjffdu
New Contributor II

I don't think it is supported, since it is not part of the SQL standard.

LejlaMetohajrov
New Contributor II

cols = list(set(df.columns) - {'col_A'}) # set difference removes col_A

df.select(cols)
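The same set-difference idea generalizes to excluding a whole set of columns (a sketch; to_drop is just an illustrative name):

to_drop = {'col_A', 'col_B'} # hypothetical set of columns to exclude
cols = list(set(df.columns) - to_drop)
df.select(cols)

Since Python sets are unordered, the resulting column order is not guaranteed.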

@Sohel Khan, @zjffdu

NavitaJain
New Contributor II

@sk777, @zjffdu, @Lejla Metohajrova

If your columns are time-series ordered, or you simply want to maintain their original order, use:

cols = [c for c in df.columns if c != 'col_A'] # list comprehension preserves column order

df[cols]
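For what it's worth, DataFrame.drop (available since Spark 1.4) does the same in one call and also preserves the order of the remaining columns:

df.drop('col_A') # new DataFrame without col_A
df.drop('col_A').drop('col_B') # in Spark 1.x, chain calls to drop several columns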
