<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: PySpark DataFrame: Select all but one or a set of columns in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/pyspark-dataframe-select-all-but-one-or-a-set-of-columns/m-p/29845#M21546</link>
    <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;@sk777, @zjffdu, @Lejla Metohajrova&lt;/P&gt;
&lt;P&gt;If your columns are time-series ordered, or you want to preserve their original order, use:&lt;/P&gt;
&lt;P&gt;cols = [c for c in df.columns if c != 'col_A']&lt;/P&gt;
&lt;P&gt;df[cols]&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
    <pubDate>Wed, 25 Mar 2020 23:21:12 GMT</pubDate>
    <dc:creator>NavitaJain</dc:creator>
    <dc:date>2020-03-25T23:21:12Z</dc:date>
    <item>
      <title>PySpark DataFrame: Select all but one or a set of columns</title>
      <link>https://community.databricks.com/t5/data-engineering/pyspark-dataframe-select-all-but-one-or-a-set-of-columns/m-p/29842#M21543</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;In some SQL implementations, SELECT supports a negative column list, e.g. select -col_A to return all columns except col_A.&lt;/P&gt;
&lt;P&gt;I tried the equivalent in Spark 1.6.0 as follows:&lt;/P&gt;
&lt;P&gt;For a DataFrame df with three columns col_A, col_B, col_C:&lt;/P&gt;
&lt;P&gt;df.select('col_B', 'col_C') # works&lt;/P&gt;
&lt;P&gt;df.select(-'col_A') # does not work&lt;/P&gt;
&lt;P&gt;df.select(*-'col_A') # does not work&lt;/P&gt;
&lt;P&gt;Note: I am looking for a DataFrame-API alternative to df.context.sql("select col_B, col_C ... ") in the script above.&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 22 Feb 2016 06:27:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/pyspark-dataframe-select-all-but-one-or-a-set-of-columns/m-p/29842#M21543</guid>
      <dc:creator>SohelKhan</dc:creator>
      <dc:date>2016-02-22T06:27:36Z</dc:date>
    </item>
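The "all but a set of columns" case from the question title can be sketched with plain column-name logic; a minimal sketch using Python lists as a stand-in for df.columns, so it runs without a Spark session (the df.select/df.drop calls in the comment are the standard PySpark entry points, with multi-column drop available only on Spark 2.x and later):

```python
# Build the complement of a set of columns, as the question title asks.
# Plain Python lists stand in for df.columns, so no Spark cluster is needed.
columns = ["col_A", "col_B", "col_C", "col_D"]  # stand-in for df.columns
exclude = {"col_A", "col_C"}                    # columns to leave out

cols = [c for c in columns if c not in exclude]
print(cols)  # ['col_B', 'col_D']
# In PySpark this feeds df.select(*cols); on Spark 2.x+, df.drop(*exclude)
# is an equivalent one-liner.
```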
    <item>
      <title>Re: PySpark DataFrame: Select all but one or a set of columns</title>
      <link>https://community.databricks.com/t5/data-engineering/pyspark-dataframe-select-all-but-one-or-a-set-of-columns/m-p/29843#M21544</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I don't think it is supported, since it is not part of the SQL standard.&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 25 Feb 2016 06:08:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/pyspark-dataframe-select-all-but-one-or-a-set-of-columns/m-p/29843#M21544</guid>
      <dc:creator>zjffdu</dc:creator>
      <dc:date>2016-02-25T06:08:43Z</dc:date>
    </item>
    <item>
      <title>Re: PySpark DataFrame: Select all but one or a set of columns</title>
      <link>https://community.databricks.com/t5/data-engineering/pyspark-dataframe-select-all-but-one-or-a-set-of-columns/m-p/29844#M21545</link>
      <description>&lt;P&gt;cols = list(set(df.columns) - {'col_A'})&lt;/P&gt;&lt;P&gt;df.select(cols)&lt;/P&gt;&lt;P&gt; @Sohel Khan​&amp;nbsp;, @zjffdu​&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 19 Dec 2017 11:57:00 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/pyspark-dataframe-select-all-but-one-or-a-set-of-columns/m-p/29844#M21545</guid>
      <dc:creator>LejlaMetohajrov</dc:creator>
      <dc:date>2017-12-19T11:57:00Z</dc:date>
    </item>
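The set-difference recipe above can be checked on plain column-name lists (no Spark session needed); note the caveat the next reply raises, that set() discards the original column order:

```python
# Set difference over column names, as in the reply above.
columns = ["col_A", "col_B", "col_C"]  # stand-in for df.columns

cols = list(set(columns) - {"col_A"})
# The right columns survive, but set() gives no ordering guarantee,
# so compare as sets (or sort) rather than relying on position:
assert set(cols) == {"col_B", "col_C"}
```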
    <item>
      <title>Re: PySpark DataFrame: Select all but one or a set of columns</title>
      <link>https://community.databricks.com/t5/data-engineering/pyspark-dataframe-select-all-but-one-or-a-set-of-columns/m-p/29845#M21546</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;@sk777, @zjffdu, @Lejla Metohajrova&lt;/P&gt;
&lt;P&gt;If your columns are time-series ordered, or you want to preserve their original order, use:&lt;/P&gt;
&lt;P&gt;cols = [c for c in df.columns if c != 'col_A']&lt;/P&gt;
&lt;P&gt;df[cols]&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 25 Mar 2020 23:21:12 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/pyspark-dataframe-select-all-but-one-or-a-set-of-columns/m-p/29845#M21546</guid>
      <dc:creator>NavitaJain</dc:creator>
      <dc:date>2020-03-25T23:21:12Z</dc:date>
    </item>
  </channel>
</rss>

