<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Trying to check if a column exists in a DataFrame: if not, I have to return NULL; if yes, I need to return the column itself, using a UDF in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/trying-to-check-if-a-column-exist-in-a-dataframe-or-not-if-not/m-p/18591#M12364</link>
    <description>&lt;P&gt;AttributeError: 'DataFrame' object has no attribute 'ColumnChecker'&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;def func_udf(df,col):&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;column =list(df.columns)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;if col in column:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return df.col&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;else:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;df.withColumn("col", lit(null))&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;spark.udf.register("ColumnChecker",func_udf)&lt;/P&gt;&lt;P&gt;dfg=a.ColumnChecker(a,ref_date)&lt;/P&gt;&lt;P&gt;dfg.show()&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;This is the code I am running.&lt;/B&gt;&lt;/P&gt;</description>
    <pubDate>Tue, 07 Jun 2022 11:54:10 GMT</pubDate>
    <dc:creator>cuteabhi32</dc:creator>
    <dc:date>2022-06-07T11:54:10Z</dc:date>
    <item>
      <title>Trying to check if a column exists in a DataFrame: if not, I have to return NULL; if yes, I need to return the column itself, using a UDF</title>
      <link>https://community.databricks.com/t5/data-engineering/trying-to-check-if-a-column-exist-in-a-dataframe-or-not-if-not/m-p/18584#M12357</link>
      <description>&lt;P&gt;from pyspark import SparkContext&lt;/P&gt;&lt;P&gt;from pyspark import SparkConf&lt;/P&gt;&lt;P&gt;from pyspark.sql.types import *&lt;/P&gt;&lt;P&gt;from pyspark.sql.functions import *&lt;/P&gt;&lt;P&gt;from pyspark.sql import *&lt;/P&gt;&lt;P&gt;from pyspark.sql.types import StringType&lt;/P&gt;&lt;P&gt;from pyspark.sql.functions import udf&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;df1 = spark.read.format("csv").option("header","true").load("file:///home/cloudera/data/a.csv")&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;def func_udf(df,col):&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;column = list(df.columns)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;if (col in column):&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return df.col&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;else:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return NULL&lt;/P&gt;&lt;P&gt;spark.udf.register("columncheck",func_udf)&lt;/P&gt;&lt;P&gt;resultdf=df1.withColumn("ref_date",expr("CASE WHEN Flag = 3 THEN '06MAY2022' ELSE columncheck(df1,ref_date) END"))&lt;/P&gt;&lt;P&gt;resultdf.show()&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;This is the code I am running. I am trying to check whether a column exists in a DataFrame: if it does not, I have to return NULL; if it does, I need to return the column itself, using a UDF. It throws an error for the UDF and I am not sure what I am doing wrong. Kindly help me resolve it; the resultdf DataFrame throws the error below:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Traceback (most recent call last):&lt;/P&gt;&lt;P&gt;&amp;nbsp;File "&amp;lt;stdin&amp;gt;", line 1, in &amp;lt;module&amp;gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;File "/usr/local/spark/python/pyspark/sql/dataframe.py", line 1849, in withColumn&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;return DataFrame(self._jdf.withColumn(colName, col._jc), self.sql_ctx)&lt;/P&gt;&lt;P&gt;&amp;nbsp;File "/usr/local/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__&lt;/P&gt;&lt;P&gt;&amp;nbsp;File "/usr/local/spark/python/pyspark/sql/utils.py", line 69, in deco&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;raise AnalysisException(s.split(': ', 1)[1], stackTrace)&lt;/P&gt;&lt;P&gt;pyspark.sql.utils.AnalysisException: "cannot resolve '`df1`' given input columns: [ref_date, Salary_qtr4, Salary_qtr3, Flag, Name, Salary_qtr1, Salary, Std, Salary_qtr2, Id, Mean]; line 1 pos 53;\n'Project [Id#10, Name#11, Salary#12, CASE WHEN (cast(Flag#20 as int) = 3) THEN 06MAY2022 ELSE 'columncheck('df1, ref_date#13) END AS ref_date#35, Salary_qtr1#14, Salary_qtr2#15, Salary_qtr3#16, Salary_qtr4#17, Mean#18, Std#19, Flag#20]\n+- AnalysisBarrier\n&amp;nbsp;&amp;nbsp;&amp;nbsp;+- Relation&lt;/P&gt;</description>
      <pubDate>Mon, 06 Jun 2022 15:17:54 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/trying-to-check-if-a-column-exist-in-a-dataframe-or-not-if-not/m-p/18584#M12357</guid>
      <dc:creator>cuteabhi32</dc:creator>
      <dc:date>2022-06-06T15:17:54Z</dc:date>
    </item>
    <item>
      <title>Re: Trying to check if a column exists in a DataFrame: if not, I have to return NULL; if yes, I need to return the column itself, using a UDF</title>
      <link>https://community.databricks.com/t5/data-engineering/trying-to-check-if-a-column-exist-in-a-dataframe-or-not-if-not/m-p/18585#M12358</link>
      <description>&lt;P&gt;Try returning the DataFrame with a null-filled column added, instead of a plain null.&lt;/P&gt;&lt;P&gt;So:&lt;/P&gt;&lt;P&gt;if col in column:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return df&lt;/P&gt;&lt;P&gt;else:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return df.withColumn(col, lit(None))&lt;/P&gt;</description>
      <pubDate>Tue, 07 Jun 2022 08:18:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/trying-to-check-if-a-column-exist-in-a-dataframe-or-not-if-not/m-p/18585#M12358</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2022-06-07T08:18:20Z</dc:date>
    </item>
    <item>
      <title>Re: Trying to check if a column exists in a DataFrame: if not, I have to return NULL; if yes, I need to return the column itself, using a UDF</title>
      <link>https://community.databricks.com/t5/data-engineering/trying-to-check-if-a-column-exist-in-a-dataframe-or-not-if-not/m-p/18586#M12359</link>
      <description>&lt;P&gt;Hi Werners, thanks for the input, but it is still throwing an error:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;AnalysisException: Column 'a' does not exist. Did you mean one of the following? [a.id, a.std, a.Dept, a.flag, a.mean, a.salary, a.neg_std, a.new_var, a.ref_date, a.mean20perc, a.neg_mean20perc]; line 1 pos 57;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;This is my updated code, which I am using in a CASE statement. Kindly help.&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;def func_udf(df,col):&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;column =list(df.columns)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;for col in column:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return col&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;else:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;lit(NULL)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;spark.udf.register("ColumnChecker",func_udf)&lt;/P&gt;&lt;P&gt;a = a.withColumn('ref_date',expr(f"CASE WHEN flag = 3 THEN '{sf_start_dt}' ELSE ColumnChecker(a,ref_date) END"))&lt;/P&gt;</description>
      <pubDate>Tue, 07 Jun 2022 08:48:50 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/trying-to-check-if-a-column-exist-in-a-dataframe-or-not-if-not/m-p/18586#M12359</guid>
      <dc:creator>cuteabhi32</dc:creator>
      <dc:date>2022-06-07T08:48:50Z</dc:date>
    </item>
    <item>
      <title>Re: Trying to check if a column exists in a DataFrame: if not, I have to return NULL; if yes, I need to return the column itself, using a UDF</title>
      <link>https://community.databricks.com/t5/data-engineering/trying-to-check-if-a-column-exist-in-a-dataframe-or-not-if-not/m-p/18587#M12360</link>
      <description>&lt;P&gt;OK, so you have this list of columns (df.columns).&lt;/P&gt;&lt;P&gt;If you then do:&lt;/P&gt;&lt;P&gt;if "columnName" in columns: return df&lt;/P&gt;&lt;P&gt;else: return df.withColumn("columnName", lit(None))&lt;/P&gt;&lt;P&gt;you do not need to loop.&lt;/P&gt;&lt;P&gt;In your loop you return col, not a DataFrame. You want to return the complete DataFrame with an optional extra column.&lt;/P&gt;</description>
      <pubDate>Tue, 07 Jun 2022 09:19:54 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/trying-to-check-if-a-column-exist-in-a-dataframe-or-not-if-not/m-p/18587#M12360</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2022-06-07T09:19:54Z</dc:date>
    </item>
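In the loop version posted earlier in the thread, Python reads that `else` as a for-else clause, and the `return` inside the loop fires on the first iteration, so the function hands back the first column name no matter which column was asked for. A minimal pure-Python sketch of that control flow (the names `posted_loop` and `has_column` are hypothetical, and plain lists stand in for `df.columns`):

```python
def posted_loop(columns, col):
    """Simplified mirror of the posted loop's control flow."""
    # The loop variable rebinds `col`, so the argument is ignored and the
    # function returns the first column name on the first iteration.
    for col in columns:
        return col
    else:
        # for-else: this branch runs only if the loop finishes without
        # returning or breaking, which here means `columns` was empty.
        return None


def has_column(columns, col):
    """Membership test on the column list; no loop is needed."""
    return col in columns


print(posted_loop(["id", "salary"], "ref_date"))  # prints id
print(has_column(["id", "salary"], "ref_date"))   # prints False
```

The membership test is what the reply above suggests: decide on the driver whether the column exists, then return the DataFrame, not a column name.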
    <item>
      <title>Re: Trying to check if a column exists in a DataFrame: if not, I have to return NULL; if yes, I need to return the column itself, using a UDF</title>
      <link>https://community.databricks.com/t5/data-engineering/trying-to-check-if-a-column-exist-in-a-dataframe-or-not-if-not/m-p/18588#M12361</link>
      <description>&lt;P&gt;from pyspark.sql.functions import *&lt;/P&gt;&lt;P&gt;from pyspark.sql.functions import col&lt;/P&gt;&lt;P&gt;import pyspark.sql.functions as F&lt;/P&gt;&lt;P&gt;from datetime import date,datetime&lt;/P&gt;&lt;P&gt;import time&lt;/P&gt;&lt;P&gt;from dateutil.relativedelta import relativedelta&lt;/P&gt;&lt;P&gt;from dateutil import parser&lt;/P&gt;&lt;P&gt;from pyspark.sql.window import Window&lt;/P&gt;&lt;P&gt;from pyspark.sql.types import *&lt;/P&gt;&lt;P&gt;import locale&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;# ---- merge statements ----&lt;/P&gt;&lt;P&gt;df1 = spark.read.format("csv").option("header", "true").load("dbfs:/FileStore/shared_uploads/a.csv")&lt;/P&gt;&lt;P&gt;df1.show()&lt;/P&gt;&lt;P&gt;df2 = spark.read.format("csv").option("header", "true").load("dbfs:/FileStore/shared_uploads/b.csv")&lt;/P&gt;&lt;P&gt;df2.show()&lt;/P&gt;&lt;P&gt;# ---- merge statements ----&lt;/P&gt;&lt;P&gt;sf_start_dt = '02MAY2022'&lt;/P&gt;&lt;P&gt;a=df1&lt;/P&gt;&lt;P&gt;a = a.filter(col("ref_date") == f"{sf_start_dt}")&lt;/P&gt;&lt;P&gt;a = a.drop('name')&lt;/P&gt;&lt;P&gt;a = a.select('id','salary','ref_date','std','mean',)&lt;/P&gt;&lt;P&gt;b = df2&lt;/P&gt;&lt;P&gt;a.createOrReplaceTempView("a")&lt;/P&gt;&lt;P&gt;b.createOrReplaceTempView("b")&lt;/P&gt;&lt;P&gt;a = a.join(b,'id',"outer")&lt;/P&gt;&lt;P&gt;df1 = spark.sql("select tbl1.id,(select 1) as tempCol1 from a tbl1 inner join b tbl2 on tbl1.id = tbl2.id")&lt;/P&gt;&lt;P&gt;df2 = spark.sql("select tbl1.id,(select 2) as tempCol2 from a tbl1 left join b tbl2 on tbl1.id = tbl2.id where tbl2.id is null")&lt;/P&gt;&lt;P&gt;df3 = spark.sql("select tbl1.id,(select 3) as tempCol3 from b tbl1 left join a tbl2 on tbl1.id = tbl2.id where tbl2.id is null")&lt;/P&gt;
&lt;P&gt;a = a.join(df1,'id',"outer").join(df2,'id',"outer").join(df3,'id',"outer")&lt;/P&gt;&lt;P&gt;a = a.na.fill(0,'tempCol1')&lt;/P&gt;&lt;P&gt;a = a.na.fill(0,'tempCol2')&lt;/P&gt;&lt;P&gt;a = a.na.fill(0,'tempCol3')&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;a = a.withColumn('flag', coalesce(col('tempCol1')+col('tempCol2')+col('tempCol3')) )&lt;/P&gt;&lt;P&gt;a = a.drop('tempCol1')&lt;/P&gt;&lt;P&gt;a = a.drop('tempCol2')&lt;/P&gt;&lt;P&gt;a = a.drop('tempCol3')&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;a = a\&lt;/P&gt;&lt;P&gt;.withColumn("neg_std",F.expr(f"(std*(-1))"))&lt;/P&gt;&lt;P&gt;a = a\&lt;/P&gt;&lt;P&gt;.withColumn("mean20perc",F.expr(f"(0.20*mean)"))&lt;/P&gt;&lt;P&gt;a = a\&lt;/P&gt;&lt;P&gt;.withColumn("neg_mean20perc",F.expr(f"(mean20perc*(-1))"))&lt;/P&gt;&lt;P&gt;a = a\&lt;/P&gt;&lt;P&gt;.withColumn("new_var",F.expr(f"'{sf_start_dt}'"))&lt;/P&gt;&lt;P&gt;columnsToDrop = []&lt;/P&gt;&lt;P&gt;selectClause = ''&lt;/P&gt;&lt;P&gt;a.createOrReplaceTempView("a")&lt;/P&gt;&lt;P&gt;a = spark.sql("select * from a")&lt;/P&gt;&lt;P&gt;from pyspark.sql.types import StringType&lt;/P&gt;&lt;P&gt;from pyspark.sql.functions import udf&lt;/P&gt;&lt;P&gt;column =list(a.columns)&lt;/P&gt;&lt;P&gt;print(column)&lt;/P&gt;
&lt;P&gt;def func_udf(df,col):&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;column =list(df.columns)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;if col in column:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return df&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;else:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;df.withColumn("col", lit(null))&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;spark.udf.register("ColumnChecker",func_udf)&lt;/P&gt;&lt;P&gt;a = a.withColumn('ref_date',expr(f"CASE WHEN flag = 3 THEN '{sf_start_dt}' ELSE ColumnChecker(a,ref_date) END"))&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;a = a.withColumn('balance',expr(f"CASE WHEN flag = 3 THEN 0 END"))&lt;/P&gt;&lt;P&gt;a.show()&lt;/P&gt;&lt;P&gt;a = a.withColumn('new_col',expr(f"CASE WHEN flag = 3 THEN '{sf_start_dt}' END"))&lt;/P&gt;&lt;P&gt;a.show()&lt;/P&gt;&lt;P&gt;work_ppcin_bal2_2019_1 = a&lt;/P&gt;&lt;P&gt;work_ppcin_bal2_2019_1.show()&lt;/P&gt;&lt;P&gt;# ---- end of merge statements ----&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;This is the full code.&lt;/P&gt;</description>
      <pubDate>Tue, 07 Jun 2022 10:58:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/trying-to-check-if-a-column-exist-in-a-dataframe-or-not-if-not/m-p/18588#M12361</guid>
      <dc:creator>cuteabhi32</dc:creator>
      <dc:date>2022-06-07T10:58:36Z</dc:date>
    </item>
    <item>
      <title>Re: Trying to check if a column exists in a DataFrame: if not, I have to return NULL; if yes, I need to return the column itself, using a UDF</title>
      <link>https://community.databricks.com/t5/data-engineering/trying-to-check-if-a-column-exist-in-a-dataframe-or-not-if-not/m-p/18589#M12362</link>
      <description>&lt;P&gt;Please find the files attached.&lt;/P&gt;</description>
      <pubDate>Tue, 07 Jun 2022 10:59:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/trying-to-check-if-a-column-exist-in-a-dataframe-or-not-if-not/m-p/18589#M12362</guid>
      <dc:creator>cuteabhi32</dc:creator>
      <dc:date>2022-06-07T10:59:24Z</dc:date>
    </item>
    <item>
      <title>Re: Trying to check if a column exists in a DataFrame: if not, I have to return NULL; if yes, I need to return the column itself, using a UDF</title>
      <link>https://community.databricks.com/t5/data-engineering/trying-to-check-if-a-column-exist-in-a-dataframe-or-not-if-not/m-p/18590#M12363</link>
      <description>&lt;P&gt;Why don't you first make sure the column is present in your DataFrame, and then use the CASE?&lt;/P&gt;&lt;P&gt;For example:&lt;/P&gt;&lt;P&gt;df2 = func_udf(df1, col)  # add the missing column&lt;/P&gt;&lt;P&gt;df2 = df2.withColumn( CASE...)&lt;/P&gt;</description>
      <pubDate>Tue, 07 Jun 2022 11:07:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/trying-to-check-if-a-column-exist-in-a-dataframe-or-not-if-not/m-p/18590#M12363</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2022-06-07T11:07:44Z</dc:date>
    </item>
    <item>
      <title>Re: Trying to check if a column exists in a DataFrame: if not, I have to return NULL; if yes, I need to return the column itself, using a UDF</title>
      <link>https://community.databricks.com/t5/data-engineering/trying-to-check-if-a-column-exist-in-a-dataframe-or-not-if-not/m-p/18591#M12364</link>
      <description>&lt;P&gt;AttributeError: 'DataFrame' object has no attribute 'ColumnChecker'&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;def func_udf(df,col):&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;column =list(df.columns)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;if col in column:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return df.col&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;else:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;df.withColumn("col", lit(null))&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;spark.udf.register("ColumnChecker",func_udf)&lt;/P&gt;&lt;P&gt;dfg=a.ColumnChecker(a,ref_date)&lt;/P&gt;&lt;P&gt;dfg.show()&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;This is the code I am running.&lt;/B&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 07 Jun 2022 11:54:10 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/trying-to-check-if-a-column-exist-in-a-dataframe-or-not-if-not/m-p/18591#M12364</guid>
      <dc:creator>cuteabhi32</dc:creator>
      <dc:date>2022-06-07T11:54:10Z</dc:date>
    </item>
    <item>
      <title>Re: Trying to check if a column exists in a DataFrame: if not, I have to return NULL; if yes, I need to return the column itself, using a UDF</title>
      <link>https://community.databricks.com/t5/data-engineering/trying-to-check-if-a-column-exist-in-a-dataframe-or-not-if-not/m-p/18592#M12365</link>
      <description>&lt;P&gt;Let us set aside what you have already written.&lt;/P&gt;&lt;P&gt;If I understand correctly, you need a certain column for a CASE statement, but that column may or may not be present. Correct?&lt;/P&gt;&lt;P&gt;If so, the approach I proposed should work; there is no need for a function.&lt;/P&gt;&lt;P&gt;If not, can you explain what you are trying to do? I have the impression that we are not aligned.&lt;/P&gt;</description>
      <pubDate>Tue, 07 Jun 2022 12:31:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/trying-to-check-if-a-column-exist-in-a-dataframe-or-not-if-not/m-p/18592#M12365</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2022-06-07T12:31:56Z</dc:date>
    </item>
    <item>
      <title>Re: Trying to check if a column exists in a DataFrame: if not, I have to return NULL; if yes, I need to return the column itself, using a UDF</title>
      <link>https://community.databricks.com/t5/data-engineering/trying-to-check-if-a-column-exist-in-a-dataframe-or-not-if-not/m-p/18593#M12366</link>
      <description>&lt;P&gt;Yes, your understanding is correct. I need a column that may either be created dynamically on the fly or already exist in the DataFrame. If the column is present, the function should return that column as output; if it is newly created, it should return null as its value.&lt;/P&gt;</description>
      <pubDate>Tue, 07 Jun 2022 12:56:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/trying-to-check-if-a-column-exist-in-a-dataframe-or-not-if-not/m-p/18593#M12366</guid>
      <dc:creator>cuteabhi32</dc:creator>
      <dc:date>2022-06-07T12:56:58Z</dc:date>
    </item>
    <item>
      <title>Re: Trying to check if a column exists in a DataFrame: if not, I have to return NULL; if yes, I need to return the column itself, using a UDF</title>
      <link>https://community.databricks.com/t5/data-engineering/trying-to-check-if-a-column-exist-in-a-dataframe-or-not-if-not/m-p/18594#M12367</link>
      <description>&lt;P&gt;OK, so let us keep it simple:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;from pyspark.sql.functions import lit
from pyspark.sql.types import StringType

df1 = spark.read.format("csv").option("header","true").load("file:///home/cloudera/data/a.csv")

if "col" in df1.columns:
    df2 = df1
else:
    # add the missing column as typed nulls; cast to whatever type you need
    df2 = df1.withColumn("col", lit(None).cast(StringType()))&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Next, you can apply your CASE expression (and anything else) to df2.&lt;/P&gt;</description>
      <pubDate>Tue, 07 Jun 2022 13:32:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/trying-to-check-if-a-column-exist-in-a-dataframe-or-not-if-not/m-p/18594#M12367</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2022-06-07T13:32:44Z</dc:date>
    </item>
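Because `df.columns` is an ordinary Python list, the existence check can also decide which SQL text to build, so a single `withColumn` call covers both cases. A minimal sketch, assuming a hypothetical helper name `ref_date_case_expr` and the `flag = 3` rule used in this thread:

```python
def ref_date_case_expr(columns, start_dt, target="ref_date", flag_col="flag"):
    """Build a Spark SQL CASE expression string for `target`.

    `columns` is the plain Python list from df.columns, so the existence
    check runs on the driver before Spark ever parses the SQL text.
    """
    # Reference the column only when it exists; otherwise fall back to a
    # NULL literal so Spark never has to resolve a missing column.
    fallback = target if target in columns else "NULL"
    return f"CASE WHEN {flag_col} = 3 THEN '{start_dt}' ELSE {fallback} END"


print(ref_date_case_expr(["id", "flag", "ref_date"], "02MAY2022"))
print(ref_date_case_expr(["id", "flag"], "02MAY2022"))
```

In PySpark the resulting string would be passed to `expr`, for example `a = a.withColumn("ref_date", expr(ref_date_case_expr(a.columns, sf_start_dt)))`. `ELSE NULL` behaves the same as omitting the `ELSE` branch, which is what the second branch of the solution posted later in the thread does.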
    <item>
      <title>Re: Trying to check if a column exists in a DataFrame: if not, I have to return NULL; if yes, I need to return the column itself, using a UDF</title>
      <link>https://community.databricks.com/t5/data-engineering/trying-to-check-if-a-column-exist-in-a-dataframe-or-not-if-not/m-p/18595#M12368</link>
      <description>&lt;P&gt;&lt;B&gt;Thanks, I modified my code as per your suggestion and it worked perfectly. Thanks again for all your input!&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;dflist= spark.createDataFrame(list(a.columns), "string").toDF("Name")&lt;/P&gt;&lt;P&gt;dfg=dflist.filter(col('name').isin('ref_date')).count()&lt;/P&gt;&lt;P&gt;if dfg==1 :&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;a = a.withColumn('ref_date',expr(f"CASE WHEN flag = 3 THEN '{sf_start_dt}' ELSE ref_date END"))&lt;/P&gt;&lt;P&gt;else:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;a = a.withColumn('ref_date',expr(f"CASE WHEN flag = 3 THEN '{sf_start_dt}' END"))&lt;/P&gt;</description>
      <pubDate>Tue, 07 Jun 2022 14:29:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/trying-to-check-if-a-column-exist-in-a-dataframe-or-not-if-not/m-p/18595#M12368</guid>
      <dc:creator>cuteabhi32</dc:creator>
      <dc:date>2022-06-07T14:29:16Z</dc:date>
    </item>
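The accepted solution builds a one-column DataFrame of names and counts the matches, which runs a Spark job just to inspect the schema. Because `a.columns` is already a Python list, the same decision can be made directly on the driver. A minimal sketch with a hypothetical helper `column_present`; the case-insensitive default is an assumption added here, motivated by this thread's data mixing `Name`/`name` and `Flag`/`flag`:

```python
def column_present(columns, name, case_sensitive=False):
    """Return True if `name` is in `columns` (e.g. the list from df.columns).

    Spark resolves column names case-insensitively unless
    spark.sql.caseSensitive is set to true, so case is ignored by default.
    """
    if case_sensitive:
        # Exact match, like the isin('ref_date') filter in the posted solution.
        return name in columns
    # Compare lower-cased names so "Name" and "name" count as the same column.
    return name.lower() in {c.lower() for c in columns}


cols = ["Id", "Name", "ref_date", "Flag"]
print(column_present(cols, "flag"))                       # True
print(column_present(cols, "flag", case_sensitive=True))  # False
```

With this helper, `if dfg==1 :` in the solution above could become `if column_present(a.columns, "ref_date"):`, with both `withColumn` branches unchanged.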
  </channel>
</rss>

