<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Hello Experts -  I am facing one technical issue with Databricks SQL - IF-ELSE or CASE statement implementation when trying to execute two separate set of queries based on a valued of a column of the Delta table. in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/hello-experts-i-am-facing-one-technical-issue-with-databricks/m-p/33219#M24282</link>
    <description>&lt;P&gt;Thanks a ton Gurpreet, Suggested approach was helpful in resolution of the issue.  Appreciate your help !! &lt;/P&gt;</description>
    <pubDate>Thu, 16 Dec 2021 18:19:57 GMT</pubDate>
    <dc:creator>Atul_Sharan</dc:creator>
    <dc:date>2021-12-16T18:19:57Z</dc:date>
    <item>
      <title>Hello Experts -  I am facing one technical issue with Databricks SQL - IF-ELSE or CASE statement implementation when trying to execute two separate set of queries based on a valued of a column of the Delta table.</title>
      <link>https://community.databricks.com/t5/data-engineering/hello-experts-i-am-facing-one-technical-issue-with-databricks/m-p/33215#M24278</link>
      <description>&lt;P&gt;Hi Experts,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I'm reading  a pipe delimited source file where first row does not contain the data but contains the - REPLACE or UPDATE values which indicate the refresh type if it is Full refresh Or Upsert. The second row is nothing but header and the actual column wise data starts from 3rd row.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I am processing this file using DF and saving the data into a temporary delta table for further processing. However I am facing issue in implementing the IF-ELSE logic where I can simply perform two separate SQL operations based on the value of the REPLACE or UPDATE in the delta table column as we can perform using T-SQL.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Any pointes/leads/probable solution in this regard will be appreciated. Thanks !!&lt;/P&gt;</description>
      <pubDate>Wed, 15 Dec 2021 12:56:27 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/hello-experts-i-am-facing-one-technical-issue-with-databricks/m-p/33215#M24278</guid>
      <dc:creator>Atul_Sharan</dc:creator>
      <dc:date>2021-12-15T12:56:27Z</dc:date>
    </item>
    <item>
      <title>Re: Hello Experts -  I am facing one technical issue with Databricks SQL - IF-ELSE or CASE statement implementation when trying to execute two separate set of queries based on a valued of a column of the Delta table.</title>
      <link>https://community.databricks.com/t5/data-engineering/hello-experts-i-am-facing-one-technical-issue-with-databricks/m-p/33216#M24279</link>
      <description>&lt;P&gt;Hello @Atul Sharan​. My name is Piper, and I'm a moderator for Databricks. Welcome and thank you for your question! Let's give it a bit longer to see how the community responds. If necessary, we'll circle back to this.&lt;/P&gt;</description>
      <pubDate>Wed, 15 Dec 2021 18:29:21 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/hello-experts-i-am-facing-one-technical-issue-with-databricks/m-p/33216#M24279</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2021-12-15T18:29:21Z</dc:date>
    </item>
    <item>
      <title>Re: Hello Experts -  I am facing one technical issue with Databricks SQL - IF-ELSE or CASE statement implementation when trying to execute two separate set of queries based on a valued of a column of the Delta table.</title>
      <link>https://community.databricks.com/t5/data-engineering/hello-experts-i-am-facing-one-technical-issue-with-databricks/m-p/33217#M24280</link>
      <description>&lt;P&gt;Hi Atul,&lt;/P&gt;&lt;P&gt;Thanks for your question. Further can you please try to add sample dataset as an example.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;My understanding is your data should look like this.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;File-1&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;******&lt;/B&gt;&lt;/P&gt;&lt;P&gt;REPLACE&lt;/P&gt;&lt;P&gt;ID|Name&lt;/P&gt;&lt;P&gt;1|Alex&lt;/P&gt;&lt;P&gt;2|James&lt;/P&gt;&lt;P&gt;3|Smith&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;File-2&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;******&lt;/B&gt;&lt;/P&gt;&lt;P&gt;UDPATE&lt;/P&gt;&lt;P&gt;ID|Name&lt;/P&gt;&lt;P&gt;1|Alex Ho&lt;/P&gt;&lt;P&gt;2|James King&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Cheers&lt;/P&gt;&lt;P&gt;GS&lt;/P&gt;</description>
      <pubDate>Thu, 16 Dec 2021 02:59:14 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/hello-experts-i-am-facing-one-technical-issue-with-databricks/m-p/33217#M24280</guid>
      <dc:creator>GurpreetSethi</dc:creator>
      <dc:date>2021-12-16T02:59:14Z</dc:date>
    </item>
    <item>
      <title>Re: Hello Experts -  I am facing one technical issue with Databricks SQL - IF-ELSE or CASE statement implementation when trying to execute two separate set of queries based on a valued of a column of the Delta table.</title>
      <link>https://community.databricks.com/t5/data-engineering/hello-experts-i-am-facing-one-technical-issue-with-databricks/m-p/33218#M24281</link>
      <description>&lt;P&gt;Hi Atul,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Based on the data which I cooked here is the code base which will read file and generate a dataframe where REFRESH TYPE will be added as a new column and rest of the columns will be split into different columns&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;*******************************************************************************&lt;/P&gt;&lt;P&gt;# Import Libararies&lt;/P&gt;&lt;P&gt;from pyspark.sql.functions import input_file_name, lit, split, col&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;# Declare UDF to Sort Records (Later Used To Remove Headers)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;def dfZipWithIndex (df, offset=1, colName="rowId"):&lt;/P&gt;&lt;P&gt;&amp;nbsp;'''&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;Enumerates dataframe rows is native order, like rdd.ZipWithIndex(), but on a dataframe&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;and preserves a schema&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;:param df: source dataframe&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;:param offset: adjustment to zipWithIndex()'s index&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;:param colName: name of the index column&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;'''&lt;/P&gt;&lt;P&gt;&amp;nbsp;new_schema = StructType(&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;[StructField(colName,LongType(),True)]&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# new added field in front&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;+ df.schema.fields&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# previous schema&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;)&lt;/P&gt;&lt;P&gt;&amp;nbsp;zipped_rdd = df.rdd.zipWithIndex()&lt;/P&gt;&lt;P&gt;&amp;nbsp;new_rdd = zipped_rdd.map(lambda args: ([args[1] + offset] + list(args[0])))&lt;/P&gt;&lt;P&gt;&amp;nbsp;return spark.createDataFrame(new_rdd, new_schema)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;# Read Data File&lt;/P&gt;&lt;P&gt;df = sqlContext.read.format('com.databricks.spark.csv').load("/FileStore/test.csv")&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;# Extract First Row Which is REFRESH TYPE&lt;/P&gt;&lt;P&gt;x = lit(str(df.limit(1).collect()[0][0]))&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;# Add REFRESH TYPE as a NEW COLUMN&lt;/P&gt;&lt;P&gt;df = df.withColumn("RefreshType",x)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;# REMOVE/FILTER REFRESH TYPE FROM DATAFRAME&lt;/P&gt;&lt;P&gt;df2=dfZipWithIndex(df)&lt;/P&gt;&lt;P&gt;df2 = df2.filter(df2.rowId&amp;gt;1).drop("rowId")&lt;/P&gt;&lt;P&gt;df2.show()&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;#Split UDF&amp;nbsp;&lt;/P&gt;&lt;P&gt;split_col =&amp;nbsp;split(df2._c0, '\\|',)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;# Finally split ID &amp;amp; Name column and create a data frame&lt;/P&gt;&lt;P&gt;df3=df2.withColumn("ID",split_col.getItem(0)) \&lt;/P&gt;&lt;P&gt;&amp;nbsp;.withColumn("Name",split_col.getItem(1)) \&lt;/P&gt;&lt;P&gt;&amp;nbsp;.drop(col("_c0"))&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;df3.show()&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="image"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/2234iC7394CBAFA96B40F/image-size/large?v=v2&amp;amp;px=999" role="button" title="image" alt="image" /&gt;&lt;/span&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Idea is that finally you can save this dataframe into a temporary table and then it will be easy for you to do IF-ELSE statement. My understanding was that currently as you may be writing complete raw dataset into table its like complete data into a single column (like below)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="image"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/2225i45936EF68A1F8375/image-size/large?v=v2&amp;amp;px=999" role="button" title="image" alt="image" /&gt;&lt;/span&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Let me know if we have different understanding.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 16 Dec 2021 07:38:49 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/hello-experts-i-am-facing-one-technical-issue-with-databricks/m-p/33218#M24281</guid>
      <dc:creator>GurpreetSethi</dc:creator>
      <dc:date>2021-12-16T07:38:49Z</dc:date>
    </item>
    <item>
      <title>Re: Hello Experts -  I am facing one technical issue with Databricks SQL - IF-ELSE or CASE statement implementation when trying to execute two separate set of queries based on a valued of a column of the Delta table.</title>
      <link>https://community.databricks.com/t5/data-engineering/hello-experts-i-am-facing-one-technical-issue-with-databricks/m-p/33219#M24282</link>
      <description>&lt;P&gt;Thanks a ton Gurpreet, Suggested approach was helpful in resolution of the issue.  Appreciate your help !! &lt;/P&gt;</description>
      <pubDate>Thu, 16 Dec 2021 18:19:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/hello-experts-i-am-facing-one-technical-issue-with-databricks/m-p/33219#M24282</guid>
      <dc:creator>Atul_Sharan</dc:creator>
      <dc:date>2021-12-16T18:19:57Z</dc:date>
    </item>
    <item>
      <title>Re: Hello Experts -  I am facing one technical issue with Databricks SQL - IF-ELSE or CASE statement implementation when trying to execute two separate set of queries based on a valued of a column of the Delta table.</title>
      <link>https://community.databricks.com/t5/data-engineering/hello-experts-i-am-facing-one-technical-issue-with-databricks/m-p/33220#M24283</link>
      <description>&lt;P&gt;Welcome Atul &lt;/P&gt;</description>
      <pubDate>Thu, 16 Dec 2021 20:56:19 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/hello-experts-i-am-facing-one-technical-issue-with-databricks/m-p/33220#M24283</guid>
      <dc:creator>GurpreetSethi</dc:creator>
      <dc:date>2021-12-16T20:56:19Z</dc:date>
    </item>
  </channel>
</rss>

