<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to convert column type from decimal to date in sparksql in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-to-convert-column-type-from-decimal-to-date-in-sparksql/m-p/27328#M19205</link>
    <description>&lt;P&gt;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;select count(*) from prty_cdim where load_dt_id is not null;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;there are 732734948 records&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;select count(*) from prty_cdim where load_dt_id is null;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;0 records&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;create table testtable as select * from prty_cdim where load_dt_is is not null limit 10;
select count(*) from testtable;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;table testtable successful created and 10 rows in it.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;select load_dt_id from testtable;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;load_dt_id&lt;/P&gt;
&lt;P&gt;20140419&lt;/P&gt;
&lt;P&gt;20140419&lt;/P&gt;
&lt;P&gt;..&lt;/P&gt;
&lt;P&gt;20140419&lt;/P&gt;
&lt;P&gt;10 rows&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;select * from testtable where to_date(cast(load_dt_id as string),'yyyyMMdd') = date_format(date_sub(current_date(),1),'yyyyMMdd');&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;OK with bulk glass 1 (don't know what it's meant?). When I click the bulk image, it's - 'Accelerate queries with Delta: This query contains a highly selective filter. To improve the performance of queries, convert the table to Delta and run the Opmitize ZORDER BY command on the table 'default'.'testtable"&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
    <pubDate>Tue, 11 Feb 2020 18:08:41 GMT</pubDate>
    <dc:creator>pepevo</dc:creator>
    <dc:date>2020-02-11T18:08:41Z</dc:date>
    <item>
      <title>How to convert column type from decimal to date in sparksql</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-convert-column-type-from-decimal-to-date-in-sparksql/m-p/27322#M19199</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I need to convert column type from decimal to date in sparksql when the format is not yyyy-mm-dd?&lt;/P&gt;
&lt;P&gt;A table contains column data declared as decimal (38,0) and data is in yyyymmdd format and I am unable to run sql queries on it in databrick notebook.&lt;/P&gt;
&lt;P&gt;I have tried to_date(column_name) = date_sub(current_date(),1) and it didn't work. I tried, "from_unixtime(cast(column_name as string), 'yyyy-MM-dd') or to_date(cast(unix_timestamp('column_name, 'yyyy-MM-dd') and none of them are work. From_unixtime(cast(column_name as string),'yyyy-mm-dd', it gives me right format but those data are not existed in the table.&lt;/P&gt;
&lt;P&gt;How can I convert this column type to a &lt;B&gt;date&lt;/B&gt; inside sql?&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 10 Feb 2020 15:23:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-convert-column-type-from-decimal-to-date-in-sparksql/m-p/27322#M19199</guid>
      <dc:creator>pepevo</dc:creator>
      <dc:date>2020-02-10T15:23:36Z</dc:date>
    </item>
    <item>
      <title>Re: How to convert column type from decimal to date in sparksql</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-convert-column-type-from-decimal-to-date-in-sparksql/m-p/27323#M19200</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Here is a working example, &lt;/P&gt;
&lt;P&gt;In SQL, &lt;/P&gt;
&lt;P&gt;create table temp1 as (select 20010101.00 date);&lt;/P&gt;
&lt;P&gt;-- This creates a table with a single column named "date" with a datatype of decimal. &lt;/P&gt;
&lt;P&gt;-- To verify run "describe temp1";&lt;/P&gt;
&lt;P&gt;the to_date function takes a string as an input so first cast the decimal to string. &lt;/P&gt;
&lt;P&gt;select cast(date as String) from temp1;&lt;/P&gt;
&lt;P&gt;Once you have a string you can push to a date, &lt;/P&gt;
&lt;P&gt;select to_date(cast(date as String), 'yyyyMMdd') date from temp1;&lt;/P&gt;
&lt;P&gt;You could do the same using dataframe api. &lt;/P&gt;
&lt;P&gt;df1 = spark.sql("select 20010101.00 date")&lt;/P&gt;
&lt;P&gt;Convert to string&lt;/P&gt;
&lt;P&gt;df2 = df1.select(df1.date.cast("string"))&lt;/P&gt;
&lt;P&gt;Drop right of decimal&lt;/P&gt;
&lt;P&gt;from pyspark.sql.functions import substring_index&lt;/P&gt;
&lt;P&gt;df3 = df2.select(substring_index(df2.date, '.', 1).alias('date'))&lt;/P&gt;
&lt;P&gt;Convert String to Date&lt;/P&gt;
&lt;P&gt;from pyspark.sql.functions import to_date&lt;/P&gt;
&lt;P&gt;df4 = df3.select(to_date('date', 'yyyyMMdd').alias('date'))&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 10 Feb 2020 16:24:34 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-convert-column-type-from-decimal-to-date-in-sparksql/m-p/27323#M19200</guid>
      <dc:creator>User16857282152</dc:creator>
      <dc:date>2020-02-10T16:24:34Z</dc:date>
    </item>
    <item>
      <title>Re: How to convert column type from decimal to date in sparksql</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-convert-column-type-from-decimal-to-date-in-sparksql/m-p/27324#M19201</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;thank you for your input. &lt;/P&gt;&lt;P&gt;&lt;/P&gt;I tried to select the column by itself from the table as you showed and yes, I get the format converted right from yyyyMMdd to yyyy-MM-dd on output, but when I put it in the where clause as below:
&lt;PRE&gt;&lt;CODE&gt;select count(*) from table_name where to_date(cast(column_name as string),'yyyyMMdd') = date_sub(current_date(),1));
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt; my output is 0 instead of more than 500000 records.&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 10 Feb 2020 16:44:07 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-convert-column-type-from-decimal-to-date-in-sparksql/m-p/27324#M19201</guid>
      <dc:creator>pepevo</dc:creator>
      <dc:date>2020-02-10T16:44:07Z</dc:date>
    </item>
    <item>
      <title>Re: How to convert column type from decimal to date in sparksql</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-convert-column-type-from-decimal-to-date-in-sparksql/m-p/27325#M19202</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;That seems strange, &lt;/P&gt;
&lt;P&gt;Have you tried the reverse, convert the date you want to compare to a string, and then compare&lt;/P&gt;
&lt;P&gt;select * from table where (your column cast to string) = date_format(date_sub(current_date(),1), 'yyyyMMdd')&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 10 Feb 2020 18:53:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-convert-column-type-from-decimal-to-date-in-sparksql/m-p/27325#M19202</guid>
      <dc:creator>User16857282152</dc:creator>
      <dc:date>2020-02-10T18:53:37Z</dc:date>
    </item>
    <item>
      <title>Re: How to convert column type from decimal to date in sparksql</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-convert-column-type-from-decimal-to-date-in-sparksql/m-p/27326#M19203</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I did and the output is still = 0. &lt;/P&gt;
&lt;P&gt;* the column_name is load_dt_id&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;select count(*) from table_name WHERE to_date(cast(load_dt_id as string),'yyyyMMdd') = date_format(date_sub(),1),'yyyyMMdd');&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;There are 700 million records and the count can't be 0 if load_dt_id is sysdate -1. &lt;/P&gt;
&lt;P&gt;The count from Oracle is at least a 500 million if the load_dt_id is sysdate -1. &lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 10 Feb 2020 19:03:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-convert-column-type-from-decimal-to-date-in-sparksql/m-p/27326#M19203</guid>
      <dc:creator>pepevo</dc:creator>
      <dc:date>2020-02-10T19:03:42Z</dc:date>
    </item>
    <item>
      <title>Re: How to convert column type from decimal to date in sparksql</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-convert-column-type-from-decimal-to-date-in-sparksql/m-p/27327#M19204</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I would start by debugging a small collection of records, &lt;/P&gt;
&lt;P&gt;First count nulls, perhaps your import failed to read that column for some formatting reason. &lt;/P&gt;
&lt;P&gt;Select count(*) from table where target_column is NOT NULL;&lt;/P&gt;
&lt;P&gt;compare with&lt;/P&gt;
&lt;P&gt;Select count(*) from table where target_column is NULL;&lt;/P&gt;
&lt;P&gt;If the table is massive and costs are significant, combine the above into a single query. &lt;/P&gt;
&lt;P&gt;Then grab a few rows&lt;/P&gt;
&lt;P&gt;Create testTable as select * from sourceTable where target_column is not null LIMIT 10; &lt;/P&gt;
&lt;P&gt;Note the limit clause, just grab a few rows. &lt;/P&gt;
&lt;P&gt;That should give you an idea of what is wrong, if anything with the actual data as imported and what you are expecting the data to be. &lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 11 Feb 2020 17:14:10 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-convert-column-type-from-decimal-to-date-in-sparksql/m-p/27327#M19204</guid>
      <dc:creator>User16857282152</dc:creator>
      <dc:date>2020-02-11T17:14:10Z</dc:date>
    </item>
    <item>
      <title>Re: How to convert column type from decimal to date in sparksql</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-convert-column-type-from-decimal-to-date-in-sparksql/m-p/27328#M19205</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;select count(*) from prty_cdim where load_dt_id is not null;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;there are 732734948 records&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;select count(*) from prty_cdim where load_dt_id is null;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;0 records&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;create table testtable as select * from prty_cdim where load_dt_is is not null limit 10;
select count(*) from testtable;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;table testtable successful created and 10 rows in it.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;select load_dt_id from testtable;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;load_dt_id&lt;/P&gt;
&lt;P&gt;20140419&lt;/P&gt;
&lt;P&gt;20140419&lt;/P&gt;
&lt;P&gt;..&lt;/P&gt;
&lt;P&gt;20140419&lt;/P&gt;
&lt;P&gt;10 rows&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;select * from testtable where to_date(cast(load_dt_id as string),'yyyyMMdd') = date_format(date_sub(current_date(),1),'yyyyMMdd');&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;OK with bulk glass 1 (don't know what it's meant?). When I click the bulk image, it's - 'Accelerate queries with Delta: This query contains a highly selective filter. To improve the performance of queries, convert the table to Delta and run the Opmitize ZORDER BY command on the table 'default'.'testtable"&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 11 Feb 2020 18:08:41 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-convert-column-type-from-decimal-to-date-in-sparksql/m-p/27328#M19205</guid>
      <dc:creator>pepevo</dc:creator>
      <dc:date>2020-02-11T18:08:41Z</dc:date>
    </item>
    <item>
      <title>Re: How to convert column type from decimal to date in sparksql</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-convert-column-type-from-decimal-to-date-in-sparksql/m-p/27329#M19206</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I have been tried many data functions to compare with sysdate and no luck even with 10 records in a table. Is there a way I can compare sysdate as decimal?&lt;/P&gt;
&lt;P&gt;even using, unix_timestamp, from_unixtime and none of them work.&lt;/P&gt;
&lt;P&gt;Why scala sql is not friendly much. what would the following where clause look like in databricks:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;WHERE to_date(trunc(load_dt_id),'yyyymmdd') = trunc(sysdate -1);&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;I tried "trunc(cast(load_dt_id as string),'yyyyMMdd') = date_format(date_sub(currentl_date(),1),'yyyyMMdd') ; "&lt;/P&gt;
&lt;P&gt;or "&lt;/P&gt;
&lt;P&gt;trunc(cast(load_dt_id as string),'yyyyMMdd') = trunc(date_sub(currentl_date(),1),'yyyyMMdd');&lt;/P&gt;
&lt;P&gt;"&lt;/P&gt;
&lt;P&gt;none of them are working&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 12 Feb 2020 17:52:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-convert-column-type-from-decimal-to-date-in-sparksql/m-p/27329#M19206</guid>
      <dc:creator>pepevo</dc:creator>
      <dc:date>2020-02-12T17:52:44Z</dc:date>
    </item>
    <item>
      <title>Re: How to convert column type from decimal to date in sparksql</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-convert-column-type-from-decimal-to-date-in-sparksql/m-p/27330#M19207</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;You can run &lt;/P&gt;
&lt;P&gt;"describe tablename;" to get the datatypes of a column. &lt;/P&gt;
&lt;P&gt;I ran a few quick examples and found the following to work, I think you are probably close but just some formatting issues. &lt;/P&gt;
&lt;P&gt;Explore some of these examples and see if you can find your error. &lt;/P&gt;
&lt;P&gt;Remember to delete any tables you create if you are on a shared workspace. &lt;/P&gt;
&lt;P&gt;Create a table with a decimal as a column named id. &lt;/P&gt;
&lt;P&gt;create table temp6 as (Select 20140419.00 id);&lt;/P&gt;
&lt;P&gt;describe temp6; &lt;/P&gt;
&lt;P&gt;-- to show it is indeed a decimal&lt;/P&gt;
&lt;P&gt;Select from it and note that sql deals with int to float&lt;/P&gt;
&lt;P&gt;%sql select * from temp6 where id = 20140419;&lt;/P&gt;
&lt;P&gt;Also casts strings for you&lt;/P&gt;
&lt;P&gt;%sql select * from temp6 where id = "20140419";&lt;/P&gt;
&lt;P&gt;Do the same with ints using&lt;/P&gt;
&lt;P&gt;create table temp7 as (Select 20140419 id);&lt;/P&gt;
&lt;P&gt;This last example might get you close, &lt;/P&gt;
&lt;P&gt;select * from temp6 where id &amp;lt; replace(cast(current_date() as String), '-');&lt;/P&gt;
&lt;P&gt;Break that down by running &lt;/P&gt;
&lt;P&gt;select current_date()&lt;/P&gt;
&lt;P&gt;Then cast to string&lt;/P&gt;
&lt;P&gt;select cast(current_date() as String);&lt;/P&gt;
&lt;P&gt;Then drop the dashes&lt;/P&gt;
&lt;P&gt;select replace(cast(current_date() as String), '-')&lt;/P&gt;
&lt;P&gt;Then use that in your where clause, but modify as needed to suit your logic. &lt;/P&gt;
&lt;P&gt;Note mine worked when I compared an int or a decimal to a string, sparkSQL casts as appropriate. &lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 13 Feb 2020 14:42:41 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-convert-column-type-from-decimal-to-date-in-sparksql/m-p/27330#M19207</guid>
      <dc:creator>User16857282152</dc:creator>
      <dc:date>2020-02-13T14:42:41Z</dc:date>
    </item>
    <item>
      <title>Re: How to convert column type from decimal to date in sparksql</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-convert-column-type-from-decimal-to-date-in-sparksql/m-p/27331#M19208</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;thank you Tom. I made it work already.&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 13 Feb 2020 19:35:35 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-convert-column-type-from-decimal-to-date-in-sparksql/m-p/27331#M19208</guid>
      <dc:creator>pepevo</dc:creator>
      <dc:date>2020-02-13T19:35:35Z</dc:date>
    </item>
    <item>
      <title>Re: How to convert column type from decimal to date in sparksql</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-convert-column-type-from-decimal-to-date-in-sparksql/m-p/27332#M19209</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Glad to hear that, &lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 13 Feb 2020 22:37:32 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-convert-column-type-from-decimal-to-date-in-sparksql/m-p/27332#M19209</guid>
      <dc:creator>User16857282152</dc:creator>
      <dc:date>2020-02-13T22:37:32Z</dc:date>
    </item>
  </channel>
</rss>

