<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Hive Table Creation - Parquet does not support Timestamp Datatype? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/hive-table-creation-parquet-does-not-support-timestamp-datatype/m-p/30127#M21805</link>
    <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;In the end, I changed the format from Parquet to ORC, and it works fine for me.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;format("orc")&lt;/CODE&gt;&lt;/PRE&gt; 
&lt;P&gt;&lt;/P&gt;</description>
    <pubDate>Thu, 21 Sep 2017 18:20:16 GMT</pubDate>
    <dc:creator>SirChokolate</dc:creator>
    <dc:date>2017-09-21T18:20:16Z</dc:date>
    <item>
      <title>Hive Table Creation - Parquet does not support Timestamp Datatype?</title>
      <link>https://community.databricks.com/t5/data-engineering/hive-table-creation-parquet-does-not-support-timestamp-datatype/m-p/30121#M21799</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Good afternoon,&lt;/P&gt;
&lt;P&gt;Attempting to run this statement:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;%sql 
CREATE EXTERNAL TABLE IF NOT EXISTS dev_user_login (
  event_name STRING,
  datetime TIMESTAMP,
  ip_address STRING,
  acting_user_id STRING
)
PARTITIONED BY
  (date DATE)
STORED AS 
  PARQUET
LOCATION
  "/mnt/bi-dev-data/warehouse/users.loggedIn"
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;I get the following error message:&lt;/P&gt;
&lt;P&gt;Error in SQL statement: QueryExecutionException: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.UnsupportedOperationException: Parquet does not support timestamp. See HIVE-6384&lt;/P&gt;
&lt;P&gt;However, when I check HIVE-6384 (Implement all datatypes in Parquet) I see it was resolved some time ago.&lt;/P&gt;
&lt;P&gt;Is Databricks still on a version of Hive that has yet to support Timestamps in parquet? Any help would be appreciated. I tried this in both 1.4 and 1.5 experimental.&lt;/P&gt;
&lt;P&gt;Many thanks.&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 06 Sep 2015 20:07:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/hive-table-creation-parquet-does-not-support-timestamp-datatype/m-p/30121#M21799</guid>
      <dc:creator>RobertWalsh</dc:creator>
      <dc:date>2015-09-06T20:07:57Z</dc:date>
    </item>
    <item>
      <title>Re: Hive Table Creation - Parquet does not support Timestamp Datatype?</title>
      <link>https://community.databricks.com/t5/data-engineering/hive-table-creation-parquet-does-not-support-timestamp-datatype/m-p/30122#M21800</link>
      <description>&lt;P&gt;&lt;/P&gt;&lt;P&gt;Can you try this? It uses the DataFrames implementation of Parquet rather than the Hive one:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;CREATE TEMPORARY TABLE dev_user_login (
  event_name STRING,
  datetime TIMESTAMP,
  ip_address STRING,
  acting_user_id STRING
)
USING org.apache.spark.sql.parquet
OPTIONS (
  path "examples/src/main/resources/people.parquet"
)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 11 Sep 2015 16:45:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/hive-table-creation-parquet-does-not-support-timestamp-datatype/m-p/30122#M21800</guid>
      <dc:creator>vida</dc:creator>
      <dc:date>2015-09-11T16:45:48Z</dc:date>
    </item>
    <item>
      <title>Re: Hive Table Creation - Parquet does not support Timestamp Datatype?</title>
      <link>https://community.databricks.com/t5/data-engineering/hive-table-creation-parquet-does-not-support-timestamp-datatype/m-p/30123#M21801</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Many thanks! The above worked, and I was able to create the table with the timestamp data type. I appreciate the automatic partition discovery as well! I'll focus on using the DataFrames implementation rather than the Hive one going forward.&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 12 Sep 2015 01:48:49 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/hive-table-creation-parquet-does-not-support-timestamp-datatype/m-p/30123#M21801</guid>
      <dc:creator>RobertWalsh</dc:creator>
      <dc:date>2015-09-12T01:48:49Z</dc:date>
    </item>
    <item>
      <title>Re: Hive Table Creation - Parquet does not support Timestamp Datatype?</title>
      <link>https://community.databricks.com/t5/data-engineering/hive-table-creation-parquet-does-not-support-timestamp-datatype/m-p/30124#M21802</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;&lt;A href="https://users/290/omoshiroi.html" target="_blank"&gt;@omoshiroi&lt;/A&gt; &lt;/P&gt;
&lt;P&gt;That didn't work for me. Can you paste the entire script here?&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 11 Jan 2017 05:02:05 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/hive-table-creation-parquet-does-not-support-timestamp-datatype/m-p/30124#M21802</guid>
      <dc:creator>jackAKAkarthik</dc:creator>
      <dc:date>2017-01-11T05:02:05Z</dc:date>
    </item>
    <item>
      <title>Re: Hive Table Creation - Parquet does not support Timestamp Datatype?</title>
      <link>https://community.databricks.com/t5/data-engineering/hive-table-creation-parquet-does-not-support-timestamp-datatype/m-p/30125#M21803</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Is there a way to specify the timezone as well? After following the approach mentioned above, I was able to store date information like "2016-07-23" as 2016-07-23T00:00:00.000+0000. Now I need to specify the UTC+05:30 timezone. Let me know if this is possible.&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 28 Jun 2017 13:59:09 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/hive-table-creation-parquet-does-not-support-timestamp-datatype/m-p/30125#M21803</guid>
      <dc:creator>SandeepCharugul</dc:creator>
      <dc:date>2017-06-28T13:59:09Z</dc:date>
    </item>
    <item>
      <title>Re: Hive Table Creation - Parquet does not support Timestamp Datatype?</title>
      <link>https://community.databricks.com/t5/data-engineering/hive-table-creation-parquet-does-not-support-timestamp-datatype/m-p/30126#M21804</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;How can I apply the solution above in a Spark script like this one?&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;package com.neoris.spark
import java.text.SimpleDateFormat
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{SQLContext, SaveMode}
import org.apache.spark.sql.types.{DateType, StringType, StructField, StructType}
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.flume._
import org.apache.spark.streaming.{Seconds, StreamingContext}
//import org.apache.spark.sql.hive.thriftserver._
import org.apache.spark.sql.hive.HiveContext
object LogAnalyzerStreaming {
  def main(args: Array[String]): Unit = {
    if (args.length &amp;lt; 3) {
      System.err.println(
        "Usage: LogAnalyzerStreaming &amp;lt;host&amp;gt; &amp;lt;port&amp;gt; &amp;lt;batchInterval&amp;gt;")
      System.exit(1)
    }
    val Array(in_host, in_port, in_batchInterval) = args
    val host = in_host.trim
    val port = in_port.toInt
    val batchInterval = Seconds(in_batchInterval.toInt)
    val sparkConf = new SparkConf()
      .setAppName("LogAnalyzerStreaming")
      .setMaster("local[*]")
      .set("spark.executor.memory", "2g")
      .set("spark.sql.hive.thriftServer.singleSession", "true")
      .set("spark.driver.allowMultipleContexts", "true")
    val sparkStreamingContext = new StreamingContext(sparkConf, batchInterval)
    val stream = FlumeUtils.createStream(sparkStreamingContext, host, port, StorageLevel.MEMORY_ONLY_SER_2)
    val eventBody = stream.map(e =&amp;gt; new String(e.event.getBody.array))
    val eventBodySchema =
      StructType(
        Array(
          StructField("Fecha",StringType,true),
          StructField("Hora",StringType,true),
          StructField("filler_queries",StringType,true),
          StructField("filler_info",StringType,true),
          StructField("filler_client",StringType,true),
          StructField("ip_port",StringType,true),
          StructField("url01",StringType,true),
          StructField("filler_view",StringType,true),
          StructField("filler_default",StringType,true),
          StructField("filler_query",StringType,true),
          StructField("url02",StringType,true),
          StructField("filler_in",StringType,true),
          StructField("s_country",StringType,true),
          StructField("s_edc",StringType,true),
          StructField("url",StringType,true)
        )
      )
    eventBody.foreachRDD { rdd =&amp;gt;
      val sqlContext = new HiveContext(rdd.sparkContext)
      val streamRDD = rdd.map(x =&amp;gt; x.split(" ")).map(p =&amp;gt; org.apache.spark.sql.Row(p(0),p(1),p(2),p(3),p(4),p(5),p(6),p(7),p(8),p(9),p(10),p(11),p(12),p(13),p(14)))
      val streamSchemaRDD = sqlContext.createDataFrame(streamRDD, eventBodySchema)
      streamSchemaRDD.registerTempTable("log")
      // Parse FECHA from the Fecha column and HORA from the Hora column.
      val queryLog = sqlContext.sql("SELECT TO_DATE(CAST(UNIX_TIMESTAMP(Fecha, 'dd-MMM-yyyy') AS TIMESTAMP)) as FECHA, TO_DATE(CAST(UNIX_TIMESTAMP(Hora, 'hh:mm:ss.SSS') AS TIMESTAMP)) as HORA FROM log")
      queryLog.show()
      queryLog.write
        .format("parquet")
        .mode("append")
        .saveAsTable("logs")
    }
    stream.count().map(cnt =&amp;gt; cnt + " Flume events received." ).print()
    sparkStreamingContext.start() 
    sparkStreamingContext.awaitTermination() 
  }
}&lt;/CODE&gt;&lt;/PRE&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 11 Sep 2017 16:39:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/hive-table-creation-parquet-does-not-support-timestamp-datatype/m-p/30126#M21804</guid>
      <dc:creator>SirChokolate</dc:creator>
      <dc:date>2017-09-11T16:39:18Z</dc:date>
    </item>
    <item>
      <title>Re: Hive Table Creation - Parquet does not support Timestamp Datatype?</title>
      <link>https://community.databricks.com/t5/data-engineering/hive-table-creation-parquet-does-not-support-timestamp-datatype/m-p/30127#M21805</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;In the end, I changed the format from Parquet to ORC, and it works fine for me.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;format("orc")&lt;/CODE&gt;&lt;/PRE&gt; 
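&lt;P&gt;For context, here is a minimal sketch of where that call sits in a DataFrame write. It assumes Spark 2.x with Hive support on the classpath. The object name, the table name logs_orc_sketch, and the sample row are hypothetical and only there for illustration; this is not the original streaming job.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;import java.sql.Timestamp
import org.apache.spark.sql.{SaveMode, SparkSession}

// Hypothetical stand-alone example: writes one row with a TIMESTAMP column to an ORC-backed table.
object OrcTimestampSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("OrcTimestampSketch")
      .master("local[*]")
      .enableHiveSupport() // assumption: Hive classes are available for the ORC source
      .getOrCreate()
    import spark.implicits._

    // Sample data (made up) with a real timestamp value rather than a string.
    val events = Seq(
      ("login", Timestamp.valueOf("2017-09-21 18:20:16"))
    ).toDF("event_name", "datetime")

    // Same write chain as in the script above, with "orc" in place of "parquet".
    events.write
      .format("orc")
      .mode(SaveMode.Append)
      .saveAsTable("logs_orc_sketch")

    // Inspect the column types of the saved table.
    spark.table("logs_orc_sketch").printSchema()
    spark.stop()
  }
}&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;If the write succeeds, the printed schema should report datetime as a timestamp column.&lt;/P&gt;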
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 21 Sep 2017 18:20:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/hive-table-creation-parquet-does-not-support-timestamp-datatype/m-p/30127#M21805</guid>
      <dc:creator>SirChokolate</dc:creator>
      <dc:date>2017-09-21T18:20:16Z</dc:date>
    </item>
    <item>
      <title>Re: Hive Table Creation - Parquet does not support Timestamp Datatype?</title>
      <link>https://community.databricks.com/t5/data-engineering/hive-table-creation-parquet-does-not-support-timestamp-datatype/m-p/95698#M39153</link>
      <description>&lt;P&gt;1. Switching to the Spark native catalog approach (instead of the Hive metastore) works. The syntax is essentially:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;    CREATE TABLE IF NOT EXISTS dbName.tableName (column names and types
    )
    USING parquet 
    PARTITIONED BY (
      runAt STRING
    )
    LOCATION 'abfss://path/to/parquet/folder';&lt;/LI-CODE&gt;&lt;P&gt;2. I found I still have to run MSCK REPAIR TABLE 'the-table-name' so that queries show the data.&lt;/P&gt;</description>
      <pubDate>Wed, 23 Oct 2024 11:40:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/hive-table-creation-parquet-does-not-support-timestamp-datatype/m-p/95698#M39153</guid>
      <dc:creator>source2sea</dc:creator>
      <dc:date>2024-10-23T11:40:30Z</dc:date>
    </item>
  </channel>
</rss>

