Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

RobertWalsh
by New Contributor II
  • 8860 Views
  • 7 replies
  • 2 kudos

Resolved! Hive Table Creation - Parquet does not support Timestamp Datatype?

Good afternoon, Attempting to run this statement: %sql CREATE EXTERNAL TABLE IF NOT EXISTS dev_user_login ( event_name STRING, datetime TIMESTAMP, ip_address STRING, acting_user_id STRING ) PARTITIONED BY (date DATE) STORED AS PARQUET ...

Latest Reply
source2sea
Contributor
  • 2 kudos

1. Changing to the Spark native catalog approach (not the Hive metastore) works. The syntax is essentially: CREATE TABLE IF NOT EXISTS dbName.tableName (column names and types) USING parquet PARTITIONED BY ( runAt STRING ) LOCA...
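
A minimal sketch of that suggestion, reusing the column list and table name from the original question; the LOCATION path is a placeholder, not the poster's actual bucket:

    # Spark-native (USING PARQUET) table instead of a Hive-SerDe (STORED AS PARQUET)
    # table; the Spark Parquet reader/writer handles TIMESTAMP columns.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS dev_user_login (
            event_name     STRING,
            datetime       TIMESTAMP,
            ip_address     STRING,
            acting_user_id STRING,
            date           DATE
        )
        USING PARQUET
        PARTITIONED BY (date)
        LOCATION 's3://my-bucket/dev_user_login/'  -- placeholder path
    """)

The key change is USING PARQUET, which registers the table through the Spark data source catalog rather than the older Hive SerDe path that rejects the TIMESTAMP type in this thread.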

6 More Replies
lnsnarayanan
by New Contributor II
  • 10763 Views
  • 8 replies
  • 12 kudos

Resolved! I cannot see the Hive databases or tables once I terminate the cluster and use another cluster.

I am using Databricks Community Edition for learning purposes. I created some Hive-managed tables through Spark SQL as well as with the df.saveAsTable option. But when I connect to a new cluster, "SHOW DATABASES" only returns the default database....

Latest Reply
dhpaulino
New Contributor II
  • 12 kudos

As the files are still in DBFS, you can just recreate the references to your tables and continue the work, with something like this: db_name = "mydb" from pathlib import Path path_db = f"dbfs:/user/hive/warehouse/{db_name}.db/" tables_dirs = dbutils.fs.l...
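
A fuller sketch of the same idea (the original snippet is truncated above); the database name, warehouse path, and Parquet format are assumptions:

    # Recreate table references for data files that survive in DBFS after the
    # Community Edition cluster was terminated.
    db_name = "mydb"                                      # hypothetical database name
    path_db = f"dbfs:/user/hive/warehouse/{db_name}.db/"  # default warehouse location

    spark.sql(f"CREATE DATABASE IF NOT EXISTS {db_name}")
    for table_dir in dbutils.fs.ls(path_db):              # one sub-directory per table
        table_name = table_dir.name.rstrip("/")
        spark.sql(
            f"CREATE TABLE IF NOT EXISTS {db_name}.{table_name} "
            f"USING PARQUET LOCATION '{table_dir.path}'"
        )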

7 More Replies
as999
by New Contributor III
  • 13316 Views
  • 8 replies
  • 6 kudos

Databricks Hive metastore location?

In Databricks, where is the Hive metastore located: in the control plane or the data plane? For prod systems, what precautions should be taken to secure the Hive metastore?

Latest Reply
Prabakar
Databricks Employee
  • 6 kudos

@as999 The default metastore is managed by Databricks. If you are concerned about security and would like to have your own metastore, you can go for the external metastore setup. You have the detailed steps in the below doc for setting up the external...
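
For reference, the external-metastore setup described in that doc comes down to cluster Spark config along these lines; the Hive version, JDBC URL, driver, and credentials below are placeholders that depend on your own metastore:

    spark.sql.hive.metastore.version 2.3.7
    spark.sql.hive.metastore.jars builtin
    spark.hadoop.javax.jdo.option.ConnectionURL jdbc:mysql://<metastore-host>:3306/<metastore-db>
    spark.hadoop.javax.jdo.option.ConnectionDriverName org.mariadb.jdbc.Driver
    spark.hadoop.javax.jdo.option.ConnectionUserName <metastore-user>
    spark.hadoop.javax.jdo.option.ConnectionPassword <metastore-password>

On the security point, the password would normally be referenced from a secret scope rather than written in plain text in the cluster config.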

7 More Replies
irispan
by New Contributor II
  • 3790 Views
  • 4 replies
  • 1 kudos

Recommended Hive metastore pattern for Trino integration

Hi, I have several questions regarding Trino integration: Is it recommended to use an external Hive metastore or to leverage the Databricks-maintained Hive metastore when it comes to enabling external query engines such as Trino? When I tried to use ex...

Latest Reply
JunlinZeng
Databricks Employee
  • 1 kudos

> Is it recommended to use an external Hive metastore or leverage the Databricks-maintained Hive metastore when it comes to enabling external query engines such as Trino?
The Databricks-maintained Hive metastore is not suggested to be used externally. ...

3 More Replies
rami1
by New Contributor II
  • 8338 Views
  • 2 replies
  • 4 kudos

METASTORE_DOWN: Cannot connect to metastore

I am trying to view databases and tables, default as well as user-created, but it looks like the cluster created is not able to connect. I am using the Databricks default Hive metastore. Viewing the cluster logs provides the following event: METASTORE_DOWN Metastore is...

Latest Reply
Anonymous
Not applicable
  • 4 kudos

@rami: If the metastore is down, it means that the Databricks cluster is not able to connect to the metastore. Here are a few things you can try to resolve the issue: Check if the Hive metastore is up and running. You can try to connect to the metast...
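
A quick connectivity check you can run in a notebook on the affected cluster (a minimal sketch, nothing beyond standard Spark SQL):

    # If the metastore connection is healthy these return without error; if it is
    # down they fail with a MetaException that usually names the unreachable host.
    spark.sql("SHOW DATABASES").show()
    spark.sql("SHOW TABLES IN default").show()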

1 More Replies
Phani1
by Valued Contributor II
  • 3793 Views
  • 1 reply
  • 0 kudos

Best practices/steps for Hive metastore backup and restore

Hi Team, could you share with us the best practices/steps for Hive metastore backup and restore? Regards, Phanindra

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Janga Reddy: Certainly! Here are the steps for Hive metastore backup and restore on Databricks: Backup: Stop all running Hive services and jobs on the Databricks cluster. Create a backup directory in DBFS (Databricks File System) where the metadata fi...
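
One hedged way to script the metadata part of such a backup (the database name and backup path are placeholders, and this captures only table DDL, not the underlying data files):

    # Export SHOW CREATE TABLE output for every table in a database to DBFS so the
    # definitions can be replayed against a restored or new metastore.
    backup_dir = "dbfs:/mnt/metastore_backup"   # placeholder backup location
    db_name = "mydb"                            # placeholder database

    for row in spark.sql(f"SHOW TABLES IN {db_name}").collect():
        ddl = spark.sql(f"SHOW CREATE TABLE {db_name}.{row.tableName}").first()[0]
        dbutils.fs.put(f"{backup_dir}/{db_name}/{row.tableName}.sql", ddl, True)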

Anonymous
by Not applicable
  • 3307 Views
  • 1 reply
  • 0 kudos

I am getting an exception "RuntimeException: Caught Hive MetaException attempting to get partition metadata by filter from Hive."

I have a Parquet dataframe df. I first add a column using df.withColumn("version", lit(currentTimestamp)) and append it to a table db.tbl with format parquet, partitioned by the "version" column. I then ran MSCK REPAIR TABLE db.tbl. I have then create...
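
For clarity, a sketch of the workflow described above (names taken from the post; the timestamp value and write options are assumptions):

    from pyspark.sql import functions as F

    current_timestamp = "2023-05-01 12:00:00"   # stand-in for the poster's currentTimestamp

    (df.withColumn("version", F.lit(current_timestamp))
       .write
       .format("parquet")
       .mode("append")
       .partitionBy("version")
       .saveAsTable("db.tbl"))

    spark.sql("MSCK REPAIR TABLE db.tbl")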

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@vikashk84 The exception "RuntimeException: Caught Hive MetaException attempting to get partition metadata by filter from Hive" typically occurs when there is an issue with Hive metadata related to partitioning in Databricks. Here are a few steps you ...

RafaelGomez61
by New Contributor
  • 3360 Views
  • 2 replies
  • 0 kudos

Can't access delta tables under SQL Warehouse cluster. Getting Error while using path .../_delta_log/000000000.checkpoint

In our Databricks workspace, we have several Delta tables available in the hive_metastore catalog. We are able to access and query the data via Data Science & Engineering persona clusters with no issues. The cluster has the credential passthrough en...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Rafael Gomez, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so ...

1 More Replies
yzaehringer
by New Contributor
  • 1842 Views
  • 1 reply
  • 0 kudos

GET_COLUMNS fails with Unexpected character ('t' (code 116)): was expecting comma to separate Object entries - how to fix?

I just ran `cursor.columns()` via the Python client and I get back an `org.apache.hive.service.cli.HiveSQLException` as the response. There is also a long stack trace; I'll just paste the last bit because it might be illuminating: org.apache.spark.sql....

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 0 kudos

This can be a package issue or a runtime issue; try changing both.

pantelis_mare
by Contributor III
  • 3583 Views
  • 6 replies
  • 1 kudos

Too long hive type string

Hello community! I have a table with a column that is an array of a struct that has a very, very long schema. When the table is written, all works well. Though, when I create a view based on this table and try to access the view, I get the error: org.apa...

Latest Reply
Afzal
New Contributor II
  • 1 kudos

@Pantelis Maroudis, were you able to solve this issue? Please advise if you got any tips. Thanks in advance.

5 More Replies
jeffreym9
by New Contributor III
  • 3808 Views
  • 4 replies
  • 0 kudos

Resolved! Hive version after Upgrade Azure Databricks from 6.4 (Spark 2) to 9.1 (Spark 3)

I have upgraded Azure Databricks from 6.4 to 9.1, which enables me to use Spark 3. As far as I know, the Hive version has to be upgraded to 2.3.7 as well, as discussed in: https://community.databricks.com/s/question/0D53f00001HKHy2CAH/how-to-upgrade-...

Latest Reply
jeffreym9
New Contributor III
  • 0 kudos

I'm asking about Databricks version 9.1. I've followed the URL given (https://docs.microsoft.com/en-us/azure/databricks/data/metastores/external-hive-metastore). Do you mind letting me know where in the table it mentions the supported Hive version fo...

3 More Replies
Autel
by New Contributor II
  • 3976 Views
  • 3 replies
  • 0 kudos

Resolved! concurrent update to same hive or deltalake table

Hi, I'm interested to know whether multiple executors can append to the same Hive table using saveAsTable or insertInto in Spark SQL. Will that cause any data corruption? What configuration do I need to enable concurrent writes to the same Hive table? What about the s...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

The Hive table will not like this, as the underlying data is in Parquet format, which is not ACID compliant. Delta Lake, however, is: https://docs.delta.io/0.5.0/concurrency-control.html You can see that inserts do not give conflicts.
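
A minimal sketch of the Delta alternative (the table name is a placeholder); per the linked doc, concurrent blind appends to the same Delta table do not conflict:

    # Each concurrent job can run an append like this against the same Delta table;
    # Delta's optimistic concurrency control reconciles the commits.
    (df.write
       .format("delta")
       .mode("append")
       .saveAsTable("mydb.events"))   # hypothetical table name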

2 More Replies
as999
by New Contributor III
  • 1615 Views
  • 3 replies
  • 1 kudos

python dataframe or hiveSql update based on predecessor value?

I have a million rows that I need to update; the update looks for the highest count of the predecessor from the same source data and replaces the same value on a different row. For example, the original DF: sno Object Name shape rating; 1 Fruit apple round ...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

Basically you have to create a dataframe (or use a window function, which will also work) that gives you the group combination with the most occurrences. So a window/groupBy on object, name, shape with a count(). Then you have to determine which shape...
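
A hedged sketch of that approach (the column names object, name, and shape follow the reply; df is the original dataframe):

    from pyspark.sql import functions as F
    from pyspark.sql.window import Window

    # Count each (object, name, shape) combination, keep the most frequent shape
    # per (object, name) group, then overwrite shape on every row of that group.
    counts = df.groupBy("object", "name", "shape").agg(F.count("*").alias("cnt"))
    w = Window.partitionBy("object", "name").orderBy(F.desc("cnt"))
    top_shape = (counts.withColumn("rn", F.row_number().over(w))
                       .filter("rn = 1")
                       .select("object", "name", F.col("shape").alias("top_shape")))

    result = (df.join(top_shape, ["object", "name"], "left")
                .withColumn("shape", F.coalesce("top_shape", "shape"))
                .drop("top_shape"))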

2 More Replies
Anonymous
by Not applicable
  • 2803 Views
  • 1 reply
  • 2 kudos

Are there any costs or quotas associated with the Databricks managed Hive metastore?

When using the default Hive metastore that is managed within the Databricks control plane, are there any associated costs? I.e., if I switched to an external metastore, would I expect to see any reduction in my Databricks cost (ignoring total costs). Do ...

Latest Reply
Ryan_Chynoweth
Esteemed Contributor
  • 2 kudos

There are no costs associated with using the Databricks managed Hive metastore directly. Databricks pricing is based on compute consumption, not on data storage or access. The only real cost would be the compute used to access the data. I would not expe...

ZeykUtra
by New Contributor
  • 843 Views
  • 0 replies
  • 0 kudos

java.io.IOException: While processing file s3://test/abc/request_dt=2021-07-28/someParquetFile. [XYZ] BINARY is not in the store

Hi Team, I am facing an issue: "java.io.IOException: While processing file s3://test/abc/request_dt=2021-07-28/someParquetFile. [XYZ] BINARY is not in the store". The things I did before getting the above exception: 1. Alter table tableName1 add colum...
