Data Engineering

Forum Posts

as999
by New Contributor III
  • 8084 Views
  • 8 replies
  • 6 kudos

Databricks Hive metastore location?

In Databricks, where is the Hive metastore located: in the control plane or the data plane? For prod systems, what security precautions should be taken to secure the Hive metastore?

Latest Reply
Prabakar
Esteemed Contributor III
  • 6 kudos

@as999 The default metastore is managed by Databricks. If you are concerned about security and would like to have your own metastore, you can go for an external metastore setup. You have the detailed steps in the doc below for setting up the external...
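For reference, a hedged sketch of the kind of Spark configuration an external Hive metastore setup involves. The JDBC host, database name, and credentials below are placeholders, and on Databricks these keys normally go into the cluster's Spark config (or an init script) rather than notebook code; confirm the exact keys against the external-metastore doc.

```python
# Hedged sketch: typical Spark conf keys for connecting to an external Hive metastore.
# Host, database, user, and password are placeholders -- supply your own values,
# ideally via secrets rather than plain text. The metastore version/jars keys are
# also required; see the Databricks external-metastore documentation.
external_metastore_conf = {
    # JDBC connection to the external metastore database (placeholder values)
    "spark.hadoop.javax.jdo.option.ConnectionURL": "jdbc:mysql://<metastore-host>:3306/<metastore-db>",
    "spark.hadoop.javax.jdo.option.ConnectionDriverName": "org.mariadb.jdbc.Driver",
    "spark.hadoop.javax.jdo.option.ConnectionUserName": "<user>",
    "spark.hadoop.javax.jdo.option.ConnectionPassword": "<password>",
}

# Printed in "key value" form so it can be pasted into the cluster's Spark config field.
for key, value in external_metastore_conf.items():
    print(key, value)
```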

7 More Replies
lnsnarayanan
by New Contributor II
  • 5829 Views
  • 7 replies
  • 9 kudos

Resolved! I cannot see the Hive databases or tables once I terminate the cluster and use another cluster.

I am using Databricks Community Edition for learning purposes. I created some Hive-managed tables through Spark SQL as well as with df.saveAsTable options. But when I connect to a new cluster, "Show databases" only returns the default database...

Latest Reply
dez
New Contributor II
  • 9 kudos

This "feature" in the Community Edition is based on the fact that I cannot restart a cluster. So, in the morning I create a cluster for study purposes and in the afternoon I have to recreate the cluster. If there are any dependent objects from previou...
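If the underlying data files are still present in DBFS after the cluster is recreated, one workaround is to re-register the database and table against their existing storage paths. A hedged sketch; the database name, table name, and warehouse path are hypothetical examples:

```python
# Hedged sketch: re-attach a table to data that already exists in DBFS after
# spinning up a fresh cluster. Names and paths below are hypothetical.
spark.sql("CREATE DATABASE IF NOT EXISTS mydb")

spark.sql("""
    CREATE TABLE IF NOT EXISTS mydb.my_table
    USING PARQUET
    LOCATION 'dbfs:/user/hive/warehouse/mydb.db/my_table'
""")

# Verify the database shows up again and the data is readable.
spark.sql("SHOW DATABASES").show()
spark.sql("SELECT COUNT(*) FROM mydb.my_table").show()
```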

6 More Replies
irispan
by New Contributor II
  • 2137 Views
  • 4 replies
  • 1 kudos

Recommended Hive metastore pattern for Trino integration

Hi, I have several questions regarding Trino integration: Is it recommended to use an external Hive metastore or leverage the Databricks-maintained Hive metastore when it comes to enabling external query engines such as Trino? When I tried to use ex...

Latest Reply
JunlinZeng
New Contributor II
  • 1 kudos

> Is it recommended to use an external Hive metastore or leverage the Databricks-maintained Hive metastore when it comes to enabling external query engines such as Trino?

The Databricks-maintained Hive metastore is not suggested to be used externally. ...

3 More Replies
rami1
by New Contributor II
  • 4795 Views
  • 2 replies
  • 3 kudos

METASTORE_DOWN: Cannot connect to metastore

I am trying to view databases and tables, default as well as user-created, but it looks like the created cluster is not able to connect. I am using the Databricks default Hive metastore. Viewing the cluster logs provides the following event: METASTORE_DOWN Metastore is...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

@rami: If the metastore is down, it means that the Databricks cluster is not able to connect to the metastore. Here are a few things you can try to resolve the issue: check if the Hive metastore is up and running. You can try to connect to the metast...
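As a quick first check from a notebook, a hedged sketch of confirming whether the cluster can reach the metastore at all:

```python
# Hedged sketch: a simple notebook-side check of metastore connectivity.
# If the metastore is down, even a basic catalog call like this will fail.
try:
    spark.sql("SHOW DATABASES").show()
    print("Metastore reachable")
except Exception as exc:  # surfaces the underlying connection / Hive error
    print("Metastore appears unreachable:", exc)
```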

1 More Replies
Phani1
by Valued Contributor
  • 2253 Views
  • 1 reply
  • 0 kudos

Best practices/steps for Hive metastore backup and restore.

Hi Team, could you share with us the best practices/steps for Hive metastore backup and restore? Regards, Phanindra

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Janga Reddy: Certainly! Here are the steps for Hive metastore backup and restore on Databricks. Backup: stop all running Hive services and jobs on the Databricks cluster. Create a backup directory in DBFS (Databricks File System) where the metadata fi...
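Since the reply is truncated, here is only a hedged illustration of one piece of such a backup: exporting the CREATE TABLE statements of every table to a backup directory in DBFS. The backup path is hypothetical, and this captures table definitions only, not the underlying data.

```python
# Hedged sketch: dump CREATE TABLE statements for every table into a DBFS backup file.
# The path is a placeholder; definitions only, not data.
backup_path = "dbfs:/tmp/metastore_backup/ddl_backup.sql"

statements = []
for db_row in spark.sql("SHOW DATABASES").collect():
    db = db_row[0]
    for tbl_row in spark.sql(f"SHOW TABLES IN {db}").collect():
        # SHOW TABLES returns (database, tableName, isTemporary); skip temp views
        if tbl_row[2]:
            continue
        ddl = spark.sql(f"SHOW CREATE TABLE {db}.{tbl_row[1]}").first()[0]
        statements.append(ddl + ";")

dbutils.fs.put(backup_path, "\n\n".join(statements), True)
print(f"Backed up {len(statements)} table definitions to {backup_path}")
```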

Anonymous
by Not applicable
  • 1957 Views
  • 1 reply
  • 0 kudos

I am getting an exception "RuntimeException: Caught Hive MetaException attempting to get partition metadata by filter from Hive."

I have a Parquet DataFrame df. I first add a column using df.withColumn("version", lit(currentTimestamp)) and append it to a table db.tbl with format parquet, partitioned by the "version" column. I then ran MSCK REPAIR TABLE db.tbl. I have then create...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@vikashk84: The exception "RuntimeException: Caught Hive MetaException attempting to get partition metadata by filter from Hive" typically occurs when there is an issue with Hive metadata related to partitioning in Databricks. Here are a few steps you ...
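Because the reply is cut off, here is only a small hedged sketch of the kind of checks it points toward, reusing the table name from the question; the partition value is a placeholder.

```python
# Hedged sketch: re-sync and inspect partition metadata for the table from the question.
# Useful for confirming whether the metastore actually knows about the "version" partitions.
spark.sql("MSCK REPAIR TABLE db.tbl")

spark.sql("SHOW PARTITIONS db.tbl").show(truncate=False)

# Optionally narrow to one partition value to see whether filtering by partition works at all.
spark.sql("SELECT COUNT(*) FROM db.tbl WHERE version = '<some-version>'").show()
```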

RafaelGomez61
by New Contributor
  • 2043 Views
  • 2 replies
  • 0 kudos

Can't access delta tables under SQL Warehouse cluster. Getting Error while using path .../_delta_log/000000000.checkpoint

In our Databricks workspace, we have several Delta tables available in the hive_metastore catalog. We are able to access and query the data via Data Science & Engineering persona clusters with no issues. The cluster has credential passthrough en...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Rafael Gomez, hope everything is going great. Just wanted to check in to see if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so ...

1 More Replies
yzaehringer
by New Contributor
  • 1050 Views
  • 1 reply
  • 0 kudos

GET_COLUMNS fails with Unexpected character ('t' (code 116)): was expecting comma to separate Object entries - how to fix?

I just run `cursor.columns()` via the Python client and get back an `org.apache.hive.service.cli.HiveSQLException` as the response. There is also a long stack trace; I'll just paste the last bit because it might be illuminating: org.apache.spark.sql....

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 0 kudos

This can be a package issue or a runtime issue; try changing both.
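Assuming the Databricks SQL Connector for Python is the client in question, a hedged sketch of pinning the client package and narrowing the metadata call, which can help isolate whether the failure is on the client or the runtime side. Hostname, HTTP path, token, and table name are placeholders.

```python
# Hedged sketch, assuming the databricks-sql-connector client is in use.
# pip install --upgrade databricks-sql-connector   # try a different client version
from databricks import sql

# Placeholders -- fill in from your SQL warehouse / cluster connection details.
with sql.connect(
    server_hostname="<workspace-host>",
    http_path="<http-path>",
    access_token="<token>",
) as connection:
    with connection.cursor() as cursor:
        # Narrowing the metadata request to one schema/table can help pinpoint
        # which object's metadata triggers the parsing error.
        cursor.columns(schema_name="default", table_name="<table>")
        for row in cursor.fetchall():
            print(row)
```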

pantelis_mare
by Contributor III
  • 1951 Views
  • 6 replies
  • 1 kudos

Too long hive type string

Hello community! I have a table with a column that is an array of a struct with a very, very long schema. When the table is written, all works well. However, when I create a view based on this table and try to access the view, I get the error: org.apa...

Latest Reply
Afzal
New Contributor II
  • 1 kudos

@Pantelis Maroudis, were you able to solve this issue? Please advise if you got any tips. Thanks in advance.

5 More Replies
jeffreym9
by New Contributor III
  • 2111 Views
  • 5 replies
  • 0 kudos

Resolved! Hive version after Upgrade Azure Databricks from 6.4 (Spark 2) to 9.1 (Spark 3)

I have upgraded Azure Databricks from 6.4 to 9.1, which enables me to use Spark 3. As far as I know, the Hive version has to be upgraded to 2.3.7 as well, as discussed in: https://community.databricks.com/s/question/0D53f00001HKHy2CAH/how-to-upgrade-...

Latest Reply
jeffreym9
New Contributor III
  • 0 kudos

I'm asking about Databricks version 9.1. I've followed the URL given (https://docs.microsoft.com/en-us/azure/databricks/data/metastores/external-hive-metastore). Do you mind letting me know where in the table it mentions the supported Hive version fo...
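For what it's worth, a hedged sketch of the two cluster Spark config keys that pin the Hive metastore client version on Spark 3 runtimes; the values shown match the 2.3.7 version discussed in this thread, and they should be confirmed against the linked doc.

```python
# Hedged sketch: the Spark conf keys controlling which Hive metastore client Spark 3 uses.
# On Databricks these go in the cluster's Spark config, not in notebook code.
hive_version_conf = {
    "spark.sql.hive.metastore.version": "2.3.7",  # external metastore schema version
    "spark.sql.hive.metastore.jars": "builtin",   # newer runtimes ship a Hive 2.3.x client
}

for key, value in hive_version_conf.items():
    print(key, value)
```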

4 More Replies
Autel
by New Contributor II
  • 2366 Views
  • 4 replies
  • 1 kudos

Resolved! Concurrent update to the same Hive or Delta Lake table

Hi, I'm interested to know: if multiple executors append to the same Hive table using saveAsTable or insertInto in Spark SQL, will that cause any data corruption? What configuration do I need to enable concurrent writes to the same Hive table? What about the s...
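For context, a hedged sketch of the append pattern the question describes. Delta tables coordinate concurrent writers through optimistic concurrency control, whereas plain Hive/Parquet tables offer no such guarantee; the table name below is a placeholder.

```python
# Hedged sketch of the append pattern from the question; "default.events" is a placeholder table.
# Delta arbitrates concurrent appends via optimistic concurrency control; the same
# pattern against a plain Hive/Parquet table has no such protection.
df = spark.range(100).withColumnRenamed("id", "event_id")

# saveAsTable append (creates the table as Delta on first write)
df.write.format("delta").mode("append").saveAsTable("default.events")

# insertInto append (the table must already exist; columns are matched by position)
df.write.mode("append").insertInto("default.events")
```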

Latest Reply
Kaniz
Community Manager
  • 1 kudos

Hi @Weide Zhang, does @werners' reply answer your question?

3 More Replies
as999
by New Contributor III
  • 865 Views
  • 3 replies
  • 1 kudos

Python DataFrame or Hive SQL update based on predecessor value?

I have a million rows that I need to update, which looks for the highest count of the predecessor from the same source data and replaces the same value on a different row. For example, the original DF:
sno  Object  Name   shape  rating
1    Fruit   apple  round  ...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

Basically you have to create a dataframe (or use a window function; that will also work) which gives you the group combination with the most occurrences. So a window/groupBy on Object, Name, shape with a count(). Then you have to determine which shape...
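A hedged PySpark sketch of the approach described, assuming columns named Object, Name, and shape as in the example above; the sample rows are illustrative only.

```python
from pyspark.sql import Window
from pyspark.sql import functions as F

# Illustrative data; column names follow the example in the question.
df = spark.createDataFrame(
    [(1, "Fruit", "apple", "round", 1), (2, "Fruit", "apple", "oval", 2)],
    ["sno", "Object", "Name", "shape", "rating"],
)

# 1. Count occurrences of each (Object, Name, shape) combination.
counts = df.groupBy("Object", "Name", "shape").agg(F.count("*").alias("cnt"))

# 2. Keep the most frequent shape per (Object, Name).
w = Window.partitionBy("Object", "Name").orderBy(F.desc("cnt"))
top_shape = (
    counts.withColumn("rn", F.row_number().over(w))
          .filter(F.col("rn") == 1)
          .select("Object", "Name", F.col("shape").alias("top_shape"))
)

# 3. Replace each row's shape with the most frequent shape for its (Object, Name) group.
updated = (
    df.join(top_shape, ["Object", "Name"], "left")
      .withColumn("shape", F.coalesce("top_shape", "shape"))
      .drop("top_shape")
)
updated.show()
```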

2 More Replies
Kaniz
by Community Manager
  • 938 Views
  • 1 reply
  • 0 kudos
Latest Reply
Kaniz
Community Manager
  • 0 kudos

The differences are as follows:
  • Pig operates on the client side of a cluster, whereas Hive operates on the server side of a cluster.
  • Pig uses the Pig Latin language, whereas Hive uses the HiveQL language.
  • Pig is a Procedural Data Flow Language, whereas Hive is a ...

  • 0 kudos
Anonymous
by Not applicable
  • 1694 Views
  • 1 reply
  • 2 kudos

Are there any costs or quotas associated with the Databricks managed Hive metastore?

When using the default Hive metastore that is managed within the Databricks control plane, are there any associated costs? I.e., if I switched to an external metastore, would I expect to see any reduction in my Databricks cost (ignoring total costs)? Do ...

Latest Reply
Ryan_Chynoweth
Honored Contributor III
  • 2 kudos

There are no costs associated with using the Databricks-managed Hive metastore directly. Databricks pricing is based on compute consumption, not on data storage or access. The only real cost would be the compute used to access the data. I would not expe...
