Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

GS2312
by New Contributor II
  • 4387 Views
  • 6 replies
  • 5 kudos

KeyProviderException when trying to create external table on Databricks

Hi there, I have been trying to create an external table on Azure Databricks with the statement below: df.write.partitionBy("year", "month", "day").format('org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat').option("path", sourcepath).mod...

Latest Reply
Anonymous
Not applicable

Hi @Gaurishankar Sakhare​ Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best ...

5 More Replies
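For reference, the same kind of external table can also be declared directly in Spark SQL. A minimal sketch with hypothetical table and column names and an ADLS path placeholder (an external table is simply one created with an explicit LOCATION; the partition columns must match the DataFrame writer's partitionBy("year", "month", "day")):

```sql
-- Hypothetical names throughout
CREATE TABLE IF NOT EXISTS my_db.my_external_table (
  id    BIGINT,
  year  INT,
  month INT,
  day   INT
)
USING PARQUET
PARTITIONED BY (year, month, day)
LOCATION 'abfss://<container>@<storage-account>.dfs.core.windows.net/<path>';
```

A KeyProviderException on Azure usually indicates the cluster cannot resolve storage credentials for that location, so the write and the DDL above both presuppose that ADLS authentication is already configured on the cluster.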
Direo
by Contributor
  • 1061 Views
  • 1 reply
  • 0 kudos

Operations applied when running fs.write_table to overwrite an existing feature table in the Hive metastore

Hi, there was a need to query an older snapshot of a table, so I ran: deltaTable = DeltaTable.forPath(spark, 'dbfs:/<path>'); display(deltaTable.history()) and noticed that every fs.write_table run triggers two operations: Write and CREATE OR REPLACE...

Latest Reply
Anonymous
Not applicable

@Direo Direo​: When you use the deltaTable.write() method to write a DataFrame into a Delta table, it triggers the Delta write operation internally. This operation performs two actions: it writes the new data to disk in the Delta format, and it at...

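The history (and older snapshots) being inspected here can also be read with plain Spark SQL. A short sketch, with the dbfs path left as a placeholder and version 1 as an arbitrary example:

```sql
-- Equivalent of DeltaTable.forPath(spark, 'dbfs:/<path>').history()
DESCRIBE HISTORY delta.`dbfs:/<path>`;

-- Query an older snapshot by version (Delta time travel)
SELECT * FROM delta.`dbfs:/<path>` VERSION AS OF 1;
```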
DBX-Beginer
by New Contributor
  • 3659 Views
  • 2 replies
  • 0 kudos

Display the count of records in all tables in the Hive metastore, grouped by one column's value

I have a DB named Test in the Hive metastore of Databricks. This DB contains around 100 tables. Each table has a column called sourcesystem, among many other columns. Now I need to display the count of records in each table grouped by source sy...

Latest Reply
Anonymous
Not applicable

Hi @Krish K​ Hope all is well! Just wanted to check in if you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Thanks!

1 More Replies
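One common approach to the question above is to generate a single UNION ALL query over all tables. A minimal sketch in plain Python, with the table list hard-coded for illustration; on Databricks you would obtain it from spark.catalog.listTables("Test") and run the result with spark.sql(query):

```python
def counts_by_sourcesystem_query(db, tables):
    """Build one UNION ALL query counting rows per `sourcesystem` in each table.

    Assumes every table in `tables` has a `sourcesystem` column.
    """
    parts = [
        f"SELECT '{t}' AS table_name, sourcesystem, COUNT(*) AS record_count "
        f"FROM {db}.{t} GROUP BY sourcesystem"
        for t in tables
    ]
    return "\nUNION ALL\n".join(parts)


# Hypothetical table names; on Databricks:
#   tables = [t.name for t in spark.catalog.listTables("Test")]
query = counts_by_sourcesystem_query("Test", ["orders", "customers"])
print(query)  # then: display(spark.sql(query))
```

Building the SQL as a string keeps the whole result in one DataFrame, rather than issuing 100 separate count queries and stitching them together afterwards.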
asethia
by New Contributor
  • 3800 Views
  • 1 reply
  • 0 kudos

Delta Lake in Apache Spark

Hi, as per the documentation https://docs.delta.io/latest/quick-start.html, we can configure DeltaCatalog using spark.sql.catalog.spark_catalog. Iceberg supports two Catalog implementations (https://iceberg.apache.org/docs/latest/spark-configuration/#...

Latest Reply
Anonymous
Not applicable

@Arun Sethia​: Yes, Delta Lake also supports custom catalogs. Delta Lake uses the Spark Catalog API, which allows for pluggable catalog implementations. You can implement your own custom catalog to use with Delta Lake. To use a custom catalog, you can...

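For reference, the stock configuration from the Delta quick start looks like the sketch below; a custom catalog would supply its own class name under the same spark.sql.catalog.spark_catalog key (the class names shown are the standard Delta ones):

```
spark.sql.extensions              io.delta.sql.DeltaSparkSessionExtension
spark.sql.catalog.spark_catalog   org.apache.spark.sql.delta.catalog.DeltaCatalog
```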
Erik
by Valued Contributor II
  • 2048 Views
  • 2 replies
  • 2 kudos

Resolved! Can we have the Power BI connector step into "hive_metastore" automatically?

We are distributing pbids files providing the connection info to Databricks. They contain options passed to the "Databricks.Catalogs" function implementing the connection to Databricks. It is my understanding that Databricks has made this together wi...

Latest Reply
Anonymous
Not applicable

Hi @Erik Parmann​ Does @Hubert Dudek​'s response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? We'd love to hear from you. Thanks!

1 More Replies
prasadvaze
by Valued Contributor II
  • 4965 Views
  • 8 replies
  • 2 kudos

Resolved! SQL endpoint is unable to connect to external Hive metastore (Azure Databricks)

Using Azure Databricks, I have set up a SQL Endpoint with connection details that match the global init script. I am able to browse tables from a regular cluster in the Data Engineering module, but I get the error below when trying a query using the SQL Endpoint...

Latest Reply
prasadvaze
Valued Contributor II

@Prabakar Ammeappin​ @Kaniz Fatma​ Also, I found out that after a Delta table is created in the external metastore (and the table data resides in ADLS), I do not need to provide ADLS connection details in the SQL endpoint settings. I only provided...

7 More Replies
TimK
by New Contributor II
  • 3273 Views
  • 3 replies
  • 1 kudos

Resolved! Cannot Get Databricks SQL to read external Hive Metastore

I have followed the documentation and am using the same metastore config that works in the Data Engineering context. When attempting to view the databases, I get the error: Encountered an internal error. The following information failed to load: The li...

Latest Reply
TimK
New Contributor II

@Bilal Aslam​  I didn't think to look there before since I hadn't tried to run any queries. I see the failed SHOW DATABASES queries in history and they identify the error: Builtin jars can only be used when hive execution version == hive metastore v...

2 More Replies
prasadvaze
by Valued Contributor II
  • 1109 Views
  • 1 reply
  • 1 kudos

Which table in the external Hive metastore stores the folder path for a Delta table's data?

Which table in the external Hive metastore stores the folder path for a Delta table's data? Is it the SDS table?

Latest Reply
Hubert-Dudek
Esteemed Contributor III

The image here can be helpful: https://analyticsanvil.files.wordpress.com/2016/08/hive_metastore_database_diagram.png

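For anyone querying the metastore database directly, a sketch of the join (table and column names follow the standard Hive metastore schema; the database and table names in the WHERE clause are placeholders). The SDS table's LOCATION column holds the storage path, linked to TBLS via SD_ID:

```sql
SELECT t.TBL_NAME, s.LOCATION
FROM TBLS t
JOIN SDS  s ON s.SD_ID = t.SD_ID
JOIN DBS  d ON d.DB_ID = t.DB_ID
WHERE d.NAME = 'test' AND t.TBL_NAME = 'my_table';
```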
brickster_2018
by Esteemed Contributor
  • 1525 Views
  • 2 replies
  • 0 kudos

Resolved! External metastore version

I am setting up an external metastore to connect to my Databricks cluster. Which Hive metastore version is preferred and recommended? Also, are there any preferences or recommendations on the database instance size/type?

Latest Reply
prasadvaze
Valued Contributor II

@Harikrishnan Kunhumveettil​ We use Databricks Runtime 7.3 LTS and 9.1 LTS, and an external Hive metastore hosted on Azure SQL DB. Using a global init script, I have set spark.sql.hive.metastore.version 2.3.7 and downloaded spark.sql.hive.metastore.jars f...

1 More Replies
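The settings mentioned in the reply are usually combined in the cluster Spark config or a global init script. A sketch with placeholder connection details, assuming an Azure SQL-hosted metastore; the property names follow the standard external-metastore setup:

```
spark.sql.hive.metastore.version 2.3.7
spark.sql.hive.metastore.jars maven
spark.hadoop.javax.jdo.option.ConnectionURL jdbc:sqlserver://<server>.database.windows.net:1433;database=<metastore-db>
spark.hadoop.javax.jdo.option.ConnectionDriverName com.microsoft.sqlserver.jdbc.SQLServerDriver
spark.hadoop.javax.jdo.option.ConnectionUserName <user>
spark.hadoop.javax.jdo.option.ConnectionPassword <password>
```

Setting spark.sql.hive.metastore.jars to maven downloads the matching client jars at cluster start; for production, the Databricks docs suggest downloading them once to a fixed path instead.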
Anonymous
by Not applicable
  • 2381 Views
  • 1 reply
  • 2 kudos

Are there any costs or quotas associated with the Databricks managed Hive metastore?

When using the default Hive metastore that is managed within the Databricks control plane, are there any associated costs? I.e., if I switched to an external metastore, would I expect to see any reduction in my Databricks cost (ignoring total costs)? Do ...

Latest Reply
Ryan_Chynoweth
Esteemed Contributor

There are no costs associated with using the Databricks-managed Hive metastore directly. Databricks pricing is based on compute consumption, not on data storage or access. The only real cost would be the compute used to access the data. I would not expe...
