Data Engineering

Authentication issue in HiveMetastore

Prajit0710
New Contributor II

Problem Statement:
When I execute the code below in a notebook, both manually and in a workflow, it works as expected:

df.write.mode("overwrite") \
    .format('delta') \
    .option('path', ext_path) \
    .saveAsTable("tbl_schema.Table_name")

but when I wrap the same code in a function and execute it, it still writes to the target path but the table does not get recognised by the Hive metastore.
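For illustration, the failing pattern presumably looks something like the sketch below; the function name and wrapper shape are hypothetical, not from the original post:

def write_table(dataframe, ext_path):
    # Same write as above, just wrapped in a function: the Delta files
    # land at ext_path, but the table reportedly does not show up
    # in the Hive metastore.
    dataframe.write.mode("overwrite") \
        .format('delta') \
        .option('path', ext_path) \
        .saveAsTable("tbl_schema.Table_name")

write_table(df, ext_path)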

1 ACCEPTED SOLUTION


lingareddy_Alva
Honored Contributor II

Hi @Prajit0710 

This is an interesting issue: your Delta table write works as expected when run directly, but when executed within a function, the table doesn't get recognized by the Hive metastore. The key difference is likely in how Spark interacts with the metastore when registering tables from within functions versus at the notebook level. Here are a few potential causes and solutions:

1. Spark Session Context: The function might be executing against a different Spark session than your main notebook code (see the diagnostic sketch after this list).
2. Catalog Registration: The table metadata might not be registering properly with the Hive metastore from within the function context.
3. Transaction Boundaries: Wrapping the write in a function might change how the operation is committed and when the metastore picks it up.
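A quick way to test cause 1 is to compare the session the function sees with the notebook's global spark. A minimal diagnostic, assuming a Databricks notebook where spark is the provided global session:

from pyspark.sql import SparkSession

def which_session():
    # getActiveSession() returns the SparkSession bound to the current thread
    active = SparkSession.getActiveSession()
    print("same session as notebook:", active is spark)
    print("current database:", active.catalog.currentDatabase())

which_session()

If this prints False, the function is running against a different session, which would explain the missing metastore registration.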

Try these solutions:

1. Pass the SparkSession explicitly to your function:
def write_delta_table(spark, dataframe, ext_path, table_name):
    dataframe.write.mode("overwrite") \
        .format('delta') \
        .option('path', ext_path) \
        .saveAsTable(table_name)
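Called from the notebook, this would look like the line below, reusing the names from your post:

write_delta_table(spark, df, ext_path, "tbl_schema.Table_name")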

2. Use catalog commands explicitly:
def write_delta_table(dataframe, ext_path, table_name):
    # Write the data first
    dataframe.write.mode("overwrite") \
        .format('delta') \
        .option('path', ext_path) \
        .save()

    # Then explicitly register the location with the metastore.
    # `spark` here is the global SparkSession Databricks provides in notebooks;
    # pass it in as a parameter if this function lives outside the notebook.
    spark.sql(f"CREATE TABLE IF NOT EXISTS {table_name} USING DELTA LOCATION '{ext_path}'")

3. Check for schema conflicts:
Make sure your function isn't creating schema conflicts with an existing table. If a stale definition exists, drop it before creating the table:

spark.sql(f"DROP TABLE IF EXISTS {table_name}")

4. Ensure proper database context:
Explicitly set the database context within your function:
spark.sql("USE schema_name")

LR

