Data Engineering

Authentication issue in HiveMetastore

Prajit0710
New Contributor II

Problem Statement:
When I execute the code below in a notebook, both manually and in a workflow, it works as expected:

df.write.mode("overwrite") \
    .format('delta') \
    .option('path', ext_path) \
    .saveAsTable("tbl_schema.Table_name")

but when I wrap the same code in a function and execute it, it still writes to the target path but the table does not get recognised by the Hive metastore.
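For illustration, the failing pattern presumably looks something like the sketch below; the function name and wrapper shape are hypothetical, not from the original post:

def write_table(dataframe, ext_path):
    # Same write as above, just wrapped in a function: the Delta files
    # land at ext_path, but the table reportedly does not show up
    # in the Hive metastore.
    dataframe.write.mode("overwrite") \
        .format('delta') \
        .option('path', ext_path) \
        .saveAsTable("tbl_schema.Table_name")

write_table(df, ext_path)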

1 ACCEPTED SOLUTION


lingareddy_Alva
Honored Contributor II

Hi @Prajit0710 

This is an interesting issue: your Delta table write works as expected when run directly, but when executed within a function, the table doesn't get recognized by the Hive metastore. The key difference is likely in how Spark interacts with the metastore when registering tables from within functions versus at the notebook level. Here are a few potential causes and solutions:

1. Spark Session Context: The function might be executing against a different Spark session than your main notebook code (see the diagnostic sketch after this list).
2. Catalog Registration: The table metadata might not be registering properly with the Hive metastore from within the function context.
3. Transaction Boundaries: Wrapping the write in a function might change how the operation is committed and when the metastore picks it up.
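A quick way to test cause 1 is to compare the session the function sees with the notebook's global spark. A minimal diagnostic, assuming a Databricks notebook where spark is the provided global session:

from pyspark.sql import SparkSession

def which_session():
    # getActiveSession() returns the SparkSession bound to the current thread
    active = SparkSession.getActiveSession()
    print("same session as notebook:", active is spark)
    print("current database:", active.catalog.currentDatabase())

which_session()

If this prints False, the function is running against a different session, which would explain the missing metastore registration.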

Try these solutions:

1. Pass the SparkSession explicitly to your function:
def write_delta_table(spark, dataframe, ext_path, table_name):
    dataframe.write.mode("overwrite") \
        .format('delta') \
        .option('path', ext_path) \
        .saveAsTable(table_name)
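Called from the notebook, this would look like the line below, reusing the names from your post:

write_delta_table(spark, df, ext_path, "tbl_schema.Table_name")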

2. Use catalog commands explicitly:
def write_delta_table(dataframe, ext_path, table_name):
    # Write the data first
    dataframe.write.mode("overwrite") \
        .format('delta') \
        .option('path', ext_path) \
        .save()

    # Then explicitly register the location with the metastore.
    # `spark` here is the global SparkSession Databricks provides in notebooks;
    # pass it in as a parameter if this function lives outside the notebook.
    spark.sql(f"CREATE TABLE IF NOT EXISTS {table_name} USING DELTA LOCATION '{ext_path}'")

3. Check for schema conflicts:
Make sure your function isn't creating schema conflicts with an existing table. If a stale definition exists, drop it before creating the table:

spark.sql(f"DROP TABLE IF EXISTS {table_name}")

4. Ensure proper database context:
Explicitly set the database context within your function:
spark.sql("USE schema_name")

LR

