
Question: Issue with Overwatch Deployment on Databricks (on AWS) - Missing Tables in Gold Schema

jiteshraut20
New Contributor III

Hi all,

I'm setting up Overwatch in our Databricks workspace to monitor resources, and I ran into an issue during the Overwatch deployment. The deployment itself succeeds, but validation of the `Gold_jobRunCostPotentialFact` module fails with the following error:

> "Unsupported component type interface scala.collection.Seq in arrays."

Because of this error, one table is missing from the gold schema: `overwatch_global.jobRunCostPotentialFact`. This table is essential for our reporting needs.

What I've tried so far:
- I attempted using different versions of the Overwatch libraries installed on the compute cluster, but the same error persists.
- I deployed each layer of the Medallion architecture for Overwatch separately. While all deployments were successful, the mentioned error still appeared in the logs, and the table remains missing from the consumer database.

I am using Scala 2.12, Spark 3.5.0, and DBR 15.3 ML.

Has anyone else encountered this issue or have suggestions on how to resolve it? Any help would be greatly appreciated!

Thanks in advance!

Jitesh Raut
1 ACCEPTED SOLUTION

Accepted Solutions

jiteshraut20
New Contributor III

Hi @SriramMohanty, I have solved the issue and explained all the steps in a blog post. The official documentation is outdated and incomplete.

If anybody is facing the same issue, or any other problem with an Overwatch deployment, please refer to my blog post or contact me at contact@jiteshraut.me.

https://medium.com/@deploytoprod/deploying-overwatch-on-databricks-aws-with-system-tables-as-the-dat...

 

Jitesh Raut


8 REPLIES 8

jiteshraut20
New Contributor III

I referred to the following docs and blogs when setting up Overwatch:
Overwatch -Databricks own Observability tool by @SriramMohanty 

Jitesh Raut

SriramMohanty
Databricks Employee

Hi @SriramMohanty ,
I've tried DBR 11.3 LTS as you suggested, but I'm facing issues. It seems that DBR 11.3 doesn't support Delta table features. I'm using Overwatch version 0.8.1.2.

When I ran validation using a config file in Delta format, I got a NoClassDefFoundError. When I used a Delta Table instead, I got a DeltaTableFeatureException, indicating that the version of Databricks I'm using doesn't support the required reader table features, specifically deletionVectors.
Let me know if you'd like any further changes! 
If you could share any resources that might help, I would appreciate it.

Thank you!

Jitesh Raut

SriramMohanty
Databricks Employee

Hi  @jiteshraut20 ,

If you are using a CSV file, please use the code below to convert it to Delta:

 

spark.read
  .option("header", "true")
  .option("ignoreLeadingWhiteSpace", true)
  .option("ignoreTrailingWhiteSpace", true)
  .csv("/path/to/config.csv")
  .coalesce(1)
  .write.format("delta")
  .save("/myPath/overwatch/configs/prod_config")

Please validate the data in "/myPath/overwatch/configs/prod_config", then use the code below to run the pipeline:

import com.databricks.labs.overwatch.MultiWorkspaceDeployment
val configTable = "/myPath/overwatch/configs/prod_config" // Path to the config table
val tempLocation = "/tmp/overwatch/templocation"
MultiWorkspaceDeployment(configTable, tempLocation).deploy(1)

SriramMohanty
Databricks Employee

Hi @jiteshraut20 ,

Can you please explain what is outdated in the official documentation? We update it with every release.

jiteshraut20
New Contributor III

Hi @SriramMohanty ,

I encountered a few issues while configuring Overwatch based on the official documentation, and I wanted to highlight some of them for your reference:

  1. Overwatch Configuration Changes: The documentation does not reflect recent changes in the Overwatch config file. For example, the column previously named etl_storage_prefix is now required as storage_prefix, but this update is not mentioned.

  2. Databricks Runtime (DBR) Version Requirements: The documentation does not specify the required DBR version for the latest Overwatch JAR. I found that Overwatch version 0.8.1.2 works without errors only on DBR 13.3 LTS. The document linked in your previous replies does not clearly state this compatibility. Using DBR 11.3, which is mentioned, leads to issues with the latest JAR. Thanks to the Databricks team, I was able to resolve this by using the correct DBR version.

  3. SQL Query History Gold Deployment Error: There is an error in the deployment of sql_query_history_gold (Gold View: Create view failed, column name error_message cannot be resolved), which is resolved in Overwatch version 0.8.1.2. This information is not covered in the documentation.

  4. I had to install some of the missing libraries on the cluster:

    - org.scalaj:scalaj-http_2.12:2.4.2

    - dataframe_rules_engine_2.12:0.2.0

These are some of the points I noticed were missing from the documentation. While the documentation has been a valuable resource, it could be even more effective with proper updates. Please correct me if I'm wrong.
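The column rename in point 1 can be caught before a deployment run. The following is a minimal sketch, assuming the config header is available as a plain list of column names. The object and method names are hypothetical and not part of Overwatch, and the check covers only the `etl_storage_prefix` to `storage_prefix` rename described above, not any other column.

```scala
// Hypothetical helper, not part of Overwatch: flags the column rename
// reported in this thread (etl_storage_prefix -> storage_prefix).
object ConfigHeaderCheck {
  val LegacyColumn = "etl_storage_prefix"
  val CurrentColumn = "storage_prefix"

  // Returns Right(()) when the current column name is present,
  // otherwise Left with a message describing what was found.
  def check(columns: Seq[String]): Either[String, Unit] = {
    val names = columns.map(_.trim).toSet
    if (names.contains(CurrentColumn)) Right(())
    else if (names.contains(LegacyColumn))
      Left(s"Found '$LegacyColumn'; rename it to '$CurrentColumn'.")
    else Left(s"Column '$CurrentColumn' is missing from the config.")
  }
}
```

In a notebook, this could be called on the column names of the DataFrame read from the config CSV before converting it to Delta.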

The contacts you provided were extremely helpful, and thanks to you and them, I was able to figure things out.

Jitesh Raut

SriramMohanty
Databricks Employee

Hi @jiteshraut20 ,

1) storage_prefix: This has been updated in the documentation; please refer to the config page.

2) If system tables are in use, the recommended Databricks Runtime version is 13.3 LTS. In other cases, 11.3 LTS should work seamlessly. Please see the documentation on system table requirements.

3) For fresh deployments on version 0.8.1.2, you should not encounter any exceptions. This is why we always recommend using the latest version of Overwatch.

4) The following two JARs are not required and are therefore not mentioned in the documentation: org.scalaj:scalaj-http_2.12:2.4.2 and dataframe_rules_engine_2.12:0.2.0.
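The runtime guidance in point 2 comes down to a single decision. As a rough sketch, the helper below encodes only the recommendation quoted in this reply; the object and function names are hypothetical, and it is not a full compatibility matrix for Overwatch releases.

```scala
// Hypothetical helper that encodes only the guidance quoted in this thread.
object OverwatchRuntime {
  // System tables as the data source -> DBR 13.3 LTS; otherwise 11.3 LTS.
  def recommendedDbr(usesSystemTables: Boolean): String =
    if (usesSystemTables) "13.3 LTS" else "11.3 LTS"
}
```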

 

Thanks,

Sriram

 
