
GC Driver Error

aschiff
Contributor II

I am using a Databricks cluster to connect to a Tableau workbook through the JDBC connector. The workbook fails to load because resources are not available through the data connection. In the cluster's driver log I see Full GC (Ergonomics) errors and Full GC Allocation errors. How do I resolve this? I've tried increasing the storage of my driver and workers in the cluster configuration, but that didn't fix it.
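To see how severe the Full GC activity is, it can help to count the events by cause and check how full the heap stays after each collection. The following is a minimal sketch only. It assumes the driver log uses the JDK 8 parallel-collector line format, and the sample lines and the name `summarize_full_gc` are made up for illustration, not taken from this cluster.

```python
import re
from collections import Counter

# Matches the cause in e.g. "[Full GC (Ergonomics) ..."
FULL_GC = re.compile(r"\[Full GC \(([^)]+)\)")
# Whole-heap figures "before->after(capacity)"; per-generation figures are
# skipped because they follow a "Name: " label.
HEAP = re.compile(r"\b(?<!: )(\d+)K->(\d+)K\((\d+)K\)")


def summarize_full_gc(lines):
    """Return (count per cause, highest heap occupancy left after a Full GC)."""
    causes = Counter()
    worst_after = 0.0
    for line in lines:
        match = FULL_GC.search(line)
        if not match:
            continue  # young-generation GC or unrelated log line
        causes[match.group(1)] += 1
        heap = HEAP.search(line)
        if heap:
            _before, after, capacity = map(int, heap.groups())
            worst_after = max(worst_after, after / capacity)
    return causes, worst_after


# Illustrative lines only (assumed format, invented numbers).
SAMPLE = [
    "[Full GC (Ergonomics) [PSYoungGen: 1024K->0K(2048K)] "
    "[ParOldGen: 4096K->4000K(4096K)] 5120K->4000K(6144K), "
    "[Metaspace: 100K->100K(1056768K)], 1.25 secs]",
    "[GC (Allocation Failure) [PSYoungGen: 2048K->512K(2048K)] "
    "6048K->4512K(6144K), 0.01 secs]",
    "[Full GC (Allocation Failure) [PSYoungGen: 512K->0K(2048K)] "
    "[ParOldGen: 4000K->3072K(4096K)] 4512K->3072K(6144K), "
    "[Metaspace: 100K->100K(1056768K)], 0.90 secs]",
]

causes, worst = summarize_full_gc(SAMPLE)
print(dict(causes), round(worst, 2))
```

If the heap stays nearly full after repeated Full GCs, the driver is likely short on memory rather than disk, which may explain why adding storage made no difference.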

33 REPLIES

I am officially lost. After attempting the above strategy, I went offline for about an hour and came back to find that the Tableau workbook had loaded successfully and that the beast of a CASE query is in the SQL tab of the Spark UI. Furthermore, there are queries against tables I don't recall touching. They involve tables I never looked at or queried in Databricks or Tableau.

aschiff
Contributor II

I recreated the problematic workbook, connecting to the same cluster and using the same data, and all three of its sheets/charts loaded properly. I then went to Databricks to look at the SQL tab of the Spark UI to find the queries, but nothing loaded there, even after waiting. So I restarted my cluster and refreshed my workbook (big mistake), and it struggled to load again. I restarted the cluster once more and turned on Photon acceleration.

Here are the queries for each sheet:

Sheet 1 that works fine:

SELECT `salesforce_export_1_explorium_15sept2022`.`Contact_ID_18_digit` AS `contact_id_18_digit`,
  `salesforce_export_1_explorium_15sept2022`.`Emails` AS `emails`,
  `salesforce_export_1_explorium_15sept2022`.`Professional_email` AS `professional_email`,
  `salesforce_export_1_explorium_15sept2022_professional_email_val`.`Status` AS `status`
FROM `default`.`salesforce_export_1_explorium_15sept2022` `salesforce_export_1_explorium_15sept2022`
JOIN `default`.`salesforce_export_1_explorium_15sept2022_professional_email_validation` `salesforce_export_1_explorium_15sept2022_professional_email_val` ON (`salesforce_export_1_explorium_15sept2022`.`Professional_email` = `salesforce_export_1_explorium_15sept2022_professional_email_val`.`Email`)
WHERE (CASE WHEN ((`salesforce_export_1_explorium_15sept2022_professional_email_val`.`Status` IN ('valid')) OR (`salesforce_export_1_explorium_15sept2022_professional_email_val`.`Status` IS NULL)) THEN false ELSE true END)
GROUP BY 1, 2, 3, 4
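The WHERE clause in the sheet 1 query keeps only rows whose Status is neither 'valid' nor NULL, so it should be equivalent to the plainer filter `Status IS NOT NULL AND Status <> 'valid'`. Here is a small sketch that checks this on toy data. It uses Python's built-in SQLite rather than Spark SQL, writes 0/1 in place of false/true, and the table name `validation` and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE validation (Email TEXT, Status TEXT)")
# Hypothetical rows covering each case: valid, non-valid, and NULL status.
conn.executemany(
    "INSERT INTO validation VALUES (?, ?)",
    [
        ("a@example.com", "valid"),
        ("b@example.com", "invalid"),
        ("c@example.com", None),
        ("d@example.com", "unknown"),
    ],
)

# Filter in the shape of the generated query (0/1 stand in for false/true).
generated = (
    "SELECT Email FROM validation "
    "WHERE (CASE WHEN ((Status IN ('valid')) OR (Status IS NULL)) "
    "THEN 0 ELSE 1 END) ORDER BY Email"
)
# Hand-simplified filter that should select the same rows.
simplified = (
    "SELECT Email FROM validation "
    "WHERE Status IS NOT NULL AND Status <> 'valid' ORDER BY Email"
)

generated_rows = conn.execute(generated).fetchall()
simplified_rows = conn.execute(simplified).fetchall()
print(generated_rows == simplified_rows, generated_rows)
```

Both filters return the 'invalid' and 'unknown' rows, so the CASE wrapper here affects readability more than results.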

Sheet 2 that works fine but has a really messy query:

SELECT

(CASE WHEN ((CASE

WHEN (((CASE

WHEN ((CASE

WHEN (0 IS NULL) THEN NULL

WHEN 0 < 1 THEN INSTR( `salesforce_export_1_explorium_15sept2022`.`Emails`, '}' )

WHEN 0 = INSTR( SUBSTRING(`salesforce_export_1_explorium_15sept2022`.`Emails`,CAST(0 AS INT),CAST(LENGTH(`salesforce_export_1_explorium_15sept2022`.`Emails`) - (0) + 1 AS INT)), '}' ) THEN 0

ELSE INSTR( SUBSTRING(`salesforce_export_1_explorium_15sept2022`.`Emails`,CAST(0 AS INT),CAST(LENGTH(`salesforce_export_1_explorium_15sept2022`.`Emails`) - (0) + 1 AS INT)), '}' ) + 0 - 1

END) IS NULL) THEN NULL

The above pattern of WHEN clauses and INSTR calls repeats many times, making the query too long to copy and paste here.
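The innermost CASE in the fragment above looks like a generated "find with a start position" expression whose start position is the constant 0. That reading is an assumption, not something confirmed in the thread. Under it, the branch `WHEN 0 < 1` always wins and the block reduces to a plain `INSTR(Emails, '}')`. The sketch below mirrors the CASE logic in Python; `instr` and `generated_find` are hypothetical helper names.

```python
def instr(text, sub):
    """1-based position of sub in text, or 0 if absent (like SQL INSTR)."""
    return text.find(sub) + 1


def generated_find(text, sub, start):
    """Mirror of the generated CASE, with `start` in place of the literal 0."""
    if start is None or text is None:
        return None  # NULL in, NULL out
    if start < 1:
        return instr(text, sub)  # the branch taken when start is 0
    # SUBSTRING(text, start, LENGTH(text) - start + 1)
    tail = text[start - 1:]
    pos = instr(tail, sub)
    if pos == 0:
        return 0  # not found in the tail
    return pos + start - 1  # convert back to a position in the full string


emails = '{"a":1}'
print(generated_find(emails, "}", 0), instr(emails, "}"))
```

With `start` fixed at 0, `generated_find` and `instr` agree on every input, which suggests the deep nesting adds query size but no extra logic.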

I am unable to get the query for the third, troublesome sheet. I think I may have seen it in the SQL tab of the Spark UI when I originally recreated the workbook, before the "big mistake" of restarting the cluster, but I can't find it now. So, as we discussed, I created a post on the Tableau community about finding the SQL query for a sheet in a workbook: https://community.tableau.com/s/question/0D58b0000ACAwyOCQT/how-to-extract-sql-query-from-a-specific...

Attached is the SQL query data, with the blue rectangles expanded, for the query in sheet 2.

Logically, the troublesome query is probably similar to portions of the sheet 1 query from my previous message. My guesstimate is as follows:

SELECT secondEmailAddress FROM `default`.`salesforce_export_1_explorium_15sept2022` `salesforce_export_1_explorium_15sept2022`

JOIN `default`.`salesforce_export_1_explorium_15sept2022_professional_email_validation` `salesforce_export_1_explorium_15sept2022_professional_email_val` ON (`salesforce_export_1_explorium_15sept2022`.`Professional_email` = `salesforce_export_1_explorium_15sept2022_professional_email_val`.`Email`)

WHERE (CASE WHEN ((`salesforce_export_1_explorium_15sept2022_professional_email_val`.`Status` IN ('valid')) OR (`salesforce_export_1_explorium_15sept2022_professional_email_val`.`Status` IS NULL)) THEN false ELSE true END) AND secondEmailAddress IS NOT null

Only 447 records (email addresses) were returned. That is all I can tell you about the logic of the query; I still don't have the specific query itself.

galang123
New Contributor II

yesasd
