GC Driver Error

aschiff
Contributor II

I am using a cluster in databricks to connect to a Tableau workbook through the JDBC connector. My Tableau workbook has been unable to load due to resources not being available through the data connection. I went to look at the driver log for my cluster and I see Full GC (Ergonomics) errors and Full GC Allocation errors. How do I resolve this? I've tried increasing the storage of my driver and worker by changing them in my cluster but that didn't fix it.
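(Editor's note: Full GC (Ergonomics) and Full GC (Allocation Failure) messages in the driver log indicate the driver's JVM heap is exhausted, not its disk, so adding storage will not help; a driver node type with more memory, or collecting fewer results back to the driver, usually does. As a rough, hypothetical sketch of Spark settings sometimes tuned in a cluster's Spark config for this symptom, with values that depend entirely on your node sizes and workload:)

```
# Hypothetical cluster Spark config values - tune to your own node sizes
spark.driver.maxResultSize 8g      # cap on results collected back to the driver
spark.sql.shuffle.partitions 200   # smaller partitions can reduce per-task memory pressure
```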

32 REPLIES

Dooley
Valued Contributor

Thanks for the files.

(1) In the "Databricks Shell - Details for Query" file, expand the blocks in the graph and check whether any of them report data spill.

(2) Can you please share the query that ran into this problem? How much data are you running it on? Is that data partitioned by you in a custom way?

Side note: I did see that in one of the stages you are doing a count distinct, and I wanted to draw your attention to the approx_count_distinct SQL function.
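(Editor's note: for reference, that swap might look like the sketch below, using the table names that appear later in this thread; `approx_count_distinct` is HyperLogLog-based and takes an optional maximum relative error, defaulting to about 5%.)

```sql
-- Exact distinct count: requires shuffling every distinct Id
SELECT COUNT(DISTINCT Id) FROM salesforce_raw.delta_table_contact;

-- Approximate distinct count: far cheaper, default relative error ~5%
SELECT approx_count_distinct(Id) FROM salesforce_raw.delta_table_contact;

-- Tighter error bound (1%) at the cost of more memory
SELECT approx_count_distinct(Id, 0.01) FROM salesforce_raw.delta_table_contact;
```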

I tried to get these answers for you, but all the Spark UI tabs are coming up blank/default again. I've tried restarting my cluster to solve the issue like before, but no dice. Is it possible for me to give you permission to look at my cluster?

Dooley
Valued Contributor

Usually it is blank because the page is still loading, so give it a moment to come up. Also, the cluster must be on and active, and you need to run the job while it is up to see it in the Spark UI.

Dooley
Valued Contributor

Also, can you run the command in Tableau that caused the issue and then look at the cluster to identify the job number associated with it? I can see the SQL queries in your Stages tab, but they are cut off, so I would like to see the full queries that produce this type of error.

I thought the query was select * from default.salesforce_export_1_explorium_15sept2022, but it is a lot more. Attached is the 878-page Word document it took to copy and paste the query. The table I am selecting from is only 700 KB. The job IDs associated with this ridiculous query are 293, 294, and 295.

And here is a picture of the query details with the boxes expanded to show if a data spill happened.

Dooley
Valued Contributor

Did this JDBC from Databricks to Tableau Online work with a simple query?

Example of simple:

SELECT 'Hello World';

I am not sure how to test queries in a Tableau Online to Databricks connection, but I was able to run a different Tableau workbook attached to the same cluster, and the query below appeared in the SQL tab of the Spark UI.

SELECT (CASE WHEN (`delta_table_campaign`.`Name` IN ('CRE_MAT_2023', 'CRE_MAT_2023_Email_Had_Prior_Communication', 'CRE_MAT_2023_Email_NO_Prior_Communication')) THEN 'CRE_MAT_2023 campaign group' ELSE `delta_table_campaign`.`Name` END) AS `name__group_`,
COUNT(DISTINCT `delta_table_contact`.`Id`) AS `usr_calculation_496170019756503042_ok`
FROM `salesforce_raw`.`delta_table_contact` `delta_table_contact`
JOIN `salesforce_raw`.`delta_table_campaignmember` `delta_table_campaignmember` ON (`delta_table_contact`.`Id` = `delta_table_campaignmember`.`ContactId`)
JOIN `salesforce_raw`.`delta_table_campaign` `delta_table_campaign` ON (`delta_table_campaignmember`.`CampaignId` = `delta_table_campaign`.`Id`)
WHERE ((NOT (`delta_table_contact`.`LaunchLeads_Status__c` IS NULL)) AND ((CASE WHEN (`delta_table_campaign`.`Name` IN ('CRE_MAT_2023', 'CRE_MAT_2023_Email_Had_Prior_Communication', 'CRE_MAT_2023_Email_NO_Prior_Communication')) THEN 'CRE_MAT_2023 campaign group' ELSE `delta_table_campaign`.`Name` END) IN ('CRE_MAT_Q1_2023', 'CRE_MAT_Q1_2023_PART_1_MF', 'CRE_MAT_Q1_2023_PART_2_TEXT', 'CRE_MAT_Q3_2024_OTHER', 'CRE_MAT_Q3_2024_OTHER_PART_1', 'CRE_MAT_Q3_2024_OTHER_PART_2')))
GROUP BY 1

Dooley
Valued Contributor

I know you can't figure out exactly what the SQL query is until Tableau gives you a way to download it from Tableau Online, but for the query you just shared that you think is the problem query: did you test it in Databricks to see if it runs without an error?

My thought is: if it runs on Databricks directly but not through Tableau, then it is the Tableau support team that needs to help you with this problem. If it doesn't work on our side, then we can look at the Spark UI for that specific run to try to identify the problem. I saw what you sent for the other queries, and it seems Photon is helping with those joins, so when you run it on our side, please make sure to use the Photon-enabled cluster you have been using so we can troubleshoot this further.

I will try what you suggested and see what happens. We are currently working on other projects while all this Databricks/Tableau stuff is happening, and we have figured out how to get the information from this query without running it in Tableau. Obviously I want to avoid this problem in the future, so I would like to solve it regardless. I tried recreating the problematic sheet again and got stuck loading once I applied a filter. The query below was the last to run in the Databricks Spark UI; it shows me all the "statuses" before I filtered on them, which caused the issue.

SELECT `salesforce_export_1_explorium_15sept2022_professional_email_val`.`Status` AS `status`
FROM `default`.`salesforce_export_1_explorium_15sept2022_professional_email_validation` `salesforce_export_1_explorium_15sept2022_professional_email_val`
GROUP BY 1
ORDER BY `status` ASC

Below is a picture of a failed query; this is the really long CASE WHEN / INSTR query I talked about earlier. I wonder if you can click in to see the full query.

If you can't see the whole thing, here is some of what I was able to copy and paste (and the query keeps going on like this for what seems like forever):

SELECT
(CASE WHEN ((CASE
WHEN (((CASE
WHEN ((CASE
WHEN (0 IS NULL) THEN NULL
WHEN 0 < 1 THEN INSTR( `salesforce_export_1_explorium_15sept2022`.`Emails`, '}' )
WHEN 0 = INSTR( SUBSTRING(`salesforce_export_1_explorium_15sept2022`.`Emails`,CAST(0 AS INT),CAST(LENGTH(`salesforce_export_1_explorium_15sept2022`.`Emails`) - (0) + 1 AS INT)), '}' ) THEN 0
ELSE INSTR( SUBSTRING(`salesforce_export_1_explorium_15sept2022`.`Emails`,CAST(0 AS INT),CAST(LENGTH(`salesforce_export_1_explorium_15sept2022`.`Emails`) - (0) + 1 AS INT)), '}' ) + 0 - 1
END) IS NULL) THEN NULL
WHEN (CASE
WHEN (0 IS NULL) THEN NULL
WHEN 0 < 1 THEN INSTR( `salesforce_export_1_explorium_15sept2022`.`Emails`, '}' )
WHEN 0 = INSTR( SUBSTRING(`salesforce_export_1_explorium_15sept2022`.`Emails`,CAST(0 AS INT),CAST(LENGTH(`salesforce_export_1_explorium_15sept2022`.`Emails`) - (0) + 1 AS INT)), '}' ) THEN 0
ELSE INSTR( SUBSTRING(`salesforce_export_1_explorium_15sept2022`.`Emails`,CAST(0 AS INT),CAST(LENGTH(`salesforce_export_1_explorium_15sept2022`.`Emails`) - (0) + 1 AS INT)), '}' ) + 0 - 1
END) < 1 THEN INSTR( `salesforce_export_1_explorium_15sept2022`.`Emails`, 'address' )
WHEN 0 = INSTR( SUBSTRING(`salesforce_export_1_explorium_15sept2022`.`Emails`,CAST((CASE
WHEN (0 IS NULL) THEN NULL
WHEN 0 < 1 THEN INSTR( `salesforce_export_1_explorium_15sept2022`.`Emails`, '}' )
WHEN 0 = INSTR( SUBSTRING(`salesforce_export_1_explorium_15sept2022`.`Emails`,CAST(0 AS INT),CAST(LENGTH(`salesforce_export_1_explorium_15sept2022`.`Emails`) - (0) + 1 AS INT)), '}' ) THEN 0
ELSE INSTR( SUBSTRING(`salesforce_export_1_explorium_15sept2022`.`Emails`,CAST(0 AS INT),CAST(LENGTH(`salesforce_export_1_explorium_15sept2022`.`Emails`) - (0) + 1 AS INT)), '}' ) + 0 - 1
END) AS INT),CAST(LENGTH(`salesforce_export_1_explorium_15sept2022`.`Emails`) - ((CASE
WHEN (0 IS NULL) THEN NULL
WHEN 0 < 1 THEN INSTR( `salesforce_export_1_explorium_15sept2022`.`Emails`, '}' )
WHEN 0 = INSTR( SUBSTRING(`salesforce_export_1_explorium_15sept2022`.`Emails`,CAST(0 AS INT),CAST(LENGTH(`salesforce_export_1_explorium_15sept2022`.`Emails`) - (0) + 1 AS INT)), '}' ) THEN 0
ELSE INSTR( SUBSTRING(`salesforce_export_1_explorium_15sept2022`.`Emails`,CAST(0 AS INT),CAST(LENGTH(`salesforce_export_1_explorium_15sept2022`.`Emails`) - (0) + 1 AS INT)), '}' ) + 0 - 1
END)) + 1 AS INT)), 'address' ) THEN 0
ELSE INSTR( SUBSTRING(`salesforce_export_1_explorium_15sept2022`.`Emails`,CAST((CASE
WHEN (0 IS NULL) THEN NULL
WHEN 0 < 1 THEN INSTR( `salesforce_export_1_explorium_15sept2022`.`Emails`, '}' )
WHEN 0 = INSTR( SUBSTRING(`salesforce_export_1_explorium_15sept2022`.`Emails`,CAST(0 AS INT),CAST(LENGTH(`salesforce_export_1_explorium_15sept2022`.`Emails`) - (0) + 1 AS INT)), '}' ) THEN 0
ELSE INSTR( SUBSTRING(`salesforce_export_1_explorium_15sept2022`.`Emails`,CAST(0 AS INT),CAST(LENGTH(`salesforce_export_1_explorium_15sept2022`.`Emails`) - (0) + 1 AS INT)), '}' ) + 0 - 1
END) AS INT),CAST(LENGTH(`salesforce_export_1_explorium_15sept2022`.`Emails`) - ((CASE
WHEN (0 IS NULL) THEN NULL
WHEN 0 < 1 THEN INSTR( `salesforce_export_1_explorium_15sept2022`.`Emails`, '}' )
WHEN 0 = INSTR( SUBSTRING(`salesforce_export_1_explorium_15sept2022`.`Emails`,CAST(0 AS INT),CAST(LENGTH(`salesforce_export_1_explorium_15sept2022`.`Emails`) - (0) + 1 AS INT)), '}' ) THEN 0
ELSE INSTR( SUBSTRING(`salesforce_export_1_explorium_15sept2022`.`Emails`,CAST(0 AS INT),CAST(LENGTH(`salesforce_export_1_explorium_15sept2022`.`Emails`) - (0) + 1 AS INT)), '}' ) + 0 - 1
END)) + 1 AS INT)), 'address' ) + (CASE
WHEN (0 IS NULL) THEN NULL
WHEN 0 < 1 THEN INSTR( `salesforce_export_1_explorium_15sept2022`.`Emails`, '}' )
WHEN 0 = INSTR( SUBSTRING(`salesforce_export_1_explorium_15sept2022`.`Emails`,CAST(0 AS INT),CAST(LENGTH(`salesforce_export_1_explorium_15sept2022`.`Emails`) - (0) + 1 AS INT)), '}' ) THEN 0
ELSE INSTR( SUBSTRING(`salesforce_export_1_explorium_15sept2022`.`Emails`,CAST(0 AS INT),CAST(LENGTH(`salesforce_export_1_explorium_15sept2022`.`Emails`) - (0) + 1 AS INT)), '}' ) + 0 - 1
END) - 1
END) + 10) IS NULL) THEN NULL
WHEN ((CASE
WHEN ((CASE
WHEN (0 IS NULL) THEN NULL
WHEN 0 < 1 THEN INSTR( `salesforce_export_1_explorium_15sept2022`.`Emails`, '}' )
WHEN 0 = INSTR( SUBSTRING(`salesforce_export_1_explorium_15sept2022`.`Emails`,CAST(0 AS INT),CAST(LENGTH(`salesforce_export_1_explorium_15sept2022`.`Emails`) - (0) + 1 AS INT)), '}' ) THEN 0
ELSE INSTR( SUBSTRING(`salesforce_export_1_explorium_15sept2022`.`Emails`,CAST(0 AS INT),CAST(LENGTH(`salesforce_export_1_explorium_15sept2022`.`Emails`) - (0) + 1 AS INT)), '}' ) + 0 - 1
END) IS NULL) THEN NULL
WHEN (CASE
WHEN (0 IS NULL) THEN NULL
WHEN 0 < 1 THEN INSTR( `salesforce_export_1_explorium_15sept2022`.`Emails`, '}' )
WHEN 0 = INSTR( SUBSTRING(`salesforce_export_1_explorium_15sept2022`.`Emails`,CAST(0 AS INT),CAST(LENGTH(`salesforce_export_1_explorium_15sept2022`.`Emails`) - (0) + 1 AS INT)), '}' ) THEN 0

Dooley
Valued Contributor

It's hilarious to me that that SELECT query turned into this beast via a filter button in Tableau. Just to confirm: this CASE beast does not run on Databricks when you copy and paste it into a notebook on the cluster you use for Tableau Online?

spark.sql("SQL QUERY HERE")

Dooley
Valued Contributor

or just run the cell as SQL

Dooley
Valued Contributor

So, looking at this more closely, it looks like a Tableau problem. Going from that nice SELECT statement to this mess looks like a bug on Tableau's side. I mean, look at the query: "WHEN (0 IS NULL) THEN NULL" and then "WHEN 0 < 1 THEN INSTR( `salesforce_export_1_explorium_15sept2022`.`Emails`, '}' )". When is 0 not less than 1? Why does a filter have to have a CASE statement like that? You see these strange portions of the query repeated in that SQL beast like a broken record.
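(Editor's note: as a quick sanity check of that observation, below is a small, hypothetical pure-Python transcription of one of those repeated CASE units. Because `0 IS NULL` is never true and `0 < 1` always is, the four-branch CASE collapses to a plain `INSTR(Emails, '}')`; the SUBSTRING branches are dead code.)

```python
def instr(s: str, sub: str) -> int:
    """SQL-style INSTR: 1-based position of sub in s, 0 if not found."""
    return s.find(sub) + 1

def tableau_case_unit(emails: str):
    """Literal transcription of one repeated CASE unit from the generated query."""
    zero = 0
    if zero is None:      # WHEN (0 IS NULL) THEN NULL -- never true
        return None
    if zero < 1:          # WHEN 0 < 1 THEN INSTR(Emails, '}') -- always true
        return instr(emails, '}')
    # The remaining SUBSTRING/INSTR branches can never be reached.
    raise AssertionError("unreachable")

# The collapse: the generated CASE is equivalent to INSTR(Emails, '}')
for sample in ['{"address": "a@b.com"}', 'no brace here']:
    assert tableau_case_unit(sample) == instr(sample, '}')
```

In other words, the query text balloons even though most of it contributes nothing, which is consistent with the "bug on Tableau's side" theory.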

I'm not sure why a WHERE clause wouldn't work, so here is my workaround for Tableau: change the SQL query to create a table that applies the filter, and then read that table into Tableau.

CREATE TABLE add_new_tablename_here AS
SELECT `salesforce_export_1_explorium_15sept2022_professional_email_val`.`Status` AS `status`
FROM `default`.`salesforce_export_1_explorium_15sept2022_professional_email_validation` `salesforce_export_1_explorium_15sept2022_professional_email_val`
WHERE your_filter_clause_goes_here
GROUP BY 1
ORDER BY `status` ASC

I hope this helps.

Yeah, this is very weird. Earlier today I was able to load the Tableau sheet just fine and also run that beast of a CASE query in a Databricks notebook. But then a few hours ago the Tableau sheets stopped loading and everything broke again. I will look into trying your solution and let you know what happens, fingers crossed.

I tried your strategy of doing the status filtering and making the table in Databricks, then connecting to this new table in Tableau with the same cluster. Everything went well when connecting to the table in Tableau and making the sheet, but when I hit publish workbook and gave the workbook a name, the continuous loading leading to a "Resource temporarily unavailable" error happened again. This certainly seems to be a Tableau issue; at least we isolated the problem, right? 🙂
