cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

Has anyone come across an issue where a table join fails for a single row, when there is no reason for this to happen?

spyderfaye
New Contributor II

So, I have a super simple left join from one table to another it's purpose to retrieve the date of birth for a customer from the customer ID FK in the transaction table to the customer ID PK in the customer table. A customer will have several transaction lines and a single customer record so the expected behaviour is for a single date of birth to be retrieved for each of these lines (from which we calculate age at transaction date). However, every now and then, over the past 2 months, I get occurances where one transaction line will retrieve null from the customer table, despite the customer ID having several transaction lines and a DOB being retrieved for the other lines in the transaction, which tells me that the join was successful for the FK/PK except for a single row? This seems like really odd behaviour. I cannot repeat the issue for that row, the join will work on the next run of the workflow, the issue is intermittent but always just a single row, with nothing special about it. It has occured maybe 20 times in 2 months. It is idenified after some merge updates fail later in the workflow as the issue creates duplication (updated table grouped on customer and age - a customer can only be one age on a day so duplication shouldn't be an issue, except in this instance where we end up with the actual age against one line and "unknown" from null on the other). The duplicate is initially let in to the table via the merge update then the merge fails on the next run as a result, although the base table with the join is no longer repeating this bug (it feels like a bug) for this row. Has anyone come across this?

3 REPLIES 3

Noopur_Nigam
Valued Contributor II
Valued Contributor II

Hi @Faye Hughes​ It would be great if you could share the repro of the issue with input data in a notebook exported from your workspace. We can run the notebook and understand the behaviour to help you out.

jose_gonzalez
Moderator
Moderator

Hi @Faye Hughes​,

Could you share the error message? If there is no error message, can you provide the minimal steps to repro this behavior like Noopur said.

Vidula
Honored Contributor

Hi @Faye Hughes​ 

Thank you so much for getting back to us. It's really great of you to send in the solution and mark the answer as best. 

We really appreciate your time.

Wish you a great Databricks journey ahead!

Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.