cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

Has anyone come across an issue where a table join fails for a single row, when there is no reason for this to happen?

spyderfaye
New Contributor II

So, I have a super simple left join from one table to another it's purpose to retrieve the date of birth for a customer from the customer ID FK in the transaction table to the customer ID PK in the customer table. A customer will have several transaction lines and a single customer record so the expected behaviour is for a single date of birth to be retrieved for each of these lines (from which we calculate age at transaction date). However, every now and then, over the past 2 months, I get occurances where one transaction line will retrieve null from the customer table, despite the customer ID having several transaction lines and a DOB being retrieved for the other lines in the transaction, which tells me that the join was successful for the FK/PK except for a single row? This seems like really odd behaviour. I cannot repeat the issue for that row, the join will work on the next run of the workflow, the issue is intermittent but always just a single row, with nothing special about it. It has occured maybe 20 times in 2 months. It is idenified after some merge updates fail later in the workflow as the issue creates duplication (updated table grouped on customer and age - a customer can only be one age on a day so duplication shouldn't be an issue, except in this instance where we end up with the actual age against one line and "unknown" from null on the other). The duplicate is initially let in to the table via the merge update then the merge fails on the next run as a result, although the base table with the join is no longer repeating this bug (it feels like a bug) for this row. Has anyone come across this?

3 REPLIES 3

Noopur_Nigam
Databricks Employee
Databricks Employee

Hi @Faye Hughesโ€‹ It would be great if you could share the repro of the issue with input data in a notebook exported from your workspace. We can run the notebook and understand the behaviour to help you out.

jose_gonzalez
Databricks Employee
Databricks Employee

Hi @Faye Hughesโ€‹,

Could you share the error message? If there is no error message, can you provide the minimal steps to repro this behavior like Noopur said.

Vidula
Honored Contributor

Hi @Faye Hughesโ€‹ 

Thank you so much for getting back to us. It's really great of you to send in the solution and mark the answer as best. 

We really appreciate your time.

Wish you a great Databricks journey ahead!

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you wonโ€™t want to miss the chance to attend and share knowledge.

If there isnโ€™t a group near you, start one and help create a community that brings people together.

Request a New Group