
Databricks Runtime 10.4 LTS - AnalysisException: No such struct field id in 0, 1 after upgrading

Emiel_Smeenk
New Contributor III

Hello,

We are migrating from Databricks Runtime 9.1 LTS to 10.4 LTS, but we're running into weird behavioral issues. Our existing code works on every runtime up to and including 10.3, and stopped working in 10.4.

Problem:

We have a nested json file that we are flattening into a spark data frame using the code below:

from pyspark.sql import functions as F

# Explode the organizations first, then the ad accounts nested inside each organization.
adaccountsdf = df.withColumn('Exp_Organizations', F.explode(F.col('organizations.organization')))\
                 .withColumn('Exp_AdAccounts', F.explode(F.col('Exp_Organizations.ad_accounts')))\
                 .select(F.col('Exp_Organizations.id').alias('organizationId'),
                         F.col('Exp_Organizations.name').alias('organizationName'),
                         F.col('Exp_AdAccounts.id').alias('adAccountId'),
                         F.col('Exp_AdAccounts.name').alias('adAccountName'),
                         F.col('Exp_AdAccounts.timezone').alias('timezone'))
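
For readers without access to the source file, the intended result of the two explode calls can be sketched in plain Python. The sample document below is hypothetical: only the field names come from the column paths in the snippet above, while the values and the helper name flatten_ad_accounts are made up for illustration. It is not the PySpark code path that raised the error, just a picture of the rows the flattening is meant to produce.

```python
# Hypothetical sample document. Field names follow the column paths used in the
# PySpark snippet; all values are invented for illustration.
sample = {
    "organizations": {
        "organization": [
            {
                "id": "org-1",
                "name": "Org One",
                "ad_accounts": [
                    {"id": "acc-1", "name": "Account A", "timezone": "UTC"},
                    {"id": "acc-2", "name": "Account B", "timezone": "Europe/Amsterdam"},
                ],
            },
            {
                "id": "org-2",
                "name": "Org Two",
                "ad_accounts": [
                    {"id": "acc-3", "name": "Account C", "timezone": "UTC"},
                ],
            },
        ]
    }
}


def flatten_ad_accounts(doc):
    """Return one flat row per (organization, ad account) pair."""
    rows = []
    for org in doc["organizations"]["organization"]:  # plays the role of the first explode
        for account in org["ad_accounts"]:            # plays the role of the second explode
            rows.append({
                "organizationId": org["id"],
                "organizationName": org["name"],
                "adAccountId": account["id"],
                "adAccountName": account["name"],
                "timezone": account["timezone"],
            })
    return rows


rows = flatten_ad_accounts(sample)
for row in rows:
    print(row)
```

Each inner-loop iteration corresponds to one output row, which is why an organization with two ad accounts appears twice. As with explode, an organization whose ad_accounts list is empty contributes no rows.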

Querying the dataframe works with either of the following selects (results hidden for confidentiality):

display(adaccountsdf.select("*"))
 
OR
 
display(adaccountsdf)

When I display the schema of the dataframe, we get the following:

root
 |-- organizationId: string (nullable = true)
 |-- organizationName: string (nullable = true)
 |-- adAccountId: string (nullable = true)
 |-- adAccountName: string (nullable = true)
 |-- timezone: string (nullable = true)

So everything looks as it should. As soon as we select the last three fields (adAccountId, adAccountName and timezone), we get the following error:

However, when we select a single column it works fine:

Does anyone know why this is happening? It's a very strange error that only shows up in Databricks Runtime 10.4. All previous runtimes, including 10.3, 10.2, 10.1 and 9.1 LTS, work fine. The issue seems to be caused by using the explode function on an already exploded column in the dataframe.

UPDATE:

For some reason when I run adaccountsdf.cache() before I run my select statements the issue disappears. Would still like to know what's causing this issue in runtime 10.4 but not the other ones.
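
A minimal sketch of that workaround, assuming adaccountsdf is the dataframe built above. The helper name select_after_cache is hypothetical and not part of PySpark; it only wraps the two DataFrame calls mentioned in this post, cache() and select(). The thread does not establish why caching avoids the error, so treat this as a stopgap rather than a fix.

```python
def select_after_cache(df, *columns):
    """Mark the dataframe for caching, then select the requested columns from it.

    `df` is expected to be a PySpark DataFrame. DataFrame.cache() is lazy and
    returns the dataframe itself, so the select can be chained on its result.
    """
    cached = df.cache()
    return cached.select(*columns)


# Usage in a Databricks notebook (assumes `adaccountsdf` and `display` exist there):
# display(select_after_cache(adaccountsdf, "adAccountId", "adAccountName", "timezone"))
```

Because cache() keeps the data around once it has been materialized, calling adaccountsdf.unpersist() when the dataframe is no longer needed releases it.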

1 ACCEPTED SOLUTION

Emiel_Smeenk
New Contributor III

It seems like the issue was miraculously resolved. I did not make any code changes but everything is now running as expected.

Maybe the latest runtime 10.4 fix released on April 19th also resolved this issue unintentionally.

11 REPLIES

Kaniz
Community Manager

Hi @Emiel Smeenk,

This guide helps you migrate your Azure Databricks workloads to the latest version of Databricks Runtime 10.x.

Databricks recommends that you migrate your workloads to a supported Databricks Runtime LTS version from that version’s most recent supported LTS version.

Therefore, this article focuses on migrating workloads from Databricks Runtime 9.1 LTS to Databricks Runtime 10.4 LTS.

Nirupam
New Contributor III

@Emiel Smeenk

We were facing the same issue, and from 2022-Apr-20 onwards it suddenly resolved itself.

Question: Is there any website where I can see/track these "patches"?

Edit: Added Question.

Kaniz
Community Manager

Hi @Nirupam Nishant and @Emiel Smeenk, this page lists maintenance updates issued for Databricks Runtime releases.

April 19, 2022 - Maintenance updates

  • We upgraded Java AWS SDK from version 1.11.655 to 1.12.1899.
  • We fixed an issue with notebook-scoped libraries not working in batch streaming jobs.
  • [SPARK-38616][SQL] Keep track of SQL query text in Catalyst TreeNode
  • Operating system security updates.

Kaniz
Community Manager

Hi @Nirupam Nishant, just a friendly follow-up. Do you still need help, or did my response help you find the solution? Please let us know.

Nirupam
New Contributor III

@Kaniz Fatma

Your answer resolves my query. Thanks!

In addition, for fellow developers, I later noticed that these release notes are also available on the home screen of your Databricks workspace.

Kaniz
Community Manager

Hi @Nirupam Nishant, thank you for the update and the valuable message for our community members. Since my answer resolved your query, would you like to mark it as the best?

Nirupam
New Contributor III

@Kaniz Fatma I did not ask the original question.

@Emiel Smeenk asked and then answered his own question, stating that the issue was fixed on its own (probably due to the latest patch).

Kaniz
Community Manager

No worries @Nirupam Nishant. Either of you can mark the best answer. As the initial question was answered by @Emiel Smeenk himself, you can mark his answer as the best.

Emiel_Smeenk
New Contributor III

The issue resolved on its own, so I selected that as the best answer for this post.

Thanks,

Emiel

Kaniz
Community Manager

Awesome. Thank you @Emiel Smeenk 😊.