Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

when and otherwise issue

SailajaB
Valued Contributor III

Hi,

In our scenario we read JSON files as input, and they contain nested structures. A few of the attributes are arrays of structs, and we need to rename their nested fields, so we created a new structure and cast to it.

We are facing the problem below while casting.

For example, test is an array of structs:

{"test":[{"nestedattr1":"df","columnfield":"er"}]}

We need it to become:

{"test":[{"nestedAttr1":"df","columnField":"er"}]}

So we defined a new structure and apply a cast, but when test arrives as an empty array ({"test":[]}), the cast fails. We tried the code below to skip empty arrays, but it does not work:

from pyspark.sql.functions import col, size, when
df = df.withColumn("test", when(size(df.test) > 0, col("test").cast(newteststruct)).otherwise(df.test))

Error: cannot resolve '`test`' due to data type mismatch: cannot cast array<string> to array<struct>

Any suggestions on how to avoid this issue?

1 ACCEPTED SOLUTION


SailajaB
Valued Contributor III

We used the condition below to resolve the issue:

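from pyspark.sql.functions import col

# When every "test" array is empty, Spark infers the column as array<string>; only cast otherwise.
if dict(df.dtypes)['test'] != 'array<string>':
    df = df.withColumn("test", col("test").cast(newteststruct))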

Thank you
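The accepted check can also be applied to several columns at once, which is closer to the schema-level approach asked about later in the thread. This is a minimal sketch in plain Python, assuming a hypothetical targets dict that maps each column to rename onto its new type (such as newteststruct). It only inspects df.dtypes, the list of (name, type string) pairs PySpark returns, so the selection logic can be tested without a Spark session.

```python
def columns_safe_to_cast(dtypes, targets, inferred_empty="array<string>"):
    """Return the subset of `targets` whose columns can be cast.

    dtypes: (column name, type string) pairs, e.g. df.dtypes
    targets: hypothetical {column name: target type} mapping
    A column is skipped when it is missing, or when Spark inferred it as
    `inferred_empty`, the type an array gets when it is empty in every record.
    """
    current = dict(dtypes)
    return {
        name: target
        for name, target in targets.items()
        if name in current and current[name] != inferred_empty
    }


# Example usage with a PySpark DataFrame `df` (not executed here):
# from pyspark.sql.functions import col
# for name, target in columns_safe_to_cast(df.dtypes, {"test": newteststruct}).items():
#     df = df.withColumn(name, col(name).cast(target))
```

A skipped column keeps its array<string> type, so the output schema can still differ between batches; reading the input with an explicit schema avoids that difference.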


5 replies

AmanSehgal
Honored Contributor III

Can you share the structure you're using?

Also, could you provide a more detailed sample of the input and the expected output?

SailajaB
Valued Contributor III

Hi,

Thank you for the reply.

We are using the structure below to cast the array-of-struct column so that the nested fields get their new names:

from pyspark.sql.types import ArrayType, StringType, StructField, StructType

newteststruct = ArrayType(StructType([
    StructField("nestedAttr1", StringType()),
    StructField("columnField", StringType())]))

The input comes from another source in JSON format, and we read it into Databricks as df.

We then apply schema-level transformations, as required by the business, to produce output in the target schema.

The cast fails when the input extract contains an empty array.

Thank you

AmanSehgal
Honored Contributor III

Is it possible for you to replace {"test":[]} with {"test":[{"nestedattr1":"","columnfield":""}]}?

I think the THEN and ELSE expressions need to have the same type.

SailajaB
Valued Contributor III

Yes, I think the THEN and ELSE expressions should have the same type.

But we don't want to convert {"test":[]} to {"test":[{"nestedattr1":"","columnfield":""}]}.

If test arrives as an empty array, we should skip this conversion.

If test arrives as {"test":[{"nestedattr1":"df","columnfield":"er"}]}, then we need to apply the conversion.

Is there any way to achieve this, ideally at the schema level rather than column by column?

