02-23-2022 08:49 AM
Hi,
In our scenario we read JSON files as input, and they contain nested structures. A few of the attributes are arrays of structs, and we need to rename their nested fields, so we created a new structure and cast to it.
We are facing the problem below while doing the cast.
For example, test is an array of structs:
{"test":[{"nestedattr1":"df","columnfield":"er"}]}
and we need it as:
{"test":[{"nestedAttr1":"df","columnField":"er"}]}
We defined the new structure and apply the cast, but when test arrives as an empty array, {"test":[]}, the cast fails. So we tried the code below, but it is not working:
df = df.withColumn("test",when(size(df.test)>0,col("test").cast(newteststruct)).otherwise(df.test))
error : cannot resolve '`test`' due to data type mismatch: cannot cast array<string> to array<struct>
Please share any suggestions for avoiding this issue.
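To make the intended mapping concrete, here is a minimal plain-Python sketch (no Spark involved) that renames the nested keys of one parsed record. The RENAMES map and the rename_nested helper are hypothetical names built from the example records above. It only illustrates the target shape, not a Spark solution, and it shows that an empty array simply has nothing to rename.

```python
import json

# Hypothetical rename map built from the example records in the question.
RENAMES = {"nestedattr1": "nestedAttr1", "columnfield": "columnField"}


def rename_nested(record, column, renames):
    """Return a copy of `record` with the keys of every struct in `record[column]` renamed."""
    out = dict(record)
    out[column] = [
        {renames.get(key, key): value for key, value in item.items()}
        for item in record[column]  # an empty list yields an empty list
    ]
    return out


for line in ['{"test":[{"nestedattr1":"df","columnfield":"er"}]}', '{"test":[]}']:
    print(json.dumps(rename_nested(json.loads(line), "test", RENAMES), separators=(",", ":")))
```

In Spark the same mapping has to be expressed as a cast on the column's type, which is why the type Spark infers for an all-empty test array matters.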
02-25-2022 06:10 AM
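We used the condition below to resolve the issue:
from pyspark.sql.functions import col
# If every "test" array was empty, Spark infers array<string>; skip the cast then.
if dict(df.dtypes)['test'] != 'array<string>':
    df = df.withColumn("test", col("test").cast(newteststruct))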
Thank you
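If several columns need this treatment, the same driver-side check can be applied per column. The check has to happen before the expression is built, because Spark validates the whole when()/otherwise() expression, cast included, when the plan is analyzed, before any row is evaluated. Below is a rough sketch, assuming hypothetical column names and target types in TARGETS and a dtypes list shaped like PySpark's df.dtypes (pairs of column name and type string). The decision logic is plain Python so it can be tried without a cluster; the PySpark loop appears only as a comment.

```python
# Hypothetical target types as Spark DDL strings. The struct fields are listed in
# the order Spark infers them for JSON (alphabetical), because struct casts are positional.
TARGETS = {
    "test": "array<struct<columnField:string,nestedAttr1:string>>",
}


def columns_to_cast(dtypes, targets, skip_type="array<string>"):
    """Return the target columns that can be cast safely.

    `dtypes` mirrors PySpark's `df.dtypes`: a list of (column name, type string) pairs.
    A column whose arrays were all empty in the input is inferred as array<string>,
    so it is skipped rather than cast to an array of structs.
    """
    current = dict(dtypes)
    return [name for name in targets if name in current and current[name] != skip_type]


# In PySpark (not executed here) the result would drive the casts:
#   from pyspark.sql.functions import col
#   for name in columns_to_cast(df.dtypes, TARGETS):
#       df = df.withColumn(name, col(name).cast(TARGETS[name]))

print(columns_to_cast([("test", "array<string>")], TARGETS))
print(columns_to_cast([("test", "array<struct<columnfield:string,nestedattr1:string>>")], TARGETS))
```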
02-23-2022 07:53 PM
Can you provide the structure that you're using?
Also, could you share a more elaborate sample input and output?
02-24-2022 12:58 AM
Hi,
Thank you for the reply.
We use the structure below to cast the array of structs to the new nested field names:
from pyspark.sql.types import ArrayType, StructType, StructField, StringType
newteststruct = ArrayType(StructType([
    StructField("nestedAttr1", StringType()),
    StructField("columnField", StringType())]))
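One thing worth checking with newteststruct, assuming the schema comes from Spark's default JSON inference (no explicit schema passed to the reader): a struct-to-struct cast matches fields by position, not by name, and inferred JSON struct fields are sorted by name, so columnfield comes before nestedattr1. If that holds for your data, the cast as written would put the columnfield value under nestedAttr1 and vice versa; df.printSchema() shows the actual order. The plain-Python sketch below, with a hypothetical helper name, keeps the new names in the source order.

```python
# Hypothetical rename map for the nested fields in the question.
RENAMES = {"nestedattr1": "nestedAttr1", "columnfield": "columnField"}


def renamed_in_source_order(source_fields, renames):
    """Return the new field names in the same order as the source struct's fields."""
    return [renames.get(name, name) for name in source_fields]


# Assumed inferred order: Spark's JSON inference sorts struct fields by name.
inferred = ["columnfield", "nestedattr1"]
as_written = ["nestedAttr1", "columnField"]  # field order of newteststruct above

# A positional cast pairs the fields like this, crossing the two names over:
print(list(zip(inferred, as_written)))
# Keeping the source order instead gives a one-to-one rename:
print(list(zip(inferred, renamed_in_source_order(inferred, RENAMES))))

# In PySpark (not executed here) the target type could then be built as:
#   ArrayType(StructType([StructField(n, StringType())
#                         for n in renamed_in_source_order(inferred, RENAMES)]))
```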
The input comes from another source in JSON format, and we read it into Databricks as df.
We then apply the schema-level transformations required by the business to produce output in the target schema.
The cast fails when the input extract contains an empty array.
Thank you
02-24-2022 03:36 AM
Is it possible for you to replace {"test":[]} with {"test":[{"nestedattr1":"","columnfield":""}]}?
I ask because I think the THEN and ELSE expressions must have the same type.
02-24-2022 03:51 AM
Regarding "the THEN and ELSE expressions should have the same type": I think so, yes.
But we don't want to convert {"test":[]} to {"test":[{"nestedattr1":"","columnfield":""}]}.
If test arrives as an empty array, we should skip the conversion.
If test arrives as {"test":[{"nestedattr1":"df","columnfield":"er"}]}, then we have to proceed with the conversion.
Is there any way to achieve this, preferably at the schema level rather than column by column?
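For a schema-level approach, one option is to rename fields on the schema itself instead of per column. A rough sketch, assuming PySpark's StructType.jsonValue() dict format (the rename_fields helper and the RENAMES map are hypothetical): walk the schema, rename struct fields while keeping their order, and leave everything else alone. An all-empty array inferred as array<string> comes out unchanged, so casting it to the resulting type is a no-op rather than an error. Map types are not handled here, and RENAMES applies at every nesting level, top-level columns included.

```python
# Hypothetical rename map; it is applied at every nesting level.
RENAMES = {"nestedattr1": "nestedAttr1", "columnfield": "columnField"}


def rename_fields(data_type, renames):
    """Rename struct fields in a schema given in StructType.jsonValue() form.

    Field order is kept, so a positional struct cast still lines values up with
    their renamed fields. Primitive types (plain strings such as "string") and
    map types are returned unchanged.
    """
    if isinstance(data_type, dict) and data_type.get("type") == "struct":
        return {
            **data_type,
            "fields": [
                {
                    **field,
                    "name": renames.get(field["name"], field["name"]),
                    "type": rename_fields(field["type"], renames),
                }
                for field in data_type["fields"]
            ],
        }
    if isinstance(data_type, dict) and data_type.get("type") == "array":
        return {**data_type, "elementType": rename_fields(data_type["elementType"], renames)}
    return data_type


# In PySpark (not executed here):
#   from pyspark.sql.functions import col
#   from pyspark.sql.types import StructType
#   target = StructType.fromJson(rename_fields(df.schema.jsonValue(), RENAMES))
#   df = df.select([col(src.name).cast(dst.dataType).alias(dst.name)
#                   for src, dst in zip(df.schema.fields, target.fields)])

# Schema inferred when "test" holds structs (fields in alphabetical order):
struct_case = {"type": "struct", "fields": [
    {"name": "test", "nullable": True, "metadata": {},
     "type": {"type": "array", "containsNull": True, "elementType": {
         "type": "struct", "fields": [
             {"name": "columnfield", "type": "string", "nullable": True, "metadata": {}},
             {"name": "nestedattr1", "type": "string", "nullable": True, "metadata": {}},
         ]}}},
]}
# Schema inferred when every "test" array is empty:
empty_case = {"type": "struct", "fields": [
    {"name": "test", "nullable": True, "metadata": {},
     "type": {"type": "array", "elementType": "string", "containsNull": True}},
]}

renamed = rename_fields(struct_case, RENAMES)
print([f["name"] for f in renamed["fields"][0]["type"]["elementType"]["fields"]])
print(rename_fields(empty_case, RENAMES) == empty_case)
```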