02-23-2022 08:49 AM
Hi,
In our scenario we read JSON files as input, and they contain nested structures. Some of the attributes are arrays of structs whose nested fields we need to rename, so we defined a new structure and cast the column to it.
We are facing the problem below while casting.
For example, test is an array of structs:
{"test":[{"nestedattr1":"df","columnfield":"er"}]}
We need it as:
{"test":[{"nestedAttr1":"df","columnField":"er"}]}
So we defined a new structure and applied a cast, but when test arrives as an empty array, {"test":[]}, the cast fails. We tried the code below, but it does not work:
df = df.withColumn("test", when(size(df.test) > 0, col("test").cast(newteststruct)).otherwise(df.test))
Error: cannot resolve '`test`' due to data type mismatch: cannot cast array<string> to array<struct>
Please suggest how to avoid this issue.
Labels: JSON Files
Accepted Solutions
02-25-2022 06:10 AM
We used the condition below to resolve the issue:
if dict(df.dtypes)['test'] != 'array<string>':
    df = df.withColumn("test", col("test").cast(newteststruct))
Thank you
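The when(...).otherwise(...) attempt fails because Spark checks the types of both branches when it analyses the query, before any row is read, so the cast is rejected whenever the column was inferred as array<string>. The check above avoids this by inspecting the inferred type in plain Python first. Below is a minimal sketch of the same check extended to several columns. The helper name columns_safe_to_cast and the sample type lists are hypothetical, and the lists only stand in for what df.dtypes returns.

```python
def columns_safe_to_cast(dtypes, candidates, placeholder_type="array<string>"):
    """Return the candidate columns that exist and were not inferred as the placeholder type.

    dtypes has the same shape as DataFrame.dtypes: a list of (column_name, type_string) pairs.
    """
    inferred = dict(dtypes)
    return [
        name
        for name in candidates
        if name in inferred and inferred[name] != placeholder_type
    ]


# Hypothetical stand-ins for df.dtypes in the two situations from the question.
dtypes_with_data = [
    ("id", "bigint"),
    ("test", "array<struct<nestedattr1:string,columnfield:string>>"),
]
dtypes_all_empty = [("id", "bigint"), ("test", "array<string>")]

print(columns_safe_to_cast(dtypes_with_data, ["test"]))  # ['test']
print(columns_safe_to_cast(dtypes_all_empty, ["test"]))  # []
```

With a real DataFrame, the result could drive the casts, for example: for name in columns_safe_to_cast(df.dtypes, ["test"]): df = df.withColumn(name, col(name).cast(newteststruct)). One caveat of this approach: a column that genuinely holds an array of strings is skipped as well, so the candidate list should contain only columns that are expected to be arrays of structs.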
02-23-2022 07:53 PM
Can you provide the structure that you're using?
Also, please share a more detailed sample input and output.
02-24-2022 12:58 AM
Hi,
Thank you for the reply.
We use the structure below to cast the array of structs to the new nested field names:
from pyspark.sql.types import ArrayType, StructType, StructField, StringType
newteststruct = ArrayType(StructType([
    StructField("nestedAttr1", StringType()),
    StructField("columnField", StringType())]))
The input comes from another source in JSON format, and we read it into Databricks as df.
We then apply schema-level transformations, as required by the business, to produce output in the target schema.
The cast fails when the input extract contains an empty array.
Thank you
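One thing worth checking with a rename-by-cast like this: a struct-to-struct cast in Spark pairs fields by position, not by name, so the target struct has to list its fields in the same order as the source column (df.printSchema() shows that order). The sketch below builds the target type as a DDL string from an ordered field list and a rename map, so the order cannot drift. The helper name renamed_array_struct_ddl is hypothetical, the field order shown is only an assumption for illustration, and field names containing special characters would need backtick quoting that this sketch does not handle.

```python
def renamed_array_struct_ddl(source_fields, renames):
    """Build an array<struct<...>> type string, keeping the source field order.

    source_fields: ordered (name, type_string) pairs as they appear in the source column.
    renames: mapping from source field name to target field name.
    """
    parts = []
    for name, type_name in source_fields:
        # Fields without an entry in the rename map keep their original name.
        parts.append(f"{renames.get(name, name)}:{type_name}")
    return "array<struct<" + ",".join(parts) + ">>"


# Assumed field order; verify it against the real schema before relying on it.
source_fields = [("nestedattr1", "string"), ("columnfield", "string")]
renames = {"nestedattr1": "nestedAttr1", "columnfield": "columnField"}

target_ddl = renamed_array_struct_ddl(source_fields, renames)
print(target_ddl)  # array<struct<nestedAttr1:string,columnField:string>>
```

Column.cast accepts either a DataType object or a type string, so the result could be passed as col("test").cast(target_ddl) in place of newteststruct.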
02-24-2022 03:36 AM
Is it possible for you to replace {"test":[]} with {"test":[{"nestedattr1":"","columnfield":""}]}?
I ask because I think the THEN and ELSE expressions must have the same type.
02-24-2022 03:51 AM
Yes, I agree that the THEN and ELSE expressions should have the same type.
But we do not want to convert {"test":[]} to {"test":[{"nestedattr1":"","columnfield":""}]}.
If test arrives as an empty array, we should skip the conversion.
If test arrives as {"test":[{"nestedattr1":"df","columnfield":"er"}]}, we have to proceed with the conversion.
Is there any way to achieve this, preferably at the schema level rather than column by column?