
Creating a nested struct schema in Spark - Jira schema

weldermartins
Honored Contributor

Hello guys,

I'm using the Jira API to return "ISSUES". To use PySpark, I need to create the DataFrame by passing in a schema, but I am not able to create the schema based on the model below. Would you have any ideas?

root
 |-- expand: string (nullable = true)
 |-- fields: struct (nullable = true)
 |    |-- aggregateprogress: struct (nullable = true)
 |    |    |-- progress: long (nullable = true)
 |    |    |-- total: long (nullable = true)
 |    |-- aggregatetimeestimate: string (nullable = true)
 |    |-- aggregatetimeoriginalestimate: string (nullable = true)
 |    |-- aggregatetimespent: string (nullable = true)
 |    |-- assignee: string (nullable = true)
 |    |-- attachment: array (nullable = true)
 |    |    |-- element: string (containsNull = true)
 |    |-- comment: struct (nullable = true)
 |    |    |-- comments: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |-- author: struct (nullable = true)
 |    |    |    |    |    |-- accountId: string (nullable = true)
 |    |    |    |    |    |-- accountType: string (nullable = true)
 |    |    |    |    |    |-- active: boolean (nullable = true)
 |    |    |    |    |    |-- avatarUrls: struct (nullable = true)
 |    |    |    |    |    |    |-- 16x16: string (nullable = true)
 |    |    |    |    |    |    |-- 24x24: string (nullable = true)
 |    |    |    |    |    |    |-- 32x32: string (nullable = true)
 |    |    |    |    |    |    |-- 48x48: string (nullable = true)
 |    |    |    |    |    |-- displayName: string (nullable = true)
 |    |    |    |    |    |-- emailAddress: string (nullable = true)
 |    |    |    |    |    |-- self: string (nullable = true)
 |    |    |    |    |    |-- timeZone: string (nullable = true)
 |    |    |    |    |-- body: struct (nullable = true)
 |    |    |    |    |    |-- content: array (nullable = true)
 |    |    |    |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |    |    |    |-- content: array (nullable = true)
 |    |    |    |    |    |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |    |    |    |    |    |-- text: string (nullable = true)
 |    |    |    |    |    |    |    |    |    |-- type: string (nullable = true)
 |    |    |    |    |    |    |    |-- type: string (nullable = true)
 |    |    |    |    |    |-- type: string (nullable = true)
 |    |    |    |    |    |-- version: long (nullable = true)
 |    |    |    |    |-- created: string (nullable = true)
 |    |    |    |    |-- id: string (nullable = true)
 |    |    |    |    |-- jsdPublic: boolean (nullable = true)
 |    |    |    |    |-- self: string (nullable = true)
 |    |    |    |    |-- updateAuthor: struct (nullable = true)
 |    |    |    |    |    |-- accountId: string (nullable = true)
 |    |    |    |    |    |-- accountType: string (nullable = true)
 |    |    |    |    |    |-- active: boolean (nullable = true)
 |    |    |    |    |    |-- avatarUrls: struct (nullable = true)
 |    |    |    |    |    |    |-- 16x16: string (nullable = true)
 |    |    |    |    |    |    |-- 24x24: string (nullable = true)
 |    |    |    |    |    |    |-- 32x32: string (nullable = true)
 |    |    |    |    |    |    |-- 48x48: string (nullable = true)
 |    |    |    |    |    |-- displayName: string (nullable = true)
 |    |    |    |    |    |-- emailAddress: string (nullable = true)
 |    |    |    |    |    |-- self: string (nullable = true)
 |    |    |    |    |    |-- timeZone: string (nullable = true)
 |    |    |    |    |-- updated: string (nullable = true)
 |    |    |-- maxResults: long (nullable = true)
 |    |    |-- self: string (nullable = true)
 |    |    |-- startAt: long (nullable = true)
 |    |    |-- total: long (nullable = true)
 |    |-- components: array (nullable = true)
 |    |    |-- element: string (containsNull = true)
 |    |-- created: string (nullable = true)
 |    |-- creator: struct (nullable = true)
 |    |    |-- accountId: string (nullable = true)
 |    |    |-- accountType: string (nullable = true)
 |    |    |-- active: boolean (nullable = true)
 |    |    |-- avatarUrls: struct (nullable = true)
 |    |    |    |-- 16x16: string (nullable = true)
 |    |    |    |-- 24x24: string (nullable = true)
 |    |    |    |-- 32x32: string (nullable = true)
 |    |    |    |-- 48x48: string (nullable = true)
 |    |    |-- displayName: string (nullable = true)
 |    |    |-- emailAddress: string (nullable = true)
 |    |    |-- self: string (nullable = true)
 |    |    |-- timeZone: string (nullable = true)
 |    |-- customfield_10001: string (nullable = true)
 |    |-- customfield_10002: string (nullable = true)
 |    |-- customfield_10003: string (nullable = true)
 |    |-- customfield_10004: string (nullable = true)
 |    |-- customfield_10005: string (nullable = true)
 |    |-- customfield_10006: string (nullable = true)
 |    |-- customfield_10007: string (nullable = true)
 |    |-- customfield_10008: string (nullable = true)
 |    |-- customfield_10009: string (nullable = true)
 |    |-- customfield_10010: string (nullable = true)
 |    |-- customfield_10014: string (nullable = true)
 |    |-- customfield_10015: string (nullable = true)
 |    |-- customfield_10016: string (nullable = true)
 |    |-- customfield_10017: string (nullable = true)
 |    |-- customfield_10018: struct (nullable = true)
 |    |    |-- hasEpicLinkFieldDependency: boolean (nullable = true)
 |    |    |-- nonEditableReason: struct (nullable = true)
 |    |    |    |-- message: string (nullable = true)
 |    |    |    |-- reason: string (nullable = true)
 |    |    |-- showField: boolean (nullable = true)
 |    |-- customfield_10019: string (nullable = true)
 |    |-- customfield_10020: string (nullable = true)
 |    |-- customfield_10021: string (nullable = true)
 |    |-- customfield_10022: string (nullable = true)
 |    |-- customfield_10023: string (nullable = true)
 |    |-- customfield_10024: string (nullable = true)
 |    |-- customfield_10025: string (nullable = true)
 |    |-- customfield_10026: string (nullable = true)
 |    |-- customfield_10027: string (nullable = true)
 |    |-- customfield_10028: string (nullable = true)
 |    |-- customfield_10029: string (nullable = true)
 |    |-- customfield_10030: string (nullable = true)
 |    |-- description: string (nullable = true)
 |    |-- duedate: string (nullable = true)
 |    |-- environment: string (nullable = true)
 |    |-- fixVersions: array (nullable = true)
 |    |    |-- element: string (containsNull = true)
 |    |-- issuelinks: array (nullable = true)
 |    |    |-- element: string (containsNull = true)
 |    |-- issuerestriction: struct (nullable = true)
 |    |    |-- shouldDisplay: boolean (nullable = true)
 |    |-- issuetype: struct (nullable = true)
 |    |    |-- avatarId: long (nullable = true)
 |    |    |-- description: string (nullable = true)
 |    |    |-- entityId: string (nullable = true)
 |    |    |-- hierarchyLevel: long (nullable = true)
 |    |    |-- iconUrl: string (nullable = true)
 |    |    |-- id: string (nullable = true)
 |    |    |-- name: string (nullable = true)
 |    |    |-- self: string (nullable = true)
 |    |    |-- subtask: boolean (nullable = true)
 |    |-- labels: array (nullable = true)
 |    |    |-- element: string (containsNull = true)
 |    |-- lastViewed: string (nullable = true)
 |    |-- priority: struct (nullable = true)
 |    |    |-- iconUrl: string (nullable = true)
 |    |    |-- id: string (nullable = true)
 |    |    |-- name: string (nullable = true)
 |    |    |-- self: string (nullable = true)
 |    |-- progress: struct (nullable = true)
 |    |    |-- progress: long (nullable = true)
 |    |    |-- total: long (nullable = true)
 |    |-- project: struct (nullable = true)
 |    |    |-- avatarUrls: struct (nullable = true)
 |    |    |    |-- 16x16: string (nullable = true)
 |    |    |    |-- 24x24: string (nullable = true)
 |    |    |    |-- 32x32: string (nullable = true)
 |    |    |    |-- 48x48: string (nullable = true)
 |    |    |-- id: string (nullable = true)
 |    |    |-- key: string (nullable = true)
 |    |    |-- name: string (nullable = true)
 |    |    |-- projectTypeKey: string (nullable = true)
 |    |    |-- self: string (nullable = true)
 |    |    |-- simplified: boolean (nullable = true)
 |    |-- reporter: struct (nullable = true)
 |    |    |-- accountId: string (nullable = true)
 |    |    |-- accountType: string (nullable = true)
 |    |    |-- active: boolean (nullable = true)
 |    |    |-- avatarUrls: struct (nullable = true)
 |    |    |    |-- 16x16: string (nullable = true)
 |    |    |    |-- 24x24: string (nullable = true)
 |    |    |    |-- 32x32: string (nullable = true)
 |    |    |    |-- 48x48: string (nullable = true)
 |    |    |-- displayName: string (nullable = true)
 |    |    |-- emailAddress: string (nullable = true)
 |    |    |-- self: string (nullable = true)
 |    |    |-- timeZone: string (nullable = true)
 |    |-- resolution: string (nullable = true)
 |    |-- resolutiondate: string (nullable = true)
 |    |-- security: string (nullable = true)
 |    |-- status: struct (nullable = true)
 |    |    |-- description: string (nullable = true)
 |    |    |-- iconUrl: string (nullable = true)
 |    |    |-- id: string (nullable = true)
 
 |-- id: string (nullable = true)
 |-- key: string (nullable = true)
 |-- self: string (nullable = true)

17 REPLIES

It stayed the same.


-werners-
Esteemed Contributor III

If columns are missing, that particular data is not present in the JSON. I am not aware of Spark skipping columns when reading JSON with schema inference. There is an option, dropFieldIfAllNull, but it is false by default.

That makes me think: you might want to look into the options of read.json:

https://spark.apache.org/docs/latest/sql-data-sources-json.html
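
As a rough illustration of those options, the sketch below sets dropFieldIfAllNull explicitly and optionally supplies a schema, in which case Spark uses the given columns instead of inferring them, and fields absent from the data come back as nulls. The helper read_issues and its parameter names are hypothetical; spark is assumed to be an existing SparkSession.

```python
def read_issues(spark, json_strings, schema=None):
    """Read JSON documents (a path or an RDD of strings) into a DataFrame.

    `spark` is an existing SparkSession. This helper and its parameter
    names are hypothetical and exist only for this sketch.
    """
    # "false" is already the default; it is spelled out here for clarity.
    reader = spark.read.option("dropFieldIfAllNull", "false")
    if schema is not None:
        # With a schema supplied (StructType or DDL string), inference is skipped.
        reader = reader.schema(schema)
    return reader.json(json_strings)
```

For example, read_issues(spark, rdd, schema="id STRING, key STRING") would keep only those two columns, whether or not every record contains them.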

@Werner Stinckens Thanks for the support.

Now it's working. The error message said the input was not parallelized, so I searched and found the answer. When creating the DataFrame, I changed it to:

df = spark.read.json(sc.parallelize([answer.text]))
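
A slightly expanded sketch of the same idea, for anyone landing here later: spark.read.json takes a path or an RDD of JSON strings, so text held in memory has to go through parallelize first. The variant below also splits the response into one JSON string per issue, assuming the body wraps the results in an "issues" array (an assumption about the endpoint, not something shown in this thread). The function names are hypothetical.

```python
import json


def issues_as_json_lines(response_text):
    """Split a response body into one JSON string per issue."""
    payload = json.loads(response_text)
    if isinstance(payload, dict):
        # Assumption: results are wrapped in an "issues" array; a body that
        # is already a single issue object is used as-is.
        issues = payload.get("issues", [payload])
    else:
        issues = payload
    return [json.dumps(issue) for issue in issues]


def issues_dataframe(spark, response_text):
    """Build a DataFrame from a response body held in memory.

    `spark` is an existing SparkSession.
    """
    lines = issues_as_json_lines(response_text)
    # Distribute the in-memory strings so read.json receives an RDD of strings.
    return spark.read.json(spark.sparkContext.parallelize(lines))
```

With the variables from the line above, this would be called as df = issues_dataframe(spark, answer.text).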