
Creating a nested struct schema in Spark for Jira issues

weldermartins
Honored Contributor

Hello guys,

I'm using the Jira API to return issues. To work with the result in PySpark, I need to create the DataFrame by passing in a schema, but I am not able to build a schema that matches the model below. Do you have any ideas?

root
 |-- expand: string (nullable = true)
 |-- fields: struct (nullable = true)
 |    |-- aggregateprogress: struct (nullable = true)
 |    |    |-- progress: long (nullable = true)
 |    |    |-- total: long (nullable = true)
 |    |-- aggregatetimeestimate: string (nullable = true)
 |    |-- aggregatetimeoriginalestimate: string (nullable = true)
 |    |-- aggregatetimespent: string (nullable = true)
 |    |-- assignee: string (nullable = true)
 |    |-- attachment: array (nullable = true)
 |    |    |-- element: string (containsNull = true)
 |    |-- comment: struct (nullable = true)
 |    |    |-- comments: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |-- author: struct (nullable = true)
 |    |    |    |    |    |-- accountId: string (nullable = true)
 |    |    |    |    |    |-- accountType: string (nullable = true)
 |    |    |    |    |    |-- active: boolean (nullable = true)
 |    |    |    |    |    |-- avatarUrls: struct (nullable = true)
 |    |    |    |    |    |    |-- 16x16: string (nullable = true)
 |    |    |    |    |    |    |-- 24x24: string (nullable = true)
 |    |    |    |    |    |    |-- 32x32: string (nullable = true)
 |    |    |    |    |    |    |-- 48x48: string (nullable = true)
 |    |    |    |    |    |-- displayName: string (nullable = true)
 |    |    |    |    |    |-- emailAddress: string (nullable = true)
 |    |    |    |    |    |-- self: string (nullable = true)
 |    |    |    |    |    |-- timeZone: string (nullable = true)
 |    |    |    |    |-- body: struct (nullable = true)
 |    |    |    |    |    |-- content: array (nullable = true)
 |    |    |    |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |    |    |    |-- content: array (nullable = true)
 |    |    |    |    |    |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |    |    |    |    |    |-- text: string (nullable = true)
 |    |    |    |    |    |    |    |    |    |-- type: string (nullable = true)
 |    |    |    |    |    |    |    |-- type: string (nullable = true)
 |    |    |    |    |    |-- type: string (nullable = true)
 |    |    |    |    |    |-- version: long (nullable = true)
 |    |    |    |    |-- created: string (nullable = true)
 |    |    |    |    |-- id: string (nullable = true)
 |    |    |    |    |-- jsdPublic: boolean (nullable = true)
 |    |    |    |    |-- self: string (nullable = true)
 |    |    |    |    |-- updateAuthor: struct (nullable = true)
 |    |    |    |    |    |-- accountId: string (nullable = true)
 |    |    |    |    |    |-- accountType: string (nullable = true)
 |    |    |    |    |    |-- active: boolean (nullable = true)
 |    |    |    |    |    |-- avatarUrls: struct (nullable = true)
 |    |    |    |    |    |    |-- 16x16: string (nullable = true)
 |    |    |    |    |    |    |-- 24x24: string (nullable = true)
 |    |    |    |    |    |    |-- 32x32: string (nullable = true)
 |    |    |    |    |    |    |-- 48x48: string (nullable = true)
 |    |    |    |    |    |-- displayName: string (nullable = true)
 |    |    |    |    |    |-- emailAddress: string (nullable = true)
 |    |    |    |    |    |-- self: string (nullable = true)
 |    |    |    |    |    |-- timeZone: string (nullable = true)
 |    |    |    |    |-- updated: string (nullable = true)
 |    |    |-- maxResults: long (nullable = true)
 |    |    |-- self: string (nullable = true)
 |    |    |-- startAt: long (nullable = true)
 |    |    |-- total: long (nullable = true)
 |    |-- components: array (nullable = true)
 |    |    |-- element: string (containsNull = true)
 |    |-- created: string (nullable = true)
 |    |-- creator: struct (nullable = true)
 |    |    |-- accountId: string (nullable = true)
 |    |    |-- accountType: string (nullable = true)
 |    |    |-- active: boolean (nullable = true)
 |    |    |-- avatarUrls: struct (nullable = true)
 |    |    |    |-- 16x16: string (nullable = true)
 |    |    |    |-- 24x24: string (nullable = true)
 |    |    |    |-- 32x32: string (nullable = true)
 |    |    |    |-- 48x48: string (nullable = true)
 |    |    |-- displayName: string (nullable = true)
 |    |    |-- emailAddress: string (nullable = true)
 |    |    |-- self: string (nullable = true)
 |    |    |-- timeZone: string (nullable = true)
 |    |-- customfield_10001: string (nullable = true)
 |    |-- customfield_10002: string (nullable = true)
 |    |-- customfield_10003: string (nullable = true)
 |    |-- customfield_10004: string (nullable = true)
 |    |-- customfield_10005: string (nullable = true)
 |    |-- customfield_10006: string (nullable = true)
 |    |-- customfield_10007: string (nullable = true)
 |    |-- customfield_10008: string (nullable = true)
 |    |-- customfield_10009: string (nullable = true)
 |    |-- customfield_10010: string (nullable = true)
 |    |-- customfield_10014: string (nullable = true)
 |    |-- customfield_10015: string (nullable = true)
 |    |-- customfield_10016: string (nullable = true)
 |    |-- customfield_10017: string (nullable = true)
 |    |-- customfield_10018: struct (nullable = true)
 |    |    |-- hasEpicLinkFieldDependency: boolean (nullable = true)
 |    |    |-- nonEditableReason: struct (nullable = true)
 |    |    |    |-- message: string (nullable = true)
 |    |    |    |-- reason: string (nullable = true)
 |    |    |-- showField: boolean (nullable = true)
 |    |-- customfield_10019: string (nullable = true)
 |    |-- customfield_10020: string (nullable = true)
 |    |-- customfield_10021: string (nullable = true)
 |    |-- customfield_10022: string (nullable = true)
 |    |-- customfield_10023: string (nullable = true)
 |    |-- customfield_10024: string (nullable = true)
 |    |-- customfield_10025: string (nullable = true)
 |    |-- customfield_10026: string (nullable = true)
 |    |-- customfield_10027: string (nullable = true)
 |    |-- customfield_10028: string (nullable = true)
 |    |-- customfield_10029: string (nullable = true)
 |    |-- customfield_10030: string (nullable = true)
 |    |-- description: string (nullable = true)
 |    |-- duedate: string (nullable = true)
 |    |-- environment: string (nullable = true)
 |    |-- fixVersions: array (nullable = true)
 |    |    |-- element: string (containsNull = true)
 |    |-- issuelinks: array (nullable = true)
 |    |    |-- element: string (containsNull = true)
 |    |-- issuerestriction: struct (nullable = true)
 |    |    |-- shouldDisplay: boolean (nullable = true)
 |    |-- issuetype: struct (nullable = true)
 |    |    |-- avatarId: long (nullable = true)
 |    |    |-- description: string (nullable = true)
 |    |    |-- entityId: string (nullable = true)
 |    |    |-- hierarchyLevel: long (nullable = true)
 |    |    |-- iconUrl: string (nullable = true)
 |    |    |-- id: string (nullable = true)
 |    |    |-- name: string (nullable = true)
 |    |    |-- self: string (nullable = true)
 |    |    |-- subtask: boolean (nullable = true)
 |    |-- labels: array (nullable = true)
 |    |    |-- element: string (containsNull = true)
 |    |-- lastViewed: string (nullable = true)
 |    |-- priority: struct (nullable = true)
 |    |    |-- iconUrl: string (nullable = true)
 |    |    |-- id: string (nullable = true)
 |    |    |-- name: string (nullable = true)
 |    |    |-- self: string (nullable = true)
 |    |-- progress: struct (nullable = true)
 |    |    |-- progress: long (nullable = true)
 |    |    |-- total: long (nullable = true)
 |    |-- project: struct (nullable = true)
 |    |    |-- avatarUrls: struct (nullable = true)
 |    |    |    |-- 16x16: string (nullable = true)
 |    |    |    |-- 24x24: string (nullable = true)
 |    |    |    |-- 32x32: string (nullable = true)
 |    |    |    |-- 48x48: string (nullable = true)
 |    |    |-- id: string (nullable = true)
 |    |    |-- key: string (nullable = true)
 |    |    |-- name: string (nullable = true)
 |    |    |-- projectTypeKey: string (nullable = true)
 |    |    |-- self: string (nullable = true)
 |    |    |-- simplified: boolean (nullable = true)
 |    |-- reporter: struct (nullable = true)
 |    |    |-- accountId: string (nullable = true)
 |    |    |-- accountType: string (nullable = true)
 |    |    |-- active: boolean (nullable = true)
 |    |    |-- avatarUrls: struct (nullable = true)
 |    |    |    |-- 16x16: string (nullable = true)
 |    |    |    |-- 24x24: string (nullable = true)
 |    |    |    |-- 32x32: string (nullable = true)
 |    |    |    |-- 48x48: string (nullable = true)
 |    |    |-- displayName: string (nullable = true)
 |    |    |-- emailAddress: string (nullable = true)
 |    |    |-- self: string (nullable = true)
 |    |    |-- timeZone: string (nullable = true)
 |    |-- resolution: string (nullable = true)
 |    |-- resolutiondate: string (nullable = true)
 |    |-- security: string (nullable = true)
 |    |-- status: struct (nullable = true)
 |    |    |-- description: string (nullable = true)
 |    |    |-- iconUrl: string (nullable = true)
 |    |    |-- id: string (nullable = true)
 
 |-- id: string (nullable = true)
 |-- key: string (nullable = true)
 |-- self: string (nullable = true)

17 REPLIES

It stayed the same.


-werners-
Esteemed Contributor III

If columns are missing, that particular data is not present in the JSON. I am not aware of Spark skipping columns when reading JSON with schema inference. There is an option, dropFieldIfAllNull, but it is false by default.

That makes me think: you might want to look into the options of read.json.

https://spark.apache.org/docs/latest/sql-data-sources-json.html

@Werner Stinckens Thanks for the support.

Now it's working. When the message came back saying the input was not parallelized, I searched and found the answer. When creating the DataFrame, I changed it to:

df = spark.read.json(sc.parallelize([answer.text]))
