Create a nested struct schema in Spark (Jira schema)

weldermartins
Honored Contributor

Hello guys,

I'm using the Jira API to return issues. To use PySpark, I need to create the DataFrame by passing in a schema, but I can't manage to build the schema for the model below. Do you have any ideas?

root
 |-- expand: string (nullable = true)
 |-- fields: struct (nullable = true)
 |    |-- aggregateprogress: struct (nullable = true)
 |    |    |-- progress: long (nullable = true)
 |    |    |-- total: long (nullable = true)
 |    |-- aggregatetimeestimate: string (nullable = true)
 |    |-- aggregatetimeoriginalestimate: string (nullable = true)
 |    |-- aggregatetimespent: string (nullable = true)
 |    |-- assignee: string (nullable = true)
 |    |-- attachment: array (nullable = true)
 |    |    |-- element: string (containsNull = true)
 |    |-- comment: struct (nullable = true)
 |    |    |-- comments: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |-- author: struct (nullable = true)
 |    |    |    |    |    |-- accountId: string (nullable = true)
 |    |    |    |    |    |-- accountType: string (nullable = true)
 |    |    |    |    |    |-- active: boolean (nullable = true)
 |    |    |    |    |    |-- avatarUrls: struct (nullable = true)
 |    |    |    |    |    |    |-- 16x16: string (nullable = true)
 |    |    |    |    |    |    |-- 24x24: string (nullable = true)
 |    |    |    |    |    |    |-- 32x32: string (nullable = true)
 |    |    |    |    |    |    |-- 48x48: string (nullable = true)
 |    |    |    |    |    |-- displayName: string (nullable = true)
 |    |    |    |    |    |-- emailAddress: string (nullable = true)
 |    |    |    |    |    |-- self: string (nullable = true)
 |    |    |    |    |    |-- timeZone: string (nullable = true)
 |    |    |    |    |-- body: struct (nullable = true)
 |    |    |    |    |    |-- content: array (nullable = true)
 |    |    |    |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |    |    |    |-- content: array (nullable = true)
 |    |    |    |    |    |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |    |    |    |    |    |-- text: string (nullable = true)
 |    |    |    |    |    |    |    |    |    |-- type: string (nullable = true)
 |    |    |    |    |    |    |    |-- type: string (nullable = true)
 |    |    |    |    |    |-- type: string (nullable = true)
 |    |    |    |    |    |-- version: long (nullable = true)
 |    |    |    |    |-- created: string (nullable = true)
 |    |    |    |    |-- id: string (nullable = true)
 |    |    |    |    |-- jsdPublic: boolean (nullable = true)
 |    |    |    |    |-- self: string (nullable = true)
 |    |    |    |    |-- updateAuthor: struct (nullable = true)
 |    |    |    |    |    |-- accountId: string (nullable = true)
 |    |    |    |    |    |-- accountType: string (nullable = true)
 |    |    |    |    |    |-- active: boolean (nullable = true)
 |    |    |    |    |    |-- avatarUrls: struct (nullable = true)
 |    |    |    |    |    |    |-- 16x16: string (nullable = true)
 |    |    |    |    |    |    |-- 24x24: string (nullable = true)
 |    |    |    |    |    |    |-- 32x32: string (nullable = true)
 |    |    |    |    |    |    |-- 48x48: string (nullable = true)
 |    |    |    |    |    |-- displayName: string (nullable = true)
 |    |    |    |    |    |-- emailAddress: string (nullable = true)
 |    |    |    |    |    |-- self: string (nullable = true)
 |    |    |    |    |    |-- timeZone: string (nullable = true)
 |    |    |    |    |-- updated: string (nullable = true)
 |    |    |-- maxResults: long (nullable = true)
 |    |    |-- self: string (nullable = true)
 |    |    |-- startAt: long (nullable = true)
 |    |    |-- total: long (nullable = true)
 |    |-- components: array (nullable = true)
 |    |    |-- element: string (containsNull = true)
 |    |-- created: string (nullable = true)
 |    |-- creator: struct (nullable = true)
 |    |    |-- accountId: string (nullable = true)
 |    |    |-- accountType: string (nullable = true)
 |    |    |-- active: boolean (nullable = true)
 |    |    |-- avatarUrls: struct (nullable = true)
 |    |    |    |-- 16x16: string (nullable = true)
 |    |    |    |-- 24x24: string (nullable = true)
 |    |    |    |-- 32x32: string (nullable = true)
 |    |    |    |-- 48x48: string (nullable = true)
 |    |    |-- displayName: string (nullable = true)
 |    |    |-- emailAddress: string (nullable = true)
 |    |    |-- self: string (nullable = true)
 |    |    |-- timeZone: string (nullable = true)
 |    |-- customfield_10001: string (nullable = true)
 |    |-- customfield_10002: string (nullable = true)
 |    |-- customfield_10003: string (nullable = true)
 |    |-- customfield_10004: string (nullable = true)
 |    |-- customfield_10005: string (nullable = true)
 |    |-- customfield_10006: string (nullable = true)
 |    |-- customfield_10007: string (nullable = true)
 |    |-- customfield_10008: string (nullable = true)
 |    |-- customfield_10009: string (nullable = true)
 |    |-- customfield_10010: string (nullable = true)
 |    |-- customfield_10014: string (nullable = true)
 |    |-- customfield_10015: string (nullable = true)
 |    |-- customfield_10016: string (nullable = true)
 |    |-- customfield_10017: string (nullable = true)
 |    |-- customfield_10018: struct (nullable = true)
 |    |    |-- hasEpicLinkFieldDependency: boolean (nullable = true)
 |    |    |-- nonEditableReason: struct (nullable = true)
 |    |    |    |-- message: string (nullable = true)
 |    |    |    |-- reason: string (nullable = true)
 |    |    |-- showField: boolean (nullable = true)
 |    |-- customfield_10019: string (nullable = true)
 |    |-- customfield_10020: string (nullable = true)
 |    |-- customfield_10021: string (nullable = true)
 |    |-- customfield_10022: string (nullable = true)
 |    |-- customfield_10023: string (nullable = true)
 |    |-- customfield_10024: string (nullable = true)
 |    |-- customfield_10025: string (nullable = true)
 |    |-- customfield_10026: string (nullable = true)
 |    |-- customfield_10027: string (nullable = true)
 |    |-- customfield_10028: string (nullable = true)
 |    |-- customfield_10029: string (nullable = true)
 |    |-- customfield_10030: string (nullable = true)
 |    |-- description: string (nullable = true)
 |    |-- duedate: string (nullable = true)
 |    |-- environment: string (nullable = true)
 |    |-- fixVersions: array (nullable = true)
 |    |    |-- element: string (containsNull = true)
 |    |-- issuelinks: array (nullable = true)
 |    |    |-- element: string (containsNull = true)
 |    |-- issuerestriction: struct (nullable = true)
 |    |    |-- shouldDisplay: boolean (nullable = true)
 |    |-- issuetype: struct (nullable = true)
 |    |    |-- avatarId: long (nullable = true)
 |    |    |-- description: string (nullable = true)
 |    |    |-- entityId: string (nullable = true)
 |    |    |-- hierarchyLevel: long (nullable = true)
 |    |    |-- iconUrl: string (nullable = true)
 |    |    |-- id: string (nullable = true)
 |    |    |-- name: string (nullable = true)
 |    |    |-- self: string (nullable = true)
 |    |    |-- subtask: boolean (nullable = true)
 |    |-- labels: array (nullable = true)
 |    |    |-- element: string (containsNull = true)
 |    |-- lastViewed: string (nullable = true)
 |    |-- priority: struct (nullable = true)
 |    |    |-- iconUrl: string (nullable = true)
 |    |    |-- id: string (nullable = true)
 |    |    |-- name: string (nullable = true)
 |    |    |-- self: string (nullable = true)
 |    |-- progress: struct (nullable = true)
 |    |    |-- progress: long (nullable = true)
 |    |    |-- total: long (nullable = true)
 |    |-- project: struct (nullable = true)
 |    |    |-- avatarUrls: struct (nullable = true)
 |    |    |    |-- 16x16: string (nullable = true)
 |    |    |    |-- 24x24: string (nullable = true)
 |    |    |    |-- 32x32: string (nullable = true)
 |    |    |    |-- 48x48: string (nullable = true)
 |    |    |-- id: string (nullable = true)
 |    |    |-- key: string (nullable = true)
 |    |    |-- name: string (nullable = true)
 |    |    |-- projectTypeKey: string (nullable = true)
 |    |    |-- self: string (nullable = true)
 |    |    |-- simplified: boolean (nullable = true)
 |    |-- reporter: struct (nullable = true)
 |    |    |-- accountId: string (nullable = true)
 |    |    |-- accountType: string (nullable = true)
 |    |    |-- active: boolean (nullable = true)
 |    |    |-- avatarUrls: struct (nullable = true)
 |    |    |    |-- 16x16: string (nullable = true)
 |    |    |    |-- 24x24: string (nullable = true)
 |    |    |    |-- 32x32: string (nullable = true)
 |    |    |    |-- 48x48: string (nullable = true)
 |    |    |-- displayName: string (nullable = true)
 |    |    |-- emailAddress: string (nullable = true)
 |    |    |-- self: string (nullable = true)
 |    |    |-- timeZone: string (nullable = true)
 |    |-- resolution: string (nullable = true)
 |    |-- resolutiondate: string (nullable = true)
 |    |-- security: string (nullable = true)
 |    |-- status: struct (nullable = true)
 |    |    |-- description: string (nullable = true)
 |    |    |-- iconUrl: string (nullable = true)
 |    |    |-- id: string (nullable = true)
 
 |-- id: string (nullable = true)
 |-- key: string (nullable = true)
 |-- self: string (nullable = true)
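One way to avoid typing this whole tree by hand is to derive it from a sample issue. The sketch below uses hypothetical helpers (struct_for and spark_type_for are not Spark APIs) that walk one decoded JSON issue and build a schema in the same dictionary form that StructType.jsonValue() produces. It assumes Spark-like inference rules: nulls become string and empty arrays become array<string> (which matches assignee and attachment in the printout), integers become long, and struct fields are sorted by name. Unlike Spark, it inspects a single record and only the first element of each array, so a field that is null in the sample stays string.

```python
import json
from typing import Any, Dict


def spark_type_for(value: Any) -> Any:
    """Map one decoded JSON value to a type in StructType.jsonValue() form."""
    if isinstance(value, dict):
        return struct_for(value)
    if isinstance(value, list):
        # Empty arrays fall back to array<string>; otherwise only the first element is inspected.
        element = spark_type_for(value[0]) if value else "string"
        return {"type": "array", "elementType": element, "containsNull": True}
    if isinstance(value, bool):  # test bool before int, because bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    # Strings, and nulls (no better type information is available for them).
    return "string"


def struct_for(record: Dict[str, Any]) -> Dict[str, Any]:
    """Build a struct type whose nullable fields are sorted by name."""
    return {
        "type": "struct",
        "fields": [
            {"name": name, "type": spark_type_for(record[name]), "nullable": True, "metadata": {}}
            for name in sorted(record)
        ],
    }


# Hypothetical, heavily trimmed issue in the shape of the Jira model in the question.
sample_issue = json.loads("""
{"id": "10001", "key": "PROJ-1",
 "fields": {"assignee": null, "attachment": [],
            "progress": {"progress": 0, "total": 0},
            "creator": {"active": true, "avatarUrls": {"16x16": "https://example.invalid/a.png"}}}}
""")
schema_json = struct_for(sample_issue)
print(json.dumps(schema_json, indent=2))
```

In a notebook, StructType.fromJson(schema_json) from pyspark.sql.types should turn the dictionary into a real schema object that can be passed to spark.createDataFrame or spark.read.schema.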


It stayed the same.


-werners-
Esteemed Contributor III

If columns are missing, that particular data is simply not present in the JSON. I am not aware of Spark skipping columns when reading JSON with schema inference. There is an option, dropFieldIfAllNull, but it is false by default.

That said, you might want to look into the options of read.json:

https://spark.apache.org/docs/latest/sql-data-sources-json.html
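Besides the options on that page, DataFrameReader.schema(...) accepts either a StructType or a DDL string, and a DDL string is often easier to keep in a notebook than deeply nested StructField calls. The sketch below converts a schema in the StructType.jsonValue() dictionary form into such a string. The helper names (quote, ddl_type, to_ddl, field) and the trimmed example schema, a few fields taken from the Jira model in the question, are hypothetical. Field names are wrapped in backticks so keys like 16x16 remain valid identifiers.

```python
from typing import Any, Dict

# Spark's JSON type names mapped to DDL keywords; other atomic names are upper-cased as-is.
ATOMIC_DDL = {"string": "STRING", "long": "BIGINT", "boolean": "BOOLEAN", "double": "DOUBLE"}


def quote(name: str) -> str:
    """Backtick-quote a field name; embedded backticks are doubled."""
    return "`" + name.replace("`", "``") + "`"


def ddl_type(t: Any) -> str:
    """Render a type in StructType.jsonValue() form as a Spark SQL DDL type."""
    if isinstance(t, str):
        return ATOMIC_DDL.get(t, t.upper())
    if t["type"] == "array":
        return f"ARRAY<{ddl_type(t['elementType'])}>"
    if t["type"] == "struct":
        inner = ", ".join(f"{quote(fld['name'])}: {ddl_type(fld['type'])}" for fld in t["fields"])
        return f"STRUCT<{inner}>"
    raise ValueError(f"unsupported type: {t!r}")


def to_ddl(schema: Dict[str, Any]) -> str:
    """Write top-level columns as name TYPE, separated by commas."""
    return ", ".join(f"{quote(fld['name'])} {ddl_type(fld['type'])}" for fld in schema["fields"])


def field(name: str, type_: Any) -> Dict[str, Any]:
    """Small helper that keeps the example schema readable."""
    return {"name": name, "type": type_, "nullable": True, "metadata": {}}


# Hypothetical subset of the Jira issue schema from the question.
schema = {"type": "struct", "fields": [
    field("expand", "string"),
    field("fields", {"type": "struct", "fields": [
        field("aggregateprogress", {"type": "struct", "fields": [
            field("progress", "long"), field("total", "long")]}),
        field("labels", {"type": "array", "elementType": "string", "containsNull": True}),
    ]}),
    field("id", "string"),
]}

print(to_ddl(schema))
```

Something like spark.read.schema(to_ddl(schema)).json(rdd) would then skip inference entirely, and any field absent from a record would come back as null instead of disappearing from the schema.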

Now it's working. When the error message said the data was not parallelized, I searched and found the answer. When creating the DataFrame, I changed it to:

df = spark.read.json(sc.parallelize([answer.text]))

@Werner Stinckens Thanks for the support.
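In PySpark, spark.read.json treats a plain string or list as file paths, so JSON text held in memory has to be wrapped in an RDD of strings, which is what sc.parallelize does in the spark.read.json(sc.parallelize([answer.text])) call. Each string becomes one record. If answer.text is the full body of a Jira search call, that produces a single row holding an issues array rather than one row per issue as in the schema from the question. The sketch below assumes the body has a top-level "issues" key (as Jira's search endpoint returns) and splits it into one JSON string per issue; issues_to_json_lines is a hypothetical helper.

```python
import json
from typing import List


def issues_to_json_lines(response_text: str) -> List[str]:
    """Split a Jira search response body into one JSON document per issue.

    Assumes a top-level "issues" array; any other body is returned unchanged
    as a single record.
    """
    payload = json.loads(response_text)
    if isinstance(payload, dict) and isinstance(payload.get("issues"), list):
        return [json.dumps(issue) for issue in payload["issues"]]
    return [response_text]


# Hypothetical, trimmed search response.
body = ('{"startAt": 0, "maxResults": 50, "total": 2, '
        '"issues": [{"id": "1", "key": "PROJ-1"}, {"id": "2", "key": "PROJ-2"}]}')
lines = issues_to_json_lines(body)
print(lines)
```

With that helper, df = spark.read.json(sc.parallelize(issues_to_json_lines(answer.text))) should give one row per issue, with expand, fields, id, key and self as top-level columns.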
