10-08-2022 06:20 AM
Hello guys,
I'm using the Jira API to return issues. To work with them in PySpark I need to create the DataFrame by passing in a schema, but I am not able to build a schema for the model below. Would you have any ideas?
root
|-- expand: string (nullable = true)
|-- fields: struct (nullable = true)
| |-- aggregateprogress: struct (nullable = true)
| | |-- progress: long (nullable = true)
| | |-- total: long (nullable = true)
| |-- aggregatetimeestimate: string (nullable = true)
| |-- aggregatetimeoriginalestimate: string (nullable = true)
| |-- aggregatetimespent: string (nullable = true)
| |-- assignee: string (nullable = true)
| |-- attachment: array (nullable = true)
| | |-- element: string (containsNull = true)
| |-- comment: struct (nullable = true)
| | |-- comments: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- author: struct (nullable = true)
| | | | | |-- accountId: string (nullable = true)
| | | | | |-- accountType: string (nullable = true)
| | | | | |-- active: boolean (nullable = true)
| | | | | |-- avatarUrls: struct (nullable = true)
| | | | | | |-- 16x16: string (nullable = true)
| | | | | | |-- 24x24: string (nullable = true)
| | | | | | |-- 32x32: string (nullable = true)
| | | | | | |-- 48x48: string (nullable = true)
| | | | | |-- displayName: string (nullable = true)
| | | | | |-- emailAddress: string (nullable = true)
| | | | | |-- self: string (nullable = true)
| | | | | |-- timeZone: string (nullable = true)
| | | | |-- body: struct (nullable = true)
| | | | | |-- content: array (nullable = true)
| | | | | | |-- element: struct (containsNull = true)
| | | | | | | |-- content: array (nullable = true)
| | | | | | | | |-- element: struct (containsNull = true)
| | | | | | | | | |-- text: string (nullable = true)
| | | | | | | | | |-- type: string (nullable = true)
| | | | | | | |-- type: string (nullable = true)
| | | | | |-- type: string (nullable = true)
| | | | | |-- version: long (nullable = true)
| | | | |-- created: string (nullable = true)
| | | | |-- id: string (nullable = true)
| | | | |-- jsdPublic: boolean (nullable = true)
| | | | |-- self: string (nullable = true)
| | | | |-- updateAuthor: struct (nullable = true)
| | | | | |-- accountId: string (nullable = true)
| | | | | |-- accountType: string (nullable = true)
| | | | | |-- active: boolean (nullable = true)
| | | | | |-- avatarUrls: struct (nullable = true)
| | | | | | |-- 16x16: string (nullable = true)
| | | | | | |-- 24x24: string (nullable = true)
| | | | | | |-- 32x32: string (nullable = true)
| | | | | | |-- 48x48: string (nullable = true)
| | | | | |-- displayName: string (nullable = true)
| | | | | |-- emailAddress: string (nullable = true)
| | | | | |-- self: string (nullable = true)
| | | | | |-- timeZone: string (nullable = true)
| | | | |-- updated: string (nullable = true)
| | |-- maxResults: long (nullable = true)
| | |-- self: string (nullable = true)
| | |-- startAt: long (nullable = true)
| | |-- total: long (nullable = true)
| |-- components: array (nullable = true)
| | |-- element: string (containsNull = true)
| |-- created: string (nullable = true)
| |-- creator: struct (nullable = true)
| | |-- accountId: string (nullable = true)
| | |-- accountType: string (nullable = true)
| | |-- active: boolean (nullable = true)
| | |-- avatarUrls: struct (nullable = true)
| | | |-- 16x16: string (nullable = true)
| | | |-- 24x24: string (nullable = true)
| | | |-- 32x32: string (nullable = true)
| | | |-- 48x48: string (nullable = true)
| | |-- displayName: string (nullable = true)
| | |-- emailAddress: string (nullable = true)
| | |-- self: string (nullable = true)
| | |-- timeZone: string (nullable = true)
| |-- customfield_10001: string (nullable = true)
| |-- customfield_10002: string (nullable = true)
| |-- customfield_10003: string (nullable = true)
| |-- customfield_10004: string (nullable = true)
| |-- customfield_10005: string (nullable = true)
| |-- customfield_10006: string (nullable = true)
| |-- customfield_10007: string (nullable = true)
| |-- customfield_10008: string (nullable = true)
| |-- customfield_10009: string (nullable = true)
| |-- customfield_10010: string (nullable = true)
| |-- customfield_10014: string (nullable = true)
| |-- customfield_10015: string (nullable = true)
| |-- customfield_10016: string (nullable = true)
| |-- customfield_10017: string (nullable = true)
| |-- customfield_10018: struct (nullable = true)
| | |-- hasEpicLinkFieldDependency: boolean (nullable = true)
| | |-- nonEditableReason: struct (nullable = true)
| | | |-- message: string (nullable = true)
| | | |-- reason: string (nullable = true)
| | |-- showField: boolean (nullable = true)
| |-- customfield_10019: string (nullable = true)
| |-- customfield_10020: string (nullable = true)
| |-- customfield_10021: string (nullable = true)
| |-- customfield_10022: string (nullable = true)
| |-- customfield_10023: string (nullable = true)
| |-- customfield_10024: string (nullable = true)
| |-- customfield_10025: string (nullable = true)
| |-- customfield_10026: string (nullable = true)
| |-- customfield_10027: string (nullable = true)
| |-- customfield_10028: string (nullable = true)
| |-- customfield_10029: string (nullable = true)
| |-- customfield_10030: string (nullable = true)
| |-- description: string (nullable = true)
| |-- duedate: string (nullable = true)
| |-- environment: string (nullable = true)
| |-- fixVersions: array (nullable = true)
| | |-- element: string (containsNull = true)
| |-- issuelinks: array (nullable = true)
| | |-- element: string (containsNull = true)
| |-- issuerestriction: struct (nullable = true)
| | |-- shouldDisplay: boolean (nullable = true)
| |-- issuetype: struct (nullable = true)
| | |-- avatarId: long (nullable = true)
| | |-- description: string (nullable = true)
| | |-- entityId: string (nullable = true)
| | |-- hierarchyLevel: long (nullable = true)
| | |-- iconUrl: string (nullable = true)
| | |-- id: string (nullable = true)
| | |-- name: string (nullable = true)
| | |-- self: string (nullable = true)
| | |-- subtask: boolean (nullable = true)
| |-- labels: array (nullable = true)
| | |-- element: string (containsNull = true)
| |-- lastViewed: string (nullable = true)
| |-- priority: struct (nullable = true)
| | |-- iconUrl: string (nullable = true)
| | |-- id: string (nullable = true)
| | |-- name: string (nullable = true)
| | |-- self: string (nullable = true)
| |-- progress: struct (nullable = true)
| | |-- progress: long (nullable = true)
| | |-- total: long (nullable = true)
| |-- project: struct (nullable = true)
| | |-- avatarUrls: struct (nullable = true)
| | | |-- 16x16: string (nullable = true)
| | | |-- 24x24: string (nullable = true)
| | | |-- 32x32: string (nullable = true)
| | | |-- 48x48: string (nullable = true)
| | |-- id: string (nullable = true)
| | |-- key: string (nullable = true)
| | |-- name: string (nullable = true)
| | |-- projectTypeKey: string (nullable = true)
| | |-- self: string (nullable = true)
| | |-- simplified: boolean (nullable = true)
| |-- reporter: struct (nullable = true)
| | |-- accountId: string (nullable = true)
| | |-- accountType: string (nullable = true)
| | |-- active: boolean (nullable = true)
| | |-- avatarUrls: struct (nullable = true)
| | | |-- 16x16: string (nullable = true)
| | | |-- 24x24: string (nullable = true)
| | | |-- 32x32: string (nullable = true)
| | | |-- 48x48: string (nullable = true)
| | |-- displayName: string (nullable = true)
| | |-- emailAddress: string (nullable = true)
| | |-- self: string (nullable = true)
| | |-- timeZone: string (nullable = true)
| |-- resolution: string (nullable = true)
| |-- resolutiondate: string (nullable = true)
| |-- security: string (nullable = true)
| |-- status: struct (nullable = true)
| | |-- description: string (nullable = true)
| | |-- iconUrl: string (nullable = true)
| | |-- id: string (nullable = true)
|-- id: string (nullable = true)
|-- key: string (nullable = true)
|-- self: string (nullable = true)
10-10-2022 09:34 AM
10-11-2022 01:21 AM
If columns are missing, that particular data is not present in the JSON. I am not aware of Spark skipping columns when reading JSON with schema inference. There is an option, dropFieldIfAllNull, but it is false by default.
That makes me think: you might want to look into the options of read.json:
https://spark.apache.org/docs/latest/sql-data-sources-json.html
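If the goal is a stable set of columns even when a given response lacks some fields, one option is to derive the schema once from a representative sample and reuse it. Below is a minimal sketch in plain Python, not an official Spark utility: the helper name `infer_type` and the sample values are made up for illustration, and it assumes arrays are homogeneous. It produces the JSON form of a Spark schema, maps nulls and empty arrays to string (as the printed schema above appears to do for fields like `assignee` and `labels`), and sorts field names alphabetically.

```python
import json


def infer_type(value):
    """Map one decoded JSON value to Spark's JSON schema representation."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, dict):
        return {
            "type": "struct",
            "fields": [
                {"name": k, "type": infer_type(v), "nullable": True, "metadata": {}}
                for k, v in sorted(value.items())
            ],
        }
    if isinstance(value, list):
        # Assumption: all elements share one type, so the first non-null one decides.
        non_null = [v for v in value if v is not None]
        element = infer_type(non_null[0]) if non_null else "string"
        return {"type": "array", "elementType": element, "containsNull": True}
    return "string"  # str and None both fall back to string


# Hypothetical, heavily trimmed issue payload; field names follow the schema above.
sample = json.loads(
    '{"id": "10001", "key": "PRJ-1", "fields": {"labels": [], "assignee": null,'
    ' "progress": {"progress": 0, "total": 0},'
    ' "issuerestriction": {"shouldDisplay": false}}}'
)
schema_json = infer_type(sample)
print(json.dumps(schema_json, indent=2))
```

With PySpark available, `StructType.fromJson(schema_json)` should turn this dict into a schema object that can be handed to `spark.read.schema(...)` before calling `.json(...)`. Fields missing from a later response would then come back as null columns instead of disappearing.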
10-11-2022 04:19 AM
Now it's working. When the error message said the input was not parallelized, I searched and found the answer. When creating the DataFrame I changed the call to:
df = spark.read.json(sc.parallelize([answer.text]))
@Werner Stinckens Thanks for the support.
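The reason the change helps is that `spark.read.json` treats a plain string argument as a file path, while an RDD of strings is read as JSON text. A small wrapper along these lines captures the pattern. It is only a sketch: the function name `json_text_to_df` and the optional `schema` parameter are illustrative additions, not part of the original post.

```python
def json_text_to_df(spark, json_text, schema=None):
    """Read one JSON document held in a Python string into a DataFrame.

    A plain str passed to spark.read.json is interpreted as a path, so the
    text is wrapped in a one-element RDD of strings instead.
    """
    rdd = spark.sparkContext.parallelize([json_text])
    reader = spark.read
    if schema is not None:
        # An explicit schema (StructType or DDL string) skips inference.
        reader = reader.schema(schema)
    return reader.json(rdd)
```

In the setting of this thread it would be called as `json_text_to_df(spark, answer.text)`, where `answer` is the HTTP response from the Jira API; calling `printSchema()` on the result should then show the inferred structure.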