10-08-2022 06:20 AM
Hello guys,
I'm using the Jira API to return issues. To work with them in PySpark, I need to create the DataFrame by passing in a schema, but I am not able to write the schema for the structure below. Would you have any ideas?
root
|-- expand: string (nullable = true)
|-- fields: struct (nullable = true)
| |-- aggregateprogress: struct (nullable = true)
| | |-- progress: long (nullable = true)
| | |-- total: long (nullable = true)
| |-- aggregatetimeestimate: string (nullable = true)
| |-- aggregatetimeoriginalestimate: string (nullable = true)
| |-- aggregatetimespent: string (nullable = true)
| |-- assignee: string (nullable = true)
| |-- attachment: array (nullable = true)
| | |-- element: string (containsNull = true)
| |-- comment: struct (nullable = true)
| | |-- comments: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- author: struct (nullable = true)
| | | | | |-- accountId: string (nullable = true)
| | | | | |-- accountType: string (nullable = true)
| | | | | |-- active: boolean (nullable = true)
| | | | | |-- avatarUrls: struct (nullable = true)
| | | | | | |-- 16x16: string (nullable = true)
| | | | | | |-- 24x24: string (nullable = true)
| | | | | | |-- 32x32: string (nullable = true)
| | | | | | |-- 48x48: string (nullable = true)
| | | | | |-- displayName: string (nullable = true)
| | | | | |-- emailAddress: string (nullable = true)
| | | | | |-- self: string (nullable = true)
| | | | | |-- timeZone: string (nullable = true)
| | | | |-- body: struct (nullable = true)
| | | | | |-- content: array (nullable = true)
| | | | | | |-- element: struct (containsNull = true)
| | | | | | | |-- content: array (nullable = true)
| | | | | | | | |-- element: struct (containsNull = true)
| | | | | | | | | |-- text: string (nullable = true)
| | | | | | | | | |-- type: string (nullable = true)
| | | | | | | |-- type: string (nullable = true)
| | | | | |-- type: string (nullable = true)
| | | | | |-- version: long (nullable = true)
| | | | |-- created: string (nullable = true)
| | | | |-- id: string (nullable = true)
| | | | |-- jsdPublic: boolean (nullable = true)
| | | | |-- self: string (nullable = true)
| | | | |-- updateAuthor: struct (nullable = true)
| | | | | |-- accountId: string (nullable = true)
| | | | | |-- accountType: string (nullable = true)
| | | | | |-- active: boolean (nullable = true)
| | | | | |-- avatarUrls: struct (nullable = true)
| | | | | | |-- 16x16: string (nullable = true)
| | | | | | |-- 24x24: string (nullable = true)
| | | | | | |-- 32x32: string (nullable = true)
| | | | | | |-- 48x48: string (nullable = true)
| | | | | |-- displayName: string (nullable = true)
| | | | | |-- emailAddress: string (nullable = true)
| | | | | |-- self: string (nullable = true)
| | | | | |-- timeZone: string (nullable = true)
| | | | |-- updated: string (nullable = true)
| | |-- maxResults: long (nullable = true)
| | |-- self: string (nullable = true)
| | |-- startAt: long (nullable = true)
| | |-- total: long (nullable = true)
| |-- components: array (nullable = true)
| | |-- element: string (containsNull = true)
| |-- created: string (nullable = true)
| |-- creator: struct (nullable = true)
| | |-- accountId: string (nullable = true)
| | |-- accountType: string (nullable = true)
| | |-- active: boolean (nullable = true)
| | |-- avatarUrls: struct (nullable = true)
| | | |-- 16x16: string (nullable = true)
| | | |-- 24x24: string (nullable = true)
| | | |-- 32x32: string (nullable = true)
| | | |-- 48x48: string (nullable = true)
| | |-- displayName: string (nullable = true)
| | |-- emailAddress: string (nullable = true)
| | |-- self: string (nullable = true)
| | |-- timeZone: string (nullable = true)
| |-- customfield_10001: string (nullable = true)
| |-- customfield_10002: string (nullable = true)
| |-- customfield_10003: string (nullable = true)
| |-- customfield_10004: string (nullable = true)
| |-- customfield_10005: string (nullable = true)
| |-- customfield_10006: string (nullable = true)
| |-- customfield_10007: string (nullable = true)
| |-- customfield_10008: string (nullable = true)
| |-- customfield_10009: string (nullable = true)
| |-- customfield_10010: string (nullable = true)
| |-- customfield_10014: string (nullable = true)
| |-- customfield_10015: string (nullable = true)
| |-- customfield_10016: string (nullable = true)
| |-- customfield_10017: string (nullable = true)
| |-- customfield_10018: struct (nullable = true)
| | |-- hasEpicLinkFieldDependency: boolean (nullable = true)
| | |-- nonEditableReason: struct (nullable = true)
| | | |-- message: string (nullable = true)
| | | |-- reason: string (nullable = true)
| | |-- showField: boolean (nullable = true)
| |-- customfield_10019: string (nullable = true)
| |-- customfield_10020: string (nullable = true)
| |-- customfield_10021: string (nullable = true)
| |-- customfield_10022: string (nullable = true)
| |-- customfield_10023: string (nullable = true)
| |-- customfield_10024: string (nullable = true)
| |-- customfield_10025: string (nullable = true)
| |-- customfield_10026: string (nullable = true)
| |-- customfield_10027: string (nullable = true)
| |-- customfield_10028: string (nullable = true)
| |-- customfield_10029: string (nullable = true)
| |-- customfield_10030: string (nullable = true)
| |-- description: string (nullable = true)
| |-- duedate: string (nullable = true)
| |-- environment: string (nullable = true)
| |-- fixVersions: array (nullable = true)
| | |-- element: string (containsNull = true)
| |-- issuelinks: array (nullable = true)
| | |-- element: string (containsNull = true)
| |-- issuerestriction: struct (nullable = true)
| | |-- shouldDisplay: boolean (nullable = true)
| |-- issuetype: struct (nullable = true)
| | |-- avatarId: long (nullable = true)
| | |-- description: string (nullable = true)
| | |-- entityId: string (nullable = true)
| | |-- hierarchyLevel: long (nullable = true)
| | |-- iconUrl: string (nullable = true)
| | |-- id: string (nullable = true)
| | |-- name: string (nullable = true)
| | |-- self: string (nullable = true)
| | |-- subtask: boolean (nullable = true)
| |-- labels: array (nullable = true)
| | |-- element: string (containsNull = true)
| |-- lastViewed: string (nullable = true)
| |-- priority: struct (nullable = true)
| | |-- iconUrl: string (nullable = true)
| | |-- id: string (nullable = true)
| | |-- name: string (nullable = true)
| | |-- self: string (nullable = true)
| |-- progress: struct (nullable = true)
| | |-- progress: long (nullable = true)
| | |-- total: long (nullable = true)
| |-- project: struct (nullable = true)
| | |-- avatarUrls: struct (nullable = true)
| | | |-- 16x16: string (nullable = true)
| | | |-- 24x24: string (nullable = true)
| | | |-- 32x32: string (nullable = true)
| | | |-- 48x48: string (nullable = true)
| | |-- id: string (nullable = true)
| | |-- key: string (nullable = true)
| | |-- name: string (nullable = true)
| | |-- projectTypeKey: string (nullable = true)
| | |-- self: string (nullable = true)
| | |-- simplified: boolean (nullable = true)
| |-- reporter: struct (nullable = true)
| | |-- accountId: string (nullable = true)
| | |-- accountType: string (nullable = true)
| | |-- active: boolean (nullable = true)
| | |-- avatarUrls: struct (nullable = true)
| | | |-- 16x16: string (nullable = true)
| | | |-- 24x24: string (nullable = true)
| | | |-- 32x32: string (nullable = true)
| | | |-- 48x48: string (nullable = true)
| | |-- displayName: string (nullable = true)
| | |-- emailAddress: string (nullable = true)
| | |-- self: string (nullable = true)
| | |-- timeZone: string (nullable = true)
| |-- resolution: string (nullable = true)
| |-- resolutiondate: string (nullable = true)
| |-- security: string (nullable = true)
| |-- status: struct (nullable = true)
| | |-- description: string (nullable = true)
| | |-- iconUrl: string (nullable = true)
| | |-- id: string (nullable = true)
|-- id: string (nullable = true)
|-- key: string (nullable = true)
|-- self: string (nullable = true)
Accepted Solutions
10-11-2022 04:19 AM
It's working now. When the error message said that the data was not parallelized, I searched and found the answer. When creating the DataFrame, I changed the call to:
df = spark.read.json(sc.parallelize([answer.text]))
@Werner Stinckens Thanks for the support.
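As a rough illustration of the accepted fix, the sketch below wraps the response text in a single-element RDD, lets Spark infer the schema, and then keeps only a few nested fields. The sample document and the helper names (`SAMPLE_ISSUE`, `issue_json_to_df`, `pick_columns`) are hypothetical; only the field names come from the schema printed in the question.

```python
import json

# Hypothetical stand-in for the text of the Jira REST response (answer.text).
SAMPLE_ISSUE = json.dumps({
    "id": "10001",
    "key": "DEMO-1",
    "fields": {
        "status": {"id": "3"},
        "comment": {"comments": [{"id": "1", "created": "2022-10-08"}]},
    },
})


def issue_json_to_df(spark, json_text):
    """Parse one JSON document held in a Python string, inferring the schema."""
    # spark.read.json accepts an RDD of strings, so the string is wrapped
    # in a single-element RDD first.
    rdd = spark.sparkContext.parallelize([json_text])
    return spark.read.json(rdd)


def pick_columns(df):
    """Keep only a few nested fields, addressed with dotted paths."""
    return df.selectExpr(
        "key",
        "fields.status.id AS status_id",
        "size(fields.comment.comments) AS n_comments",
    )
```

In a notebook where `spark` is already defined, this could be called as `pick_columns(issue_json_to_df(spark, answer.text))`.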
10-08-2022 06:39 AM
@Werner Stinckens or @Hubert Dudek Could you help me?
I don't need all of the information, only some of the fields. However, so far I have only managed to do this from a static file.
10-10-2022 01:24 AM
Do you want help with how to define the schema?
10-10-2022 05:44 AM
Yes, it is returning null values as in the example I showed above.
10-10-2022 04:53 AM
@Werner Stinckens If you look at the schema shown above, it has many levels and sub-levels of structs and arrays. The schema I created returns only null values, and I don't know where I'm going wrong.
schema = StructType([
    StructField('fields', StructType([
        StructField('comment', StructType([
            StructField("comments", ArrayType(StructField('body', StringType())), True),
        ])),
    ])),
    StructField('id', StringType()),
    StructField('key', StringType()),
    StructField('self', StringType())
])
df = spark.createDataFrame([response], schema)
df = df.withColumn("fields", explode("fields"))\
    .withColumn("comment", explode("fields.comment"))\
    .withColumn("comments", explode("comment.comments"))
10-10-2022 05:01 AM
Well, your schema seems ok, but I can't tell without the data itself.
Can you read the JSON files from JIRA with schema inference and then compare?
10-10-2022 05:08 AM
I did the same: I copied the JSON response and saved it to a file to check the schema. But it is not returning the comment data that is in the body.
10-10-2022 05:15 AM
I mean a printSchema() or something similar of the JSON once you have read it into a DataFrame.
Do you see all the data when you read the JSON with schema inference?
10-10-2022 05:18 AM
10-10-2022 05:21 AM
I'd land the JSON coming from the REST call first, and then process it.
Are you currently calling the API with 'requests' or something similar?
10-10-2022 05:24 AM
I included a picture in my post above, please take a look.
10-10-2022 05:25 AM
Save the response as a JSON file on your data lake, read it with Spark, and you have your schema.
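A minimal sketch of that land-then-read approach, assuming the response text is already in a Python string and that the target path is reachable both from plain Python and from Spark (the path and the helper names `land_response` and `read_landed` are hypothetical):

```python
from pathlib import Path


def land_response(json_text, path):
    """Write the raw response text to a file so Spark can read it later."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json_text, encoding="utf-8")
    return str(target)


def read_landed(spark, path):
    """Read the landed file with schema inference."""
    # multiLine=True lets Spark parse a document that spans several lines;
    # by default it expects one JSON object per line.
    return spark.read.json(path, multiLine=True)
```

After `df = read_landed(spark, path)`, calling `df.printSchema()` shows the inferred structure, and `df.schema` returns it as a `StructType` that can be passed to later reads.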
10-10-2022 05:31 AM
I had already done that, but I would like to load this data into a DataFrame, do the transformations, and then save the data in a database.
10-10-2022 05:38 AM
Yes, but hence my question: what happens if you read the JSON with schema inference?
Does that work?
If the JSON files can have a different schema, it is a good idea to use schema inference.
10-10-2022 05:47 AM
Got it, I'll do it this way and get back to you. Thank you very much.

