Data Engineering

Forum Posts

cmotla
by New Contributor III
  • 1361 Views
  • 3 replies
  • 8 kudos

Issue with complex JSON-based dataframe select

We are getting the below error when trying to select nested columns (string type inside a struct), even though we don't have more than 1,000 records in the dataframe. The schema is very complex, with a few columns of struct type and a few of array typ...

Latest Reply
Kaniz
Community Manager
  • 8 kudos

Hi @Chaitanya Motla, just a friendly follow-up. Do you still need help, or did you find the solution? Please let us know.

2 More Replies
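The full error isn't quoted above, but for reference, a minimal sketch of selecting nested struct fields in PySpark (the schema and column names below are hypothetical, not from the thread):

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical dataframe: one struct column holding a string and an array
df = spark.createDataFrame(
    [({"name": "a", "tags": ["x", "y"]},)],
    "payload struct<name:string, tags:array<string>>",
)

# Dot notation reaches into a struct; F.col works the same way
df.select("payload.name", F.col("payload.tags").getItem(0).alias("first_tag")).show()
```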
LanceYoung
by New Contributor III
  • 6276 Views
  • 7 replies
  • 6 kudos

Resolved! Unable to make Databricks API calls from an HTML iframe rendered by a notebook's `displayHTML()` call, due to the browser enforcing CORS policy.

My Goal: I want to make my Databricks Notebooks more interactive and have custom HTML/JS UI widgets that guide non-technical people through a business/data process. I want the HTML/JS widget to be able to execute a DB job, or execute some python code t...

Latest Reply
Kaniz
Community Manager
  • 6 kudos

Hi @Lance Young, just a friendly follow-up. Do you still need help, or have you resolved your problem using the above solutions? Please let us know.

6 More Replies
MartinB
by Contributor III
  • 12067 Views
  • 26 replies
  • 6 kudos

Resolved! Does partition pruning / partition elimination not work for folder partitioned JSON files? (Spark 3.1.2)

Imagine the following setup: I have log files stored as JSON files, partitioned by year, month, day and hour in physical folders: /logs |-- year=2020 |-- year=2021 `-- year=2022 |-- month=01 `-- month=02 |-- day=01 |-- day=.....

Latest Reply
MartinB
Contributor III
  • 6 kudos

@Kaniz Fatma, could you maybe involve a Databricks expert?

25 More Replies
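Not the thread's accepted answer, but a sketch of how partition pruning is usually exercised and verified on a folder-partitioned source like the one described above (paths hypothetical):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# A layout like /logs/year=2022/month=02/day=01/hour=13/*.json lets Spark
# discover year/month/day/hour as partition columns from the folder names.
df = spark.read.json("/logs")

# Filters on partition columns should prune directories instead of scanning all files.
pruned = df.filter((df.year == 2022) & (df.month == 2))

# Verify in the physical plan: pruning shows up under PartitionFilters.
pruned.explain(True)
```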
Jana
by New Contributor III
  • 4787 Views
  • 8 replies
  • 2 kudos

Resolved! Parsing a 5 GB JSON file is running long on cluster

I was creating a Delta table from an ADLS JSON input file, but the job was running long while creating the Delta table from the JSON. Below is my cluster configuration. Is the issue related to the cluster config? Do I need to upgrade it? The cluster ...

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

With multiline = true, the JSON is read as a whole and processed as such. I'd try a beefier cluster.

7 More Replies
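To illustrate the reply: with multiline enabled, each file is parsed as a single document, so a 5 GB file cannot be split across tasks. A sketch of both read modes (paths hypothetical):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# multiline=true: each file is one JSON document parsed as a whole,
# so a single 5 GB file cannot be split across tasks.
df_whole = spark.read.option("multiline", "true").json("/mnt/adls/input/big.json")

# JSON Lines (one object per line) splits cleanly and parallelizes far better.
df_lines = spark.read.json("/mnt/adls/input/big.jsonl")
```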
SailajaB
by Valued Contributor III
  • 4152 Views
  • 12 replies
  • 4 kudos

Resolved! JSON validation fails after writing a PySpark dataframe to JSON format

Hi, we have to convert a transformed dataframe to JSON format, so we used write with the json format on top of the final dataframe. But when we validate the output JSON, it's not in proper JSON format. Could you please provide your suggestio...

Latest Reply
Anonymous
Not applicable
  • 4 kudos

@Sailaja B​ - Does @Aman Sehgal​'s most recent answer help solve the problem? If it does, would you be happy to mark their answer as best?

11 More Replies
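The accepted answer isn't quoted above. One common cause (an assumption here, not confirmed by the thread) is that df.write.json emits JSON Lines rather than a single JSON array, which many validators reject. A sketch of producing a true array for small outputs:

```python
import json
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])  # stand-in dataframe

# df.write.json(...) emits JSON Lines ({"id":1,...}\n{"id":2,...}), not a JSON array.
# For small results, collect and write a genuine JSON array instead:
rows = [json.loads(r) for r in df.toJSON().collect()]
with open("/tmp/output.json", "w") as f:
    json.dump(rows, f, indent=2)
```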
SailajaB
by Valued Contributor III
  • 2361 Views
  • 4 replies
  • 6 kudos

Resolved! How to create nested (unflattened) JSON from flattened JSON

Hi, is there any function in PySpark which can convert flattened JSON to nested JSON? Ex: if an attribute in the flattened form is a_b_c: 23, then unflattened it should be {"a":{"b":{"c":23}}}. Thank you

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 6 kudos

As @Chuck Connell said, can you share more of your source JSON? That example is not valid JSON. Additionally, flattening usually changes something like {"status": {"A": 1,"B": 2}} to {"status.A": 1, "status.B": 2}, which can be done easily with Spark da...

3 More Replies
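For reference, a minimal sketch of unflattening underscore-delimited keys in plain Python (the helper and delimiter are assumptions, not from the thread):

```python
def unflatten(flat: dict, sep: str = "_") -> dict:
    """Turn {"a_b_c": 23} into {"a": {"b": {"c": 23}}}."""
    nested = {}
    for key, value in flat.items():
        parts = key.split(sep)
        node = nested
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return nested

print(unflatten({"a_b_c": 23}))  # {'a': {'b': {'c': 23}}}
```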
cconnell
by Contributor II
  • 492 Views
  • 2 replies
  • 1 kudos

www.linkedin.com

Importing JSON to Databricks (PySpark) is simple in the simple case. But of course there are wrinkles for real-world data. Here are some tips/tricks to help... https://www.linkedin.com/pulse/json-databricks-pyspark-chuck-connell/

Latest Reply
Kaniz
Community Manager
  • 1 kudos

Hi @Chuck Connell​ , Thank you for sharing such an amazing article!

1 More Reply
SailajaB
by Valued Contributor III
  • 1051 Views
  • 4 replies
  • 4 kudos

Facing a format issue while converting one nested JSON schema to a brand new JSON schema

Hi, we are writing our flattened JSON dataframe to a user-defined nested JSON schema using PySpark in Databricks, but we are not getting the expected format. Expecting: {"ID":"aaa","c_id":[{"con":null,"createdate":"2015-10-09T00:00:00Z","data":null,"id":"1"},...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 4 kudos

As @-werners- said, you need to share the code. If it is dataframe-to-JSON, you probably need to use a StructType with an array to get that list, but without the code it is hard to help.

3 More Replies
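To illustrate the reply's StructType/array suggestion, a hedged sketch with hypothetical column names matching the expected output above:

```python
import pyspark.sql.functions as F
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
flat = spark.createDataFrame(
    [("aaa", "1", "2015-10-09T00:00:00Z")], ["ID", "id", "createdate"]
)

# Rebuild the nested shape: an array of structs under c_id.
nested = flat.select("ID", F.array(F.struct("id", "createdate")).alias("c_id"))
nested.write.mode("overwrite").json("/tmp/nested_out")
```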
Braxx
by Contributor II
  • 6685 Views
  • 12 replies
  • 2 kudos

Resolved! Validate the schema of JSON in a column

I have a dataframe like below, with col2 as key-value pairs. I would like to filter col2 to only the rows with a valid schema. There could be many pairs, sometimes fewer, sometimes more, and this is fine as long as the structure is fine. Nulls in col...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

@Bartosz Wachocki​ - Thank you for sharing your solution and marking it as best.

11 More Replies
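The accepted solution isn't quoted above. One common approach (an assumption, not necessarily the poster's) is from_json with the expected schema, which yields null for rows that don't conform:

```python
import pyspark.sql.functions as F
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [(1, '{"k1": "v1", "k2": "v2"}'), (2, "not json")], ["col1", "col2"]
)

# from_json returns null when the string does not conform to the schema.
parsed = df.withColumn("parsed", F.from_json("col2", "map<string,string>"))
valid = parsed.filter(F.col("parsed").isNotNull()).drop("parsed")
valid.show()
```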
Orianh
by Valued Contributor II
  • 4985 Views
  • 7 replies
  • 3 kudos

Resolved! Read JSON with backslash.

Hello guys, I'm trying to read a JSON file which contains backslashes, and PySpark fails to read it. I tried a lot of options but haven't solved this yet. I thought to read all the JSON as text and replace all "\" with "/", but PySpark fails to read it as te...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

@orian hindi​ - Would you be happy to post the solution you came up with and then mark it as best? That will help other members.

6 More Replies
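A sketch of the read-as-text-and-replace idea the poster describes (path hypothetical; note that blindly replacing backslashes also rewrites legitimate escapes):

```python
import pyspark.sql.functions as F
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# wholetext=true reads each file as a single row, so embedded newlines
# don't break the plain-text read.
raw = spark.read.option("wholetext", "true").text("/mnt/data/broken.json")

# "\\\\" in Python source is the two-character regex \\, i.e. one literal backslash.
# Caution: this also rewrites legitimate escapes such as \n or \".
fixed = raw.withColumn("value", F.regexp_replace("value", "\\\\", "/"))

# Parse the repaired strings back into a dataframe.
df = spark.read.json(fixed.rdd.map(lambda r: r.value))
```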
D3nnisd
by New Contributor III
  • 8873 Views
  • 15 replies
  • 6 kudos

Resolved! BufferHolder exceeded on JSON flattening

On Databricks, we use the following code to flatten JSON in Python. The data is from a REST API: df = spark.read.format("json").option("header", "true").option("multiline", "true").load(SourceFileFolder + sourcetable + "*.json") df2 = df.select(psf....

Latest Reply
Dan_Z
Honored Contributor
  • 6 kudos

@Dennis D, what's happening here is that more than 2 GB (2,147,483,648 bytes) is being loaded into a single column value. This is a hard limit for serialization. This KB article addresses it. The solution would be to find some way to have this loaded ...

14 More Replies
Kaniz
by Community Manager
  • 8823 Views
  • 2 replies
  • 1 kudos
Latest Reply
Ryan_Chynoweth
Honored Contributor III
  • 1 kudos

Assuming that the S3 bucket is mounted in the workspace, you can provide a file path. If you want to write a PySpark DF, you can do something like the following: df.write.format('json').save('/path/to/file_name.json'). You could also use the json py...

1 More Reply
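Expanding the reply into a runnable sketch (mount point and path are hypothetical):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, "a")], ["id", "val"])  # stand-in dataframe

# With the bucket mounted at /mnt/my-bucket, writing JSON is a plain path write.
# Note Spark writes a directory of part files, not a single .json file.
df.write.format("json").mode("overwrite").save("/mnt/my-bucket/output/my_data")
```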
User16856693631
by New Contributor II
  • 931 Views
  • 2 replies
  • 0 kudos

Can you create Clusters via a REST API?

Yes, you can. See here: https://docs.databricks.com/dev-tools/api/latest/clusters.html. The JSON payload would look as follows: { "cluster_name": "my-cluster", "spark_version": "7.3.x-scala2.12", "node_type_id": "i3.xlarge", "spark_conf": { ...

Latest Reply
ManishPatil
New Contributor II
  • 0 kudos

One can create a cluster using the Clusters API at https://docs.databricks.com/dev-tools/api/latest/clusters.html#create. However, REST API 2.0 doesn't provide certain features like "Enable Table Access Control", which were introduced after REST API ...

1 More Reply
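A hedged sketch of calling the Clusters 2.0 create endpoint with a payload like the one above (host, token, and values are placeholders):

```python
import requests

HOST = "https://<workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"                  # placeholder

payload = {
    "cluster_name": "my-cluster",
    "spark_version": "7.3.x-scala2.12",
    "node_type_id": "i3.xlarge",
    "num_workers": 2,
}

# POST /api/2.0/clusters/create returns {"cluster_id": "..."} on success.
resp = requests.post(
    f"{HOST}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=payload,
)
print(resp.json())
```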
MithuWagh
by New Contributor
  • 5191 Views
  • 1 reply
  • 0 kudos

How to deal with a column name containing a .(dot) in a PySpark dataframe?

We are streaming data from a Kafka source as JSON, but in some columns we are getting a .(dot) in the column names. Streaming JSON data: df1 = df.selectExpr("CAST(value AS STRING)") {"pNum":"A14","from":"telecom","payload":{"TARGET":"1","COUNTRY":"India"...

Latest Reply
shyam_9
Valued Contributor
  • 0 kudos

Hi @Mithu Wagh, you can use backticks to enclose the column name: df.select("`col0.1`")
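For example (hypothetical data):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1,)], ["col0.1"])  # column name contains a dot

# Backticks stop Spark from parsing the dot as struct-field access.
df.select("`col0.1`").show()

# Renaming the column avoids the need for backticks downstream.
df.withColumnRenamed("col0.1", "col0_1").show()
```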

Yogi
by New Contributor III
  • 6950 Views
  • 15 replies
  • 0 kudos

Resolved! Can we pass Databricks output to an Azure Function body?

Hi, can anyone help me with Databricks and Azure Functions? I'm trying to pass Databricks JSON output to an Azure Function body in an ADF job. Is it possible? If yes, how? If not, what is the alternative to do the same?

Latest Reply
AbhishekNarain_
New Contributor III
  • 0 kudos

You can now pass values back to ADF from a notebook. @Yogi There is a size limit though, so if you are passing a dataset larger than 2 MB, rather write it to storage and consume it directly with Azure Functions. You can pass the file path/refe...

14 More Replies
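To illustrate the reply: a notebook can hand a small JSON string back to ADF via dbutils.notebook.exit (a sketch; dbutils is only available inside a Databricks notebook, and the activity name below is a placeholder):

```python
import json

result = {"status": "ok", "rows_processed": 1234}  # hypothetical output

# ADF's Notebook activity surfaces this value as
# @activity('MyNotebook').output.runOutput, subject to the ~2 MB limit noted above.
dbutils.notebook.exit(json.dumps(result))
```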