Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

andreiten
by New Contributor II
  • 3073 Views
  • 1 reply
  • 3 kudos

Is there any example or guideline on how to pass JSON parameters to a pipeline in a Databricks workflow?

I used this source https://docs.databricks.com/workflows/jobs/jobs.html#:~:text=You%20can%20use%20Run%20Now,different%20values%20for%20existing%20parameters.&text=next%20to%20Run%20Now%20and,on%20the%20type%20of%20task. But there is no example of how...

Latest Reply
UmaMahesh1
Honored Contributor III
  • 3 kudos

Hi @Andre Ten​, that's exactly how you specify JSON parameters in a Databricks workflow. I have been doing it in the same format and it works for me. I removed the parameters as they are a bit sensitive, but I hope you get the point. Cheers.
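Since the working parameters were redacted above, here is a minimal hedged sketch of the general shape via the Jobs 2.1 "run now" REST endpoint. The workspace URL, token, job ID, and parameter names are all hypothetical, not the poster's actual configuration.

```python
# Sketch: trigger a job run with JSON-valued parameters.
# JSON is passed as a string and parsed inside the task.
import requests

resp = requests.post(
    "https://<workspace-url>/api/2.1/jobs/run-now",
    headers={"Authorization": "Bearer <personal-access-token>"},
    json={
        "job_id": 123,  # hypothetical job ID
        "notebook_params": {
            "config": '{"env": "dev", "batch_size": 100}'  # JSON as a string
        },
    },
)
resp.raise_for_status()
print(resp.json())  # contains the run_id
```

Inside the notebook, the value can then be read with dbutils.widgets.get("config") and parsed with json.loads.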

Kavin
by New Contributor II
  • 1244 Views
  • 2 replies
  • 2 kudos

Issue converting the datasets into JSON

I'm a newbie to Databricks, and I need to convert datasets into JSON. I tried both FOR JSON AUTO and FOR JSON PATH; however, I'm getting an issue - [PARSE_SYNTAX_ERROR] Syntax error at or near 'json'. My query works fine without FOR JSON AUTO and FOR...
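For context: FOR JSON AUTO and FOR JSON PATH are T-SQL (SQL Server) constructs that Spark SQL does not support, which is why the parser rejects them. A hedged sketch of the usual Spark-side equivalent, using made-up column names:

```python
# Sketch: to_json(struct(...)) serializes each row to a JSON string.
# `spark` is the Databricks-provided SparkSession; columns are assumptions.
import pyspark.sql.functions as F

df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "name"])
json_df = df.select(F.to_json(F.struct(*df.columns)).alias("json"))
json_df.show(truncate=False)
```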

Latest Reply
Kaniz_Fatma
Community Manager
  • 2 kudos

Hi @Kavin Natarajan​, we haven't heard from you since the last response from @Debayan Mukherjee​​, and I was checking back to see if their suggestions helped you. Otherwise, if you have a solution, please share it with the community, as it can be hel...

1 More Replies
AmineHY
by Contributor
  • 6041 Views
  • 7 replies
  • 9 kudos

Resolved! How to read JSON files embedded in a list of lists?

Hello, I am trying to read this JSON file but didn't succeed. You can see the head of the file: JSON inside a list of lists. Any idea how to read this file?

[Images: head of the JSON file]
Latest Reply
AmineHY
Contributor
  • 9 kudos

Here is my solution; I am sure it can be optimized:

import json

data = []
with open(path_to_json_file, 'r') as f:
    data.extend(json.load(f))

df = spark.createDataFrame(data[0], schema=schema)

6 More Replies
KarimSegura
by New Contributor III
  • 2446 Views
  • 3 replies
  • 4 kudos

databricks-connect throws an exception when showing a dataframe with json content

I'm facing an issue when I want to show a dataframe with JSON content. All this happens when the script runs via databricks-connect from VS Code. Basically, I would like any help or guidance to get this running as it should. Thanks in advance. This is how...

Latest Reply
KarimSegura
New Contributor III
  • 4 kudos

The code works fine on a Databricks cluster, but this code is part of a unit test in a local environment, then submitted to a branch -> PR -> merged into the master branch. Thanks for the advice on using DBX. I will give DBX a try again even though I've already tried. I'l...

2 More Replies
hare
by New Contributor III
  • 3200 Views
  • 3 replies
  • 6 kudos

"Databricks" - "PySpark" - Read "JSON" file - Azure Blob container - "APPEND BLOB"

Hi All, we are getting JSON files in an Azure blob container whose "Blob Type" is "Append Blob". We are getting the error "AnalysisException: Unable to infer schema for JSON. It must be specified manually." when we try to read them using the below-mentioned scr...

Latest Reply
User16856839485
New Contributor II
  • 6 kudos

There currently does not appear to be direct support for append blob reads; however, converting the append blob to a block blob [and then to parquet or delta, etc.] is a viable option: https://kb.databricks.com/en_US/data-sources/wasb-check-blob-types?_ga...
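A hedged sketch of the conversion route the KB article describes, using the azure-storage-blob SDK. The container, blob names, and connection string are placeholders, and for very large files a chunked copy would be preferable to reading everything into memory:

```python
# Sketch: rewrite an append blob as a block blob so Spark can read it.
# All names and credentials below are hypothetical.
from azure.storage.blob import BlobServiceClient

svc = BlobServiceClient.from_connection_string("<storage-connection-string>")
src = svc.get_blob_client(container="raw", blob="events.json")        # append blob
dst = svc.get_blob_client(container="raw", blob="events_block.json")  # block blob

data = src.download_blob().readall()
dst.upload_blob(data, overwrite=True)  # upload_blob creates a block blob by default
```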

2 More Replies
Data_Engineer3
by Contributor II
  • 4107 Views
  • 4 replies
  • 1 kudos

Unable to read data from Elasticsearch with Spark in Databricks.

When I try to read data from Elasticsearch with Spark SQL, it throws an error like RuntimeException: Error while encoding: java.lang.RuntimeException: scala.collection.convert.Wrappers$JListWrapper is not a valid external type for schema of string...
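This particular exception frequently means an Elasticsearch field that is actually an array was inferred as a scalar. A hedged sketch of one common fix with the elasticsearch-spark connector; the host, index, and field name are made up:

```python
# Sketch: declare array fields explicitly so the connector maps them correctly.
# es.nodes/es.port, the index name, and the "tags" field are assumptions.
df = (spark.read
      .format("org.elasticsearch.spark.sql")
      .option("es.nodes", "<es-host>")
      .option("es.port", "9200")
      .option("es.read.field.as.array.include", "tags")
      .load("my-index"))
df.printSchema()
```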

Latest Reply
Vidula
Honored Contributor
  • 1 kudos

Hi there @KARTHICK N​, hope all is well! Just wanted to check in to see if you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. T...

3 More Replies
rdobbss
by New Contributor II
  • 1082 Views
  • 2 replies
  • 0 kudos

RPC Disassociated error due to container threshold being exceeded, and garbage collector error, when reading a 23 GB multiline JSON file.

I am reading a 23 GB multiline JSON file, flattening it using a UDF, and writing the dataframe as parquet using PySpark. The cluster I am using is 3 nodes (8 cores, 64 GB memory) with a limit to scale up to 8 nodes. I am able to process a 7 GB file with no issue and it takes ar...
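Worth noting for anyone hitting this: a multiline JSON file is not splittable, so a single 23 GB file is parsed by one task, which matches the GC and executor-loss symptoms. A hedged sketch of a common mitigation; paths and the partition count are placeholders:

```python
# Sketch: read the file once, then repartition and persist to a splittable
# format so the UDF-based flattening runs in parallel downstream.
df = spark.read.option("multiLine", True).json("/mnt/raw/big_file.json")
(df.repartition(256)                    # spread rows across the cluster
   .write.mode("overwrite")
   .parquet("/mnt/staging/big_file/"))  # downstream steps read this instead
```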

Latest Reply
Vidula
Honored Contributor
  • 0 kudos

Hi @Ravi Dobariya​, hope all is well! Just wanted to check in to see if you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Than...

1 More Replies
laus
by New Contributor III
  • 5747 Views
  • 7 replies
  • 3 kudos

Resolved! How to load a JSON file in PySpark with a colon character in the file name

Hi, I'm trying to load this JSON file, which contains the colon character in its name: file_name.2022-03-05_11:30:00.json, but I get the error in the screenshot below saying that there is a relative path in an absolute URI. Any idea how to read this file...

[Image: error screenshot]
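Background for this thread: Hadoop's path parsing treats a colon in a file name as a URI scheme separator, hence "relative path in absolute URI". A hedged sketch of one workaround, copying via the local /dbfs FUSE mount (plain POSIX file APIs, which tolerate colons) to a colon-free name; paths are assumptions:

```python
import shutil

# The FUSE mount at /dbfs uses ordinary file APIs, so the colon is harmless here.
shutil.copy(
    "/dbfs/mnt/raw/file_name.2022-03-05_11:30:00.json",
    "/dbfs/tmp/file_name.2022-03-05_11-30-00.json",
)
df = spark.read.json("dbfs:/tmp/file_name.2022-03-05_11-30-00.json")
```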
Latest Reply
Noopur_Nigam
Valued Contributor II
  • 3 kudos

Hi @Laura Blancarte​, I hope that @Pearl Ubaru​'s answer helped you in resolving your issue. Please let us know if you need more help on this.

6 More Replies
MattM
by New Contributor III
  • 1597 Views
  • 1 reply
  • 0 kudos

Unstructured data (PDF) and semi-structured data (JSON)

I have a scenario where one source is unstructured PDF files and another source is semi-structured JSON files. I get files from these two sources on a daily basis into ADLS storage. What is the best way to load this into a medallion structure by s...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Matt M​, Please import this notebook and read this excellent Medallion Architecture article. Let us know how it goes. Thanks.

Kash
by Contributor III
  • 6760 Views
  • 19 replies
  • 13 kudos

Resolved! HELP! Converting GZ JSON to Delta causes massive CPU spikes and ETLs take days!

Hi there, I was wondering if I could get your advice. We would like to create a bronze Delta table using GZ JSON data stored in S3, but each time we attempt to read and write it, our cluster's CPU spikes to 100%. We are not doing any transformations but s...
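For context on the CPU behavior: .json.gz is not a splittable format, so each file is decompressed and parsed by a single task, and schema inference reads the data an extra time. A hedged sketch of the usual mitigations; bucket paths and the schema are placeholders:

```python
# Sketch: provide the schema up front (skips inference) and let many small
# .gz files parallelize naturally; one giant .gz file cannot be split.
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

schema = StructType([
    StructField("event_type", StringType(), True),
    StructField("ts", TimestampType(), True),
])

df = spark.read.schema(schema).json("s3://<bucket>/raw/*.json.gz")
df.write.format("delta").mode("append").save("s3://<bucket>/bronze/events/")
```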

Latest Reply
Kash
Contributor III
  • 13 kudos

Hi Kaniz, thanks for the note, and thank you everyone for the suggestions and help. @Joseph Kambourakis​ I added your suggestion to our load, but I did not see any change in how our data loads or the time it takes to load data. I've done some additional ...

18 More Replies
steelman
by New Contributor III
  • 7961 Views
  • 6 replies
  • 7 kudos

Resolved! how to flatten non standard Json files in a dataframe

Hello, I have a non-standard JSON file with a nested structure that I have issues with. Here is an example of the JSON file:

jsonfile = """[ { "success": true, "numRows": 2, "data": { "58251": { "invoiceno": "58...

[Image: desired format in the dataframe after processing the JSON file]
Latest Reply
Deepak_Bhutada
Contributor III
  • 7 kudos

@stale stokkereit​ You can use the below function to flatten the struct field:

import pyspark.sql.functions as F

def flatten_df(nested_df):
    flat_cols = [c[0] for c in nested_df.dtypes if c[1][:6] != 'struct']
    nested_cols = [c[0] for c in nest...
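Since the snippet above is cut off, here is a hedged reconstruction of the widely shared flatten helper it appears to be based on. This is a community pattern, not necessarily the replier's exact code, and it flattens one struct level per call unless looped:

```python
import pyspark.sql.functions as F

def flatten_df(nested_df):
    # Split columns into plain columns and struct columns.
    flat_cols = [c[0] for c in nested_df.dtypes if c[1][:6] != 'struct']
    nested_cols = [c[0] for c in nested_df.dtypes if c[1][:6] == 'struct']
    # Promote each struct field to a top-level column named parent_child.
    return nested_df.select(
        flat_cols
        + [F.col(nc + '.' + c).alias(nc + '_' + c)
           for nc in nested_cols
           for c in nested_df.select(nc + '.*').columns]
    )
```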

5 More Replies
Devarsh
by Contributor
  • 6356 Views
  • 3 replies
  • 7 kudos

Resolved! Getting the error 'No such file or directory' when trying to access the JSON file

I am trying to write to my Google Sheet through Databricks, but when it comes to reading the JSON file containing the credentials, I am getting the error that no such file or directory exists.

import gspread

gc = gspread.service_account(filename='...

Latest Reply
Noopur_Nigam
Valued Contributor II
  • 7 kudos

Hi @Devarsh Shah​, the issue is not with the JSON file but the location you are specifying while reading. As suggested by @Werner Stinckens​, please start using the Spark API to read the JSON file as below:

spark.read.format("json").load("testjson")

Please check ...
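One detail worth adding, hedged: gspread opens the credentials file with ordinary Python file APIs, so a DBFS location must be addressed through the /dbfs FUSE mount rather than a dbfs:/ URI. The path below is a placeholder:

```python
import gspread

# Local-file API, hence the /dbfs prefix; the actual path is an assumption.
gc = gspread.service_account(filename="/dbfs/FileStore/creds/service_account.json")
```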

2 More Replies
repcak
by New Contributor III
  • 3519 Views
  • 6 replies
  • 3 kudos

Resolved! Delta Live Tables with EventHub

Hello, I would like to integrate Databricks Delta Live Tables with Event Hubs, but I cannot install com.microsoft.azure:azure-eventhubs-spark_2.12:2.3.17 on a Delta Live Tables cluster. I tried installing it using an init script (by adding it in the JSON cluster settings...

Latest Reply
Atanu
Esteemed Contributor
  • 3 kudos

I think this has some details: https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-kafka-spark-tutorial @Kacper Mucha​, is the issue resolved?
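Since Maven libraries cannot be installed on DLT clusters, the tutorial linked above points at Event Hubs' Kafka-compatible endpoint, which works with the built-in Kafka source. A hedged sketch; the namespace, hub name, and connection string are placeholders:

```python
import dlt

EH_NS = "<eventhubs-namespace>"
EH_CONN = "<event-hubs-connection-string>"  # from the Azure portal

@dlt.table
def raw_events():
    # Event Hubs speaks Kafka on port 9093 with SASL_SSL / PLAIN auth.
    return (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", f"{EH_NS}.servicebus.windows.net:9093")
        .option("subscribe", "<eventhub-name>")
        .option("kafka.security.protocol", "SASL_SSL")
        .option("kafka.sasl.mechanism", "PLAIN")
        .option(
            "kafka.sasl.jaas.config",
            "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule "
            f'required username="$ConnectionString" password="{EH_CONN}";',
        )
        .load()
    )
```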

5 More Replies
User16783855534
by New Contributor III
  • 6234 Views
  • 6 replies
  • 5 kudos
Latest Reply
Kaniz_Fatma
Community Manager
  • 5 kudos

Hi @Neil Patel​, just a friendly follow-up: do you still need help, or did the above responses help you find the solution? Please let us know.

5 More Replies
AmanSehgal
by Honored Contributor III
  • 5084 Views
  • 2 replies
  • 10 kudos

Resolved! How to merge all the columns into one column as JSON?

I have a task to transform a dataframe. The task is to collect all the columns in a row and embed them into a JSON string as a column. Source DF and target DF are shown in the images below.

[Images: source and target dataframes]
Latest Reply
AmanSehgal
Honored Contributor III
  • 10 kudos

I was able to do this by converting the df to an RDD and then applying a map function to it:

rdd_1 = df.rdd.map(lambda row: (row['ID'], row.asDict()))
...
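A hedged alternative that stays in the DataFrame API (column names are assumptions): to_json over a struct of all columns usually produces the same result without the RDD round-trip.

```python
import pyspark.sql.functions as F

# "ID" is the assumed key column from the thread; "json" holds the row as JSON.
target_df = df.select(
    F.col("ID"),
    F.to_json(F.struct(*df.columns)).alias("json"),
)
```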

1 More Replies