Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

AmineHY
by Contributor
  • 9868 Views
  • 5 replies
  • 6 kudos

Resolved! How to read JSON files embedded in a list of lists?

Hello, I am trying to read this JSON file but didn't succeed. You can see the head of the file: JSON inside a list of lists. Any idea how to read this file?

Latest Reply
adriennn
Contributor II
  • 6 kudos

The correct way to do this without using open() (which only works with local/mounted files) is to read the files as binaryFile; you then get the entire JSON string on each row, and from there you can use from_json() and explode() to extract the ...
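A minimal PySpark sketch of that approach, assuming a hypothetical path /mnt/data/nested/ and a payload that is a list of lists of records with id and name fields:

from pyspark.sql import functions as F
from pyspark.sql.types import ArrayType, StringType, StructField, StructType

# Hypothetical inner record schema; adjust to the actual fields in the files.
inner = StructType([StructField("id", StringType()), StructField("name", StringType())])
schema = ArrayType(ArrayType(inner))

df = (spark.read.format("binaryFile")              # one row per file, raw bytes in "content"
      .load("/mnt/data/nested/*.json")             # hypothetical location
      .withColumn("json_str", F.col("content").cast("string"))
      .withColumn("parsed", F.from_json("json_str", schema))
      .withColumn("outer", F.explode("parsed"))    # unpack the outer list
      .withColumn("record", F.explode("outer"))    # unpack the inner list
      .select("record.*"))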

  • 6 kudos
4 More Replies
Maksym
by New Contributor III
  • 8224 Views
  • 4 replies
  • 7 kudos

Resolved! Databricks Autoloader is getting stuck and does not pass to the next batch

I have a simple job scheduled every 5 min. Basically it listens to cloud files on a storage account and writes them into a delta table, extremely simple. The code is something like this: df = (spark .readStream .format("cloudFiles") .option('cloudFil...

Latest Reply
lassebe
New Contributor II
  • 7 kudos

I had the same issue: files would randomly not be loaded. Setting `.option("cloudFiles.useIncrementalListing", False)` seemed to do the trick!
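For context, a minimal sketch of where that option sits in an Auto Loader stream; the paths, file format, and target table below are hypothetical:

df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("cloudFiles.useIncrementalListing", "false")    # fall back to full directory listing
      .load("/mnt/landing/events/"))                          # hypothetical input path

(df.writeStream
   .option("checkpointLocation", "/mnt/checkpoints/events")   # hypothetical checkpoint path
   .trigger(availableNow=True)
   .toTable("bronze.events"))                                 # hypothetical target table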

  • 7 kudos
3 More Replies
rdobbss
by New Contributor II
  • 4286 Views
  • 3 replies
  • 3 kudos

How to use foreachBatch in a Delta Live Table (DLT)?

I need to process some transformations on incoming data as a batch and want to know if there is a way to use the foreachBatch option in a Delta Live Table. I am using Autoloader to load JSON files, and then I need to apply foreachBatch and store results into ano...

Latest Reply
TomRenish
New Contributor III
  • 3 kudos

Not sure if this will apply to you or not... I was looking at the foreachBatch tool to reduce the workload of getting distinct data from a history table of 20 million+ records, because the df.dropDuplicates() function was intermittently running out of ...
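As a rough illustration of that pattern with plain Structured Streaming (rather than DLT), each micro-batch can be deduplicated and merged into the target Delta table instead of running dropDuplicates() over the whole history; the table and key column names here are hypothetical, and df is assumed to be the Autoloader stream:

from delta.tables import DeltaTable

def upsert_distinct(batch_df, batch_id):
    # Deduplicate within the micro-batch, then merge so existing keys are not re-inserted.
    deduped = batch_df.dropDuplicates(["event_id"])             # hypothetical key column
    target = DeltaTable.forName(spark, "history.events")        # hypothetical target table
    (target.alias("t")
           .merge(deduped.alias("s"), "t.event_id = s.event_id")
           .whenNotMatchedInsertAll()
           .execute())

(df.writeStream
   .foreachBatch(upsert_distinct)
   .option("checkpointLocation", "/mnt/checkpoints/dedup")      # hypothetical path
   .start())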

  • 3 kudos
2 More Replies
Bbren
by New Contributor
  • 2878 Views
  • 2 replies
  • 1 kudos

Resolved! Handling of millions of xml in json files

Hi all, I have some questions related to the handling of many small files and possible improvements and augmentations. We have many small XML files. These files are previously processed by another system that puts them in our data lake, but as an add...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Bauke Brenninkmeijer, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best ...

  • 1 kudos
1 More Replies
Sagacious
by New Contributor II
  • 15139 Views
  • 5 replies
  • 0 kudos

How to upload large files to Databricks, and how to unzip files successfully?

I have two JSON files, one ~3 GB and one ~5 GB. I am unable to upload them to Databricks Community Edition as they exceed the maximum allowed upload file size (~2 GB). If I zip them I am able to upload them, but I am also having issues figuring out ...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Sage Olson, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we...

  • 0 kudos
4 More Replies
MerelyPerfect
by New Contributor II
  • 3431 Views
  • 3 replies
  • 1 kudos

Read a base64 JSON column with Autoloader and inferSchema

I have JSON files landing in our blob storage with two fields: 1. offset (integer), 2. value (base64). This value column is JSON with Unicode, so they sent it as base64. The challenge is that this JSON is very large, with 100+ fields, so we cannot define the schema. We c...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @MerelyPerfect Per, hope all is well! Just wanted to check in if you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you...

  • 1 kudos
2 More Replies
MikeJohnsonZa
by New Contributor
  • 2467 Views
  • 3 replies
  • 0 kudos

Resolved! Importing irregularly formatted json files

Hi, I'm importing a large collection of JSON files. The problem is that they are not what I would expect a well-formatted JSON file to be (although probably still valid); each file consists of only a single record that looks something like this (this i...

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Hi @Michael Johnson, I would like to share the following notebook, which contains examples of how to process complex data types like JSON. Please check the following link and let us know if you still need help: https://docs.databricks.com/optimization...
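If each file really is a single multi-line record, one hedged way to read such files before flattening is to pass an explicit schema together with the multiLine option; the path and fields below are made up for illustration:

from pyspark.sql import functions as F
from pyspark.sql.types import StringType, StructField, StructType

# Hypothetical schema: a top-level id plus a nested "details" object.
schema = StructType([
    StructField("id", StringType()),
    StructField("details", StructType([StructField("name", StringType())])),
])

df = (spark.read
      .schema(schema)
      .option("multiLine", True)             # each file is one JSON document spanning many lines
      .json("/mnt/raw/irregular/*.json"))    # hypothetical path

flat = df.select("id", F.col("details.name").alias("detail_name"))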

  • 0 kudos
2 More Replies
SudiptaBiswas
by New Contributor III
  • 2625 Views
  • 3 replies
  • 3 kudos

Databricks Autoloader getting stuck when flattening JSON files for different scenarios similar in nature

I have a Databricks Autoloader notebook that reads JSON files from an input location and writes the flattened version of the JSON files to an output location. However, the notebook behaves differently for two different but similar scenarios, as descri...

Latest Reply
jose_gonzalez
Databricks Employee
  • 3 kudos

Could you provide a code snippet? Also, do you see any error logs in the driver logs?

  • 3 kudos
2 More Replies
sudhanshu1
by New Contributor III
  • 666 Views
  • 0 replies
  • 0 kudos

Structured Streaming

I need a solution for the below problem. We have a set of JSON files which keep coming to AWS S3; these files contain details for a property. Please note one property can have 10-12 rows in this JSON file. Attached is a sample JSON file. We need to read...

Vickyster
by New Contributor II
  • 1255 Views
  • 0 replies
  • 0 kudos

Column partitioning is not working in delta live table when `columnMapping` table property is enabled.

I'm trying to create a delta live table on top of JSON files placed in Azure blob storage. The JSON files contain white spaces in column names; instead of renaming, I tried the `columnMapping` table property, which let me create the table with spaces, but the column ...

SailajaB
by Valued Contributor III
  • 4132 Views
  • 5 replies
  • 4 kudos

Resolved! when and otherwise issue

Hi, here in our scenario we are reading JSON files as input, and they contain a nested structure. A few of the attributes are array-type structs, where we need to change the names of the nested ones. So we created a new structure and are doing a cast. We are facing the below pr...

Latest Reply
AmanSehgal
Honored Contributor III
  • 4 kudos

Can you provide the structure that you're using? Also, a more elaborate sample input and output.

  • 4 kudos
4 More Replies
AzureDatabricks
by New Contributor III
  • 6649 Views
  • 5 replies
  • 1 kudos

Parallel processing of JSON files in Databricks PySpark

How can we read files from Azure blob storage and process them in parallel in Databricks using PySpark? As of now we are reading all 10 files at a time into a dataframe and flattening it. Thanks & Regards, Sujata

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

spark.read.json("/mnt/dbfs/<ENTER PATH OF JSON DIR HERE>/*.json")
You first have to mount your blob storage to Databricks; I assume that is already done.
https://spark.apache.org/docs/latest/sql-data-sources-json.html

  • 1 kudos
4 More Replies
kaslan
by New Contributor II
  • 7032 Views
  • 5 replies
  • 0 kudos

How to filter files in Databricks Autoloader stream

I want to set up an S3 stream using Databricks Auto Loader. I have managed to set up the stream, but my S3 bucket contains different types of JSON files. I want to filter them out, preferably in the stream itself rather than using a filter operation. A...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

According to the docs you linked, the glob filter on the input path only works on directories, not on the files themselves. So if you want to filter on certain files in the directories concerned, you can include an additional filter through the pathGlobFilter o...
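A minimal sketch of adding that option to an Auto Loader stream, assuming you only want file names matching a pattern such as *_orders.json (the bucket and prefix are hypothetical):

df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("pathGlobFilter", "*_orders.json")   # keep only matching file names
      .load("s3://my-bucket/landing/"))            # hypothetical bucket/prefix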

  • 0 kudos
4 More Replies
Orianh
by Valued Contributor II
  • 24427 Views
  • 11 replies
  • 10 kudos

Resolved! Read JSON files from the s3 bucket

Hello guys, I'm trying to read JSON files from an S3 bucket, but no matter what I try I get "Query returned no result", or if I don't specify the schema I get "unable to infer a schema". I tried to mount the S3 bucket; it still does not work. Here is some code th...

Latest Reply
Prabakar
Databricks Employee
  • 10 kudos

Please refer to the doc that helps you to read JSON. If you are getting this error, the problem should be with the JSON schema. Please validate it. As a test, create a simple JSON file (you can get one on the internet), upload it to your S3 bucket, and ...
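Along those lines, a quick sanity test might look like the sketch below, with a placeholder bucket and an explicit schema, which usually surfaces schema mismatches more clearly than inference:

from pyspark.sql.types import IntegerType, StringType, StructField, StructType

# Placeholder schema matching a simple {"id": 1, "name": "x"} test file.
schema = StructType([
    StructField("id", IntegerType()),
    StructField("name", StringType()),
])

df = (spark.read
      .schema(schema)
      .json("s3://my-test-bucket/simple.json"))    # hypothetical bucket and file
df.show()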

  • 10 kudos
10 More Replies