Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
I have a simple job scheduled every 5 minutes. Basically it listens to cloud files on a storage account and writes them into a Delta table, extremely simple. The code is something like this:
df = (spark
.readStream
.format("cloudFiles")
.option('cloudFil...
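For context, the original snippet is truncated; a minimal sketch of such a job might look like the following, where the storage paths, checkpoint location, and target table name are placeholders rather than the poster's values.

# Minimal Auto Loader sketch, assuming a JSON source and a Delta target;
# paths and table name are placeholders, not the poster's actual values.
df = (spark
      .readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("cloudFiles.schemaLocation", "abfss://landing@<storage-account>.dfs.core.windows.net/_schemas/events/")
      .load("abfss://landing@<storage-account>.dfs.core.windows.net/events/"))

(df.writeStream
   .format("delta")
   .option("checkpointLocation", "abfss://landing@<storage-account>.dfs.core.windows.net/_checkpoints/events/")
   .trigger(availableNow=True)   # process what is available, then stop; fits a job scheduled every 5 minutes
   .toTable("bronze.events"))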
Following are the details of the requirement:
1. I am using a Databricks notebook to read data from a Kafka topic and write it into an ADLS Gen2 container, i.e., my landing layer.
2. I am using Spark code to read data from Kafka and write into landing...
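A rough sketch of that Kafka-to-landing step is below; the broker address, topic name, and ADLS paths are assumptions, not the poster's configuration.

# Hypothetical Kafka -> ADLS Gen2 landing sketch; broker, topic, and paths are placeholders.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "sensor-topic")
       .option("startingOffsets", "earliest")
       .load())

# Kafka key/value arrive as binary; cast to string before landing them as JSON text files.
landing = raw.selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value", "timestamp")

(landing.writeStream
        .format("json")
        .option("path", "abfss://landing@<storage-account>.dfs.core.windows.net/kafka/sensor-topic/")
        .option("checkpointLocation", "abfss://landing@<storage-account>.dfs.core.windows.net/_checkpoints/sensor-topic/")
        .start())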
Need some help in choosing where to do deduplication of data. I have sensor data in blob storage that I'm picking up with Databricks Auto Loader. The data and files can have duplicates in them. Which of the two options do I choose? Option 1: Cre...
@peter_mcnally You can use a watermark to handle late records and send only the latest records to the bronze table. This will ensure that you always have the latest information in your bronze table. This feature is explained in detail here - https://w...
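A minimal sketch of that watermark-plus-dedup pattern is below; the source path, the column names (sensor_id, a timestamp column event_time), and the 10-minute lateness window are assumptions.

# Sketch: dedup within a watermark before the bronze write; paths, columns, and window are placeholders.
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("cloudFiles.schemaLocation", "/mnt/checkpoints/sensors/schema")
      .load("abfss://raw@<storage-account>.dfs.core.windows.net/sensors/"))

deduped = (df
           .withWatermark("event_time", "10 minutes")      # tolerate records arriving up to 10 minutes late
           .dropDuplicates(["sensor_id", "event_time"]))   # keep one row per key within the watermark

(deduped.writeStream
        .format("delta")
        .option("checkpointLocation", "/mnt/checkpoints/sensors/bronze")
        .toTable("bronze.sensor_readings"))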
Hi @Ravi Vishwakarma, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answ...
I have a simple autoloader job which looks like this:
df_dwu_limit = spark.readStream.format("cloudFiles") \
.option("cloudFiles.format", "JSON") \
.schema(schemaFromJson) \
.load("abfss://synapse-usage@xxxxx.dfs.core.windows.net/synapse-us...
I am using Databricks Auto Loader to load JSON files from ADLS Gen2 incrementally in directory listing mode. All source filenames have a timestamp on them. The Auto Loader works perfectly for a couple of days with the below configuration and breaks the next day ...
Hi everyone, I'm seeing this issue as well - same configuration as the previous posts, using Auto Loader with incremental file listing turned on. The strange part is that it mostly works despite almost all of the files we're loading having colons incl...
Hello! I'm playing with Auto Loader schema inference on a big S3 repo with 300+ tables and large CSV files. I'm looking at Auto Loader with great attention, as it can be a great time saver in our ingestion process (data comes from a transactional DB gen...
By default, PySpark uses \ as the escape character. You can change it to ". Doc: https://docs.databricks.com/ingestion/auto-loader/options.html#csv-options
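As a hedged illustration of that option, a CSV Auto Loader read overriding the escape character could look like this; the path and schema location are placeholders.

# Sketch: override the CSV escape character for Auto Loader; path and schema location are placeholders.
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "csv")
      .option("cloudFiles.schemaLocation", "/mnt/checkpoints/csv_schema")
      .option("header", "true")
      .option("escape", '"')   # use the double quote as the escape character instead of the default backslash
      .load("s3://<bucket>/exports/"))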
I have a Databricks Auto Loader notebook that reads JSON files from an input location and writes the flattened version of the JSON files to an output location. However, the notebook is behaving differently for two different but similar scenarios as descri...
I ran into an issue when I was trying to use Auto Loader to read JSON files from Azure ADLS Gen2. I am getting this issue for specific files only. I checked that the files are good and not corrupted. Following is the issue: java.lang.IllegalArgumentException:...
I got the issue resolved. The issue was that, by mistake, we had duplicate columns in the schema files. Because of that it was showing that error. However, the error message is totally misleading, which is why I wasn't able to rectify it earlier.
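If anyone else hits this, a quick way to spot duplicate field names in a schema before starting the stream is sketched below; the schema variable name is a placeholder.

# Sketch: detect duplicate column names in a schema before passing it to Auto Loader.
from collections import Counter

duplicates = [name for name, count in Counter(f.name for f in schemaFromJson.fields).items() if count > 1]
if duplicates:
    raise ValueError(f"Schema contains duplicate columns: {duplicates}")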
I'm using Auto Loader in a SQL notebook and I would like to configure file notification mode, but I don't know how to retrieve the client secret of the service principal from Azure Key Vault. Is there any example notebook somewhere? The notebook is p...
Hi @Magnus Johannesson, you must use the Secrets utility (dbutils.secrets) in a notebook or job to read a secret. https://learn.microsoft.com/en-us/azure/databricks/dev-tools/databricks-utils#dbutils-secrets Hope it helps!
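For illustration, a Python cell that reads the client secret from a Key Vault-backed scope and passes it to Auto Loader's file notification options might look like this; the scope, key, IDs, and paths are placeholders.

# Sketch: read the service principal secret and enable file notification mode; all identifiers are placeholders.
client_secret = dbutils.secrets.get(scope="kv-scope", key="sp-client-secret")

df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("cloudFiles.useNotifications", "true")
      .option("cloudFiles.clientId", "<application-id>")
      .option("cloudFiles.clientSecret", client_secret)
      .option("cloudFiles.tenantId", "<tenant-id>")
      .option("cloudFiles.subscriptionId", "<subscription-id>")
      .option("cloudFiles.resourceGroup", "<resource-group>")
      .load("abfss://landing@<storage-account>.dfs.core.windows.net/input/"))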
I want to set up an S3 stream using Databricks Auto Loader. I have managed to set up the stream, but my S3 bucket contains different types of JSON files. I want to filter them out, preferably in the stream itself rather than using a filter operation. A...
According to the docs you linked, the glob filter on the input path only works on directories, not on the files themselves. So if you want to filter on certain files in the directories concerned, you can include an additional filter through the pathGlobFilter o...
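A small sketch of that approach is below; the bucket, prefix, and glob pattern are placeholders.

# Sketch: restrict the stream to one family of JSON files with pathGlobFilter; names are placeholders.
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("cloudFiles.schemaLocation", "/mnt/checkpoints/orders_schema")
      .option("pathGlobFilter", "*_orders.json")   # only pick up files matching this pattern
      .load("s3://<bucket>/raw/"))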
Hello:
As you can see from the link below, it supports 7 file formats. I am dealing with geospatial shapefiles and I want to know if Auto Loader can support shapefiles. Any help on this is greatly appreciated.
Thanks.
https://docs.microsoft.com/...
You could try to use the binary file type. But the disadvantage of this is that the content of the shapefiles will be put into a column, which might not be what you want. If you absolutely want to use the Auto Loader, maybe some thinking outside the b...
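For what it's worth, ingesting shapefiles as binary with Auto Loader could look roughly like this; the path is a placeholder, and parsing the geometry would need a separate library downstream.

# Sketch: ingest shapefiles as raw bytes with Auto Loader; the path is a placeholder.
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "binaryFile")
      .option("pathGlobFilter", "*.shp")   # only pick up the .shp members of each shapefile set
      .load("abfss://geo@<storage-account>.dfs.core.windows.net/shapefiles/"))

# Each row carries path, modificationTime, length, and the raw bytes in the `content` column.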