Data Engineering

Forum Posts

Maksym
by New Contributor III
  • 4744 Views
  • 4 replies
  • 7 kudos

Resolved! Databricks Autoloader is getting stuck and does not pass to the next batch

I have a simple job scheduled every 5 min. Basically, it listens to cloudFiles on a storage account and writes them into a Delta table, extremely simple. The code is something like this: df = (spark .readStream .format("cloudFiles") .option('cloudFil...

Latest Reply
lassebe
New Contributor II
  • 7 kudos

I had the same issue: files would randomly not be loaded. Setting `.option("cloudFiles.useIncrementalListing", False)` seemed to do the trick!
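
A minimal sketch of where that option could go, assuming a JSON source and a job that runs on a schedule. The original post's code is truncated, so the function names, the paths passed in, the schema location and the `availableNow` trigger are illustrative assumptions rather than the poster's actual setup.

```python
def autoloader_options(file_format, schema_location):
    """Build Auto Loader reader options with incremental listing turned off."""
    return {
        "cloudFiles.format": file_format,
        "cloudFiles.schemaLocation": schema_location,
        # The setting from the reply above: do a full directory listing each time.
        "cloudFiles.useIncrementalListing": "false",
    }


def start_ingest(spark, source_path, target_table, checkpoint_path):
    """Read new files with Auto Loader and append them to a Delta table.

    `spark` is the SparkSession that a Databricks notebook or job provides.
    """
    opts = autoloader_options("json", checkpoint_path + "/schema")  # assumed format
    stream = spark.readStream.format("cloudFiles").options(**opts).load(source_path)
    return (
        stream.writeStream
        .option("checkpointLocation", checkpoint_path)
        .trigger(availableNow=True)  # process what is available, then stop
        .toTable(target_table)
    )
```

Keeping the options in a plain dictionary makes it easy to toggle the listing mode without touching the rest of the stream definition.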

3 More Replies
baatchus
by New Contributor III
  • 3277 Views
  • 3 replies
  • 1 kudos

Deduplication, Bronze (raw) or Silver (enriched)

Need some help choosing where to do deduplication of data. I have sensor data in blob storage that I'm picking up with Databricks Autoloader. The data and files can have duplicates in them. Which of the 2 options do I choose? Option 1: Cre...

Latest Reply
Tharun-Kumar
Honored Contributor II
  • 1 kudos

@peter_mcnally You can use a watermark to pick up the late records and send only the latest records to the bronze table. This will ensure that you always have the latest information in your bronze table. This feature is explained in detail here - https://w...
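
One way a watermark is commonly combined with deduplication in Structured Streaming is sketched below. The reply is truncated, so this may not be exactly what was suggested: the column names (`sensor_id`, `event_time`) and the 10-minute delay are hypothetical, and `dropDuplicates` keeps the first record seen for each key rather than selecting a "latest" one.

```python
def dedupe_with_watermark(df, key_cols, event_time_col="event_time", delay="10 minutes"):
    """Drop repeated records from a streaming DataFrame.

    The event-time column is included in the deduplication key so that Spark
    can discard state for records older than the watermark.
    """
    dedupe_key = list(key_cols) + [event_time_col]
    return df.withWatermark(event_time_col, delay).dropDuplicates(dedupe_key)


# Usage inside a notebook (names are placeholders):
# clean = dedupe_with_watermark(raw_stream, ["sensor_id"])
```

Duplicates that arrive later than the watermark delay are not guaranteed to be caught, so the delay is a trade-off between state size and how late a duplicate may show up.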

2 More Replies
tech2cloud
by New Contributor II
  • 1260 Views
  • 3 replies
  • 2 kudos
Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Ravi Vishwakarma, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answ...

2 More Replies
rakeshprasad1
by New Contributor III
  • 1663 Views
  • 3 replies
  • 4 kudos

databricks autoloader not updating table immediately

I have a simple autoloader job which looks like this: df_dwu_limit = spark.readStream.format("cloudFiles") \ .option("cloudFiles.format", "JSON") \ .schema(schemaFromJson) \ .load("abfss://synapse-usage@xxxxx.dfs.core.windows.net/synapse-us...

auto-loader issue
Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 4 kudos

Can you share the whole code with the counts, which you mentioned?

2 More Replies
Prem1
by New Contributor III
  • 7671 Views
  • 21 replies
  • 11 kudos

java.lang.IllegalArgumentException: java.net.URISyntaxException

I am using Databricks Autoloader to load JSON files from ADLS Gen2 incrementally in directory listing mode. All source filenames have a timestamp in them. The autoloader works perfectly for a couple of days with the below configuration and breaks the next day ...

Latest Reply
jshields
New Contributor II
  • 11 kudos

Hi Everyone, I'm seeing this issue as well - same configuration as the previous posts, using autoloader with incremental file listing turned on. The strange part is that it mostly works despite almost all of the files we're loading having colons incl...

20 More Replies
alxsbn
by New Contributor III
  • 1101 Views
  • 2 replies
  • 2 kudos

Resolved! Autoloader on CSV file didn't correctly infer cells with JSON data

Hello! I'm playing with autoloader schema inference on a big S3 repo with 300+ tables and large CSV files. I'm looking at autoloader with great attention, as it can be a great time saver on our ingestion process (data comes from a transactional DB gen...

Latest Reply
daniel_sahal
Honored Contributor III
  • 2 kudos

PySpark by default uses \ as the escape character. You can change it to ". Doc: https://docs.databricks.com/ingestion/auto-loader/options.html#csv-options
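
A small sketch of what changing the CSV escape character to a double quote could look like with Auto Loader. The schema location argument and the sample row are made up for illustration. The second half uses Python's standard `csv` module, not Spark, to show the shape of a JSON cell whose inner quotes are doubled.

```python
import csv
import io
import json


def csv_escape_options(schema_location):
    """Auto Loader options for CSV files that escape quotes by doubling them."""
    return {
        "cloudFiles.format": "csv",
        "cloudFiles.schemaLocation": schema_location,
        "escape": '"',  # the default escape character is a backslash
    }


# A hypothetical row whose second cell holds JSON with doubled inner quotes.
SAMPLE = 'id,payload\n1,"{""a"": 1, ""b"": ""x,y""}"\n'


def parse_sample(text=SAMPLE):
    """Parse the sample locally and decode the JSON cell of the first data row."""
    rows = list(csv.reader(io.StringIO(text)))
    return json.loads(rows[1][1])
```

In a notebook the dictionary would be passed along the lines of `spark.readStream.format("cloudFiles").options(**csv_escape_options(path))`.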

1 More Replies
SudiptaBiswas
by New Contributor III
  • 1575 Views
  • 3 replies
  • 3 kudos

databricks autoloader getting stuck in flattening json files for different scenarios similar in nature.

I have a databricks autoloader notebook that reads json files from an input location and writes the flattened version of json files to an output location. However, the notebook is behaving differently for two different but similar scenarios as descri...

Latest Reply
jose_gonzalez
Moderator
  • 3 kudos

Could you provide a code snippet? Also, do you see any errors in the driver logs?

2 More Replies
SRK
by Contributor III
  • 1399 Views
  • 3 replies
  • 5 kudos

Resolved! I met with an issue when I was trying to use autoloader to read JSON files from Azure ADLS Gen2. I am getting this issue for specific files only. I checked that the files are good and not corrupted.

I met with an issue when I was trying to use autoloader to read JSON files from Azure ADLS Gen2. I am getting this issue for specific files only. I checked that the files are good and not corrupted. Following is the issue: java.lang.IllegalArgumentException:...

Latest Reply
SRK
Contributor III
  • 5 kudos

I got the issue resolved. The problem was that, by mistake, we had duplicate columns in the schema files, which caused that error. However, the error message is totally misleading, which is why we weren't able to fix it sooner.
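
A quick check for this kind of mistake might look like the sketch below. It assumes the schema file uses the JSON layout that Spark's `StructType` serializes to (a top-level `fields` list of objects with a `name` key), and it only inspects top-level columns. The helper name is hypothetical.

```python
import json
from collections import Counter


def duplicate_columns(schema_json):
    """Return top-level column names that appear more than once in a schema.

    Names are compared case-insensitively, since Spark treats column names
    as case-insensitive by default.
    """
    schema = json.loads(schema_json)
    names = [field["name"].lower() for field in schema["fields"]]
    return sorted(name for name, count in Counter(names).items() if count > 1)


example = json.dumps({
    "type": "struct",
    "fields": [
        {"name": "id", "type": "long", "nullable": True, "metadata": {}},
        {"name": "status", "type": "string", "nullable": True, "metadata": {}},
        {"name": "Status", "type": "string", "nullable": True, "metadata": {}},
    ],
})
print(duplicate_columns(example))
```

Running a check like this before starting the stream can surface the problem directly instead of through an unrelated-looking exception.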

2 More Replies
SRK
by Contributor III
  • 1688 Views
  • 4 replies
  • 7 kudos

How to handle schema validation for JSON files using Databricks Autoloader?

Following are the details of the requirement:
1. I am using a Databricks notebook to read data from a Kafka topic and write it into an ADLS Gen2 container, i.e., my landing layer.
2. I am using Spark code to read data from Kafka and write into landing...

Latest Reply
Anonymous
Not applicable
  • 7 kudos

Hi @Swapnil Kamle, hope all is well! Just wanted to check in to see if you were able to resolve your issue. Would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Than...

3 More Replies
Magnus
by Contributor
  • 1469 Views
  • 3 replies
  • 10 kudos

Resolved! How to retrieve Auto Loader client secret from Azure Key Vault?

I'm using Auto Loader in a SQL notebook and I would like to configure file notification mode, but I don't know how to retrieve the client secret of the service principal from Azure Key Vault. Is there any example notebook somewhere? The notebook is p...

Latest Reply
Geeta1
Valued Contributor
  • 10 kudos

Hi @Magnus Johannesson, you must use the Secrets utility (dbutils.secrets) in a notebook or job to read a secret. https://learn.microsoft.com/en-us/azure/databricks/dev-tools/databricks-utils#dbutils-secrets Hope it helps!
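
As a rough sketch, the secret read with `dbutils.secrets.get` could feed the file notification options like this. It assumes a Python cell and a secret scope backed by Azure Key Vault. The function name and all argument values are placeholders, and `dbutils` is passed in explicitly because it only exists inside a Databricks notebook or job.

```python
def notification_options(dbutils, scope, key, client_id, tenant_id,
                         subscription_id, resource_group):
    """Build Auto Loader file notification options for Azure.

    The service principal's client secret is read from a secret scope
    instead of being written into the notebook.
    """
    client_secret = dbutils.secrets.get(scope=scope, key=key)
    return {
        "cloudFiles.useNotifications": "true",
        "cloudFiles.clientId": client_id,
        "cloudFiles.clientSecret": client_secret,
        "cloudFiles.tenantId": tenant_id,
        "cloudFiles.subscriptionId": subscription_id,
        "cloudFiles.resourceGroup": resource_group,
    }
```

The resulting dictionary can then be handed to the reader with `.options(**opts)` so the secret value never appears in the notebook source.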

2 More Replies
kaslan
by New Contributor II
  • 4151 Views
  • 6 replies
  • 0 kudos

How to filter files in Databricks Autoloader stream

I want to set up an S3 stream using Databricks Auto Loader. I have managed to set up the stream, but my S3 bucket contains different types of JSON files. I want to filter them out, preferably in the stream itself rather than using a filter operation. A...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

According to the docs you linked, the glob filter on input-path only works on directories, not on the files themselves. So if you want to filter on certain files in those directories, you can include an additional filter through the pathGlobFilter o...
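
A sketch of passing a file-name filter through `pathGlobFilter`, assuming JSON input. The pattern and file names are invented for illustration, and the local preview uses Python's `fnmatch`, which only approximates the glob matching Spark applies.

```python
from fnmatch import fnmatchcase


def filtered_json_options(name_glob, schema_location):
    """Auto Loader options that only pick up files whose name matches a glob."""
    return {
        "cloudFiles.format": "json",
        "cloudFiles.schemaLocation": schema_location,
        "pathGlobFilter": name_glob,
    }


def preview_matches(file_names, name_glob):
    """Rough local preview of which file names the glob would let through."""
    return [name for name in file_names if fnmatchcase(name, name_glob)]


print(preview_matches(
    ["orders_1.json", "users_1.json", "orders_2.csv"],
    "orders_*.json",
))
```

Previewing the pattern against a listing of real file names first can save a round of debugging an apparently empty stream.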

5 More Replies
JD2
by Contributor
  • 2141 Views
  • 6 replies
  • 4 kudos

Resolved! Auto Loader for Shape File

Hello: As you can see from the link below, it supports 7 file formats. I am dealing with GeoSpatial Shape Files and I want to know if Auto Loader can support Shape Files? Any help on this is greatly appreciated. Thanks. https://docs.microsoft.com/...

Latest Reply
-werners-
Esteemed Contributor III
  • 4 kudos

You could try to use the binary file type. But the disadvantage of this is that the content of the shape files will be put into a column, which might not be what you want. If you absolutely want to use the autoloader, maybe some thinking outside the b...

5 More Replies