Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by Rishitha (New Contributor III)
  • 2081 Views
  • 2 replies
  • 2 kudos

Resolved! Normalizing data from autoloader

I have data on S3 and I'm using autoloader to load it. My JSON docs have fields that are arrays of structs. When I don't specify a schema, all the data is stored as strings; even the arrays of structs are just a blob of string, making it ...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Rishitha Reddy, hope everything is going great. Just wanted to check in to see if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us s...

by sanjay (Valued Contributor II)
  • 7674 Views
  • 0 replies
  • 0 kudos

Autoloader with real-time and batch processing concurrently

Hi, I have a data pipeline that runs continuously, processes micro-batch data, and stores it in Delta Lake. This takes care of any new data. But at times, I need to process historical data without disturbing the real-time processing. Is th...

by jhgorse (New Contributor III)
  • 2524 Views
  • 2 replies
  • 1 kudos

Resolved! Autoloader breaks on migration from Community Edition to a Premium trial with an S3 mount

In dbx Community Edition, the autoloader works using the S3 mount. S3 mount, autoloader:
dbutils.fs.mount(f"s3a://{access_key}:{encoded_secret_key}@{aws_bucket_name}", f"/mnt/{mount_name}")
from pyspark.sql import SparkSession
from pyspark.sql.functions ...
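The excerpt builds the mount source as `s3a://{access_key}:{encoded_secret_key}@{aws_bucket_name}`. Assuming `encoded_secret_key` means the AWS secret key percent-encoded so that characters such as `/` cannot break the URI, a minimal standard-library sketch of building that string is shown below. The helper name and the credential values are hypothetical placeholders, and the function only formats a string; it does not mount anything.

```python
from urllib.parse import quote


def s3a_mount_source(access_key: str, secret_key: str, bucket: str) -> str:
    """Build an s3a:// source URI with the secret key percent-encoded.

    Hypothetical helper for illustration only; it just formats a string.
    """
    # safe="" makes quote() encode "/" too, so a secret such as "abc/def"
    # is not mistaken for a path separator inside the URI.
    encoded_secret_key = quote(secret_key, safe="")
    return f"s3a://{access_key}:{encoded_secret_key}@{bucket}"


print(s3a_mount_source("AKIAEXAMPLE", "abc/def+ghi", "my-bucket"))
# s3a://AKIAEXAMPLE:abc%2Fdef%2Bghi@my-bucket
```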

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Joe Gorse, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers you...

by DomDuf (New Contributor II)
  • 6173 Views
  • 3 replies
  • 3 kudos

Resolved! Roll back to a previous version of an AutoLoader checkpoint file

I know that to "reset" AutoLoader, you can delete the checkpoint file entirely. I was wondering whether it's possible, and how someone would: (1) get the checkpoint file back to a previous version so I can reload certain files that were already processed; (2) delete certai...

Latest Reply
MRTN
New Contributor III
  • 3 kudos

This would for sure be a useful feature.

by Ryan512 (New Contributor III)
  • 7852 Views
  • 2 replies
  • 5 kudos

Resolved! Does the `pathGlobFilter` option work on the entire file path or just the file name?

I'm working in the Google Cloud environment. I have an Autoloader job that uses cloud files notifications to load data into a Delta table. I want to filter the files from the PubSub topic based on the path in GCS where the files are located, not...
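The question turns on whether a glob is matched against the file name only or against the full object path. The sketch below does not claim which of the two Auto Loader does; it only shows how the same pattern can give different answers under each interpretation. It uses Python's `fnmatch` as a stand-in matcher (an assumption: Spark's own glob semantics may differ in details), and the bucket and path are hypothetical.

```python
from fnmatch import fnmatchcase
from posixpath import basename


def matches_name(path: str, pattern: str) -> bool:
    """Apply the glob to the last path component (the file name) only."""
    return fnmatchcase(basename(path), pattern)


def matches_full_path(path: str, pattern: str) -> bool:
    """Apply the glob to the whole path string."""
    return fnmatchcase(path, pattern)


path = "gs://my-bucket/raw/events/2023/part-0001.json"  # hypothetical object path
for pattern in ("*.json", "*/raw/events/*"):
    print(pattern, matches_name(path, pattern), matches_full_path(path, pattern))
# *.json True True
# */raw/events/* False True
```

A suffix pattern such as `*.json` behaves the same either way, while a directory-based pattern only works if the matcher sees the full path.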

Latest Reply
Ryan512
New Contributor III
  • 5 kudos

Thank you for confirming what I observed that differed from the documentation.

by YSF (New Contributor III)
  • 1418 Views
  • 1 replies
  • 1 kudos

Delta Live Tables & Autoloader adding a non-existent column

I'm trying to set up autoloader to read some CSV files. I tried both autoloader with the DLT decorator and autoloader by itself. The first column of the data is called "run_id"; when I do a spark.read.csv() directly on the file it com...

Latest Reply
Rishabh-Pandey
Esteemed Contributor
  • 1 kudos

Can you attach the exact output so that I can have a look at it?

by nolanlavender00 (New Contributor)
  • 1461 Views
  • 1 replies
  • 0 kudos

Garbage Collection on AutoLoader

Once a week, I get very long run times with AutoLoader. The Spark job says it is done, but garbage collection on the driver keeps rising. I assume this is because of the backfill interval I am using with the FileNotification type. I have this set to...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @nolanlavender008, hope everything is going great. Just wanted to check in to see if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us...

by Kenny92 (New Contributor III)
  • 10227 Views
  • 2 replies
  • 1 kudos

Resolved! How does Auto Loader ingest data?

I have recently completed the Data Engineering with Databricks v3 course on the Partner Academy. Some of the quiz questions have me mixed up. Specifically, I am wondering about this question from the "Build Data Pipelines with Delta Live Tables and Sp...

Which of the following correctly describes how Auto Loader ingests data? Select one response.
Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Kenny Shaevel, hope everything is going great. Just wanted to check in to see if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so...

by ayesharahmat (New Contributor II)
  • 3263 Views
  • 3 replies
  • 2 kudos

AutoLoader issue - java.lang.AssertionError

I am encountering the error below. I am using micro-batches with autoloader. Please help me fix this issue: java.lang.AssertionError: assertion failed: Invalid batch: path#36188,modificationTime#36189,length#36190L,content#36191,PROVIDER#36192,LOCATIO...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

@Ayesha Rahmatali: The error message you provided suggests an assertion failure caused by invalid batch data in your AutoLoader implementation. Specifically, it indicates that the schema of the incoming data does not match the...

by Arty (New Contributor II)
  • 7290 Views
  • 5 replies
  • 6 kudos

Resolved! How to make Autoloader delete files after a successful load

Hi all, can you please advise how I can delete loaded files from Azure Storage once they have been successfully loaded via Autoloader? As I understand it, the Spark streaming "cleanSource" option is unavailable for Autoloader, so I'm trying to find the best way to ...

Latest Reply
Anonymous
Not applicable
  • 6 kudos

Hi @Artem Sachuk, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers ...

by tech2cloud (New Contributor II)
  • 3302 Views
  • 3 replies
  • 2 kudos
Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Ravi Vishwakarma, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answ...

by fhmessas (New Contributor II)
  • 3394 Views
  • 1 replies
  • 0 kudos

Resolved! Autoloader stream with EventBridge message

Hi all, I have a few streaming jobs running, but we have been facing an issue related to messaging. We have multiple feeds within the same root folder, i.e. logs/{accountId}/CloudWatch|CloudTrail|vpcflow/yyyy-mm-dd/logs. Hence, the SQS allows us to set up o...
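In the layout quoted above, the account ID sits between the root folder and the feed name, so with more than one account a single fixed key prefix under `logs/` cannot select one feed across all accounts; the feed is only identifiable from the third path segment. A minimal pure-Python sketch of that classification, for illustration only: it is not an SQS, EventBridge, or Auto Loader API, the helper name is hypothetical, and the keys are made up.

```python
from typing import Optional

# Feed names taken from the layout quoted in the question.
FEEDS = {"CloudWatch", "CloudTrail", "vpcflow"}


def feed_for_key(key: str) -> Optional[str]:
    """Return the feed for a key shaped like logs/{accountId}/{feed}/{yyyy-mm-dd}/...

    Returns None when the key does not fit that layout.
    """
    parts = key.split("/")
    if len(parts) >= 4 and parts[0] == "logs" and parts[2] in FEEDS:
        return parts[2]
    return None


keys = [
    "logs/111122223333/CloudTrail/2023-03-01/logs",  # hypothetical keys
    "logs/444455556666/vpcflow/2023-03-01/logs",
    "logs/111122223333/other/2023-03-01/logs",
]
for key in keys:
    print(key, "->", feed_for_key(key))
# logs/111122223333/CloudTrail/2023-03-01/logs -> CloudTrail
# logs/444455556666/vpcflow/2023-03-01/logs -> vpcflow
# logs/111122223333/other/2023-03-01/logs -> None
```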

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Fernando Messas: Yes, you can configure Autoloader to consume messages from an SQS queue using EventBridge. Here are the steps you can follow: Create an EventBridge rule to filter messages from the SQS queue based on specific criteria (such as the...

by Harsh_Paliwal (New Contributor)
  • 3506 Views
  • 1 replies
  • 0 kudos

java.lang.Exception: Unable to start python kernel for ReplId-79217-e05fc-0a4ce-2, kernel exited with exit code 1.

I am running a parameterized autoloader notebook in a workflow. This notebook is being called 29 times in parallel, and FYI, UC is also enabled. I am facing this error: java.lang.Exception: Unable to start python kernel for ReplId-79217-e05fc-0a4ce-2, ke...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Harsh Paliwal: The error message suggests that there might be a conflict with the xtables lock. One thing you could try is to add the -w option, as suggested by the error message. You can add the following command to the beginning of your notebook t...

by tech2cloud (New Contributor II)
  • 2596 Views
  • 2 replies
  • 0 kudos

Databricks Autoloader streamReader does not include the partition column as part of the output.

I have a folder structure at the source such as:
/transaction/date_=2023-01-20/hr_=02/tras01.csv
/transaction/date_=2023-01-20/hr_=03/tras02.csv
where 'date_' and 'hr_' are my partitions and are also present in the dataset. But the streamReader does not read th...
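The paths in the question use Hive-style `key=value` directory names. As a plain-Python illustration (not Spark or Auto Loader code) of what partition discovery would extract from such a path, here is a minimal sketch that keeps every value as a string; the helper name is hypothetical.

```python
from posixpath import dirname


def hive_partitions(path: str) -> dict[str, str]:
    """Collect key=value directory segments from a file path (values stay strings)."""
    parts: dict[str, str] = {}
    # Only directory segments are inspected; the file name itself is ignored.
    for segment in dirname(path).split("/"):
        key, sep, value = segment.partition("=")
        if sep and key:
            parts[key] = value
    return parts


print(hive_partitions("/transaction/date_=2023-01-20/hr_=02/tras01.csv"))
# {'date_': '2023-01-20', 'hr_': '02'}
```

Spark's file-source partition discovery can additionally infer column types (so `02` may become the integer 2), which this sketch deliberately skips.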

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Ravi Vishwakarma, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answ...

by nolanlavender00 (New Contributor)
  • 5479 Views
  • 2 replies
  • 0 kudos

How to control garbage collection while using Autoloader File Notification?

I am using Autoloader to load files from a directory. I have set up File Notification with the Event Subscription. I have a backfill interval set to 1 day and have not run the stream for a week. There should only be about 100 new files to pick up an...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @nolanlavender008, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answ...
