Re: ADF logs into Databricks

szymon_dybczak · ‎07-24-2024

Auto Loader uses Spark Structured Streaming, but you can run it in a "batch" mode. As I mentioned in one of my earlier responses, you can run it as a batch job with Trigger.AvailableNow. Here is the link to the documentation once again:

Configure Auto Loader for production workloads | Databricks on AWS

How it works:

- you set up a diagnostic setting to write logs into a storage directory (for the sake of example, let's call it "input_data")

- you configure Auto Loader and point it at that path ("input_data")

- you schedule the job to run once an hour

- your job starts and Auto Loader loads all files that are in "input_data" into the target table. When the job ends, the job cluster is terminated

- in the meantime, more logs are written to the storage (to the "input_data" directory)

- an hour passes, so your job starts again. This time Auto Loader loads only the new files that arrived since the last run
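
The steps above could be sketched roughly like this in PySpark. This is only a minimal sketch, not a definitive implementation: the function name, the paths and the table name are hypothetical placeholders, and I'm assuming the diagnostic logs land as JSON files. It expects the `spark` session that a Databricks notebook or job already provides.

```python
def run_autoloader_batch(spark, input_path, checkpoint_path, target_table):
    """Load every file Auto Loader has not processed yet, then stop.

    All arguments are placeholders for illustration; adjust them to your setup.
    """
    stream = (
        spark.readStream
        .format("cloudFiles")                                  # Auto Loader source
        .option("cloudFiles.format", "json")                   # assumption: logs are JSON
        .option("cloudFiles.schemaLocation", checkpoint_path)  # where the inferred schema is stored
        .load(input_path)
    )

    query = (
        stream.writeStream
        .option("checkpointLocation", checkpoint_path)  # tracks which files were already loaded
        .trigger(availableNow=True)                     # process what is available, then finish
        .toTable(target_table)
    )

    # Block until the backlog is processed so the job (and its cluster) can end.
    query.awaitTermination()
    return query


# Hypothetical usage inside the hourly job:
# run_autoloader_batch(
#     spark,
#     input_path="/path/to/input_data",
#     checkpoint_path="/path/to/checkpoints/adf_logs",
#     target_table="adf_logs_bronze",
# )
```

The checkpoint location is what makes the "only new files" behaviour work: as long as every run uses the same checkpoint path, files loaded by an earlier run are skipped.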
