<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: ADF logs into Databricks in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/adf-logs-into-databricks/m-p/79883#M35870</link>
    <description>&lt;P&gt;How fancy do you want to go? You can send ADF diagnostic settings to an event hub and stream them into a delta table in Databricks. Or you can send them to a storage account and build a workflow on a 5-minute interval that loads the storage blobs into a delta table. The new &lt;A href="https://www.databricks.com/blog/introducing-open-variant-data-type-delta-lake-and-apache-spark" target="_self"&gt;&lt;EM&gt;Variant&lt;/EM&gt;&lt;/A&gt; datatype might be your friend here.&lt;/P&gt;</description>
    <pubDate>Mon, 22 Jul 2024 14:39:59 GMT</pubDate>
    <dc:creator>jacovangelder</dc:creator>
    <dc:date>2024-07-22T14:39:59Z</dc:date>
    <item>
      <title>ADF logs into Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/adf-logs-into-databricks/m-p/79700#M35829</link>
      <description>&lt;P&gt;Hello, I would like to know the best way to insert Data Factory activity logs into my Databricks delta table, so that I can build dashboards and create monitoring in Databricks itself. Can you help me? I would like all activity logs in the data factory to be inserted into the Databricks delta table every 5 minutes; that is, if 10 pipelines complete, the logs of those 10 are inserted into the delta table. Please note: no logs can be missing. I want a solution that is considered good practice, economical, and efficient. Can you help me with this?&lt;/P&gt;</description>
      <pubDate>Sun, 21 Jul 2024 22:22:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/adf-logs-into-databricks/m-p/79700#M35829</guid>
      <dc:creator>8b1tz</dc:creator>
      <dc:date>2024-07-21T22:22:36Z</dc:date>
    </item>
    <item>
      <title>Re: ADF logs into Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/adf-logs-into-databricks/m-p/79868#M35867</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/112751"&gt;@8b1tz&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;You can use the ADF REST API to read the logs.&lt;BR /&gt;For example:&amp;nbsp;&lt;A href="https://medium.com/creative-data/custom-logging-in-azure-data-factory-and-azure-synapse-analytics-f084643a5489" target="_blank"&gt;https://medium.com/creative-data/custom-logging-in-azure-data-factory-and-azure-synapse-analytics-f084643a5489&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 22 Jul 2024 12:46:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/adf-logs-into-databricks/m-p/79868#M35867</guid>
      <dc:creator>daniel_sahal</dc:creator>
      <dc:date>2024-07-22T12:46:48Z</dc:date>
    </item>
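    <!--
      A minimal sketch (not from the thread) of the REST API approach in the
      post above, in Python, assuming a service principal with read access to
      the factory; the tenant, subscription, resource group, and factory
      names are placeholders.

      import requests
      from datetime import datetime, timedelta, timezone

      TENANT, CLIENT_ID, CLIENT_SECRET = "<tenant-id>", "<sp-client-id>", "<sp-secret>"
      SUB, RG, FACTORY = "<subscription-id>", "<resource-group>", "<factory-name>"

      # Acquire an AAD token for the Azure management plane.
      token = requests.post(
          f"https://login.microsoftonline.com/{TENANT}/oauth2/v2.0/token",
          data={
              "grant_type": "client_credentials",
              "client_id": CLIENT_ID,
              "client_secret": CLIENT_SECRET,
              "scope": "https://management.azure.com/.default",
          },
      ).json()["access_token"]

      # Query pipeline runs updated in the last 5 minutes.
      now = datetime.now(timezone.utc)
      runs = requests.post(
          f"https://management.azure.com/subscriptions/{SUB}/resourceGroups/{RG}"
          f"/providers/Microsoft.DataFactory/factories/{FACTORY}"
          f"/queryPipelineRuns?api-version=2018-06-01",
          headers={"Authorization": f"Bearer {token}"},
          json={
              "lastUpdatedAfter": (now - timedelta(minutes=5)).isoformat(),
              "lastUpdatedBefore": now.isoformat(),
          },
      ).json()["value"]

      # Append the runs to a Delta table (spark is the notebook's SparkSession).
      spark.createDataFrame(runs).write.format("delta").mode("append").saveAsTable("adf_pipeline_runs")
    -->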
    <item>
      <title>Re: ADF logs into Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/adf-logs-into-databricks/m-p/79871#M35868</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/112751"&gt;@8b1tz&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;You can also configure ADF diagnostic settings. You can send them to a storage location, Log Analytics, or Event Hubs.&lt;BR /&gt;If you send them to a storage location, you can then create, for example, an external location and directly query those logs in Databricks.&lt;/P&gt;&lt;P&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/data-factory/monitor-configure-diagnostics" target="_blank"&gt;Configure diagnostic settings and a workspace - Azure Data Factory | Microsoft Learn&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 22 Jul 2024 12:58:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/adf-logs-into-databricks/m-p/79871#M35868</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2024-07-22T12:58:03Z</dc:date>
    </item>
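    <!--
      A minimal sketch of querying diagnostic logs landed in a storage account,
      as described above, assuming an external location is already configured;
      the container name follows the insights-logs convention for ADF activity
      runs, and the storage account name is a placeholder.

      # Read the JSON-lines diagnostic logs straight from storage; the files
      # sit in deeply nested resourceId/date folders, hence recursive lookup.
      logs = (
          spark.read.option("recursiveFileLookup", "true")
              .json("abfss://insights-logs-activityruns@<storageaccount>.dfs.core.windows.net/")
      )
      logs.createOrReplaceTempView("adf_activity_logs")

      # E.g. list failed activities; exact columns depend on the log category.
      spark.sql(
          "SELECT time, operationName, status FROM adf_activity_logs WHERE status = 'Failed'"
      ).show()
    -->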
    <item>
      <title>Re: ADF logs into Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/adf-logs-into-databricks/m-p/79883#M35870</link>
      <description>&lt;P&gt;How fancy do you want to go? You can send ADF diagnostic settings to an event hub and stream them into a delta table in Databricks. Or you can send them to a storage account and build a workflow on a 5-minute interval that loads the storage blobs into a delta table. The new &lt;A href="https://www.databricks.com/blog/introducing-open-variant-data-type-delta-lake-and-apache-spark" target="_self"&gt;&lt;EM&gt;Variant&lt;/EM&gt;&lt;/A&gt; datatype might be your friend here.&lt;/P&gt;</description>
      <pubDate>Mon, 22 Jul 2024 14:39:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/adf-logs-into-databricks/m-p/79883#M35870</guid>
      <dc:creator>jacovangelder</dc:creator>
      <dc:date>2024-07-22T14:39:59Z</dc:date>
    </item>
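    <!--
      A minimal sketch of the Variant idea from the post above, assuming
      Databricks Runtime 15.3+ where parse_json is available; the raw_adf_logs
      source table with a `body` string column is hypothetical.

      from pyspark.sql.functions import parse_json, col

      # Store the raw diagnostic JSON as VARIANT so the log schema can drift
      # without breaking the table.
      raw = spark.table("raw_adf_logs")
      (
          raw.select(parse_json(col("body")).alias("log"))
             .write.format("delta").mode("append").saveAsTable("adf_logs_variant")
      )

      # Variant fields can then be queried with the path syntax:
      spark.sql(
          "SELECT log:pipelineName::string, log:status::string FROM adf_logs_variant"
      ).show()
    -->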
    <item>
      <title>Re: ADF logs into Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/adf-logs-into-databricks/m-p/79900#M35873</link>
      <description>&lt;P&gt;I'm thinking about sending the logs to the Event Hub and leaving a job running continuously in Databricks that consumes the events and inserts them. What do you think?&amp;nbsp;Will it be too expensive?&amp;nbsp;Even if it is expensive, I believe it is at least the most scalable and robust solution.&lt;/P&gt;</description>
      <pubDate>Mon, 22 Jul 2024 15:44:05 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/adf-logs-into-databricks/m-p/79900#M35873</guid>
      <dc:creator>8b1tz</dc:creator>
      <dc:date>2024-07-22T15:44:05Z</dc:date>
    </item>
    <item>
      <title>Re: ADF logs into Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/adf-logs-into-databricks/m-p/80068#M35912</link>
      <description>&lt;P&gt;It depends on what you find costly. I would ask yourself whether you really need a 5-minute interval. If so, there won't be much difference in price between leaving a cheap cluster running and streaming, and having a (serverless) workflow run every 5 minutes.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 23 Jul 2024 06:40:29 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/adf-logs-into-databricks/m-p/80068#M35912</guid>
      <dc:creator>jacovangelder</dc:creator>
      <dc:date>2024-07-23T06:40:29Z</dc:date>
    </item>
    <item>
      <title>Re: ADF logs into Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/adf-logs-into-databricks/m-p/80136#M35942</link>
      <description>&lt;P&gt;So, yesterday I spent almost the whole day trying to get Databricks to consume the Event Hub, and it ended up not working. Should I try another way? Can you suggest something simpler to implement?&lt;/P&gt;</description>
      <pubDate>Tue, 23 Jul 2024 13:07:32 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/adf-logs-into-databricks/m-p/80136#M35942</guid>
      <dc:creator>8b1tz</dc:creator>
      <dc:date>2024-07-23T13:07:32Z</dc:date>
    </item>
    <item>
      <title>Re: ADF logs into Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/adf-logs-into-databricks/m-p/80137#M35943</link>
      <description>&lt;P&gt;Simpler would be to just send the logs to a storage location and consume them from there, maybe with Auto Loader.&lt;/P&gt;</description>
      <pubDate>Tue, 23 Jul 2024 13:08:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/adf-logs-into-databricks/m-p/80137#M35943</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2024-07-23T13:08:58Z</dc:date>
    </item>
    <item>
      <title>Re: ADF logs into Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/adf-logs-into-databricks/m-p/80141#M35944</link>
      <description>&lt;P&gt;Send to storage and consume one by one? Would this be scalable? How would I fetch only the missing ones? Should I delete the ones that have already been processed? Would this be more costly? What do you think?&lt;/P&gt;</description>
      <pubDate>Tue, 23 Jul 2024 13:15:15 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/adf-logs-into-databricks/m-p/80141#M35944</guid>
      <dc:creator>8b1tz</dc:creator>
      <dc:date>2024-07-23T13:15:15Z</dc:date>
    </item>
    <item>
      <title>Re: ADF logs into Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/adf-logs-into-databricks/m-p/80142#M35945</link>
      <description>&lt;P&gt;Not one by one; you need to configure a diagnostic setting to dump the logs into a storage account. Then you configure Databricks Auto Loader to point at this log location and it will handle loading those files for you. Under the hood Auto Loader uses Spark Structured Streaming, so with each run it will only load newly added log files.&lt;BR /&gt;&lt;BR /&gt;Read the documentation entry below and, as a best practice, use file notification mode:&lt;BR /&gt;&lt;BR /&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/databricks/ingestion/auto-loader/" target="_blank"&gt;What is Auto Loader? - Azure Databricks | Microsoft Learn&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 23 Jul 2024 13:18:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/adf-logs-into-databricks/m-p/80142#M35945</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2024-07-23T13:18:43Z</dc:date>
    </item>
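    <!--
      A minimal sketch of the Auto Loader flow described above; the paths,
      container, and table names are placeholders.

      # Incrementally load new diagnostic log files. The checkpoint records
      # which files were already processed, so each run only picks up new ones.
      chk = "/Volumes/main/default/checkpoints/adf_logs"  # hypothetical path

      stream = (
          spark.readStream.format("cloudFiles")
              .option("cloudFiles.format", "json")
              .option("cloudFiles.schemaLocation", chk + "/schema")
              .load("abfss://insights-logs-activityruns@<storageaccount>.dfs.core.windows.net/")
      )

      (
          stream.writeStream
              .option("checkpointLocation", chk)
              .toTable("adf_activity_logs")
      )
    -->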
    <item>
      <title>Re: ADF logs into Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/adf-logs-into-databricks/m-p/80144#M35946</link>
      <description>&lt;P&gt;Oh, can you give me a video that shows this better? I've never used Databricks Auto Loader (I'm new to this area). Does it need any new configuration on the cluster? Can you do it with a job?&lt;/P&gt;</description>
      <pubDate>Tue, 23 Jul 2024 13:25:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/adf-logs-into-databricks/m-p/80144#M35946</guid>
      <dc:creator>8b1tz</dc:creator>
      <dc:date>2024-07-23T13:25:36Z</dc:date>
    </item>
    <item>
      <title>Re: ADF logs into Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/adf-logs-into-databricks/m-p/80147#M35947</link>
      <description>&lt;P&gt;Sure, here are a couple worth watching. And yes, you can use it with a job. The only configuration required is setting up a storage queue and Event Grid if you want to use file notification mode. Databricks can do it for you automatically if you give a service principal sufficient permissions. Watch the videos below and you will get the idea.&lt;/P&gt;&lt;P&gt;&lt;A href="https://www.youtube.com/watch?v=8a38Fv9cpd8" target="_blank"&gt;Accelerating Data Ingestion with Databricks Autoloader (youtube.com)&lt;/A&gt;&lt;BR /&gt;&lt;A href="https://www.youtube.com/watch?v=TIju0uNKtkE" target="_blank"&gt;Autoloader in databricks (youtube.com)&lt;/A&gt;&lt;BR /&gt;&lt;A href="https://www.youtube.com/watch?v=Zm_cv0QMu1s" target="_blank"&gt;DP-203: 36 - Automating the process with Azure Databricks Autoloader (youtube.com)&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 23 Jul 2024 13:42:22 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/adf-logs-into-databricks/m-p/80147#M35947</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2024-07-23T13:42:22Z</dc:date>
    </item>
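    <!--
      A minimal sketch of the file notification setup mentioned above, assuming
      a service principal that is allowed to create the Event Grid subscription
      and storage queue; all IDs and paths are placeholders.

      # With useNotifications, Auto Loader provisions the Event Grid
      # subscription and storage queue itself instead of listing the directory.
      df = (
          spark.readStream.format("cloudFiles")
              .option("cloudFiles.format", "json")
              .option("cloudFiles.useNotifications", "true")
              .option("cloudFiles.clientId", "<sp-client-id>")
              .option("cloudFiles.clientSecret", "<sp-secret>")
              .option("cloudFiles.tenantId", "<tenant-id>")
              .option("cloudFiles.subscriptionId", "<subscription-id>")
              .option("cloudFiles.resourceGroup", "<resource-group>")
              .option("cloudFiles.schemaLocation", "<checkpoint-path>/schema")
              .load("abfss://insights-logs-activityruns@<storageaccount>.dfs.core.windows.net/")
      )
    -->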
    <item>
      <title>Re: ADF logs into Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/adf-logs-into-databricks/m-p/80150#M35949</link>
      <description>&lt;P&gt;Therefore, I insert the pipeline diagnostics into the Storage --&amp;gt; A Databricks notebook is automatically triggered --&amp;gt; I can process the data in this notebook using filters and then insert it into the Delta Table.&lt;/P&gt;&lt;P&gt;Is it something like this?&lt;/P&gt;</description>
      <pubDate>Tue, 23 Jul 2024 13:49:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/adf-logs-into-databricks/m-p/80150#M35949</guid>
      <dc:creator>8b1tz</dc:creator>
      <dc:date>2024-07-23T13:49:26Z</dc:date>
    </item>
    <item>
      <title>Re: ADF logs into Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/adf-logs-into-databricks/m-p/80209#M35962</link>
      <description>&lt;P&gt;I did it! Thank you very much!&lt;/P&gt;&lt;P&gt;Just one question: do I need to create a task that runs continuously, or can I schedule it?&lt;/P&gt;&lt;P&gt;I didn't understand the Event Grid part. Could you send me a screenshot of it?&lt;/P&gt;&lt;P&gt;I want it to combine all the new files and insert them into the Delta table: for example, every 10 minutes it should insert the new ones (if there are any), without the risk of duplication. Or do I need to run it continuously? I'm concerned about the cost.&lt;/P&gt;</description>
      <pubDate>Tue, 23 Jul 2024 20:14:01 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/adf-logs-into-databricks/m-p/80209#M35962</guid>
      <dc:creator>8b1tz</dc:creator>
      <dc:date>2024-07-23T20:14:01Z</dc:date>
    </item>
    <item>
      <title>Re: ADF logs into Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/adf-logs-into-databricks/m-p/80214#M35963</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/112751"&gt;@8b1tz&lt;/a&gt;&amp;nbsp;,&lt;BR /&gt;&lt;BR /&gt;Glad that it worked for you. You don't have to run it continuously; you can run it as a batch job with Trigger.AvailableNow (see the cost considerations section at the link below):&lt;BR /&gt;&lt;BR /&gt;&lt;A href="https://docs.databricks.com/en/ingestion/auto-loader/production.html#cost-considerations" target="_blank" rel="noopener"&gt;Configure Auto Loader for production workloads | Databricks on AWS&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;As for the Event Grid part, read about file notification mode in Auto Loader (or watch the video below). In short, this mode is recommended for efficiently ingesting large amounts of data.&lt;BR /&gt;In file notification mode, Auto Loader automatically (you can set it up manually if you prefer) sets up a notification service (Event Grid) and a queue service (Storage Queue) that subscribes to file events from the input directory.&lt;BR /&gt;It works like this: a new file arrives on your storage, then Event Grid sends information about the new file to the storage queue. Auto Loader then checks the storage queue for new files to process. Once Auto Loader has successfully processed the data, it empties the queue and saves that information in the checkpoint.&lt;BR /&gt;&lt;BR /&gt;Auto Loader will combine all new data into the target table, so on each run it will load only new data.&lt;BR /&gt;&lt;BR /&gt;&lt;A href="https://www.youtube.com/watch?v=fasr08wJhJE&amp;amp;t=203s" target="_blank" rel="noopener"&gt;Az Databricks # 28:- Autoloader in Databricks || File Notification mode (youtube.com)&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 23 Jul 2024 20:41:15 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/adf-logs-into-databricks/m-p/80214#M35963</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2024-07-23T20:41:15Z</dc:date>
    </item>
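    <!--
      A minimal sketch of the Trigger.AvailableNow pattern from the post above:
      scheduled as a job (e.g. every 10 minutes), each run drains whatever
      arrived since the last checkpointed run and then stops, so no always-on
      cluster is needed. Paths and table names are placeholders.

      (
          spark.readStream.format("cloudFiles")
              .option("cloudFiles.format", "json")
              .option("cloudFiles.schemaLocation", "<checkpoint-path>/schema")
              .load("<log-path>")
              .writeStream
              .option("checkpointLocation", "<checkpoint-path>")
              .trigger(availableNow=True)  # process the backlog, then shut down
              .toTable("adf_activity_logs")
              .awaitTermination()
      )
    -->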
    <item>
      <title>Re: ADF logs into Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/adf-logs-into-databricks/m-p/80217#M35965</link>
      <description>&lt;P&gt;Geez, thank you very much! So I'm going to do it like this: the job checks whether the specific blob has been updated; if so, I trigger the notebook that picks up the events from storage and uses the checkpoint to avoid reprocessing them. What do you think?&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 23 Jul 2024 21:02:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/adf-logs-into-databricks/m-p/80217#M35965</guid>
      <dc:creator>8b1tz</dc:creator>
      <dc:date>2024-07-23T21:02:56Z</dc:date>
    </item>
    <item>
      <title>Re: ADF logs into Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/adf-logs-into-databricks/m-p/80220#M35966</link>
      <description>&lt;P&gt;I have a question: should I use only the job trigger and a notebook without Auto Loader, use only Auto Loader, or use the job trigger together with Auto Loader?&lt;/P&gt;</description>
      <pubDate>Tue, 23 Jul 2024 22:25:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/adf-logs-into-databricks/m-p/80220#M35966</guid>
      <dc:creator>8b1tz</dc:creator>
      <dc:date>2024-07-23T22:25:30Z</dc:date>
    </item>
    <item>
      <title>Re: ADF logs into Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/adf-logs-into-databricks/m-p/80271#M35976</link>
      <description>&lt;P&gt;I agree that consuming an Event Hub is not as straightforward, but it is doable by setting up a Kafka stream in Spark. To be honest, I find Auto Loader a bit cumbersome, especially for this use case, but hey, if it works it works.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 24 Jul 2024 07:46:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/adf-logs-into-databricks/m-p/80271#M35976</guid>
      <dc:creator>jacovangelder</dc:creator>
      <dc:date>2024-07-24T07:46:25Z</dc:date>
    </item>
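    <!--
      A minimal sketch of the Kafka-protocol approach mentioned above, assuming
      an Event Hubs namespace on a tier that exposes the Kafka endpoint on port
      9093; the namespace, hub name, connection string, and paths are
      placeholders.

      # Event Hubs speaks Kafka with SASL PLAIN: the username is the literal
      # string "$ConnectionString" and the password is the connection string.
      conn = "Endpoint=sb://<namespace>.servicebus.windows.net/;SharedAccessKeyName=<name>;SharedAccessKey=<key>"

      df = (
          spark.readStream.format("kafka")
              .option("kafka.bootstrap.servers", "<namespace>.servicebus.windows.net:9093")
              .option("subscribe", "<eventhub-name>")
              .option("kafka.security.protocol", "SASL_SSL")
              .option("kafka.sasl.mechanism", "PLAIN")
              .option(
                  "kafka.sasl.jaas.config",
                  'kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule '
                  f'required username="$ConnectionString" password="{conn}";',
              )
              .load()
              .selectExpr("CAST(value AS STRING) AS body", "timestamp")
      )

      (
          df.writeStream
              .option("checkpointLocation", "<checkpoint-path>")
              .toTable("adf_logs_raw")
      )
    -->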
    <item>
      <title>Re: ADF logs into Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/adf-logs-into-databricks/m-p/80335#M35992</link>
      <description>&lt;P&gt;Yes, in the end I just want to put the logs into the delta table and that's it; I don't want to store anything else. I know that sending the logs to storage might not be the best option, but I tried the Event Hub a lot and couldn't get it working :(. In the end I think I'm going to use the job's file arrival trigger and a notebook. I just don't know how I'm going to guarantee that there's no duplication. Would it be OK to delete the files I've already consumed? I don't know; I'm afraid of deleting a log that arrived just after reading...&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 24 Jul 2024 11:38:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/adf-logs-into-databricks/m-p/80335#M35992</guid>
      <dc:creator>8b1tz</dc:creator>
      <dc:date>2024-07-24T11:38:56Z</dc:date>
    </item>
    <item>
      <title>Re: ADF logs into Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/adf-logs-into-databricks/m-p/80336#M35993</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/112751"&gt;@8b1tz&lt;/a&gt;&amp;nbsp;,&lt;BR /&gt;&lt;BR /&gt;Once again, if you use Auto Loader it guarantees exactly-once semantics, so there shouldn't be any duplicates.&lt;BR /&gt;The same applies if you were to use Event Hub; it's just a different data source, but the same concept of structured streaming applies (Auto Loader is built upon Structured Streaming).&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Below is a snippet from the documentation:&lt;/P&gt;&lt;P&gt;As files are discovered, their metadata is persisted in a scalable key-value store (RocksDB) in the&amp;nbsp;&lt;EM&gt;checkpoint location&lt;/EM&gt;&amp;nbsp;of your Auto Loader pipeline. &lt;STRONG&gt;This key-value store ensures that data is processed exactly once.&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;In case of failures, Auto Loader can resume from where it left off using information stored in the checkpoint location, and continues to provide exactly-once guarantees when writing data into Delta Lake. &lt;STRONG&gt;You don’t need to maintain or manage any state yourself to achieve fault tolerance or exactly-once semantics&lt;/STRONG&gt;.&lt;/P&gt;&lt;P&gt;I highly recommend getting to know how Auto Loader works (or, more generally, how Structured Streaming works). Read the documentation and watch some videos on YouTube.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 24 Jul 2024 11:50:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/adf-logs-into-databricks/m-p/80336#M35993</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2024-07-24T11:50:42Z</dc:date>
    </item>
  </channel>
</rss>

