<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Advice for generic file processing for ingestion of multiple data formats in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/advice-for-generic-file-processing-for-ingestion-of-multiple/m-p/68459#M33686</link>
    <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;We are using Delta Live Tables to ingest data from multiple business groups, each with different input file formats and parsing requirements. The input files are ingested from Azure Blob Storage. Right now we serve only three business groups and, for PoC purposes, have three parsing scripts, each tailored to the specific requirements of one group. In the future we could be serving hundreds of business groups, and we do not want to maintain a separate parsing script for every group that opts in. Does Databricks offer a solution for generic file processing? Additionally, the input file format for any particular business group may change over time as its business processes change. Can Databricks adapt automatically to changing file formats?&lt;/P&gt;</description>
    <pubDate>Tue, 07 May 2024 13:51:43 GMT</pubDate>
    <dc:creator>Lea</dc:creator>
    <dc:date>2024-05-07T13:51:43Z</dc:date>
    <item>
      <title>Advice for generic file processing for ingestion of multiple data formats</title>
      <link>https://community.databricks.com/t5/data-engineering/advice-for-generic-file-processing-for-ingestion-of-multiple/m-p/68459#M33686</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;We are using Delta Live Tables to ingest data from multiple business groups, each with different input file formats and parsing requirements. The input files are ingested from Azure Blob Storage. Right now we serve only three business groups and, for PoC purposes, have three parsing scripts, each tailored to the specific requirements of one group. In the future we could be serving hundreds of business groups, and we do not want to maintain a separate parsing script for every group that opts in. Does Databricks offer a solution for generic file processing? Additionally, the input file format for any particular business group may change over time as its business processes change. Can Databricks adapt automatically to changing file formats?&lt;/P&gt;</description>
      <pubDate>Tue, 07 May 2024 13:51:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/advice-for-generic-file-processing-for-ingestion-of-multiple/m-p/68459#M33686</guid>
      <dc:creator>Lea</dc:creator>
      <dc:date>2024-05-07T13:51:43Z</dc:date>
    </item>
    <item>
      <title>Re: Advice for generic file processing for ingestion of multiple data formats</title>
      <link>https://community.databricks.com/t5/data-engineering/advice-for-generic-file-processing-for-ingestion-of-multiple/m-p/70154#M34018</link>
      <description>&lt;P&gt;Hello &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/104679"&gt;@Lea&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;Our platform does not currently provide a built-in feature for ingesting multiple or interchangeable file formats. However, we value your input and encourage you to share your idea through the Databricks&amp;nbsp;&lt;A href="https://ideas.databricks.com/" target="_blank" rel="noopener"&gt;Ideas Portal&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;Auto Loader can discover files to process based on a glob pattern:&amp;nbsp;&lt;A href="https://docs.databricks.com/en/ingestion/auto-loader/patterns.html#filtering-directories-or-files-using-glob-patterns" target="_blank" rel="noopener"&gt;https://docs.databricks.com/en/ingestion/auto-loader/patterns.html#filtering-directories-or-files-using-glob-patterns&lt;/A&gt;. This is not generic file processing, but it can help in some cases, for example when each group's files can be selected by directory or file name.&lt;/P&gt;
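&lt;P&gt;As a minimal sketch of the glob-based discovery described above, the Python snippet below builds an Auto Loader stream whose load path contains a glob. The storage account, container, folder layout, and the helper names &lt;code&gt;autoloader_options&lt;/code&gt; and &lt;code&gt;read_with_glob&lt;/code&gt; are assumptions made for illustration; only the &lt;code&gt;cloudFiles&lt;/code&gt; source and its two option keys come from Auto Loader itself.&lt;/P&gt;
&lt;PRE&gt;
```python
def autoloader_options(file_format, schema_location):
    """Build the Auto Loader options for one source.

    Both keys are regular Auto Loader options; the values are supplied
    by the caller.
    """
    return {
        "cloudFiles.format": file_format,
        "cloudFiles.schemaLocation": schema_location,
    }


def read_with_glob(spark, path_glob, file_format, schema_location):
    """Return a streaming DataFrame over every file matching path_glob.

    `spark` is the active SparkSession (predefined in Databricks notebooks).
    """
    return (
        spark.readStream.format("cloudFiles")
        .options(**autoloader_options(file_format, schema_location))
        .load(path_glob)
    )


# Hypothetical layout: one folder per business group, CSV files only.
# The account and container names are placeholders, not real resources.
EXAMPLE_GLOB = (
    "abfss://landing@examplestorage.dfs.core.windows.net/*/incoming/*.csv"
)
```
&lt;/PRE&gt;
&lt;P&gt;In a notebook this could be called as &lt;code&gt;read_with_glob(spark, EXAMPLE_GLOB, "csv", "/some/schema/path")&lt;/code&gt;. Inside a Delta Live Tables pipeline the schema location is managed by the pipeline, so that option can most likely be dropped there.&lt;/P&gt;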
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;At the moment, Auto Loader supports the following file formats:&amp;nbsp;&lt;/STRONG&gt;&lt;A style="font-family: inherit; background-color: #ffffff;" href="https://docs.databricks.com/en/ingestion/auto-loader/options.html#file-format-options" target="_blank" rel="noopener"&gt;File Format Options&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Common patterns:&amp;nbsp;&lt;/STRONG&gt;&lt;A style="font-family: inherit; background-color: #ffffff;" href="https://docs.databricks.com/en/ingestion/auto-loader/patterns.html#common-data-loading-patterns" target="_blank" rel="noopener"&gt;Common Data Loading Patterns&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG style="color: #1b3139; font-family: inherit;"&gt;Formats supported by Apache Spark DataFrames and SQL:&amp;nbsp;&lt;/STRONG&gt;&lt;A style="font-family: inherit; background-color: #ffffff;" href="https://docs.databricks.com/en/query/formats/index.html#data-format-options" target="_self"&gt;Data Format Options&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;
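&lt;P&gt;One way to avoid a separate script per business group with the formats listed above is to drive a single pipeline from a small registry of sources. The sketch below is only an illustration under stated assumptions: the registry contents, storage paths, table names, and the helpers &lt;code&gt;reader_options&lt;/code&gt; and &lt;code&gt;register_tables&lt;/code&gt; are hypothetical, and group-specific parsing logic would still have to be added on top. The &lt;code&gt;cloudFiles.schemaEvolutionMode&lt;/code&gt; option is a regular Auto Loader setting that lets new columns be picked up when an input file gains fields.&lt;/P&gt;
&lt;PRE&gt;
```python
# File formats Auto Loader can read, per the File Format Options page.
SUPPORTED_FORMATS = {
    "json", "csv", "xml", "parquet", "avro", "orc", "text", "binaryFile",
}

# Hypothetical registry: one entry per business group. In practice this
# could live in a config file or a table rather than in pipeline source.
SOURCES = [
    {
        "group": "finance",
        "format": "csv",
        "path": "abfss://landing@examplestorage.dfs.core.windows.net/finance/*.csv",
        "options": {"header": "true", "sep": ";"},
    },
    {
        "group": "logistics",
        "format": "json",
        "path": "abfss://landing@examplestorage.dfs.core.windows.net/logistics/*.json",
        "options": {},
    },
]


def reader_options(source):
    """Merge the common Auto Loader options with one group's format options."""
    fmt = source["format"]
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported Auto Loader format: {fmt}")
    opts = {
        "cloudFiles.format": fmt,
        # New columns are added to the inferred schema as files evolve.
        "cloudFiles.schemaEvolutionMode": "addNewColumns",
    }
    opts.update(source.get("options", {}))
    return opts


def register_tables(dlt, spark, sources):
    """Declare one streaming bronze table per registry entry.

    `dlt` is the Delta Live Tables Python module and `spark` the active
    session; both are passed in so the helper stays importable elsewhere.
    """
    def make_table(source):
        # A factory function binds `source` per table; defining `load`
        # directly in a loop would make every table see the last entry.
        @dlt.table(name=f"bronze_{source['group']}")
        def load():
            return (
                spark.readStream.format("cloudFiles")
                .options(**reader_options(source))
                .load(source["path"])
            )
        return load

    return [make_table(source) for source in sources]
```
&lt;/PRE&gt;
&lt;P&gt;In a pipeline notebook this would be invoked as &lt;code&gt;register_tables(dlt, spark, SOURCES)&lt;/code&gt; after &lt;code&gt;import dlt&lt;/code&gt;. Onboarding a new group then means adding a registry entry rather than a new script, although formats outside the supported list would still need custom handling.&lt;/P&gt;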
&lt;P&gt;If you have any further questions, please don't hesitate to reach out.&lt;/P&gt;</description>
      <pubDate>Tue, 21 May 2024 17:18:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/advice-for-generic-file-processing-for-ingestion-of-multiple/m-p/70154#M34018</guid>
      <dc:creator>raphaelblg</dc:creator>
      <dc:date>2024-05-21T17:18:55Z</dc:date>
    </item>
  </channel>
</rss>

