<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Medallion architecture in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/medaillon-architecture/m-p/115075#M45021</link>
    <description>&lt;P&gt;Hi patacoing,&lt;/P&gt;&lt;P&gt;The structure you described in your S3 data lake sounds more like a "pre-bronze" (raw landing) layer: because the files are in mixed formats (JSON, CSV, text, binary), Spark cannot process them in a uniform way. In Databricks, the bronze layer is usually where data first becomes readable and queryable, typically standardized into Delta format. A good approach is to use Auto Loader to ingest each file type separately by setting the correct format (for example, .format("cloudFiles").option("cloudFiles.format", "json"), and likewise for CSV and text), then write each stream into a bronze Delta table with a consistent schema. If formats are very inconsistent or unknown, you can instead store the raw content in a Delta table using a binary column plus a metadata map column that tracks file information. That lets you land everything safely and defer transformations to the silver/gold layers. So yes, Auto Loader is still relevant; you just process one format at a time, or wrap each file’s raw content. Let me know if you'd like a sample bronze setup based on your structure!&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Brahma&lt;/P&gt;</description>
    <pubDate>Thu, 10 Apr 2025 02:53:00 GMT</pubDate>
    <dc:creator>Brahmareddy</dc:creator>
    <dc:date>2025-04-10T02:53:00Z</dc:date>
    <item>
      <title>Medallion architecture</title>
      <link>https://community.databricks.com/t5/data-engineering/medaillon-architecture/m-p/115066#M45020</link>
      <description>&lt;P&gt;Hello, I have an S3 data lake containing a structure of files in different formats: JSON, CSV, text, binary, ...&lt;/P&gt;&lt;P&gt;Would you consider this my bronze layer, or a "pre-bronze" layer, since it can't be processed directly by Spark (because of the mixed file formats)?&lt;BR /&gt;How am I supposed to query and transform that data with Databricks, given the different formats?&lt;/P&gt;&lt;P&gt;Should I instead first transform the data into a Delta table with columns like:&lt;/P&gt;&lt;P&gt;- metadata (map column)&lt;/P&gt;&lt;P&gt;- content (binary column)&lt;/P&gt;&lt;P&gt;In that case, would Auto Loader be relevant?&lt;/P&gt;</description>
      <pubDate>Wed, 09 Apr 2025 21:34:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/medaillon-architecture/m-p/115066#M45020</guid>
      <dc:creator>patacoing</dc:creator>
      <dc:date>2025-04-09T21:34:56Z</dc:date>
    </item>
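The table schema proposed in the question (a metadata map column plus a binary content column) can be sketched in plain Python. The column names come from the question; the metadata keys and the example path are illustrative assumptions, not an established convention:

```python
import os

def file_metadata(path: str, size_bytes: int) -> dict:
    """Build the metadata map stored next to each file's raw bytes
    in the proposed bronze Delta table (keys are illustrative)."""
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    return {
        "path": path,
        "size_bytes": str(size_bytes),
        "source_format": ext or "unknown",
    }

# One row of the proposed table: a metadata map plus the raw file bytes.
row = {
    "metadata": file_metadata("s3://lake/landing/events/evt.json", 2048),
    "content": b"{}",  # raw file bytes would go here
}
```

Storing the format in the metadata map is what later lets silver-layer jobs route each row to the right parser.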
    <item>
      <title>Re: Medallion architecture</title>
      <link>https://community.databricks.com/t5/data-engineering/medaillon-architecture/m-p/115075#M45021</link>
      <description>&lt;P&gt;Hi patacoing,&lt;/P&gt;&lt;P&gt;The structure you described in your S3 data lake sounds more like a "pre-bronze" (raw landing) layer: because the files are in mixed formats (JSON, CSV, text, binary), Spark cannot process them in a uniform way. In Databricks, the bronze layer is usually where data first becomes readable and queryable, typically standardized into Delta format. A good approach is to use Auto Loader to ingest each file type separately by setting the correct format (for example, .format("cloudFiles").option("cloudFiles.format", "json"), and likewise for CSV and text), then write each stream into a bronze Delta table with a consistent schema. If formats are very inconsistent or unknown, you can instead store the raw content in a Delta table using a binary column plus a metadata map column that tracks file information. That lets you land everything safely and defer transformations to the silver/gold layers. So yes, Auto Loader is still relevant; you just process one format at a time, or wrap each file’s raw content. Let me know if you'd like a sample bronze setup based on your structure!&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Brahma&lt;/P&gt;</description>
      <pubDate>Thu, 10 Apr 2025 02:53:00 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/medaillon-architecture/m-p/115075#M45021</guid>
      <dc:creator>Brahmareddy</dc:creator>
      <dc:date>2025-04-10T02:53:00Z</dc:date>
    </item>
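The per-format ingestion the reply recommends can be sketched as one Auto Loader stream per file type. The extension-to-format mapping and the helper names below are assumptions for illustration; the cloudFiles options follow the pattern the reply quotes, with "binaryFile" as the catch-all for unknown or binary files:

```python
# Map file extensions to Auto Loader's cloudFiles.format values
# (mapping is an assumption; extend it for your lake's file types).
FORMAT_BY_EXT = {
    "json": "json",
    "csv": "csv",
    "txt": "text",
}

def cloudfiles_format(ext: str) -> str:
    """Pick the Auto Loader format for a file extension,
    falling back to raw bytes for anything unrecognized."""
    return FORMAT_BY_EXT.get(ext.lower(), "binaryFile")

def bronze_stream(spark, ext, landing_path, schema_location):
    """Return one Auto Loader streaming DataFrame per format,
    as the reply suggests (paths are illustrative)."""
    return (spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", cloudfiles_format(ext))
            .option("cloudFiles.schemaLocation", schema_location)
            .load(landing_path))
```

Each stream would then be written to its own bronze Delta table (for example via writeStream with a checkpoint location), keeping the schema consistent within each format.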
  </channel>
</rss>

