<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Does Azure Databricks and Delta Layer make it a Lakehouse? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/does-azure-databricks-and-delta-layer-make-it-a-lakehouse/m-p/22424#M15349</link>
    <description/>
    <pubDate>Fri, 18 Jun 2021 20:35:48 GMT</pubDate>
    <dc:creator>Ryan_Chynoweth</dc:creator>
    <dc:date>2021-06-18T20:35:48Z</dc:date>
    <item>
      <title>Does Azure Databricks and Delta Layer make it a Lakehouse?</title>
      <link>https://community.databricks.com/t5/data-engineering/does-azure-databricks-and-delta-layer-make-it-a-lakehouse/m-p/22423#M15348</link>
      <description>&lt;P&gt;Even after going through many resources, I have failed to understand what constitutes a lakehouse, hence my question below.&lt;/P&gt;&lt;P&gt;If we have Azure Data Lake Storage Gen2, ADF, and Azure Databricks, with the possibility of converting the incoming CSV files into Delta tables, can that be called a "Lakehouse" architecture, or is it called a "Delta Lake"?&lt;/P&gt;&lt;P&gt;Or is it the "SQL Analytics" engine on top of the Delta Lake layer that makes it a "Lakehouse"?&lt;/P&gt;&lt;P&gt;Please clarify.&lt;/P&gt;</description>
      <pubDate>Fri, 18 Jun 2021 20:07:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/does-azure-databricks-and-delta-layer-make-it-a-lakehouse/m-p/22423#M15348</guid>
      <dc:creator>User16765131552</dc:creator>
      <dc:date>2021-06-18T20:07:26Z</dc:date>
    </item>
    <item>
      <title>Re: Does Azure Databricks and Delta Layer make it a Lakehouse?</title>
      <link>https://community.databricks.com/t5/data-engineering/does-azure-databricks-and-delta-layer-make-it-a-lakehouse/m-p/22424#M15349</link>
      <description>&lt;P&gt;At a high level, a Lakehouse must have the following properties:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Open, direct-access data formats (Apache Parquet, Delta Lake, etc.)&lt;/LI&gt;&lt;LI&gt;First-class support for machine learning and data science workloads&lt;/LI&gt;&lt;LI&gt;State-of-the-art performance&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;Databricks is the first Lakehouse because it meets the above three properties. Specifically, if you are using Databricks with ADLS and converting all your data (JSON, CSV, Parquet, messages, etc.) into Delta tables that are available within Databricks, then that is the making of a Lakehouse, but it still needs to be built and supported. The Databricks platform satisfies points 2 and 3 above, and Delta Lake satisfies points 1 and 3 (performance relies on both the engine and the storage, which is why 3 is mentioned twice).&lt;/P&gt;&lt;P&gt;Leveraging Databricks and accessing data stored in Delta is a Lakehouse. By adding Databricks SQL (formerly SQL Analytics), we allow more users to access and use the Lakehouse. In Databricks SQL, users work with the same compute and data as the data engineer does in Databricks; they just have a different UI that they are familiar with. Additionally, Databricks SQL is optimized for SQL and BI workloads, while the notebook environment is better suited to engineering and data science.&lt;/P&gt;&lt;P&gt;As a fun read, you should check out the &lt;A href="http://cidrdb.org/cidr2021/papers/cidr2021_paper17.pdf" alt="http://cidrdb.org/cidr2021/papers/cidr2021_paper17.pdf" target="_blank"&gt;Lakehouse whitepaper&lt;/A&gt;.&lt;/P&gt;</description>
      <pubDate>Fri, 18 Jun 2021 20:35:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/does-azure-databricks-and-delta-layer-make-it-a-lakehouse/m-p/22424#M15349</guid>
      <dc:creator>Ryan_Chynoweth</dc:creator>
      <dc:date>2021-06-18T20:35:48Z</dc:date>
    </item>
    <item>
      <title>Re: Does Azure Databricks and Delta Layer make it a Lakehouse?</title>
      <link>https://community.databricks.com/t5/data-engineering/does-azure-databricks-and-delta-layer-make-it-a-lakehouse/m-p/22425#M15350</link>
      <description>&lt;P&gt;A Lakehouse is a concept defined by the following characteristics:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Data is stored in an open standard format.&lt;/LI&gt;&lt;LI&gt;Data is stored in a way that supports data science, ML, and BI workloads.&lt;/LI&gt;&lt;LI&gt;Delta is a storage layer on top of cloud storage that provides control over the data, prevents it from becoming a data swamp, improves performance, and supports SQL-like queries.&lt;/LI&gt;&lt;LI&gt;For a Lakehouse, it is always recommended to have three layers:&lt;/LI&gt;&lt;/OL&gt;&lt;UL&gt;&lt;LI&gt;Bronze: raw data, as it arrives from the OLTP source systems.&lt;/LI&gt;&lt;LI&gt;Silver: data in a curated format, filtered so that no junk data reaches this layer. This layer is best suited for data science and ML.&lt;/LI&gt;&lt;LI&gt;Gold: purely aggregated data that serves BI and can also be used for machine learning.&lt;/LI&gt;&lt;/UL&gt;</description>
      <pubDate>Wed, 23 Jun 2021 06:09:07 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/does-azure-databricks-and-delta-layer-make-it-a-lakehouse/m-p/22425#M15350</guid>
      <dc:creator>User16826994223</dc:creator>
      <dc:date>2021-06-23T06:09:07Z</dc:date>
    </item>
  </channel>
</rss>

