<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Difference between DBFS and Delta Lake? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/difference-between-dbfs-and-delta-lake/m-p/30373#M22011</link>
    <description>&lt;P&gt;Would like a deeper dive/explanation into the difference. When I write to a table with the following code:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;spark_df.write.mode("overwrite").saveAsTable("db.table")&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;The table is created and can be viewed in the Data tab. It can also be found at some DBFS path. Now if I run:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;dbutils.fs.rm(dbfs_path, recurse=True)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;where dbfs_path is the path to the table in DBFS, it will remove that table from DBFS; however, it still appears in the Data tab (even though I know you can't query the table anymore inside the notebook, because technically it no longer exists).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If I run:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;%sql
DROP TABLE IF EXISTS db.table&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;inside a cell, it will drop the table from both the Data tab and DBFS. Can someone explain (high level) how the infrastructure works? Much appreciated.&lt;/P&gt;</description>
    <pubDate>Fri, 28 Jan 2022 20:54:18 GMT</pubDate>
    <dc:creator>pjp94</dc:creator>
    <dc:date>2022-01-28T20:54:18Z</dc:date>
    <item>
      <title>Difference between DBFS and Delta Lake?</title>
      <link>https://community.databricks.com/t5/data-engineering/difference-between-dbfs-and-delta-lake/m-p/30373#M22011</link>
      <description>&lt;P&gt;Would like a deeper dive/explanation into the difference. When I write to a table with the following code:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;spark_df.write.mode("overwrite").saveAsTable("db.table")&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;The table is created and can be viewed in the Data tab. It can also be found at some DBFS path. Now if I run:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;dbutils.fs.rm(dbfs_path, recurse=True)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;where dbfs_path is the path to the table in DBFS, it will remove that table from DBFS; however, it still appears in the Data tab (even though I know you can't query the table anymore inside the notebook, because technically it no longer exists).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If I run:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;%sql
DROP TABLE IF EXISTS db.table&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;inside a cell, it will drop the table from both the Data tab and DBFS. Can someone explain (high level) how the infrastructure works? Much appreciated.&lt;/P&gt;</description>
      <pubDate>Fri, 28 Jan 2022 20:54:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/difference-between-dbfs-and-delta-lake/m-p/30373#M22011</guid>
      <dc:creator>pjp94</dc:creator>
      <dc:date>2022-01-28T20:54:18Z</dc:date>
    </item>
    <item>
      <title>Re: Difference between DBFS and Delta Lake?</title>
      <link>https://community.databricks.com/t5/data-engineering/difference-between-dbfs-and-delta-lake/m-p/30375#M22013</link>
      <description>&lt;P&gt;Tables in Spark, whether Delta Lake-backed or not, are basically just semantic views on top of the actual data.&lt;/P&gt;&lt;P&gt;On Databricks, the data itself is stored in DBFS, which is an abstraction layer on top of the actual storage (like S3, ADLS, etc.). This data can be Parquet, ORC, CSV, JSON, etc.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;So with your rm command you did indeed delete the data from DBFS. However, the &lt;B&gt;table definition&lt;/B&gt; still exists (it is stored in a metastore, which contains metadata about which databases and tables exist and where their data resides).&lt;/P&gt;&lt;P&gt;So now you have an empty table. To remove the table definition too, you have to drop it, exactly like you did.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;For completeness: Delta Lake has nothing to do with this. Delta Lake is Parquet on steroids, giving you a lot more functionality, but the way of working stays identical.&lt;/P&gt;</description>
      <pubDate>Mon, 31 Jan 2022 08:47:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/difference-between-dbfs-and-delta-lake/m-p/30375#M22013</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2022-01-31T08:47:53Z</dc:date>
    </item>
    <item>
      <title>Re: Difference between DBFS and Delta Lake?</title>
      <link>https://community.databricks.com/t5/data-engineering/difference-between-dbfs-and-delta-lake/m-p/30376#M22014</link>
      <description>&lt;P&gt;Hi @Werner Stinckens​&amp;nbsp;, this is exactly what I was looking for. Thanks!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;1) Follow-up question: do you need to set up an object-level storage connection on Databricks (i.e., to an S3 bucket or Azure Blob)?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;2) Any folders in your /mnt path are external object stores (i.e., S3, Blob Storage, etc.), correct? Everything else is stored in the Databricks root? I ask because my organization has 2 folders in the /mnt folder: /mnt/aws &amp;amp; /mnt/delta... not sure if delta refers to Delta Lake?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;3) So Delta Lake and DBFS are independent of each other, correct? DBFS is where the data is actually stored (i.e., if I wrote a table, then the Parquet files). How does Delta Lake fit into this?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks so much!&lt;/P&gt;</description>
      <pubDate>Tue, 01 Feb 2022 14:45:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/difference-between-dbfs-and-delta-lake/m-p/30376#M22014</guid>
      <dc:creator>pjp94</dc:creator>
      <dc:date>2022-02-01T14:45:02Z</dc:date>
    </item>
    <item>
      <title>Re: Difference between DBFS and Delta Lake?</title>
      <link>https://community.databricks.com/t5/data-engineering/difference-between-dbfs-and-delta-lake/m-p/30377#M22015</link>
      <description>&lt;P&gt;1) You don't have to, as a Databricks workspace has its own storage, but it certainly is a good idea.&lt;/P&gt;&lt;P&gt;2) Not all folders in /mnt are external. Only the ones you mounted in there yourself.&lt;/P&gt;&lt;P&gt;3) Correct. Delta Lake is just a file format like Parquet, but with more possibilities.&lt;/P&gt;</description>
      <pubDate>Tue, 01 Feb 2022 16:32:28 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/difference-between-dbfs-and-delta-lake/m-p/30377#M22015</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2022-02-01T16:32:28Z</dc:date>
    </item>
  </channel>
</rss>

