<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: What is best practice for organising simple desktop-style analytics workflows in Databricks? in Data Governance</title>
    <link>https://community.databricks.com/t5/data-governance/what-is-best-practice-for-organising-simple-desktop-style/m-p/8028#M234</link>
    <description>&lt;P&gt;The articles you mention are specifically about Unity Catalog (a feature you CAN use in Databricks but don't have to).  They say that if you use Unity Catalog, DBFS mounts will not work.&lt;/P&gt;&lt;P&gt;If you do not use Unity Catalog, you can mount your cloud storage in DBFS without any problem.&lt;/P&gt;&lt;P&gt;Besides that: you can always access cloud storage without a mount.  Instead of using a file path like '/mnt/datalake/...' you use 's3://...' or 'abfss://...' &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Whether you need Unity Catalog or not is another discussion, as it has advantages but also limitations.&lt;/P&gt;</description>
    <pubDate>Thu, 09 Mar 2023 11:49:26 GMT</pubDate>
    <dc:creator>-werners-</dc:creator>
    <dc:date>2023-03-09T11:49:26Z</dc:date>
    <item>
      <title>What is best practice for organising simple desktop-style analytics workflows in Databricks?</title>
      <link>https://community.databricks.com/t5/data-governance/what-is-best-practice-for-organising-simple-desktop-style/m-p/8027#M233</link>
      <description>&lt;P&gt;Apologies in advance for the soft question, but I'm genuinely struggling with this.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;We're a small data science unit just setting up in Databricks. While we do run some intensive ETL and analytics jobs, a non-trivial part of the team's BAU is exploratory desktop analytics. E.g. this might involve being sent spreadsheets by other organisations, or downloading random bits of data from the web to do ad hoc, small pieces of analytics in Python or R.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;What is the recommended way of organising and persisting files for such workflows? Using the DBFS file system to read and write from object storage seems like the obvious solution, but the Databricks documentation seems to be giving mixed messages on this. E.g. the following 2 articles from the docs (&lt;A href="https://learn.microsoft.com/en-us/azure/databricks/dbfs/unity-catalog" alt="https://learn.microsoft.com/en-us/azure/databricks/dbfs/unity-catalog" target="_blank"&gt;article1&lt;/A&gt;, &lt;A href="https://learn.microsoft.com/en-us/azure/databricks/dbfs/mounts" alt="https://learn.microsoft.com/en-us/azure/databricks/dbfs/mounts" target="_blank"&gt;article2&lt;/A&gt;) state pretty explicitly right up front that:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;"Databricks recommends against using DBFS and mounted cloud object storage for most use cases in Unity Catalog-enabled Azure Databricks workspaces."&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;and&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;"Mounted data does not work with Unity Catalog, and Databricks recommends migrating away from using mounts and managing data governance with Unity Catalog".&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;So, what's best practice for such workflows?&lt;/P&gt;</description>
      <pubDate>Thu, 09 Mar 2023 10:36:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-governance/what-is-best-practice-for-organising-simple-desktop-style/m-p/8027#M233</guid>
      <dc:creator>jmill</dc:creator>
      <dc:date>2023-03-09T10:36:55Z</dc:date>
    </item>
    <item>
      <title>Re: What is best practice for organising simple desktop-style analytics workflows in Databricks?</title>
      <link>https://community.databricks.com/t5/data-governance/what-is-best-practice-for-organising-simple-desktop-style/m-p/8028#M234</link>
      <description>&lt;P&gt;The articles you mention are specifically about Unity Catalog (a feature you CAN use in Databricks but don't have to).  They say that if you use Unity Catalog, DBFS mounts will not work.&lt;/P&gt;&lt;P&gt;If you do not use Unity Catalog, you can mount your cloud storage in DBFS without any problem.&lt;/P&gt;&lt;P&gt;Besides that: you can always access cloud storage without a mount.  Instead of using a file path like '/mnt/datalake/...' you use 's3://...' or 'abfss://...' &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Whether you need Unity Catalog or not is another discussion, as it has advantages but also limitations.&lt;/P&gt;</description>
      <pubDate>Thu, 09 Mar 2023 11:49:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-governance/what-is-best-practice-for-organising-simple-desktop-style/m-p/8028#M234</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2023-03-09T11:49:26Z</dc:date>
    </item>
    <item>
      <title>Re: What is best practice for organising simple desktop-style analytics workflows in Databricks?</title>
      <link>https://community.databricks.com/t5/data-governance/what-is-best-practice-for-organising-simple-desktop-style/m-p/8029#M235</link>
      <description>&lt;P&gt;You can also upload data in the UI&lt;span class="lia-inline-image-display-wrapper" image-alt="Image"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/2553i5489437429D01D43/image-size/large?v=v2&amp;amp;px=999" role="button" title="Image" alt="Image" /&gt;&lt;/span&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I wouldn't worry about doing something the best way, just do it the way that will get the work done.  We've built it so that you can't make giant mistakes, and you can always change things in the future.  &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-data-summarize" alt="https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-data-summarize" target="_blank"&gt;Data Summarize&lt;/A&gt; and AutoML should help a great deal in starting projects.  &lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 09 Mar 2023 12:17:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-governance/what-is-best-practice-for-organising-simple-desktop-style/m-p/8029#M235</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2023-03-09T12:17:13Z</dc:date>
    </item>
    <item>
      <title>Re: What is best practice for organising simple desktop-style analytics workflows in Databricks?</title>
      <link>https://community.databricks.com/t5/data-governance/what-is-best-practice-for-organising-simple-desktop-style/m-p/8030#M236</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;This is what I usually do. See if this helps.&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;When I have a small sample dataset on my local disk, or data shared by my upstream colleagues over email in CSV format, I simply use the 'Import and Export data' option in the Databricks UI to upload the file to the DBFS path I want, and then use that path to load it into a Spark data frame.&lt;/LI&gt;&lt;LI&gt;If my files are created by another upstream Databricks job, they will already be on a path accessible to the Databricks cluster, so I read them from there.&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Our cluster is hosted on AWS, but I don't think it is any different on Azure.&lt;/P&gt;</description>
      <pubDate>Thu, 09 Mar 2023 14:10:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-governance/what-is-best-practice-for-organising-simple-desktop-style/m-p/8030#M236</guid>
      <dc:creator>pvignesh92</dc:creator>
      <dc:date>2023-03-09T14:10:36Z</dc:date>
    </item>
    <item>
      <title>Re: What is best practice for organising simple desktop-style analytics workflows in Databricks?</title>
      <link>https://community.databricks.com/t5/data-governance/what-is-best-practice-for-organising-simple-desktop-style/m-p/8031#M237</link>
      <description>&lt;P&gt;Hi @Jason Millburn&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thank you for posting your question in our community! We are happy to assist you.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 31 Mar 2023 09:51:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-governance/what-is-best-practice-for-organising-simple-desktop-style/m-p/8031#M237</guid>
      <dc:creator>Vartika</dc:creator>
      <dc:date>2023-03-31T09:51:36Z</dc:date>
    </item>
  </channel>
</rss>

