<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Ask your technical questions at Databricks Office Hours! November 16 - 8:00 AM - 9:00 AM PT: Register Here. November 30 - 11:00 AM - 12:00 PM PT: Regist... in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/ask-your-technical-questions-at-databricks-office-hours-november/m-p/22819#M15699</link>
    <description>&lt;P&gt;&lt;B&gt;Ask your technical questions at &lt;/B&gt;&lt;A href="https://www.databricks.com/p/webinar/officehours?utm_source=databricks&amp;amp;utm_medium=post&amp;amp;utm_content=dbcommunity&amp;amp;_ga=2.60807080.467803939.1667799823-1423152529.1665590694" alt="https://www.databricks.com/p/webinar/officehours?utm_source=databricks&amp;amp;utm_medium=post&amp;amp;utm_content=dbcommunity&amp;amp;_ga=2.60807080.467803939.1667799823-1423152529.1665590694" target="_blank"&gt;&lt;B&gt;Databricks Office Hours&lt;/B&gt;&lt;/A&gt;!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;November 16&lt;/B&gt;&amp;nbsp;- 8:00 AM - 9:00 AM PT:&amp;nbsp;&lt;A href="https://www.databricks.com/p/webinar/officehours?utm_source=databricks&amp;amp;utm_medium=post&amp;amp;utm_content=dbcommunity&amp;amp;_ga=2.60807080.467803939.1667799823-1423152529.1665590694" alt="https://www.databricks.com/p/webinar/officehours?utm_source=databricks&amp;amp;utm_medium=post&amp;amp;utm_content=dbcommunity&amp;amp;_ga=2.60807080.467803939.1667799823-1423152529.1665590694" target="_blank"&gt;&lt;B&gt;&lt;U&gt;Register Here&lt;/U&gt;&lt;/B&gt;&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;November 30&lt;/B&gt;&amp;nbsp;- 11:00 AM - 12:00 PM PT: &lt;A href="https://www.databricks.com/p/webinar/officehours?utm_source=databricks&amp;amp;utm_medium=post&amp;amp;utm_content=dbcommunity&amp;amp;_ga=2.60807080.467803939.1667799823-1423152529.1665590694" alt="https://www.databricks.com/p/webinar/officehours?utm_source=databricks&amp;amp;utm_medium=post&amp;amp;utm_content=dbcommunity&amp;amp;_ga=2.60807080.467803939.1667799823-1423152529.1665590694" target="_blank"&gt;&lt;B&gt;&lt;U&gt;Register Here&lt;/U&gt;&lt;/B&gt;&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Databricks Office Hours&amp;nbsp;connects you directly with experts to answer all your Databricks questions. 
&lt;B&gt;Join us to:&lt;/B&gt;&lt;/P&gt;&lt;P&gt;• Troubleshoot your technical questions&lt;/P&gt;&lt;P&gt;• Learn the best strategies to apply Databricks to your use case&lt;/P&gt;&lt;P&gt;• Master tips and tricks to maximize your usage of our platform&lt;/P&gt;</description>
    <pubDate>Thu, 10 Nov 2022 23:35:31 GMT</pubDate>
    <dc:creator>Taha_Hussain</dc:creator>
    <dc:date>2022-11-10T23:35:31Z</dc:date>
    <item>
      <title>Ask your technical questions at Databricks Office Hours! November 16 - 8:00 AM - 9:00 AM PT: Register Here. November 30 - 11:00 AM - 12:00 PM PT: Regist...</title>
      <link>https://community.databricks.com/t5/data-engineering/ask-your-technical-questions-at-databricks-office-hours-november/m-p/22819#M15699</link>
      <description>&lt;P&gt;&lt;B&gt;Ask your technical questions at &lt;/B&gt;&lt;A href="https://www.databricks.com/p/webinar/officehours?utm_source=databricks&amp;amp;utm_medium=post&amp;amp;utm_content=dbcommunity&amp;amp;_ga=2.60807080.467803939.1667799823-1423152529.1665590694" alt="https://www.databricks.com/p/webinar/officehours?utm_source=databricks&amp;amp;utm_medium=post&amp;amp;utm_content=dbcommunity&amp;amp;_ga=2.60807080.467803939.1667799823-1423152529.1665590694" target="_blank"&gt;&lt;B&gt;Databricks Office Hours&lt;/B&gt;&lt;/A&gt;!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;November 16&lt;/B&gt;&amp;nbsp;- 8:00 AM - 9:00 AM PT:&amp;nbsp;&lt;A href="https://www.databricks.com/p/webinar/officehours?utm_source=databricks&amp;amp;utm_medium=post&amp;amp;utm_content=dbcommunity&amp;amp;_ga=2.60807080.467803939.1667799823-1423152529.1665590694" alt="https://www.databricks.com/p/webinar/officehours?utm_source=databricks&amp;amp;utm_medium=post&amp;amp;utm_content=dbcommunity&amp;amp;_ga=2.60807080.467803939.1667799823-1423152529.1665590694" target="_blank"&gt;&lt;B&gt;&lt;U&gt;Register Here&lt;/U&gt;&lt;/B&gt;&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;November 30&lt;/B&gt;&amp;nbsp;- 11:00 AM - 12:00 PM PT: &lt;A href="https://www.databricks.com/p/webinar/officehours?utm_source=databricks&amp;amp;utm_medium=post&amp;amp;utm_content=dbcommunity&amp;amp;_ga=2.60807080.467803939.1667799823-1423152529.1665590694" alt="https://www.databricks.com/p/webinar/officehours?utm_source=databricks&amp;amp;utm_medium=post&amp;amp;utm_content=dbcommunity&amp;amp;_ga=2.60807080.467803939.1667799823-1423152529.1665590694" target="_blank"&gt;&lt;B&gt;&lt;U&gt;Register Here&lt;/U&gt;&lt;/B&gt;&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Databricks Office Hours&amp;nbsp;connects you directly with experts to answer all your Databricks questions. 
&lt;B&gt;Join us to:&lt;/B&gt;&lt;/P&gt;&lt;P&gt;• Troubleshoot your technical questions&lt;/P&gt;&lt;P&gt;• Learn the best strategies to apply Databricks to your use case&lt;/P&gt;&lt;P&gt;• Master tips and tricks to maximize your usage of our platform&lt;/P&gt;</description>
      <pubDate>Thu, 10 Nov 2022 23:35:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/ask-your-technical-questions-at-databricks-office-hours-november/m-p/22819#M15699</guid>
      <dc:creator>Taha_Hussain</dc:creator>
      <dc:date>2022-11-10T23:35:31Z</dc:date>
    </item>
    <item>
      <title>Re: Ask your technical questions at Databricks Office Hours! November 16 - 8:00 AM - 9:00 AM PT: Register Here. November 30 - 11:00 AM - 12:00 PM PT: Regist...</title>
      <link>https://community.databricks.com/t5/data-engineering/ask-your-technical-questions-at-databricks-office-hours-november/m-p/22820#M15700</link>
      <description>&lt;P&gt;Q&amp;amp;A Recap from 11/30 Office Hours&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;Q: What is the downside of using Z-ordering and auto optimize? It seems like there could be a tradeoff with writing small files (whereas it is good at reading a larger file). Is that true?&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;A: By default, Delta Lake on Databricks collects statistics on the first 32 columns defined in your table schema. It keeps track of simple statistics such as minimum and maximum values at a certain granularity that’s correlated with I/O granularity.&amp;nbsp;Collecting statistics on long strings is an expensive operation that can sometimes be a bottleneck.&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;Q: Is there a way to run Databricks on-prem?&amp;nbsp;&amp;nbsp;We have some workloads that are not allowed to go to the cloud due to data security requirements.&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;A: You can utilize our PVC (private virtual cloud) support, where both your control plane and data plane can be accessed. That is likely the best approach.&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;Q: More info on Photon and how it is being used? We would love to read it.&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;A: &lt;/I&gt;&lt;A href="https://cs.stanford.edu/~matei/papers/2022/sigmod_photon.pdf" alt="https://cs.stanford.edu/~matei/papers/2022/sigmod_photon.pdf" target="_blank"&gt;&lt;I&gt;Here is an in-depth&lt;/I&gt;&lt;/A&gt;&lt;I&gt; paper on Photon.&amp;nbsp;You can get a more general overview &lt;/I&gt;&lt;A href="https://docs.databricks.com/runtime/photon.html" alt="https://docs.databricks.com/runtime/photon.html" target="_blank"&gt;&lt;I&gt;here.&lt;/I&gt;&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;&lt;I&gt;Q: &lt;/I&gt;How to organize Databricks from AWS Marketplace when you have multiple VPCs for each environment? 
One AWS VPC for each workspace?&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;A: The &lt;/I&gt;&lt;A href="https://docs.databricks.com/administration-guide/admin-console.html" alt="https://docs.databricks.com/administration-guide/admin-console.html" target="_blank"&gt;&lt;I&gt;account console&lt;/I&gt;&lt;/A&gt;&lt;I&gt; is where all the bits and bytes of your network components are controlled. Additionally, the subnets that you specify for a customer-managed VPC must be reserved for one Databricks workspace only. You cannot share these subnets with any &lt;/I&gt;&lt;A href="https://docs.databricks.com/administration-guide/cloud-configurations/aws/customer-managed-vpc.html#subnets" alt="https://docs.databricks.com/administration-guide/cloud-configurations/aws/customer-managed-vpc.html#subnets" target="_blank"&gt;&lt;I&gt;other resources&lt;/I&gt;&lt;/A&gt;&lt;I&gt;, including other Databricks workspaces.&amp;nbsp;Ideally, you would have one VPC per workspace.&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;Q: I have lots of options for installing stuff on clusters: notebook-scoped libraries, cluster-scoped libraries, init scripts, and custom Docker images. In which case would you recommend each of them? Especially in the case where I have a project with many, many dependencies and I want to speed up cluster startup.&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;A: If there are lots of libraries you want to install, I suggest using an init script. Managing them will be easy too. 
One hack: add a 10-20 second sleep to your script before the install command &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;Q: Is there a way to disable the use of the default DBFS storage account by users?&amp;nbsp;&amp;nbsp;We have a cyber policy that does not allow us to use any storage accounts with public IPs (which the default storage account has, and we cannot change it).&amp;nbsp;&amp;nbsp;The issue is that when new users come on board, that DBFS storage location is the default location for when they create new tables or datasets.&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;A: You need DBFS, but you can ask users not to mount. I am not sure if that fits your use case, but you can implement some deny rules too.&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;Q: What might be the best approach (responsive and also cost-efficient) to handle low-volume messaging input (1000 messages per day)? Assume long periods of no messages, but when they come in, a sub-second response to receive and process is expected. I am assuming I would need a cluster always on to handle this even though it would not be very busy, unless there is another way to handle near-real-time messaging with sporadic incoming messages?&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;A: You can use Auto Loader with Trigger.AvailableNow and run the job at some interval if the latency is acceptable. This will help you save costs as well.&lt;/I&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 02 Dec 2022 04:51:11 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/ask-your-technical-questions-at-databricks-office-hours-november/m-p/22820#M15700</guid>
      <dc:creator>Taha_Hussain</dc:creator>
      <dc:date>2022-12-02T04:51:11Z</dc:date>
    </item>
  </channel>
</rss>

