<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to prevent sql queries in 2 notebooks from reading the same row from a Table ? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-to-prevent-sql-queries-in-2-notebooks-from-reading-the-same/m-p/31723#M23104</link>
    <description>&lt;P&gt;Yes, that is exactly my current plan, but there is an issue with it:&lt;/P&gt;&lt;P&gt;I have 1 million rows in my table. Say I give rows 1 to 500,000 to the 1st notebook and rows 500,001 to 1,000,000 to the 2nd notebook.&lt;/P&gt;&lt;P&gt;What if the data is such that the first half (rows 1 to 500,000) takes 1/10th of the processing time of the second half (rows 500,001 to 1,000,000)?&lt;/P&gt;&lt;P&gt;Notebook 1 would then finish well before Notebook 2. Ideally both notebooks should run for about the same length of time, so that the whole job finishes as fast as possible.&lt;/P&gt;&lt;P&gt;Hence predetermining the dataset for each notebook is not efficient. Each notebook should dynamically pick up a new batch (of 300 rows) as soon as it finishes its current batch. My problem is that both notebooks might end up picking up the same batch.&lt;/P&gt;&lt;P&gt;Let me know if that makes sense &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
    <pubDate>Thu, 15 Sep 2022 10:26:39 GMT</pubDate>
    <dc:creator>KrishZ</dc:creator>
    <dc:date>2022-09-15T10:26:39Z</dc:date>
    <item>
      <title>How to prevent sql queries in 2 notebooks from reading the same row from a Table ?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-prevent-sql-queries-in-2-notebooks-from-reading-the-same/m-p/31720#M23101</link>
      <description>&lt;P&gt;I have a SQL query that selects and updates rows in a table.&lt;/P&gt;&lt;P&gt;I do this in batches of 300 rows (select 300, update those 300, select the next 300, update those, and so on).&lt;/P&gt;&lt;P&gt;&lt;B&gt;I run this query in 2 different notebooks concurrently to speed up processing.&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;Can someone tell me how to prevent the same row from being selected by the query in both notebooks?&lt;/B&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 15 Sep 2022 06:35:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-prevent-sql-queries-in-2-notebooks-from-reading-the-same/m-p/31720#M23101</guid>
      <dc:creator>KrishZ</dc:creator>
      <dc:date>2022-09-15T06:35:08Z</dc:date>
    </item>
    <item>
      <title>Re: How to prevent sql queries in 2 notebooks from reading the same row from a Table ?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-prevent-sql-queries-in-2-notebooks-from-reading-the-same/m-p/31721#M23102</link>
      <description>&lt;P&gt;It's not entirely clear what you're trying to do, but I'd try to split the data set logically, maybe using a column such as a date, and then process each part separately. Hope that helps.&lt;/P&gt;</description>
      <pubDate>Thu, 15 Sep 2022 07:30:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-prevent-sql-queries-in-2-notebooks-from-reading-the-same/m-p/31721#M23102</guid>
      <dc:creator>PriyaAnanthram</dc:creator>
      <dc:date>2022-09-15T07:30:26Z</dc:date>
    </item>
    <item>
      <title>Re: How to prevent sql queries in 2 notebooks from reading the same row from a Table ?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-prevent-sql-queries-in-2-notebooks-from-reading-the-same/m-p/31722#M23103</link>
      <description>&lt;P&gt;Maybe adding a row-number column would help you run it in batches of 300.&lt;/P&gt;</description>
      <pubDate>Thu, 15 Sep 2022 08:51:54 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-prevent-sql-queries-in-2-notebooks-from-reading-the-same/m-p/31722#M23103</guid>
      <dc:creator>PriyaAnanthram</dc:creator>
      <dc:date>2022-09-15T08:51:54Z</dc:date>
    </item>
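    <!--
    Editorial sketch (not part of the original thread): the row-number suggestion
    above can be turned into fixed batches of 300 with simple arithmetic. The
    function name batch_ranges and the idea of a 1-based row number are
    illustrative assumptions; the thread does not name a column or show code.

```python
def batch_ranges(n_rows, batch_size=300):
    """Yield inclusive (first_row, last_row) pairs covering rows 1..n_rows.

    Assumes rows carry a hypothetical 1-based row number; the last batch
    may be shorter than batch_size.
    """
    for first in range(1, n_rows + 1, batch_size):
        yield first, min(first + batch_size - 1, n_rows)


if __name__ == "__main__":
    # 1000 rows in batches of 300
    print(list(batch_ranges(1000)))
    # prints [(1, 300), (301, 600), (601, 900), (901, 1000)]
```

    Each pair could then drive a range filter on the row-number column. This
    only defines the batches; it does not by itself decide which notebook
    takes which batch.
    -->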
    <item>
      <title>Re: How to prevent sql queries in 2 notebooks from reading the same row from a Table ?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-prevent-sql-queries-in-2-notebooks-from-reading-the-same/m-p/31723#M23104</link>
      <description>&lt;P&gt;Yes, that is exactly my current plan, but there is an issue with it:&lt;/P&gt;&lt;P&gt;I have 1 million rows in my table. Say I give rows 1 to 500,000 to the 1st notebook and rows 500,001 to 1,000,000 to the 2nd notebook.&lt;/P&gt;&lt;P&gt;What if the data is such that the first half (rows 1 to 500,000) takes 1/10th of the processing time of the second half (rows 500,001 to 1,000,000)?&lt;/P&gt;&lt;P&gt;Notebook 1 would then finish well before Notebook 2. Ideally both notebooks should run for about the same length of time, so that the whole job finishes as fast as possible.&lt;/P&gt;&lt;P&gt;Hence predetermining the dataset for each notebook is not efficient. Each notebook should dynamically pick up a new batch (of 300 rows) as soon as it finishes its current batch. My problem is that both notebooks might end up picking up the same batch.&lt;/P&gt;&lt;P&gt;Let me know if that makes sense &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 15 Sep 2022 10:26:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-prevent-sql-queries-in-2-notebooks-from-reading-the-same/m-p/31723#M23104</guid>
      <dc:creator>KrishZ</dc:creator>
      <dc:date>2022-09-15T10:26:39Z</dc:date>
    </item>
    <item>
      <title>Re: How to prevent sql queries in 2 notebooks from reading the same row from a Table ?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-prevent-sql-queries-in-2-notebooks-from-reading-the-same/m-p/31724#M23105</link>
      <description>&lt;P&gt;Hi @Krishna Zanwar&lt;/P&gt;&lt;P&gt;Hope all is well! Just wanted to check in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.&lt;/P&gt;&lt;P&gt;We'd love to hear from you.&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Wed, 28 Sep 2022 07:34:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-prevent-sql-queries-in-2-notebooks-from-reading-the-same/m-p/31724#M23105</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2022-09-28T07:34:20Z</dc:date>
    </item>
  </channel>
</rss>

