<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Clarification Needed: Ensuring Correct Pagination with Offset and Limit in PySpark in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/clarification-needed-ensuring-correct-pagination-with-offset-and/m-p/121991#M46614</link>
    <description>&lt;P&gt;This code snippet seems to have no relationship with the question whatsoever. Is this generated?&lt;/P&gt;</description>
    <pubDate>Tue, 17 Jun 2025 13:46:01 GMT</pubDate>
    <dc:creator>Mathias_Peters</dc:creator>
    <dc:date>2025-06-17T13:46:01Z</dc:date>
    <item>
      <title>Clarification Needed: Ensuring Correct Pagination with Offset and Limit in PySpark</title>
      <link>https://community.databricks.com/t5/data-engineering/clarification-needed-ensuring-correct-pagination-with-offset-and/m-p/65530#M32832</link>
      <description>&lt;P&gt;Hi community,&lt;/P&gt;&lt;P&gt;I hope you're all doing well. I'm working on a PySpark project where I implement pagination-like functionality using the &lt;FONT color="#FF6600"&gt;offset&lt;/FONT&gt; and &lt;FONT color="#FF6600"&gt;limit&lt;/FONT&gt; methods. My aim is to retrieve the rows between a specified &lt;FONT color="#3366FF"&gt;starting_index&lt;/FONT&gt; and &lt;FONT color="#3366FF"&gt;ending_index&lt;/FONT&gt; without materializing the entire dataset in memory.&lt;/P&gt;&lt;P&gt;Here's how I'm currently using them:&lt;/P&gt;&lt;PRE&gt;sliced_df = df.offset(starting_index).limit(ending_index - starting_index)&lt;/PRE&gt;&lt;P&gt;However, I'm unsure whether this approach gives reliable results, especially with partitioned DataFrames. The documentation doesn't clearly describe how these methods behave in that case.&lt;/P&gt;&lt;P&gt;Could someone kindly address the following questions:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Can I trust that offset and limit will consistently return the rows between the specified starting_index and ending_index?&lt;/LI&gt;&lt;LI&gt;How do these methods behave when applied to partitioned DataFrames?&lt;/LI&gt;&lt;LI&gt;Are there any best practices or considerations to ensure correct pagination when using offset and limit, particularly with partitioned DataFrames?&lt;/LI&gt;&lt;LI&gt;Is there a recommended approach that balances speed and efficiency without computing the complete dataset in memory?&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;Additionally, I am using a db-connect Spark session for this project.&lt;/P&gt;</description>
      <pubDate>Thu, 04 Apr 2024 16:42:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/clarification-needed-ensuring-correct-pagination-with-offset-and/m-p/65530#M32832</guid>
      <dc:creator>himanshu_k</dc:creator>
      <dc:date>2024-04-04T16:42:03Z</dc:date>
    </item>
    <item>
      <title>Re: Clarification Needed: Ensuring Correct Pagination with Offset and Limit in PySpark</title>
      <link>https://community.databricks.com/t5/data-engineering/clarification-needed-ensuring-correct-pagination-with-offset-and/m-p/90552#M37955</link>
      <description>&lt;P&gt;You can adapt the code below. It pages through the API results using the &lt;STRONG&gt;next_page_token&lt;/STRONG&gt; parameter:&lt;/P&gt;&lt;P&gt;Don't forget to mark this solution as correct if it helped you &lt;span class="lia-unicode-emoji" title=":upside_down_face:"&gt;🙃&lt;/span&gt;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;import requests

token = 'your token'
url = 'your URL'

params = {'expand_tasks': 'true'}
header = {'Authorization': f'Bearer {token}'}

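while True:
    # Fetch one page; raise on HTTP errors instead of parsing an error body
    response = requests.get(url, headers=header, params=params, timeout=30)
    response.raise_for_status()
    response_data = response.json()
    jobs = response_data.get("jobs", [])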

    for job in jobs:
        # Guard against jobs that have no settings or no tasks
        settings = job.get('settings') or {}
        tasks = settings.get('tasks') or []

        # Only the first task is inspected for an existing cluster
        if tasks and tasks[0].get('existing_cluster_id'):
            job_name = settings.get('name')
            job_creator = job.get('creator_user_name')
            print(f'job creator name= {job_creator} &amp;amp; job name= {job_name}')
        else:
            print(f"{settings.get('name')}: first task has no existing_cluster_id")

    # Stop when the response carries no token for a further page
    next_page_token = response_data.get('next_page_token')
    if not next_page_token:
        break

    # Request the next page on the following iteration
    params['page_token'] = next_page_token&lt;/LI-CODE&gt;</description>
      <pubDate>Mon, 16 Sep 2024 11:08:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/clarification-needed-ensuring-correct-pagination-with-offset-and/m-p/90552#M37955</guid>
      <dc:creator>wise_owl</dc:creator>
      <dc:date>2024-09-16T11:08:08Z</dc:date>
    </item>
    <item>
      <title>Re: Clarification Needed: Ensuring Correct Pagination with Offset and Limit in PySpark</title>
      <link>https://community.databricks.com/t5/data-engineering/clarification-needed-ensuring-correct-pagination-with-offset-and/m-p/121991#M46614</link>
      <description>&lt;P&gt;This code snippet seems to have no relationship with the question whatsoever. Is this generated?&lt;/P&gt;</description>
      <pubDate>Tue, 17 Jun 2025 13:46:01 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/clarification-needed-ensuring-correct-pagination-with-offset-and/m-p/121991#M46614</guid>
      <dc:creator>Mathias_Peters</dc:creator>
      <dc:date>2025-06-17T13:46:01Z</dc:date>
    </item>
    <item>
      <title>Re: Clarification Needed: Ensuring Correct Pagination with Offset and Limit in PySpark</title>
      <link>https://community.databricks.com/t5/data-engineering/clarification-needed-ensuring-correct-pagination-with-offset-and/m-p/121992#M46615</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;Did you find an answer to this question?&lt;BR /&gt;I have a similar problem, and my current solution is slow, so I need to improve on it.&lt;/P&gt;&lt;P&gt;Thanks in advance&lt;/P&gt;</description>
      <pubDate>Tue, 17 Jun 2025 13:46:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/clarification-needed-ensuring-correct-pagination-with-offset-and/m-p/121992#M46615</guid>
      <dc:creator>Mathias_Peters</dc:creator>
      <dc:date>2025-06-17T13:46:58Z</dc:date>
    </item>
  </channel>
</rss>

