<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Problem with dropDuplicates in Databricks runtime 15.4LTS in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/problem-with-dropduplicates-in-databricks-runtime-15-4lts/m-p/90614#M37962</link>
    <description>&lt;P&gt;Exactly what &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/120535"&gt;@juan_barreto&lt;/a&gt; said. A public API should be a contract that we can trust, and it shouldn't be changed lightly. Imagine a codebase with hundreds of notebooks, where the development team agreed on a convention of using keyword arguments for that particular function. Now you have a problem. The fix is simple, but you need to rewrite your whole codebase.&lt;/P&gt;</description>
    <pubDate>Mon, 16 Sep 2024 16:42:51 GMT</pubDate>
    <dc:creator>szymon_dybczak</dc:creator>
    <dc:date>2024-09-16T16:42:51Z</dc:date>
    <item>
      <title>Problem with dropDuplicates in Databricks runtime 15.4LTS</title>
      <link>https://community.databricks.com/t5/data-engineering/problem-with-dropduplicates-in-databricks-runtime-15-4lts/m-p/89651#M37876</link>
      <description>&lt;P&gt;Hi,&lt;BR /&gt;I'm testing the latest version of the Databricks Runtime, but I'm getting errors from a simple dropDuplicates call.&lt;/P&gt;&lt;P&gt;Using the following code:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;data = spark.read.table("some_table")
data.dropDuplicates(subset=['SOME_COLUMN']).count()&lt;/LI-CODE&gt;&lt;P&gt;I'm getting this error:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;TypeError                                 Traceback (most recent call last)
File &amp;lt;command-934417477504931&amp;gt;, line 1
----&amp;gt; 1 data.dropDuplicates(subset=['SOME_COLUMN']).count()

File /databricks/spark/python/pyspark/instrumentation_utils.py:47, in _wrap_function.&amp;lt;locals&amp;gt;.wrapper(*args, **kwargs)
     45 start = time.perf_counter()
     46 try:
---&amp;gt; 47     res = func(*args, **kwargs)
     48     logger.log_success(
     49         module_name, class_name, function_name, time.perf_counter() - start, signature
     50     )
     51     return res

TypeError: DataFrame.dropDuplicates() got an unexpected keyword argument 'subset'&lt;/LI-CODE&gt;&lt;P&gt;It works fine if I pass the list positionally instead of as a keyword argument.&lt;/P&gt;&lt;P&gt;It looks like they changed the function definition to accept varargs instead of a list, which broke a lot of code for us.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="juan_barreto_0-1726153266526.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/11131i1B0F94EC1C6AD6EA/image-size/medium?v=v2&amp;amp;px=400" role="button" title="juan_barreto_0-1726153266526.png" alt="juan_barreto_0-1726153266526.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Has anybody else run into this problem?&lt;/P&gt;</description>
      <pubDate>Thu, 12 Sep 2024 15:03:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/problem-with-dropduplicates-in-databricks-runtime-15-4lts/m-p/89651#M37876</guid>
      <dc:creator>juan_barreto</dc:creator>
      <dc:date>2024-09-12T15:03:06Z</dc:date>
    </item>
    <item>
      <title>Re: Problem with dropDuplicates in Databricks runtime 15.4LTS</title>
      <link>https://community.databricks.com/t5/data-engineering/problem-with-dropduplicates-in-databricks-runtime-15-4lts/m-p/89800#M37907</link>
      <description>&lt;P&gt;Wanted to add to this thread. Seeing the same issue. This appears to be a recent problem.&lt;/P&gt;</description>
      <pubDate>Fri, 13 Sep 2024 14:08:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/problem-with-dropduplicates-in-databricks-runtime-15-4lts/m-p/89800#M37907</guid>
      <dc:creator>kellys</dc:creator>
      <dc:date>2024-09-13T14:08:57Z</dc:date>
    </item>
    <item>
      <title>Re: Problem with dropDuplicates in Databricks runtime 15.4LTS</title>
      <link>https://community.databricks.com/t5/data-engineering/problem-with-dropduplicates-in-databricks-runtime-15-4lts/m-p/90587#M37958</link>
      <description>&lt;P&gt;Same thing here, broke a lot of code.&lt;/P&gt;</description>
      <pubDate>Mon, 16 Sep 2024 14:07:12 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/problem-with-dropduplicates-in-databricks-runtime-15-4lts/m-p/90587#M37958</guid>
      <dc:creator>RodriGonca</dc:creator>
      <dc:date>2024-09-16T14:07:12Z</dc:date>
    </item>
    <item>
      <title>Re: Problem with dropDuplicates in Databricks runtime 15.4LTS</title>
      <link>https://community.databricks.com/t5/data-engineering/problem-with-dropduplicates-in-databricks-runtime-15-4lts/m-p/90591#M37959</link>
      <description>&lt;P&gt;What happens if you avoid passing it as a named parameter? Like:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;data.dropDuplicates(['SOME_COLUMN']).count()&lt;/LI-CODE&gt;</description>
      <pubDate>Mon, 16 Sep 2024 15:40:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/problem-with-dropduplicates-in-databricks-runtime-15-4lts/m-p/90591#M37959</guid>
      <dc:creator>Witold</dc:creator>
      <dc:date>2024-09-16T15:40:55Z</dc:date>
    </item>
    <item>
      <title>Re: Problem with dropDuplicates in Databricks runtime 15.4LTS</title>
      <link>https://community.databricks.com/t5/data-engineering/problem-with-dropduplicates-in-databricks-runtime-15-4lts/m-p/90613#M37961</link>
      <description>&lt;P&gt;Hi, as I said, doing that works. But the change broke a really big codebase.&lt;/P&gt;&lt;P&gt;Databricks should not be changing the public API of a function in a "stable" release.&lt;/P&gt;</description>
      <pubDate>Mon, 16 Sep 2024 16:30:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/problem-with-dropduplicates-in-databricks-runtime-15-4lts/m-p/90613#M37961</guid>
      <dc:creator>juan_barreto</dc:creator>
      <dc:date>2024-09-16T16:30:24Z</dc:date>
    </item>
    <item>
      <title>Re: Problem with dropDuplicates in Databricks runtime 15.4LTS</title>
      <link>https://community.databricks.com/t5/data-engineering/problem-with-dropduplicates-in-databricks-runtime-15-4lts/m-p/90614#M37962</link>
      <description>&lt;P&gt;Exactly what &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/120535"&gt;@juan_barreto&lt;/a&gt; said. A public API should be a contract that we can trust, and it shouldn't be changed lightly. Imagine a codebase with hundreds of notebooks, where the development team agreed on a convention of using keyword arguments for that particular function. Now you have a problem. The fix is simple, but you need to rewrite your whole codebase.&lt;/P&gt;</description>
      <pubDate>Mon, 16 Sep 2024 16:42:51 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/problem-with-dropduplicates-in-databricks-runtime-15-4lts/m-p/90614#M37962</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2024-09-16T16:42:51Z</dc:date>
    </item>
    <item>
      <title>Re: Problem with dropDuplicates in Databricks runtime 15.4LTS</title>
      <link>https://community.databricks.com/t5/data-engineering/problem-with-dropduplicates-in-databricks-runtime-15-4lts/m-p/90665#M37972</link>
      <description>&lt;P&gt;If it had been communicated as a breaking change between major updates, it would be OK. But I can't find anything in the release notes, so it's a bug.&lt;/P&gt;</description>
      <pubDate>Tue, 17 Sep 2024 06:00:38 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/problem-with-dropduplicates-in-databricks-runtime-15-4lts/m-p/90665#M37972</guid>
      <dc:creator>Witold</dc:creator>
      <dc:date>2024-09-17T06:00:38Z</dc:date>
    </item>
  </channel>
</rss>

