a month ago
Hi,
I'm testing the latest version of the Databricks Runtime, but I'm getting errors from a simple dropDuplicates call.
Using the following code:
data = spark.read.table("some_table")
data.dropDuplicates(subset=['SOME_COLUMN']).count()
I'm getting this error:
TypeError Traceback (most recent call last)
File <command-934417477504931>, line 1
----> 1 data.dropDuplicates(subset=['SOME_COLUMN']).count()
File /databricks/spark/python/pyspark/instrumentation_utils.py:47, in _wrap_function.<locals>.wrapper(*args, **kwargs)
45 start = time.perf_counter()
46 try:
---> 47 res = func(*args, **kwargs)
48 logger.log_success(
49 module_name, class_name, function_name, time.perf_counter() - start, signature
50 )
51 return res
TypeError: DataFrame.dropDuplicates() got an unexpected keyword argument 'subset'
It works fine if I pass the list positionally instead of as a keyword argument.
It looks like they changed the function definition to take varargs instead of a list, but this broke a lot of code for us.
Does somebody else have this problem?
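For anyone trying to reproduce this without a cluster, here's a minimal sketch (plain Python, no Spark) of how switching from an optional-list parameter to varargs breaks keyword call sites while positional calls keep working. The two function names are illustrative stand-ins, not the actual PySpark source:

```python
# Old-style signature (as documented): optional list, callable as subset=[...]
def drop_duplicates_old(subset=None):
    return f"old: subset={subset}"

# New-style signature (as observed): varargs only -- the keyword
# 'subset' no longer exists, so keyword call sites raise TypeError.
def drop_duplicates_new(*subset):
    # A single list passed positionally is still accepted.
    if len(subset) == 1 and isinstance(subset[0], (list, tuple)):
        subset = tuple(subset[0])
    return f"new: subset={list(subset)}"

print(drop_duplicates_old(subset=["SOME_COLUMN"]))  # works
print(drop_duplicates_new(["SOME_COLUMN"]))         # works (positional)
try:
    drop_duplicates_new(subset=["SOME_COLUMN"])
except TypeError as e:
    print("TypeError:", e)  # mirrors the traceback above
```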
4 weeks ago
Wanted to add to this thread. Seeing the same issue. This appears to be a recent problem.
3 weeks ago
Same thing here, broke a lot of code.
3 weeks ago
What happens if you avoid passing it as a named parameter? Like:
data.dropDuplicates(['SOME_COLUMN']).count()
3 weeks ago
Hi, as I said, doing that works. But it broke a really big codebase.
Databricks should not be changing the public API of a function in a "stable" release.
3 weeks ago
Exactly what @juan_barreto said. The public API should be a contract that we can trust, and it shouldn't be changed lightly. Imagine a codebase with hundreds of notebooks where the developer team agreed on a convention of using keyword arguments for that particular function. Now you have a problem. The fix is simple, but you need to rewrite your whole codebase.
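One way to avoid touching every call site is a small compatibility shim that restores the keyword form on top of the varargs-only method. This is a hedged sketch, not an officially supported fix: a toy class stands in for pyspark.sql.DataFrame here, and patching the real class in a shared init notebook is an assumption you'd want to test on your runtime version first.

```python
# ToyDataFrame mimics the new varargs-only dropDuplicates reported above.
class ToyDataFrame:
    def dropDuplicates(self, *subset):
        if len(subset) == 1 and isinstance(subset[0], (list, tuple)):
            subset = tuple(subset[0])
        return f"deduped on {list(subset) or 'all columns'}"

# Keep a reference to the original, then rebind a wrapper that
# exposes the old optional-list 'subset' keyword again.
_original = ToyDataFrame.dropDuplicates

def dropDuplicates(self, subset=None):
    # Old-style signature: forwards the list positionally, which
    # both the old and new implementations accept.
    return _original(self) if subset is None else _original(self, list(subset))

ToyDataFrame.dropDuplicates = dropDuplicates

df = ToyDataFrame()
print(df.dropDuplicates(subset=["SOME_COLUMN"]))  # keyword form works again
print(df.dropDuplicates(["SOME_COLUMN"]))         # positional form still works
```

To apply this for real you would patch pyspark.sql.DataFrame instead of the toy class, ideally in one shared bootstrap cell, so the rest of the codebase stays untouched.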
3 weeks ago
If it had been communicated as a breaking change between major versions, it would be OK. But I can't find anything in the release notes, so it's a bug.