
Running K-means (.fit) gives error: "Params must be either a param map or a list/tuple of param maps, but got %s." % type(params)

mbejarano89
New Contributor III

I am running a k-means algorithm. My features are DoubleType and have no nulls, but I get: raise TypeError("Params must be either a param map or a list/tuple of param maps, but got %s." % type(params)). Does anyone have any idea how to solve this?

File /databricks/spark/python/pyspark/ml/base.py:205, in Estimator.fit(self, dataset, params)
    203         return self.copy(params)._fit(dataset)
    204     else:
--> 205         return self._fit(dataset)
    206 else:
    207     raise TypeError(
    208         "Params must be either a param map or a list/tuple of param maps, "
    209         "but got %s." % type(params)
    210     )

File /databricks/spark/python/pyspark/ml/wrapper.py:381, in JavaEstimator._fit(self, dataset)
    380 def _fit(self, dataset: DataFrame) -> JM:
--> 381     java_model = self._fit_java(dataset)
    382     model = self._create_model(java_model)
    383     return self._copyValues(model)

File /databricks/spark/python/pyspark/ml/wrapper.py:378, in JavaEstimator._fit_java(self, dataset)
    375 assert self._java_obj is not None
    377 self._transfer_params_to_java()
--> 378 return self._java_obj.fit(dataset._jdf)

File /databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1322, in JavaMember.__call__(self, *args)
   1316 command = proto.CALL_COMMAND_NAME +\
   1317     self.command_header +\
   1318     args_command +\
   1319     proto.END_COMMAND_PART
   1321 answer = self.gateway_client.send_command(command)
-> 1322 return_value = get_return_value(
   1323     answer, self.gateway_client, self.target_id, self.name)
   1325 for temp_arg in temp_args:
   1326     if hasattr(temp_arg, "_detach"):

File /databricks/spark/python/pyspark/errors/exceptions/captured.py:162, in capture_sql_exception.<locals>.deco(*a, **kw)
    160 def deco(*a: Any, **kw: Any) -> Any:
    161     try:
--> 162         return f(*a, **kw)
    163     except Py4JJavaError as e:
    164         converted = convert_exception(e.java_exception)

File /databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py:326, in get_return_value(answer, gateway_client, target_id, name)
    324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
    325 if answer[1] == REFERENCE_TYPE:
--> 326     raise Py4JJavaError(
    327         "An error occurred while calling {0}{1}{2}.\n".
    328         format(target_id, ".", name), value)
    329 else:
    330     raise Py4JError(
    331         "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
    332         format(target_id, ".", name, value))

Py4JJavaError: An error occurred while calling o13759.fit. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 9 in stage 4052.0 failed 4 times, most recent failure: Lost task 9.3 in stage 4052.0 (TID 29072) (10.1.2.11 executor 0): java.lang.AssertionError: assertion failed at...

2 REPLIES

Anonymous
Not applicable

Hi @Marcela Bejarano,

Great to meet you, and thanks for your question!

Let's see if your peers in the community have an answer to your question. Thanks.

mbejarano89
New Contributor III

I found the answer just by trying several things, although I do not understand exactly what the problem was. All I had to do was to cache the input data before fitting the model:

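from pyspark.ml.clustering import KMeans
from pyspark.ml.feature import VectorAssembler

# Combine the input columns into a single vector column named 'features'.
assemble = VectorAssembler(inputCols=columns_input, outputCol='features')
assembled_data = assemble.transform(df)
# Cache the assembled data before fitting; this is what made the error go away.
assembled_data = assembled_data.cache()
KMeans_algo = KMeans(featuresCol='features', k=number_of_clusters)
KMeans_fit = KMeans_algo.fit(assembled_data)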
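
For anyone who wants to reuse this, here is a minimal sketch of the same workaround wrapped in a function. The function name, the `input_cols` and `k` parameter names, and the extra `count()` call are my own additions for illustration, not something from the stack trace or from Databricks. The thread does not establish why caching helps. One possible explanation, and it is only a guess, is that the input DataFrame was being recomputed with slightly different rows on each pass of the iterative fit, and caching pins down a single materialized copy.

```python
def fit_kmeans_on_cached_input(df, input_cols, k):
    """Assemble features, cache and materialize them, then fit KMeans.

    Returns (model, assembled) so the caller can call assembled.unpersist()
    once the cached data is no longer needed.
    """
    # Imported inside the function so that defining it does not require a
    # Spark installation; calling it does.
    from pyspark.ml.clustering import KMeans
    from pyspark.ml.feature import VectorAssembler

    assembler = VectorAssembler(inputCols=list(input_cols), outputCol="features")
    assembled = assembler.transform(df).cache()

    # cache() is lazy: an action such as count() forces the rows to be
    # computed once and stored before the iterative fit starts.
    assembled.count()

    model = KMeans(featuresCol="features", k=k).fit(assembled)
    return model, assembled
```

With the variables from my snippet above, the call would be `model, assembled = fit_kmeans_on_cached_input(df, columns_input, number_of_clusters)`, followed by `assembled.unpersist()` when you are done with the cached data.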
