Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Running a K-means (.fit) gives error:Params must be either a param map or a list/tuple of param maps but got %s." % type(params)

New Contributor III

I am running a k-means algorithm. My features are DoubleType and have no nulls, but I get: raise TypeError("Params must be either a param map or a list/tuple of param maps, but got %s." % type(params)). Does anyone have an idea how to solve this?

File /databricks/spark/python/pyspark/ml/, in, dataset, params)
    203         return self.copy(params)._fit(dataset)
    204     else:
--> 205         return self._fit(dataset)
    206 else:
    207     raise TypeError(
    208         "Params must be either a param map or a list/tuple of param maps, "
    209         "but got %s." % type(params)
    210     )

File /databricks/spark/python/pyspark/ml/, in JavaEstimator._fit(self, dataset)
    380 def _fit(self, dataset: DataFrame) -> JM:
--> 381     java_model = self._fit_java(dataset)
    382     model = self._create_model(java_model)
    383     return self._copyValues(model)

File /databricks/spark/python/pyspark/ml/, in JavaEstimator._fit_java(self, dataset)
    375 assert self._java_obj is not None
    377 self._transfer_params_to_java()
--> 378 return

File /databricks/spark/python/lib/, in JavaMember.__call__(self, *args)
   1316 command = proto.CALL_COMMAND_NAME +\
   1317     self.command_header +\
   1318     args_command +\
   1319     proto.END_COMMAND_PART
   1321 answer = self.gateway_client.send_command(command)
--> 1322 return_value = get_return_value(
   1323     answer, self.gateway_client, self.target_id,
   1325 for temp_arg in temp_args:
   1326     if hasattr(temp_arg, "_detach"):

File /databricks/spark/python/pyspark/errors/exceptions/, in capture_sql_exception.<locals>.deco(*a, **kw)
    160 def deco(*a: Any, **kw: Any) -> Any:
    161     try:
--> 162         return f(*a, **kw)
    163     except Py4JJavaError as e:
    164         converted = convert_exception(e.java_exception)

File /databricks/spark/python/lib/, in get_return_value(answer, gateway_client, target_id, name)
    324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
    325 if answer[1] == REFERENCE_TYPE:
--> 326     raise Py4JJavaError(
    327         "An error occurred while calling {0}{1}{2}.\n".
    328         format(target_id, ".", name), value)
    329 else:
    330     raise Py4JError(
    331         "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
    332         format(target_id, ".", name, value))

Py4JJavaError: An error occurred while calling : org.apache.spark.SparkException: Job aborted due to stage failure: Task 9 in stage 4052.0 failed 4 times, most recent failure: Lost task 9.3 in stage 4052.0 (TID 29072) (executor 0): java.lang.AssertionError: assertion failed at...


Not applicable

Hi @Marcela Bejarano

Great to meet you, and thanks for your question!

Let's see if your peers in the community have an answer to your question. Thanks.

New Contributor III

I found the answer just by trying several things, although I do not understand exactly what the problem was. All I had to do was to cache the input data before fitting the model:

from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

assemble = VectorAssembler(inputCols=columns_input, outputCol='features')
assembled_data = assemble.transform(df).cache()  # cache the assembled input before fitting
KMeans_algo = KMeans(featuresCol='features', k=number_of_clusters)
model = KMeans_algo.fit(assembled_data)
