Databricks Community

raela · 04-04-2017

df.select("columnname").distinct().show()
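To make that one-liner concrete, here is a minimal sketch in Scala, assuming Spark 2.x or later. The sample data, the `value` column, and the local `SparkSession` are made up for illustration; on Databricks a `spark` session already exists, so you would not build one yourself.

```scala
import org.apache.spark.sql.SparkSession

object DistinctValuesExample {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption, not from the original post).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("distinct-values-example")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data with a repeated value in "columnname".
    val df = Seq(("a", 1), ("b", 2), ("a", 3)).toDF("columnname", "value")

    // Print the unique values of the column.
    df.select("columnname").distinct().show()

    // Bring the unique values back to the driver as a local array.
    // Only do this when the number of distinct values is small.
    val uniques: Array[String] =
      df.select("columnname").distinct().as[String].collect()
    println(uniques.sorted.mkString(", "))

    spark.stop()
  }
}
```

`show()` only prints the first rows, so `collect()` is the variant to use if you need the values in your own code.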

raela · 01-12-2017

@jack karthik What have you tried? Have you tried cast()? It is documented in the Column API: https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Column

For example: df.select(df("colA").cast("string"))
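A slightly fuller sketch of the cast() suggestion, assuming Spark 2.x or later. The sample data and the `colB` column are hypothetical. It shows the `select` form from the post next to `withColumn`, which may be closer to what you want if the other columns have to be kept.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object CastExample {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption, not from the original post).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("cast-example")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data: colA starts out as an integer column.
    val df = Seq((1, 10.5), (2, 20.0)).toDF("colA", "colB")

    // select(...) returns a DataFrame containing only the cast column.
    val onlyCast = df.select(df("colA").cast("string"))
    onlyCast.printSchema()

    // withColumn(...) replaces colA with its string version and keeps colB.
    val replaced = df.withColumn("colA", col("colA").cast("string"))
    replaced.printSchema()

    spark.stop()
  }
}
```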

raela · 12-22-2016

Refer to the programming guide to see the algorithms available in MLlib: http://spark.apache.org/docs/latest/ml-classification-regression.html

There is no KNN in MLlib, so you might want to try one of the algorithms that is available.

raela · 12-02-2016

What's the purpose of creating those smaller dataframes? Are you trying to write them out to separate files? You could just use a filter command and filter by gender, and then generate random samples for each resulting dataframe if you need to.
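A rough sketch of the filter-then-sample idea, assuming Spark 2.x or later. The column names, sample rows, sampling fraction, and seed are all made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object FilterAndSampleExample {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption, not from the original post).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("filter-and-sample-example")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data.
    val df = Seq(
      ("alice", "F", 34),
      ("bob", "M", 45),
      ("carol", "F", 29),
      ("dave", "M", 51)
    ).toDF("name", "gender", "age")

    // One filtered DataFrame per gender value.
    val females = df.filter($"gender" === "F")
    val males = df.filter($"gender" === "M")

    // Random sample without replacement. The fraction is approximate,
    // so the number of rows returned can vary around fraction * count.
    val femaleSample = females.sample(withReplacement = false, fraction = 0.5, seed = 42L)
    val maleSample = males.sample(withReplacement = false, fraction = 0.5, seed = 42L)
    femaleSample.show()
    maleSample.show()

    // If the values are not known in advance, build one DataFrame per distinct value.
    val genders: Array[String] = df.select("gender").distinct().as[String].collect()
    val byGender: Map[String, DataFrame] =
      genders.map(g => g -> df.filter($"gender" === g)).toMap
    byGender.foreach { case (g, part) => println(s"$g: ${part.count()} rows") }

    spark.stop()
  }
}
```

Each filter is lazy, so no data is copied until an action such as `show()`, `count()`, or a write runs on the filtered DataFrame.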

raela · 08-05-2016

Have you tried sqlContext.read.parquet("/filePath/")?
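A minimal round-trip sketch, assuming Spark 2.x or later, where `spark.read.parquet` is the `SparkSession` equivalent of the `sqlContext` call above. The temporary path and sample data are hypothetical. The file is written with snappy compression first so the example is self-contained. The read side takes no codec option, because the reader picks the codec up from the Parquet file metadata.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object ReadSnappyParquetExample {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption, not from the original post).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("read-snappy-parquet-example")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical output location inside a fresh temp directory.
    val path = Files.createTempDirectory("parquet-example").resolve("data").toString

    // Write a small snappy-compressed Parquet dataset to read back.
    val df = Seq((1, "a"), (2, "b")).toDF("id", "label")
    df.write.option("compression", "snappy").parquet(path)

    // Read it back; no compression setting is needed on the read side.
    val readBack = spark.read.parquet(path)
    readBack.printSchema()
    readBack.show()

    spark.stop()
  }
}
```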

User Activity

Re: how to get unique values of a column in pyspark dataframe

Re: How to append new column values in dataframe behalf of unique id's

Re: KNN classifier on Spark

Re: Divide a dataframe into multiple smaller dataframes based on values in multiple columns in Scala

Re: How can i read parquet file compressed by snappy?