Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

schnee1
by New Contributor III
  • 6513 Views
  • 8 replies
  • 0 kudos

Access struct elements inside dataframe?

I have a JSON data set that contains a price in a string like "USD 5.00". I'd like to convert the numeric portion to a Double to use in an MLlib LabeledPoint, and have managed to split the price string into an array of strings. The below creates a data...

Latest Reply
goldentriangle
New Contributor II
  • 0 kudos

Thanks, Golden Triangle Tour

7 More Replies
User16826992666
by Valued Contributor
  • 8577 Views
  • 2 replies
  • 0 kudos

Why do Spark MLlib models only accept a vector column as input?

In other libraries I can just use the feature columns themselves as inputs, why do I need to make a vector out of my features when I use MLlib?

Latest Reply
sean_owen
Honored Contributor II
  • 0 kudos

Yeah, it's more of a design choice. Rather than have every implementation take column(s) params, this is handled once in VectorAssembler for all of them. One way or the other, most implementations need a vector of inputs anyway. VectorAssembler can do s...

1 More Replies
User16826992666
by Valued Contributor
  • 436 Views
  • 1 reply
  • 0 kudos

Which MLlib library am I supposed to use - pyspark.mllib or pyspark.ml?

Both of these libraries seem to be available and they are both for MLlib, how do I know which one to use?

Latest Reply
User16826992666
Valued Contributor
  • 0 kudos

The pyspark.mllib library is built for RDDs, and the pyspark.ml library is built for DataFrames. The RDD-based mllib library is currently in maintenance mode, while the DataFrame library will continue to receive updates and active development. For t...

pmezentsev
by New Contributor
  • 6094 Views
  • 7 replies
  • 0 kudos

Pyspark. How to get best params in grid search

Hello! I am using Spark 2.1.1 in Python (Python 2.7, executed in a Jupyter notebook) and am trying to run a grid search for linear regression parameters. My code looks like this: from pyspark.ml.tuning import CrossValidator, ParamGridBuilder from pyspark.ml impo...

Latest Reply
phamyen
New Contributor II
  • 0 kudos

This is a great article. It gave me a lot of useful information. thank you very much download app

6 More Replies
vanshikagupta
by New Contributor II
  • 6731 Views
  • 2 replies
  • 0 kudos

Conversion of code from Scala to Python

Does Databricks Community Edition provide the Databricks ML visualizations for PySpark, the same as provided in this link for Scala? https://docs.azuredatabricks.net/_static/notebooks/decision-trees.html Also, please help me to convert this lin...

Latest Reply
miklos
Contributor
  • 0 kudos

Yes, CE supports it. It isn't supported in Python yet.

1 More Replies