Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

Forum Posts

User16752240150 (New Contributor II)
  • 1091 Views
  • 1 reply
  • 1 kudos

What algorithms does Databricks AutoML use?

AutoML presumably tries a few different algorithms during its hyperparameter search. Which model types does it consider?

Latest Reply
sean_owen
Databricks Employee
  • 1 kudos

At the moment, it's really just xgboost and sklearn implementations such as random forests, logistic regression, and linear regression, as applicable to the problem type. More possibilities are coming.
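To make the idea concrete, here is a minimal sketch of the kind of comparison the reply describes: score a few candidate scikit-learn model types with cross-validation and keep the best. This is an illustration under stated assumptions, not AutoML's actual implementation. The synthetic dataset, the candidate list, and the `pick_best_model` helper are hypothetical, and xgboost is left out so the sketch only needs scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def pick_best_model(X, y, candidates, cv=3):
    """Score each candidate model type with k-fold CV and return the best one.

    Hypothetical helper: returns (best_name, best_score, all_scores).
    Unlike AutoML, it does not tune hyperparameters within each model type.
    """
    scores = {}
    for name, model in candidates.items():
        # Mean of the per-fold scores (accuracy for classifiers by default).
        scores[name] = cross_val_score(model, X, y, cv=cv).mean()
    best = max(scores, key=scores.get)
    return best, scores[best], scores


if __name__ == "__main__":
    # Synthetic binary-classification data, for illustration only.
    X, y = make_classification(n_samples=300, n_features=10, random_state=0)
    candidates = {
        "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
        "logistic_regression": LogisticRegression(max_iter=1000),
    }
    best, best_score, all_scores = pick_best_model(X, y, candidates)
    print(f"best model: {best} (mean CV accuracy {best_score:.3f})")
```

For a regression target, the candidates would be regressors such as `LinearRegression` or `RandomForestRegressor` instead.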

User16752239203 (Databricks Employee)
  • 1134 Views
  • 1 reply
  • 0 kudos

How can I use non-Spark libraries like spacy with Databricks and Spark?

I have an NLP application that I built on my local machine using spacy and pandas, but now I would like to scale it to a large production dataset and take advantage of Spark's distributed compute. How do I import and utilize a librar...

Latest Reply
sean_owen
Databricks Employee
  • 0 kudos

It depends on what you mean, but if you're just trying to (say) tokenize and process data with spacy in parallel, then that's trivial. Write a 'pandas UDF' function that expresses how you want to transform data using spacy, in terms of a pandas DataF...

Anonymous
  • 2232 Views
  • 1 reply
  • 0 kudos
Latest Reply
sean_owen
Databricks Employee
  • 0 kudos

I don't think we have a lot of internal docs, just high-level explanations like https://databricks.com/blog/2021/05/27/databricks-announces-the-first-feature-store-integrated-with-delta-lake-and-mlflow.html. However, I don't think there's much to it. Th...

Anonymous
  • 1703 Views
  • 1 reply
  • 0 kudos
Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

The feature store has both online and offline components. The offline feature store is used for feature discovery, model training, and batch inference, and is backed by Delta tables. You could read/write to the offline store from Databricks clusters that...

Srikanth_Gupta_ (Valued Contributor)
  • 1632 Views
  • 1 reply
  • 1 kudos

What are the best NLP libraries to use with Spark?

Which NLP APIs work best with Spark and give better performance?

Latest Reply
sean_owen
Databricks Employee
  • 1 kudos

By far the most popular and comprehensive library, to my knowledge, for Spark-native distributed NLP, is spark-nlp from John Snow Labs. https://nlp.johnsnowlabs.com/ It is open source (but with commercial support options) and has a whole lot of funct...

User16826992666 (Valued Contributor)
  • 2402 Views
  • 1 reply
  • 0 kudos
Latest Reply
sean_owen
Databricks Employee
  • 0 kudos

These terms are borrowed from scikit-learn, and the idea is the same. A transformer is just a component of a pipeline that transforms the data in some way. An estimator is also a transformer, but one that additionally needs to be 'fit' on data before ...
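The distinction can be sketched in plain Python. The classes below are hypothetical and only mimic the pattern; they are not the scikit-learn or Spark ML classes. The estimator learns a statistic in `fit`, and `fit` returns a fitted model that is itself a transformer. In Spark ML, `Estimator.fit()` likewise returns a `Model`, which is a `Transformer`; in scikit-learn, the same object is fit and then used to `transform`.

```python
from statistics import mean
from typing import List


class Transformer:
    """Anything that maps input data to output data."""

    def transform(self, values: List[float]) -> List[float]:
        raise NotImplementedError


class AbsoluteValue(Transformer):
    """A plain transformer: nothing to learn, so no fit step."""

    def transform(self, values: List[float]) -> List[float]:
        return [abs(v) for v in values]


class MeanCenterModel(Transformer):
    """The fitted result of MeanCenterer: a transformer holding learned state."""

    def __init__(self, center: float):
        self.center = center

    def transform(self, values: List[float]) -> List[float]:
        return [v - self.center for v in values]


class MeanCenterer:
    """An estimator: it must see training data (fit) before it can transform."""

    def fit(self, values: List[float]) -> MeanCenterModel:
        return MeanCenterModel(mean(values))


model = MeanCenterer().fit([1.0, 2.0, 3.0])     # learns center = 2.0
print(model.transform([4.0, 0.0]))              # [2.0, -2.0]
print(AbsoluteValue().transform([-1.5, 2.0]))   # [1.5, 2.0]
```

Splitting the learned state into a separate model object is the Spark ML design; it keeps the estimator's configuration separate from what it learned.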

Anonymous
  • 9023 Views
  • 1 reply
  • 0 kudos
Latest Reply
sean_owen
Databricks Employee
  • 0 kudos

If the image is a result of, for example, a plotting library's output, it should just render as-is. If it's not, then one simple approach is to write a markdown (%md) cell and include a link to the image: ![](url of the image) Of course this requires t...
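When the image is not reachable by URL (for example, a file on local disk), one option is to embed its bytes directly in an HTML `<img>` tag as a base64 data URI. A minimal sketch, assuming the notebook can render HTML (in Databricks Python notebooks, `displayHTML` does this); the helper names `image_to_html` and `image_file_to_html` are hypothetical.

```python
import base64
import mimetypes
from typing import Optional


def image_to_html(data: bytes, mime_type: str = "image/png", width: Optional[int] = None) -> str:
    """Embed raw image bytes in an <img> tag via a base64 data URI."""
    encoded = base64.b64encode(data).decode("ascii")
    size_attr = f' width="{width}"' if width is not None else ""
    return f'<img src="data:{mime_type};base64,{encoded}"{size_attr}/>'


def image_file_to_html(path: str, width: Optional[int] = None) -> str:
    """Read an image file and guess its MIME type from the file extension."""
    mime_type, _ = mimetypes.guess_type(path)
    with open(path, "rb") as f:
        return image_to_html(f.read(), mime_type or "application/octet-stream", width)
```

In a notebook, the result could then be passed to something like `displayHTML(image_file_to_html("/dbfs/tmp/plot.png", width=400))`, where the path is only a placeholder. Because the bytes are embedded in the cell output, very large images can make the notebook heavy.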

Anonymous
  • 1716 Views
  • 1 reply
  • 0 kudos

Resolved! Best practice for image manipulation

Can you please recommend approaches for image manipulation once the data has been read as images? Is there a specific library to use?

Latest Reply
sean_owen
Databricks Employee
  • 0 kudos

Spark has a built-in 'image' data source which will read a directory of image files as a DataFrame: spark.read.format("image").load(...). The resulting DataFrame has the pixel data, dimensions, channels, etc. You can also read image files 'manually' ...
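The image data source stores each picture in a struct column (named `image` by default) with fields including `height`, `width`, `nChannels`, `mode` and `data`, where `data` holds the raw pixel bytes row by row, in BGR channel order for colour images (the OpenCV convention). A minimal sketch of turning those fields into a NumPy array for further manipulation, assuming 8-bit pixels; the helper names are hypothetical, and PySpark also provides `pyspark.ml.image.ImageSchema.toNDArray` for the first step.

```python
import numpy as np


def image_bytes_to_array(data: bytes, height: int, width: int, n_channels: int) -> np.ndarray:
    """Reshape the raw `data` field of an image row into (height, width, channels).

    Assumes 8-bit pixels stored row-major (e.g. the CV_8UC1/CV_8UC3/CV_8UC4 modes).
    """
    arr = np.frombuffer(data, dtype=np.uint8)
    return arr.reshape((height, width, n_channels))


def bgr_to_rgb(arr: np.ndarray) -> np.ndarray:
    """Reverse the channel axis of a 3-channel image (BGR -> RGB)."""
    if arr.shape[-1] != 3:
        raise ValueError("expected a 3-channel image")
    return arr[..., ::-1]
```

For a collected row, the inputs would be `row.image.data`, `row.image.height`, `row.image.width` and `row.image.nChannels`; the resulting array can then be handed to an image library of your choice.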

User16826994223 (Honored Contributor III)
  • 5396 Views
  • 2 replies
  • 0 kudos

Can I access Delta tables outside of Databricks Runtime?

Is it possible to write to the same table from both Databricks and OSS (open-source) tools? Also, what if I want to read the data from MapReduce or Hive?

Latest Reply
sean_owen
Databricks Employee
  • 0 kudos

Yes. The Delta client is open source, and lets you read/write Delta tables if you add it to your external application. See https://docs.delta.io/latest/index.html

User16826994223 (Honored Contributor III)
  • 600 Views
  • 0 replies
  • 0 kudos

Databricks Certified Professional Data Scientist

Databricks Certified Professional Data Scientist: Does this exam require Databricks-specific or Spark-specific knowledge? No. Test-takers will be assessed on their understanding of the basics of machine learning and data science, how to complete each ...

User16826994223 (Honored Contributor III)
  • 513 Views
  • 0 replies
  • 0 kudos

Python vs. Scala in Spark on Databricks

Python vs. Scala in Spark on Databricks: we are seeing the Databricks platform used more with Python than with Scala, and Databricks is also enhancing its Python API more than its Scala API. So will Scala become a thing of the past for Spark? Thanks.

User16753724663 (Valued Contributor)
  • 2061 Views
  • 1 reply
  • 0 kudos
Latest Reply
User16753724663
Valued Contributor
  • 0 kudos

We can use the API below to list the jobs and then use the delete-job API: https://docs.databricks.com/dev-tools/api/latest/jobs.html#list

List
Endpoint: 2.0/jobs/list
HTTP method: GET

Once we have listed the jobs, we can use the API below to delete them:...
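The list-then-delete flow can be sketched with the Python standard library. This is a hedged sketch, not an official client: it assumes the Jobs API 2.0 endpoints `GET /api/2.0/jobs/list` (whose response holds a `jobs` array with `job_id` and `settings.name`) and `POST /api/2.0/jobs/delete` (which takes a `job_id`), a personal access token sent as a bearer token, and a hypothetical name-prefix filter so that only matching jobs are deleted. Deletion is irreversible, so the sketch defaults to a dry run. Newer workspaces use Jobs API 2.1, where listing is paginated, so check the current docs before relying on this.

```python
import json
import urllib.request
from typing import Dict, List, Optional


def _call(host: str, token: str, method: str, endpoint: str, body: Optional[Dict] = None) -> Dict:
    """Send one REST call to the workspace and decode the JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        f"{host.rstrip('/')}/api/{endpoint}",
        data=data,
        method=method,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        payload = resp.read()
    return json.loads(payload) if payload else {}


def select_job_ids(list_response: Dict, name_prefix: str) -> List[int]:
    """Pick job IDs whose name starts with name_prefix (hypothetical filter)."""
    return [
        job["job_id"]
        for job in list_response.get("jobs", [])
        if job.get("settings", {}).get("name", "").startswith(name_prefix)
    ]


def delete_jobs(host: str, token: str, name_prefix: str, dry_run: bool = True) -> List[int]:
    """List all jobs, then delete the matching ones (only report them when dry_run)."""
    listing = _call(host, token, "GET", "2.0/jobs/list")
    job_ids = select_job_ids(listing, name_prefix)
    if not dry_run:
        for job_id in job_ids:
            _call(host, token, "POST", "2.0/jobs/delete", {"job_id": job_id})
    return job_ids
```

A typical run would first call `delete_jobs("https://<workspace-url>", token, "tmp-")` to review the IDs it would remove, then repeat with `dry_run=False`; the workspace URL and prefix are placeholders.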

User16753724663 (Valued Contributor)
  • 3643 Views
  • 1 reply
  • 1 kudos

Unable to install sf and rgeos R packages on the cluster

Got the following error: java.lang.RuntimeException: Installation failed with message: Error installing R package: Could not install package with error: installation of package ‘rgdal’ had non-zero exit status
Full error log available at /databricks/drive...

Latest Reply
User16753724663
Valued Contributor
  • 1 kudos

We can use the init script below to install the packages on the cluster:
%scala
dbutils.fs.put("dbfs:/databricks/init_scripts/rlib.sh", """
#!/bin/bash
sudo apt-get install -y libudunits2-dev
sudo add-apt-repository ppa:ubuntugis/ubuntugis-uns...

