Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

Forum Posts

User16752239203
by Databricks Employee
  • 2000 Views
  • 1 replies
  • 0 kudos

How can I use non-Spark libraries like spaCy with Databricks and Spark?

I have an NLP application that I built on my local machine using spaCy and pandas, but now I would like to scale it to a large production dataset and take advantage of Spark's distributed compute. How do I import and utilize a librar...

Latest Reply
sean_owen
Databricks Employee
  • 0 kudos

It depends on what you mean, but if you're just trying to (say) tokenize and process data with spacy in parallel, then that's trivial. Write a 'pandas UDF' function that expresses how you want to transform data using spacy, in terms of a pandas DataF...
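A rough sketch of the pandas UDF idea from the reply, not a definitive implementation. The helper names (`tokenize_texts`, `make_spacy_tokenizer_udf`) and the column name `text` are made up for illustration, and `spacy.blank("en")` is assumed here only because it gives a tokenizer without downloading a model. The Spark and spaCy imports are kept inside the factory function so the batch logic can be tried without a cluster.

```python
from typing import Callable, Iterable, List


def tokenize_texts(
    texts: Iterable[str], tokenize: Callable[[str], List[str]]
) -> List[List[str]]:
    """Apply a tokenizer to every text in a batch; missing values become []."""
    return [tokenize(t) if t is not None else [] for t in texts]


def make_spacy_tokenizer_udf():
    """Build a pandas UDF that tokenizes a string column with spaCy.

    Assumption: pyspark, pandas and spacy are installed on the cluster.
    Imports are local so this module still loads where they are absent.
    """
    import pandas as pd
    import spacy
    from pyspark.sql.functions import pandas_udf

    @pandas_udf("array<string>")
    def spacy_tokens(texts: pd.Series) -> pd.Series:
        # Created per batch for simplicity; a blank pipeline is cheap to build.
        nlp = spacy.blank("en")
        return pd.Series(
            tokenize_texts(texts, lambda t: [tok.text for tok in nlp(t)])
        )

    return spacy_tokens
```

On a cluster this would be applied with something like `df.withColumn("tokens", make_spacy_tokenizer_udf()("text"))`, where `df` and `text` are placeholders for your own DataFrame and column.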

  • 0 kudos
Anonymous
by Not applicable
  • 3267 Views
  • 1 replies
  • 0 kudos
Latest Reply
sean_owen
Databricks Employee
  • 0 kudos

I don't think we have a lot of internal docs, just high-level explanations like https://databricks.com/blog/2021/05/27/databricks-announces-the-first-feature-store-integrated-with-delta-lake-and-mlflow.html However, I don't think there's much to it. Th...

  • 0 kudos
Anonymous
by Not applicable
  • 2565 Views
  • 1 replies
  • 0 kudos
Latest Reply
sajith_appukutt
Databricks Employee
  • 0 kudos

The feature store has both online and offline components. The offline feature store is used for feature discovery, model training, and batch inference, and is backed by Delta tables. You can read from and write to the offline store from Databricks clusters that...

  • 0 kudos
Srikanth_Gupta_
by Databricks Employee
  • 3368 Views
  • 1 replies
  • 1 kudos

What are the best NLP libraries to use with Spark?

Which NLP APIs give the best performance with Spark?

Latest Reply
sean_owen
Databricks Employee
  • 1 kudos

By far the most popular and comprehensive library, to my knowledge, for Spark-native distributed NLP, is spark-nlp from John Snow Labs. https://nlp.johnsnowlabs.com/ It is open source (but with commercial support options) and has a whole lot of funct...

  • 1 kudos
User16826992666
by Databricks Employee
  • 4336 Views
  • 1 replies
  • 0 kudos
Latest Reply
sean_owen
Databricks Employee
  • 0 kudos

These terms are borrowed from scikit-learn, and the idea is the same. A transformer is just a component of a pipeline that transforms the data in some way. An estimator is also a transformer, but one that additionally needs to be 'fit' on data before ...
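To make the distinction concrete, here is a toy sketch in plain Python. It is not the Spark ML or scikit-learn API, and the class names (`Doubler`, `MeanCenterer`, `MeanCentererModel`) are invented for illustration. It follows the Spark ML convention, where calling `fit` on an estimator returns a fitted model that is itself a transformer.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


class Doubler:
    """Transformer: needs no fitting, it simply maps input to output."""

    def transform(self, values: List[float]) -> List[float]:
        return [2 * v for v in values]


@dataclass
class MeanCentererModel:
    """Fitted model: a transformer that carries state learned from data."""

    mean_: float

    def transform(self, values: List[float]) -> List[float]:
        return [v - self.mean_ for v in values]


class MeanCenterer:
    """Estimator: must be fit on data before it can transform anything."""

    def fit(self, values: List[float]) -> MeanCentererModel:
        return MeanCentererModel(mean_=mean(values))


# The estimator learns the mean from training data, then the resulting
# model applies that learned mean to new data.
model = MeanCenterer().fit([1.0, 2.0, 3.0])
centered = model.transform([4.0, 5.0])
```

The point of the split is that the learned state (here, the mean) comes from the training data only and is then reused unchanged on any later data.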

  • 0 kudos
Anonymous
by Not applicable
  • 10824 Views
  • 1 replies
  • 0 kudos
Latest Reply
sean_owen
Databricks Employee
  • 0 kudos

If the image is a result of, for example, a plotting library's output, it should just render as-is. If it's not, then one simple approach is to write a markdown (%md) cell and include a link to the image: ![](url of the image) Of course this requires t...

  • 0 kudos
Anonymous
by Not applicable
  • 2807 Views
  • 1 replies
  • 0 kudos

Resolved! Best practice for Image manipulation

Can you please recommend suggestions for image manipulation once you read the data as an image ? Any specific library to use?

Latest Reply
sean_owen
Databricks Employee
  • 0 kudos

Spark has a built-in 'image' data source which will read a directory of image files as a DataFrame: spark.read.format("image").load(...). The resulting DataFrame has the pixel data, dimensions, channels, etc. You can also read image files 'manually' ...
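A small sketch of working with that DataFrame, under a few assumptions. The function names are hypothetical, `spark` is an existing SparkSession, and `path` is a placeholder directory. The field names (`origin`, `height`, `width`, `nChannels`, `data`) are those of Spark's image schema. As I understand it, the `data` bytes are packed row by row with the channels of each pixel adjacent (typically in BGR order), which is what the pure-Python helper relies on.

```python
from typing import Tuple


def load_image_metadata(spark, path: str):
    """Read a directory of images and keep only the metadata columns.

    Assumption: `spark` is an active SparkSession and `path` is a placeholder.
    """
    return (
        spark.read.format("image")
        .load(path)
        .select("image.origin", "image.height", "image.width", "image.nChannels")
    )


def pixel_at(
    data: bytes, width: int, n_channels: int, row: int, col: int
) -> Tuple[int, ...]:
    """Return the channel values of one pixel from packed row-major bytes."""
    start = (row * width + col) * n_channels
    return tuple(data[start:start + n_channels])
```

For heavier manipulation, the same bytes could be handed to an image library inside a UDF; this helper is only meant to show how the raw layout can be indexed.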

  • 0 kudos
User16826994223
by Databricks Employee
  • 7284 Views
  • 2 replies
  • 0 kudos

Can I access Delta tables outside of Databricks Runtime?

Is it possible to write to the same table from both Databricks and OSS? Also, what if I want to read the data from MapReduce or Hive?

Latest Reply
sean_owen
Databricks Employee
  • 0 kudos

Yes. The Delta client is open source, and lets you read/write Delta tables if you add it to your external application. See https://docs.delta.io/latest/index.html

  • 0 kudos
1 More Replies
User16826994223
by Databricks Employee
  • 1798 Views
  • 0 replies
  • 0 kudos

Databricks Certified Professional Data Scientist: Does this exam require Databricks-specific or Spark-specific knowledge? No. Test-takers will be asse...

Databricks Certified Professional Data Scientist: Does this exam require Databricks-specific or Spark-specific knowledge? No. Test-takers will be assessed on their understanding of the basics of machine learning and data science, how to complete each ...

User16826994223
by Databricks Employee
  • 874 Views
  • 0 replies
  • 0 kudos

Python vs. Scala in Spark on Databricks. We are seeing that the Databricks platform is used more with Python than with Scala, and Databricks is also e...

Python vs. Scala in Spark on Databricks. We are seeing that the Databricks platform is used more with Python than with Scala, and Databricks is also enhancing its Python API more than the Scala API, so will Scala become a thing of the past for Spark? Thanks

User16753724663
by Databricks Employee
  • 2966 Views
  • 1 replies
  • 0 kudos
Latest Reply
User16753724663
Databricks Employee
  • 0 kudos

We can use the API below to list the jobs and then use the delete job API: https://docs.databricks.com/dev-tools/api/latest/jobs.html#list
List
Endpoint: 2.0/jobs/list
HTTP Method: GET
Once we list the jobs, we can use the API below to delete them:...
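The list-then-delete flow could be sketched like this with only the standard library. The function names, `host`, and `token` are placeholders. The list endpoint is the one quoted above; the reply is cut off before it names the delete call, so the `2.0/jobs/delete` POST with a `job_id` body is my assumption based on the Jobs API 2.0 docs. Please check it against the linked page before running anything destructive.

```python
import json
import urllib.request
from typing import List


def build_list_jobs_request(host: str, token: str) -> urllib.request.Request:
    """GET /api/2.0/jobs/list (the endpoint quoted in the reply)."""
    return urllib.request.Request(
        f"{host.rstrip('/')}/api/2.0/jobs/list",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def build_delete_job_request(
    host: str, token: str, job_id: int
) -> urllib.request.Request:
    """POST /api/2.0/jobs/delete (assumed endpoint, see the note above)."""
    return urllib.request.Request(
        f"{host.rstrip('/')}/api/2.0/jobs/delete",
        data=json.dumps({"job_id": job_id}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def delete_all_jobs(host: str, token: str) -> List[int]:
    """List every job in the workspace, delete each one, return the ids."""
    with urllib.request.urlopen(build_list_jobs_request(host, token)) as resp:
        jobs = json.load(resp).get("jobs", [])
    for job in jobs:
        request = build_delete_job_request(host, token, job["job_id"])
        urllib.request.urlopen(request).close()
    return [job["job_id"] for job in jobs]
```

Building the requests separately from sending them makes it easy to print and inspect exactly what would be deleted first.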

  • 0 kudos
User16753724663
by Databricks Employee
  • 4894 Views
  • 1 replies
  • 1 kudos

Unable to install sf and rgeos R packages on the cluster

Got the following error: java.lang.RuntimeException: Installation failed with message: Error installing R package: Could not install package with error: installation of package ‘rgdal’ had non-zero exit status. Full error log available at /databricks/drive...

Latest Reply
User16753724663
Databricks Employee
  • 1 kudos

We can use the init script below to install the packages on the cluster:
%scala
dbutils.fs.put("dbfs:/databricks/init_scripts/rlib.sh", """
#!/bin/bash
sudo apt-get install -y libudunits2-dev
sudo add-apt-repository ppa:ubuntugis/ubuntugis-uns...

  • 1 kudos
User16753724663
by Databricks Employee
  • 9114 Views
  • 1 replies
  • 0 kudos

Error importing pip package s3fs

A job recently began failing with the following error when a Python notebook imports the pip package s3fs. ImportError: cannot import name 'maybe_sync' from 'fsspec.asyn' (/databricks/python/lib/python3.8/site-packages/fsspec/asyn.py) ImportError Tr...

Latest Reply
User16753724663
Databricks Employee
  • 0 kudos

On checking, the init script is installing s3fs version 0.5.2. This version currently has issues on PyPI. I have tested version 0.6.0, which works fine. Please update your requirement.txt file to a newer version of s3fs. Below is the p...

  • 0 kudos
Joseph_B
by Databricks Employee
  • 4032 Views
  • 1 replies
  • 1 kudos
Latest Reply
Joseph_B
Databricks Employee
  • 1 kudos

You can find a lot more info on this at this MLflow product page, including a comparison table at the bottom. I'd summarize that comparison as: Databricks provides three key things in its managed MLflow service. Security: MLflow experiments, models, ...

  • 1 kudos
Anonymous
by Not applicable
  • 1704 Views
  • 0 replies
  • 0 kudos

Feature Discovery

How would one discover features here and also know how to make sense of them? Ideally, we could trace the usage of features in code as well.
