Machine Learning

Forum Posts

Anonymous
by Not applicable
  • 658 Views
  • 1 replies
  • 0 kudos
Latest Reply
sean_owen
Honored Contributor II
  • 0 kudos

I don't think we have a lot of internal docs, just high-level explanations like https://databricks.com/blog/2021/05/27/databricks-announces-the-first-feature-store-integrated-with-delta-lake-and-mlflow.html. However, I don't think there's much to it. Th...

  • 0 kudos
Anonymous
by Not applicable
  • 1074 Views
  • 1 replies
  • 0 kudos
Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

The feature store has both online and offline components. The offline feature store is used for feature discovery, model training, and batch inference, and is backed by Delta tables. You can read from and write to the offline store from Databricks clusters that...

  • 0 kudos
Srikanth_Gupta_
by Valued Contributor
  • 944 Views
  • 1 replies
  • 1 kudos

What are the best NLP libraries to use with Spark?

Which NLP APIs for Spark give the best performance?

Latest Reply
sean_owen
Honored Contributor II
  • 1 kudos

By far the most popular and comprehensive library, to my knowledge, for Spark-native distributed NLP, is spark-nlp from John Snow Labs. https://nlp.johnsnowlabs.com/ It is open source (but with commercial support options) and has a whole lot of funct...

  • 1 kudos
User16826992666
by Valued Contributor
  • 1390 Views
  • 1 replies
  • 0 kudos
Latest Reply
sean_owen
Honored Contributor II
  • 0 kudos

These terms are borrowed from scikit-learn, and the idea is the same. A transformer is just a component of a pipeline that transforms the data in some way. An estimator is also a transformer, but one that additionally needs to be 'fit' on data before ...
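To make the distinction concrete, here is a minimal plain-Python sketch. It does not use Spark ML or scikit-learn classes; `Scale`, `MeanCenter`, and `MeanCenterModel` are hypothetical names made up for illustration. It follows the Spark ML convention, where `fit()` on an estimator returns a fitted model that is itself a transformer.

```python
class Transformer:
    """A pipeline stage that maps data to data."""

    def transform(self, rows):
        raise NotImplementedError


class Scale(Transformer):
    """Stateless transformer: multiplies every value by a fixed factor."""

    def __init__(self, factor):
        self.factor = factor

    def transform(self, rows):
        return [x * self.factor for x in rows]


class MeanCenterModel(Transformer):
    """Fitted result of MeanCenter: subtracts the mean learned at fit time."""

    def __init__(self, mean):
        self.mean = mean

    def transform(self, rows):
        return [x - self.mean for x in rows]


class MeanCenter:
    """Estimator: has to see data first; fit() returns a transformer."""

    def fit(self, rows):
        return MeanCenterModel(sum(rows) / len(rows))


train = [1.0, 2.0, 3.0]
scaled = Scale(10).transform(train)   # no fitting needed
model = MeanCenter().fit(scaled)      # learns the mean from the data
centered = model.transform(scaled)    # applies what was learned
```

The same shape appears in a Spark ML `Pipeline`: stages that need no training are applied directly, while estimator stages are fit first and replaced by their fitted models.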

  • 0 kudos
Anonymous
by Not applicable
  • 5587 Views
  • 1 replies
  • 0 kudos
Latest Reply
sean_owen
Honored Contributor II
  • 0 kudos

If the image is a result of, for example, a plotting library's output, it should just render as-is. If it's not, then one simple approach is to write a markdown (%md) cell and include a link to the image: ![](url of the image) Of course this requires t...

  • 0 kudos
Anonymous
by Not applicable
  • 949 Views
  • 1 replies
  • 0 kudos

Resolved! Best practice for Image manipulation

Can you please recommend approaches for image manipulation once the data has been read as an image? Is there a specific library to use?

Latest Reply
sean_owen
Honored Contributor II
  • 0 kudos

Spark has a built-in 'image' data source which will read a directory of image files as a DataFrame: spark.read.format("image").load(...). The resulting DataFrame has the pixel data, dimensions, channels, etc. You can also read image files 'manually' ...
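As a rough sketch of the built-in data source mentioned above, the helper below reads a directory and keeps only the per-image metadata. The function name `load_image_metadata` and the example path are hypothetical; the field names come from Spark's image schema, which exposes a single struct column called `image`.

```python
def load_image_metadata(spark, path):
    """Read images under `path` with Spark's 'image' source and return metadata columns."""
    images = spark.read.format("image").load(path)
    # The source yields one struct column named "image"; the raw pixels live in
    # image.data, which is left out here to keep the result small.
    return images.select(
        "image.origin",
        "image.height",
        "image.width",
        "image.nChannels",
        "image.mode",
    )
```

In a notebook where `spark` is already defined, something like `load_image_metadata(spark, "/path/to/images").show(truncate=False)` would list each file with its dimensions and channel count.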

  • 0 kudos
User16826994223
by Honored Contributor III
  • 3438 Views
  • 2 replies
  • 0 kudos

Can I access Delta tables outside of Databricks Runtime?

Is it possible to write to the same table from Databricks and from OSS too? Also, what if I want to read the data from MapReduce or Hive?

Latest Reply
sean_owen
Honored Contributor II
  • 0 kudos

Yes. The Delta client is open source, and lets you read/write Delta tables if you add it to your external application. See https://docs.delta.io/latest/index.html
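For an external PySpark application, one possible setup looks like the sketch below. It assumes the open-source `delta-spark` package is installed alongside PySpark; the function name `build_delta_session`, the app name, and the table path in the usage note are placeholders.

```python
# Spark settings that enable Delta Lake in an open-source Spark session.
DELTA_SPARK_CONF = {
    "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
    "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
}


def build_delta_session(app_name="delta-outside-databricks"):
    """Create a SparkSession that can read and write Delta tables (assumes delta-spark)."""
    # Imported lazily so this module can be loaded even where Spark is not installed.
    from delta import configure_spark_with_delta_pip
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in DELTA_SPARK_CONF.items():
        builder = builder.config(key, value)
    # Adds the matching Delta Lake JARs to the session before it starts.
    return configure_spark_with_delta_pip(builder).getOrCreate()
```

With a session from `build_delta_session()`, `spark.read.format("delta").load("/path/to/table")` reads a table and `df.write.format("delta").save("/path/to/table")` writes one.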

  • 0 kudos
1 More Replies
User16826994223
by Honored Contributor III
  • 313 Views
  • 0 replies
  • 0 kudos

Databricks Certified Professional Data Scientist: Does this exam require Databricks-specific or Spark-specific knowledge? No. Test-takers will be asse...

Databricks Certified Professional Data Scientist: Does this exam require Databricks-specific or Spark-specific knowledge? No. Test-takers will be assessed on their understanding of the basics of machine learning and data science, how to complete each ...

User16826994223
by Honored Contributor III
  • 255 Views
  • 0 replies
  • 0 kudos

Python vs. Scala in Spark Databricks. We are seeing that the Databricks platform is used more with Python than with Scala, and Databricks is also e...

Python vs. Scala in Spark Databricks. We are seeing that the Databricks platform is used more with Python than with Scala, and Databricks is also enhancing its Python API more than its Scala API, so will Scala become a thing of the past for Spark? Thanks

User15986662700
by New Contributor III
  • 3359 Views
  • 1 replies
  • 0 kudos
Latest Reply
User15986662700
New Contributor III
  • 0 kudos

If your data frame has complex fields, there's no standard way to convert it to a CSV file and enable exporting, so the option is disabled. Try to flatten/map the data frame before displaying; this will enable the "download full results" option aga...
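One way to do the flattening is sketched below, on the assumption that the complex fields are nested structs. The helper name `flat_column_paths` is hypothetical. It walks the schema and yields a dotted path plus a flat alias for each leaf column. Arrays and maps are left untouched and would need separate handling, and field names that themselves contain dots are not handled.

```python
def flat_column_paths(schema, prefix=""):
    """Yield (dotted_path, flat_alias) for every leaf field under nested structs."""
    for field in schema.fields:
        path = f"{prefix}{field.name}"
        # Only struct types expose a .fields attribute, so use that to detect nesting.
        nested = getattr(field.dataType, "fields", None)
        if nested is not None:
            yield from flat_column_paths(field.dataType, prefix=f"{path}.")
        else:
            yield path, path.replace(".", "_")


# Usage in a notebook, assuming `df` is a DataFrame with nested struct columns:
#   from pyspark.sql.functions import col
#   flat = df.select([col(p).alias(a) for p, a in flat_column_paths(df.schema)])
#   display(flat)
```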

  • 0 kudos
User16753724663
by Valued Contributor
  • 1317 Views
  • 1 replies
  • 0 kudos
Latest Reply
User16753724663
Valued Contributor
  • 0 kudos

We can use the API below to list the jobs and then use the delete job API: https://docs.databricks.com/dev-tools/api/latest/jobs.html#list

List
Endpoint: 2.0/jobs/list
HTTP Method: GET

Once we have listed the jobs, we can use the API below to delete them:...
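The list-then-delete flow can be sketched in Python with only the standard library. The list endpoint is the one quoted in the reply. The delete call (`POST 2.0/jobs/delete` with a `job_id` body) is taken from the Jobs API 2.0 reference, not from the truncated post, so treat it as an assumption. The function names, `host`, and `token` are placeholders. Deleting jobs is destructive, so check the list before running the last function.

```python
import json
import urllib.request


def list_jobs_request(host, token):
    """Build the GET request for the Jobs API list endpoint."""
    return urllib.request.Request(
        f"{host}/api/2.0/jobs/list",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def delete_job_request(host, token, job_id):
    """Build the POST request that deletes a single job by ID."""
    return urllib.request.Request(
        f"{host}/api/2.0/jobs/delete",
        data=json.dumps({"job_id": job_id}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def delete_all_jobs(host, token):
    """List every job in the workspace, delete each one, and return the deleted IDs."""
    with urllib.request.urlopen(list_jobs_request(host, token)) as resp:
        jobs = json.load(resp).get("jobs", [])
    for job in jobs:
        urllib.request.urlopen(delete_job_request(host, token, job["job_id"])).close()
    return [job["job_id"] for job in jobs]
```

This sketch does no pagination, retrying, or filtering. A real script would likely filter the listed jobs by name or creator before deleting anything.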

  • 0 kudos
User16753724663
by Valued Contributor
  • 2652 Views
  • 1 replies
  • 1 kudos

Unable to install sf and rgeos R packages on the cluster

Got the following error: java.lang.RuntimeException: Installation failed with message: Error installing R package: Could not install package with error: installation of package ‘rgdal’ had non-zero exit status   Full error log available at /databricks/drive...

Latest Reply
User16753724663
Valued Contributor
  • 1 kudos

We can use the init script below to install the packages on the cluster:
%scala
dbutils.fs.put("dbfs:/databricks/init_scripts/rlib.sh", """
#!/bin/bash
sudo apt-get install -y libudunits2-dev
sudo add-apt-repository ppa:ubuntugis/ubuntugis-uns...

  • 1 kudos
User16753724663
by Valued Contributor
  • 4753 Views
  • 1 replies
  • 0 kudos

Error importing pip package s3fs

A job recently began failing with the following error when a Python notebook imports the pip package s3fs. ImportError: cannot import name 'maybe_sync' from 'fsspec.asyn' (/databricks/python/lib/python3.8/site-packages/fsspec/asyn.py)   ImportError Tr...

Latest Reply
User16753724663
Valued Contributor
  • 0 kudos

On checking, the init script is installing s3fs version 0.5.2. This version currently has issues on PyPI. I have tested version 0.6.0, which works fine. Please update your requirement.txt file with a newer version of s3fs. Below is the p...

  • 0 kudos
Joseph_B
by New Contributor III
  • 1623 Views
  • 1 replies
  • 1 kudos
Latest Reply
Joseph_B
New Contributor III
  • 1 kudos

You can find a lot more info on this at this MLflow product page, including a comparison table at the bottom. I'd summarize that comparison as: Databricks provides three key things in its managed MLflow service. Security: MLflow experiments, models, ...

  • 1 kudos
Anonymous
by Not applicable
  • 700 Views
  • 0 replies
  • 0 kudos

Feature Discovery

How would one discover features here and also know how to make sense of these features? Ideally, we can trace the usage of features in code as well.
