Machine Learning

by Anonymous • Not applicable

06-14-2021 7:54:10 AM

658 Views
1 replies
0 kudos

Resolved! Is there documentation about the feature store API and how it's architected under the hood?

Latest Reply

sean_owen
Honored Contributor II

06-17-2021 4:20:13 PM

0 kudos

I don't think we have a lot of internal docs, just high-level explanations like https://databricks.com/blog/2021/05/27/databricks-announces-the-first-feature-store-integrated-with-delta-lake-and-mlflow.html However, I don't think there's much to it. Th...

by Anonymous • Not applicable

06-14-2021 7:52:17 AM

1074 Views
1 replies
0 kudos

Resolved! Do you have information on the scalability and cost of using Databricks feature store?

Latest Reply

sajith_appukutt
Honored Contributor II

06-17-2021 2:22:03 PM

0 kudos

The feature store has both online and offline components. The offline feature store is used for feature discovery, model training, and batch inference, and is backed by Delta tables. You can read from and write to the offline store from Databricks clusters that...

by Srikanth_Gupta_ • Valued Contributor

06-16-2021 5:49:23 AM

944 Views
1 replies
1 kudos

What are the best NLP libraries to use with Spark?

Which NLP APIs work best with Spark and give better performance?

Latest Reply

sean_owen
Honored Contributor II

06-17-2021 12:59:25 PM

1 kudos

By far the most popular and comprehensive library, to my knowledge, for Spark-native distributed NLP, is spark-nlp from John Snow Labs. https://nlp.johnsnowlabs.com/ It is open source (but with commercial support options) and has a whole lot of funct...

by User16826992666 • Valued Contributor

06-17-2021 8:05:21 AM

1390 Views
1 replies
0 kudos

Resolved! In Spark MLlib, what is the difference between an estimator and a transformer?

Latest Reply

sean_owen
Honored Contributor II

06-17-2021 11:21:49 AM

0 kudos

These terms are borrowed from scikit-learn, and the idea is the same. A transformer is just a component of a pipeline that transforms the data in some way. An estimator is also a transformer, but one that additionally needs to be 'fit' on data before ...
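To make the distinction concrete, here is a minimal plain-Python sketch of the pattern. It is not Spark code, and the class names `ScalerEstimator` and `ScalerModel` are hypothetical. It mirrors how Spark MLlib arranges things: calling `fit()` on an estimator returns a model, and that model is a transformer.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List


@dataclass
class ScalerModel:
    """A transformer: holds already-fitted parameters and only transforms data."""
    mean: float
    std: float

    def transform(self, values: List[float]) -> List[float]:
        return [(v - self.mean) / self.std for v in values]


class ScalerEstimator:
    """An estimator: it must see data in fit() before anything can be transformed."""

    def fit(self, values: List[float]) -> ScalerModel:
        # Fall back to 1.0 so constant data does not cause a division by zero.
        std = pstdev(values) or 1.0
        return ScalerModel(mean=mean(values), std=std)


# Fit on training data, then reuse the fitted model on new data.
model = ScalerEstimator().fit([1.0, 2.0, 3.0])
scaled = model.transform([2.0, 4.0])
```

The same shape applies to a Spark `Pipeline`: stages that need fitting are estimators, and the fitted pipeline consists only of transformers.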

by Anonymous • Not applicable

06-17-2021 9:16:50 AM

5587 Views
1 replies
0 kudos

Resolved! How can I embed an image in my notebook?

Latest Reply

sean_owen
Honored Contributor II

06-17-2021 11:17:07 AM

0 kudos

If the image is a result of, for example, a plotting library's output, it should just render as-is. If it's not, then one simple approach is to write a markdown (%md) cell and include a link to the image: ![](url of the image) Of course this requires t...
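When the image is not reachable at a URL, one alternative (a different technique from the markdown link above, offered here as an assumption about what may suit your case) is to inline the bytes as a base64 data URI and render the resulting HTML. The helper name `image_to_html` is hypothetical; `displayHTML` is the notebook built-in you would pass the string to.

```python
import base64


def image_to_html(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Wrap raw image bytes in an <img> tag using a base64 data URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f'<img src="data:{mime_type};base64,{encoded}"/>'


# Placeholder bytes for illustration; in practice read them from a file.
html = image_to_html(b"\x89PNG")
# In a Databricks notebook you would then call: displayHTML(html)
```

This keeps the image inside the notebook output itself, at the cost of larger cell output for big images.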

by Anonymous • Not applicable

06-17-2021 9:28:44 AM

949 Views
1 replies
0 kudos

Resolved! Best practice for Image manipulation

Can you please suggest approaches for image manipulation once the data has been read as an image? Is there a specific library to use?

Latest Reply

sean_owen
Honored Contributor II

06-17-2021 11:13:58 AM

0 kudos

Spark has a built-in 'image' data source which will read a directory of image files as a DataFrame: spark.read.format("image").load(...). The resulting DataFrame has the pixel data, dimensions, channels, etc. You can also read image files 'manually' ...
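A short sketch of the built-in data source in use. The function name `load_image_metadata` is hypothetical, and it assumes you pass in an existing `SparkSession`. The data source produces a single `image` struct column whose fields include `origin`, `height`, `width`, `nChannels`, `mode` and `data`.

```python
def load_image_metadata(spark, path):
    """Read a directory of images and keep only the lightweight metadata fields."""
    images = spark.read.format("image").load(path)
    # The raw pixel bytes live in image.data; they are left out here on purpose
    # so the result is cheap to display.
    return images.select(
        "image.origin", "image.height", "image.width", "image.nChannels"
    )
```

In a notebook this could be called as `load_image_metadata(spark, "/path/to/images")`, where the path is a placeholder for your own directory.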

by User16826994223 • Honored Contributor III

06-15-2021 9:06:09 AM

3438 Views
2 replies
0 kudos

Can I access Delta tables outside of Databricks Runtime?

Is it possible to write to the same table from both Databricks and OSS? Also, what if I want to read the data from MapReduce or Hive?

Latest Reply

sean_owen
Honored Contributor II

06-17-2021 11:11:45 AM

0 kudos

Yes. The Delta client is open source, and lets you read/write Delta tables if you add it to your external application. See https://docs.delta.io/latest/index.html
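As a rough sketch of what "adding it to your external application" can look like with open-source Spark: the two settings below are the ones the Delta Lake documentation lists for enabling Delta in a Spark session. The function names are hypothetical, and the sketch assumes the Delta Lake jars matching your Spark version are already on the classpath (the exact package coordinates depend on your versions, so check the linked docs).

```python
def delta_spark_conf():
    """Spark settings that enable Delta Lake SQL support in OSS Spark."""
    return {
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
        "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    }


def build_delta_session(app_name="delta-outside-databricks"):
    """Create a SparkSession configured for Delta (needs pyspark installed)."""
    from pyspark.sql import SparkSession  # imported lazily; only needed here

    builder = SparkSession.builder.appName(app_name)
    for key, value in delta_spark_conf().items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

With such a session, a table can be read with `spark.read.format("delta").load("/path/to/table")`, where the path is a placeholder.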

1 More Replies

by User16826994223 • Honored Contributor III

06-17-2021 1:48:38 AM

313 Views
0 replies
0 kudos

Databricks Certified Professional Data Scientist: Does this exam require Databricks-specific or Spark-specific knowledge? No. Test-takers will be assessed on their understanding of the basics of machine learning and data science, how to complete each ...

by User16826994223 • Honored Contributor III

06-17-2021 1:16:56 AM

255 Views
0 replies
0 kudos

Python vs. Scala in Spark on Databricks

We are seeing that the Databricks platform is used more with Python than with Scala, and that Databricks is also enhancing its Python API more than the Scala API. So will Scala become a thing of the past for Spark? Thanks

by User15986662700 • New Contributor III

06-16-2021 1:49:34 PM

3359 Views
1 replies
0 kudos

Why is the "download full results" option disabled (gray)?

Latest Reply

User15986662700
New Contributor III

06-16-2021 1:51:42 PM

0 kudos

If your data frame has complex fields, there's no standard way to convert it to a CSV file for export, so the option is disabled. Try flattening or mapping the data frame before displaying it; this will enable the "download full results" option aga...
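One way to do this, sketched under assumptions: serialize every struct, array or map column to a JSON string so that each column becomes CSV-friendly. This is just one possible mapping, not necessarily what the reply had in mind, and the function names are hypothetical. It relies on `DataFrame.dtypes`, `DataFrame.selectExpr` and the SQL function `to_json`.

```python
COMPLEX_PREFIXES = ("struct", "array", "map")


def csv_safe_exprs(dtypes):
    """Build SELECT expressions that turn complex columns into JSON strings."""
    exprs = []
    for name, dtype in dtypes:
        if dtype.startswith(COMPLEX_PREFIXES):
            exprs.append(f"to_json(`{name}`) AS `{name}`")
        else:
            exprs.append(f"`{name}`")
    return exprs


def make_csv_safe(df):
    """Return a DataFrame whose columns are all simple (non-nested) types."""
    return df.selectExpr(*csv_safe_exprs(df.dtypes))


# Example with a hand-written dtypes list, as DataFrame.dtypes would return it.
exprs = csv_safe_exprs([("id", "bigint"), ("tags", "array<string>")])
```

In a notebook, `display(make_csv_safe(df))` would then show only simple columns.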

by User16753724663 • Valued Contributor

06-16-2021 10:25:19 AM

1317 Views
1 replies
0 kudos

We have noticed that there are more than 1000 jobs in the jobs list in the shard. Because of this, we are getting 'error_code': 'QUOTA_EXCEEDED' when submitting new jobs using the Jobs API.

Latest Reply

User16753724663
Valued Contributor

06-16-2021 10:26:33 AM

0 kudos

We can use the API below to list the jobs and then use the delete job API:
https://docs.databricks.com/dev-tools/api/latest/jobs.html#list

List
Endpoint: 2.0/jobs/list
HTTP Method: GET

Once we have listed the jobs, we can use the API below to delete them:...
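The list-then-delete flow could be scripted roughly as follows, using only the Python standard library. This is a sketch under assumptions: `host` and `token` are placeholders for your workspace URL and a personal access token, the function names are hypothetical, and the delete call assumes the Jobs API 2.0 `jobs/delete` endpoint, which takes a `job_id` in a POST body.

```python
import json
import urllib.request


def list_jobs_request(host, token):
    """Build the GET request for 2.0/jobs/list."""
    return urllib.request.Request(
        f"{host}/api/2.0/jobs/list",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def delete_job_request(host, token, job_id):
    """Build the POST request for 2.0/jobs/delete (assumed endpoint, see lead-in)."""
    return urllib.request.Request(
        f"{host}/api/2.0/jobs/delete",
        data=json.dumps({"job_id": job_id}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def delete_jobs(host, token, should_delete):
    """Delete every listed job for which should_delete(job) returns True."""
    with urllib.request.urlopen(list_jobs_request(host, token)) as resp:
        jobs = json.load(resp).get("jobs", [])
    deleted = []
    for job in jobs:
        if should_delete(job):
            request = delete_job_request(host, token, job["job_id"])
            urllib.request.urlopen(request).close()
            deleted.append(job["job_id"])
    return deleted
```

Passing a predicate such as `lambda job: job["settings"]["name"].startswith("tmp-")` keeps the cleanup selective; the `tmp-` prefix is only an illustration. Deletion cannot be undone, so it is worth printing the matching jobs before running the delete step.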

by User16753724663 • Valued Contributor

06-16-2021 10:20:23 AM

2652 Views
1 replies
1 kudos

Unable to install sf and rgeos R packages on the cluster

Got the following error:
java.lang.RuntimeException: Installation failed with message:
Error installing R package: Could not install package with error: installation of package ‘rgdal’ had non-zero exit status
Full error log available at /databricks/drive...

Latest Reply

User16753724663
Valued Contributor

06-16-2021 10:21:07 AM

1 kudos

We can use the init script below to install the packages on the cluster:
%scala
dbutils.fs.put("dbfs:/databricks/init_scripts/rlib.sh", """
#!/bin/bash
sudo apt-get install -y libudunits2-dev
sudo add-apt-repository ppa:ubuntugis/ubuntugis-uns...

by User16753724663 • Valued Contributor

06-16-2021 10:11:18 AM

4753 Views
1 replies
0 kudos

Error importing pip package s3fs

A job recently began failing with the following error when a Python notebook imports the pip package s3fs:
ImportError: cannot import name 'maybe_sync' from 'fsspec.asyn' (/databricks/python/lib/python3.8/site-packages/fsspec/asyn.py)
ImportError Tr...

Latest Reply

User16753724663
Valued Contributor

06-16-2021 10:12:23 AM

0 kudos

On checking, the init script is installing s3fs version 0.5.2. This version currently has issues when installed from PyPI. I have tested version 0.6.0, which works fine. Please update your requirement.txt file to use a newer version of s3fs. Below is the p...

by Joseph_B • New Contributor III

06-14-2021 2:38:56 PM

1623 Views
1 replies
1 kudos

How does Databricks managed MLflow compare with open-source (OSS) MLflow?

Latest Reply

Joseph_B
New Contributor III

06-14-2021 2:44:00 PM

1 kudos

You can find a lot more info on this at this MLflow product page, including a comparison table at the bottom. I'd summarize that comparison as: Databricks provides three key things in its managed MLflow service.
Security: MLflow experiments, models, ...

by Anonymous • Not applicable

06-14-2021 7:53:09 AM

700 Views
0 replies
0 kudos

Feature Discovery

How would one discover features here and also know how to make sense of these features? Ideally, we can trace the usage of features in code as well.

Databricks

Forum Posts

pdb debugger on databricks

import ml.dmlc.xgboost4j.scala.spark.{XGBoostEstim...

Query ML Endpoint with R and Curl

'error_code': 'INVALID_PARAMETER_VALUE', 'message'...

AutoMl Dataset too large