AutoML presumably tries a few different algorithms while hyperparameter searching. What model types are considered?
At the moment, it's really just xgboost and sklearn implementations such as random forests, logistic regression, and linear regression, as applicable. More possibilities are coming.
I have an NLP application that I built on my local machine using spacy and pandas, but now I would like to scale it to a large production dataset and take advantage of Spark's distributed compute. How do I import and utilize a librar...
It depends on what you mean, but if you're just trying to (say) tokenize and process data with spacy in parallel, then that's trivial. Write a 'pandas UDF' function that expresses how you want to transform data using spacy, in terms of a pandas DataF...
I don't think we have a lot of internal docs, just high-level explanations like https://databricks.com/blog/2021/05/27/databricks-announces-the-first-feature-store-integrated-with-delta-lake-and-mlflow.html However, I don't think there's much to it. Th...
The feature store has both online and offline components. The offline feature store is used for feature discovery, model training, and batch inference, and is backed by Delta tables. You can read from and write to the offline store from Databricks clusters that...
Which NLP APIs work best with Spark and give the best performance?
By far the most popular and comprehensive library, to my knowledge, for Spark-native distributed NLP, is spark-nlp from John Snow Labs. https://nlp.johnsnowlabs.com/ It is open source (but with commercial support options) and has a whole lot of funct...
These terms are borrowed from scikit-learn, and the idea is the same. A transformer is just a component of a pipeline that transforms the data in some way. An estimator is also a transformer, but one that additionally needs to be 'fit' on data before ...
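To make the distinction concrete, here is a minimal pure-Python sketch of the idea as described in this answer. It is not the Spark ML or scikit-learn API; the class names `Doubler` and `MeanCenterer` are hypothetical and exist only for illustration.

```python
from statistics import mean


class Doubler:
    """Hypothetical transformer: stateless, so it can transform immediately."""

    def transform(self, values):
        return [v * 2 for v in values]


class MeanCenterer:
    """Hypothetical estimator: must learn a statistic via fit() before transforming."""

    def fit(self, values):
        # Learn the mean from the training data and keep it as fitted state.
        self.mean_ = mean(values)
        return self

    def transform(self, values):
        if not hasattr(self, "mean_"):
            raise RuntimeError("call fit() before transform()")
        return [v - self.mean_ for v in values]


data = [1.0, 2.0, 3.0]
doubled = Doubler().transform(data)                 # no fitting required
centered = MeanCenterer().fit(data).transform(data)  # fit first, then transform
```

The only difference between the two classes is the fitted state: `Doubler` has nothing to learn, while `MeanCenterer` refuses to transform until it has seen data.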
If the image is a result of, for example, a plotting library's output, it should just render as-is. If it's not, then one simple approach is to write a markdown (%md) cell and include a link to the image: ![](url of the image) Of course this requires t...
Can you recommend an approach for image manipulation once the data has been read as an image? Is there a specific library to use?
Spark has a built-in 'image' data source which will read a directory of image files as a DataFrame: spark.read.format("image").load(...). The resulting DataFrame has the pixel data, dimensions, channels, etc. You can also read image files 'manually' ...
Is it possible to write to the same table from both Databricks and OSS? Also, what if I want to read the data from MapReduce or Hive?
Yes. The Delta client is open source, and lets you read/write Delta tables if you add it to your external application. See https://docs.delta.io/latest/index.html
Databricks Certified Professional Data Scientist
Does this exam require Databricks-specific or Spark-specific knowledge? No. Test-takers will be assessed on their understanding of the basics of machine learning and data science, how to complete each ...
Python vs. Scala in Spark on Databricks. We are seeing that the Databricks platform is used more with Python than with Scala, and Databricks is also enhancing its Python API more than its Scala API. Will Scala become a thing of the past for Spark? Thanks
I'm having the same problem but it's with the results of a query. Are there any fixes for this instance?
We can use the API below to list the jobs and then use the delete job API: https://docs.databricks.com/dev-tools/api/latest/jobs.html#list
List
Endpoint | HTTP Method
2.0/jobs/list | GET
Once we list the jobs, we can use the API below to delete them:...
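The list-then-delete flow described above can be sketched with the Python standard library. This is only a sketch under assumptions: the host and token values are placeholders, the helper names are hypothetical, and it assumes the Jobs API 2.0 delete call is a POST to `2.0/jobs/delete` with a `job_id` in the JSON body and that the list response carries a `jobs` array. Check both against the linked API docs. The functions only build requests; nothing is sent until you pass them to `urllib.request.urlopen`.

```python
import json
import urllib.request


def build_list_jobs_request(host, token):
    """Build (but do not send) the GET request for 2.0/jobs/list."""
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/jobs/list",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def extract_job_ids(list_response_text):
    """Pull job IDs out of a jobs/list JSON response (assumed shape: {"jobs": [...]})."""
    payload = json.loads(list_response_text)
    return [job["job_id"] for job in payload.get("jobs", [])]


def build_delete_job_request(host, token, job_id):
    """Build (but do not send) the POST request that deletes one job (assumed endpoint)."""
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/jobs/delete",
        data=json.dumps({"job_id": job_id}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values for illustration only.
HOST = "https://example.cloud.databricks.com"
TOKEN = "REPLACE_ME"

list_req = build_list_jobs_request(HOST, TOKEN)
sample_ids = extract_job_ids('{"jobs": [{"job_id": 1}, {"job_id": 7}]}')
delete_reqs = [build_delete_job_request(HOST, TOKEN, j) for j in sample_ids]
```

To actually run it, send `list_req` with `urllib.request.urlopen`, feed the response body to `extract_job_ids`, and send each delete request in turn. Deleting jobs is irreversible, so it is worth printing the IDs and reviewing them first.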
I got the following error:
java.lang.RuntimeException: Installation failed with message: Error installing R package: Could not install package with error: installation of package ‘rgdal’ had non-zero exit status Full error log available at /databricks/drive...
We can use the init script below to install the packages on the cluster:
%scala
dbutils.fs.put("dbfs:/databricks/init_scripts/rlib.sh", """
#!/bin/bash
sudo apt-get install -y libudunits2-dev
sudo add-apt-repository ppa:ubuntugis/ubuntugis-uns...