Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.
Background: I'm working on a pilot project to assess the pros and cons of using Databricks to train models using R. I am using a dataset that occupies about 5.7 GB of memory when loaded into a pandas DataFrame. The data are stored in a Delta table in ...
@acsmaggart Please try using collect_larger() to collect the larger dataset; this should work. Please refer to the following article for more information on the library: https://medium.com/@NotZacDavies/collecting-large-results-with-sparklyr-8256a0370ec6
Today, many R packages are pre-installed on the standard clusters on Databricks. Libraries like "tidyverse", "ggplot2", etc. are there, as is the great library "readxl" for loading Excel files. But unfortunately, its counterpart "writexl" is not pre-instal...
Hi everybody, I have a scenario where multiple teams work with Python and R, and these teams use a lot of different libraries. Because of these dozens of libraries, the cluster took a long time to start. Then I created a Docker image, where I can ...
Hi @Fabio Simoes, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers ...
Hi! If I need to use many workers to distribute regular pandas code, I would use a pandas_UDF (having regular Python crunch a slice of my data on each node, then combining all results back on the driver node). Is there something equivalent for R? Thanks,
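The pattern this question describes (each worker processes one slice of the data, and the results are combined on the driver) can be sketched locally. This is only a minimal illustration in plain Python, not Spark code: the thread pool stands in for worker nodes, and the names `split_by_key`, `summarize_slice`, and `apply_per_slice` are hypothetical. For R on Spark, functions such as `spark_apply` in sparklyr or `gapply` in SparkR are commonly used for this kind of grouped processing, though whether they fit depends on the workload.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_by_key(rows, key):
    """Group rows (dicts) into slices by the value stored under `key`."""
    slices = defaultdict(list)
    for row in rows:
        slices[row[key]].append(row)
    return dict(slices)


def summarize_slice(item):
    """Hypothetical per-slice work: count and mean of `value` for one group."""
    group, rows = item
    values = [r["value"] for r in rows]
    return {"group": group, "n": len(values), "mean": sum(values) / len(values)}


def apply_per_slice(rows, key, func, max_workers=4):
    """Split rows by key, run func on each slice in parallel, combine results."""
    slices = split_by_key(rows, key)
    # The thread pool is a local stand-in for Spark worker nodes (assumption).
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(func, slices.items()))
    # Combine step: gather every per-slice result into one sorted list.
    return sorted(results, key=lambda r: r["group"])


rows = [
    {"group": "a", "value": 1.0},
    {"group": "a", "value": 3.0},
    {"group": "b", "value": 10.0},
]
result = apply_per_slice(rows, "group", summarize_slice)
```

The per-slice function only sees its own group, which is the property that lets a cluster run the slices independently on different nodes.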
I’ll be asking my rep about the hosted RShiny server in private preview. Our team didn’t know about that, so we’ve struggled through putting our Shiny app (developed on Databricks using RStudio; that part was fantastic) into a container and hosting it...
Got the following error:
java.lang.RuntimeException: Installation failed with message: Error installing R package: Could not install package with error: installation of package ‘rgdal’ had non-zero exit status
Full error log available at /databricks/drive...
We can use the init script below to install the packages on the cluster:
%scala
dbutils.fs.put("dbfs:/databricks/init_scripts/rlib.sh", """
#!/bin/bash
sudo apt-get install -y libudunits2-dev
sudo add-apt-repository ppa:ubuntugis/ubuntugis-uns...