Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

Forum Posts

acsmaggart
by New Contributor III
  • 4927 Views
  • 6 replies
  • 3 kudos

`collect()`ing Large Datasets in R

Background: I'm working on a pilot project to assess the pros and cons of using Databricks to train models using R. I am using a dataset that occupies about 5.7 GB of memory when loaded into a pandas DataFrame. The data are stored in a Delta table in ...

Latest Reply
Annapurna_Hiriy
Databricks Employee
  • 3 kudos

@acsmaggart Please try using collect_larger() to collect the larger dataset. This should work. For more information on the library, please refer to the following document: https://medium.com/@NotZacDavies/collecting-large-results-with-sparklyr-8256a0370ec6

5 More Replies
Supreme_Auto_Ci
by New Contributor II
  • 2858 Views
  • 4 replies
  • 4 kudos
Latest Reply
rahulroy
New Contributor II
  • 4 kudos

Data science is a multidisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from data. It encompasses the entire data lifecycle, from data acquisition to data exploration, modeling, and...

3 More Replies
Priyag1
by Honored Contributor II
  • 1154 Views
  • 1 reply
  • 9 kudos

Understanding Databricks Machine Learning Workspace - 1

Databricks Machine Learning helps you simplify and standardize your ML development processes. It is helpful to: Train models either manually or with AutoML. Track training parameters and mo...

Latest Reply
samhita
New Contributor III
  • 9 kudos

good initiative

Howard_w
by New Contributor
  • 2820 Views
  • 2 replies
  • 1 kudos

Resolved! Study material ML associate certification

Hi, is there an officially recommended book for the machine learning associate/professional certification? Or any sort of study guide or even third party course? I really struggle to find some study material for this activity.

Latest Reply
Priyag1
Honored Contributor II
  • 1 kudos

Hello, to get an overview, you can look for the ML certification course on Databricks Academy and review the related concepts.

1 More Replies
Koliya
by New Contributor II
  • 18033 Views
  • 4 replies
  • 7 kudos

The Python process exited with exit code 137 (SIGKILL: Killed). This may have been caused by an OOM error. Check your command's memory usage.

I am running a Hugging Face model on a GPU cluster (g4dn.xlarge, 16 GB memory, 4 cores). I run the same model in four different notebooks with different data sources. I created a workflow to run one model after the other. These notebooks run fine indi...

Latest Reply
fkemeth
New Contributor II
  • 7 kudos

You might accumulate gradients when running your Hugging Face model, which typically leads to out-of-memory errors after some iterations. If you use it for inference only, do

with torch.no_grad():
    # The code where you apply the model

3 More Replies
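
For reference on the exit code in the post title above: shells conventionally report a process terminated by a signal as 128 plus the signal number, so 137 corresponds to signal 9 (SIGKILL), the signal the Linux out-of-memory killer sends. A minimal sketch of that decoding follows, assuming a POSIX system; `describe_exit_code` is a hypothetical helper name used only for illustration, not a Databricks API.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Decode a shell-style exit code (hypothetical helper for illustration)."""
    # By shell convention, codes above 128 mean "terminated by signal (code - 128)".
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with status {code}"


print(describe_exit_code(137))
```

The exit code alone does not prove an out-of-memory condition, since any SIGKILL produces 137; the error message itself only says it "may have been caused by an OOM error".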
anvil
by New Contributor II
  • 899 Views
  • 1 reply
  • 0 kudos

How far do model size and lag impact distributed inference?

Hello! I was wondering how impactful a model's size or inference lag is in a distributed setting. With tools like Pandas Iterator UDFs or mlflow.pyfunc.spark_udf() we can make it so models are loaded only once per worker, so I would tend to say that m...

Latest Reply
youssefmrini
Databricks Employee
  • 0 kudos

Your assumption that minimizing inference lag is more important than minimizing the size of the model in a distributed setting is generally correct. In a distributed environment, models are typically loaded once per worker, as you mentioned, which mea...
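
The "loaded once" point can be illustrated without Spark. The sketch below is a pure-Python stand-in for the iterator-style UDF pattern: the function receives an iterator of batches, loads the model once before the loop, and reuses it for every batch. `load_model`, `predict_batches`, and the toy doubling "model" are hypothetical placeholders, not Spark or MLflow APIs.

```python
from typing import Callable, Iterator, List

load_count = 0  # counts how many times the (pretend) expensive load happens


def load_model() -> Callable[[float], float]:
    """Hypothetical stand-in for an expensive model load."""
    global load_count
    load_count += 1
    return lambda x: 2.0 * x  # toy "model": doubles its input


def predict_batches(batches: Iterator[List[float]]) -> Iterator[List[float]]:
    # Load once per iterator, before the loop, rather than once per batch.
    model = load_model()
    for batch in batches:
        yield [model(x) for x in batch]


batches = iter([[1.0, 2.0], [3.0], [4.0, 5.0]])
results = list(predict_batches(batches))
print(results, "loads:", load_count)
```

With this shape, the load cost is paid once for the whole stream of batches, so model size mostly affects startup time and memory, while per-batch prediction time is paid on every batch.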

anvil
by New Contributor II
  • 2539 Views
  • 3 replies
  • 4 kudos

Are UDFs necessary for applying models from ML libraries at scale?

Hello, I recently finished the "Scalable Machine Learning with Apache Spark" course and saw that scikit-learn models could be applied faster in a distributed manner when used in pandas UDFs or with the mapInPandas() method. Spark MLlib models don't need this k...

Latest Reply
Manoj12421
Valued Contributor II
  • 4 kudos

MLlib is in maintenance mode, and in most cases a UDF is not used when creating a model.

2 More Replies
isaac_gritz
by Databricks Employee
  • 1734 Views
  • 1 reply
  • 4 kudos

Databricks MLOps Best Practices

Where to find the best practices on MLOps on Databricks

We recommend checking out the Big Book of MLOps for detailed guidance on MLOps best practices on Databricks, including reference architectures. For a deep dive on the Databricks Feature Store, we r...

Latest Reply
sher
Valued Contributor II
  • 4 kudos

You can check here: https://docs.databricks.com/machine-learning/mlops/mlops-workflow.html

boyelana
by Contributor III
  • 3778 Views
  • 7 replies
  • 13 kudos

Resolved! Getting started with Databricks Machine Learning

Hello all, I am fairly new to Databricks technologies and I have taken the Lakehouse Fundamentals course, but I am interested in Machine Learning technologies. I will appreciate any help with materials and curated free study paths and packs that can he...

Latest Reply
Anonymous
Not applicable
  • 13 kudos

https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf is a free book and has some machine learning examples. The way I learned was mostly from the docs, which are good and have good coding examples.

6 More Replies
User16752245767
by Contributor
  • 1781 Views
  • 3 replies
  • 10 kudos

I am Avi, a Solutions Architect at Databricks. We have built an application to demonstrate how AI-capabilities could be easily integrated to deliver novel user experiences. The application allows users to submit images and text, and uses these inputs...

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 10 kudos

Hi @Avinash Sooriyarachchi, thanks for sharing it.

2 More Replies
User16752245767
by Contributor
  • 919 Views
  • 0 replies
  • 5 kudos

youtu.be

I'm Avi, a Solutions Architect at Databricks working at the intersection of Data Engineering and Machine Learning. Streaming data processing has moved from niche to mainstream, and deploying machine learning models in such data streams opens up a mult...

THIAM_HUATTAN
by Valued Contributor
  • 2371 Views
  • 7 replies
  • 6 kudos

Why does this Databricks ML code get stuck?

I could not paste the code here because some words are not allowed, so I had to paste it elsewhere. Below is OK: https://justpaste.it/8xcr9 But below gets stuck: https://justpaste.it/8nydt and it keeps looping and running...

Latest Reply
Vidula
Honored Contributor
  • 6 kudos

Hey @THIAM HUAT TAN, hope all is well! Just wanted to check in if you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you....

6 More Replies
Slalom_Tobias
by New Contributor III
  • 2764 Views
  • 4 replies
  • 1 kudos

Resolved! ML Practitioner | ml 09 - automl notebook | error on importing databricks.automl

Executing the following code...

from databricks import automl
summary = automl.regress(train_df, target_col="price", primary_metric="rmse", timeout_minutes=5, max_trials=10)

generates the error...

ImportError: cannot import name 'automl' from 'databricks...

Latest Reply
Krueger156
New Contributor II
  • 1 kudos

I'm happy to see this particular subject.

3 More Replies
vivoedoardo
by New Contributor II
  • 2133 Views
  • 3 replies
  • 1 kudos

How to track features used and filters in MLflow?

Hello everyone, we are experimenting with several approaches in a Machine Learning project (binary classification), and we would like to keep track of those using MLflow. We are using the Feature Store to build, store, and retrieve the features, and ...

Latest Reply
NathanielN
New Contributor II
  • 1 kudos

Thanks for the information. I will try to figure out more. Keep sharing and suggesting such informative posts.

2 More Replies
Vik1
by New Contributor II
  • 3722 Views
  • 4 replies
  • 2 kudos

Resolved! Cluster setup for ML work for Pandas in Spark, and vanilla Python.

My setup:
Worker type: Standard_D32d_v4, 128 GB Memory, 32 Cores, Min Workers: 2, Max Workers: 8
Driver type: Standard_D32ds_v4, 128 GB Memory, 32 Cores
Databricks Runtime Version: 10.2 ML (includes Apache Spark 3.2.0, Scala 2.12)
I ran a Snowflake quer...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hey there @Vivek Ranjan, checking in. If Joseph's answer helped, would you let us know and mark the answer as best? It would be really helpful for the other members to find the solution more quickly. Thanks!

3 More Replies