Topics with Label: Machine Learning

by acsmaggart • New Contributor III

10-31-2022 11:37:06 AM

5914 Views
6 replies
3 kudos

`collect()`ing Large Datasets in R

Background: I'm working on a pilot project to assess the pros and cons of using Databricks to train models using R. I am using a dataset that occupies about 5.7 GB of memory when loaded into a pandas DataFrame. The data are stored in a Delta table in ...

Latest Reply

Annapurna_Hiriy
Databricks Employee

02-13-2024 3:17:37 AM

3 kudos

@acsmaggart Please try using collect_larger() to collect the larger dataset. This should work. Please refer to the following document for more info on the library: https://medium.com/@NotZacDavies/collecting-large-results-with-sparklyr-8256a0370ec6

5 More Replies

by Supreme_Auto_Ci • New Contributor II

04-07-2022 1:19:50 AM

3178 Views
4 replies
4 kudos

Resolved! describe data science and machine learning?

Latest Reply

rahulroy
New Contributor II

11-23-2023 6:58:03 AM

4 kudos

Data science is a multidisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from data. It encompasses the entire data lifecycle, from data acquisition to data exploration, modeling, and...

3 More Replies

by Priyag1 • Honored Contributor II

05-05-2023 11:23:19 PM

1353 Views
1 reply
9 kudos

Understanding Databricks Machine Learning Workspace - 1

Databricks Machine Learning helps you simplify and standardize your ML development processes. It is helpful to: Train models either manually or with AutoML. Track training parameters and mo...

Latest Reply

samhita
New Contributor III

05-06-2023 12:06:52 AM

9 kudos

good initiative

by Howard_w • New Contributor

04-27-2023 3:55:14 PM

3816 Views
2 replies
1 kudos

Resolved! Study material ML associate certification

Hi, is there an officially recommended book for the machine learning associate/professional certification? Or any sort of study guide or even third party course? I really struggle to find some study material for this activity.

Latest Reply

Priyag1
Honored Contributor II

05-04-2023 3:47:09 AM

1 kudos

Hello, to get an overview, you can take the ML certification course from Databricks Academy and review the related concepts.

1 More Replies

by Koliya • New Contributor II

12-21-2022 6:47:38 PM

22355 Views
4 replies
7 kudos

The Python process exited with exit code 137 (SIGKILL: Killed). This may have been caused by an OOM error. Check your command's memory usage.

I am running a Hugging Face model on a GPU cluster (g4dn.xlarge, 16 GB memory, 4 cores). I run the same model in four different notebooks with different data sources. I created a workflow to run one model after the other. These notebooks run fine indi...
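The 137 in the error message can be decoded on its own: by shell convention on POSIX systems, an exit status above 128 means the process was terminated by a signal whose number is the status minus 128, so 137 corresponds to signal 9. A minimal sketch of that arithmetic follows, assuming a POSIX system; `describe_exit_code` is a hypothetical helper name used only for illustration, not part of Databricks or the Python standard library.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Map a shell-style exit status to a signal name where possible."""
    if code > 128:
        signum = code - 128  # shells report 128 + signal number
        try:
            return signal.Signals(signum).name
        except ValueError:
            return f"unknown signal {signum}"
    return f"exit status {code} (no signal)"


print(describe_exit_code(137))
```

On Linux this prints SIGKILL. The kernel's out-of-memory killer terminates processes with that signal, which is why the message suggests an OOM error as a possible cause, although a SIGKILL can come from other sources as well.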

Latest Reply

fkemeth
New Contributor II

03-27-2023 2:12:49 AM

7 kudos

You might accumulate gradients when running your Hugging Face model, which typically leads to out-of-memory errors after some iterations. If you use it for inference only, wrap the code where you apply the model in `with torch.no_grad():`.
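A minimal sketch of the `torch.no_grad()` suggestion, assuming PyTorch is installed. The small `nn.Linear` layer is a hypothetical stand-in for the Hugging Face model from the question, and the tensor shapes are arbitrary.

```python
import torch
from torch import nn

# Hypothetical stand-in for the Hugging Face model in the post.
model = nn.Linear(4, 2)
model.eval()  # switch layers such as dropout to inference behaviour

batch = torch.randn(8, 4)

# Inside no_grad, PyTorch does not record the autograd graph,
# so intermediate activations are not kept for a backward pass.
with torch.no_grad():
    outputs = model(batch)

print(outputs.requires_grad)
print(tuple(outputs.shape))
```

The output tensor does not require gradients, which confirms that no graph was built for it. Whether this resolves a given exit code 137 depends on what is actually consuming memory, so treat it as one thing to check rather than a guaranteed fix.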

3 More Replies

by anvil • New Contributor II

01-24-2023 1:24:02 PM

1048 Views
1 reply
0 kudos

How far do model size and lag impact distributed inference?

Hello! I was wondering how impactful a model's size or inference lag is in a distributed setting. With tools like Pandas Iterator UDFs or mlflow.pyfunc.spark_udf() we can make it so models are loaded only once per worker, so I would tend to say that m...

Latest Reply

youssefmrini
Databricks Employee

02-28-2023 5:16:48 AM

0 kudos

Your assumption that minimizing inference lag is more important than minimizing the size of the model in a distributed setting is generally correct. In a distributed environment, models are typically loaded once per worker, as you mentioned, which mea...
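The "load once, then stream batches" idea behind iterator-style UDFs can be illustrated without Spark. The sketch below is plain Python and only mimics the shape of such a function; `load_model`, `predict_batches`, and `LOAD_COUNT` are hypothetical names, and the "model" is a trivial doubling function rather than a real one.

```python
from typing import Callable, Iterable, Iterator, List

LOAD_COUNT = 0  # counts how often the hypothetical model is loaded


def load_model() -> Callable[[float], float]:
    """Hypothetical expensive load; here it just returns a doubling function."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    return lambda x: 2.0 * x


def predict_batches(batches: Iterable[List[float]]) -> Iterator[List[float]]:
    """Load the model once, then apply it to every incoming batch."""
    model = load_model()  # paid once per iterator, not once per batch
    for batch in batches:
        yield [model(x) for x in batch]


results = list(predict_batches([[1.0, 2.0], [3.0], [4.0, 5.0]]))
print(results)
print(LOAD_COUNT)
```

Three batches are processed with a single load, so the load cost is amortized over all rows the iterator sees, while the per-row prediction cost is paid for every row. That is the intuition behind weighing inference latency more heavily than load time once the model is loaded only once.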

by anvil • New Contributor II

01-24-2023 1:14:46 PM

3179 Views
3 replies
4 kudos

Are UDFs necessary for applying models from ML libraries at scale?

Hello, I recently finished the "Scalable Machine Learning with Apache Spark" course and saw that scikit-learn models could be applied faster in a distributed manner when used in pandas UDFs or with the mapInPandas() method. Spark MLlib models don't need this k...

Latest Reply

Manoj12421
Valued Contributor II

02-08-2023 11:17:49 AM

4 kudos

MLlib is in maintenance mode, and in most cases a UDF is not used when creating a model.

2 More Replies

by isaac_gritz • Databricks Employee

08-23-2022 1:03:27 AM

2236 Views
1 reply
4 kudos

Databricks MLOps Best Practices

Where to find the best practices on MLOps on Databricks: We recommend checking out the Big Book of MLOps for detailed guidance on MLOps best practices on Databricks, including reference architectures. For a deep dive on the Databricks Feature Store, we r...

Latest Reply

sher
Valued Contributor II

12-16-2022 11:29:58 PM

4 kudos

You can check here: https://docs.databricks.com/machine-learning/mlops/mlops-workflow.html

by boyelana • Contributor III

12-09-2022 9:30:21 AM

4492 Views
7 replies
13 kudos

Resolved! Getting started with Databricks Machine Learning

Hello all, I am fairly new to Databricks technologies. I have taken the Lakehouse Fundamentals course, but I am interested in Machine Learning technologies. I would appreciate any help with materials and curated free study paths and packs that can he...

Latest Reply

Anonymous
Not applicable

12-09-2022 12:13:22 PM

13 kudos

https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf is a free book and has some machine learning examples. The way I learned was mostly from the docs, which are good and have good coding examples.

6 More Replies

by User16752245767 • Contributor

11-30-2022 9:35:39 AM

2117 Views
3 replies
10 kudos

I am Avi, a Solutions Architect at Databricks. We have built an application to demonstrate how AI capabilities can be easily integrated to deliver novel user experiences. The application allows users to submit images and text, and uses these inputs...

Latest Reply

Ajay-Pandey
Esteemed Contributor III

12-06-2022 4:33:53 AM

10 kudos

Hi @Avinash Sooriyarachchi Thanks for sharing it.

2 More Replies

by User16752245767 • Contributor

12-05-2022 6:48:33 AM

1211 Views
0 replies
5 kudos

youtu.be

I'm Avi, a Solutions Architect at Databricks working at the intersection of Data Engineering and Machine Learning. Streaming data processing has moved from niche to mainstream, and deploying machine learning models in such data streams opens up a mult...

by THIAM_HUATTAN • Valued Contributor

06-26-2022 11:26:50 PM

2967 Views
7 replies
6 kudos

Why does this Databricks ML code get stuck?

I could not paste the code here because some word is not allowed, so I had to paste it elsewhere. The code below is OK: https://justpaste.it/8xcr9 But the code below gets stuck: https://justpaste.it/8nydt and it keeps looping and running...

Latest Reply

Vidula
Honored Contributor

08-27-2022 12:28:12 AM

6 kudos

Hey @THIAM HUAT TAN Hope all is well! Just wanted to check in if you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you....

6 More Replies

by Slalom_Tobias • New Contributor III

08-01-2022 9:30:39 AM

3276 Views
4 replies
1 kudos

Resolved! ML Practitioner | ml 09 - automl notebook | error on importing databricks.automl

Executing the following code: `from databricks import automl` followed by `summary = automl.regress(train_df, target_col="price", primary_metric="rmse", timeout_minutes=5, max_trials=10)` generates the error: ImportError: cannot import name 'automl' from 'databricks...

Latest Reply

Krueger156
New Contributor II

08-05-2022 2:03:22 AM

1 kudos

I'm happy to see this particular subject.

3 More Replies

by vivoedoardo • New Contributor II

05-17-2022 12:37:14 AM

2622 Views
3 replies
1 kudos

How to track features used and filters in MLflow?

Hello everyone, We are experimenting with several approaches in a Machine Learning project (binary classification), and we would like to keep track of those using MLflow. We are using the feature store to build, store, and retrieve the features, and ...

Latest Reply

NathanielN
New Contributor II

07-23-2022 12:12:31 AM

1 kudos

Thanks for the information, I will try to figure out more. Keep sharing and suggesting such informative posts.

2 More Replies

by Vik1 • New Contributor II

01-21-2022 9:16:42 AM

4223 Views
4 replies
2 kudos

Resolved! Cluster setup for ML work for Pandas in Spark, and vanilla Python.

My setup:
Worker type: Standard_D32d_v4, 128 GB Memory, 32 Cores, Min Workers: 2, Max Workers: 8
Driver type: Standard_D32ds_v4, 128 GB Memory, 32 Cores
Databricks Runtime Version: 10.2 ML (includes Apache Spark 3.2.0, Scala 2.12)
I ran a Snowflake quer...

Latest Reply

Anonymous
Not applicable

04-22-2022 7:23:05 AM

2 kudos

Hey there @Vivek Ranjan Checking in. If Joseph's answer helped, would you let us know and mark the answer as best? It would be really helpful for the other members to find the solution more quickly. Thanks!

3 More Replies
