Machine Learning

Forum Posts

acsmaggart
by New Contributor III
  • 2806 Views
  • 6 replies
  • 2 kudos

`collect()`ing Large Datasets in R

Background: I'm working on a pilot project to assess the pros and cons of using Databricks to train models using R. I am using a dataset that occupies about 5.7 GB of memory when loaded into a pandas dataframe. The data are stored in a delta table in ...

Latest Reply
Annapurna_Hiriy
New Contributor III
  • 2 kudos

@acsmaggart Please try using collect_larger() to collect the larger dataset; this should work. For more information on the library, please refer to the following document: https://medium.com/@NotZacDavies/collecting-large-results-with-sparklyr-8256a0370ec6

5 More Replies
Supreme_Auto_Ci
by New Contributor II
  • 1793 Views
  • 5 replies
  • 4 kudos
Latest Reply
rahulroy
New Contributor II
  • 4 kudos

Data science is a multidisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from data. It encompasses the entire data lifecycle, from data acquisition to data exploration, modeling, and...

4 More Replies
Kaniz
by Community Manager
  • 519 Views
  • 0 replies
  • 1 kudos

Transformers have revolutionized the field of Natural Language Processing (NLP) by achieving state-of-the-art results on various language tasks. With the introduction of models like BERT (Bidirectional Encoder Representations from Transformers), GPT ...

Priyag1
by Honored Contributor II
  • 623 Views
  • 1 reply
  • 9 kudos

***Understanding Databricks Machine Learning Workspace - 1*** Databricks Machine Learning helps you simplify and standardize your ML development processes. It is helpful to: Train models either manually or with AutoML. Track training parameters and mo...

Latest Reply
samhita
New Contributor III
  • 9 kudos

good initiative

Howard_w
by New Contributor
  • 1717 Views
  • 2 replies
  • 1 kudos

Resolved! Study material ML associate certification

Hi, is there an officially recommended book for the machine learning associate/professional certification? Or any sort of study guide, or even a third-party course? I am really struggling to find study material for this certification.

Latest Reply
Priyag1
Honored Contributor II
  • 1 kudos

Hello, to get an overview you can look up the ML certification course on Databricks Academy and review the related concepts.

1 More Replies
Koliya
by New Contributor II
  • 11315 Views
  • 5 replies
  • 7 kudos

The Python process exited with exit code 137 (SIGKILL: Killed). This may have been caused by an OOM error. Check your command's memory usage.

I am running a Hugging Face model on a GPU cluster (g4dn.xlarge, 16 GB memory, 4 cores). I run the same model in four different notebooks with different data sources. I created a workflow to run one model after the other. These notebooks run fine indi...

Latest Reply
fkemeth
New Contributor II
  • 7 kudos

You might be accumulating gradients when running your Hugging Face model, which typically leads to out-of-memory errors after some iterations. If you use it for inference only, wrap the code where you apply the model in `with torch.no_grad():`.
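A minimal sketch of that suggestion, assuming PyTorch is installed. The small `nn.Linear` layer is a hypothetical stand-in for the Hugging Face model and the random batch is made-up data; neither comes from the original thread.

```python
import torch
from torch import nn

# Hypothetical stand-in for the real model; any nn.Module behaves the same way here.
model = nn.Linear(4, 2)
model.eval()  # switch layers such as dropout to inference behaviour

batch = torch.randn(8, 4)  # made-up input batch

# Without no_grad, the output keeps a reference to the autograd graph.
tracked = model(batch)

# Inside no_grad, no graph is recorded, so activations are not kept for backward.
with torch.no_grad():
    preds = model(batch)

print(tracked.requires_grad)  # True
print(preds.requires_grad)    # False
```

The context manager only disables gradient tracking; it does not free memory that is already held, so results collected across iterations should also be detached or moved off the GPU if they are kept around.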

4 More Replies
anvil
by New Contributor II
  • 511 Views
  • 1 reply
  • 0 kudos

How far do model size and lag impact distributed inference?

Hello! I was wondering how much a model's size or inference lag matters in a distributed setting. With tools like Pandas Iterator UDFs or mlflow.pyfunc.spark_udf() we can make it so models are loaded only once per worker, so I would tend to say that m...

Latest Reply
youssefmrini
Honored Contributor III
  • 0 kudos

Your assumption that minimizing inference lag is more important than minimizing the size of the model in a distributed setting is generally correct. In a distributed environment, models are typically loaded once per worker, as you mentioned, which mea...
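The load-once idea discussed in this thread can be sketched without Spark. The example below uses plain Python iterators in place of a cluster; `load_model` and `predict_batches` are hypothetical names, and the toy "model" simply doubles its input. An iterator-style pandas UDF has the same overall shape (an iterator of batches in, an iterator of results out), but this is only an illustration of the pattern, not Spark code.

```python
from typing import Callable, Iterator, List

LOAD_COUNT = 0  # counts how many times the (pretend) expensive load runs


def load_model() -> Callable[[float], float]:
    """Hypothetical expensive loader; returns a toy model that doubles its input."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    return lambda x: 2.0 * x


def predict_batches(batches: Iterator[List[float]]) -> Iterator[List[float]]:
    # The model is loaded once per iterator, not once per batch,
    # so the load cost is amortized over every batch that follows.
    model = load_model()
    for batch in batches:
        yield [model(x) for x in batch]


results = list(predict_batches(iter([[1.0, 2.0], [3.0], [4.0, 5.0]])))
print(results)     # [[2.0, 4.0], [6.0], [8.0, 10.0]]
print(LOAD_COUNT)  # 1
```

With this shape, model size mainly affects the one-time load, while per-batch inference time is paid on every batch.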

anvil
by New Contributor II
  • 1566 Views
  • 3 replies
  • 4 kudos

Are UDFs necessary for applying models from ML libraries at scale?

Hello, I recently finished the "Scalable Machine Learning with Apache Spark" course and saw that scikit-learn models could be applied faster in a distributed manner when used in pandas UDFs or with the mapInPandas() method. Spark MLlib models don't need this k...

Latest Reply
Manoj12421
Valued Contributor II
  • 4 kudos

MLlib is in maintenance mode, and in most cases a UDF is not used when creating a model.

2 More Replies
isaac_gritz
by Valued Contributor II
  • 912 Views
  • 1 reply
  • 4 kudos

Databricks MLOps Best Practices

Where to find the best practices on MLOps on Databricks. We recommend checking out the Big Book of MLOps for detailed guidance on MLOps best practices on Databricks, including reference architectures. For a deep dive on the Databricks Feature Store, we r...

Latest Reply
sher
Valued Contributor II
  • 4 kudos

You can check here: https://docs.databricks.com/machine-learning/mlops/mlops-workflow.html

boyelana
by Contributor III
  • 1932 Views
  • 7 replies
  • 13 kudos

Resolved! Getting started with Databricks Machine Learning

Hello all, I am fairly new to Databricks technologies. I have taken the Lakehouse Fundamentals course, but I am interested in Machine Learning technologies. I will appreciate any help with materials and curated free study paths and packs that can he...

Latest Reply
Anonymous
Not applicable
  • 13 kudos

https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf is a free book and has some machine learning examples. The way I learned was mostly from the docs, which are good and have good coding examples.

6 More Replies
User16752245767
by Contributor
  • 1030 Views
  • 3 replies
  • 10 kudos

I am Avi, a Solutions Architect at Databricks. We have built an application to demonstrate how AI capabilities can be easily integrated to deliver novel user experiences. The application allows users to submit images and text, and uses these inputs...

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 10 kudos

Hi @Avinash Sooriyarachchi, thanks for sharing it.

2 More Replies
User16752245767
by Contributor
  • 391 Views
  • 0 replies
  • 5 kudos

I'm Avi, a Solutions Architect at Databricks working at the intersection of Data Engineering and Machine Learning. Streaming data processing has moved from niche to mainstream, and deploying machine learning models in such data streams opens up a mult...

THIAM_HUATTAN
by Valued Contributor
  • 1299 Views
  • 7 replies
  • 6 kudos

Why does this Databricks ML code get stuck?

I could not paste the code here because some words are not allowed, so I had to paste it elsewhere. Below is OK: https://justpaste.it/8xcr9 But below gets stuck: https://justpaste.it/8nydt and it keeps looping and running...

Latest Reply
Vidula
Honored Contributor
  • 6 kudos

Hey @THIAM HUAT TAN​ Hope all is well! Just wanted to check in if you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you....

6 More Replies
Slalom_Tobias
by New Contributor III
  • 1442 Views
  • 4 replies
  • 1 kudos

Resolved! ML Practitioner | ml 09 - automl notebook | error on importing databricks.automl

Executing the following code... `from databricks import automl` `summary = automl.regress(train_df, target_col="price", primary_metric="rmse", timeout_minutes=5, max_trials=10)` generates the error... ImportError: cannot import name 'automl' from 'databricks...

Latest Reply
Krueger156
New Contributor II
  • 1 kudos

I'm happy to see a particularly subject.

3 More Replies
vivoedoardo
by New Contributor II
  • 1301 Views
  • 3 replies
  • 1 kudos

How to track features used and filters in MLflow?

Hello everyone, We are experimenting with several approaches in a Machine Learning project (binary classification), and we would like to keep track of those using MLflow. We are using the feature store to build, store, and retrieve the features, and ...

Latest Reply
NathanielN
New Contributor II
  • 1 kudos

Thanks for the information; I will try to figure out more. Keep sharing and suggesting such informative posts.

2 More Replies