Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

Forum Posts

Suheb
by Contributor
  • 38 Views
  • 1 replies
  • 0 kudos

Why does my MLflow model training job fail on Databricks with an out‑of‑memory error for large datas

I am trying to train a machine learning model using MLflow on Databricks. When my dataset is very large, the training stops and gives an ‘out-of-memory’ error. Why does this happen and how can I fix it?

Latest Reply
mukul1409
New Contributor
  • 0 kudos

Hi @Suheb, this happens because, during training, the entire dataset or large intermediate objects are loaded into driver or executor memory, which can exceed the memory available on the cluster, especially when using large DataFrames, collect...

  • 0 kudos
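
The reply above is truncated, so the fix it recommends is not known. As a general illustration of the idea it describes (not loading the whole dataset into memory at once), here is a minimal pure-Python sketch that fits a line from running sums while reading rows in fixed-size batches. The names `batches`, `fit_line_streaming`, and `synthetic_rows` are hypothetical, and the generator only stands in for a large table; this is not Spark or MLflow code.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[float, float]


def batches(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield lists of at most `size` rows without materializing the whole input."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def fit_line_streaming(rows: Iterable[Row], batch_size: int = 1000) -> Tuple[float, float]:
    """Least-squares fit of y = a*x + b using running sums, one batch at a time."""
    n = sx = sy = sxx = sxy = 0.0
    for chunk in batches(rows, batch_size):
        # Only `chunk` is held in memory here; earlier batches are already discarded.
        for x, y in chunk:
            n += 1
            sx += x
            sy += y
            sxx += x * x
            sxy += x * y
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b


def synthetic_rows(count: int) -> Iterator[Row]:
    """Hypothetical stand-in for a large dataset: rows are produced lazily."""
    for i in range(count):
        x = float(i)
        yield x, 3.0 * x + 2.0


a, b = fit_line_streaming(synthetic_rows(10_000), batch_size=500)
print(a, b)
```

Peak memory here depends on `batch_size`, not on the total row count, which is the property that matters when a dataset does not fit on one node.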
jitenjha11
by New Contributor II
  • 69 Views
  • 1 replies
  • 0 kudos

Getting error when running databricks deploy bundle command

Hi all, I am trying to implement an MLOps project using the https://github.com/databricks/mlops-stacks repo. I have created an Azure Databricks workspace with the Premium (+ Role-based access controls) tier and am following the bundle creation and deployment steps at this URL: http...

Latest Reply
emma_s
Databricks Employee
  • 0 kudos

Hi, I think this may be a duplicate of another question, but I am posting the same answer here for transparency: Hi, the first thing to check is that the user or service principal you're running the job with has the correct permissions; the user need...

  • 0 kudos
jitenjha11
by New Contributor II
  • 95 Views
  • 1 replies
  • 0 kudos

Getting error when running databricks deploy bundle command

Hi all, I am trying to implement an MLOps project using the https://github.com/databricks/mlops-stacks repo. I have created an Azure Databricks workspace with the Premium (+ Role-based access controls) tier and am following the bundle creation and deployment steps at this URL: http...

Latest Reply
emma_s
Databricks Employee
  • 0 kudos

Hi, the first thing to check is that the user or service principal you're running the job with has the correct permissions: the user needs to have workspace access and cluster creation access toggled on. Next, you need to check that you have a metast...

  • 0 kudos
Suheb
by Contributor
  • 227 Views
  • 3 replies
  • 1 kudos

Resolved! What are the practical differences between bagging and boosting algorithms?

How are bagging and boosting different when you use them in real machine-learning projects?

Latest Reply
jameswood32
Contributor
  • 1 kudos

The practical differences between bagging and boosting mostly come down to how they build models and how they handle errors: Model Training Approach: Bagging (Bootstrap Aggregating): Builds multiple models in parallel using random subsets of the data. ...

  • 1 kudos
2 More Replies
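
To make the contrast in the reply above concrete, here is a minimal pure-Python sketch, assuming one-split regression stumps as the base model and a toy 1-D dataset (both are illustrative choices, not from the thread). Bagging fits independent stumps on bootstrap resamples and averages them; boosting fits stumps one after another, each on the residuals left by the previous ones. All function names are hypothetical.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression stump: (threshold, left_value, right_value)."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not right:  # splitting at the maximum leaves nothing on the right
            continue
        lv, rv = mean(left), mean(right)
        err = sum((y - (lv if x <= t else rv)) ** 2 for x, y in zip(xs, ys))
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    if best is None:  # all x values identical: predict the mean everywhere
        m = mean(ys)
        return (xs[0], m, m)
    return best[1:]


def predict_stump(stump, x):
    t, lv, rv = stump
    return lv if x <= t else rv


def bagging(xs, ys, n_models, rng):
    """Independent stumps on bootstrap resamples; prediction is their average."""
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(predict_stump(m, x) for m in models)


def boosting(xs, ys, n_rounds, lr=0.5):
    """Sequential stumps; each one is fit to the residuals of the ensemble so far."""
    base = mean(ys)
    residuals = [y - base for y in ys]
    models = []
    for _ in range(n_rounds):
        stump = fit_stump(xs, residuals)
        models.append(stump)
        residuals = [r - lr * predict_stump(stump, x) for x, r in zip(xs, residuals)]
    return lambda x: base + lr * sum(predict_stump(m, x) for m in models)


def mse(model, xs, ys):
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))


xs = [i / 10 for i in range(40)]
ys = [x ** 2 for x in xs]

bag = bagging(xs, ys, 25, random.Random(0))
boost_1 = boosting(xs, ys, 1)
boost_20 = boosting(xs, ys, 20)
print(mse(bag, xs, ys), mse(boost_1, xs, ys), mse(boost_20, xs, ys))
```

The bagging loop has no dependency between iterations, so it can run in parallel; the boosting loop cannot, because each round needs the residuals from the previous one.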
Suheb
by Contributor
  • 221 Views
  • 4 replies
  • 2 kudos

Resolved! How do I improve the performance of my Random Forest model on Databricks?

How can I make these people smarter or faster so the final answer is better?

Latest Reply
jameswood32
Contributor
  • 2 kudos

Improving the performance of a Random Forest model on Databricks is usually about data quality, feature engineering, and hyperparameter tuning. Some tips: Feature Engineering: Create meaningful features and remove irrelevant ones. Encode categorical var...

  • 2 kudos
3 More Replies
Suheb
by Contributor
  • 108 Views
  • 1 replies
  • 1 kudos

How do I implement and train a custom PyTorch model on Databricks using distributed training?

How can I build my own PyTorch machine-learning model and train it faster on Databricks by using multiple machines/GPUs instead of just one?

Latest Reply
KaushalVachhani
Databricks Employee
  • 1 kudos

@Suheb, you may want to look at TorchDistributor. It provides multiple distributed training options, including single-node multi-GPU training and multi-node training. Below are the references for you. https://docs.databricks.com/aws/en/machine-...

  • 1 kudos
RodrigoE
by New Contributor II
  • 135 Views
  • 2 replies
  • 0 kudos

Vector search index very slow

Hello, I have created a vector search index for a Delta table with 1,400 rows. Using this vector index to find matching records in a table with 52M records, the query below ran for 20 hours and failed with: 'HTTP request failed with status: {"error_c...

Machine Learning
vector search index
Latest Reply
iyashk-DB
Databricks Employee
  • 0 kudos

Hi @RodrigoE, your LATERAL subquery calls the Vector Search function once for every row of the 52M-row table, which results in tens of millions of remote calls to the Vector Search endpoint. This is an inefficient pattern and will be extremely slow leadin...

  • 0 kudos
1 More Replies
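
The reply above is cut off before its recommendation, so the following is only a generic illustration of why one remote call per row is costly and one way to reduce the call count, assuming the query column contains repeated values. It is not the fix proposed in the thread. `lookup_with_dedup` and `fake_search` are hypothetical names, and `fake_search` merely stands in for a remote Vector Search request.

```python
from typing import Callable, Dict, Iterable, List


def lookup_with_dedup(queries: Iterable[str], search: Callable[[str], str]) -> List[str]:
    """Call `search` once per distinct query text and reuse the result for repeats."""
    cache: Dict[str, str] = {}
    results: List[str] = []
    for q in queries:
        if q not in cache:
            cache[q] = search(q)  # the only place a "remote" call happens
        results.append(cache[q])
    return results


calls = 0


def fake_search(query: str) -> str:
    """Hypothetical stand-in for a remote call; counts how often it is invoked."""
    global calls
    calls += 1
    return query.upper()


rows = ["acme corp", "globex", "acme corp", "initech", "globex", "acme corp"]
matches = lookup_with_dedup(rows, fake_search)
print(calls, "remote calls for", len(rows), "rows")
```

The saving is proportional to how many duplicates the column holds; if every row is unique, the call count is unchanged and a different approach is needed.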
Suheb
by Contributor
  • 177 Views
  • 1 replies
  • 1 kudos

Resolved! What are recommended approaches for feature engineering in Databricks ML projects?

When building machine-learning models in Databricks, how should I prepare and transform my data so the model can learn better?

Latest Reply
emma_s
Databricks Employee
  • 1 kudos

Hi, this is quite a general question, so I've put together a list of bullets that will point you in the right direction: Focus on organized storage, flexible transformations, and making features easy to reuse and discover. Use Unity Catalog for govern...

  • 1 kudos
RodrigoE
by New Contributor II
  • 347 Views
  • 4 replies
  • 2 kudos

Resolved! Vector search index initialization very slow

Hello, I am creating a vector search index and selected Compute embeddings for a Delta table with 19M records. The Delta table has only two columns: ID (selected as index) and Name (selected for embedding). The embedding model is databricks-gte-large-en. Ind...

Machine Learning
index
search
vector
vector index
Vector Search
Latest Reply
RodrigoE
New Contributor II
  • 2 kudos

Your recommendation addressed the issue. I followed the instructions, and index initialization took only 8 hours. Thank you!

  • 2 kudos
3 More Replies
Suheb
by Contributor
  • 137 Views
  • 1 replies
  • 1 kudos

Resolved! How do I start with MLflow on Databricks?

I am new to MLflow and Databricks. How can I begin using MLflow inside Databricks to track and manage my machine learning models?

Latest Reply
iyashk-DB
Databricks Employee
  • 1 kudos

Hi @Suheb, MLflow is preinstalled in the ML runtime. The question is very vague. You can follow the documentation below to get started with MLflow on Databricks. 1) https://www.databricks.com/product/managed-mlflow 2) https://docs.databricks.co...

  • 1 kudos
Suheb
by Contributor
  • 154 Views
  • 1 replies
  • 1 kudos

Resolved! How do you organize ML projects in Databricks workspaces?

How do you keep your machine-learning files, notebooks, and code properly organized in Databricks?

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Hey @Suheb , I teach a lot of our machine learning training, and over time I’ve talked with many students, customers, and partners about how they approach this. The answers are all over the map, which tells you there’s no single “golden rule” that fi...

  • 1 kudos
mcarreira
by New Contributor III
  • 955 Views
  • 9 replies
  • 1 kudos

Resolved! Genie connection to copilot agent in copilot studio

Hello! I’m trying to add a tool, Azure Databricks Genie, in Microsoft Copilot Studio for my agent, but I’m running into some difficulties. Is it possible to establish this connection using a Pro cluster, or does it only work with a serverless cluste...

Latest Reply
emma_s
Databricks Employee
  • 1 kudos

I'm afraid I don't have any further suggestions. I'd suggest you raise a ticket with Microsoft on this.

  • 1 kudos
8 More Replies
Suheb
by Contributor
  • 190 Views
  • 1 replies
  • 1 kudos

Resolved! What are the recommended practices for handling skewed datasets in Databricks?

What should you do when your dataset is uneven—some values appear too many times and others appear very few times—while working in Databricks?

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @Suheb, refer to the really good guide prepared by the Databricks team. When you have a skewed dataset, the primary things you can do are the following:
1. Filter skewed values
2. Apply skew hints
3. AQE skew optimization
4. Salting
Much more detailed description of abo...

  • 1 kudos
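
As a rough illustration of the salting item in the reply above, here is a minimal pure-Python sketch, assuming rows are assigned to partitions by hashing their key. It is not Spark code: `partition_of`, `partition_loads`, and `salt_keys` are hypothetical helpers, and CRC32 is used only because it is deterministic across runs.

```python
import zlib
from collections import Counter
from typing import List


def partition_of(key: str, num_partitions: int) -> int:
    """Deterministic hash partitioner (illustrative stand-in for a shuffle)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_loads(keys: List[str], num_partitions: int) -> Counter:
    """Count how many rows land in each partition."""
    return Counter(partition_of(k, num_partitions) for k in keys)


def salt_keys(keys: List[str], num_salts: int) -> List[str]:
    """Append a rotating salt suffix so copies of a hot key become several distinct keys."""
    return [f"{k}_{i % num_salts}" for i, k in enumerate(keys)]


# 900 rows share one hot key; the remaining 100 rows have unique keys.
keys = ["hot"] * 900 + [f"k{i}" for i in range(100)]

plain = partition_loads(keys, 8)
salted = partition_loads(salt_keys(keys, 16), 8)
print("largest partition without salt:", max(plain.values()))
print("largest partition with salt:   ", max(salted.values()))
```

In a real join, the other side of the join has to be expanded with every salt value so that the salted keys still match; that step is omitted here.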
fbs342
by New Contributor II
  • 237 Views
  • 3 replies
  • 0 kudos

Resolved! Migrated model to Unity Catalog not seeing referenced serving endpoint

There was a model which was migrated from the workspace model registry to Unity Catalog. At the time of initial creation of that model, dependencies on other Databricks serving endpoints were configured using the "DatabricksServingEndpoint" config in MLflow....

Latest Reply
iyashk-DB
Databricks Employee
  • 0 kudos

Workspace model registry worked with workspace-scoped serving endpoints. UC models and UC serving endpoints use metastore-wide semantics and different lookup rules. The saved path inside the model metadata still points to workspace-level endpoints th...

  • 0 kudos
2 More Replies
srkam
by New Contributor
  • 159 Views
  • 1 replies
  • 0 kudos

UC Model Deployment across Databricks instances

Hello, we have multiple Databricks instances, each representing an environment (dev, qa, rel, prod, etc.). We developed a model in the dev workspace and registered it in the UC model registry using MLflow. Now, we are trying to find the best way to deploy this r...

Latest Reply
iyashk-DB
Databricks Employee
  • 0 kudos

You can use UC's centralized model registry and MLflow’s copy APIs. If all target workspaces attach to the same Unity Catalog metastore, reference and promote models via their 3‑level UC names; use MLflow’s copy_model_version to “copy” the exact arti...

  • 0 kudos
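
The promotion step described in the reply above can be sketched roughly as follows, assuming a shared Unity Catalog metastore and MLflow's `MlflowClient.copy_model_version(src_model_uri, dst_name)`. The catalog, schema, and model names are hypothetical, and `RecordingClient` is a stand-in used so the sketch runs without a workspace; it is not part of MLflow.

```python
def uc_model_uri(catalog: str, schema: str, model: str, version: int) -> str:
    """Build a models:/ URI from a 3-level Unity Catalog name and a version."""
    return f"models:/{catalog}.{schema}.{model}/{version}"


def promote(client, src_uri: str, dst_catalog: str, dst_schema: str, model: str):
    """Copy one model version to the same model name under another catalog and schema."""
    dst_name = f"{dst_catalog}.{dst_schema}.{model}"
    return client.copy_model_version(src_model_uri=src_uri, dst_name=dst_name)


class RecordingClient:
    """Hypothetical stand-in for mlflow.MlflowClient that only records calls."""

    def __init__(self):
        self.calls = []

    def copy_model_version(self, src_model_uri, dst_name):
        self.calls.append((src_model_uri, dst_name))
        return dst_name


client = RecordingClient()
src = uc_model_uri("dev", "ml", "churn_model", 3)  # hypothetical names
copied = promote(client, src, "prod", "ml", "churn_model")
print(src, "->", copied)
```

Against a real workspace, the assumption is that `mlflow.set_registry_uri("databricks-uc")` is called first and an `mlflow.MlflowClient()` instance is passed in place of `RecordingClient`; the caller also needs the relevant Unity Catalog privileges on both catalogs.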
