Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.
Forum Posts

jitenjha11
by New Contributor II
  • 34 Views
  • 1 replies
  • 0 kudos

Getting error when running databricks deploy bundle command

Hi all, I am trying to implement an MLOps project using the https://github.com/databricks/mlops-stacks repo. I have created an Azure Databricks workspace with the Premium (+ Role-based access controls) tier and am following the bundle creation and deployment steps using the URL: http...

Latest Reply
emma_s
Databricks Employee
  • 0 kudos

Hi, I think this may be a duplicate of another question, but I'm posting the same answer here for transparency: Hi, the first thing to check is that you have the correct permissions on the user or service principal you're running the job with; the user need...

  • 0 kudos
jitenjha11
by New Contributor II
  • 67 Views
  • 1 replies
  • 0 kudos

Getting error when running databricks deploy bundle command

Hi all, I am trying to implement an MLOps project using the https://github.com/databricks/mlops-stacks repo. I have created an Azure Databricks workspace with the Premium (+ Role-based access controls) tier and am following the bundle creation and deployment steps using the URL: http...

Latest Reply
emma_s
Databricks Employee
  • 0 kudos

Hi, the first thing to check is that you have the correct permissions on the user or service principal you're running the job with; the user needs to have workspace access and cluster creation access toggled on. Next, you need to check you have a metast...

  • 0 kudos
Suheb
by Contributor
  • 212 Views
  • 3 replies
  • 1 kudos

What are the practical differences between bagging and boosting algorithms?

How are bagging and boosting different when you use them in real machine-learning projects?

Latest Reply
jameswood32
Contributor
  • 1 kudos

The practical differences between bagging and boosting mostly come down to how they build models and how they handle errors:

Model Training Approach:
Bagging (Bootstrap Aggregating): Builds multiple models in parallel using random subsets of the data. ...

  • 1 kudos
2 More Replies
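
To make the contrast in the reply above concrete, here is a minimal, standard-library-only sketch. It assumes a toy 1-D dataset and uses decision stumps as the base model; AdaBoost is used as one representative boosting algorithm, and all function names are hypothetical rather than taken from the thread. Bagging fits each stump independently on a bootstrap resample (so the fits could run in parallel), while boosting fits stumps one after another and reweights the points the previous stumps got wrong.

```python
import math
import random


def fit_stump(xs, ys, weights):
    """Fit a 1-D decision stump (threshold, polarity) minimizing weighted error."""
    best_err, best_stump = float("inf"), (0.0, 1)
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(
                w for x, y, w in zip(xs, ys, weights)
                if (polarity if x >= threshold else -polarity) != y
            )
            if err < best_err:
                best_err, best_stump = err, (threshold, polarity)
    return best_stump


def stump_predict(stump, x):
    threshold, polarity = stump
    return polarity if x >= threshold else -polarity


def train_bagging(xs, ys, n_models=15, seed=0):
    """Bagging: independent stumps on bootstrap resamples, combined by majority vote."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        stumps.append(
            fit_stump([xs[i] for i in idx], [ys[i] for i in idx], [1.0 / n] * n)
        )
    return lambda x: 1 if sum(stump_predict(s, x) for s in stumps) >= 0 else -1


def train_boosting(xs, ys, n_rounds=3):
    """AdaBoost: sequential stumps, each trained on reweighted data."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(xs, ys, weights)
        preds = [stump_predict(stump, x) for x in xs]
        err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified points, decrease the rest.
        weights = [w * math.exp(-alpha * y * p) for w, y, p in zip(weights, ys, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return lambda x: 1 if sum(a * stump_predict(s, x) for a, s in ensemble) >= 0 else -1


# Toy data: the positive class is an interval, which no single stump can fit.
xs = list(range(10))
ys = [-1, -1, -1, 1, 1, 1, 1, -1, -1, -1]
bagged = train_bagging(xs, ys)
boosted = train_boosting(xs, ys)
boosted_accuracy = sum(boosted(x) == y for x, y in zip(xs, ys)) / len(xs)
```

On this toy data three boosting rounds are enough to fit the interval on the training set, because each round concentrates on the points the earlier stumps missed. The bagged result depends on the random resamples, so no fixed accuracy is claimed for it here.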
Suheb
by Contributor
  • 206 Views
  • 4 replies
  • 2 kudos

How do I improve the performance of my Random Forest model on Databricks?

How can I make these people smarter or faster so the final answer is better?

Latest Reply
jameswood32
Contributor
  • 2 kudos

Improving the performance of a Random Forest model on Databricks is usually about data quality, feature engineering, and hyperparameter tuning. Some tips:

Feature Engineering:
Create meaningful features and remove irrelevant ones.
Encode categorical var...

  • 2 kudos
3 More Replies
Suheb
by Contributor
  • 106 Views
  • 1 replies
  • 1 kudos

How do I implement and train a custom PyTorch model on Databricks using distributed training?

How can I build my own PyTorch machine-learning model and train it faster on Databricks by using multiple machines/GPUs instead of just one?

Latest Reply
KaushalVachhani
Databricks Employee
  • 1 kudos

@Suheb, you may want to look at TorchDistributor. It provides multiple distributed training options, including single-node multi-GPU training and multi-node training. Below are the references for you. https://docs.databricks.com/aws/en/machine-...

  • 1 kudos
RodrigoE
by New Contributor II
  • 125 Views
  • 2 replies
  • 0 kudos

Vector search index very slow

Hello, I have created a vector search index for a Delta table with 1,400 rows. Using this vector index to find matching records on a table with 52M records, the query below ran for 20 hrs and failed with: 'HTTP request failed with status: {"error_c...

Machine Learning
vector search index
Latest Reply
iyashk-DB
Databricks Employee
  • 0 kudos

Hi @RodrigoE, your LATERAL subquery calls the Vector Search function once for every row of the 52M-row table, which results in tens of millions of remote calls to the Vector Search endpoint. This is an anti-pattern and will be extremely slow, leadin...

  • 0 kudos
1 More Replies
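
As a rough back-of-the-envelope illustration of why one remote call per row is so slow, the sketch below multiplies the 52M rows from the post by a per-call latency. The 50 ms figure is purely an assumption chosen for illustration, not a measured Vector Search latency, and the function name is hypothetical.

```python
def sequential_hours(num_calls: int, seconds_per_call: float) -> float:
    """Total wall-clock hours if the calls run strictly one after another."""
    return num_calls * seconds_per_call / 3600


rows = 52_000_000          # row count quoted in the post
assumed_latency_s = 0.05   # ASSUMPTION: 50 ms per remote call, for illustration only
hours = sequential_hours(rows, assumed_latency_s)
print(f"about {hours:.0f} hours if run sequentially")
```

Even with some parallelism, the total scales linearly with the number of rows on the calling side, which is why restructuring the query to make far fewer calls matters more than tuning each call.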
Suheb
by Contributor
  • 152 Views
  • 1 replies
  • 1 kudos

Resolved! What are recommended approaches for feature engineering in Databricks ML projects?

When building machine-learning models in Databricks, how should I prepare and transform my data so the model can learn better?

Latest Reply
emma_s
Databricks Employee
  • 1 kudos

Hi, this is quite a general question, so I've put together a list of bullets that will point you in the right direction: Focus on organized storage, flexible transformations, and making features easy to reuse and discover. Use Unity Catalog for govern...

  • 1 kudos
RodrigoE
by New Contributor II
  • 332 Views
  • 4 replies
  • 2 kudos

Resolved! Vector search index initialization very slow

Hello, I am creating a vector search index and selected Compute embeddings for a Delta table with 19M records. The Delta table has only two columns: ID (selected as index) and Name (selected for embedding). The embedding model is databricks-gte-large-en. Ind...

Machine Learning
index
search
vector
vector index
Vector Search
Latest Reply
RodrigoE
New Contributor II
  • 2 kudos

Your recommendation addressed the issue. I followed the instructions and index initialization took only 8 hours. Thank you!

  • 2 kudos
3 More Replies
Suheb
by Contributor
  • 128 Views
  • 1 replies
  • 1 kudos

How do I start with MLflow on Databricks?

I am new to MLflow and Databricks. How can I begin using MLflow inside Databricks to track and manage my machine learning models?

Latest Reply
iyashk-DB
Databricks Employee
  • 1 kudos

Hi @Suheb, MLflow is already preinstalled in the ML runtime. The question is very vague. You can follow the documentation below to get started with MLflow on Databricks. 1) https://www.databricks.com/product/managed-mlflow 2) https://docs.databricks.co...

  • 1 kudos
Suheb
by Contributor
  • 144 Views
  • 1 replies
  • 1 kudos

How do you organize ML projects in Databricks workspaces?

How do you keep your machine-learning files, notebooks, and code properly organized in Databricks?

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Hey @Suheb , I teach a lot of our machine learning training, and over time I’ve talked with many students, customers, and partners about how they approach this. The answers are all over the map, which tells you there’s no single “golden rule” that fi...

  • 1 kudos
mcarreira
by New Contributor III
  • 923 Views
  • 9 replies
  • 1 kudos

Genie connection to copilot agent in copilot studio

Hello! I’m trying to add a tool, Azure Databricks Genie, in Microsoft Copilot Studio for my agent, but I’m running into some difficulties. Is it possible to establish this connection using a Pro cluster, or does it only work with a serverless cluste...

Latest Reply
emma_s
Databricks Employee
  • 1 kudos

I'm afraid I don't have any further suggestions. I'd suggest you raise a ticket with Microsoft on this.

  • 1 kudos
8 More Replies
Suheb
by Contributor
  • 185 Views
  • 1 replies
  • 1 kudos

Resolved! What are the recommended practices for handling skewed datasets in Databricks?

What should you do when your dataset is uneven—some values appear too many times and others appear very few times—while working in Databricks?

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @Suheb, refer to the really good guide prepared by the Databricks team. When you have a skewed dataset, the primary things you can do are the following:
1. Filter skewed values
2. Apply skew hints
3. AQE skew optimization
4. Salting
A much more detailed description of abo...

  • 1 kudos
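
The salting technique named in the reply above can be illustrated without Spark. The sketch below is a simplified model, not the guide's implementation: it assumes rows are assigned to partitions by hashing the key (CRC32 stands in for the engine's partitioner), and the function names and the key data are hypothetical. Appending a small salt to each key splits one hot key into several distinct keys that hash to different partitions.

```python
import zlib
from collections import Counter


def partition_of(key: str, num_partitions: int) -> int:
    """Assign a key to a partition with a deterministic hash (stand-in for a real partitioner)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_sizes(keys, num_partitions):
    """Count how many rows land in each partition."""
    counts = Counter(partition_of(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


def salt_keys(keys, num_salts):
    """Append a round-robin salt so rows sharing one key spread over num_salts keys."""
    return [f"{k}_{i % num_salts}" for i, k in enumerate(keys)]


# Hypothetical skewed data: 90% of the rows share the key "hot".
keys = ["hot"] * 9000 + [f"k{i}" for i in range(1000)]

plain = partition_sizes(keys, 8)                 # one partition holds every "hot" row
salted = partition_sizes(salt_keys(keys, 8), 8)  # "hot" rows are spread out

print("largest partition without salting:", max(plain))
print("largest partition with salting:   ", max(salted))
```

In a real join, the other side of the join has to be expanded with every salt value so the salted keys still match, which is the cost of this technique.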
fbs342
by New Contributor II
  • 229 Views
  • 3 replies
  • 0 kudos

Migrated model to Unity catalog not seeing referenced serving endpoint

There was a model which was migrated from the workspace model registry to Unity Catalog. At the time of initial creation of that model, dependencies on other Databricks serving endpoints were configured using the "DatabricksServingEndpoint" config in mlflow....

Latest Reply
iyashk-DB
Databricks Employee
  • 0 kudos

Workspace model registry worked with workspace-scoped serving endpoints. UC models and UC serving endpoints use metastore-wide semantics and different lookup rules. The saved path inside the model metadata still points to workspace-level endpoints th...

  • 0 kudos
2 More Replies
srkam
by New Contributor
  • 154 Views
  • 1 replies
  • 0 kudos

UC Model Deployment across Databricks instances

Hello, we have multiple Databricks instances, each representing an environment (dev, qa, rel, prod, etc.). We developed a model in the dev workspace and registered it in the UC model registry using MLflow. Now, we are trying to find the best way to deploy this r...

Latest Reply
iyashk-DB
Databricks Employee
  • 0 kudos

You can use UC's centralized model registry and MLflow’s copy APIs. If all target workspaces attach to the same Unity Catalog metastore, reference and promote models via their 3‑level UC names; use MLflow’s copy_model_version to “copy” the exact arti...

  • 0 kudos
KrishZ
by Contributor
  • 11154 Views
  • 5 replies
  • 4 kudos

How to use parallel processing with concurrent jobs in Databricks?

Question: It would be great if you could recommend how I go about solving the problem below. I haven't been able to find much help online. A. Background: A1. I have to do text manipulation using Python (like concatenation, convert to spaCy doc, get verbs...

Latest Reply
Sangsha
New Contributor II
  • 4 kudos

I have to process data for n devices, each of which sends data every 5 seconds. I have a similar scenario where I have to take the last 3 hours of data and process it for all the devices for some key parameters. Now if I am doing it sequentially ...

  • 4 kudos
4 More Replies
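
One common way to move from sequential per-device processing to concurrent processing is a worker pool. The sketch below is a minimal illustration using only the Python standard library, not the approach recommended in the thread: `process_device` is a hypothetical stand-in for the real per-device computation, and the device data is made up. Threads are a reasonable fit when each task mostly waits on I/O (for example, a query or a triggered job); CPU-heavy Python work would normally be distributed differently.

```python
from concurrent.futures import ThreadPoolExecutor


def process_device(device_id, readings):
    """Hypothetical per-device computation: return the device id and its mean reading."""
    return device_id, sum(readings) / len(readings)


def process_all(devices, max_workers=4):
    """Run process_device for every device concurrently and collect the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(process_device, device_id, readings)
            for device_id, readings in devices.items()
        ]
        # result() blocks until the task finishes and re-raises any exception it hit.
        return dict(f.result() for f in futures)


# Made-up example data for two devices.
devices = {"dev-a": [1.0, 2.0, 3.0], "dev-b": [10.0, 20.0]}
results = process_all(devices)
```

Because each task returns its own device id, the results can be collected in any completion order without losing track of which device they belong to.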
