Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

Forum Posts

snaveedgm
by New Contributor
  • 3351 Views
  • 1 replies
  • 0 kudos

databricks-vectorsearch 0.53 unable to use similarity_search()

I have an issue with the databricks-vectorsearch package. Version 0.51 suddenly stopped working this week because: It now expected me to provide azure_tenant_id in addition to the service principal's client ID and secret. After supplying the tenant ID, it showed s...

Latest Reply
stbjelcevic
Databricks Employee
  • 0 kudos

Hi @snaveedgm , This is interesting - can you double-check that the service principal has CAN QUERY on the embedding endpoint used for ingestion and/or querying (databricks-bge-large-en in your case)? Even though your direct REST test works, double-c...

  • 0 kudos
aswinkks
by New Contributor III
  • 3309 Views
  • 1 replies
  • 0 kudos

ML Solution for unstructured data containing Images and videos

Hi, I have a use case of developing an entire ML solution within Databricks, from ingestion to inference and monitoring, but the problem is that we have unstructured data containing images and video for training the model using frameworks such...

Latest Reply
stbjelcevic
Databricks Employee
  • 0 kudos

Hi @aswinkks, This is a very broad question, but generally, when dealing with video data, you convert the videos to images and have a system in place for training and another for inference. This Databricks blog post explains how to set up a video ...

  • 0 kudos
naveen_marthala
by Contributor
  • 12653 Views
  • 4 replies
  • 3 kudos

Resolved! How to PREVENT mlflow's autologging from logging ALL runs?

I am logging runs from a Jupyter notebook. The cells that have `mlflow.sklearn.autolog()` behave as expected, but the cells that call the .fit() method on sklearn's estimators are also being logged as runs without explicitly mentioning `mlflo...

Latest Reply
Joe_Breath1
New Contributor III
  • 3 kudos

It looks like MLflow auto-logging is kicking in by default whenever you call .fit(), which is why you’re seeing runs even without explicitly using mlflow.sklearn.autolog(). To fix this, you can disable the global autologging and only trigger it when ...
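A minimal sketch of the idea in the reply above, assuming the `mlflow` package with its scikit-learn flavor is installed. `mlflow.autolog(disable=True)` and `mlflow.sklearn.autolog(disable=...)` are existing MLflow calls; the two helper names are hypothetical and only illustrate one way to scope autologging to a single block.

```python
from contextlib import contextmanager


def disable_all_autologging() -> None:
    """Turn off autologging for every supported library in this notebook or process."""
    import mlflow  # imported lazily, so the helpers can be defined without mlflow loaded

    mlflow.autolog(disable=True)


@contextmanager
def sklearn_autolog_only_here():
    """Enable scikit-learn autologging for one block, then switch it off again."""
    import mlflow.sklearn

    mlflow.sklearn.autolog()  # .fit() calls inside the block are logged as runs
    try:
        yield
    finally:
        mlflow.sklearn.autolog(disable=True)  # later .fit() calls are not logged


# Hypothetical usage (model, other_model, X, y are placeholders):
# disable_all_autologging()
# with sklearn_autolog_only_here():
#     model.fit(X, y)        # logged as a run
# other_model.fit(X, y)      # not logged
```

The context manager is only a convenience; calling the two `autolog` functions directly around the cells you care about would have the same effect.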

  • 3 kudos
3 More Replies
harry_dfe
by New Contributor
  • 3130 Views
  • 1 replies
  • 0 kudos

notebook stuck at "filtering data" or waiting to run

Hi, my data is in a sparse vector representation, and it was working fine (display and training ML models). I added a few features that converted the data from sparse to dense representation, and after that anything I want to perform on the data gets stuck (display or ml...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Greetings @harry_dfe ,    Thanks for the details — this almost certainly stems from your data flipping from a sparse vector representation to a dense one, which explodes per‑row memory and stalls actions like display, writes, and ML training.   Why t...
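A rough back-of-the-envelope sketch of why the sparse-to-dense flip described above inflates per-row memory. It assumes a sparse vector stores one 4-byte integer index plus one 8-byte double per non-zero entry, and a dense vector stores one 8-byte double per dimension. Object and serialization overhead is ignored, and the vector size, non-zero count, and row count are hypothetical.

```python
def sparse_bytes(nnz: int) -> int:
    """Approximate payload of a sparse vector: 4-byte index + 8-byte value per non-zero."""
    return nnz * (4 + 8)


def dense_bytes(size: int) -> int:
    """Approximate payload of a dense vector: one 8-byte value per dimension."""
    return size * 8


# Hypothetical dataset: 1M rows, 100k-dimensional vectors, 50 non-zeros per row.
rows, size, nnz = 1_000_000, 100_000, 50

sparse_total = rows * sparse_bytes(nnz)
dense_total = rows * dense_bytes(size)

print(f"sparse: {sparse_total / 1e9:.1f} GB")
print(f"dense:  {dense_total / 1e9:.1f} GB")
print(f"ratio:  {dense_total / sparse_total:.0f}x")
```

Plugging in your own dimensions and typical non-zero count gives a quick sense of whether the dense form can fit in executor memory at all.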

  • 0 kudos
Paddy_chu
by New Contributor III
  • 3328 Views
  • 1 replies
  • 0 kudos

How to transpose spark dataframe using R API?

Hello, I recently discovered the sparklyr package and found it quite useful. After setting up the Spark connection, I can apply dplyr functions to manipulate large tables. However, it seems that any functions outside of dplyr cannot be used on Spark v...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Greetings @Paddy_chu ,    You’re right that sparklyr gives you most dplyr verbs on Spark, but many tidyr verbs (including pivot_wider/pivot_longer) aren’t translated to Spark SQL and thus won’t run lazily on Spark tables. The practical options are to...

  • 0 kudos
moh3th1
by New Contributor II
  • 3216 Views
  • 1 replies
  • 2 kudos

Experiences with CatBoost Spark Integration in Production on Databricks?

Hi Community, I am currently evaluating various gradient boosting options on Databricks using production-level data, including the CatBoost Spark integration (ai.catboost:catboost-spark). I would love to hear from others who have successfully used this...

Latest Reply
stbjelcevic
Databricks Employee
  • 2 kudos

Hi @moh3th1 , I can't personally speak to using CatBoost, but I can discuss preferred libraries and recommendations per approach with various gradient-boosting libraries within Databricks. Preferred for robust distributed GBM on Databricks: XGBoost ...

  • 2 kudos
shubham_lekhwar
by New Contributor
  • 3377 Views
  • 1 replies
  • 1 kudos

MLflow Nested run with applyInPandas does not execute

I am trying to train a forecasting model along with hyperparameter tuning with Hyperopt. I have multiple time series, one per "KEY", and I want to train a separate model for each of them. To do this I am using Spark's applyInPandas to tune and train a model for ea...

Latest Reply
stbjelcevic
Databricks Employee
  • 1 kudos

Hi @shubham_lekhwar , This is a common context-passing issue when using Spark with MLflow. The problem is that the nested=True flag in mlflow.start_run relies on an active run being present in the current process context. Your Parent_RUN is active on...

  • 1 kudos
Paddy_chu
by New Contributor III
  • 3335 Views
  • 1 replies
  • 0 kudos

Databricks app and R shiny

Hello, I've been testing the Databricks app and have the following questions: 1. My organization currently uses Catalog Explorer instead of Unity Catalog. I want to develop a Shiny app and was able to run code from the template under New > App. However, t...

Latest Reply
stbjelcevic
Databricks Employee
  • 0 kudos

Thanks for the detailed context—here’s how to get Shiny-based apps working with your current setup and data. 1) Accessing data from “Catalog Explorer” in Databricks Apps A few key points about the Databricks Apps environment and data access: Apps su...

  • 0 kudos
Henrik_
by New Contributor III
  • 2749 Views
  • 1 replies
  • 1 kudos

Nested experiments and UC

I have a general problem. I run a nested experiment in MLflow, training and logging several models in a loop. Then I want to register the best one in UC. No problem so far. But when I load the model I registered and run prediction, it doesn't work. If I o...

Latest Reply
stbjelcevic
Databricks Employee
  • 1 kudos

Hey @Henrik_, There are a few things that could be happening here. If you share the error message/stack trace you get when it doesn’t work, I can help figure out which of these could be biting you and tailor the fix. In the meantime, here's a quick ...

  • 1 kudos
JoaoPigozzo
by New Contributor II
  • 136 Views
  • 2 replies
  • 2 kudos

Best practices for structuring databricks workspaces for CI/CD and ML workflows

Hi everyone, I’m designing the CI/CD process for our environment focused on machine learning and data science projects, and I’d like to understand what the best practices are regarding workspace organization, especially when using Unity Cat...

Latest Reply
mark_ott
Databricks Employee
  • 2 kudos

When designing a CI/CD process for Databricks environments — especially for machine learning and data science projects using Unity Catalog — enterprise-scale workspace organization should balance isolation, governance, and collaboration. The recommen...

  • 2 kudos
1 More Replies
VivekWV
by New Contributor
  • 233 Views
  • 3 replies
  • 1 kudos

Safe Update Strategy for Online Feature Store Without Endpoint Disruption

Hi Team, We are implementing Databricks Online Feature Store using Lakebase architecture and have run into some constraints during development. Requirements: Deploy an offline table as a synced online table and create a feature spec that queries from th...

Latest Reply
VivekWV
New Contributor
  • 1 kudos

Hi Mark, Thanks for your response. I followed the steps you suggested: Created the table and set primary key + time series key constraints. Enabled Change Data Feed. Created the feature table and deployed the online endpoint (this worked fine). Removed s...

  • 1 kudos
2 More Replies
AlexH
by New Contributor
  • 116 Views
  • 2 replies
  • 1 kudos

Offline Feature Store in Databricks Serving

Hi, I am planning to deploy a model (pyfunc) with Databricks Serving. During inference, my model needs to retrieve some data from Delta tables. I could turn these tables into an offline feature store as well. Latency is not so important. It doesn't matt...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

There is a ready feature engineering function for that:

# on non ML runtime please install databricks-feature-engineering>=0.13.0a3
from databricks.feature_engineering import FeatureEngineeringClient
fe = FeatureEngineeringClient()
from databrick...

  • 1 kudos
1 More Replies
jeremy98
by Honored Contributor
  • 94 Views
  • 2 replies
  • 0 kudos

how to speed up inference?

Hi guys, I'm new to this concept, but we have several ML models that share the same code structure. What I don’t fully understand is how to handle different types of models efficiently. Right now, I need to loop through my items to get the ...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

Hi @jeremy98, I have not tried this, but would you want to try using Python's multiprocessing library to assign the inference for different models to different CPU cores? Also, here's a useful blog: https://docs.datab...
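A minimal, untested-on-Databricks sketch of the suggestion above. It uses `concurrent.futures.ProcessPoolExecutor` (a standard-library wrapper built on multiprocessing) rather than the `multiprocessing` module directly. The two "models" are hypothetical stand-in functions; real models would need to be loadable inside each worker process, which costs memory per worker.

```python
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor
from typing import Callable, Dict, List, Optional, Type


# Hypothetical stand-ins for real models: each maps a batch of inputs to predictions.
def model_a(batch: List[float]) -> List[float]:
    return [x * 2.0 for x in batch]


def model_b(batch: List[float]) -> List[float]:
    return [x + 1.0 for x in batch]


MODELS: Dict[str, Callable[[List[float]], List[float]]] = {
    "model_a": model_a,
    "model_b": model_b,
}


def _predict(name: str, batch: List[float]) -> List[float]:
    # Look the model up by name inside the worker, so only the name and batch are pickled.
    return MODELS[name](batch)


def predict_all(
    batches: Dict[str, List[float]],
    executor_cls: Type[Executor] = ProcessPoolExecutor,
    max_workers: Optional[int] = None,
) -> Dict[str, List[float]]:
    """Submit one task per model and collect the predictions keyed by model name."""
    with executor_cls(max_workers=max_workers) as pool:
        futures = {name: pool.submit(_predict, name, batch) for name, batch in batches.items()}
        return {name: future.result() for name, future in futures.items()}
```

Calling `predict_all({"model_a": [1.0, 2.0], "model_b": [3.0]})` runs each model in its own process. Passing `executor_cls=ThreadPoolExecutor` swaps in threads, which is handy for debugging or when the model code releases the GIL. When launching processes from a script, keep the call under an `if __name__ == "__main__":` guard.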

  • 0 kudos
1 More Replies
spearitchmeta
by Contributor
  • 102 Views
  • 1 replies
  • 1 kudos

How does Databricks AutoML handle null imputation for categorical features by default?

Hi everyone, I’m using Databricks AutoML (classification workflow) on Databricks Runtime 10.4 LTS ML+, and I’d like to clarify how missing (null) values are handled for categorical (string) columns by default. From the AutoML documentation, I see that:...

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Hello @spearitchmeta , I looked internally to see if I could help with this and I found some information that will shed light on your question.   Here’s how missing (null) values in categorical (string) columns are handled in Databricks AutoML on Dat...

  • 1 kudos
AlbertWang
by Valued Contributor
  • 2655 Views
  • 1 replies
  • 1 kudos

Can I Replicate Azure Document Intelligence's Custom Table Extraction in Databricks?

I am using Azure Document Intelligence to get data from a table in a PDF file. The table's headers do not visually align with the values. Therefore, the standard and pre-built models cannot correctly read the data.I have built a custom-trained Azure ...

Latest Reply
dkushari
Databricks Employee
  • 1 kudos

Hi @AlbertWang, you can easily achieve this using Agent Bricks - Information Extraction. Your PDFs will be converted to text using the ai_parse_document function and saved in a Databricks table. You can then create the agent using that text table to ge...

  • 1 kudos
