Machine Learning

by ScyLukb • New Contributor

09-04-2024 6:50:33 AM

3557 Views
1 reply
0 kudos

Determine exact location of MLflow model tracking and model registry files and the Backend Stores

I would like to determine the exact location of: 1. MLflow model tracking files 2. Model registry files (with Workspace Model Registry), as the documentation mentions that: "All methods copy the model into a secure location managed by...

Latest Reply

Louis_Frolio
Databricks Employee

Wednesday

0 kudos

Greetings @ScyLukb, you’re right that the docs say the Workspace Model Registry copies models to a “secure location” but don’t name it prominently. Here’s where those files actually live and how to discover the configured stores. Locations of MLflo...

by art1 • New Contributor III

10-25-2024 7:24:39 AM

3317 Views
1 reply
0 kudos

Hyperopt (15.4 LTS ML) ignores autologger settings

I use MLflow Experiments to store models once they leave very early tests and development. I recently switched to 15.4 LTS ML and was hit by unhinged Hyperopt behavior: it was creating Experiment logs, ignoring that i) autologger is off on the workspace level...

Latest Reply

Louis_Frolio
Databricks Employee

Wednesday

0 kudos

Hey @art1, sorry this post got lost in the shuffle. Here are some things to consider regarding your question: Thanks for flagging this. What you’re seeing is expected given how Databricks integrates Hyperopt with MLflow, and there are clear ways t...

by javeed • New Contributor

02-11-2025 10:10:41 PM

3384 Views
1 reply
0 kudos

Working with pyspark dataframe with machine learning libraries / statistical model libraries

Hi Team, I am working with a huge volume of data (50 GB) and I decompose the time series data using statsmodels. Having said that, the major challenge I am facing is the compatibility of the PySpark DataFrame with the machine learning algorithms. altho...

Latest Reply

Louis_Frolio
Databricks Employee

Wednesday

0 kudos

Greetings @javeed, you’re right to call out the friction between a PySpark DataFrame and many Python ML libraries like statsmodels; most Python ML stacks expect pandas, while Spark is distributed-first. Here’s how to bridge that gap efficiently fo...

by snaveedgm • New Contributor

03-21-2025 9:22:59 AM

3414 Views
1 reply
0 kudos

databricks-vectorsearch 0.53 unable to use similarity_search()

I have an issue with the databricks-vectorsearch package. Version 0.51 suddenly stopped working this week because: It now expected me to provide azure_tenant_id in addition to the service principal's client ID and secret. After supplying the tenant ID, it showed s...

Latest Reply

stbjelcevic
Databricks Employee

Tuesday

0 kudos

Hi @snaveedgm, this is interesting. Can you double-check that the service principal has CAN QUERY on the embedding endpoint used for ingestion and/or querying (databricks-bge-large-en in your case)? Even though your direct REST test works, double-c...

by aswinkks • New Contributor III

03-25-2025 5:47:58 AM

3373 Views
1 reply
0 kudos

ML Solution for unstructured data containing Images and videos

Hi, I have a use case of developing an entire ML solution within Databricks, from ingestion to inference and monitoring, but the problem is that we have unstructured data containing images and video for training the model using frameworks such...

Latest Reply

stbjelcevic
Databricks Employee

Tuesday

0 kudos

Hi @aswinkks, this is a very broad question, but generally, when dealing with video data, you convert the videos to images and have a system in place for training and another for inference. This Databricks blog post explains how to set up a video ...

by naveen_marthala • Contributor

05-02-2022 9:08:48 AM

12712 Views
4 replies
3 kudos

Resolved! How to PREVENT mlflow's autologging from logging ALL runs?

I am logging runs from a Jupyter notebook. The cells that contain `mlflow.sklearn.autolog()` behave as expected. But the cells in which the .fit() method is called on sklearn's estimators are also being logged as runs without explicitly mentioning `mlflo...

Latest Reply

Joe_Breath1
New Contributor III

08-24-2025 4:39:46 PM

3 kudos

It looks like MLflow auto-logging is kicking in by default whenever you call .fit(), which is why you’re seeing runs even without explicitly using mlflow.sklearn.autolog(). To fix this, you can disable the global autologging and only trigger it when ...
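
A minimal sketch of the approach described in this reply, assuming a standard `mlflow` installation with scikit-learn support. `mlflow.autolog(disable=True)` and `mlflow.sklearn.autolog()` are existing MLflow calls; the helper names `disable_global_autologging` and `fit_with_autologging` are hypothetical and do not come from the thread.

```python
def disable_global_autologging():
    """Turn off autologging for every supported library in this session."""
    import mlflow  # imported lazily so this module loads even without MLflow

    mlflow.autolog(disable=True)


def fit_with_autologging(estimator, X, y):
    """Fit one estimator with scikit-learn autologging on, then switch it off."""
    import mlflow
    import mlflow.sklearn

    mlflow.sklearn.autolog()  # opt in only for this fit
    try:
        with mlflow.start_run():
            estimator.fit(X, y)
    finally:
        # Switch scikit-learn autologging off again so later .fit() calls
        # are not recorded as runs.
        mlflow.sklearn.autolog(disable=True)
    return estimator
```

Calling `disable_global_autologging()` once at the top of the notebook and wrapping only the fits you want tracked in `fit_with_autologging(...)` keeps every other `.fit()` call out of the experiment.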

by harry_dfe • New Contributor

01-30-2025 5:29:39 AM

3171 Views
1 reply
0 kudos

notebook stuck at "filtering data" or waiting to run

Hi, my data is in a sparse vector representation, and it was working fine (display and training ML models). I added a few features that converted the data from sparse to dense representation, and after that anything I want to perform on the data gets stuck (display or ml...

Latest Reply

Louis_Frolio
Databricks Employee

Tuesday

0 kudos

Greetings @harry_dfe, thanks for the details. This almost certainly stems from your data flipping from a sparse vector representation to a dense one, which explodes per-row memory and stalls actions like display, writes, and ML training. Why t...
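
A rough back-of-the-envelope sketch of why the switch from sparse to dense matters. It assumes the usual payload of Spark ML vectors (a dense vector holds one 8-byte double per dimension; a sparse vector holds a 4-byte integer index plus an 8-byte double per non-zero entry) and ignores object headers and serialization overhead. The dimension, non-zero count, and row count are made-up illustration values, not figures from the thread.

```python
DOUBLE_BYTES = 8  # one float64 value
INT_BYTES = 4     # one int32 index


def dense_bytes(dim: int) -> int:
    """Approximate payload of a dense vector: one double per dimension."""
    return dim * DOUBLE_BYTES


def sparse_bytes(nnz: int) -> int:
    """Approximate payload of a sparse vector: (index, value) per non-zero."""
    return nnz * (INT_BYTES + DOUBLE_BYTES)


def blowup(dim: int, nnz: int) -> float:
    """How many times larger the dense form is than the sparse form."""
    return dense_bytes(dim) / sparse_bytes(nnz)


if __name__ == "__main__":
    dim, nnz, rows = 100_000, 50, 1_000_000  # hypothetical dataset shape
    print(f"sparse per row: {sparse_bytes(nnz)} bytes")
    print(f"dense per row:  {dense_bytes(dim)} bytes")
    print(f"blow-up factor: {blowup(dim, nnz):.0f}x")
    print(f"dense total:    {dense_bytes(dim) * rows / 1e9:.0f} GB")
```

The dense size depends only on the vector dimension, while the sparse size depends only on the number of non-zero entries, so wide and mostly empty feature vectors are where densifying hurts the most.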

by Paddy_chu • New Contributor III

03-03-2025 3:38:12 PM

3375 Views
1 reply
0 kudos

How to transpose spark dataframe using R API?

Hello, I recently discovered the sparklyr package and found it quite useful. After setting up the Spark connection, I can apply dplyr functions to manipulate large tables. However, it seems that any functions outside of dplyr cannot be used on Spark v...

Latest Reply

Louis_Frolio
Databricks Employee

Tuesday

0 kudos

Greetings @Paddy_chu, you’re right that sparklyr gives you most dplyr verbs on Spark, but many tidyr verbs (including pivot_wider/pivot_longer) aren’t translated to Spark SQL and thus won’t run lazily on Spark tables. The practical options are to...

by moh3th1 • New Contributor II

04-11-2025 7:57:19 AM

3278 Views
1 reply
2 kudos

Experiences with CatBoost Spark Integration in Production on Databricks?

Hi Community, I am currently evaluating various gradient boosting options on Databricks using production-level data, including the CatBoost Spark integration (ai.catboost:catboost-spark). I would love to hear from others who have successfully used this...

Latest Reply

stbjelcevic
Databricks Employee

Monday

2 kudos

Hi @moh3th1, I can't personally speak to using CatBoost, but I can discuss preferred libraries and recommendations per approach with various gradient-boosting libraries within Databricks. Preferred for robust distributed GBM on Databricks: XGBoost ...

by shubham_lekhwar • New Contributor

03-27-2025 10:28:11 PM

3428 Views
1 reply
1 kudos

MLflow Nested run with applyInPandas does not execute

I am trying to train a forecasting model along with hyperparameter tuning with Hyperopt. I have multiple time series for "KEY", for each of which I want to train a separate model. To do this, I am using Spark's applyInPandas to tune and train a model for ea...

Latest Reply

stbjelcevic
Databricks Employee

Monday

1 kudos

Hi @shubham_lekhwar, this is a common context-passing issue when using Spark with MLflow. The problem is that the nested=True flag in mlflow.start_run relies on an active run being present in the current process context. Your Parent_RUN is active on...

by Paddy_chu • New Contributor III

04-14-2025 11:03:51 AM

3391 Views
1 reply
0 kudos

Databricks app and R shiny

Hello, I've been testing the Databricks app and have the following questions: 1. My organization currently uses Catalog Explorer instead of Unity Catalog. I want to develop a Shiny app and was able to run code from the template under New > App. However, t...

Latest Reply

stbjelcevic
Databricks Employee

Monday

0 kudos

Thanks for the detailed context. Here’s how to get Shiny-based apps working with your current setup and data. 1) Accessing data from “Catalog Explorer” in Databricks Apps: A few key points about the Databricks Apps environment and data access: Apps su...

by Henrik_ • New Contributor III

04-16-2025 11:59:20 PM

2787 Views
1 reply
1 kudos

Nested experiments and UC

I have a general problem. I run a nested experiment in MLflow, training and logging several models in a loop. Then I want to register the best in UC. No problem so far. But when I load the model I registered and run prediction, it doesn't work. If I o...

Latest Reply

stbjelcevic
Databricks Employee

Monday

1 kudos

Hey @Henrik_, there are a few things that could be happening here. If you share the error message/stack trace you get when it doesn’t work, I can help figure out which of these could be biting you and tailor the fix. In the meantime, here's a quick ...

by JoaoPigozzo • New Contributor III

a week ago

223 Views
2 replies
3 kudos

Resolved! Best practices for structuring databricks workspaces for CI/CD and ML workflows

Hi everyone, I’m designing the CI/CD process for our environment focused on machine learning and data science projects, and I’d like to understand what the best practices are regarding workspace organization, especially when using Unity Cat...

Latest Reply

mark_ott
Databricks Employee

a week ago

3 kudos

When designing a CI/CD process for Databricks environments — especially for machine learning and data science projects using Unity Catalog — enterprise-scale workspace organization should balance isolation, governance, and collaboration. The recommen...

by VivekWV • New Contributor II

2 weeks ago

343 Views
3 replies
1 kudos

Safe Update Strategy for Online Feature Store Without Endpoint Disruption

Hi Team, We are implementing Databricks Online Feature Store using Lakebase architecture and have run into some constraints during development: Requirements: Deploy an offline table as a synced online table and create a feature spec that queries from th...

Latest Reply

VivekWV
New Contributor II

Monday

1 kudos

Hi Mark, Thanks for your response. I followed the steps you suggested: Created the table and set primary key + time series key constraints. Enabled Change Data Feed. Created the feature table and deployed the online endpoint (this worked fine). Removed s...

by AlexH • New Contributor II

a week ago

182 Views
2 replies
1 kudos

Offline Feature Store in Databricks Serving

Hi, I am planning to deploy a model (pyfunc) with Databricks Serving. During inference, my model needs to retrieve some data from Delta tables. I could turn these tables into an offline feature store as well. Latency is not so important. It doesn't matt...

Latest Reply

Hubert-Dudek
Esteemed Contributor III

a week ago

1 kudos

There is a ready feature engineering function for that:

# on non ML runtime please install databricks-feature-engineering>=0.13.0a3
from databricks.feature_engineering import FeatureEngineeringClient
fe = FeatureEngineeringClient()
from databrick...
