Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

Forum Posts

moh3th1
by New Contributor II
  • 3157 Views
  • 1 reply
  • 1 kudos

Experiences with CatBoost Spark Integration in Production on Databricks?

Hi Community, I am currently evaluating various gradient boosting options on Databricks using production-level data, including the CatBoost Spark integration (ai.catboost:catboost-spark). I would love to hear from others who have successfully used this...

Latest Reply
stbjelcevic
Databricks Employee
  • 1 kudos

Hi @moh3th1, I can't personally speak to using CatBoost, but I can discuss preferred libraries and recommendations for the various gradient-boosting libraries within Databricks. Preferred for robust distributed GBM on Databricks: XGBoost ...

shubham_lekhwar
by New Contributor
  • 3327 Views
  • 1 reply
  • 0 kudos

MLflow Nested run with applyInPandas does not execute

I am trying to train a forecasting model with hyperparameter tuning via Hyperopt. I have multiple time series, one per "KEY", and I want to train a separate model for each. To do this I am using Spark's applyInPandas to tune and train a model for ea...

Latest Reply
stbjelcevic
Databricks Employee
  • 0 kudos

Hi @shubham_lekhwar, This is a common context-passing issue when using Spark with MLflow. The problem is that the nested=True flag in mlflow.start_run relies on an active run being present in the current process context. Your Parent_RUN is active on...

Paddy_chu
by New Contributor III
  • 3291 Views
  • 1 reply
  • 0 kudos

Databricks Apps and R Shiny

Hello, I've been testing Databricks Apps and have the following questions: 1. My organization currently uses Catalog Explorer instead of Unity Catalog. I want to develop a Shiny app and was able to run code from the template under New > App. However, t...

Latest Reply
stbjelcevic
Databricks Employee
  • 0 kudos

Thanks for the detailed context—here’s how to get Shiny-based apps working with your current setup and data. 1) Accessing data from “Catalog Explorer” in Databricks Apps A few key points about the Databricks Apps environment and data access: Apps su...

Henrik_
by New Contributor III
  • 2710 Views
  • 1 reply
  • 0 kudos

Nested experiments and UC

I have a general problem. I run a nested experiment in MLflow, training and logging several models in a loop. Then I want to register the best one in UC. No problem so far. But when I load the registered model and run prediction, it doesn't work. If I o...

Latest Reply
stbjelcevic
Databricks Employee
  • 0 kudos

Hey @Henrik_, There are a few things that could be happening here. If you share the error message/stack trace you get when it doesn't work, I can help figure out which of these is biting you and tailor the fix. In the meantime, here's a quick ...

JoaoPigozzo
by New Contributor II
  • 106 Views
  • 2 replies
  • 2 kudos

Best practices for structuring databricks workspaces for CI/CD and ML workflows

Hi everyone, I'm designing the CI/CD process for our environment, focused on machine learning and data science projects, and I'd like to understand the best practices regarding workspace organization, especially when using Unity Cat...

Latest Reply
mark_ott
Databricks Employee
  • 2 kudos

When designing a CI/CD process for Databricks environments — especially for machine learning and data science projects using Unity Catalog — enterprise-scale workspace organization should balance isolation, governance, and collaboration. The recommen...

1 More Replies
VivekWV
by New Contributor
  • 187 Views
  • 3 replies
  • 1 kudos

Safe Update Strategy for Online Feature Store Without Endpoint Disruption

Hi Team, We are implementing the Databricks Online Feature Store using the Lakebase architecture and have run into some constraints during development. Requirements: Deploy an offline table as a synced online table and create a feature spec that queries from th...

Latest Reply
VivekWV
New Contributor
  • 1 kudos

Hi Mark, thanks for your response. I followed the steps you suggested:
  • Created the table and set primary key + time series key constraints.
  • Enabled Change Data Feed.
  • Created the feature table and deployed the online endpoint — this worked fine.
  • Removed s...

2 More Replies
AlexH
by New Contributor
  • 83 Views
  • 2 replies
  • 1 kudos

Offline Feature Store in Databricks Serving

Hi, I am planning to deploy a model (pyfunc) with Databricks Serving. During inference, my model needs to retrieve some data from Delta tables. I could turn these tables into an offline feature store as well. Latency is not so important. It doesn't matt...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

There is a ready feature-engineering function for that:

# on a non-ML runtime, install databricks-feature-engineering>=0.13.0a3
from databricks.feature_engineering import FeatureEngineeringClient
fe = FeatureEngineeringClient()
from databrick...

1 More Replies
jeremy98
by Honored Contributor
  • 77 Views
  • 2 replies
  • 0 kudos

How to speed up inference?

Hi guys, I'm new to this concept, but we have several ML models that follow the same structure in the code. What I don't fully understand is how to handle different types of models efficiently — right now, I need to loop through my items to get the ...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

Hi @jeremy98, I have not tried this, but could using Python's multiprocessing library to assign the inference for different models to different CPU cores be something you would want to attempt? Also, here's a useful blog: https://docs.datab...
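The suggestion above can be sketched with the standard library; the model names and toy callables below are illustrative stand-ins. Threads are the simplest starting point (and sufficient when each model releases the GIL, e.g. NumPy or native inference engines); for CPU-bound pure-Python scoring, swapping ThreadPoolExecutor for ProcessPoolExecutor spreads the work across cores as the reply suggests.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for several models sharing one predict interface.
models = {
    "model_a": lambda xs: [x * 2 for x in xs],
    "model_b": lambda xs: [x + 10 for x in xs],
}

batch = [1, 2, 3]

def run_model(name):
    # Each task scores the same batch with a different model.
    return name, models[name](batch)

# Run all models concurrently instead of looping over them one by one.
with ThreadPoolExecutor(max_workers=len(models)) as pool:
    results = dict(pool.map(run_model, models))

# results == {"model_a": [2, 4, 6], "model_b": [11, 12, 13]}
```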

1 More Replies
spearitchmeta
by Contributor
  • 84 Views
  • 1 reply
  • 1 kudos

How does Databricks AutoML handle null imputation for categorical features by default?

Hi everyone, I'm using Databricks AutoML (classification workflow) on Databricks Runtime 10.4 LTS ML+, and I'd like to clarify how missing (null) values are handled for categorical (string) columns by default. From the AutoML documentation, I see that:...

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Hello @spearitchmeta, I looked internally to see if I could help with this, and I found some information that should shed light on your question. Here's how missing (null) values in categorical (string) columns are handled in Databricks AutoML on Dat...
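The exact defaults are runtime-version dependent (the reply above is truncated before the specifics), but a common default for string columns is to impute nulls as their own constant category before encoding. A minimal sklearn sketch of that pattern, where the column name and fill value are illustrative and not AutoML's internals:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({"color": ["red", np.nan, "blue", np.nan]})

# Treat missing strings as an explicit "missing" category,
# so the downstream encoder sees nulls as one more level.
imputer = SimpleImputer(strategy="constant", fill_value="missing")
out = imputer.fit_transform(df[["color"]])
# out → [["red"], ["missing"], ["blue"], ["missing"]]
```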

AlbertWang
by Valued Contributor
  • 2646 Views
  • 1 reply
  • 1 kudos

Can I Replicate Azure Document Intelligence's Custom Table Extraction in Databricks?

I am using Azure Document Intelligence to get data from a table in a PDF file. The table's headers do not visually align with the values, so the standard and pre-built models cannot correctly read the data. I have built a custom-trained Azure ...

Latest Reply
dkushari
Databricks Employee
  • 1 kudos

Hi @AlbertWang, you can easily achieve this using Agent Bricks - Information Extraction. Your PDFs will be converted to text using the ai_parse_document function and saved in a Databricks table. You can then create the agent using that text table to ge...

MightyMasdo
by New Contributor III
  • 3163 Views
  • 3 replies
  • 7 kudos

Spark context not implemented Error when using Databricks connect

I am developing an application using Databricks Connect, and when I try to use VectorAssembler I get the assertion error "sc is not None". Is there a workaround for this?

Latest Reply
pibe1
New Contributor II
  • 7 kudos

Ran into exactly the same issue as @Łukasz1. After some googling, I found an SO post explaining the issue: later versions of Databricks Connect no longer support the SparkContext API. Our code is failing because the underlying library is trying to f...

2 More Replies
tarunnagar
by New Contributor II
  • 240 Views
  • 1 reply
  • 1 kudos

Best Practices for Collaborative Notebook Development in Databricks

Hi everyone! I’m looking to learn more about effective strategies for collaborative development in Databricks notebooks. Since notebooks are often used by multiple data scientists, analysts, and engineers, managing collaboration efficiently is critic...

Latest Reply
AbhaySingh
Databricks Employee
  • 1 kudos

For version control, use Git integration with Databricks Repos. Core features: Databricks Git Folders (Repos) provides native Git integration with a visual UI and REST API access. It supports all major providers: GitHub, GitLab, Azure DevOps, Bi...

gg5
by New Contributor II
  • 2231 Views
  • 4 replies
  • 2 kudos

Resolved! Unable to Access Delta View from Azure Machine Learning via Delta Sharing – Is View Access Supported

I am able to access the tables, but while accessing the view I am getting the error below. Response from server: { 'details': [ { '@type': 'type.googleapis...

Latest Reply
ericwang52
New Contributor II
  • 2 kudos

View sharing is supported (launched GA) in Databricks. See https://docs.databricks.com/aws/en/delta-sharing/create-share#add-views-to-a-share. You likely need a workspace id override. Creating the recipient from a workspace with proper access and res...

3 More Replies
juandados
by New Contributor
  • 258 Views
  • 1 reply
  • 0 kudos

GenAI experiment tracing does not render markdown images

When traces include base64-encoded images in Markdown, they do not render properly. This makes analyzing traces that include images difficult. Just for context, the same trace in other tracing tools like LangSmith renders as expected. An example of...

Latest Reply
sarahbhord
Databricks Employee
  • 0 kudos

Thank you for the flag, @juandados! I will ping my product team to get a timeline for you.

ostae911
by New Contributor
  • 780 Views
  • 1 reply
  • 1 kudos

AutoML Forecast fails when using feature_store_lookups with timestamp key

We are running AutoML Forecast on Databricks Runtime 15.4 ML LTS and 16.4 ML LTS, using a time series dataset with temporal covariates from the Feature Store (e.g. a corona_dummy feature). We use feature_store_lookups with lookup_key and timestamp_lo...

Latest Reply
jamesl
Databricks Employee
  • 1 kudos

Hi @ostae911, are you still facing this issue? It looks like your usage of the timestamp column is correct. It can be used as a primary key on the time series feature table. Is it possible that there are other duplicate columns between the training ...

