Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

Forum Posts

adoodsonruby
by New Contributor II
  • 4533 Views
  • 1 replies
  • 1 kudos

AutoML Doesn't Work Due to Not being able to generate the EDA notebook

Hi, I'm trying to run an AutoML classification experiment with a dataset that I have made, and I am experiencing this issue even after purposely downsampling my dataset before passing it to the AutoML experiment. It appears that there is no way for me ...

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Hey @adoodsonruby, sorry this got lost in the shuffle. Have you tried again recently? I believe the limits have since been increased, which should remove this impediment. Let us know, Louis.

  • 1 kudos
lchicoma
by New Contributor
  • 4348 Views
  • 1 replies
  • 0 kudos

Error to create an endpoint of databricks with 2 primary keys online table

I have a Delta table whose primary key is composed of two fields (accountId, ruleModelVersionDesc), and I have also created an online table with the same primary key, but when I create a feature spec to create an endpoint I get the following error...

Machine Learning
enpoints
featurespec
fetureserving
MachineLearning
onlinetabl
Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Hey @lchicoma , sorry for the delayed response.  Thanks for sharing the error and context—this looks like a parsing issue in the feature specification rather than a problem with Delta or the runtime versions.   What changed recently There was an inci...

  • 0 kudos
amanjethani
by New Contributor
  • 1433 Views
  • 1 replies
  • 0 kudos

🐞 Stuck on LightGBM Distributed Training in PySpark – Hanging After Socket Communication

My Setup: I'm trying to run distributed LightGBM training using synapseml.lightgbm.LightGBMRegressor in PySpark. Cluster Details: Spark version: 3.5.1 (compatible with PySpark 3.5.6); PySpark version: 3.5.6; synapseml: v0.11.1 (latest); Spark Cluster: 3 Hetzn...

Latest Reply
stbjelcevic
Databricks Employee
  • 0 kudos

Hi @amanjethani , Thanks for laying out the setup and symptoms so clearly. The hang likely occurs because LightGBM’s distributed network either doesn’t fully form between executors or because the expected task count doesn’t match actual tasks, leadin...

  • 0 kudos
semsim
by Contributor
  • 4146 Views
  • 1 replies
  • 0 kudos

Can't query Legacy Serving Endpoint

Hi, I was able to deploy an endpoint using legacy serving (it's the only option we have to deploy endpoints in DB). Now I am having trouble querying the endpoint itself. When I try to query it I get the following error: Here is the code I am using ...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Hey @semsim , sorry for the delayed response.   Thanks for the screenshot—this pinpoints the problem.   Root cause from the error Your model’s predict path is trying to create or write to /Workspace/Shared, and the serving container does not permit t...

  • 0 kudos
Kasen
by New Contributor III
  • 4989 Views
  • 1 replies
  • 1 kudos

Multi-tenant recommendation system (Machine learning)

Hello, I am looking to build a multi-tenant machine learning recommender system in Azure Databricks. The idea is to have a single shared model, where each tenant can use the same model to train on their own unique dataset. Essentially, while the model...

Machine Learning
machine learning
multi-tenant
recommendation
Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

@Kasen , sorry for the delayed response.  Here are some things to consider regarding your question.   Azure Databricks is well-suited for a shared-architecture, tenant‑isolated recommender system. Below is a pragmatic blueprint, the isolation model o...

  • 1 kudos
ScyLukb
by New Contributor
  • 4551 Views
  • 1 replies
  • 0 kudos

Determine exact location of MLflow model tracking and model registry files and the Backend Stores

I would like to determine the exact location of: 1. MLflow model tracking files; 2. Model registry files (with Workspace Model Registry), as the documentation mentions that: "All methods copy the model into a secure location managed by...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Greetings @ScyLukb ,  You’re right that the docs say the Workspace Model Registry copies models to a “secure location” but don’t name it prominently. Here’s where those files actually live and how to discover the configured stores. Locations of MLflo...

  • 0 kudos
art1
by New Contributor III
  • 4059 Views
  • 1 replies
  • 0 kudos

Hyperopt (15.4 LTS ML) ignores autologger settings

I use MLflow Experiments to store models once they leave very early testing and development. I recently switched to 15.4 LTS ML and was hit by unhinged Hyperopt behavior: it was creating experiment logs, ignoring that i) the autologger is off at the workspace level...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Hey @art1 , sorry this post got lost in the shuffle.  Here are some things to consider regarding your question:   Thanks for flagging this—what you’re seeing is expected given how Databricks integrates Hyperopt with MLflow, and there are clear ways t...

  • 0 kudos
javeed
by New Contributor
  • 4239 Views
  • 1 replies
  • 0 kudos

Working with pyspark dataframe with machine learning libraries / statistical model libraries

Hi Team, I am working with a huge volume of data (50 GB) and I decompose the time series data using statsmodels. Having said that, the major challenge I am facing is the compatibility of the PySpark DataFrame with the machine learning algorithms. altho...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Greetings @javeed ,   You’re right to call out the friction between a PySpark DataFrame and many Python ML libraries like statsmodels; most Python ML stacks expect pandas, while Spark is distributed-first. Here’s how to bridge that gap efficiently fo...

  • 0 kudos
snaveedgm
by New Contributor
  • 4222 Views
  • 1 replies
  • 1 kudos

databricks-vectorsearch 0.53 unable to use similarity_search()

I have an issue with the databricks-vectorsearch package. Version 0.51 suddenly stopped working this week because: It now expected me to provide azure_tenant_id in addition to the service principal's client ID and secret. After supplying the tenant ID, it showed s...

Latest Reply
stbjelcevic
Databricks Employee
  • 1 kudos

Hi @snaveedgm , This is interesting - can you double-check that the service principal has CAN QUERY on the embedding endpoint used for ingestion and/or querying (databricks-bge-large-en in your case)? Even though your direct REST test works, double-c...

  • 1 kudos
aswinkks
by New Contributor III
  • 4264 Views
  • 1 replies
  • 1 kudos

Resolved! ML Solution for unstructured data containing Images and videos

Hi, I have a use case of developing an entire ML solution within Databricks, from ingestion to inference and monitoring, but the problem is that we have unstructured data containing images and video for training the model using frameworks such...

Latest Reply
stbjelcevic
Databricks Employee
  • 1 kudos

Hi @aswinkks, This is a very broad question, but generally, when dealing with video data, you convert the videos to images and have one system in place for training and another for inference. This Databricks blog post explains how to set up a video ...

  • 1 kudos
harry_dfe
by New Contributor
  • 4118 Views
  • 1 replies
  • 1 kudos

Resolved! notebook stuck at "filtering data" or waiting to run

Hi, my data is in a sparse vector representation, and it was working fine (display and training ML models). I added a few features that converted the data from a sparse to a dense representation, and after that anything I want to perform on the data gets stuck (display or ml...

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Greetings @harry_dfe ,    Thanks for the details — this almost certainly stems from your data flipping from a sparse vector representation to a dense one, which explodes per‑row memory and stalls actions like display, writes, and ML training.   Why t...
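To make the sparse-versus-dense memory point concrete, here is a rough back-of-the-envelope sketch. It assumes a dense vector costs about 8 bytes (one double) per dimension and a sparse vector about 12 bytes (a 4-byte index plus an 8-byte value) per non-zero entry; object headers and serialization overhead are ignored, and the function names and sizes are illustrative, not taken from the post.

```python
def dense_vector_bytes(size: int) -> int:
    """Approximate payload of a dense vector: one 8-byte double per dimension."""
    return 8 * size


def sparse_vector_bytes(nnz: int) -> int:
    """Approximate payload of a sparse vector: a 4-byte index plus an
    8-byte double for each non-zero entry (assumed layout, overhead ignored)."""
    return (4 + 8) * nnz


def densify_blowup(size: int, nnz: int) -> float:
    """Ratio of dense to sparse payload for a vector with `nnz` non-zeros."""
    if nnz <= 0:
        raise ValueError("nnz must be positive to compute a ratio")
    return dense_vector_bytes(size) / sparse_vector_bytes(nnz)


if __name__ == "__main__":
    # Hypothetical dataset shape, chosen only for illustration.
    size, nnz, rows = 100_000, 50, 1_000_000
    print("dense bytes per row: ", dense_vector_bytes(size))
    print("sparse bytes per row:", sparse_vector_bytes(nnz))
    print("blow-up factor:      ", round(densify_blowup(size, nnz), 1))
    print("dense total (GB):    ", dense_vector_bytes(size) * rows / 1e9)
```

Under these assumptions the per-row cost scales with the full vector width rather than with the number of non-zeros, which is why a handful of added dense features can stall actions that worked before.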

  • 1 kudos
Paddy_chu
by New Contributor III
  • 4171 Views
  • 1 replies
  • 0 kudos

How to transpose spark dataframe using R API?

Hello, I recently discovered the sparklyr package and found it quite useful. After setting up the Spark connection, I can apply dplyr functions to manipulate large tables. However, it seems that any functions outside of dplyr cannot be used on Spark v...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Greetings @Paddy_chu ,    You’re right that sparklyr gives you most dplyr verbs on Spark, but many tidyr verbs (including pivot_wider/pivot_longer) aren’t translated to Spark SQL and thus won’t run lazily on Spark tables. The practical options are to...

  • 0 kudos
moh3th1
by New Contributor II
  • 4590 Views
  • 1 replies
  • 2 kudos

Resolved! Experiences with CatBoost Spark Integration in Production on Databricks?

Hi Community, I am currently evaluating various gradient boosting options on Databricks using production-level data, including the CatBoost Spark integration (ai.catboost:catboost-spark). I would love to hear from others who have successfully used this...

Latest Reply
stbjelcevic
Databricks Employee
  • 2 kudos

Hi @moh3th1 , I can't personally speak to using CatBoost, but I can discuss preferred libraries and recommendations per approach with various gradient-boosting libraries within Databricks. Preferred for robust distributed GBM on Databricks: XGBoost ...

  • 2 kudos
shubham_lekhwar
by New Contributor
  • 4360 Views
  • 1 replies
  • 1 kudos

Resolved! MLflow Nested run with applyInPandas does not execute

I am trying to train a forecasting model along with hyperparameter tuning using Hyperopt. I have multiple time series, one per "KEY", and I want to train a separate model for each. To do this I am using Spark's applyInPandas to tune and train a model for ea...

Latest Reply
stbjelcevic
Databricks Employee
  • 1 kudos

Hi @shubham_lekhwar , This is a common context-passing issue when using Spark with MLflow. The problem is that the nested=True flag in mlflow.start_run relies on an active run being present in the current process context. Your Parent_RUN is active on...
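One common workaround for this kind of context problem, sketched here as an assumption rather than as the fix the reply goes on to describe, is to pass the parent run ID to the workers explicitly and link each child run through the `mlflow.parentRunId` tag instead of relying on `nested=True`. The names `child_run_tags`, `train_one_key`, and the `series_key` tag are hypothetical; the sketch also assumes the workers can reach the tracking server with valid credentials.

```python
from typing import Dict, Optional

# Tag key MLflow uses to record a run's parent.
PARENT_RUN_TAG = "mlflow.parentRunId"


def child_run_tags(parent_run_id: str, key: Optional[str] = None) -> Dict[str, str]:
    """Build the tags that link a worker-side run to a driver-side parent run."""
    if not parent_run_id:
        raise ValueError("parent_run_id must be a non-empty run ID")
    tags = {PARENT_RUN_TAG: parent_run_id}
    if key is not None:
        tags["series_key"] = str(key)  # hypothetical tag, for filtering only
    return tags


def train_one_key(pdf, parent_run_id: str, experiment_id: str):
    """Hypothetical per-group function meant to run on a worker.

    `pdf` is the pandas DataFrame for one "KEY" group. The actual tuning and
    training are omitted; only the run linkage is shown.
    """
    import mlflow  # imported lazily so it is resolved on the worker

    key = str(pdf["KEY"].iloc[0])
    with mlflow.start_run(
        experiment_id=experiment_id,
        run_name=f"child_{key}",
        tags=child_run_tags(parent_run_id, key),
    ):
        mlflow.log_param("n_rows", len(pdf))
    return pdf
```

On the driver, the parent run ID and experiment ID would be captured inside the parent `mlflow.start_run()` block and closed over, for example `df.groupBy("KEY").applyInPandas(lambda pdf: train_one_key(pdf, parent_id, exp_id), schema=df.schema)`, so that no worker depends on the driver's active-run state.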

  • 1 kudos
Paddy_chu
by New Contributor III
  • 4444 Views
  • 1 replies
  • 1 kudos

Databricks app and R shiny

Hello, I've been testing the Databricks app and have the following questions: 1. My organization currently uses Catalog Explorer instead of Unity Catalog. I want to develop a Shiny app and was able to run code from the template under New > App. However, t...

Latest Reply
stbjelcevic
Databricks Employee
  • 1 kudos

Thanks for the detailed context—here’s how to get Shiny-based apps working with your current setup and data. 1) Accessing data from “Catalog Explorer” in Databricks Apps A few key points about the Databricks Apps environment and data access: Apps su...

  • 1 kudos