Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

Forum Posts

pfzoz
by Visitor
  • 58 Views
  • 1 reply
  • 0 kudos

Using Qwen with vLLM

There are many conflicts and dependency issues when trying to install vLLM and use the Qwen models (on serverless), even the v2 families. I tried following this guide https://docs.databricks.com/aws/en/machine-learning/sgc-examples/tutorials/sgc-raydat...

Latest Reply
anuj_lathi
Databricks Employee
  • 0 kudos

Hi @pfzoz -- the "Model architectures failed to be inspected" error you are hitting is a well-known compatibility issue between vLLM, the transformers library, and the Qwen2/2.5-VL model family. The root cause is that vLLM's model registry subprocess...

thomasm
by New Contributor III
  • 654 Views
  • 6 replies
  • 2 kudos

MLFlow Detailed Trace view doesn't work in some workspaces

I've created a Databricks Model Serving Endpoint which serves an MLFlow Pyfunc model. The model uses langchain and I'm using mlflow.langchain.autolog(). At my company we have some production(-like) workspaces where users cannot e.g. run Notebooks and ...

Latest Reply
thomasm
New Contributor III
  • 2 kudos

Since this week the issue has gone away on its own. I don't know why; I think Databricks changed something under the hood. The Databricks platform engineering team at my company couldn't figure out the cause, and also our resident Databricks solution archite...

5 More Replies
thomas_berry
by Databricks Partner
  • 249 Views
  • 3 replies
  • 0 kudos

Resolved! TrainingArguments fails

Hello, I am working on an ML project for text classification and I have a problem. The following piece of code stalls completely. It prints 'start' but never 'end'. from transformers import TrainingArguments print("start") args = TrainingArguments(outpu...

Latest Reply
thomas_berry
Databricks Partner
  • 0 kudos

Hello @lingareddy_Alva, thank you for your reply. I have since been given a cluster with the ML Runtime and the code now works, so I consider the problem solved.

2 More Replies
TomBurns
by New Contributor
  • 2127 Views
  • 2 replies
  • 0 kudos

Identity Resolution

Looking for best solutions for identity resolution. I already have deterministic matching. Exploring probabilistic solutions. Any advice for me?

Latest Reply
Sonal
New Contributor III
  • 0 kudos

Check out open-source Zingg, which runs natively within Databricks: https://github.com/zinggAI/zingg

1 More Replies
ruia-dojo
by New Contributor
  • 158 Views
  • 1 reply
  • 0 kudos

Job compute fails due to BQ permissions

Hello, My Databricks workspace is associated with GCP project analytics, but my team and I mostly work on GCP project data-science, which contains the only BQ dataset that we have write access to. I'm trying to automate a pipeline to run on job compute a...

Latest Reply
MoJaMa
Databricks Employee
  • 0 kudos

What identity is the job running as? Do you have any settings on the all-purpose cluster that you are not setting on the job cluster? Maybe you need to grant roles/bigquery.jobUser on project analytics to the job compute service account?
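One way to picture the split described in this reply (a sketch; the project, dataset, and table names below are illustrative assumptions, not from the thread): the job's service account needs bigquery.jobUser in the project that runs and bills the query job, while the dataset can live in a different project that BigQuery lets you name explicitly in the table ID.

```python
# Sketch of the cross-project BigQuery layout described above; all project,
# dataset, and table names here are illustrative assumptions.

def fq_table(project: str, dataset: str, table: str) -> str:
    """Build a fully qualified BigQuery table ID. Naming the data project
    explicitly lets it differ from the project where the query job runs and
    is billed (the one whose service account needs roles/bigquery.jobUser)."""
    return f"{project}.{dataset}.{table}"

# With the Spark BigQuery connector this would look like (needs a live cluster):
# df = (spark.read.format("bigquery")
#       .option("table", fq_table("data-science", "my_dataset", "events"))
#       .load())
print(fq_table("data-science", "my_dataset", "events"))  # -> data-science.my_dataset.events
```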

knight22-21
by New Contributor II
  • 637 Views
  • 3 replies
  • 1 kudos

Resolved! Unable to Access Azure Blob Storage from Databricks Community Edition Notebook

Hi everyone, I'm currently using the Databricks Community Edition and trying to access data stored in Azure Blob Storage from my .ipynb notebook. The storage account is part of my student free Azure subscription. However, I'm not able to establish a co...

Latest Reply
emma_s
Databricks Employee
  • 1 kudos

Hi, I think you are referring to the Databricks Free edition, which doesn't support connections to external storage such as Azure Blob Storage. Thanks, Emma

2 More Replies
rtglorenabasul
by New Contributor
  • 401 Views
  • 1 reply
  • 1 kudos

Resolved! Issue Running Job on Serverless GPU

I have a job that runs a notebook; the notebook uses serverless GPU (A10) and it keeps failing with "Run failed with error message Cluster 'xxxxxxxxxxx' was terminated. Reason: UNKNOWN (SUCCESS)". The base environment is 'Standard v4' and I have tr...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 1 kudos

Hi @rtglorenabasul, Thanks for sharing the details. The behaviour you’re seeing is consistent with an issue in how the job is bringing up Serverless GPU compute, rather than with the notebook code itself. Having done some checks, that error usually m...

KyraHinnegan
by New Contributor II
  • 467 Views
  • 1 reply
  • 1 kudos

Resolved! Which types of model serving endpoints have health metrics available?

I am retrieving a list of model serving endpoints for my workspace via this API: https://docs.databricks.com/api/workspace/servingendpoints/list and then going to retrieve health metrics for each one with: https://[DATABRICKS_HOST]/api/2.0/serving-end...

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Hey @KyraHinnegan, I did some digging and here is what I found. Hopefully it helps you understand a bit more about what is going on. At a high level, not every endpoint type exposes infrastructure health metrics via /metrics. What you’re seeing with ...

jayshan
by New Contributor III
  • 991 Views
  • 4 replies
  • 3 kudos

Resolved! Generic Spark Connect ML error. The fitted or loaded model size is too big.

When I train models in the serverless environment V4 (Premium Plan), the system occasionally returns the error message listed below, especially after running the model training code multiple times. We have tried creating new serverless sessions, whic...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 3 kudos

Hi @jayshan, I'm sorry for the delayed response to your question. And, thanks for the extra details and for sharing your workaround. This behaviour is tied to how Spark Connect ML works in serverless mode, rather than a traditional JVM/GC leak. On se...

3 More Replies
RodrigoE
by New Contributor III
  • 1200 Views
  • 5 replies
  • 3 kudos

Resolved! Vector search index initialization very slow

Hello, I am creating a vector search index and selected Compute embeddings for a delta table with 19M records. The delta table has only two columns: ID (selected as index) and Name (selected for embedding). The embedding model is databricks-gte-large-en. Ind...

Machine Learning
index
search
vector
vector index
Vector Search
Latest Reply
BadrErraji
New Contributor III
  • 3 kudos

Why doesn't the deltaSync compute the embeddings in parallel instead of sequentially? That's a major gap in the architecture, no?

4 More Replies
fede_bia
by Databricks Partner
  • 420 Views
  • 1 reply
  • 0 kudos

Databricks Model Serving Scaling Logic

Hi everyone, I'm seeking technical clarification on how Databricks Model Serving handles request queuing and autoscaling for CPU-intensive tasks. I am deploying a custom model for text and image extraction from PDFs (using Tesseract), and I'm struggli...

Latest Reply
AbhaySingh
Databricks Employee
  • 0 kudos

TLDR: Pre-provision min_provisioned_concurrency ≥ your peak parallel requests (in multiples of 4) with scale-to-zero disabled, and chunk large PDFs in your model code to bound per-request latency — reactive autoscaling can't help CPU-bound workloads ...
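The sizing rule in this reply can be sketched as a small helper. Note the multiples-of-4 granularity is taken from the reply itself, not from official documentation, so treat it as an assumption:

```python
import math

def min_provisioned_concurrency(peak_parallel_requests: int) -> int:
    """Round the expected peak number of parallel requests up to the next
    multiple of 4, matching the provisioning granularity the reply describes."""
    if peak_parallel_requests < 1:
        raise ValueError("expected a positive request count")
    return 4 * math.ceil(peak_parallel_requests / 4)

print(min_provisioned_concurrency(10))  # -> 12: the smallest multiple of 4 covering a peak of 10
```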

boskicl
by New Contributor III
  • 875 Views
  • 4 replies
  • 1 kudos

Resolved! mlflow spark load_model fails with FMRegressor Model error on Unity Catalog

We trained a Spark ML FMRegressor model and registered it to Unity Catalog via MLflow. When attempting to load it back using mlflow.spark.load_model, we get an OSError: [Errno 5] Input/output error: '/dbfs/tmp' regardless of what dfs_tmpdir path is pa...

Latest Reply
SteveOstrowski
Databricks Employee
  • 1 kudos

Hi, This is a well-documented issue that comes down to cluster access mode and how mlflow.spark.load_model handles temporary file storage. Let me break down both problems you are hitting and provide solutions. PROBLEM 1: OSError: [Errno 5] Input/outp...

3 More Replies
d_szepietowska
by New Contributor II
  • 906 Views
  • 3 replies
  • 4 kudos

Why ENABLE_MLFLOW_TRACING does not work for serving endpoint?

I would like to ask if you have experienced a similar issue recently. I trained an sklearn model and logged it with fe.log_model for automatic feature lookup. Online feature tables were published with the currently recommended approach, whi...

Latest Reply
SteveOstrowski
Databricks Employee
  • 4 kudos

Hi @d_szepietowska, Thank you for the detailed investigation, especially the side-by-side comparison between the legacy online table (MySQL) and the Lakebase-backed Online Feature Store. That is very helpful for narrowing down the behavior. UNDERSTAN...

2 More Replies
Dali1
by New Contributor III
  • 541 Views
  • 4 replies
  • 1 kudos

Params with databricks Asset bundles

Hello, I am using Databricks Asset Bundles to create jobs for machine learning pipelines. My problem is that I am using SparkPython tasks and defining params inside those. When the job is created, it is created with some params. When I want to run the same jo...

Latest Reply
SteveOstrowski
Databricks Employee
  • 1 kudos

Hi @Dali1, Great questions -- parameterizing ML pipelines in DABs is something a lot of people wrestle with, so let me break down the options. THE SHORT ANSWER No, you should not have to update the job definition every time you want different paramet...

3 More Replies
lschneid
by New Contributor II
  • 542 Views
  • 1 reply
  • 1 kudos

Replacing a Monolithic MLflow Serving Pipeline with Composed Models in Databricks

Hi everyone, I'm a senior MLE and recently joined a company where all data science and ML workloads run on Databricks. My background is mostly MLOps on Kubernetes, so I'm currently ramping up on Databricks and trying to improve the architecture of som...

Latest Reply
SteveOstrowski
Databricks Employee
  • 1 kudos

Hi @lschneid, This is a common architectural evolution for ML serving on Databricks, and the platform gives you several good options for decomposing a monolithic serving pipeline into cleaner, more maintainable components. Here is a breakdown of the ...
