Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

Forum Posts

rtglorenabasul
by New Contributor
  • 1383 Views
  • 1 replies
  • 1 kudos

Resolved! Issue Running Job on Serverless GPU

I have a job that runs a notebook, the notebook uses serverless GPU (A10) and it keeps failing with a "Run failed with error message Cluster 'xxxxxxxxxxx' was terminated. Reason: UNKNOWN (SUCCESS)". The base environment is 'Standard v4' and I have tr...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 1 kudos

Hi @rtglorenabasul, Thanks for sharing the details. The behaviour you’re seeing is consistent with an issue in how the job is bringing up Serverless GPU compute, rather than with the notebook code itself. Having done some checks, that error usually m...

  • 1 kudos
jayshan
by New Contributor III
  • 2276 Views
  • 4 replies
  • 3 kudos

Resolved! Generic Spark Connect ML error. The fitted or loaded model size is too big.

When I train models in the serverless environment V4 (Premium Plan), the system occasionally returns the error message listed below, especially after running the model training code multiple times. We have tried creating new serverless sessions, whic...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 3 kudos

Hi @jayshan, I'm sorry for the delayed response to your question. And, thanks for the extra details and for sharing your workaround. This behaviour is tied to how Spark Connect ML works in serverless mode, rather than a traditional JVM/GC leak. On se...

  • 3 kudos
3 More Replies
RodrigoE
by New Contributor III
  • 2189 Views
  • 5 replies
  • 3 kudos

Resolved! Vector search index initialization very slow

Hello, I am creating a vector search index and selected Compute embeddings for a Delta table with 19M records. The Delta table has only two columns: ID (selected as index) and Name (selected for embedding). The embedding model is databricks-gte-large-en. Ind...

Machine Learning
index
search
vector
vector index
Vector Search
Latest Reply
BadrErraji
New Contributor III
  • 3 kudos

Why doesn't Delta Sync compute the embeddings in parallel instead of sequentially? That's a major gap in the architecture, no?

  • 3 kudos
4 More Replies
fede_bia
by Databricks Partner
  • 1350 Views
  • 1 replies
  • 0 kudos

Databricks Model Serving Scaling Logic

Hi everyone, I’m seeking technical clarification on how Databricks Model Serving handles request queuing and autoscaling for CPU-intensive tasks. I am deploying a custom model for text and image extraction from PDFs (using Tesseract), and I’m struggli...

Latest Reply
AbhaySingh
Databricks Employee
  • 0 kudos

TLDR: Pre-provision min_provisioned_concurrency ≥ your peak parallel requests (in multiples of 4) with scale-to-zero disabled, and chunk large PDFs in your model code to bound per-request latency; reactive autoscaling can't help CPU-bound workloads ...

  • 0 kudos
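
As a rough illustration of the two ideas in the TLDR above (bounding per-request work by chunking, and sizing concurrency in multiples of 4), here is a minimal Python sketch. The function names `page_chunks` and `round_up_to_multiple` are hypothetical helpers, not part of any Databricks or MLflow API, and the chunk size is an assumption to tune against your own latency budget.

```python
from typing import Iterator, Tuple


def page_chunks(num_pages: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, end) page ranges that cover num_pages pages."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, num_pages, chunk_size):
        yield start, min(start + chunk_size, num_pages)


def round_up_to_multiple(value: int, multiple: int = 4) -> int:
    """Round value up to the nearest multiple (the reply mentions multiples of 4)."""
    # Ceiling division via negated floor division, then scale back up.
    return -(-value // multiple) * multiple


# Hypothetical usage: a 10-page PDF processed 4 pages at a time,
# and a peak of 10 parallel requests rounded up for provisioning.
chunks = list(page_chunks(10, 4))
concurrency = round_up_to_multiple(10)
```

Each chunk can then be passed to the OCR step separately, so no single request spends longer than one chunk's worth of CPU time.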
boskicl
by New Contributor III
  • 1775 Views
  • 4 replies
  • 1 kudos

Resolved! mlflow spark load_model fails with FMRegressor Model error on Unity Catalog

We trained a Spark ML FMRegressor model and registered it to Unity Catalog via MLflow. When attempting to load it back using mlflow.spark.load_model, we get an OSError: [Errno 5] Input/output error: '/dbfs/tmp' regardless of what dfs_tmpdir path is pa...

Latest Reply
SteveOstrowski
Databricks Employee
  • 1 kudos

Hi, This is a well-documented issue that comes down to cluster access mode and how mlflow.spark.load_model handles temporary file storage. Let me break down both problems you are hitting and provide solutions. PROBLEM 1: OSError: [Errno 5] Input/outp...

  • 1 kudos
3 More Replies
d_szepietowska
by New Contributor II
  • 1762 Views
  • 3 replies
  • 4 kudos

Why ENABLE_MLFLOW_TRACING does not work for serving endpoint?

I would like to ask whether you have recently experienced an issue similar to mine. I trained an sklearn model and logged it with fe.log_model for automatic feature lookup. Online feature tables were published with the currently recommended approach, whi...

Latest Reply
SteveOstrowski
Databricks Employee
  • 4 kudos

Hi @d_szepietowska, Thank you for the detailed investigation, especially the side-by-side comparison between the legacy online table (MySQL) and the Lakebase-backed Online Feature Store. That is very helpful for narrowing down the behavior. UNDERSTAN...

  • 4 kudos
2 More Replies
Dali1
by New Contributor III
  • 1676 Views
  • 4 replies
  • 1 kudos

Params with Databricks Asset Bundles

Hello, I am using Databricks Asset Bundles to create jobs for machine learning pipelines. My problem is that I am using SparkPython tasks and defining params inside them. When the job is created, it is created with some params. When I want to run the same jo...

Latest Reply
SteveOstrowski
Databricks Employee
  • 1 kudos

Hi @Dali1, Great questions -- parameterizing ML pipelines in DABs is something a lot of people wrestle with, so let me break down the options. THE SHORT ANSWER No, you should not have to update the job definition every time you want different paramet...

  • 1 kudos
3 More Replies
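
The reply above is truncated, so the following is only a sketch of one common pattern, under the assumption that the task's parameters reach the Python script as ordinary command-line arguments. The flag names `--learning-rate` and `--max-depth` are hypothetical. Giving every flag a default means the same script runs unchanged whether or not a particular run overrides a value.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse job parameters; argv=None falls back to sys.argv[1:]."""
    parser = argparse.ArgumentParser(description="Hypothetical training entry point")
    # Defaults apply when the job run does not override a parameter.
    parser.add_argument("--learning-rate", type=float, default=0.01)
    parser.add_argument("--max-depth", type=int, default=5)
    return parser.parse_args(argv)


# Hypothetical usage: one value overridden for this run, the other left at its default.
args = parse_args(["--learning-rate", "0.1"])
```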
lschneid
by New Contributor II
  • 832 Views
  • 1 replies
  • 1 kudos

Replacing a Monolithic MLflow Serving Pipeline with Composed Models in Databricks

Hi everyone, I’m a senior MLE and recently joined a company where all data science and ML workloads run on Databricks. My background is mostly MLOps on Kubernetes, so I’m currently ramping up on Databricks and trying to improve the architecture of som...

Latest Reply
SteveOstrowski
Databricks Employee
  • 1 kudos

Hi @lschneid, This is a common architectural evolution for ML serving on Databricks, and the platform gives you several good options for decomposing a monolithic serving pipeline into cleaner, more maintainable components. Here is a breakdown of the ...

  • 1 kudos
askaditya
by New Contributor II
  • 1217 Views
  • 2 replies
  • 1 kudos
Latest Reply
SteveOstrowski
Databricks Employee
  • 1 kudos

Hi @askaditya, Yes, this is possible. You can connect Microsoft Copilot Studio to a Databricks Genie space on AWS by using the Genie Conversation API and Copilot Studio's HTTP Request node (or a Power Automate cloud flow). Here is how the pieces fit ...

  • 1 kudos
1 More Replies
fede_bia
by Databricks Partner
  • 1347 Views
  • 1 replies
  • 1 kudos

Resolved! Model Serving Only Shows WARNING/ERROR Logs

Hi everyone, I’m deploying a custom model using mlflow.pyfunc.PythonModel in Databricks Model Serving. Inside my wrapper code, I configured logging as follows: logging.basicConfig( stream=sys.stdout, level=logging.INFO, format='%(asctime)s ...

Latest Reply
SteveOstrowski
Databricks Employee
  • 1 kudos

@fede_bia This is worth walking through carefully. This is a common source of confusion when deploying custom models on Databricks Model Serving. SHORT ANSWER The default root logging level for Model Serving endpoints is set to WARNING. That is why y...

  • 1 kudos
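
One general Python detail is relevant to the question above: `logging.basicConfig` does nothing when the root logger already has handlers (unless `force=True` is passed), so a call inside wrapper code can be silently ignored. The sketch below configures a named logger with its own level and handler instead. The logger name `my_model` is hypothetical, and whether INFO records then appear in the endpoint's logs is an assumption to verify on your own endpoint, since the reply above is truncated and may recommend a different approach.

```python
import logging
import sys


def get_model_logger(name: str = "my_model", level: int = logging.INFO) -> logging.Logger:
    """Return a named logger that emits at `level` regardless of the root logger's level."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    # Attach the handler only once, so repeated calls do not duplicate output.
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    # Do not pass records up to the root logger, avoiding duplicate lines there.
    logger.propagate = False
    return logger


logger = get_model_logger()
logger.info("model wrapper initialised")
```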
Bodevan
by New Contributor II
  • 2574 Views
  • 1 replies
  • 1 kudos

Resolved! Import CV2 results in Fatal Error

Hey there,
The setup for getting the error can be very basic:
- Start a runtime (e.g. 17.3 LTS ML with a Standard_NV36ads_A10_v5 [A10] 440 GB memory, 1 GPU)
- In a notebook, install the cv2 package like this: %pip install opencv-python
This seems to install...

Latest Reply
SteveOstrowski
Databricks Employee
  • 1 kudos

Hi @Bodevan, This is a common scenario when installing the standard opencv-python package on Databricks (or any headless server environment). The root cause is that opencv-python ships with GUI dependencies (Qt and X11 libraries) that are not availab...

  • 1 kudos
venkatkittu
by New Contributor
  • 825 Views
  • 1 replies
  • 0 kudos

Facing 132 error in model serving while using faiss

Hi, I have been trying to deploy a REST endpoint for my application using the Model Serving feature. I have registered my model in Unity Catalog, and serving the model succeeds when I remove the code related to faiss, and when ...

Latest Reply
SteveOstrowski
Databricks Employee
  • 0 kudos

Hi @venkatkittu, I can help with this. The error code 132 you are seeing is actually a Unix signal, and understanding it will point you directly to the fix. WHAT ERROR CODE 132 MEANS In Unix systems, when a worker process exits with code 132, that is...

  • 0 kudos
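
For readers who want to check the signal interpretation themselves: by the usual shell convention, an exit status above 128 means the process was terminated by signal number (status - 128), so 132 corresponds to signal 4. The helper below is a small sketch using only the standard library; the function name `signal_from_exit_code` is hypothetical.

```python
import signal
from typing import Optional


def signal_from_exit_code(code: int) -> Optional[str]:
    """Map a shell-style exit status (128 + N) to the name of signal N, if any."""
    if code <= 128:
        return None  # Ordinary exit status, not a signal-induced termination.
    try:
        return signal.Signals(code - 128).name
    except ValueError:
        return None  # No signal with that number on this platform.


print(signal_from_exit_code(132))  # signal 4 is SIGILL (illegal instruction)
```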
Vlad96
by Databricks Partner
  • 2592 Views
  • 3 replies
  • 0 kudos

My model serving endpoint is never getting created

Hello, I'm trying to serve a Pyfunc model on a Databricks endpoint, but for some reason it is getting stuck in a pending status. It's been 4 hours since the endpoint deployment started. If I check the build logs, no error appears whatsoever #23 0.133 chan...

Latest Reply
ThijsBertramCZ
New Contributor II
  • 0 kudos

Have you ever found a fix? I am experiencing the same issue.

  • 0 kudos
2 More Replies
excavator-matt
by Contributor III
  • 6190 Views
  • 9 replies
  • 3 kudos

Resolved! What is the most efficient way of running sentence-transformers on a Spark DataFrame column?

We're trying to run the bundled sentence-transformers library from SBert in a notebook running Databricks ML 16.4 on an AWS g4dn.2xlarge [T4] instance. However, we're experiencing out-of-memory crashes and are wondering what the optimal way to run sentenc...

Machine Learning
memory issues
sentence-transformers
vector embeddings
Latest Reply
excavator-matt
Contributor III
  • 3 kudos

Also, I forgot to mention the workaround solution for the first approach. If you write to parquet in a volume, you can then convert it back to a Delta table in a later cell. Instead of this: projects_pdf.to_delta("europe_prod_catalog.ad_hoc.project_reco...

  • 3 kudos
8 More Replies
Dali1
by New Contributor III
  • 1112 Views
  • 2 replies
  • 2 kudos

Resolved! Python environment DAB

Hello, I am building a pipeline using DAB. The first step of the DAB is to deploy my library as a wheel. The pipeline is run on a shared Databricks cluster. When I run the job, I see that the job is not using exactly the requirements I specified but it us...

Latest Reply
stbjelcevic
Databricks Employee
  • 2 kudos

Hi @Dali1, +1 to @pradeep_singh, on shared clusters, tasks inherit cluster-installed libraries, so you won’t get a clean, versioned environment. Use a job cluster (new_cluster) or switch to serverless jobs with an environment per task for isolation. ...

  • 2 kudos
1 More Replies