Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

Forum Posts

KyraHinnegan
by New Contributor II
  • 513 Views
  • 1 reply
  • 1 kudos

Resolved! Which types of model serving endpoints have health metrics available?

I am retrieving a list of model serving endpoints for my workspace via this API: https://docs.databricks.com/api/workspace/servingendpoints/list and then going to retrieve health metrics for each one with: https://[DATABRICKS_HOST]/api/2.0/serving-end...

Latest Reply
Louis_Frolio
Databricks Employee

Hey @KyraHinnegan, I did some digging and here is what I found. Hopefully it helps you understand a bit more about what is going on. At a high level, not every endpoint type exposes infrastructure health metrics via /metrics. What you’re seeing with ...

  • 1 kudos
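The two calls the question describes can be sketched end to end. This is a minimal illustration, not official sample code: the `/metrics` suffix is an assumption based on the truncated URL in the post, the env-var names are placeholders, and endpoint types that expose no infrastructure health metrics will simply return an error from that path.

```python
import json
import os
import urllib.request


def metrics_url(host: str, endpoint_name: str) -> str:
    """Build the per-endpoint metrics URL (assumed path, per the post)."""
    return f"{host}/api/2.0/serving-endpoints/{endpoint_name}/metrics"


def list_endpoints(host: str, token: str) -> list:
    """Return the serving endpoints visible in the workspace via the List API."""
    req = urllib.request.Request(
        f"{host}/api/2.0/serving-endpoints",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp).get("endpoints", [])


# Demo only runs when credentials are present in the environment.
host = os.environ.get("DATABRICKS_HOST")
token = os.environ.get("DATABRICKS_TOKEN")
if host and token:
    for ep in list_endpoints(host, token):
        print(ep["name"], "->", metrics_url(host, ep["name"]))
```

As the accepted answer notes, a failure on the metrics URL for some endpoints is expected behavior, not necessarily a bug in the loop above.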
jayshan
by New Contributor III
  • 1050 Views
  • 4 replies
  • 3 kudos

Resolved! Generic Spark Connect ML error. The fitted or loaded model size is too big.

When I train models in the serverless environment V4 (Premium Plan), the system occasionally returns the error message listed below, especially after running the model training code multiple times. We have tried creating new serverless sessions, whic...

Latest Reply
Ashwin_DSA
Databricks Employee

Hi @jayshan, I'm sorry for the delayed response to your question. And, thanks for the extra details and for sharing your workaround. This behaviour is tied to how Spark Connect ML works in serverless mode, rather than a traditional JVM/GC leak. On se...

  • 3 kudos
3 More Replies
tonybenzu99
by New Contributor II
  • 1646 Views
  • 2 replies
  • 3 kudos

Resolved! Is Delta Lake deeply tested in Professional Data Engineer Exam?

I wanted to ask people who have already taken the Databricks Certified Professional Data Engineer exam whether Delta Lake is tested in depth or not. While preparing, I’m currently using the Databricks Certified Professional Data Engineer sample quest...

Latest Reply
lucafredo
New Contributor III

Yes, Delta Lake concepts are an important part of the Databricks Professional Data Engineer exam, but they aren’t tested in extreme depth compared to core Spark transformations and data pipeline design. The exam mainly focuses on practical understand...

  • 3 kudos
1 More Replies
jitenjha11
by Databricks Partner
  • 436 Views
  • 2 replies
  • 3 kudos

Getting error when running databricks deploy bundle command

Hi all, I am trying to implement an MLOps project using the https://github.com/databricks/mlops-stacks repo. I have created Azure Databricks with Premium (+ Role-based access controls) and am following bundle creation and deploy using the URL: http...

Latest Reply
iyashk-DB
Databricks Employee

This is expected behavior with mlops-stacks and not an issue with your Terraform version or the CLI. The main problem is that your Azure Databricks workspace does not have Unity Catalog enabled or assigned. The mlops-stacks templates assume Unity Cat...

  • 3 kudos
1 More Replies
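Once the workspace has a Unity Catalog metastore attached, as the reply requires, a minimal bundle target looks roughly like the fragment below. The bundle name and host are placeholders, and the exact keys your mlops-stacks template generates may differ:

```yaml
# Sketch of a minimal databricks.yml target -- names and host are placeholders.
bundle:
  name: my_mlops_project

targets:
  dev:
    default: true
    workspace:
      host: https://adb-1234567890123456.7.azuredatabricks.net
```

Running `databricks bundle validate -t dev` before `databricks bundle deploy -t dev` surfaces configuration problems like the missing metastore assignment early.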
kevin11
by Valued Contributor
  • 783 Views
  • 1 reply
  • 0 kudos

AutoML Deprecation?

Hi All, It looks like AutoML is set to be deprecated with the next major version (although the note isn't specific on whether that's 18). I haven't seen any announcement or alert about this impending change. Did I just miss it? I know we have teams using t...

Latest Reply
szymon_dybczak
Esteemed Contributor III

Hi @kevin11, I guess it's their standard library deprecation policy. In their docs they mention that when a library is planned for removal, Databricks takes the following steps to notify customers. So they've added those notes to the AutoML docs. And y...

  • 0 kudos
sharpbetty
by New Contributor II
  • 4540 Views
  • 1 reply
  • 0 kudos

Custom AutoML pipeline: Beyond StandardScaler().

The automated notebook pipeline in an AutoML experiment applies StandardScaler to all numerical features in the training dataset as part of the PreProcessor. See below. But I want a more nuanced and varied treatment of my numeric values (e.g. I have l...

Latest Reply
Louis_Frolio
Databricks Employee

Greetings @sharpbetty, great question! Databricks AutoML's "glass box" approach actually gives you several options to customize preprocessing beyond the default StandardScaler. Here are two practical approaches: Option A: Pre-process Features Before ...

  • 0 kudos
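The "edit the generated notebook" route the reply describes boils down to replacing the blanket StandardScaler with per-column treatment. Here is a dependency-free sketch of that idea; in the real generated notebook you would express the same split with scikit-learn's ColumnTransformer, and the column groupings below are hypothetical:

```python
import math
from statistics import mean, pstdev


def preprocess(rows, log_idx, zscore_idx):
    """Per-column preprocessing: log1p for skewed columns, z-score for the
    rest -- instead of one StandardScaler applied to every numeric column."""
    cols = [list(c) for c in zip(*rows)]          # column-major view
    for i in log_idx:
        cols[i] = [math.log1p(v) for v in cols[i]]
    for i in zscore_idx:
        m, s = mean(cols[i]), pstdev(cols[i])
        cols[i] = [(v - m) / s for v in cols[i]]
    return [list(r) for r in zip(*cols)]          # back to row-major
```

The same function shape drops into the generated notebook's preprocessing cell, which is exactly what the glass-box workflow is designed to allow.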
dkxxx-rc
by Contributor
  • 4687 Views
  • 2 replies
  • 4 kudos

Resolved! AutoML master notebook failing

I have recently been able to run AutoML successfully on a certain dataset. But it has just failed on a second dataset of similar construction, before being able to produce any machine learning training runs or output. The Experiments page says: ```Mo...

Latest Reply
stbjelcevic
Databricks Employee

Hi @dkxxx-rc , Thanks for the detailed context. This error is almost certainly coming from AutoML’s internal handling of imbalanced data and sampling, not your dataset itself. The internal column _automl_sample_weight_0000 is created by AutoML when i...

  • 4 kudos
1 More Replies
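Since the failure traces to AutoML's internal `_automl_sample_weight_0000` column, one defensive step before launching an experiment is to scan the input schema for anything that could collide with that internal prefix. This is my suggestion, not an official API:

```python
def automl_column_conflicts(columns):
    """Flag columns that collide with AutoML's internal naming scheme.

    AutoML injects helper columns such as `_automl_sample_weight_0000`
    when it samples or balances data; a pre-existing column with that
    prefix can confuse the run.  (Defensive heuristic, not an official check.)
    """
    return [c for c in columns if c.startswith("_automl_")]
```

Renaming any flagged columns before calling AutoML removes one easy-to-miss cause of this class of failure, though per the reply the root cause here was in AutoML's own sampling path.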
SreeRam
by New Contributor
  • 4166 Views
  • 1 reply
  • 0 kudos

Patient Risk Score based on health history: Unable to create data folder for artifacts in S3 bucket

Hi All, we're using the git project below to build a PoC on the concept of "Patient-Level Risk Scoring Based on Condition History": https://github.com/databricks-industry-solutions/hls-patient-risk. I was able to import the solution into Databricks and ru...

Latest Reply
Louis_Frolio
Databricks Employee

Greetings @SreeRam , here are some suggestions for you. Based on the error you're encountering with the hls-patient-risk solution accelerator, this is a common issue related to MLflow artifact access and storage configuration in Databricks. The probl...

  • 0 kudos
sangramraje
by New Contributor
  • 4615 Views
  • 1 reply
  • 1 kudos

AutoML "need to sample" not working as expected

tl;dr: When the AutoML run realizes it needs to do sampling because the driver/worker node memory is not enough to load/process the entire dataset, it fails. A sample weight column is NOT provided by me, but I believe somewhere in the process the...

Latest Reply
Louis_Frolio
Databricks Employee

Hey @sangramraje, sorry for the late response. I wanted to check in to see if this is still an issue with the latest release. Please let me know. Cheers, Louis.

  • 1 kudos
spearitchmeta
by Contributor
  • 650 Views
  • 1 reply
  • 1 kudos

Resolved! How does Databricks AutoML handle null imputation for categorical features by default?

Hi everyone, I'm using Databricks AutoML (classification workflow) on Databricks Runtime 10.4 LTS ML+, and I'd like to clarify how missing (null) values are handled for categorical (string) columns by default. From the AutoML documentation, I see that:...

Latest Reply
Louis_Frolio
Databricks Employee

Hello @spearitchmeta, I looked internally to see if I could help with this and found some information that will shed light on your question. Here's how missing (null) values in categorical (string) columns are handled in Databricks AutoML on Dat...

  • 1 kudos
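The constant-fill behaviour for string columns can be sketched in isolation. The `"missing"` sentinel below is illustrative only; the notebook AutoML generates on your runtime shows the exact imputer and fill value it actually chose:

```python
def impute_categorical(values, fill="missing"):
    """Replace None/NaN in a string column with a constant sentinel,
    mirroring a constant-fill strategy applied before one-hot encoding.
    (Illustrative sketch -- inspect the generated notebook for the real
    imputer on your runtime.)"""
    def is_missing(v):
        # NaN is the only float that is not equal to itself.
        return v is None or (isinstance(v, float) and v != v)
    return [fill if is_missing(v) else v for v in values]
```

Treating nulls as their own category this way means the downstream encoder sees them as one more level rather than dropping the rows.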
MightyMasdo
by New Contributor III
  • 3849 Views
  • 3 replies
  • 7 kudos

Spark context not implemented Error when using Databricks connect

I am developing an application using Databricks Connect, and when I try to use VectorAssembler I get the error `sc is not None` AssertionError. Is there a workaround for this?

Latest Reply
pibe1
New Contributor II

Ran into exactly the same issue as @Łukasz1. After some googling, I found this SO post explaining the issue: later versions of Databricks Connect no longer support the SparkContext API. Our code is failing because the underlying library is trying to f...

  • 7 kudos
2 More Replies
spearitchmeta
by Contributor
  • 3724 Views
  • 4 replies
  • 3 kudos

Resolved! Data Drift & Model Comparison in Production MLOps: Handling Scale Changes with AutoML

Background: I'm implementing a production MLOps pipeline for part classification using Databricks AutoML. My pipeline automatically retrains models when new data arrives and compares performance with existing production models. The Challenge: I've encount...

Latest Reply
Louis_Frolio
Databricks Employee

Here are my thoughts on the questions you pose. However, it is important that you dig into the documentation to fully understand the capabilities of Lakehouse Monitoring. It will also be helpful if you deploy it to understand the mechanics of how it wo...

  • 3 kudos
3 More Replies
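On the drift side of this thread, one common score you can compute yourself alongside Lakehouse Monitoring's built-in metrics is the Population Stability Index. A dependency-free sketch follows; the bin count and the 1e-6 floor are conventional choices, not Databricks defaults:

```python
import math


def psi(expected, actual, bins=5):
    """Population Stability Index between two numeric samples.
    Larger values indicate a bigger distribution shift; ~0 means no drift."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0  # guard against a zero-width range

    def frac(sample, b):
        left = lo + b * width
        right = lo + (b + 1) * width
        # The last bin is closed on the right so hi itself is counted.
        n = sum(left <= v < right or (b == bins - 1 and v == hi) for v in sample)
        return max(n / len(sample), 1e-6)  # floor avoids log(0)

    return sum(
        (frac(actual, b) - frac(expected, b))
        * math.log(frac(actual, b) / frac(expected, b))
        for b in range(bins)
    )
```

A fixed, pre-agreed score like this also sidesteps the scale-change comparison problem the post raises, since it compares distributions directly rather than model metrics.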
staskh
by Contributor
  • 2002 Views
  • 3 replies
  • 3 kudos

Resolved! Error in automl.regress

Hi, I'm running the example notebook from https://docs.databricks.com/aws/en/machine-learning/automl/regression-train-api on a node with ML cluster 17.0 (includes Apache Spark 4.0.0, Scala 2.13) and getting an error at `from databricks import automl`; summary = ...

Latest Reply
staskh
Contributor

Ilir, greetings! Thank you for the prompt response. Unfortunately, none of the suggested solutions works. I checked with Genie: "The error occurs because databricks-automl is not available for Databricks Runtime 17.0.x. Databricks AutoML is not supported...

  • 3 kudos
2 More Replies
Aravinda
by New Contributor III
  • 1364 Views
  • 5 replies
  • 2 kudos

Resolved! Databricks Machine Learning Practitioner Plan - DBC section unavailability

Hi Everyone, I am not able to locate any DBC folders for the courses in the Machine Learning Practitioner plan. Earlier, we used to have DBC sections where we could access the course and lab materials. Do we have any solution to this? Or can som...

Latest Reply
Aravinda
New Contributor III

Thanks @szymon_dybczak !!

  • 2 kudos
4 More Replies
drii_cavalcanti
by New Contributor III
  • 1912 Views
  • 2 replies
  • 1 kudos

Resolved! Installing opencv-python on DBX

Hi everyone, I was wondering how I can install such a basic Python package on Databricks without running into conflict issues or downgrading to a runtime version lower than 15. Specs: the worker type is g4dn.xlarge [T4]; the runtime is 16.4 LTS (includes...

Latest Reply
szymon_dybczak
Esteemed Contributor III

Hi @drii_cavalcanti, you encountered this issue because opencv-python depends on packages that still require numpy in a version lower than 2. You need to reinstall numpy at a supported version and then try installing the library once again. You can do it usi...

  • 1 kudos
1 More Replies
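A tiny version gate for the fix described in the reply might look like this. The `numpy<2` pin and the restart call mentioned in the docstring mirror the reply's suggestion, but the exact pin should be verified against your runtime's compatibility matrix:

```python
def needs_numpy_downgrade(installed, required_below="2"):
    """Return True when the installed numpy major version is at or above
    the ceiling opencv-python's dependencies can handle.

    In a notebook, the remediation the reply describes is roughly:
        %pip install "numpy<2" opencv-python
        dbutils.library.restartPython()
    (Exact pins depend on your runtime -- verify before relying on them.)
    """
    return int(installed.split(".")[0]) >= int(required_below)
```

Checking `numpy.__version__` with this gate before installing avoids blindly downgrading on runtimes that still ship a compatible numpy.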