Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

Forum Posts

kevin11
by Valued Contributor
  • 29 Views
  • 1 replies
  • 0 kudos

AutoML Deprecation?

Hi All, It looks like AutoML is set to be deprecated with the next major version (although the note isn't specific about whether that's 18). I haven't seen any announcement or alert about this impending change. Did I just miss it? I know we have teams using t...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @kevin11, I guess it's their standard library deprecation policy. In their docs they mention that when a library is planned for removal, Databricks takes the following steps to notify customers. So they've added that note to the AutoML docs. And y...

sharpbetty
by New Contributor II
  • 3793 Views
  • 1 replies
  • 0 kudos

Custom AutoML pipeline: Beyond StandardScaler().

The automated notebook pipeline in an AutoML experiment applies StandardScaler to all numerical features in the training dataset as part of the PreProcessor. See below. But I want a more nuanced and varied treatment of my numeric values (e.g. I have l...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Greetings @sharpbetty, great question! Databricks AutoML's "glass box" approach actually gives you several options to customize preprocessing beyond the default StandardScaler. Here are two practical approaches: Option A: Pre-process Features Before ...
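A minimal sketch of the "pre-process before AutoML" idea mentioned above, assuming the databricks.automl Python API on an ML runtime; the table and column names are hypothetical.

```python
# Apply custom numeric transformations yourself, then hand the already-transformed
# DataFrame to AutoML, so its default StandardScaler has less influence.
from pyspark.sql import functions as F
from databricks import automl

df = spark.table("my_catalog.my_schema.training_data")  # hypothetical table

# Example: log-transform a skewed column and bucket another before AutoML sees them
df_prepared = (
    df.withColumn("price_log", F.log1p(F.col("price")))
      .withColumn("quantity_bucket", (F.col("quantity") / 10).cast("int"))
      .drop("price", "quantity")
)

summary = automl.classify(dataset=df_prepared, target_col="label", timeout_minutes=30)
```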

dkxxx-rc
by Contributor
  • 3778 Views
  • 2 replies
  • 4 kudos

Resolved! AutoML master notebook failing

I have recently been able to run AutoML successfully on a certain dataset. But it has just failed on a second dataset of similar construction, before being able to produce any machine learning training runs or output. The Experiments page says ```Mo...

Latest Reply
stbjelcevic
Databricks Employee
  • 4 kudos

Hi @dkxxx-rc, thanks for the detailed context. This error is almost certainly coming from AutoML's internal handling of imbalanced data and sampling, not your dataset itself. The internal column _automl_sample_weight_0000 is created by AutoML when i...
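Since the reply points at AutoML's internal sample-weight handling, one hedged workaround sketch (not from the reply itself) is to rebalance the classes before calling AutoML so the internal sampling path has less to do; the table and column names are hypothetical.

```python
from databricks import automl

df = spark.table("my_schema.training_data")  # hypothetical table

# Count rows per class and downsample every class to the minority-class size
counts = {row["label"]: row["count"] for row in df.groupBy("label").count().collect()}
minority_size = min(counts.values())
fractions = {label: minority_size / n for label, n in counts.items()}
balanced = df.sampleBy("label", fractions=fractions, seed=42)

summary = automl.classify(dataset=balanced, target_col="label", timeout_minutes=30)
```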

1 More Replies
SreeRam
by New Contributor
  • 3506 Views
  • 1 replies
  • 0 kudos

Patient Risk Score based on health history: Unable to create data folder for artifacts in S3 bucket

Hi All, we're using the Git project below to build a PoC on the concept of "Patient-Level Risk Scoring Based on Condition History": https://github.com/databricks-industry-solutions/hls-patient-risk. I was able to import the solution into Databricks and ru...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Greetings @SreeRam, here are some suggestions for you. Based on the error you're encountering with the hls-patient-risk solution accelerator, this is a common issue related to MLflow artifact access and storage configuration in Databricks. The probl...
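One hedged way to sidestep artifact-location problems like this is to create the MLflow experiment with an explicit artifact_location the cluster can actually write to; the experiment path and bucket below are hypothetical.

```python
import mlflow

experiment_name = "/Users/someone@example.com/hls-patient-risk"               # hypothetical path
artifact_location = "s3://my-writable-bucket/mlflow-artifacts/patient-risk"   # hypothetical bucket

# Create the experiment with an explicit, writable artifact location, then use it
if mlflow.get_experiment_by_name(experiment_name) is None:
    mlflow.create_experiment(experiment_name, artifact_location=artifact_location)
mlflow.set_experiment(experiment_name)

with mlflow.start_run():
    mlflow.log_param("smoke_test", True)
```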

sangramraje
by New Contributor
  • 3902 Views
  • 1 replies
  • 1 kudos

AutoML "need to sample" not working as expected

tl;dr: When the AutoML run realizes it needs to do sampling because the driver/worker node memory is not enough to load/process the entire dataset, it fails. A sample weight column is NOT provided by me, but I believe somewhere in the process the...

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Hey @sangramraje, sorry for the late response. I wanted to check in to see if this is still an issue with the latest release. Please let me know. Cheers, Louis.

spearitchmeta
by Contributor
  • 185 Views
  • 1 replies
  • 1 kudos

Resolved! How does Databricks AutoML handle null imputation for categorical features by default?

Hi everyone, I'm using Databricks AutoML (classification workflow) on Databricks Runtime 10.4 LTS ML+, and I'd like to clarify how missing (null) values are handled for categorical (string) columns by default. From the AutoML documentation, I see that:...

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Hello @spearitchmeta, I looked internally to see if I could help with this and I found some information that will shed light on your question. Here's how missing (null) values in categorical (string) columns are handled in Databricks AutoML on Dat...
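If you would rather not depend on the default behavior at all, a hedged sketch is to impute categorical nulls explicitly before handing the data to AutoML; the table, label column, and "missing" token are hypothetical choices.

```python
from databricks import automl

df = spark.table("my_schema.parts")  # hypothetical table

# Find string columns and replace their nulls with an explicit "missing" token
string_cols = [f.name for f in df.schema.fields if f.dataType.simpleString() == "string"]
df_imputed = df.fillna("missing", subset=string_cols)

summary = automl.classify(dataset=df_imputed, target_col="label", timeout_minutes=30)
```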

MightyMasdo
by New Contributor III
  • 3331 Views
  • 3 replies
  • 7 kudos

Spark context not implemented error when using Databricks Connect

I am developing an application using Databricks Connect, and when I try to use VectorAssembler I get the error "sc is not None" (AssertionError). Is there a workaround for this?

Latest Reply
pibe1
New Contributor II
  • 7 kudos

Ran into exactly the same issue as @Łukasz1. After some googling, I found this SO post explaining the issue: later versions of Databricks Connect no longer support the SparkContext API. Our code is failing because the underlying library is trying to f...
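A heavily hedged workaround sketch (not from the reply): under Spark Connect-based Databricks Connect, the feature column can be assembled with SQL functions instead of pyspark.ml.feature.VectorAssembler, which asserts on a live SparkContext. Column and table names are hypothetical, and this only helps if an array column is acceptable downstream.

```python
from pyspark.sql import functions as F

feature_cols = ["f1", "f2", "f3"]             # hypothetical feature columns
df = spark.table("my_schema.features")        # hypothetical table

# Build an array column instead of a Vector column; no SparkContext required
df_assembled = df.withColumn("features", F.array(*[F.col(c) for c in feature_cols]))

# If a true ML vector is required downstream, pyspark.ml.functions.array_to_vector
# may help, but check whether it is supported by your Databricks Connect version.
```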

2 More Replies
spearitchmeta
by Contributor
  • 1259 Views
  • 4 replies
  • 3 kudos

Resolved! Data Drift & Model Comparison in Production MLOps: Handling Scale Changes with AutoML

Background: I'm implementing a production MLOps pipeline for part classification using Databricks AutoML. My pipeline automatically retrains models when new data arrives and compares performance with existing production models. The Challenge: I've encount...

Latest Reply
Louis_Frolio
Databricks Employee
  • 3 kudos

Here are my thoughts on the questions you pose. However, it is important that you dig into the documentation to fully understand the capabilities of Lakehouse Monitoring. It will also be helpful if you deploy it to understand the mechanics of how it wo...
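For readers following up on the Lakehouse Monitoring pointer, here is a heavily hedged sketch of attaching an inference-log monitor via the Databricks Python SDK; the table, schema, directory, and column names are hypothetical, and the exact class and field names should be confirmed against your databricks-sdk version.

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.catalog import (
    MonitorInferenceLog,
    MonitorInferenceLogProblemType,
)

w = WorkspaceClient()
w.quality_monitors.create(
    table_name="main.ml.part_classification_inference",           # hypothetical inference table
    assets_dir="/Workspace/Users/someone@example.com/monitoring",  # hypothetical assets dir
    output_schema_name="main.ml_monitoring",
    inference_log=MonitorInferenceLog(
        granularities=["1 day"],
        model_id_col="model_version",
        prediction_col="prediction",
        label_col="label",
        timestamp_col="inference_ts",
        problem_type=MonitorInferenceLogProblemType.PROBLEM_TYPE_CLASSIFICATION,
    ),
)
```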

3 More Replies
staskh
by New Contributor III
  • 1076 Views
  • 3 replies
  • 3 kudos

Resolved! Error in automl.regress

Hi, I'm running the example notebook from https://docs.databricks.com/aws/en/machine-learning/automl/regression-train-api on a node with ML cluster 17.0 (includes Apache Spark 4.0.0, Scala 2.13) and getting an error at `from databricks import automl`; summary = ...

Latest Reply
staskh
New Contributor III
  • 3 kudos

Ilir, greetings! Thank you for the prompt response. Unfortunately, none of the suggested solutions works. I checked with Genie: "The error occurs because databricks-automl is not available for Databricks Runtime 17.0.x. Databricks AutoML is not supported...
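For reference, the regression API call from the linked docs looks roughly like this when run on an ML runtime where AutoML is still supported; the table and target column below are hypothetical.

```python
from databricks import automl

df = spark.table("my_schema.housing")  # hypothetical table
summary = automl.regress(dataset=df, target_col="price", timeout_minutes=30)
print(summary.best_trial.mlflow_run_id)  # inspect the best run found by AutoML
```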

2 More Replies
Aravinda
by New Contributor III
  • 919 Views
  • 5 replies
  • 2 kudos

Resolved! Databricks Machine Learning Practitioner Plan - DBC section unavailability

Hi Everyone, I am not able to locate the DBC folders for each course in the Machine Learning Practitioner plan. Earlier, we used to have DBC sections where we could access the course and lab materials. Do we have any solution to this? Or can som...

Latest Reply
Aravinda
New Contributor III
  • 2 kudos

Thanks @szymon_dybczak !!

4 More Replies
drii_cavalcanti
by New Contributor III
  • 1014 Views
  • 2 replies
  • 1 kudos

Resolved! Installing opencv-python on DBX

Hi everyone, I was wondering how I can install such a basic Python package on Databricks without running into conflict issues or downgrading to a runtime version lower than 15. Specs: the worker type is g4dn.xlarge [T4]; the runtime is 16.4 LTS (includes...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @drii_cavalcanti, you encountered this issue because opencv-python depends on packages that still require numpy in a version lower than 2. You need to reinstall numpy to a supported version and then try installing the library again. You can do it usi...
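A minimal sketch of the suggested fix as a notebook cell, assuming that pinning numpy below 2 resolves the conflict on this runtime.

```python
# Pin numpy below 2, install opencv-python, then restart the Python process
%pip install "numpy<2" opencv-python
dbutils.library.restartPython()
```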

1 More Replies
dbuser24
by Contributor
  • 3826 Views
  • 14 replies
  • 12 kudos

Resolved! ML experiment giving error - RESOURCE_DOES_NOT_EXIST

Followed the documentation below to create an ML experiment: https://docs.databricks.com/aws/en/mlflow/experiments. I created an experiment using the Databricks console, then tried running the code below but am getting an error - RESOURCE_DOE...
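A hedged illustration of one common cause of RESOURCE_DOES_NOT_EXIST in this situation: referencing a workspace experiment by bare name instead of its full workspace path (the path below is hypothetical).

```python
import mlflow

# mlflow.set_experiment("my-experiment")  # bare name: may raise RESOURCE_DOES_NOT_EXIST
mlflow.set_experiment("/Users/someone@example.com/my-experiment")  # full workspace path

with mlflow.start_run():
    mlflow.log_metric("rmse", 0.42)
```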

Latest Reply
BS_THE_ANALYST
Esteemed Contributor III
  • 12 kudos

Can you mark your own post as a solution as well, @dbuser24? (It would be useful for the additional steps.) Appreciate you feeding back your findings. Congrats on getting it working. All the best, BS

13 More Replies
elisabethfalck
by New Contributor
  • 541 Views
  • 1 replies
  • 0 kudos

Forecasting on serverless can write predictions, compute cluster cannot?

Hi! I have something I don't understand. I used AutoML forecasting (serverless) to train a model and marked my schema edw_forecasting as the output database, where it saved the predictions of my best model. Awesome. However, when I try to do AutoML fore...

Latest Reply
Khaja_Zaffer
Contributor III
  • 0 kudos

Did you contact your account team, @elisabethfalck? Also, as per the error, can you set the max worker nodes to 5?

Sri2025
by New Contributor
  • 943 Views
  • 1 replies
  • 0 kudos

Not able to run end to end ML project on Databricks Trial

I started using the Databricks trial version today. I want to explore the full end-to-end ML lifecycle on Databricks. I observed that only the 'serverless' compute option is available. I was trying to execute the notebook posted at https://docs.datab...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

It can take up to 15 minutes for the serving endpoint to be created. Once you initiate the "create endpoint" chunk of code, go grab a cup of coffee and wait 15 minutes. Then, before you use it, verify it is running (bottom-left menu "Serving") by g...
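A hedged sketch of waiting for the endpoint programmatically instead of watching the clock, using the Databricks Python SDK; the endpoint name is hypothetical and the state fields should be confirmed against your databricks-sdk version.

```python
import time
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
endpoint_name = "my-model-endpoint"  # hypothetical endpoint

# Poll until the endpoint reports READY before sending any scoring requests
while True:
    state = w.serving_endpoints.get(name=endpoint_name).state
    if state.ready and state.ready.value == "READY":
        break
    time.sleep(60)  # endpoint creation can take up to ~15 minutes

print(f"{endpoint_name} is ready")
```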

cmd0160
by New Contributor
  • 1109 Views
  • 1 replies
  • 0 kudos

Interactive EDA task in a Job Workflow

I am trying to configure an interactive EDA task as part of a job workflow. I'd like to be able to trigger a workflow, perform some basic analysis then proceed to a subsequent task. I haven't had any success freezing execution. Also, the job workflow...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hello @cmd0160, freezing job execution to perform interactive tasks directly within a job workflow is not natively supported in Databricks. The job workflow UI and the notebook UI serve different purposes, and the interactive capabilities you find in...
