Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

Forum Posts

sanjay
by Valued Contributor II
  • 43502 Views
  • 2 replies
  • 1 kudos

Resolved! torch.cuda.OutOfMemoryError: CUDA out of memory

Hi, I am using the pynote/whisper large model and trying to process data with a Spark UDF, and I am getting the following error: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 172.00 MiB (GPU 0; 14.76 GiB total capacity; 6.07 GiB already allocated...

Latest Reply
JMTech18
New Contributor II
  • 1 kudos

Try running this code: import torch; torch.cuda.empty_cache(). Also make sure to find the optimal batch size; otherwise the error can occur again.

1 More Replies
Tingting
by New Contributor III
  • 1459 Views
  • 2 replies
  • 0 kudos

Error on Workflow: Failure to initialize configuration for storage account

I have set up a workflow with a sequence of jobs. Each job runs fine in interactive mode, that is, when I run the notebook directly. However, when I tried to run the workflow, it failed on a step that uses a function from a Repo. The error says "Failu...

Latest Reply
filipniziol
Esteemed Contributor
  • 0 kudos

Hi @Tingting, it seems that when you run the notebook interactively, your personal credentials are used to access ADLS. When the workflow job runs, Databricks uses a different context. Could you share whether your job is accessing some storage account, an...

1 More Replies
Yairama
by New Contributor III
  • 1800 Views
  • 1 reply
  • 0 kudos

Resolved! Mlflow not saving flavor correctly

Hello! I'm trying to save my model with MLflow in Databricks. It is an XGBoost model, but when I save it using code, it is saved with the sklearn flavor and other parameters are not saved. I'm also using Kedro with the kedro-mlflow plugin. def log_metrics_and_model(model,...

Latest Reply
Yairama
New Contributor III
  • 0 kudos

Hello! It was the magic of all-purpose clusters: just restart the cluster and done x.x

Visakh_Vijayan
by Databricks Partner
  • 3272 Views
  • 5 replies
  • 2 kudos

Resolved! Uninstall whl file from databricks cluster via CLI

Hello, we need to uninstall older versions of whl files from a personal cluster via the Databricks CLI. Could you please provide the exact command to use here? We tried many commands found in the documentation, but none of them worked to do the act...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 2 kudos

Hi @Visakh_Vijayan, did you try databricks libraries uninstall? It's crafted exactly for this purpose: databricks libraries uninstall --json YOUR_JSON_WITH_REQUEST_BODY. Also, when you uninstall a library from a cluster, the library is removed onl...
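In case the request body is the sticking point, here is a small sketch that builds it and prints a ready-to-paste command. The body shape (a `cluster_id` plus a `libraries` list of `{"whl": ...}` entries) is my understanding of the Libraries API and should be checked against the docs for your CLI version; the helper name `build_uninstall_command`, the cluster ID, and the wheel path are made up for illustration.

```python
import json
import shlex


def build_uninstall_command(cluster_id, wheel_paths):
    """Return a shell-quoted `databricks libraries uninstall` command.

    Assumption: the request body takes a cluster_id and a list of libraries,
    each wheel given as {"whl": "<path to the wheel>"}.
    """
    body = {
        "cluster_id": cluster_id,
        "libraries": [{"whl": path} for path in wheel_paths],
    }
    # shlex.quote keeps the JSON intact when pasted into a POSIX shell.
    return "databricks libraries uninstall --json " + shlex.quote(json.dumps(body))


# Hypothetical cluster ID and wheel location, for illustration only.
print(build_uninstall_command(
    "1234-567890-abcde123",
    ["dbfs:/FileStore/wheels/mypkg-0.1.0-py3-none-any.whl"],
))
```

The wheel path has to match the path the library was installed from, otherwise the cluster will not recognise it as the same library.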

4 More Replies
NielsMH
by New Contributor III
  • 9219 Views
  • 2 replies
  • 1 kudos

problem switching profile when using databricks cli

Hi, I have installed the Databricks CLI and created several profiles, and I haven't had any problems until now. When I try to use a specific profile with my commands using the --profile flag, e.g. "databricks secrets list-scopes --profile prod", I enc...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @NielsMH, by default the Databricks CLI looks for the .databrickscfg file in your ~ (user home) folder on Unix. You can try deleting this file and running the configuration process again. You can also use the describe command to check what credentials a...
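Before deleting the file, it can help to see which profiles it actually defines. Since .databrickscfg is an INI-style file, a short script can list the profile names and their hosts. This is just a sketch: `list_profiles` is a hypothetical helper, and it only reads the `host` key so that no tokens are printed.

```python
import configparser
from pathlib import Path


def list_profiles(cfg_path=None):
    """Map each profile name in a .databrickscfg file to its host (or None)."""
    path = Path(cfg_path) if cfg_path else Path.home() / ".databrickscfg"
    # Use an unlikely default_section so [DEFAULT] is listed like any other
    # profile instead of being merged into every section by configparser.
    parser = configparser.ConfigParser(
        default_section="__no_default__", interpolation=None
    )
    parser.read(path)  # a missing file simply yields no profiles
    return {name: parser[name].get("host") for name in parser.sections()}


if __name__ == "__main__":
    for name, host in list_profiles().items():
        print(f"{name}: {host}")
```

If the profile passed to --profile is missing here, or points at an unexpected host, that would explain why the command behaves differently from what you expect.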

1 More Replies
dzhou
by New Contributor
  • 1415 Views
  • 2 replies
  • 0 kudos

machine learning compute cluster?

Can the Community Edition be used to create a machine learning compute cluster?

Latest Reply
michael569gardn
New Contributor III
  • 0 kudos

No. Most platforms, including services like Azure and Databricks, do not allow creating a machine learning compute cluster on the community edition. Most of these editions offer only basic features and resources, enough to expl...

1 More Replies
163050
by New Contributor II
  • 4447 Views
  • 3 replies
  • 0 kudos

Error installing datasets needed for LLM course

I signed up for this course via Databricks Academy: LLMs: Application through Production. However, I am getting this error when trying to download the datasets needed for the course: Installing datasets:| from "wasbs://courseware@dbacademy.blob.core.wi...

Latest Reply
david_for_db
Databricks Partner
  • 0 kudos

You would need to install the Python library. You can either: 1) run %pip install datasets, or 2) add it to the PyPI packages to load on your cluster. This should solve your issue.

2 More Replies
Noura_azza
by New Contributor II
  • 2137 Views
  • 2 replies
  • 0 kudos

AutoML split with dt column not working properly

I am using AutoML and want to split my data into train, validation, and test sets using a dt column (one date for train, a different date for validation, and a third date for test). The problem is that AutoML fails: there are only training metrics (no valiat...

Latest Reply
maggiewang
Databricks Employee
  • 0 kudos

Hello! Did you try specifying a column name as the manual split column? Then you can fully control which rows are in the train / validate / test splits: https://docs.databricks.com/en/machine-learning/automl/automl-data-preparation.html#split-data-for-regressi...
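To make the manual split concrete, here is a sketch of deriving such a column from the dt values, one date per split as in the question. The three dates and the helper names are hypothetical, and the labels "train", "validate", and "test" are what I understand the linked docs to expect, so please confirm them for your runtime. It uses plain Python rows to stay self-contained; in a notebook the same mapping would be applied to the DataFrame column.

```python
from datetime import date

# Hypothetical dates: one day per split, mirroring the setup in the question.
SPLIT_BY_DATE = {
    date(2024, 1, 1): "train",
    date(2024, 1, 2): "validate",
    date(2024, 1, 3): "test",
}


def split_label(dt):
    """Return the split label for a date, failing loudly on unmapped dates."""
    try:
        return SPLIT_BY_DATE[dt]
    except KeyError:
        raise ValueError(f"no split assigned for date {dt}") from None


def add_split_column(rows, dt_key="dt", split_key="split"):
    """Return copies of the rows with an extra string column naming the split."""
    return [{**row, split_key: split_label(row[dt_key])} for row in rows]


rows = [
    {"dt": date(2024, 1, 1), "x": 1.0},
    {"dt": date(2024, 1, 2), "x": 2.0},
    {"dt": date(2024, 1, 3), "x": 3.0},
]
labelled = add_split_column(rows)
```

Raising on unmapped dates is deliberate: a row without a valid label is easier to catch here than inside a failed AutoML run.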

1 More Replies
Ariane
by New Contributor II
  • 3421 Views
  • 3 replies
  • 0 kudos

Error using score_batch for batch inference

Hey everybody, I have been learning to use the Databricks Feature Store, and I was trying to train a model using the stored features and compute batch inference. However, when running prediction with score_batch, I have been getting ...

Latest Reply
Ariane
New Contributor II
  • 0 kudos

Hey @Kumaran, I am using a random forest classifier. I have tried setting the max depth to None, since it is the default value, but the error still exists.

2 More Replies
yorabhir
by New Contributor III
  • 1177 Views
  • 0 replies
  • 0 kudos

ModuleNotFoundError: No module named 'model_train' when using mlflow.sklearn.load_model

Hello, I have multiple versions of a model registered in the model registry. When I try to load any version other than version 1 with mlflow.sklearn.load_model(f"models:/{model_name}/{model_version}"), I get ModuleNotFoundError: No module...

ukaplan
by New Contributor III
  • 841 Views
  • 0 replies
  • 0 kudos

Serving Endpoint Container Image Creation Fails

Hello, yesterday I sent this message, but I guess some AI flagging tool or non-technical moderator thought the error logs were spam, so no one could see my message. Thus, I am restating my problem without error logs this time. Essentially, after I train my m...

Quinten
by Databricks Partner
  • 1928 Views
  • 2 replies
  • 0 kudos

TrainingSet schema difference during training and inference

Hi, I'm using the Feature Store to train an ML model and log it using MLflow and FeatureStoreClient(). This model is then used for inference. I understand the schema of the TrainingSet should not differ between training time and inference time. However...

Latest Reply
KumaranT
Databricks Employee
  • 0 kudos

Hi @Quinten, you can consider creating a custom feature group to store the "weight" column during training. This way, you can keep the schema of the TrainingSet consistent between training and inference time. Here are the steps you can follow: Create a...

1 More Replies
MohsenJ
by Databricks Partner
  • 2111 Views
  • 2 replies
  • 0 kudos

FeatureEngineeringClient failing to run inference with mlflow.spark flavor

I am using the Databricks FeatureEngineeringClient to log my spark.ml model for batch inference. I use the ALS model on the MovieLens dataset. My dataset has three columns: user_id, item_id, and rank. Here is my code to prepare the dataset: fe_data = fe.crea...

Latest Reply
MohsenJ
Databricks Partner
  • 0 kudos

@KumaranT I did it already, with the same result:
import mlflow.pyfunc
# Load the model as a PyFuncModel
model = mlflow.pyfunc.load_model(model_uri=f"{model_version_uri}")
# Create a Spark UDF for scoring
predict_udf = mlflow.pyfunc.spark_udf(spark, ...

1 More Replies
HappyScientist
by New Contributor
  • 3793 Views
  • 1 reply
  • 0 kudos

Received Fatal error: The Python kernel is unresponsive.

I am running a Databricks job on a cluster and I keep running into the following issue (pasted below in bold). The job trains a machine learning model on a modestly sized dataset (~ half GB). Note that I use pandas DataFrames for the data, sklearn for...

Latest Reply
KumaranT
Databricks Employee
  • 0 kudos

Hi @HappyScientist, can you increase the memory size of your cluster and try again?
