Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.
Data + AI Summit 2024 - Data Science & Machine Learning

Forum Posts

hv129
by New Contributor
  • 1356 Views
  • 1 replies
  • 0 kudos

OutOfMemoryError: CUDA out of memory on LLM Finetuning

I am trying to fine-tune the llama2_lora model using the xTuring library (batch size is 1) and am running into an error. I am working on a cluster with 1 Worker (28 GB Memory, 4 Cores) and 1 Driver (110 GB Memory, 16 Cores). I am facing this error: OutOfMe...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @hv129, The error message you’re encountering indicates that your CUDA memory is running out while trying to allocate additional memory for your model. Let’s break down the details: Total Capacity: The 15.57 GiB mentioned in the error message ...
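To put the 15.57 GiB figure in context, here is a rough back-of-the-envelope sketch of how much GPU memory the model weights alone would take at different precisions. The 7-billion-parameter count is an assumption (the post does not state the model size), and the estimate ignores activations, LoRA adapter weights, optimizer state, and CUDA overhead, so real usage will be higher.

```python
GIB = 1024 ** 3


def weights_gib(n_params: float, bytes_per_param: int) -> float:
    """Memory needed just to hold the model weights, in GiB."""
    return n_params * bytes_per_param / GIB


# Assumption: a 7-billion-parameter base model; the post does not state the size.
N_PARAMS = 7e9
# Total GPU capacity quoted in the reply above.
GPU_CAPACITY_GIB = 15.57

for label, nbytes in [("fp32", 4), ("fp16", 2), ("int8", 1)]:
    need = weights_gib(N_PARAMS, nbytes)
    verdict = "fits" if need < GPU_CAPACITY_GIB else "does not fit"
    print(f"{label}: {need:.2f} GiB for weights alone ({verdict} in {GPU_CAPACITY_GIB} GiB)")
```

Under that assumption, fp16 weights come to roughly 13 GiB, which leaves very little headroom on a 15.57 GiB GPU even at batch size 1.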

Amoozegar
by New Contributor II
  • 1178 Views
  • 3 replies
  • 0 kudos

Upgrading cuDNN on Databricks notebook

I'm trying to upgrade TensorFlow from 2.8 to 2.13 in a Databricks notebook attached to a cluster running Databricks Runtime 10.4. How can I upgrade cuDNN from 8.0 to at least 8.6 so that it is compatible with the new TensorFlow version?

Machine Learning
GPU enabled clusters
Tensorflow
Latest Reply
Amoozegar
New Contributor II
  • 0 kudos

Hi @Kaniz_Fatma, thanks for your response. When I run '!conda list cudnn' in a Databricks notebook, I get the following error: '/bin/bash: conda: command not found'
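A quick way to confirm what that error is saying is to check which package managers are actually on the notebook's PATH before shelling out to them. This is only a diagnostic sketch using the Python standard library; the helper name `available_installers` is made up for illustration, and it does not by itself upgrade cuDNN.

```python
import shutil
import sys


def available_installers() -> dict:
    """Report which package managers the notebook's environment can see."""
    return {
        # None here corresponds to the shell's "command not found" error.
        "conda": shutil.which("conda"),
        "pip": shutil.which("pip"),
        # The interpreter that backs the running notebook.
        "python": sys.executable,
    }


found = available_installers()
for name, path in found.items():
    print(f"{name}: {path or 'not on PATH'}")
```

If `conda` comes back as not on PATH, conda-based instructions will not work on that cluster as-is, and any library changes have to go through whatever tools are actually present.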

2 More Replies
Amoozegar
by New Contributor II
  • 676 Views
  • 1 replies
  • 0 kudos

Error in Tensorflow training job

I upgraded TensorFlow in a Databricks notebook using the %pip command. Now, when running the training job, I get this error: "DNN library initialization failed."

Machine Learning
GPU enabled clusters
Tensorflow
Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Amoozegar,  Check TensorFlow Version: Ensure that the TensorFlow version you upgraded to is compatible with your existing code and dependencies. Sometimes, upgrading TensorFlow can lead to compatibility issues. You might want to verify if the sp...

User100024
by New Contributor II
  • 853 Views
  • 2 replies
  • 1 kudos

Using AutoML to predict completion dates of a project management dataset

Hello! I am fairly new to Databricks. I'm trying to do a proof of concept with AutoML in Databricks at my organization, and the dataset I am using is a project management dataset. Here's a sample: project_id, market, general_contractor, project_type, permit_...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @User100024, Let’s break down your requirements and tackle them step by step. Reducing Completion Date Period: To understand how different factors impact the completion date, you can use regression analysis. Specifically, you want to predict t...

1 More Replies
iago_gonzalez
by New Contributor III
  • 4685 Views
  • 8 replies
  • 2 kudos

Resolved! Scalable ML course error on Lab Setup (Community Edition)

Hello, I am trying to complete the exercises of the course "Scalable Machine Learning with Apache Spark" using Databricks Community Edition, but when I run the Lab Setup I get the following error: HTTPError: 503 Server Error: Service Unavailable for ur...

Latest Reply
AK601
New Contributor II
  • 2 kudos

I'm experiencing the same issue while using Community Edition for this classroom: https://github.com/databricks-academy/large-language-models. What subscription level should I upgrade to?

7 More Replies
marcelo2108
by Contributor
  • 2142 Views
  • 4 replies
  • 0 kudos

Resolved! An error occurred while loading the model. Failed to load the pickled function from a hexadecimal

[8586fsbgpb] An error occurred while loading the model. Failed to load the pickled function from a hexadecimal string. Error: Can't get attribute 'transform_input' on <module '__main__' from '/opt/conda/envs/mlflow-env/bin/gunicorn'>. I'm using the fu...

Latest Reply
marcelo2108
Contributor
  • 0 kudos

However, I could not make progress in the end, because I ran into the error I reported in another thread, as follows: [5bb99fzs2f] An error occurred while loading the model. You haven't configured the CLI yet! Please configure by entering `/opt/conda/envs/ml...

3 More Replies
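The "Can't get attribute 'transform_input' on <module '__main__' ...>" message in this thread is what Python's `pickle` raises when a function was pickled by reference and the loading process has no function of that name in the recorded module. The sketch below reproduces that behaviour with the standard library only. The module name `training_main` is hypothetical and stands in for the `__main__` of the notebook where the function was defined; this illustrates the mechanism and is not the fix the thread arrived at.

```python
import pickle
import sys
import types


def transform_input(x):
    return x * 2


# Pretend the function was defined in a training-time module.
# "training_main" is a hypothetical name used only for this demo.
training_module = types.ModuleType("training_main")
transform_input.__module__ = "training_main"
transform_input.__qualname__ = "transform_input"
training_module.transform_input = transform_input
sys.modules["training_main"] = training_module

# pickle stores the module and function *names*, not the function's code.
payload = pickle.dumps(transform_input)
print(b"transform_input" in payload)

# Simulate the serving process: a module with that name exists,
# but it does not define the function.
del training_module.transform_input
try:
    pickle.loads(payload)
    load_error = None
except AttributeError as exc:
    load_error = exc
print(type(load_error).__name__, load_error)
```

Because only the names travel with the pickle, the loading side has to be able to import the function from the same module path, which is not the case when that module is `__main__` of a different process such as gunicorn.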
david_stroud
by New Contributor II
  • 818 Views
  • 1 replies
  • 1 kudos

Resolved! Using AutoML in Azure Databricks with a shared cluster

Do you have to use AutoML in Azure Databricks on a personal compute cluster, or can you use a shared cluster? Can you point me to some documentation that supports the statement that you can run AutoML on a shared cluster with Azure Databricks?

Latest Reply
AlliaKhosla
New Contributor III
  • 1 kudos

Hi @david_stroud, greetings! AutoML is not supported on shared clusters. Please check the documentation here: https://learn.microsoft.com/en-us/azure/databricks/machine-learning/automl/#--requirements

cl2
by New Contributor II
  • 7377 Views
  • 5 replies
  • 1 kudos

MlflowException: "Connection broken: ConnectionResetError(104, 'Connection reset by peer')"

Hello, I have a workflow that crashes from time to time with the error: MlflowException: The following failures occurred while downloading one or more artifacts from models:/incubator-forecast-charging-demand-power-and-io-dk2/Production: {'pyt...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hey there! Thanks a bunch for being part of our awesome community!  We love having you around and appreciate all your questions. Take a moment to check out the responses – you'll find some great info. Your input is valuable, so pick the best solution...

4 More Replies
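Since the failure in this thread is intermittent ("from time to time") and looks like a dropped connection, one common mitigation is to wrap the download in a retry with exponential backoff. The following is a generic standard-library sketch, not something taken from the replies. `call_with_retries` and `flaky_download` are illustrative names, and in real use `retry_on` would need to include whichever exception the download actually raises (the post shows an `MlflowException` wrapping the connection error).

```python
import time


def call_with_retries(fn, attempts=4, base_delay=1.0,
                      retry_on=(ConnectionError,), sleep=time.sleep):
    """Call fn(); on a retryable error wait base_delay * 2**k and try again."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))


# Demo with a stand-in for the artifact download that fails twice.
calls = {"n": 0}


def flaky_download():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionResetError(104, "Connection reset by peer")
    return "/tmp/model"  # hypothetical local path


delays = []  # record the waits instead of actually sleeping
result = call_with_retries(flaky_download, sleep=delays.append)
print(result, calls["n"], delays)
```

Passing `sleep` as a parameter keeps the demo instant. In a real job the default `time.sleep` would be used.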
acsmaggart
by New Contributor III
  • 3551 Views
  • 6 replies
  • 2 kudos

`collect()`ing Large Datasets in R

Background: I'm working on a pilot project to assess the pros and cons of using Databricks to train models using R. I am using a dataset that occupies about 5.7 GB of memory when loaded into a pandas DataFrame. The data are stored in a Delta table in ...

collecting the data using pyspark collecting the data using R
Latest Reply
Annapurna_Hiriy
New Contributor III
  • 2 kudos

@acsmaggart Please try using collect_larger() to collect the larger dataset. This should work. Please refer to the following document for more info on the library: https://medium.com/@NotZacDavies/collecting-large-results-with-sparklyr-8256a0370ec6

5 More Replies
fa
by New Contributor III
  • 1522 Views
  • 2 replies
  • 5 kudos

How can I view the storage space taken by a registered model using MLflow?

The information shown about registered models on the Models tab is very minimal: just the tags we pass in and version information. How can I get more details about the model, such as its size on disk?

Latest Reply
Octavian1
Contributor
  • 5 kudos

Hi, I have used the MLflow client, but I am not sure where to find the size of the model image. The response I am getting from client.search_registered_models() is the following: <RegisteredModel: aliases={}, creation_timestamp=17061..., description='', l...

1 More Replies
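One workaround, if the registry metadata does not expose a size, is to download the model's artifacts to a local directory and add up the file sizes yourself. The sketch below covers only the second half with the standard library. It assumes you already have a local copy of the artifacts (for example via MLflow's artifact download helpers such as `mlflow.artifacts.download_artifacts`; check the docs for your MLflow version), and the file names in the demo are made up.

```python
import os
import tempfile


def directory_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


# Demo on a throwaway directory standing in for downloaded model artifacts.
with tempfile.TemporaryDirectory() as model_dir:
    with open(os.path.join(model_dir, "MLmodel"), "w") as f:
        f.write("x" * 100)
    os.makedirs(os.path.join(model_dir, "data"))
    with open(os.path.join(model_dir, "data", "model.pkl"), "wb") as f:
        f.write(b"\0" * 2048)
    demo_size = directory_size_bytes(model_dir)

print(f"{demo_size} bytes ({demo_size / 1024:.1f} KiB)")
```

This measures the downloaded copy, which may differ slightly from what the backing store reports.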
run480
by New Contributor II
  • 6677 Views
  • 7 replies
  • 0 kudos

Resolved! Model serving endpoint requires workspace-access entitlement?

Hi all, is anyone getting status 403 when requesting a model serving endpoint with error message "This API is disabled for users without the workspace-access entitlement"? I am accessing my model serving endpoint with a service principal access token...

Machine Learning
Model serving
Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hey there! Thanks a bunch for being part of our awesome community!  We love having you around and appreciate all your questions. Take a moment to check out the responses – you'll find some great info. Your input is valuable, so pick the best solution...

6 More Replies
PSK017
by New Contributor
  • 853 Views
  • 2 replies
  • 0 kudos

Loading Pre-trained Models in Databricks

Hello talented members of the community, I'm a very new Databricks user, so please bear with me. I'm building a description matcher which uses a pre-trained model (universal-sentence-encoder). How can I load and use this model in my Databricks Python no...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hey there! Thanks a bunch for being part of our awesome community!  We love having you around and appreciate all your questions. Take a moment to check out the responses – you'll find some great info. Your input is valuable, so pick the best solution...

1 More Replies
Anil_M
by New Contributor II
  • 2733 Views
  • 8 replies
  • 0 kudos

TypeError: 'JavaPackage' object is not callable

Hi Team, I am facing the above error while trying to generate BERT embeddings: I specify the model path, and the error is raised while downloading the model. The Spark version is 3.3.0. Can anyone help me with this?

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hey there! Thanks a bunch for being part of our awesome community!  We love having you around and appreciate all your questions. Take a moment to check out the responses – you'll find some great info. Your input is valuable, so pick the best solution...

7 More Replies
cl2
by New Contributor II
  • 1418 Views
  • 1 replies
  • 0 kudos

Editing posts

Hey, I have a post that I would like to edit. However, when I use the drop-down menu, there is no option to edit or delete the post. Can someone help?

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @cl2, I understand that you're experiencing difficulty editing or deleting a post on the Databricks community platform. To assist you further, could you please provide me with the details of the post you're looking to modify, along with the specif...
