Machine Learning

by Data_Cowboy • New Contributor III

03-16-2023 2:12:01 PM

950 Views
2 replies
0 kudos

Resolved! Problems with xgboost.spark model loading from MLflow.

When loading an xgboost model from MLflow, following the provided instructions for Databricks-hosted MLflow, the input sizes shown on the job are over 1 TB. Is anyone else using an xgboost.spark model and noticing the same behavior? Below are som...

Latest Reply

dbx-user7354
New Contributor III

7m ago

0 kudos

Thank you very much @Data_Cowboy! I had the same issue; I even saw 14 TiB. Databricks should really fix this.

by ml-engineer • Visitor

yesterday

25 Views
0 replies
0 kudos

while registering model I am getting error: AssertionError:

I get this error when running the code in a workflow; when I run the code individually in a notebook, it runs fine. Below is the code: fe = FeatureEngineeringClient() ...

by Colombia • New Contributor II

2 weeks ago

270 Views
2 replies
1 kudos

Use of API from package enerbitdso 0.1.8 (PyPI)

Hello! I have code that uses an API supplied in the enerbitdso package (repository: https://pypi.org/project/enerbitdso/). I adapted the code to Azure Databricks in Python, but although there is a connection with the API, it does ...

Latest Reply

Colombia
New Contributor II

Tuesday

1 kudos

The owner of the package updated it to take the timeout as a parameter (up to 20 seconds) and updated a dependent package in Databricks; with that, the problem was solved.

by re • New Contributor II

Monday

158 Views
2 replies
0 kudos

RBAC and VectorSearch

When implementing the managed VectorSearch, what is the preferred way to implement row based access control? I see that you can use the filter API during a query, so simple filters using a certain column may work, but what if all the security informa...

Latest Reply

re
New Contributor II

Tuesday

0 kudos

Thanks AI for summarizing my question. However, you did not actually answer it.

by Lcsp • New Contributor

2 weeks ago

309 Views
1 replies
0 kudos

AssertionError Failed to create the catalog

I am getting this error when trying to set up the get-started-with-databricks-for-machine-learning lab. Unity Catalog is enabled. Validating the locally installed datasets: | listing local files...(0 seconds) | validation completed...(0 seconds total) C...

Latest Reply

PL_db
New Contributor III

Monday

0 kudos

It looks like you don't have the CREATE CATALOG privilege on the metastore you're trying to create the catalog in. See "Privilege types by securable object in Unity Catalog".

by AndersenHuang • New Contributor

Friday

146 Views
0 replies
0 kudos

Spacy Retraining failure

Hello, I'm having problems trying to run my retraining notebook for a spacy model. The notebook creates a shell file with the following lines of code: cmd = f''' awk '{{sub("source = ","source = /dbfs/FileStore/{dbfs_folder}/textcat/categories...

by moh3th1 • New Contributor

a week ago

110 Views
1 replies
0 kudos

Optimal Cluster Configuration for Training on Billion-Row Datasets

Hello Databricks Community, I am currently facing a challenge in configuring a cluster for training machine learning models on a dataset consisting of approximately a billion rows and 40 features. Given the volume of data, I want to ensure that the cl...

Latest Reply

Kaniz
Community Manager

a week ago

0 kudos

Hi @moh3th1, Machine Selection: Memory (RAM): Having sufficient memory is essential for large datasets. Ensure that your machine type has enough RAM to accommodate your data. CPU: CPU power impacts data processing speed. Consider CPUs with multiple...
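
As a rough illustration of the memory point above, here is a minimal sketch that estimates the raw size of a dense numeric dataset. The row and feature counts come from the question (about a billion rows and 40 features); the function name `estimate_dense_size_gib`, the value widths, and the assumption of dense data with no serialization or JVM overhead are illustrative, so actual cluster memory needs will likely be higher.

```python
def estimate_dense_size_gib(n_rows: int, n_features: int, bytes_per_value: int = 8) -> float:
    """Raw size in GiB of a dense n_rows x n_features numeric matrix (no overhead)."""
    return n_rows * n_features * bytes_per_value / 2**30


# Figures taken from the question: ~1 billion rows, 40 features.
rows, features = 1_000_000_000, 40

size_f64 = estimate_dense_size_gib(rows, features, bytes_per_value=8)  # 64-bit floats
size_f32 = estimate_dense_size_gib(rows, features, bytes_per_value=4)  # 32-bit floats

print(f"float64: {size_f64:.0f} GiB, float32: {size_f32:.0f} GiB")
# prints "float64: 298 GiB, float32: 149 GiB"
```

Storing features as 32-bit values halves the raw footprint, which may be worth considering before sizing the cluster.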

by Anonymous • Not applicable

03-01-2022 10:01:00 AM

127884 Views
60 replies
3 kudos

Community Edition Login Issues

Community Edition Login Issues Below is a list of troubleshooting steps for failing to login with email/password at community.cloud.databricks.com: Troubleshooting Tips If this is your first time logging in, ensure that you did indeed sign u...

Latest Reply

akuma67
New Contributor II

a week ago

3 kudos

Hey, I have been logged out, and even the password reset email is not arriving. How long does it take to resolve? My account is ak.email86@gmail.com

by Shreyash • New Contributor II

a week ago

289 Views
4 replies
0 kudos

java.lang.ClassNotFoundException: com.johnsnowlabs.nlp.DocumentAssembler

I am trying to serve a pyspark model using an endpoint. I was able to load and register the model normally. I could also load that model and perform inference but while serving the model, I am getting the following error: [94fffqts54] ERROR StatusLog...

Model serving

sparknlp

Latest Reply

Kaniz
Community Manager

a week ago

0 kudos

Hi @Shreyash, It looks like your code is encountering a java.lang.ClassNotFoundException for the com.johnsnowlabs.nlp.DocumentAssembler class while serving your PySpark model. This error occurs when the required class is not found in the classpath. ...

by amal15 • New Contributor II

2 weeks ago

125 Views
1 replies
0 kudos

XGBoostEstimator is not a member of package ml.dmlc.xgboost4j.scala.spark ?

XGBoostEstimator is not a member of package ml.dmlc.xgboost4j.scala.spark? How can I resolve this error?

Latest Reply

Kaniz
Community Manager

a week ago

0 kudos

Hi @amal15, The error message you’re encountering, “XGBoostEstimator is not a member of package ml.dmlc.xgboost4j.scala.spark,” indicates that the XGBoostEstimator class is not being recognized within the specified package. Check Dependencie...

by e6exghu8 • New Contributor

a week ago

311 Views
1 replies
0 kudos

Help - org.apache.spark.SparkException: Job aborted due to stage failure: Task 47 in stage 2842.0

Hello, I am training a SparkXGBRegressor model. It runs without errors if the complexity is low, however when I increase the max_depth and/or num_parallel_tree parameters, I get an error. I checked the cluster metrics during training and it doesn't l...

Latest Reply

Kaniz
Community Manager

a week ago

0 kudos

Hi @e6exghu8, Ensure that your cluster has sufficient memory to handle the increased complexity (higher max_depth and num_parallel_tree). Check the memory configuration for your Spark executors. You might need to allocate more memory to each executor...

by cmilligan • Contributor II

11-23-2022 12:43:30 PM

3147 Views
3 replies
2 kudos

Issue with Multi-column In predicates are not supported in the DELETE condition.

I'm trying to delete rows from a table with the same date or id as records in another table. I'm using the below query and get the error 'Multi-column In predicates are not supported in the DELETE condition'. delete from cost_model.cm_dispatch_consol...

Latest Reply

shubhaskar
New Contributor II

a week ago

2 kudos

Had the same issue. Please check the value returned by the subquery; there must be something wrong with it.
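
For anyone hitting the same error, one common workaround is to replace the multi-column IN predicate with a correlated EXISTS. The sketch below only illustrates the shape of that rewrite: it uses Python's built-in sqlite3 as a stand-in engine, and the tables `target` and `staging` and their columns are hypothetical, since the original query is truncated. Whether this exact form is accepted for a given Delta table should be checked against the Databricks DELETE documentation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical tables standing in for the table being cleaned and the lookup table.
cur.execute("CREATE TABLE target (id INTEGER, dt TEXT)")
cur.execute("CREATE TABLE staging (id INTEGER, dt TEXT)")
cur.executemany(
    "INSERT INTO target VALUES (?, ?)",
    [(1, "2022-11-01"), (2, "2022-11-02"), (3, "2022-11-03")],
)
cur.executemany(
    "INSERT INTO staging VALUES (?, ?)",
    [(1, "2022-12-31"), (9, "2022-11-02")],
)

# Instead of a multi-column predicate such as
#   WHERE (id, dt) IN (SELECT id, dt FROM staging)
# use a correlated EXISTS. Here a row is deleted when either its id
# or its date appears in staging, matching the "same date or id" goal.
cur.execute(
    """
    DELETE FROM target
    WHERE EXISTS (
        SELECT 1 FROM staging s
        WHERE s.id = target.id OR s.dt = target.dt
    )
    """
)
conn.commit()

remaining = cur.execute("SELECT id, dt FROM target ORDER BY id").fetchall()
print(remaining)  # prints [(3, '2022-11-03')]
```

Rows 1 and 2 are removed because their id or date matches a staging row; row 3 matches neither and is kept.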

by AChang • New Contributor III

08-22-2023 1:38:44 PM

1899 Views
2 replies
1 kudos

How to fix this runtime error in this Databricks distributed training tutorial workbook

I am following along with this notebook found from this article. I am attempting to fine tune the model with a single node and multiple GPUs, so I run everything up to the "Run Local Training" section, but from there I skip to "Run distributed traini...

Latest Reply

KYX
New Contributor II

2 weeks ago

1 kudos

Hi AChang, did you eventually resolve the error? I'm also having the same error.

by amal15 • New Contributor II

2 weeks ago

425 Views
2 replies
1 kudos

Resolved! import ml.dmlc.xgboost4j.scala.spark.{XGBoostEstimator, XGBoostClassificationModel}

How can I import the following in a Spark & Scala project in Databricks? import com.microsoft.ml.spark.{LightGBMClassifier, LightGBMClassificationModel} import ml.dmlc.xgboost4j.scala.spark.{XGBoostEstimator, XGBoostClassificationModel}

Latest Reply

amal15
New Contributor II

2 weeks ago

1 kudos

XGBoostEstimator is not a member of package ml.dmlc.xgboost4j.scala.spark? How can I resolve this error? I am using the Maven package ml.dmlc:xgboost4j-spark_2.12:2.0.3.

by chrisf_sts • New Contributor II

2 weeks ago

250 Views
0 replies
0 kudos

Extract calculations naive bayes model

I have a naive Bayes ML model that takes call attributes and predicts whether the caller is going to abandon the call while on hold waiting to speak to an agent. The model lives in Databricks MLflow, where I have it registered. What I need to do is ex...
