Machine Learning

by venkat09 • New Contributor III

05-18-2022 1:17:14 PM

960 Views
1 reply
1 kudos

Resolved! Running into an issue while setting up dbx locally?

I followed the documentation and am facing an issue while running `dbx execute` on an all-purpose/interactive cluster that is already up and running. I ran this command: `dbx execute --cluster-id=XXXXXX --job=dbx-demo-job --no-rebuild --debug`. If anyone faced it ...

Latest Reply

venkat09
New Contributor III

05-18-2022 6:14:48 PM

1 kudos

Packaging the project as a wheel before running `dbx execute` fixed the issue.

by Nithin • New Contributor II

01-04-2022 9:21:25 PM

7199 Views
14 replies
4 kudos

Resolved! How to access databricks feature store outside databricks?

We are building the feature store using the Databricks API. A few of our machine learning engineers use Jupyter notebooks. Is it possible to use the feature store outside Databricks?

Latest Reply

datariel
New Contributor II

04-28-2022 8:58:00 AM

4 kudos

Hi @Kaniz Fatma and @Jose Gonzalez, turning back to the original question, and considering that one of the main benefits of the Feature Store is the removal of online/offline skew, how could I access the features from a client application l...

13 More Replies

by naveen_marthala • Contributor

05-01-2022 7:28:38 AM

2165 Views
4 replies
2 kudos

Resolved! Why does the client need to have git installed for auto-logging to an mlflow server running in "--serve-artifacts" mode?

I have an mlflow server with `--serve-artifacts` and with postgres as `--backend-store-uri`. The machine (a container with base image python:3.9-bullseye) running the server has git installed and available on the PATH. I am logging from jupyter-noteboo...

Latest Reply

Kaniz
Community Manager

05-18-2022 5:56:47 AM

2 kudos

Hi @Naveen Marthala, just a friendly follow-up. Do you still need help, or did the above responses help you find the solution? Please let us know.

3 More Replies

by naveen_marthala • Contributor

05-02-2022 9:08:48 AM

4530 Views
2 replies
3 kudos

Resolved! How to PREVENT mlflow's autologging from logging ALL runs?

I am logging runs from a Jupyter notebook. The cells that call `mlflow.sklearn.autolog()` behave as expected. But cells that call the `.fit()` method on sklearn estimators are also being logged as runs without explicitly mentioning `mlflo...

1 More Replies

by Direo • Contributor

04-25-2022 3:34:14 AM

5262 Views
4 replies
2 kudos

Resolved! xgboost 1.5.1 gives 'XGBModel' object has no attribute 'enable_categorical' error

Should I `pip install xgboost==1.4.2` (the last version that worked), or is there a better way to solve it, bearing in mind that this solution might cause problems later if this version of xgboost is not supported on future Python versions?

Latest Reply

Direo
Contributor

05-16-2022 2:51:30 AM

2 kudos

Hi, @Kaniz Fatma. No, I have found a solution: I needed to retrain the models using the new version of xgboost.

3 More Replies

by mradassaad • New Contributor III

05-03-2022 9:44:08 AM

2008 Views
4 replies
1 kudos

Resolved! Tuning `CrossValidator` spark job performance

I am running a 3-fold cross validation of an ML pipeline that utilizes `GBTClassifier` as the final step. It takes 18 hours to run, and I am looking for feedback on how to improve the performance, as I expect this to go faster. For context here is the...
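
For a rough sense of where the time goes: `CrossValidator` fits one model per fold for every parameter combination, then refits the best combination on the full dataset. The sketch below only does that arithmetic in plain Python. The grid values and the helper name `count_cv_fits` are hypothetical, since the actual grid from the post is cut off above.

```python
from itertools import product


def count_cv_fits(param_grid, num_folds):
    """Number of model fits in a k-fold grid search, including the
    final refit of the best parameter combination on all the data."""
    # Every combination of the listed values is one candidate.
    combos = list(product(*param_grid.values()))
    return len(combos) * num_folds + 1


# Hypothetical grid for illustration only, not taken from the post.
grid = {"maxDepth": [3, 5, 7], "maxIter": [20, 50]}
total_fits = count_cv_fits(grid, num_folds=3)
print(total_fits)  # 6 combinations * 3 folds + 1 refit = 19
```

With boosted trees, every one of those fits is itself sequential across iterations, so shrinking the grid or the fold count reduces work directly. `CrossValidator` also has a `parallelism` parameter (default 1) that controls how many fits run concurrently.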

Latest Reply

Kaniz
Community Manager

05-13-2022 3:26:29 AM

1 kudos

Hi @Assaad Mrad, just a friendly follow-up. Do you still need help, or did @Chris Chalcraft's response help you find the solution? Please let us know.

3 More Replies

by Mr__E • Contributor II

05-11-2022 10:01:56 AM

398 Views
0 replies
0 kudos

Custom AutoML evaluation metric for ranking model

I built a model which is used for ranking and I have a notebook that takes that model to generate rankings and then uses a UDF-based metric to evaluate those rankings. Is there any way that I can have this ranking / UDF be used during the AutoML trai...

by findinpath • Contributor

04-26-2022 10:33:40 PM

2780 Views
9 replies
5 kudos

How to mount s3 bucket in community edition cluster?

I'm using Databricks Community Edition for testing purposes on an OSS project. I'm spinning up the cluster automatically through the Databricks Clusters API. The automated tests rely on AWS S3 infrastructure, which is why I need to mount the S3 bucket on the ...

Latest Reply

findinpath
Contributor

05-09-2022 4:09:42 AM

5 kudos

I haven't found any solution. I'm assuming that currently my only option is to use Databricks Enterprise to model scenarios that involve mounting object storage buckets.

8 More Replies

by Wayne • New Contributor III

04-27-2022 9:05:44 AM

828 Views
2 replies
3 kudos

Question Submitted

How to tune a job to avoid paying extra cost for EXPAND DISK? Is it due to shuffle or data skew? Is there a way to configure the workers with a larger disk? Without EXPAND DISK, the job will fail since no space is left on the disk.
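
One rough way to tell skew apart from a shuffle that is simply large is to compare per-partition row counts: if one partition is many times bigger than the median, skew is the likelier cause. The sketch below is plain Python and assumes the counts have already been collected. The numbers are invented and `skew_ratio` is a hypothetical helper name, not part of any Spark API.

```python
from statistics import median


def skew_ratio(partition_row_counts):
    """Largest partition size divided by the median partition size."""
    counts = list(partition_row_counts)
    if not counts:
        raise ValueError("need at least one partition")
    mid = median(counts)
    if mid == 0:
        # Mostly empty partitions: any non-empty one dominates completely.
        return float("inf") if max(counts) > 0 else 1.0
    return max(counts) / mid


# Invented row counts for illustration.
balanced = [1000, 1100, 950, 1050]
skewed = [1000, 1100, 950, 40000]

print(round(skew_ratio(balanced), 2))
print(round(skew_ratio(skewed), 2))
```

In a Spark job the counts could come from grouping by `pyspark.sql.functions.spark_partition_id()` after the shuffle stage. What ratio counts as a problem depends on the workload, so treat any threshold as a starting point.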

Latest Reply

Wayne
New Contributor III

05-05-2022 6:54:43 AM

3 kudos

No error; I am just seeing EXPAND DISK in the cluster event logs. This is just a regular Spark application. I am not sure if the cloud storage matters; the Spark application uses it as input and output.

1 More Replies

by romanzdk • New Contributor II

04-28-2022 4:32:17 AM

1333 Views
1 reply
0 kudos

Databricks online store - Login to Azure SQL Database with Service Principal

I want to use the Databricks Online Store with Azure SQL Database; however, I am unable to authenticate through the Databricks Feature Store API. I need to use Service Principal credentials. I tried using Application ID as username and Secret as password, but ...

Latest Reply

romanzdk
New Contributor II

05-02-2022 1:13:45 AM

0 kudos

no one?

by mhansinger • New Contributor II

02-25-2022 4:07:47 AM

11044 Views
6 replies
1 kudos

Resolved! Set default "spark.driver.maxResultSize" from the notebook

Hello, I would like to set the default "spark.driver.maxResultSize" from the notebook on my cluster. I know I can do that in the cluster settings, but is there a way to set it by code? I also know how to do it when I start a spark session, but in my ca...

Latest Reply

Anonymous
Not applicable

04-28-2022 9:37:53 AM

1 kudos

Hi @Maximilian Hansinger, just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark the answer as best? If not, please tell us so we can help you. Thanks!

5 More Replies

by EdoardoVivo • New Contributor

04-27-2022 12:15:34 AM

1010 Views
0 replies
0 kudos

Pymc3 on Databricks: Progress bar

Hello everybody. I am trying to run pymc3 models on Databricks (runtime 9.1), and when I start the sampling process, the progress bar is not showing. It is a bit annoying since this way I do not have any information on when the process is going to end...

by Vijeth • New Contributor II

04-20-2022 1:35:32 AM

2649 Views
2 replies
2 kudos

Resolved! How to deploy or create mlflow model as docker image with REST api endpoint within databricks?

Is it possible to package an mlflow model as a Docker image with a REST API endpoint and use it for inference within Databricks, or host the image in Azure Container Instances?

Latest Reply

Kaniz
Community Manager

04-26-2022 2:50:21 AM

2 kudos

Hi @Vijeth Moudgalya , Just a friendly follow-up. Did you follow @Bilal Aslam 's suggestion? Please let us know.

1 More Replies

by Vik1 • New Contributor II

01-21-2022 9:16:42 AM

2260 Views
4 replies
2 kudos

Resolved! Cluster setup for ML work for Pandas in Spark, and vanilla Python.

My setup:
Worker type: Standard_D32d_v4, 128 GB Memory, 32 Cores, Min Workers: 2, Max Workers: 8
Driver type: Standard_D32ds_v4, 128 GB Memory, 32 Cores
Databricks Runtime Version: 10.2 ML (includes Apache Spark 3.2.0, Scala 2.12)
I ran a snowflake quer...

Latest Reply

Anonymous
Not applicable

04-22-2022 7:23:05 AM

2 kudos

Hey there @Vivek Ranjan, checking in. If Joseph's answer helped, would you let us know and mark the answer as best? It would be really helpful for the other members to find the solution more quickly. Thanks!

3 More Replies

by wchen • New Contributor II

03-22-2022 12:15:22 PM

5071 Views
7 replies
2 kudos

Resolved! In Databricks, the Python kafka consumer app in notebook to Confluent Cloud having the issue captured in the Body of question: SASL/PLAIN authentication being used

kafkashaded.org.apache.kafka.common.KafkaException: Failed to construct kafka consumer
    at kafkashaded.org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:823)
    at kafkashaded.org.apache.kafka.clients.consumer.KafkaConsumer.<init>...

Latest Reply

bigdata70
New Contributor III

04-13-2022 11:00:11 AM

2 kudos

@Kaniz Fatma I am having the same issue.
%python
import pyspark.sql.functions as fn
from pyspark.sql.types import StringType
binary_to_string = fn.udf(lambda x: str(int.from_bytes(x, byteorder='big')), StringType())
df = spark.readStream.format("...
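
The visible part of that snippet wraps a plain Python conversion in a UDF: it reads the raw bytes as a big-endian unsigned integer and returns the decimal string. A minimal sketch of just that conversion, outside Spark, may make the behaviour easier to check. The helper name `bytes_to_decimal_string` is made up for illustration.

```python
def bytes_to_decimal_string(raw: bytes) -> str:
    # Same expression as the lambda in the post: interpret the bytes
    # as one big-endian unsigned integer, then format it in decimal.
    return str(int.from_bytes(raw, byteorder="big"))


# Four bytes 0x00000100 read big-endian give 256.
print(bytes_to_decimal_string(b"\x00\x00\x01\x00"))  # 256

# Text bytes are also read as one large integer, not as characters.
print(bytes_to_decimal_string(b"abc"))  # 6382179
```

This only suits fields that really hold binary-encoded integers. For a field that holds UTF-8 text, `raw.decode("utf-8")` would be the usual conversion instead.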

6 More Replies

Databricks

Forum Posts

pdb debugger on databricks

import ml.dmlc.xgboost4j.scala.spark.{XGBoostEstim...

Query ML Endpoint with R and Curl

'error_code': 'INVALID_PARAMETER_VALUE', 'message'...

AutoMl Dataset too large