Machine Learning

by Santhanalakshmi • New Contributor II

07-13-2022 10:25:17 PM

1458 Views
3 replies
0 kudos

Throwing IndexoutofBound Exception in Pyspark

Hello all, I am trying to read the data and group it in order to pass it to a predict function via the @F.pandas_udf method. #Loading Model pkl_model = pickle.load(open(filepath,'rb')) # build schema for output labels filter_schema=[] ...

Latest Reply

Vindhya
New Contributor II

04-18-2023 1:30:14 PM

0 kudos

@Santhanalakshmi Manoharan Was this issue resolved? I am also getting the same error, and any guidance would be a great help. Appreciate your help.

by its-kumar • New Contributor III

04-14-2023 12:46:55 AM

2846 Views
2 replies
0 kudos

MLFlow Remote model registry connection is not working in Databricks

Dear community, I have multiple Databricks workspaces in my Azure subscription, and I have one central workspace. I want to use the central workspace for model registry and experiment tracking from the other workspaces. So, If I am train...

Latest Reply

Anonymous
Not applicable

04-18-2023 2:22:05 AM

0 kudos

@Kumar Shanu: The error you are seeing (API request to endpoint /api/2.0/mlflow/runs/create failed with error code 404 != 200) suggests that the API endpoint you are trying to access is not found. This could be due to several reasons, such as incorr...
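One possible cause, offered here only as an assumption because the reply above is cut off, is a workspace host that was pasted with extra path or query components, so the request is sent to a URL that does not exist. The sketch below uses only the Python standard library to show how the final request URL is formed. `normalize_host`, `endpoint_url`, and the example host are hypothetical names for illustration and are not part of MLflow or Databricks.

```python
from urllib.parse import urlsplit

# Path quoted in the error message above.
RUNS_CREATE_PATH = "/api/2.0/mlflow/runs/create"


def normalize_host(host: str) -> str:
    """Reduce a workspace URL to scheme://netloc, dropping path, query and fragment."""
    parts = urlsplit(host.strip())
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"host must look like https://<workspace-host>, got {host!r}")
    return f"{parts.scheme}://{parts.netloc}"


def endpoint_url(host: str, path: str = RUNS_CREATE_PATH) -> str:
    """Build the full URL that a client would call for the given API path."""
    return normalize_host(host) + path


if __name__ == "__main__":
    # Hypothetical host copied from a browser address bar, including a query string.
    pasted = "https://adb-1234567890123456.7.azuredatabricks.net/?o=1234567890123456"
    print(endpoint_url(pasted))
```

Comparing the printed URL with the central workspace's actual address is a quick way to rule this cause in or out before looking at credentials or secret scopes.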

by Spencer_Kent • New Contributor III

03-21-2023 9:39:06 PM

1167 Views
2 replies
1 kudos

Resolved! Lacking support for column-level select grants or attribute-based access control

In the Unity Catalog launch and its accompanying blog post, one of the primary selling points was a set of granular access control features that would at least partially eliminate the need to create a multitude of separate table views and the attenda...

Latest Reply

Spencer_Kent
New Contributor III

04-17-2023 10:08:30 PM

1 kudos

Simply amazing that 2 years on from the initial announcement, this feature is not available. You released Unity Catalog missing one of its most-hyped features.

by karthik_p • Esteemed Contributor

04-05-2023 5:19:53 AM

1688 Views
6 replies
2 kudos

when we are trying to create folder/file or list file using dbutils we are getting forbidden error in aws

Hi Team, we have created a new premium workspace with a customer-managed VPC, and the workspace deployed successfully in AWS. When we try to create a folder in DBFS, we get the error below. We have compared cross account custom managed role (Customer-managed VP...

Latest Reply

karthik_p
Esteemed Contributor

04-17-2023 1:45:37 PM

2 kudos

@Debayan Mukherjee Issue resolved. It looks like the cloud team had not updated the required security groups that were shared with them. After revisiting them, we were able to find the missing security groups and add them.

by ammarchalifah • New Contributor

04-11-2023 9:30:35 AM

2055 Views
1 reply
0 kudos

DeltaFileNotFoundException in a multi cluster conflict

I have several parallel data pipelines running in different Airflow DAGs. All of these pipelines execute two dbt selectors in a dedicated Databricks cluster: one of them is a common selector executed in all DAGs. This selector includes a test that is d...

Latest Reply

Anonymous
Not applicable

04-16-2023 12:15:47 AM

0 kudos

@Ammar Ammar: The error message you're seeing suggests that the Delta Lake transaction log for the common model's test table has been truncated or deleted, either manually or due to the retention policies set in your cluster. This can happen if the ...

by DK • New Contributor II

10-03-2022 11:54:56 PM

885 Views
1 reply
1 kudos

Unable to call logged ML model from a different notebook when using Spark ML

Hi, I am an R user and I am experimenting with building an ML model with R and with Spark-flavoured algorithms in Databricks. However, I am struggling to call a model that is logged as part of the experiment from a different notebook when I use spark flavo...

Latest Reply

Anonymous
Not applicable

04-14-2023 9:41:33 AM

1 kudos

@Dip Kundu: It seems like the error you are facing is related to sparklyr, which is used to interact with Apache Spark from R, and not directly related to mlflow. The error message suggests that an object could not be found, but it's not clear which...

by Anonymous • Not applicable

10-03-2022 4:37:04 AM

852 Views
1 reply
1 kudos

Hive Catalog DDL, describe extended returns "... n more fields" when detailing a many column array<struct<

I am using the Hackolade data modelling tool to reverse engineer (using a cluster connection) deployed databases and their table and view definitions. Some of our tables contain large multi-column structs, and these can only be partially described as a char...

Latest Reply

Anonymous
Not applicable

04-14-2023 9:38:37 AM

1 kudos

Yes, it is possible to configure the Hive Catalog in Databricks to return full descriptions of tables with large multi-column structs. One way to achieve this is to increase the value of the Hive configuration property "hive.metastore.client.record.ma...

by thomasm • New Contributor II

04-13-2023 4:41:26 AM

1675 Views
3 replies
1 kudos

Resolved! Online Feature Store MLflow serving problem

When I try to serve a model stored with FeatureStoreClient().log_model using the feature-store-online-example-cosmosdb tutorial Notebook, I get errors suggesting that the primary key schema is not configured properly. However, if I look in the Featur...

Latest Reply

NandiniN
Valued Contributor II

04-14-2023 12:13:14 AM

1 kudos

Hello @Thomas Michielsen, this error seems to occur when you have created the table yourself. You must use publish_table() to create the table in the online store. Do not manually create a database or container inside Cosmos DB. publish_table()...

by lurban • New Contributor

01-25-2023 9:56:15 AM

629 Views
1 reply
0 kudos

CloudFilesIllegalStateException: Found mismatched event: key old_file_path doesn't have the prefix: new_file_path

My team currently uses Autoloader and Delta Live Tables to process incremental data from ADLS storage. We need to keep the same table and history, but switch the filepath to a different location in storage. When I test a filepath change, I rec...

Latest Reply

DD_Sharma
New Contributor III

04-14-2023 12:15:03 AM

0 kudos

Autoloader doesn't support changing the source path for a running job, so if you change the source path, your stream fails. However, if you really want to change the path, you can do so by using the new checkpoint ...

by ryojikn • New Contributor III

01-30-2023 8:52:24 AM

2763 Views
2 replies
0 kudos

How to use spark-submit python task with the usage of --archives parameter passing a .tar.gz conda env?

We've been trying to launch a spark-submit Python task using the "archives" parameter, similar to the one used in YARN. However, we have not been able to make it work in Databricks. We know that for our OnPrem installation we can use som...

Latest Reply

Anonymous
Not applicable

04-10-2023 7:04:06 AM

0 kudos

@Ryoji Kuwae Neto: To use the --archives parameter with a conda environment in Databricks, you can follow these steps:
1) Create a conda environment for your project and export it as a .tar.gz file:
conda create --name myenv
conda activate myenv
conda...

by Vish1 • New Contributor II

02-02-2023 1:39:19 AM

3245 Views
3 replies
1 kudos

pyspark: Stage failure due to One hot encoding

I am facing the below error while fitting my model. I am trying to run a model with cross validation with a pipeline inside of it. Below is the code snippet for data transformation: qd = QuantileDiscretizer(relativeError=0.01, handleInvalid="error", n...

Latest Reply

shyam_9
Valued Contributor

04-10-2023 11:21:38 AM

1 kudos

Hi @Vishnu P, could you please share the full stack trace? Also, please check how the workers' memory is being utilized.

by Cristianmarja • New Contributor

01-12-2023 6:54:17 PM

340 Views
1 reply
0 kudos

Hi everyone, Please note that I am stuck on exercise 2.0 Train and Validate ML Model because when I run the code a NameError appears with the following label...

Hi everyone, Please note that I am stuck on exercise 2.0 Train and Validate ML Model because when I run the code a NameError appears with the following label: name 'DoubleType' is not defined. I have put the code below for your reference. I would like any help ab...

Latest Reply

Anonymous
Not applicable

04-10-2023 8:21:17 AM

0 kudos

@Cristian Martinez: The error you are seeing is occurring because the DoubleType class has not been imported. To fix this, add the following line to the top of your code to import DoubleType:
from pyspark.sql.types import DoubleType
This should resolv...

by invalidargument • New Contributor II

01-18-2023 3:13:18 AM

473 Views
1 reply
0 kudos

Model storage requirements management

Hi. We have around 30 models in model storage that we use for batch scoring. These were created at different times by different people and on different cluster runtimes. Now we have run into the problem that we can't de-serialize the models and use for in...

Latest Reply

Anonymous
Not applicable

04-10-2023 8:05:12 AM

0 kudos

@Jonas Lindberg: To address the issues you are facing with model serialization and versioning, I would recommend the following approach: Use MLflow to manage the lifecycle of your models, including versioning, deployment, and monitoring. MLflow is an...
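The underlying idea, recording the environment a model was created in so that a mismatch can be detected before de-serialization, can be sketched with the standard library alone. This is not how MLflow implements it. The helper names (`capture_environment`, `save_model`, `environment_mismatches`) and the `.env.json` sidecar file are assumptions made for illustration.

```python
import json
import pickle
import platform
import tempfile
from importlib import metadata
from pathlib import Path


def capture_environment(packages):
    """Record the Python version and the installed version of each named package."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed here
    return {"python": platform.python_version(), "packages": versions}


def sidecar_path(model_path):
    """Hypothetical convention: <model file name> + '.env.json' next to the model."""
    model_path = Path(model_path)
    return model_path.parent / (model_path.name + ".env.json")


def save_model(model, model_path, packages):
    """Pickle the model and write the environment description beside it."""
    Path(model_path).write_bytes(pickle.dumps(model))
    sidecar_path(model_path).write_text(json.dumps(capture_environment(packages), indent=2))


def environment_mismatches(model_path, packages):
    """Return {name: (recorded, current)} for every version that differs."""
    recorded = json.loads(sidecar_path(model_path).read_text())
    current = capture_environment(packages)
    diffs = {}
    if recorded["python"] != current["python"]:
        diffs["python"] = (recorded["python"], current["python"])
    for name, version in recorded["packages"].items():
        now = current["packages"].get(name)
        if now != version:
            diffs[name] = (version, now)
    return diffs


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "model.pkl"
        save_model({"weights": [0.1, 0.2]}, path, ["pip"])
        print(environment_mismatches(path, ["pip"]))  # empty dict when nothing changed
```

Checking `environment_mismatches` before calling `pickle.loads` turns an opaque de-serialization failure into an explicit list of version differences between the cluster that wrote the model and the one trying to read it.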

by Cristianmarja • New Contributor

01-12-2023 6:54:44 PM

437 Views
1 reply
0 kudos

2.0 Train and Validate ML Model - Exercise / Double Type is not defined

Hi everyone, Please note that I am stuck on exercise 2.0 Train and Validate ML Model because when I run the code a NameError appears with the following label: name 'DoubleType' is not defined. I would like any help with this subject.

Latest Reply

Anonymous
Not applicable

04-10-2023 8:00:35 AM

0 kudos

@Cristian Martinez: In Databricks, you need to import the necessary classes from the pyspark.sql.types module in order to use them in your code. To fix the NameError you're encountering with the label "name 'DoubleType' is not defined" in Exercise 2...
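The mechanism behind this error can be reproduced without a Spark cluster. The sketch below is only an illustration: `evaluate` is a hypothetical helper, and the local `DoubleType` class is a stand-in so the example runs without pyspark. In a real notebook the name would be bound by `from pyspark.sql.types import DoubleType`.

```python
def evaluate(source, namespace=None):
    """Run a snippet in an isolated namespace and return the NameError text, if any."""
    namespace = {} if namespace is None else namespace
    try:
        exec(source, namespace)
    except NameError as exc:
        return str(exc)
    return None


class DoubleType:
    """Stand-in for pyspark.sql.types.DoubleType (assumption: pyspark is not installed here)."""


snippet = "column_type = DoubleType()"

# The name is not bound in the namespace, as in a notebook cell that lacks the import.
print(evaluate(snippet))

# The name is bound, which is the effect of the import statement.
print(evaluate(snippet, {"DoubleType": DoubleType}))
```

The first call reports the same message as in the question, and the second returns `None` because the name resolves. Any other type used in the exercise, such as a struct or string type, needs to be imported from the same module in the same way.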

by Orianh • Valued Contributor II

01-19-2023 2:55:50 AM

1345 Views
1 reply
2 kudos

MLflow log pytorch distributed training

Hey guys, I have a few questions that I hope you can help me with. I started training a PyTorch model in distributed mode using petastorm + Horovod, as Databricks suggests in the docs. Q1: I can see that each worker is training the model, but when epochs are done...

Latest Reply

Anonymous
Not applicable

04-10-2023 7:38:33 AM

2 kudos

@orian hindi: Regarding your questions: Q1: The error message you are seeing is likely related to a segmentation fault, which can occur due to various reasons such as memory access violations or stack overflows. It could be caused by several factors,...

Databricks

Forum Posts

pdb debugger on databricks

import ml.dmlc.xgboost4j.scala.spark.{XGBoostEstim...

Query ML Endpoint with R and Curl

'error_code': 'INVALID_PARAMETER_VALUE', 'message'...

AutoMl Dataset too large