cancel
Showing results for 
Search instead for 
Did you mean: 
Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.
cancel
Showing results for 
Search instead for 
Did you mean: 

Forum Posts

sridhar0109
by New Contributor
  • 1443 Views
  • 2 replies
  • 0 kudos

Tracking changes in data distribution by using pyspark

Hi All,I'm working on creating a data quality dashboard. I've created few rules like checking nulls in a column, checking for data type of the column , removing duplicates etc.We follow medallion architecture and are applying these data quality check...

  • 1443 Views
  • 2 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Sridhar Varanasi​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you.T...

  • 0 kudos
1 More Replies
hulma
by New Contributor II
  • 1182 Views
  • 2 replies
  • 1 kudos

dbfs file reference in pyfunc model for serverless inference

Hi, I was trying to migrate model serving from classic to serverless realtime inference.My model is currently being logged as pyfunc model and part of model script is to read dbfs file for inference. Now, with serverless i have error which it not abl...

  • 1182 Views
  • 2 replies
  • 1 kudos
Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Hulma Abdul Rahman​ Thank you for posting your question in our community! We are happy to assist you.To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best an...

  • 1 kudos
1 More Replies
Gilg
by Contributor II
  • 7886 Views
  • 1 replies
  • 0 kudos

Failed to add 1 container to the cluster. will attempt retry: false. reason: bootstrap timeout

Hi Team,When creating a new cluster in a workspace within a VNET receiving this error:Failed to add 1 container to the cluster. will attempt retry: false. reason: bootstrap timeoutCluster terminated. Reason: Bootstrap TimeoutCheers.Gil

  • 7886 Views
  • 1 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Gil Gonong​ :The error message you are receiving suggests that the creation of the new cluster has failed due to a bootstrap timeout. The bootstrap process is responsible for setting up the initial configuration of the cluster, and if it takes too l...

  • 0 kudos
isaac_gritz
by Databricks Employee
  • 5184 Views
  • 1 replies
  • 3 kudos

Resolved! Pricing on Databricks

How Pricing Works on DatabricksI highly recommend checking out this blog post on how databricks pricing works from my colleague @MENDELSOHN CHAN​Databricks has a consumption based pricing model, so you pay only for the compute you use.For interactive...

  • 5184 Views
  • 1 replies
  • 3 kudos
Latest Reply
Meag
New Contributor III
  • 3 kudos

I read the read blog you will share it helps thanks for sharing.

  • 3 kudos
Santhanalakshmi
by New Contributor II
  • 3689 Views
  • 3 replies
  • 0 kudos

Throwing IndexoutofBound Exception in Pyspark

Hello All,I am trying to read the data and trying to group the data in order to pass it to predict function via @F.pandas_udf method.#Loading Model pkl_model = pickle.load(open(filepath,'rb'))   # build schema for output labels filter_schema=[] ...

error_db error_2_db error_3_db
  • 3689 Views
  • 3 replies
  • 0 kudos
Latest Reply
Vindhya
New Contributor II
  • 0 kudos

@Santhanalakshmi Manoharan​  Was this issue resolved, Am also getting same error, any guidance would be of great help.Appreciate your help.

  • 0 kudos
2 More Replies
its-kumar
by New Contributor III
  • 9985 Views
  • 2 replies
  • 0 kudos

MLFlow Remote model registry connection is not working in Databricks

Dear community,I am having multiple Databricks workspaces in my azure subscription, and I have one central workspace. I want to use the central workspace for model registry and experiments tracking from the multiple other workspaces.So, If I am train...

  • 9985 Views
  • 2 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Kumar Shanu​ :The error you are seeing (API request to endpoint /api/2.0/mlflow/runs/create failed with error code 404 != 200) suggests that the API endpoint you are trying to access is not found. This could be due to several reasons, such as incorr...

  • 0 kudos
1 More Replies
Spencer_Kent
by New Contributor III
  • 3013 Views
  • 2 replies
  • 1 kudos

Resolved! Lacking support for column-level select grants or attribute-based access control

In the Unity Catalog launch and its accompanying blog post, one of the primary selling points was a set of granular access control features that would at least partially eliminate the need to create a multitude of separate table views and the attenda...

  • 3013 Views
  • 2 replies
  • 1 kudos
Latest Reply
Spencer_Kent
New Contributor III
  • 1 kudos

Simply amazing that 2 years on from the initial announcement, this feature is not available. You released Unity Catalog missing one of it's most-hyped features.

  • 1 kudos
1 More Replies
karthik_p
by Esteemed Contributor
  • 4186 Views
  • 6 replies
  • 2 kudos

when we are trying to create folder/file or list file using dbutils we are getting forbidden error in aws

HI Team,we have created new premium workspace with custom managed vpc, workspace deployed successfully in AWS. we are trying to create folder in dbfs, we are getting below error. we have compared cross account custom managed role (Customer-managed VP...

  • 4186 Views
  • 6 replies
  • 2 kudos
Latest Reply
karthik_p
Esteemed Contributor
  • 2 kudos

@Debayan Mukherjee​ Issue resolved, looks cloud team have not updated required security groups that has been shared, after revisiting them we are able to find missing security groups and added them

  • 2 kudos
5 More Replies
ammarchalifah
by New Contributor
  • 3689 Views
  • 1 replies
  • 0 kudos

DeltaFileNotFoundException in a multi cluster conflict

I have several parallel data pipeline running in different Airflow DAGs. All of these pipeline execute two dbt selectors in a dedicated Databricks cluster: one of them is a common selector executed in all DAGs. This selector includes a test that is d...

  • 3689 Views
  • 1 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Ammar Ammar​ :The error message you're seeing suggests that the Delta Lake transaction log for the common model's test table has been truncated or deleted, either manually or due to the retention policies set in your cluster. This can happen if the ...

  • 0 kudos
DK
by New Contributor II
  • 2089 Views
  • 1 replies
  • 1 kudos

Unable to call logged ML model from a different notebook when using Spark ML

Hi, I am a R user and I am experimenting to build an ml model with R and with spark flavoured algorithms in Databricks. However, I am struggling to call a model that is logged as part of the experiment from a different notebook when I use spark flavo...

  • 2089 Views
  • 1 replies
  • 1 kudos
Latest Reply
Anonymous
Not applicable
  • 1 kudos

@Dip Kundu​ :It seems like the error you are facing is related to sparklyr, which is used to interact with Apache Spark from R, and not directly related to mlflow. The error message suggests that an object could not be found, but it's not clear which...

  • 1 kudos
Anonymous
by Not applicable
  • 1725 Views
  • 1 replies
  • 1 kudos

Hive Catalog DDL, describe extended returns "... n more fields" when detailing a many column array<struct<

I am using Hackolade data modelling tool to reverse engineer (using cluster connection) deployed databases and their table and view definitions.Some of our tables contain large multi-column structs, and these can only be partially described as a char...

  • 1725 Views
  • 1 replies
  • 1 kudos
Latest Reply
Anonymous
Not applicable
  • 1 kudos

Yes, it is possible to configure the Hive Catalog in Databricks to return full descriptions of tables with large multi-column structs.One way to achieve this is to increase the value of the Hive configuration property "hive.metastore.client.record.ma...

  • 1 kudos
thomasm
by New Contributor II
  • 4885 Views
  • 3 replies
  • 1 kudos

Resolved! Online Feature Store MLflow serving problem

When I try to serve a model stored with FeatureStoreClient().log_model using the feature-store-online-example-cosmosdb tutorial Notebook, I get errors suggesting that the primary key schema is not configured properly. However, if I look in the Featur...

  • 4885 Views
  • 3 replies
  • 1 kudos
Latest Reply
NandiniN
Databricks Employee
  • 1 kudos

Hello @Thomas Michielsen​ , this error seems to occur when you may have created the table yourself. You must use publish_table() to create the table in the online store. Do not manually create a database or container inside Cosmos DB. publish_table()...

  • 1 kudos
2 More Replies
lurban
by New Contributor II
  • 1684 Views
  • 1 replies
  • 0 kudos

CloudFilesIllegalStateException: Found mismatched event: key old_file_path doesn't have the prefix: new_file_path

My team currently uses Autoloader and Delta Live Tables to process incremental data from ADLS storage. We are needing to keep the same table and history, but switch the filepath to a different location in storage. When I test a filepath change, I rec...

  • 1684 Views
  • 1 replies
  • 0 kudos
Latest Reply
DD_Sharma
New Contributor III
  • 0 kudos

Autoloader doesn't support changing the source path for running job so if you change your source path your stream fails because the source path has changed. However, if you really want to change the path you can change it by using the new checkpoint ...

  • 0 kudos
ryojikn
by New Contributor III
  • 5951 Views
  • 2 replies
  • 0 kudos

How to use spark-submit python task with the usage of --archives parameter passing a .tar.gz conda env?

We've been trying to launch a spark-submit python task using the parameter "archives", similar to that one used in Yarn.​However, we've not been able to successfully make it work in databricks.​​We know that for our OnPrem installation we can use som...

  • 5951 Views
  • 2 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Ryoji Kuwae Neto​ :To use the --archives parameter with a conda environment in Databricks, you can follow these steps:1) Create a conda environment for your project and export it as a .tar.gz file:conda create --name myenv conda activate myenv conda...

  • 0 kudos
1 More Replies
Vish1
by New Contributor II
  • 9814 Views
  • 3 replies
  • 1 kudos

pyspark: Stage failure due to One hot encoding

I am facing the below error while fitting my model. I am trying to run a model with cross validation with a pipeline inside of it. Below is the code snippet for data transformation:qd = QuantileDiscretizer(relativeError=0.01, handleInvalid="error", n...

image
  • 9814 Views
  • 3 replies
  • 1 kudos
Latest Reply
shyam_9
Databricks Employee
  • 1 kudos

Hi @Vishnu P​, could you please share the full stack trace? Also, observe how the workers memory utilizing?

  • 1 kudos
2 More Replies

Join Us as a Local Community Builder!

Passionate about hosting events and connecting people? Help us grow a vibrant local community—sign up today to get started!

Sign Up Now
Labels