Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

Forum Posts

mradassaad
by New Contributor III
  • 8076 Views
  • 3 replies
  • 1 kudos

Resolved! Tuning `CrossValidator` Spark job performance

I am running a 3-fold cross validation of an ML pipeline that uses `GBTClassifier` as the final step. It takes 18 hours to run, and I am looking for feedback on how to improve the performance, as I expect this to go faster. For context, here is the...

[Attached screenshots: Random Forest Job, Random Forest Job Summary, GBT storage (top half)]
Latest Reply
cchalc
Databricks Employee
  • 1 kudos

Hello @Assaad Mrad, so this looks like a choice between putting the pipeline in the cross validator or the cross validator in the pipeline. Since you are doing the polynomial expansion as part of the pipeline you might want to consider putt...

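To put the 18-hour runtime in perspective: Spark's `CrossValidator` fits one model per (fold, parameter map) pair and then refits the best parameter map once on the full dataset, and each of those fits re-runs every stage of the pipeline it is given. The sketch below only counts fits; the grid is a hypothetical placeholder, not the poster's actual grid.

```python
from math import prod


def count_fits(num_folds, grid):
    """Count estimator fits for a k-fold grid search.

    Each fold trains one model per parameter combination, and the best
    combination is then refit once on the full dataset.
    """
    num_combinations = prod(len(values) for values in grid.values())
    return num_folds * num_combinations + 1


# Hypothetical GBTClassifier grid; the values are placeholders, not the poster's.
grid = {
    "maxDepth": [3, 5, 7],
    "maxIter": [20, 50],
    "stepSize": [0.05, 0.1],
}
print(count_fits(3, grid))  # 37
```

When featurization such as a polynomial expansion sits inside the estimator handed to `CrossValidator`, it is re-applied on every one of those fits, which is one reason the placement of the pipeline relative to the cross validator matters. If the cluster has spare capacity, `CrossValidator`'s `parallelism` parameter can also run several fits concurrently.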
jonathanhodges
by New Contributor II
  • 5169 Views
  • 4 replies
  • 0 kudos

Training Job Failure (Driver Error)

We have a new model training job that was running fine for a few days and then started failing. I have attached images for more details. I am wondering if 'can't reach driver cluster' is a red herring. It says the driver is healthy right before execut...

Latest Reply
jonathanhodges
New Contributor II
  • 0 kudos

In our case, we needed to correct our dependent libraries. We had an incorrect path referenced.

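Since the fix in this thread was an incorrect library path, one cheap safeguard is to fail fast with a clear message when a referenced dependency path does not exist. A minimal sketch, assuming the libraries are reachable as local file paths from the driver; the paths in the comment are hypothetical.

```python
from pathlib import Path


def check_library_paths(paths):
    """Raise FileNotFoundError listing every referenced path that does not exist.

    Returns the paths unchanged when all of them are present.
    """
    missing = [p for p in paths if not Path(p).exists()]
    if missing:
        raise FileNotFoundError(f"Missing library paths: {missing}")
    return list(paths)


# Example call at the start of a job (hypothetical paths):
# check_library_paths([
#     "/Workspace/Shared/libs/my_package-0.1.0-py3-none-any.whl",
#     "/Volumes/main/default/libs/helpers.jar",
# ])
```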
nikviz
by New Contributor II
  • 2114 Views
  • 2 replies
  • 0 kudos

Resolved! Vector search index stops at 45406

I am trying to create a vector search index for a table, but it stops at 45406 rows. I can see that the writeback table has all the records, but the indexing stops. Is there a hard limit on the index?

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

There are some limits that you could be hitting: Row Size for Delta Sync Index: The maximum row size is 100KB. Embedding Source Column Size for Delta Sync Index: The maximum size is 32764 bytes. Bulk Upsert Request Size Limit for Direct Vector Index: The...

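To check whether the first two limits quoted in this reply could explain the skipped rows, one option is to measure the source rows before syncing. A sketch, assuming rows are available as Python dicts and approximating sizes as UTF-8 byte lengths (of the JSON-serialized row and of the embedding source text); the service's own size accounting may differ, and reading "100KB" as 100 KiB is also an assumption. The column name `text` in the example is hypothetical.

```python
import json

MAX_ROW_BYTES = 100 * 1024          # "100KB" from the reply; assumed to mean KiB
MAX_EMBEDDING_SOURCE_BYTES = 32764  # embedding source column limit from the reply


def oversized_rows(rows, source_column):
    """Yield (index, reason) for rows likely to exceed the quoted Delta Sync limits.

    Sizes are approximated as UTF-8 byte lengths, an assumption rather than
    the service's exact accounting.
    """
    for i, row in enumerate(rows):
        row_bytes = len(json.dumps(row, default=str).encode("utf-8"))
        if row_bytes > MAX_ROW_BYTES:
            yield i, f"row is ~{row_bytes} bytes"
        text = row.get(source_column) or ""
        source_bytes = len(str(text).encode("utf-8"))
        if source_bytes > MAX_EMBEDDING_SOURCE_BYTES:
            yield i, f"{source_column} is {source_bytes} bytes"


rows = [{"id": 1, "text": "short"}, {"id": 2, "text": "x" * 40000}]
print(list(oversized_rows(rows, "text")))
```

On a Spark DataFrame, the same check could be applied to `row.asDict()` for rows streamed with `df.toLocalIterator()`, or rewritten with column functions for large tables.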
miahopman
by New Contributor II
  • 4906 Views
  • 2 replies
  • 1 kudos

AutoML Runs Failing

After the Data Exploration notebook runs successfully, all AutoML trials fail without providing a source notebook. I have ensured that the training data labels have no null values and no labels with 16 or fewer occurrences. I cann...

Latest Reply
rtreves
Contributor
  • 1 kudos

@AnNg Have there been any updates on this feature?

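The label checks the poster describes (no null labels, no label with 16 or fewer occurrences) can be scripted before launching AutoML. A minimal plain-Python sketch; the threshold of 16 is taken from the post, and whether it matches AutoML's actual requirement is not confirmed in this thread.

```python
from collections import Counter


def label_problems(labels, max_rare_count=16):
    """Return (null_count, rare_labels) for a sequence of class labels.

    rare_labels maps each label seen max_rare_count times or fewer to its
    count. Only None counts as null here; pandas NaN values are not detected.
    """
    null_count = sum(1 for y in labels if y is None)
    counts = Counter(y for y in labels if y is not None)
    rare_labels = {y: n for y, n in counts.items() if n <= max_rare_count}
    return null_count, rare_labels


labels = ["a"] * 20 + ["b"] * 16 + [None]
print(label_problems(labels))
```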
JoeAckerman
by New Contributor II
  • 1664 Views
  • 2 replies
  • 0 kudos

Python running far slower than locally, even with large cluster and multiple workers

I have a notebook that is running extremely slowly even when I try to do pretty basic Python functions. It is running far slower than locally no matter what I try, despite using a 32 GB, 4-core cluster with 4-8 workers. For context, my data...

Latest Reply
cgrant
Databricks Employee
  • 0 kudos

Please share more information, for example: the type of data source, the type of operations being executed (sharing code if possible), and timings of local runs and Databricks runs.

yopbibo
by Contributor II
  • 4479 Views
  • 3 replies
  • 5 kudos

Deploy an ML model, trained and registered in Databricks, to AKS

Hi, I can train and register an ML model in my Databricks workspace. Then, to deploy it on AKS, I need to register the model in Azure ML and then deploy to AKS. Is it possible to skip the Azure ML step? I would like to deploy directly into my AKS instance...

Latest Reply
sidharthpradhan
New Contributor II
  • 5 kudos

Is this still the case? Can't we serve the model in Databricks? I am new to this, so I am just wondering about the capabilities.

damselfly20
by New Contributor III
  • 2499 Views
  • 2 replies
  • 1 kudos

Endpoint creation without scale-to-zero

Hi, I've got a question about deploying an endpoint for Llama 3.1 8B. The following code should create the endpoint without scale-to-zero. The endpoint is being created, but with scale-to-zero, although scale_to_zero_enabled is set to False. Instead ...

Latest Reply
damselfly20
New Contributor III
  • 1 kudos

Thanks for the reply @Walter_C. This didn't quite work, since it used a CPU and didn't consider the max_provisioned_throughput, but I finally got it to work like this: from mlflow.deployments import get_deploy_client client = get_deploy_client("data...

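For reference, the following sketch builds an endpoint config with scale-to-zero disabled and provisioned throughput set. It follows the shape of the Databricks serving endpoints API (`served_entities` entries with `entity_name`, `entity_version`, `min_provisioned_throughput`, `max_provisioned_throughput`, `scale_to_zero_enabled`). It is not the poster's truncated solution. The model name and throughput numbers in the test are hypothetical, and valid throughput values depend on the model.

```python
def provisioned_throughput_config(entity_name, entity_version, min_tput, max_tput):
    """Build a serving-endpoint config dict with scale-to-zero disabled.

    Valid throughput increments depend on the model, so the numbers passed in
    are the caller's responsibility; only their ordering is checked here.
    """
    if not 0 <= min_tput <= max_tput:
        raise ValueError("expected 0 <= min_tput <= max_tput")
    return {
        "served_entities": [
            {
                "entity_name": entity_name,
                "entity_version": str(entity_version),
                "min_provisioned_throughput": min_tput,
                "max_provisioned_throughput": max_tput,
                "scale_to_zero_enabled": False,
            }
        ]
    }
```

The resulting dict could then be passed as `config` to `get_deploy_client("databricks").create_endpoint(name=..., config=config)` from `mlflow.deployments`.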
cmilligan
by Contributor II
  • 10215 Views
  • 5 replies
  • 2 kudos

Issue with 'Multi-column In predicates are not supported in the DELETE condition'

I'm trying to delete rows from a table with the same date or id as records in another table. I'm using the below query and get the error 'Multi-column In predicates are not supported in the DELETE condition'. delete from cost_model.cm_dispatch_consol...

Latest Reply
thisisthemurph
New Contributor II
  • 2 kudos

I seem to get this error on some Delta tables and not others: df.createOrReplaceTempView("channels_to_delete") spark.sql(""" delete from lake.something.earnings where TenantId = :tenantId and ChannelId = in ( select ChannelId ...

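One common workaround for this error is to give each column its own single-column `IN` subquery and combine them with `OR`. That matches the original post's goal of deleting rows with the same date or the same id as records in another table. The sketch below uses Python's built-in `sqlite3` only as a stand-in engine to show which rows the rewritten predicate removes. The table and column names are hypothetical, and the SQL should be verified against Databricks SQL before use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE target (id INTEGER, dt TEXT, amount REAL);
    CREATE TABLE updates (id INTEGER, dt TEXT);
    INSERT INTO target VALUES (1, '2024-01-01', 10), (2, '2024-01-02', 20), (3, '2024-01-03', 30);
    INSERT INTO updates VALUES (2, '2024-02-01'), (9, '2024-01-03');
""")

# One single-column IN subquery per column; OR removes rows that share
# either the date or the id with a row in `updates`.
conn.execute("""
    DELETE FROM target
    WHERE dt IN (SELECT dt FROM updates)
       OR id IN (SELECT id FROM updates)
""")

remaining = conn.execute("SELECT id FROM target ORDER BY id").fetchall()
print(remaining)  # [(1,)]
```

If rows must match on both columns at once, which is what a multi-column `IN` expresses, a correlated `EXISTS` subquery is the usual alternative. Check that your runtime supports it in a `DELETE` condition.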
amirA
by New Contributor II
  • 3811 Views
  • 3 replies
  • 1 kudos

Resolved! Extracting Topics From Text Data Using PySpark

Hi everyone, I tried to follow the same steps as in the Topic from Text example, on similar data. However, when I try to fit the model with my data, I get this error: IllegalArgumentException: requirement failed: Column features must be of type equal to one of t...

Latest Reply
filipniziol
Esteemed Contributor
  • 1 kudos

Hi @amirA, the LDA model expects the features column to be of type Vector from the pyspark.ml.linalg module, specifically either a SparseVector or a DenseVector, whereas you have provided a Row type. You need to convert your Row object to a SparseVector. Che...

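As a sketch of the conversion this reply describes: a `SparseVector` is defined by a size plus strictly increasing indices and their values. So per-document term counts have to be put into that form, for example before calling `pyspark.ml.linalg.Vectors.sparse(size, indices, values)`. This plain-Python helper assumes each document's counts are available as a `{term_index: count}` dict. The thread does not say how the poster's data is stored. In many pipelines, `CountVectorizer` produces a sparse features column directly.

```python
def to_sparse_parts(term_counts, vocab_size):
    """Convert {term_index: count} into (size, indices, values) for a SparseVector.

    Zero counts are dropped and indices are sorted ascending, because
    SparseVector requires strictly increasing indices.
    """
    indices = sorted(i for i, c in term_counts.items() if c != 0)
    for i in indices:
        if not 0 <= i < vocab_size:
            raise ValueError(f"term index {i} out of range for size {vocab_size}")
    values = [float(term_counts[i]) for i in indices]
    return vocab_size, indices, values


print(to_sparse_parts({5: 2, 1: 3, 7: 0}, 10))
```

With PySpark available, `Vectors.sparse(*to_sparse_parts(counts, vocab_size))` would then build the vector for one document.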
ukaplan
by New Contributor III
  • 9680 Views
  • 15 replies
  • 2 kudos

Serving Endpoint Container Image Creation Fails

Hello, I trained a model using MLflow and saved the model as an artifact. I can load the model from a notebook and it works as expected (i.e. I can load the model using its URI). However, when I want to deploy it using Databricks endpoints, container...

Latest Reply
damselfly20
New Contributor III
  • 2 kudos

@ivan_calvo The problem still exists. Surely there has to be some other option than downgrading the ML cluster to DBR 14.3 LTS ML?

Swappatil2506
by Databricks Partner
  • 1307 Views
  • 2 replies
  • 0 kudos

I want to develop an automated lead allocation system to prospect sales representatives.

I want to develop an automated lead allocation system to prospect sales representatives. Please suggest a suitable solution, along with any links if available.

Latest Reply
Swappatil2506
Databricks Partner
  • 0 kudos

Hi jamesl, my use case is matching a prospective sales agent to a customer entering a retail store. When a customer enters a store, based on the inputs provided and on whether the customer is an existing or a new customer, I want to create a rea...

avishkarborkar
by New Contributor III
  • 4269 Views
  • 6 replies
  • 4 kudos
Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 4 kudos

There could be multiple reasons why you're getting this error, @avishkarborkar. If the course you're following requires Unity Catalog, first you need to check whether you have a Premium workspace. Next you need to make sure that your workspace is enabled ...

Mikkel
by New Contributor III
  • 3235 Views
  • 1 replies
  • 0 kudos

Unable to Check Experiment Existence with path starting with /Workspace/ Directory in Databricks Pla

https://github.com/mlflow/mlflow/issues/11077  In Databricks, when attempting to set an experiment with an experiment_name specified as an absolute path from /Workspace/Shared/mlflow_experiment/<experiment_name>, the mlflow.set_experiment() function ...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Before setting the experiment, use mlflow.get_experiment_by_name() to check if the experiment already exists. If it does, you can set the experiment without attempting to create it again.

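The check-then-create pattern from this reply can be written as a small helper. In this sketch, the MLflow functions are passed in as parameters so the logic can be shown without a workspace. The experiment path in the comment is a hypothetical example following the poster's `/Workspace/Shared/mlflow_experiment/...` layout. Whether `get_experiment_by_name` resolves the `/Workspace/`-prefixed form of a path is exactly what the linked issue is about, so that case is worth testing.

```python
def get_or_create_experiment_id(name, get_experiment_by_name, create_experiment):
    """Return the ID of experiment `name`, creating it only if it does not exist.

    In practice the callables would be mlflow.get_experiment_by_name (which
    returns an Experiment or None) and mlflow.create_experiment (which returns
    the new experiment ID).
    """
    experiment = get_experiment_by_name(name)
    if experiment is not None:
        return experiment.experiment_id
    return create_experiment(name)


# Example usage on Databricks (hypothetical path):
# import mlflow
# exp_id = get_or_create_experiment_id(
#     "/Workspace/Shared/mlflow_experiment/my_experiment",
#     mlflow.get_experiment_by_name,
#     mlflow.create_experiment,
# )
# mlflow.set_experiment(experiment_id=exp_id)
```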
sergiopolimante
by New Contributor II
  • 941 Views
  • 1 replies
  • 0 kudos

What is the best way to not deploy/run a workflow in production?

I am building an MLOps architecture. I do not want to deploy the training workflow to prod. My first approach was to selectively not deploy the workflow to prod, but this does not seem to be possible, as in this thread: https://community.databricks.com...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Target Override Feature: You can use the target override feature to specify different configurations for different environments. However, this does not provide a direct way to exclude specific job resources. Environment-Specific Folders: Another app...
