Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.
Data + AI Summit 2024 - Data Science & Machine Learning

Forum Posts

danielvdc
by New Contributor
  • 45 Views
  • 0 replies
  • 0 kudos

Rolling predictions with FeatureEngineeringClient

I am performing a time series analysis using an XGBoostRegressor with rolling predictions. I am doing so with the FeatureEngineeringClient (in combination with Unity Catalog), where I create and load my features during training and inference, as ...

nikviz
by New Contributor
  • 70 Views
  • 2 replies
  • 0 kudos

Resolved! Vector search index stops at 45406

I am trying to create a vector search index for a table, but it stops at 45406 rows. I can see that the writeback table has all the records, but the indexing stops. Is there a hard limit on the index size?

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

There are some limits that you may be hitting:
  • Row Size for Delta Sync Index: The maximum row size is 100KB.
  • Embedding Source Column Size for Delta Sync Index: The maximum size is 32764 bytes.
  • Bulk Upsert Request Size Limit for Direct Vector Index: The...

1 More Replies
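
One way to check whether a source table runs into the two Delta Sync Index size limits quoted in the reply above is to scan rows for oversized values before syncing. The following is a minimal sketch in plain Python, not a Databricks API: the function names, the sample rows, and the column name `text` are hypothetical, and it assumes that "100KB" means 100 * 1024 bytes and that row size can be approximated by summing the UTF-8 sizes of the values.

```python
# Limits quoted in the reply; reading "100KB" as 100 * 1024 bytes is an assumption.
MAX_ROW_BYTES = 100 * 1024
MAX_EMBEDDING_SOURCE_BYTES = 32764


def utf8_size(value):
    """Return the UTF-8 encoded size of a value's string form, in bytes."""
    return len(str(value).encode("utf-8"))


def find_oversized_rows(rows, embedding_column):
    """Yield (index, reason) for each row that exceeds one of the limits.

    `rows` is an iterable of dicts. Row size is approximated as the sum of
    the UTF-8 sizes of all values, a rough stand-in for the real stored size.
    """
    for index, row in enumerate(rows):
        embedding_bytes = utf8_size(row.get(embedding_column, ""))
        row_bytes = sum(utf8_size(v) for v in row.values())
        if embedding_bytes > MAX_EMBEDDING_SOURCE_BYTES:
            yield index, f"embedding source column is {embedding_bytes} bytes"
        elif row_bytes > MAX_ROW_BYTES:
            yield index, f"row is about {row_bytes} bytes"


# Hypothetical sample data for illustration only.
sample_rows = [
    {"id": 1, "text": "short document"},
    {"id": 2, "text": "x" * 40000},                  # embedding source too large
    {"id": 3, "text": "ok", "blob": "y" * 200000},   # whole row too large
]
problems = list(find_oversized_rows(sample_rows, "text"))
for index, reason in problems:
    print(f"row {index}: {reason}")
```

If a scan like this flags rows near the position where indexing stops, the size limits are a plausible cause; if it flags nothing, the remaining limits in the reply are worth checking instead.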
miahopman
by New Contributor II
  • 2861 Views
  • 2 replies
  • 0 kudos

AutoML Runs Failing

After the Data Exploration notebook runs successfully, all AutoML trials fail without providing a source notebook. I have ensured that the training data labels have no null values and that no label has 16 or fewer occurrences. I cann...

Latest Reply
rtreves
New Contributor III
  • 0 kudos

@AnNg Have there been any updates on this feature?

1 More Replies
JoeAckerman
by New Contributor II
  • 131 Views
  • 2 replies
  • 0 kudos

Python running far slower than locally, even with large cluster and multiple workers

I have a notebook that runs extremely slowly even when I call fairly basic Python functions. It runs far slower than locally no matter what I try, in spite of using a 32 GB, 4-core cluster with 4-8 workers. For context, my data...

Latest Reply
cgrant
Databricks Employee
  • 0 kudos

Please share more information, for example:
  • Type of data source
  • Type of operations being executed (sharing code if possible)
  • Timings of local runs and Databricks runs

1 More Replies
xgbeast
by New Contributor
  • 764 Views
  • 2 replies
  • 0 kudos

What's the recommended way to scale XGBoost/LGBM to datasets that don't fit in memory?

I'm looking to scale XGBoost to large datasets that won't fit in memory on a single large EC2 instance (billions to tens of billions of rows). I also require many of the bells and whistles of regular in-memory XGBoost/LightGBM, including: Th...

Latest Reply
chindi_gpu_user
New Contributor
  • 0 kudos

Facing the same exact issue

1 More Replies
sangramraje
by New Contributor
  • 55 Views
  • 0 replies
  • 0 kudos

AutoML "need to sample" not working as expected

tl;dr: When the AutoML run determines that it needs to sample because the driver/worker node memory is not enough to load and process the entire dataset, it fails. I did NOT provide a sample weight column, but I believe somewhere in the process the...

roman_belkin
by New Contributor II
  • 296 Views
  • 2 replies
  • 0 kudos

Gemini through Mosaic Gateway

I am trying to configure the Gemini Vertex API in Databricks. In simple Python code, everything works fine, which indicates that I have correctly set up the API and credentials. Error message: {"error_code":"INVALID_PARAMETER_VALUE","message":"INVALI...

Latest Reply
roman_belkin
New Contributor II
  • 0 kudos

No, it seems they gave up 

1 More Replies
yopbibo
by Contributor II
  • 2016 Views
  • 3 replies
  • 5 kudos

Deploy an ML model trained and registered in Databricks to AKS

Hi, I can train and register an ML model in my Databricks workspace. Then, to deploy it on AKS, I need to register the model in Azure ML and then deploy it to AKS. Is it possible to skip the Azure ML step? I would like to deploy directly into my AKS instance...

Latest Reply
sidharthpradhan
New Contributor II
  • 5 kudos

Is this still the case? Can't we serve the model in Databricks? I am new to this, so I am just wondering about the capabilities.

2 More Replies
damselfly20
by New Contributor III
  • 174 Views
  • 1 replies
  • 0 kudos

Resolved! Serving Endpoint: Container Image Creation Fails

For my RAG use case, I've registered my LangChain chain as a model in Unity Catalog. When I try to serve the model, container image creation fails with the following error in the build log: [...] #16 178.1 Downloading langchain_core-0.3.17-py3-no...

Latest Reply
damselfly20
New Contributor III
  • 0 kudos

I was able to solve the problem by adding python-snappy==0.7.3 to the requirements.

damselfly20
by New Contributor III
  • 131 Views
  • 2 replies
  • 1 kudos

Endpoint creation without scale-to-zero

Hi, I've got a question about deploying an endpoint for Llama 3.1 8b. The following code should create the endpoint without scale-to-zero. The endpoint is being created, but with scale-to-zero, although scale_to_zero_enabled is set to False. Instead ...

Latest Reply
damselfly20
New Contributor III
  • 1 kudos

Thanks for the reply @Walter_C. This didn't quite work, since it used a CPU and didn't consider the max_provisioned_throughput, but I finally got it to work like this: from mlflow.deployments import get_deploy_client client = get_deploy_client("data...

1 More Replies
NielsMH
by New Contributor III
  • 193 Views
  • 1 replies
  • 0 kudos

spark_session invocation from executor side error when using SparkXGBRegressor and FE client

Hi, I have created a model and pipeline using xgboost.spark's SparkXGBRegressor and pyspark.ml's Pipeline instance. However, I run into a "RuntimeError: _get_spark_session should not be invoked from executor side." when I try to save the predictions i...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

The error you're encountering is due to attempting to access the Spark session on the executor side, which is not allowed in Spark's distributed computing model. This typically happens when trying to use Spark-specific functionality within a UDF or d...

cmilligan
by Contributor II
  • 5457 Views
  • 5 replies
  • 2 kudos

Issue with Multi-column In predicates are not supported in the DELETE condition.

I'm trying to delete rows from a table with the same date or id as records in another table. I'm using the below query and get the error 'Multi-column In predicates are not supported in the DELETE condition'. delete from cost_model.cm_dispatch_consol...

Latest Reply
thisisthemurph
New Contributor II
  • 2 kudos

I seem to get this error on some DeltaTables and not others: df.createOrReplaceTempView("channels_to_delete") spark.sql(""" delete from lake.something.earnings where TenantId = :tenantId and ChannelId = in ( select ChannelId ...

4 More Replies
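
A rewrite often used in SQL when a multi-column IN predicate is rejected in a DELETE is a correlated EXISTS subquery that compares the columns one by one. The sketch below only illustrates that rewrite: it runs against Python's built-in sqlite3 rather than Delta Lake, the table and column names are hypothetical, and the thread does not confirm that this form resolves the error on any particular Databricks runtime.

```python
import sqlite3

# In-memory SQLite database with hypothetical tables, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE earnings (tenant_id INTEGER, channel_id INTEGER, amount REAL);
    CREATE TABLE channels_to_delete (tenant_id INTEGER, channel_id INTEGER);
    INSERT INTO earnings VALUES (1, 10, 5.0), (1, 11, 7.5), (2, 10, 3.0);
    INSERT INTO channels_to_delete VALUES (1, 10);
""")

# Instead of:
#   DELETE FROM earnings
#   WHERE (tenant_id, channel_id) IN (SELECT tenant_id, channel_id FROM channels_to_delete)
# match each column explicitly inside a correlated EXISTS subquery.
conn.execute("""
    DELETE FROM earnings
    WHERE EXISTS (
        SELECT 1
        FROM channels_to_delete AS d
        WHERE d.tenant_id = earnings.tenant_id
          AND d.channel_id = earnings.channel_id
    )
""")
conn.commit()

remaining = conn.execute(
    "SELECT tenant_id, channel_id FROM earnings ORDER BY tenant_id, channel_id"
).fetchall()
print(remaining)
```

Only the row whose (tenant_id, channel_id) pair appears in `channels_to_delete` is removed; rows that match on just one of the two columns are kept. To delete on a match in either column, as the original question describes, the `AND` inside the subquery would become `OR`.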
amirA
by New Contributor II
  • 905 Views
  • 3 replies
  • 1 kudos

Resolved! Extracting Topics From Text Data Using PySpark

Hi everyone, I tried to follow the same steps in Topic from Text on similar data as an example. However, when I try to fit the model with the data, I get this error: IllegalArgumentException: requirement failed: Column features must be of type equal to one of t...

Latest Reply
filipniziol
Contributor III
  • 1 kudos

Hi @amirA, the LDA model expects the features column to be of type Vector from the pyspark.ml.linalg module, specifically either a SparseVector or a DenseVector, whereas you have provided a Row type. You need to convert your Row object to a SparseVector. Che...

2 More Replies
ukaplan
by New Contributor III
  • 2615 Views
  • 11 replies
  • 1 kudos

Serving Endpoint Container Image Creation Fails

Hello, I trained a model using MLflow and saved the model as an artifact. I can load the model from a notebook and it works as expected (i.e. I can load the model using its URI). However, when I want to deploy it using Databricks endpoints, container...

Latest Reply
damselfly20
New Contributor III
  • 1 kudos

@ivan_calvo The problem still exists. Surely there has to be some other option than downgrading the ML cluster to DBR 14.3 LTS ML?

10 More Replies
