Initializing Vector Search index sync fails with Failed to resolve flow: '__online_index_view'

jnkthms — Mon, 29 Jul 2024 10:48:35 GMT

When I set up a Vector Search index in Databricks using the bge_m3 (version 1) embedding model from the system.ai schema, the setup runs for about 20 minutes and then fails. Querying the served embedding model from the browser works perfectly fine.

The exact same data worked in the past (although in a different workspace). I've retried several times over a longer period, so this does not seem to be a temporary issue.

The flow_progress step in the index-creation pipeline fails with

Failed to resolve flow: '__online_index_view'

and error details:

java.lang.Exception: Error: Response Code: 400, Response: {"error_code":"INVALID_PARAMETER_VALUE","message":"Failed to call Model Serving endpoint: bge_m3_embedding."}
    at com.databricks.pipelines.execution.extensions.brickindex.DatabricksHttpClient.$anonfun$sendRequestWithRetries$5(DatabricksHttpClient.scala:129)
    at com.databricks.pipelines.execution.extensions.brickindex.DatabricksHttpClient.$anonfun$sendRequestWithRetries$5$adapted(DatabricksHttpClient.scala:121)
    at scala.util.Using$.resource(Using.scala:269)
    at com.databricks.pipelines.execution.extensions.brickindex.DatabricksHttpClient.$anonfun$sendRequestWithRetries$4(DatabricksHttpClient.scala:121)
    at com.databricks.backend.common.util.TimeUtils$.$anonfun$retryWithExponentialBackoff0$1(TimeUtils.scala:191)
    at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
    at scala.util.Try$.apply(Try.scala:213)
    at com.databricks.backend.common.util.TimeUtils$.retryWithExponentialBackoff0(TimeUtils.scala:191)
    at com.databricks.backend.common.util.TimeUtils$.retryWithExponentialBackoff(TimeUtils.scala:145)
    at com.databricks.pipelines.execution.extensions.brickindex.DatabricksHttpClient.sendRequestWithRetries(DatabricksHttpClient.scala:120)
    at com.databricks.pipelines.execution.extensions.brickindex.DatabricksHttpClient.post(DatabricksHttpClient.scala:209)
    at com.databricks.pipelines.execution.extensions.brickindex.BrickIndexGatewayClient.$anonfun$makePredictions$2(GatewayClient.scala:335)
    at com.databricks.pipelines.execution.extensions.brickindex.BrickIndexGatewayClient.withCredentials(GatewayClient.scala:157)
    at com.databricks.pipelines.execution.extensions.brickindex.BrickIndexGatewayClient.makePredictions(GatewayClient.scala:332)
    at com.databricks.pipelines.execution.extensions.brickindex.ModelServingBatchProcessor.processViaGateway(ModelServingBatchProcessor.scala:96)
    at com.databricks.pipelines.execution.extensions.brickindex.ModelServingBatchProcessor.process(ModelServingBatchProcessor.scala:75)
    at com.databricks.pipelines.execution.extensions.brickindex.VectorSearchIngestionProcessor.$anonfun$processIngestionWithConcurrency$6(VectorSearchIngestionProcessor.scala:125)
    at com.databricks.pipelines.execution.extensions.brickindex.VectorSearchIngestionProcessor.$anonfun$processIngestionWithConcurrency$6$adapted(VectorSearchIngestionProcessor.scala:125)
    at com.databricks.pipelines.execution.extensions.brickindex.VectorSearchIngestionProcessor.$anonfun$processIngestionBatchFuture$1(VectorSearchIngestionProcessor.scala:216)
    at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
    at scala.util.Success.$anonfun$map$1(Try.scala:255)
    at scala.util.Success.map(Try.scala:213)
    at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
    at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:46)
    at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:46)
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:77)
    at com.databricks.threading.DatabricksExecutionContext$InstrumentedRunnable.run(DatabricksExecutionContext.scala:36)
    at com.databricks.threading.NamedExecutor$$anon$2.$anonfun$run$1(NamedExecutor.scala:367)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$1(UsageLogging.scala:426)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
    at com.databricks.logging.AttributionContext$.withValue(AttributionContext.scala:216)
    at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:424)
    at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:418)
    at com.databricks.threading.NamedExecutor.withAttributionContext(NamedExecutor.scala:294)
    at com.databricks.threading.NamedExecutor$$anon$2.run(NamedExecutor.scala:365)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)

Any ideas what the problem might be?

Re: Initializing Vector Search index sync fails with Failed to resolve flow: '__online_index_view'

tyler-xorbix — Mon, 29 Jul 2024 18:15:59 GMT

I would double-check exactly which values are being sent to the model in the workflow. Perhaps moving between environments changed a value's type, or some data isn't defined correctly and leaves certain parameters empty?
The "INVALID_PARAMETER_VALUE" error returned by the embedding endpoint makes me think something isn't being set correctly in the workflow when it calls the endpoint programmatically.
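One cheap way to follow this suggestion is to scan the source column for values an embedding endpoint could plausibly reject: nulls, non-strings and blank strings. Below is a minimal pure-Python sketch over a list of row dicts; the function name `find_invalid_rows` and the column name `text` are hypothetical, not part of any Databricks API.

```python
from typing import Any, Iterable, List, Mapping


def find_invalid_rows(rows: Iterable[Mapping[str, Any]], text_column: str) -> List[int]:
    """Return the positions of rows whose text column is missing, not a string, or blank."""
    invalid = []
    for i, row in enumerate(rows):
        value = row.get(text_column)
        # None, missing keys, non-string values and whitespace-only strings are all flagged
        if not isinstance(value, str) or not value.strip():
            invalid.append(i)
    return invalid


if __name__ == "__main__":
    # Hypothetical sample rows; "text" stands in for the index's embedding source column
    sample = [
        {"id": 1, "text": "hello world"},
        {"id": 2, "text": ""},
        {"id": 3, "text": None},
        {"id": 4},
        {"id": 5, "text": 42},
    ]
    print(find_invalid_rows(sample, "text"))  # [1, 2, 3, 4]
```

On Databricks the same check would normally be a filter on the Delta source table rather than a Python loop; the sketch only shows which values are worth looking for.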

Re: Initializing Vector Search index sync fails with Failed to resolve flow: '__online_index_view'

jnkthms — Tue, 30 Jul 2024 07:50:59 GMT

Hi tyler-xorbix,

I'm not defining a workflow manually or setting any environment variables. I'm using the Databricks UI (in Unity Catalog, the Create > Vector search index dropdown), with a running (and working) bge_m3 endpoint.
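For comparison, the same kind of index can be described programmatically, which makes it easier to see exactly which serving endpoint and source column the sync pipeline will use. This is a sketch assuming the Vector Search REST "create index" call (POST to /api/2.0/vector-search/indexes) with a Delta Sync spec; the helper names and every table, index, column and Vector Search endpoint name in it are hypothetical. Only `bge_m3_embedding` comes from the error above.

```python
import json
import urllib.request
from typing import Any, Dict


def delta_sync_index_body(index_name: str, vs_endpoint: str, source_table: str,
                          primary_key: str, text_column: str, embedding_endpoint: str,
                          pipeline_type: str = "TRIGGERED") -> Dict[str, Any]:
    """Request body for a Delta Sync index that computes embeddings via a serving endpoint."""
    return {
        "name": index_name,               # catalog.schema.index_name
        "endpoint_name": vs_endpoint,     # the Vector Search endpoint, not the model endpoint
        "primary_key": primary_key,
        "index_type": "DELTA_SYNC",
        "delta_sync_index_spec": {
            "source_table": source_table,
            "pipeline_type": pipeline_type,  # "TRIGGERED" or "CONTINUOUS"
            "embedding_source_columns": [
                # The sync pipeline sends this column's text to the embedding endpoint
                {"name": text_column, "embedding_model_endpoint_name": embedding_endpoint},
            ],
        },
    }


def build_create_index_request(host: str, token: str,
                               body: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) the POST that would create the index."""
    return urllib.request.Request(
        f"{host.rstrip('/')}/api/2.0/vector-search/indexes",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    )


# Usage (hypothetical names and placeholders):
# body = delta_sync_index_body("main.default.docs_index", "vs_endpoint", "main.default.docs",
#                              "id", "text", "bge_m3_embedding")
# req = build_create_index_request("https://<workspace-host>", "<token>", body)
# urllib.request.urlopen(req)
```

Building the request separately from sending it keeps the body easy to inspect before anything touches the workspace.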

Looking at the UI example for the served embedding model, it seems the API now expects "inputs". If I remember correctly, it was called "message" at some point in the past, which would explain the error message above.
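To test the payload-key theory outside the UI-driven sync pipeline, one option is to call the endpoint directly with the kind of input the index would send. The sketch below uses only Python's standard library. The `inputs` key follows the reading of the UI example above and is an assumption, so confirm it against the endpoint's own query example. The helper names, host and token are placeholders.

```python
import json
import urllib.error
import urllib.request
from typing import Any, List


def build_invocation_request(host: str, token: str, endpoint: str, texts: List[str],
                             payload_key: str = "inputs") -> urllib.request.Request:
    """Build (but do not send) a POST to a Model Serving endpoint's invocations URL."""
    # Assumption: the served bge_m3 model accepts a list of strings under `payload_key`
    body = json.dumps({payload_key: texts}).encode("utf-8")
    return urllib.request.Request(
        f"{host.rstrip('/')}/serving-endpoints/{endpoint}/invocations",
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    )


def send(req: urllib.request.Request, timeout: float = 120.0) -> Any:
    """Send the request; on a 4xx/5xx, raise with the response body so the message is visible."""
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        detail = err.read().decode("utf-8", errors="replace")
        raise RuntimeError(f"HTTP {err.code}: {detail}") from err


# Usage (placeholders, not real values):
# req = build_invocation_request("https://<workspace-host>", "<token>",
#                                "bge_m3_embedding", ["a short test sentence"])
# print(send(req))
```

Switching `payload_key` is a quick way to compare how the endpoint responds to each key; a 400 with `INVALID_PARAMETER_VALUE` for one and a result for the other would support the theory.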

The thing is, I'm not installing anything manually and am only using built-in Databricks UI functionality, so this should all work together.

Re: Initializing Vector Search index sync fails with Failed to resolve flow: '__online_index_view'

jnkthms — Wed, 31 Jul 2024 08:50:04 GMT

The issue was most likely that the deployed model was running on CPU compute; switching it to GPU (Small) solved the issue.
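If you would rather script the fix than change it in the serving UI, here is a sketch that builds the Model Serving "update endpoint config" request (PUT to /api/2.0/serving-endpoints/{name}/config) with a GPU workload type. The entity name `system.ai.bge_m3`, version "1" and endpoint name `bge_m3_embedding` are taken from this thread. `workload_size`, `scale_to_zero_enabled` and the helper names are assumptions to adjust for your workspace.

```python
import json
import urllib.request
from typing import Any, Dict


def gpu_served_entity_config(entity_name: str, entity_version: str,
                             workload_type: str = "GPU_SMALL",
                             workload_size: str = "Small",
                             scale_to_zero: bool = False) -> Dict[str, Any]:
    """Endpoint config that serves a single model version on GPU compute."""
    return {
        "served_entities": [
            {
                "entity_name": entity_name,        # Unity Catalog model name
                "entity_version": entity_version,
                "workload_type": workload_type,    # reportedly CPU failed, GPU_SMALL worked
                "workload_size": workload_size,    # assumption: pick what fits your load
                "scale_to_zero_enabled": scale_to_zero,  # assumption
            }
        ]
    }


def build_update_config_request(host: str, token: str, endpoint: str,
                                config: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) the PUT that replaces the endpoint's served-entity config."""
    return urllib.request.Request(
        f"{host.rstrip('/')}/api/2.0/serving-endpoints/{endpoint}/config",
        data=json.dumps(config).encode("utf-8"),
        method="PUT",
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    )


# Usage (placeholders, not real values):
# cfg = gpu_served_entity_config("system.ai.bge_m3", "1")
# req = build_update_config_request("https://<workspace-host>", "<token>", "bge_m3_embedding", cfg)
# urllib.request.urlopen(req)
```

Note that this PUT replaces the endpoint's served-entity list, so any other entities already served on the same endpoint would need to be included in `served_entities` as well.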
