Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

Streaming inference with Delta Live Tables for a model registered in Unity Catalog

BeadsPlayer
New Contributor II

Hi there,

I'm trying to run streaming inference with Delta Live Tables against tables and a model registered in Unity Catalog, but it fails for unclear reasons.

The DLT pipeline is based on a notebook; the channel is set to 'Preview', presumably running on Runtime 13.3 LTS.

The code:

********************************************************************************************

%pip install mlflow[databricks]==2.8.0
%pip install importlib_metadata==4.11.3
%pip install zipp==3.8.0
%pip install MarkupSafe==2.0.1
%pip install Jinja2==2.11.3

import mlflow
import dlt

from pyspark.sql.functions import struct
from delta.tables import DeltaTable

# Input (source) table name and schema
catalog = "aiml"
database = "titanic"
input_table_name = "delta_live_infer_input"
input_table_name_full = f"{catalog}.{database}.{input_table_name}"

mlflow.set_registry_uri('databricks-uc')

model_name = 'aiml.titanic.dev-titanic-model'
model_uri = f"models:/{model_name}/2"

target_column = 'Survived_prediction'
id_column = 'PassengerId'
output_cols = [id_column, target_column]

input_delta_table = DeltaTable.forName(spark, input_table_name_full)

# The input table columns as a list of strings.
# This is used to pass the schema to the model predict UDF.
input_dlt_table_columns = input_delta_table.toDF().columns

# Create a Spark user-defined function for model prediction.
# Note: virtualenv is used here to restore the Python environment
# that was used to train the model.
predict = mlflow.pyfunc.spark_udf(spark, model_uri, result_type="double", env_manager='virtualenv')

@dlt.table(
    comment=f"DLT for predictions scored by {model_name} based on {input_table_name} Delta table.",
    table_properties={"quality": "gold"}
)
def delta_live_predictions():
    return (
        spark.readStream.table(input_table_name_full)
        .withColumn(target_column, predict(struct(*input_dlt_table_columns)))
        .select(output_cols)
    )

 

********************************************************************************************

 

 

The model is a Spark logistic regression.

I had to add the installation of specific package versions, otherwise the pipeline would fail complaining that those packages were missing; figuring out which versions to pin took some trial and error.
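Rather than discovering missing packages one failure at a time, one option (a sketch only; it assumes you can fetch the model's logged requirements file, e.g. via `mlflow.pyfunc.get_model_dependencies(model_uri)`, and paste its contents below) is to turn that requirements list into the matching `%pip` pins:

```python
def to_pip_magics(requirements_text: str) -> list[str]:
    """Turn a requirements.txt body into Databricks %pip install lines."""
    lines = []
    for raw in requirements_text.splitlines():
        req = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if req:
            lines.append(f"%pip install {req}")
    return lines

# Example requirements body (stand-in for the model's real dependency list).
reqs = """\
mlflow[databricks]==2.8.0
importlib_metadata==4.11.3  # pinned to match the training env
zipp==3.8.0
"""
for line in to_pip_magics(reqs):
    print(line)
```

This keeps the notebook's pins in sync with whatever environment the model was actually logged from, instead of guessing version by version.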

 

This works fine for models and tables not in Unity Catalog, but with Unity Catalog it returns the error below.

The model was trained and logged with mlflow==2.8.0 on Runtime 14.2 ML. I tried mlflow[databricks] versions 2.4.1, 2.5.0, 2.6.0, 2.7.1, and 2.8.0 - all the same. It looks like the missing dependency 'GLIBC_2.3x' prevents MLflow from starting the virtualenv.

What am I doing wrong?
 
***********************Traceback**********************************************
 
org.apache.spark.sql.streaming.StreamingQueryException: [STREAM_FAILED] Query [id = e3153759-7718-4993-b5fe-caaf8881c8cd, runId = 9adca3c0-925b-46ba-a56c-afa6cf5f0bdd] terminated with exception: Exception thrown in awaitResult: Job aborted due to stage failure: Task 7 in stage 87.0 failed 4 times, most recent failure: Lost task 7.3 in stage 87.0 (TID 197) (10.1.4.10 executor 0): org.apache.spark.SparkRuntimeException: [UDF_USER_CODE_ERROR.GENERIC] Execution of function udf(named_struct(PassengerId, PassengerId#5911, Sex, Sex#5912, Age, Age#5913, Fare, Fare#5914, Pclass, Pclass#5915, Family_cnt, Family_cnt#5916, Cabin_ind, Cabin_ind#5917)) failed. 
== Error ==
mlflow.exceptions.MlflowException: During spark UDF task execution, mlflow model server failed to launch. MLflow model server output:
/local_disk0/.ephemeral_nfs/repl_tmp_data/ReplId-3bc28-e5133-418de-6/mlflow/envs/virtualenv_envs/mlflow-0be5b9a8b81d469722f3d82be553c02bfe5b71ab/bin/python: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by /local_disk0/.ephemeral_nfs/repl_tmp_data/ReplId-3bc28-e5133-418de-6/mlflow/envs/virtualenv_envs/mlflow-0be5b9a8b81d469722f3d82be553c02bfe5b71ab/bin/python)
/local_disk0/.ephemeral_nfs/repl_tmp_data/ReplId-3bc28-e5133-418de-6/mlflow/envs/virtualenv_envs/mlflow-0be5b9a8b81d469722f3d82be553c02bfe5b71ab/bin/python: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.35' not found (required by /local_disk0/.ephemeral_nfs/repl_tmp_data/ReplId-3bc28-e5133-418de-6/mlflow/envs/pyenv_root/versions/3.10.12/lib/libpython3.10.so.1.0)
/local_disk0/.ephemeral_nfs/repl_tmp_data/ReplId-3bc28-e5133-418de-6/mlflow/envs/virtualenv_envs/mlflow-0be5b9a8b81d469722f3d82be553c02bfe5b71ab/bin/python: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by /local_disk0/.ephemeral_nfs/repl_tmp_data/ReplId-3bc28-e5133-418de-6/mlflow/envs/pyenv_root/versions/3.10.12/lib/libpython3.10.so.1.0)
/local_disk0/.ephemeral_nfs/repl_tmp_data/ReplId-3bc28-e5133-418de-6/mlflow/envs/virtualenv_envs/mlflow-0be5b9a8b81d469722f3d82be553c02bfe5b71ab/bin/python: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by /local_disk0/.ephemeral_nfs/repl_tmp_data/ReplId-3bc28-e5133-418de-6/mlflow/envs/pyenv_root/versions/3.10.12/lib/libpython3.10.so.1.0)
/local_disk0/.ephemeral_nfs/repl_tmp_data/ReplId-3bc28-e5133-418de-6/mlflow/envs/virtualenv_envs/mlflow-0be5b9a8b81d469722f3d82be553c02bfe5b71ab/bin/python: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by /local_disk0/.ephemeral_nfs/repl_tmp_data/ReplId-3bc28-e5133-418de-6/mlflow/envs/pyenv_root/versions/3.10.12/lib/libpython3.10.so.1.0)
== Stacktrace ==
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-1a41553e-c975-4f29-ac42-ba4262b5bb4e/lib/python3.10/site-packages/mlflow/pyfunc/__init__.py", line 1266, in udf
    raise MlflowException(err_msg) from e
at org.apache.spark.sql.errors.QueryExecutionErrors$.failedExecuteUserDefinedFunctionSafeSpark(QueryExecutionErrors.scala:258)
at com.databricks.sql.execution.safespark.EvalExternalUDFExec.awaitBatchResult(EvalExternalUDFExec.scala:258)
at com.databricks.sql.execution.safespark.EvalExternalUDFExec.$anonfun$doExecute$12(EvalExternalUDFExec.scala:204)
at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:195)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:57)
at org.apache.spark.scheduler.ShuffleMapTask.$anonfun$runTask$3(ShuffleMapTask.scala:92)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.ShuffleMapTask.$anonfun$runTask$1(ShuffleMapTask.scala:87)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:58)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:39)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:196)
at org.apache.spark.scheduler.Task.doRunTask(Task.scala:181)
at org.apache.spark.scheduler.Task.$anonfun$run$5(Task.scala:146)
at com.databricks.unity.UCSEphemeralState$Handle.runWith(UCSEphemeralState.scala:41)
at com.databricks.unity.HandleImpl.runWith(UCSHandle.scala:99)
at com.databricks.unity.HandleImpl.$anonfun$runWithAndClose$1(UCSHandle.scala:104)
at scala.util.Using$.resource(Using.scala:269)
at com.databricks.unity.HandleImpl.runWithAndClose(UCSHandle.scala:103)
at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:146)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$8(Executor.scala:930)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:102)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:933)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:825)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
 
Driver stacktrace:
[... the same nested error and stack trace repeat for each wrapped driver exception; truncated ...]

********************************************************************************************
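The key lines in the output are the ``GLIBC_2.3x' not found` messages: the Python restored by virtualenv was built against a newer glibc (2.32 through 2.35) than the one available on the workers. A small standard-library sketch (hypothetical helper, shown only to make the diagnosis concrete) that pulls the required versions out of such log text:

```python
import re

def required_glibc_versions(log_text: str) -> list[str]:
    """Collect the distinct GLIBC versions a loader error reports as missing."""
    versions = re.findall(r"version `GLIBC_(\d+\.\d+)' not found", log_text)
    # De-duplicate while keeping numeric order (so '2.9' would sort before '2.32').
    return sorted(set(versions), key=lambda v: tuple(map(int, v.split("."))))

# Abbreviated stand-in for the MLflow model server output above.
log = (
    "/bin/python: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found\n"
    "/bin/python: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.35' not found\n"
    "/bin/python: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found\n"
    "/bin/python: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found\n"
)
print(required_glibc_versions(log))  # ['2.32', '2.33', '2.34', '2.35']
```

If the executors' glibc is older than every version listed, no amount of mlflow version pinning will help; the restored interpreter simply cannot start on that OS image.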

 