Hi @AChang , based on the logs provided, it appears that your workers are being terminated due to insufficient memory, as indicated by the repeated "Worker (pid:X) was sent SIGKILL! Perhaps out of memory?" messages. This suggests that the model you're trying to deploy is too large for the currently allocated memory.

By default, Databricks Model Serving provides 4 GB of memory per model. If your model requires more, you can reach out to your Databricks support contact to increase this limit up to 16 GB per model.
Before moving to the largest compute, you might want to consider the following steps:
1. Try optimizing your model. This could involve simplifying the model architecture, reducing the dimensionality of your data, or using a more memory-efficient data representation.
2. Monitor the memory usage of your model during training and inference to get a sense of how much memory it requires.
3. If your model is indeed too large for the current memory allocation, request a memory limit increase from Databricks support.
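To illustrate step 2, here is a minimal sketch of how you could measure a model's approximate memory footprint by comparing the process's peak resident set size before and after loading. This uses only the standard library's `resource` module; the list allocation is just a stand-in for your real model-loading call (e.g. whatever you use to load your MLflow model), and the unit handling assumes Linux:

```python
import resource

def peak_memory_mb():
    """Return this process's peak resident set size in MB.

    Note: ru_maxrss is reported in kilobytes on Linux but in bytes
    on macOS, so adjust the divisor for your platform.
    """
    rss_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return rss_kb / 1024  # assumes Linux (kilobytes)

before = peak_memory_mb()
blob = [0] * 10_000_000  # placeholder for loading a large model object
after = peak_memory_mb()

print(f"Peak RSS before load: {before:.0f} MB")
print(f"Peak RSS after load:  {after:.0f} MB")
print(f"Approx. model footprint: {after - before:.0f} MB")
```

Running this around your actual model load gives you a rough number to compare against the 4 GB default (or 16 GB upper) serving limit.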
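On step 1, one concrete form of "a more memory-efficient data representation" is storing weights or features at lower precision. The sketch below uses the standard library's `array` module purely to show the size difference; in practice you would downcast your actual tensors or arrays (e.g. float64 to float32) in whatever framework your model uses:

```python
from array import array

n = 1_000_000

# The same one million values stored at double vs. single precision:
weights_f64 = array('d', [0.0]) * n  # 'd' = double, 8 bytes per value
weights_f32 = array('f', [0.0]) * n  # 'f' = float, 4 bytes per value

size_f64 = weights_f64.itemsize * len(weights_f64)
size_f32 = weights_f32.itemsize * len(weights_f32)

print(f"float64 storage: {size_f64 / 1e6:.0f} MB")  # 8 MB
print(f"float32 storage: {size_f32 / 1e6:.0f} MB")  # 4 MB
```

Halving the precision halves the memory, which is often enough to bring a model back under the serving limit, provided the accuracy loss is acceptable for your use case.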
Remember that moving to a larger compute resource may incur additional costs, so it's worth confirming the change is actually necessary before making it.