Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Is Graphframes for python meant to be installed by default on Databricks 9.1 LTS ML? Previously I was running the attached python command on 7.3 LTS ML with no issue, however now I am getting "no module named graphframes" when trying to import the pa...
Hi @MuthuLakshmi , As per the documentation it was mentioned that graphframes comes preinstalled in databricks runtime for machine learning. but when trying to import the python module of graphframes, getting no module found error.from graphframes i...
I trained my model and was able to get the batch prediction from that model as specified below. But I want to also get the probability scores for each prediction. Do you have any idea? Thank you!logged_model = path_to_model# Load model as a PyFuncMod...
Now you can log the model using this parameter:mlflow.sklearn.log_model(
..., # the usual params
pyfunc_predict_fn="predict_proba"
) which will return probabilities for the first class apparently when using the model for inference (e.g. when...
Hello, I am trying to serve a model endpoint (using Databricks GUI) for a model that was successfully logged to the Model Registry. However, the endpoint creation failed with the following errors: Endpoint logs with error messagesEndpoint events with...
Hi @Nikhil Gajghate We haven't heard from you since the last response from @Kaniz Fatma , and I was checking back to see if her suggestions helped you.Or else, If you have any solution, please share it with the community, as it can be helpful to o...
Hi, I tried to deploy a Feature Store packaged model into Delta Live Table using mlflow.pyfunc.spark_udf in Azure Databricks. This model is built by Databricks autoML with joined Feature Table inside it.And I'm trying to make prediction using the fol...
Hello,I have a problem with the model registry. I'm not sure if I'm registering the model incorrectly or if it´s a bug.Here are some code snippets:import pandas as pd
import mlflow
import mlflow.keras
import mlflow.tensorflow
from mlflow.tracking.c...
I'm training a NeuralProphet for a time series forecasting problem. I'm trying to parallelize my training, but this error is appearingThe folder lightning_logs has a hparams.yaml but it's empty. Is this related to permissions on the cluster? Thanks i...
I already have a trained and saved model that was created outside of MLflow. What is the best way to handle it if I want this model to be added to an MLflow experiment?
Hello,I'm writing this because I have tried a lot of different directions to get a simple model inference working with no success.Here is the outline of the job# 1 - Load the base data (~1 billion lines of ~6 columns)
interaction = build_initial_df()...
It is hard to analyze without Spark UI and more detailed information, but anyway few tips:look for data skews some partitions can be very big some small because of incorrect partitioning. You can use Spark UI to do that but also debug your code a bit...
I trained a basic image classification model on MNIST using Tensorflow, logging the experiment run with MLflow.Model: "my_sequential"
_________________________________________________________________
Layer (type) Output Shape ...
I built a machine learning model:lr = LinearRegression()
lr.fit(X_train, y_train)which I can save to the filestore by:filename = "/dbfs/FileStore/lr_model.pkl"
with open(filename, 'wb') as f:
pickle.dump(lr, f)Ideally, I wanted to save the model ...
Workspace and Repo is not full available via dbfs as they have separate access rights. It is better to use MLFlow for your models as it is like git but for ML. I think using MLOps you can than put your model also to git.
Hi all,We are constructing our CI/CD pipelines with the Repos feature following this guide:https://databricks.com/blog/2021/09/20/part-1-implementing-ci-cd-on-databricks-using-databricks-notebooks-and-azure-devops.htmlI'm trying to implement my pipes...
So you are managing your models with MLflow, and want to include them in a git repository?You can do that in a CI/CD process; it would run the mlflow CLI to copy the model you want (e.g. model:/my_model/production) to a git checkout and then commit i...
Hi all, I'm trying to register a model with python 3 support, but continue getting only python 2. I can see that runtime 6.0 and above get python 3 by default, but I don't see a way to set neither runtime version, nor python version during model regi...
Hi team, thanks for getting back to me. Let's put this on hold for now. I will update once it's needed again. It was solely for education purpose and right now I have quite urgent stuff to do.Have a great day.
Hey guys,I'm trying to train deep learning model at ML databricks with numpy arrays as input.For now i organized all the data inside DF- df contains 4 columns : col1,col2,col3,col4col1 and col2 have arrays with shape (1,3,3,3,3), col 3 have array wit...
I am building a machine learning model using sklearn Pipeline which includes a ColumnTransformer as a preprocessor before the actual model. Below is the code how the pipeline is created.transformers = []
num_pipe = Pipeline(steps=[
('imputer', Si...