Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by mrcity (New Contributor II)
  • 1936 Views
  • 3 replies
  • 1 kudos

Exclude absent lookup keys from dataframes made by create_training_set()

I have data stored in feature tables as well as in a data lake. The feature tables are expected to lag the data lake at least slightly. I want to filter data coming out of the feature store by querying the data lake for lookup keys out of my inde...

Latest Reply
Quinten
New Contributor II
  • 1 kudos

I'm facing the same issue described by @mrcity. There is no easy way to alter the DataFrame that is created inside the score_batch() function. Filtering out rows in the (sklearn) pipeline itself is also inconvenient, since these transformers ar...

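The filtering the question describes amounts to a left semi join: keep only the feature rows whose lookup key also exists in the data lake. The plain-Python sketch below shows the idea; `customer_id`, `feature_rows` and `lake_keys` are hypothetical stand-ins for the real lookup key and data. On Spark, the same operation would presumably be `df.join(lake_keys_df, on="customer_id", how="left_semi")`, applied either to the lookup-key DataFrame before it is passed to `create_training_set()` or to the DataFrame returned by `load_df()`.

```python
from __future__ import annotations

from typing import Any, Iterable


def left_semi_join(
    rows: Iterable[dict[str, Any]],
    allowed_keys: Iterable[Any],
    key: str,
) -> list[dict[str, Any]]:
    """Keep each row whose `key` value appears in `allowed_keys`.

    Like a SQL LEFT SEMI JOIN: rows are never duplicated and no columns
    from the key side are added to the result.
    """
    allowed = set(allowed_keys)  # set gives O(1) membership checks
    return [row for row in rows if row.get(key) in allowed]


# Hypothetical feature-store output; key 4 is not present in the lake.
feature_rows = [
    {"customer_id": 1, "spend_30d": 120.0},
    {"customer_id": 2, "spend_30d": 35.5},
    {"customer_id": 4, "spend_30d": 0.0},
]
# Hypothetical lookup keys currently present in the data lake
# (key 3 exists in the lake but has not reached the feature table yet).
lake_keys = [1, 2, 3]

filtered = left_semi_join(feature_rows, lake_keys, key="customer_id")
print([row["customer_id"] for row in filtered])  # [1, 2]
```

Unlike an inner join, a semi join cannot multiply rows when the key side contains duplicates, which matters when the result feeds a training set.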
by Nasreddin (New Contributor)
  • 5333 Views
  • 2 replies
  • 0 kudos

ColumnTransformer not fitted after sklearn Pipeline loaded from Mlflow

I am building a machine learning model using an sklearn Pipeline that includes a ColumnTransformer as a preprocessor before the actual model. Below is the code that creates the pipeline: transformers = [] num_pipe = Pipeline(steps=[ ('imputer', Si...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Nasreddin, MLflow is compatible with sklearn Pipelines that have multiple steps. The error you're encountering, "This ColumnTransformer instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.", is likely because C...

by MudassarA (New Contributor II)
  • 13798 Views
  • 4 replies
  • 1 kudos

Resolved! How to fix TypeError: __init__() got an unexpected keyword argument 'max_iter'?

# Create the model using sklearn (don't worry about the parameters for now):
model = SGDRegressor(loss='squared_loss', verbose=0, eta0=0.0003, max_iter=3000)
# Train/fit the model to the train-part of the dataset:
model.fit(X_train, y_train)
ERROR: Typ...

Latest Reply
Fantomas_nl
New Contributor II
  • 1 kudos

Replacing max_iter with n_iter resolves the error. Thanks! It is a bit unusual to run into errors like this in a solution from Microsoft, as if they could not have been prevented.

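Some context on the accepted fix, as far as we can tell from the scikit-learn changelog: `SGDRegressor` took `n_iter` before release 0.19, which introduced `max_iter` and deprecated `n_iter` (later removed). The `max_iter` error therefore suggests an older scikit-learn on the cluster, and upgrading is an alternative to renaming the argument. Later, in 1.0, `loss='squared_loss'` was renamed to `'squared_error'`. The sketch below picks keyword arguments from a version string; `parse_major_minor` and `sgd_regressor_kwargs` are hypothetical helpers, and the version cut-offs are our reading of the changelog, so check them against the release you run.

```python
from __future__ import annotations

import re


def parse_major_minor(version: str) -> tuple[int, int]:
    """Turn strings like '0.18.2', '1.3.0' or '1.4.dev0' into (major, minor)."""
    numbers = []
    for piece in version.split(".")[:2]:
        match = re.match(r"\d+", piece)  # leading digits only, e.g. '24rc1' -> 24
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 2:
        numbers.append(0)
    return numbers[0], numbers[1]


def sgd_regressor_kwargs(sklearn_version: str, iterations: int) -> dict:
    """Hypothetical helper: SGDRegressor kwargs for a given scikit-learn version.

    Assumed cut-offs (verify against the changelog):
      * before 0.19 the epoch count is `n_iter` (no `max_iter`);
      * from 0.19 it is `max_iter`;
      * from 1.0 the squared loss is spelled 'squared_error'.
    """
    version = parse_major_minor(sklearn_version)
    kwargs = {"loss": "squared_error" if version >= (1, 0) else "squared_loss"}
    if version < (0, 19):
        kwargs["n_iter"] = iterations
    else:
        kwargs["max_iter"] = iterations
    return kwargs


print(sgd_regressor_kwargs("0.18.2", 3000))  # {'loss': 'squared_loss', 'n_iter': 3000}
print(sgd_regressor_kwargs("1.3.0", 3000))   # {'loss': 'squared_error', 'max_iter': 3000}
```

In a notebook this could be used as `SGDRegressor(verbose=0, eta0=0.0003, **sgd_regressor_kwargs(sklearn.__version__, 3000))`.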
by AlexRomano (New Contributor)
  • 6696 Views
  • 1 reply
  • 0 kudos

PicklingError: Could not pickle the task to send it to the workers.

I am using sklearn in a Databricks notebook to fit an estimator in parallel. Sklearn uses joblib with the loky backend to do this. I have a file in Databricks from which I can import my custom classifier, and everything works fine. However, if I lite...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @aromano, I know this issue was opened almost a year ago, but I faced the same problem and was able to solve it, so I'm sharing the solution to help others. You are probably using SparkTrials to optimize the model's hyperparameters ...

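The reply's full solution is cut off above, so what follows is only a general debugging sketch, not the fix it describes. Python's standard `pickle` serializes classes by reference (module plus qualified name), so a class must be findable under that name to be pickled at all; a class defined inside a function is not. This is consistent with the question's observation that importing the classifier from a file works. joblib's loky backend, to our understanding, serializes tasks with cloudpickle, which handles more cases than standard `pickle`, so this check is a rough first test rather than an exact reproduction of the worker hand-off. `ModuleLevelClassifier`, `make_local_classifier` and `check_picklable` are hypothetical names.

```python
from __future__ import annotations

import pickle


class ModuleLevelClassifier:
    """Defined at module top level, so pickle can refer to it by name."""

    def fit(self, X, y):
        self.classes_ = sorted(set(y))  # toy fitted attribute
        return self


def make_local_classifier():
    # A class created inside a function gets a '<locals>' qualified name,
    # so standard pickle cannot serialize it by reference.
    class LocalClassifier(ModuleLevelClassifier):
        pass

    return LocalClassifier()


def check_picklable(obj) -> tuple[bool, str]:
    """Try standard pickling on the driver and report why it fails, if it does."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError) as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return True, "ok"


print(check_picklable(ModuleLevelClassifier().fit([[0], [1]], [0, 1])))  # (True, 'ok')
ok, message = check_picklable(make_local_classifier())
print(ok)  # False
```

Running such a check on the estimator before calling a parallel `fit` surfaces serialization problems with a readable message on the driver instead of inside the worker pool.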