Databricks Community

Santhanalakshmi · ‎07-13-2022

Hello All,

I am trying to read data, group it by a column, and pass each group to a model's predict function through a grouped-map @F.pandas_udf.

import pickle

from pyspark.sql import functions as F
from pyspark.sql import types as T

# Load the pickled model
with open(filepath, 'rb') as f:
    pkl_model = pickle.load(f)

# Build the schema fields for the two output columns
filter_schema = [
    T.StructField("anomaly_prediction", T.IntegerType(), True),
    T.StructField("anomaly_score", T.DoubleType(), True),
]

# Return schema: all input columns plus the two output columns
return_schema = T.StructType(df.schema.fields + filter_schema)

@F.pandas_udf(return_schema, F.PandasUDFType.GROUPED_MAP)
def inferdata(data):
    # labelnames is the list of feature column names (not shown in this post)
    dt = data[labelnames].to_numpy()
    # dt = np.asarray(dt).astype('float64')
    score, pred = pkl_model.predict(dt)
    print('score and prediction is ', score, pred)
    data["anomaly_prediction"] = pred
    data["anomaly_score"] = score
    return data

# Apply the UDF to each group of rows that share a filename
df = df.groupby('filename').apply(inferdata)
df.show(2)

But it is throwing an error:

"java.lang.IndexOutOfBoundsException: index: 16384, length: 4 (expected: range(0, 16384))"

I have attached the code snippet and error images for your reference. I have been stuck with this problem for a week.

Could anybody please help me to resolve this issue?

AmanSehgal · ‎07-14-2022

You might have to share the code above the cell. Please paste the code using the code editor and not as an image.

Santhanalakshmi · ‎07-14-2022

Thanks, I have updated the code in the cell.

Vindhya · ‎04-18-2023

@Santhanalakshmi Manoharan Was this issue resolved? I am also getting the same error, and any guidance would be of great help.

Appreciate your help.
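
Not a confirmed fix, but one thing that may be worth checking while debugging a grouped-map UDF like the one in the original post: whether the pandas DataFrame returned for each group actually has the columns and dtypes that the declared return schema promises. The sketch below runs the same per-group logic in plain pandas, with no Spark involved, and reports any mismatch. The DummyModel class, the f1/f2 feature columns, the sample data, and the schema_mismatches helper are all hypothetical stand-ins. The expected dtypes assume that IntegerType corresponds to int32 and DoubleType to float64 on the pandas side. A mismatch found this way is only a candidate cause of the IndexOutOfBoundsException, not a diagnosis.

```python
import numpy as np
import pandas as pd


# Hypothetical stand-in for the pickled model. Like the model in the post,
# predict() is assumed to return a (score, pred) pair of arrays.
class DummyModel:
    def predict(self, x):
        score = x.sum(axis=1).astype("float64")
        pred = (score > 1.0).astype("int32")
        return score, pred


labelnames = ["f1", "f2"]  # assumed feature columns
model = DummyModel()


def infer_group(data: pd.DataFrame) -> pd.DataFrame:
    """Same logic as the UDF body, without the Spark decorator."""
    out = data.copy()
    score, pred = model.predict(out[labelnames].to_numpy())
    out["anomaly_prediction"] = pred
    out["anomaly_score"] = score
    return out


# Expected pandas dtypes for the two appended columns
# (assumption: IntegerType -> int32, DoubleType -> float64).
expected = {"anomaly_prediction": "int32", "anomaly_score": "float64"}


def schema_mismatches(result: pd.DataFrame, expected: dict) -> dict:
    """Return {column: (actual, expected)} for missing or mistyped columns."""
    problems = {}
    for name, dtype in expected.items():
        actual = str(result[name].dtype) if name in result.columns else "missing"
        if actual != dtype:
            problems[name] = (actual, dtype)
    return problems


# Made-up sample data, grouped the same way as in the post.
pdf = pd.DataFrame(
    {"filename": ["a", "a", "b"], "f1": [0.5, 1.0, 0.1], "f2": [0.2, 0.9, 0.3]}
)
for key, group in pdf.groupby("filename"):
    print(key, schema_mismatches(infer_group(group), expected))
```

If the helper reports a mismatch for the real model, casting the offending column explicitly before returning from the UDF (for example with astype) would be a cheap experiment. Running the function on a single group in plain pandas also makes it easier to see Python-side errors that are otherwise buried in the Java stack trace.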

Throwing IndexoutofBound Exception in Pyspark
