Data Engineering

by Pbarbosa154 • New Contributor III

04-28-2023 7:30:44 AM

2122 Views
2 replies
0 kudos

What is the best way to ingest GCS data into Databricks and apply Anomaly Detection Model?

I recently started exploring the field of data engineering and ran into some difficulties. I have a bucket in GCS with millions of Parquet files and I want to build an anomaly detection model with them. I was trying to ingest that data into Datab...

Latest Reply

Anonymous
Not applicable

04-28-2023 10:34:53 AM

0 kudos

@Pedro Barbosa: It seems like you are running out of memory when trying to convert the PySpark DataFrame to an H2O frame. One possible approach to solve this issue is to partition the PySpark DataFrame before converting it to an H2O frame. You can us...

by vas610 • New Contributor III

08-06-2021 8:36:18 AM

4478 Views
5 replies
0 kudos

Error loading h2o model in mlflow

I'm getting the following error when trying to load an H2O model using MLflow for prediction. Error: Error Job with key $03017f00000132d4ffffffff$_990da74b0db027b33cc49d1d90934149 failed with an exception: java.lang.IllegalArgumentException:...

Latest Reply

Dan_Z
Databricks Employee

08-06-2021 3:56:45 PM

0 kudos

I ran this in Databricks and it worked with no issues. I suggest you make sure your wget path is correct, because the one you posted downloads HTML, not the raw CSV, which may be causing the problem. %sh wget https://raw.githubusercontent.com/mlflow/mlflo...
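One way to catch this kind of mistake before training or loading a model is to inspect the downloaded file and confirm it is really CSV and not an HTML page. The snippet below is only a minimal sketch of such a check; the helper names `looks_like_html` and `looks_like_csv` and the `min_columns` threshold are hypothetical and not part of MLflow, H2O, or Databricks.

```python
import csv
import io


def looks_like_html(text: str) -> bool:
    """Heuristic: True if the payload starts like an HTML page."""
    head = text.lstrip().lower()[:200]
    return head.startswith("<!doctype html") or head.startswith("<html")


def looks_like_csv(text: str, min_columns: int = 2) -> bool:
    """Heuristic: True if the first rows parse as CSV with a consistent width.

    min_columns is an illustrative threshold, not a rule from any library.
    """
    if looks_like_html(text):
        return False
    # Parse the text and drop blank lines.
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    if not rows:
        return False
    width = len(rows[0])
    # Only the first 20 rows are checked to keep the test cheap.
    return width >= min_columns and all(len(row) == width for row in rows[:20])
```

To use it, read the first few kilobytes of the file that wget saved and pass that text to `looks_like_csv`. If it returns False, the URL probably points at a rendered web page and not at the raw file.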
