What is the best way to ingest GCS data into Databricks and apply Anomaly Detection Model?
I recently started exploring the field of Data Engineering and ran into some difficulties. I have a bucket in GCS with millions of Parquet files, and I want to build an Anomaly Detection model on them. I was trying to ingest that data into Databricks.
Latest Reply
@Pedro Barbosa​: It seems like you are running out of memory when trying to convert the PySpark dataframe to an H2O frame. One possible approach to solving this issue is to partition the PySpark dataframe before converting it to an H2O frame. You can use the `repartition()` method to split the dataframe into a larger number of smaller partitions, so that no single partition has to fit the whole dataset in memory during the conversion.
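A minimal sketch of that idea follows. The helper that picks a partition count is plain Python; the Spark and H2O calls are shown as comments because they need a live cluster, and the bucket path, `spark`, and `hc` (a pysparkling `H2OContext`) are illustrative assumptions, not names from the original thread.

```python
def target_partitions(total_rows, rows_per_partition=1_000_000):
    """Pick a partition count so each partition stays a manageable size.

    Uses ceiling division so even a handful of rows gets at least one
    partition.
    """
    return max(1, -(-total_rows // rows_per_partition))

# Hypothetical usage on a Databricks cluster (not runnable standalone):
# df = spark.read.parquet("gs://my-bucket/parquet-data/")  # GCS path is illustrative
# n = target_partitions(df.count())
# h2o_frame = hc.asH2OFrame(df.repartition(n))  # pysparkling conversion

# Example: 25M rows at 1M rows per partition -> 25 partitions
print(target_partitions(25_000_000))
```

Tuning `rows_per_partition` down trades conversion speed for lower peak memory per executor, which is usually the right trade when the H2O conversion is the step that fails.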