Topics with Label: Datasource

by explorer • New Contributor III

01-11-2023 4:10:40 AM

2662 Views
6 replies
3 kudos

Getting error while loading parquet data into Postgres (using spark-postgres library) ClassNotFoundException: Failed to find data source: postgres. Please find packages at http://spark.apache.org/third-party-projects.html Caused by: ClassNotFoundException

Hi Fellas - I'm trying to load Parquet data (stored in a GCS location) into a Postgres DB on Google Cloud. For bulk upload into Postgres we are using the spark-postgres library: https://framagit.org/interhop/library/spark-etl/-/tree/master/spark-postgres/src/main/sc...

Data Engineering

Latest Reply

explorer
New Contributor III

01-18-2023 7:44:11 AM

3 kudos

Hi @Kaniz Fatma, @Daniel Sahal - A few updates from my side. After a lot of trial and error, psycopg2 worked in my case. We can process 200+ GB of data with a 10-node cluster (n2-highmem-4, 32 GB memory, 4 cores) and a driver with 32 GB memory and 4 cores, with Run...
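
The reply preview is cut off, so how psycopg2 was actually used is not shown. As a rough sketch only, one common way to bulk load with psycopg2 is PostgreSQL's `COPY ... FROM STDIN` through the cursor's `copy_expert` method. The helper name `copy_rows`, the table `events`, and its columns are hypothetical and not taken from the post.

```python
import csv
import io


def copy_rows(cursor, table, columns, rows):
    """Send a batch of rows to Postgres with COPY ... FROM STDIN (CSV format).

    `cursor` is expected to be a psycopg2 cursor. `table` and `columns` are
    interpolated into the SQL text, so they must come from trusted code,
    never from user input.
    """
    buf = io.StringIO()
    # Serialize the batch as CSV in memory; the csv module handles quoting.
    csv.writer(buf, lineterminator="\n").writerows(rows)
    buf.seek(0)
    sql = "COPY {} ({}) FROM STDIN WITH (FORMAT csv)".format(
        table, ", ".join(columns)
    )
    cursor.copy_expert(sql, buf)  # psycopg2: cursor.copy_expert(sql, file)
    return sql


# Hypothetical usage (connection details are placeholders):
#   import psycopg2
#   conn = psycopg2.connect(host="...", dbname="...", user="...", password="...")
#   with conn, conn.cursor() as cur:
#       copy_rows(cur, "events", ["id", "name"], [(1, "a"), (2, "b,c")])
```

Sending rows in batches through `COPY` usually avoids the per-row overhead of individual `INSERT` statements, which tends to matter at the data volumes mentioned above.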

by Tahseen0354 • Contributor III

03-21-2022 11:54:43 AM

1264 Views
4 replies
2 kudos

Resolved! A Standard cluster is recommended for a single user - what is meant by that?

Hi, I have seen it written in the documentation that a Standard cluster is recommended for a single user. But why? What is meant by that? One of my colleagues and I were testing it on the same notebook. Both of us can use the same standard all purpo...

Data Engineering

Latest Reply

Hubert-Dudek
Esteemed Contributor III

03-21-2022 12:11:13 PM

2 kudos

A High Concurrency cluster just splits resources between users more evenly. So when 4 people run notebooks at the same time on a cluster with 4 CPUs, you can imagine that each of them will get 1 CPU. In a Standard cluster, 1 person could utilize all worker CPUs as you...

Databricks
