- 1231 Views
- 2 replies
- 2 kudos
Hi, I'd like to ask if anyone knows how to connect to GCS, specifically how to read a CSV file from a GCS bucket. I have no issue connecting to Data Lake. Thank you so much in advance.
Latest Reply
Hi @James C, just checking in. If @Kaniz Fatma's answer helped, would you let us know and mark the answer as best? If not, would you be happy to give us more information? We'd love to hear from you. Cheers!
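For reference, a minimal sketch of reading a CSV from a GCS bucket with Spark on Databricks, assuming the cluster's GCP service account already has read access to the bucket; the bucket and file names below are placeholders:

```python
# Assumes the cluster's service account can read the bucket; the path is a
# hypothetical placeholder.
df = (spark.read
      .format("csv")
      .option("header", "true")        # first row holds column names
      .option("inferSchema", "true")   # let Spark infer column types
      .load("gs://<bucket-name>/path/to/file.csv"))

df.show(5)
```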
1 More Reply
- 961 Views
- 2 replies
- 0 kudos
I recently started exploring the field of Data Engineering and came across some difficulties. I have a bucket in GCS with millions of parquet files and I want to create an Anomaly Detection model with them. I was trying to ingest that data into Datab...
Latest Reply
@Pedro Barbosa: It seems like you are running out of memory when trying to convert the PySpark dataframe to an H2O frame. One possible approach to solve this issue is to partition the PySpark dataframe before converting it to an H2O frame. You can us...
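A hedged sketch of that idea, assuming Sparkling Water (pysparkling) is installed on the cluster; the path, sample fraction, and partition count are illustrative:

```python
from pysparkling import H2OContext

hc = H2OContext.getOrCreate()

# Hypothetical GCS path to the Parquet files.
df = spark.read.parquet("gs://<bucket-name>/parquet-data/")

# Working on a sample and controlling the partition count limits how much
# data is materialized at once when the H2O frame is built.
sample = df.sample(fraction=0.1, seed=42).repartition(200)

h2o_frame = hc.asH2OFrame(sample)
```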
1 More Reply
- 3938 Views
- 6 replies
- 3 kudos
Hi fellas, I'm trying to load parquet data (in a GCS location) into a Postgres DB (Google Cloud). For bulk-uploading data into PG we are using the spark-postgres library: https://framagit.org/interhop/library/spark-etl/-/tree/master/spark-postgres/src/main/sc...
Latest Reply
Hi @Kaniz Fatma, @Daniel Sahal, a few updates from my side. After many hits and trials, psycopg2 worked out in my case. We can process 200+ GB of data with a 10-node cluster (n2-highmem-4: 32 GB memory, 4 cores) and a driver with 32 GB memory, 4 cores, with Run...
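For readers landing here, a hedged sketch of the psycopg2 pattern described above: each Spark partition opens its own connection and inserts rows in batches. Connection details, table, and column names are placeholders:

```python
import psycopg2
from psycopg2.extras import execute_values

def write_partition(rows):
    # One connection per partition; this function runs on the workers.
    conn = psycopg2.connect(
        host="<pg-host>", dbname="<db>", user="<user>", password="<password>"
    )
    with conn, conn.cursor() as cur:
        execute_values(
            cur,
            "INSERT INTO target_table (col_a, col_b) VALUES %s",
            [(r["col_a"], r["col_b"]) for r in rows],
            page_size=10000,  # rows inserted per round trip
        )
    conn.close()

df = spark.read.parquet("gs://<bucket-name>/parquet-data/")  # hypothetical path
df.foreachPartition(write_partition)
```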
5 More Replies
- 3122 Views
- 5 replies
- 4 kudos
Hi team, I have been working on a POC exploring Delta Live Tables with a GCS location. I have some doubts: how do we access the GCS bucket? We have a connection established using the Databricks service account. In normal cluster creation, we go to the cluster page...
Latest Reply
Kindly mount the GCS bucket to a DBFS location; see: Mounting cloud object storage on Databricks | Databricks on Google Cloud
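As a complement, a minimal sketch of a Delta Live Tables definition that reads directly from a GCS path once access is in place; the path and table name are placeholders:

```python
import dlt

@dlt.table(name="raw_events", comment="Raw Parquet files ingested from GCS")
def raw_events():
    # Assumes the pipeline's service account can read the bucket.
    return spark.read.format("parquet").load("gs://<bucket-name>/events/")
```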
4 More Replies
by MBV3 • New Contributor III
- 1316 Views
- 1 reply
- 2 kudos
What is the best way to delete files from a GCP bucket inside a Spark job?
Latest Reply
@M Baig, yes, you just need to create a service account for Databricks and then assign the Storage Admin role on the bucket. After that you can mount GCS the standard way:
bucket_name = "<bucket-name>"
mount_name = "<mount-name>"
dbutils.fs.mount("gs://%s" % bucket_na...
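Completing that pattern as a hedged sketch, together with the file deletion the question asked about; bucket, mount, and path names are placeholders:

```python
bucket_name = "<bucket-name>"
mount_name = "<mount-name>"
dbutils.fs.mount("gs://%s" % bucket_name, "/mnt/%s" % mount_name)

# Once mounted, files in the bucket can be deleted from a job with dbutils:
dbutils.fs.rm("/mnt/%s/path/to/file.parquet" % mount_name)
# or recursively for a whole directory:
dbutils.fs.rm("/mnt/%s/stale-data/" % mount_name, recurse=True)
```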
- 1041 Views
- 1 reply
- 0 kudos
Are there any other parameters to consider when running OPTIMIZE depending on the cloud vendor?
Latest Reply
OPTIMIZE is not dependent on the cloud provider whatsoever; it will produce the same results regardless of the underlying storage. It is idempotent, meaning that if it is run twice on the same dataset, the second execution has no effect.
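For illustration, OPTIMIZE is invoked the same way on any cloud; the table name below is a placeholder, and the optional ZORDER clause is equally cloud-independent:

```python
# Compact small files into larger ones.
spark.sql("OPTIMIZE my_schema.events")

# Optionally co-locate data on a commonly filtered column while compacting.
spark.sql("OPTIMIZE my_schema.events ZORDER BY (event_date)")
```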