Data Engineering

Forum Posts

by rajib76 • New Contributor II

03-11-2022 12:20:13 PM

3685 Views
2 replies
2 kudos

Resolved! DBFS with Google Cloud Storage(GCS)

Does DBFS support GCS?

Latest Reply

Hubert-Dudek
Databricks MVP

03-12-2022 2:51:44 AM

2 kudos

Yes, you just need to create a service account for Databricks and then assign it the Storage Admin role on the bucket. After that you can mount GCS the standard way: bucket_name = "<bucket-name>" mount_name = "<mount-name>" dbutils.fs.mount("gs://%s" % bucket_name, "/m...
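A minimal sketch of the mount step this reply describes, for anyone who wants a runnable form. Assumptions: the `/mnt/<mount-name>` mount point is a guess at the usual convention (the quoted snippet is cut off before the mount point), `gcs_mount_args` and `mount_gcs_bucket` are hypothetical helper names, and `dbutils` is the object a Databricks notebook provides, so it is passed in rather than imported.

```python
def gcs_mount_args(bucket_name, mount_name):
    """Build the (source, mount_point) pair for a GCS mount.

    The "/mnt/" prefix is an assumption, not taken from the reply.
    """
    source = "gs://%s" % bucket_name
    mount_point = "/mnt/%s" % mount_name
    return source, mount_point


def mount_gcs_bucket(dbutils, bucket_name, mount_name):
    """Mount the bucket unless the mount point already exists."""
    source, mount_point = gcs_mount_args(bucket_name, mount_name)
    # dbutils.fs.mounts() lists existing mounts; skip if already present.
    if any(m.mountPoint == mount_point for m in dbutils.fs.mounts()):
        return mount_point
    dbutils.fs.mount(source, mount_point)
    return mount_point
```

In a notebook this would be called as `mount_gcs_bucket(dbutils, "<bucket-name>", "<mount-name>")`, provided the cluster's service account already has access to the bucket.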

by James1100 • New Contributor II

06-01-2023 12:13:09 AM

2229 Views
1 replies
1 kudos

Databricks connect to GCS

Hi, I would like to ask if anyone knows how to connect to GCS, basically to read a CSV file from a GCS bucket. I have no issue connecting to Data Lake. Thank you so much in advance.
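One possible shape for the read being asked about, as a hedged sketch rather than a confirmed answer from this thread. Assumptions: the cluster runs under a service account that can already read the bucket, `spark` is the session a Databricks notebook provides, and `read_csv_from_gcs` is a hypothetical helper name.

```python
def read_csv_from_gcs(spark, bucket_name, object_path, header=True):
    """Read a CSV object from a GCS bucket into a Spark DataFrame.

    Assumes the cluster already has credentials for the bucket.
    """
    # Build a gs:// URI; strip a leading slash so the path has no "//".
    uri = "gs://%s/%s" % (bucket_name, object_path.lstrip("/"))
    # Treat the first row as column names when header is True.
    return spark.read.option("header", str(header).lower()).csv(uri)
```

In a notebook: `df = read_csv_from_gcs(spark, "<bucket-name>", "path/to/file.csv")`.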

Latest Reply

Vartika
Databricks Employee

06-09-2023 4:14:01 AM

1 kudos

Hi @James C, Just checking in. If @Kaniz Fatma's answer helped, would you let us know and mark the answer as best? If not, would you be happy to give us more information? We'd love to hear from you. Cheers!

by Pbarbosa154 • New Contributor III

04-28-2023 7:30:44 AM

2161 Views
2 replies
0 kudos

What is the best way to ingest GCS data into Databricks and apply Anomaly Detection Model?

I recently started exploring the field of Data Engineering and came across some difficulties. I have a bucket in GCS with millions of parquet files and I want to create an Anomaly Detection model with them. I was trying to ingest that data into Datab...

Latest Reply

Anonymous
Not applicable

04-28-2023 10:34:53 AM

0 kudos

@Pedro Barbosa: It seems like you are running out of memory when trying to convert the PySpark DataFrame to an H2O frame. One possible approach to solve this issue is to partition the PySpark DataFrame before converting it to an H2O frame. You can us...

by explorer • New Contributor III

01-11-2023 4:10:40 AM

8752 Views
4 replies
3 kudos

Getting error while loading parquet data into Postgres (using spark-postgres library) ClassNotFoundException: Failed to find data source: postgres. Please find packages at http://spark.apache.org/third-party-projects.html Caused by: ClassNotFoundException

Hi fellas, I'm trying to load Parquet data (in a GCS location) into a Postgres DB (Google Cloud). To bulk upload data into PG we are using the spark-postgres library: https://framagit.org/interhop/library/spark-etl/-/tree/master/spark-postgres/src/main/sc...

Latest Reply

explorer
New Contributor III

01-18-2023 7:44:11 AM

3 kudos

Hi @Kaniz Fatma, @Daniel Sahal - a few updates from my side. After a lot of trial and error, psycopg2 worked in my case. We can process 200+ GB of data with a 10-node cluster (n2-highmem-4, 32 GB memory, 4 cores) and a driver with 32 GB memory and 4 cores, with Run...

by shrutis23 • New Contributor III

11-08-2022 10:31:14 AM

6257 Views
4 replies
4 kudos

How to use delta live table with google cloud storage

Hi Team, I have been working on a POC exploring Delta Live Tables with a GCS location. I have some questions: How do we access the GCS bucket? We have a connection established using the Databricks service account. In normal cluster creation, we go to the cluster page...

Latest Reply

Senthil1
Contributor

11-30-2022 4:53:59 AM

4 kudos

Kindly mount the DBFS location to GCS cloud storage; see: Mounting cloud object storage on Databricks | Databricks on Google Cloud

by MBV3 • Contributor

11-24-2022 3:09:11 PM

2874 Views
1 replies
2 kudos

Delete a file from GCS folder

What is the best way to delete files from a GCP bucket inside a Spark job?

Latest Reply

Unforgiven
Valued Contributor III

11-24-2022 8:06:03 PM

2 kudos

@M Baig Yes, you just need to create a service account for Databricks and then assign it the Storage Admin role on the bucket. After that you can mount GCS the standard way: bucket_name = "<bucket-name>" mount_name = "<mount-name>" dbutils.fs.mount("gs://%s" % bucket_na...
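For the original question (deleting files from the bucket inside a job), a minimal sketch using `dbutils.fs.rm`, which takes a path and a recurse flag. This is one possible approach and is not stated in the reply. Assumptions: `delete_gcs_path` is a hypothetical helper name, the accepted path prefixes are an illustrative safety check, and `dbutils` is passed in from the notebook.

```python
def delete_gcs_path(dbutils, path, recurse=False):
    """Delete a file (or, with recurse=True, a folder) from GCS.

    Call this from driver code; the path can be a gs:// URI or a
    mounted location such as /mnt/<mount-name>/...
    """
    # Guard against deleting an unintended location (illustrative check).
    if not path.startswith(("gs://", "/mnt/", "dbfs:/")):
        raise ValueError("unexpected path: %r" % path)
    # dbutils.fs.rm returns True when the delete succeeded.
    return dbutils.fs.rm(path, recurse)
```

For example, `delete_gcs_path(dbutils, "gs://<bucket-name>/tmp/part-0000.csv")` removes a single object, while `recurse=True` is needed for a folder.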

by Srikanth_Gupta_ • Databricks Employee

06-21-2021 10:17:45 AM

2090 Views
1 replies
0 kudos

Resolved! Does size of optimized files after running OPTIMIZE varies between cloud providers (S3, Blob and GCS)?

Are there any other parameters to consider when running OPTIMIZE, depending on the cloud vendor?

Latest Reply

Ryan_Chynoweth
Databricks Employee

06-21-2021 11:03:17 AM

0 kudos

OPTIMIZE does not depend on the cloud provider whatsoever. It produces the same results regardless of the underlying storage. It is idempotent, meaning that if it is run twice on the same dataset, the second execution has no effect.
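A small sketch of issuing the command from a notebook, to show that the call is the same whichever cloud the table is stored on. Assumptions: `optimize_statement` and `run_optimize` are hypothetical helper names, the table is a Delta table, and `spark` is the notebook's session.

```python
def optimize_statement(table_name):
    """Build an OPTIMIZE statement with each name part backtick-quoted."""
    parts = table_name.split(".")
    if not all(parts):
        raise ValueError("invalid table name: %r" % table_name)
    # Escape any backticks inside a part by doubling them.
    quoted = ".".join("`%s`" % p.replace("`", "``") for p in parts)
    return "OPTIMIZE %s" % quoted


def run_optimize(spark, table_name):
    """Run OPTIMIZE; the statement is identical on S3, Blob, or GCS."""
    return spark.sql(optimize_statement(table_name))
```

Calling `run_optimize(spark, "my_db.my_table")` a second time right after the first should, per the reply above, leave the table unchanged.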

Databricks Community
