Data Engineering

Forum Posts

siddharthk
by New Contributor II
  • 803 Views
  • 2 replies
  • 2 kudos

Resolved! Reduce downtime of Postgres table - JDBC overwrite job

I want to overwrite a PostgreSQL table, transactionStats, which is used by the customer-facing dashboards. This table needs to be updated every 30 minutes. I am writing an AWS Glue Spark job via a JDBC connection to perform this operation. Spark dataframe writ...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Siddharth Kanojiya, we haven't heard from you since the last response from @werners (Customer). Kindly share the information with us, and in return, we will provide you with the necessary solution. Thanks and regards

1 More Replies
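
One common way to shorten the window in which dashboards see a missing or half-written table is to load a staging table first and then swap it in with renames inside a single transaction. This is a minimal sketch of that idea, not the approach confirmed in this thread: the function names and the `_staging` / `_old` suffixes are hypothetical, and it assumes nothing else (views, foreign keys) depends on the live table by name.

```python
from typing import List


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def swap_statements(live: str, staging: str) -> List[str]:
    """Return the SQL statements that swap a freshly loaded staging table in.

    PostgreSQL DDL is transactional, so readers see either the old table or
    the new one, never an empty table in between.
    """
    old = live + "_old"  # hypothetical suffix for the table being retired
    return [
        "BEGIN",
        f"ALTER TABLE {quote_ident(live)} RENAME TO {quote_ident(old)}",
        f"ALTER TABLE {quote_ident(staging)} RENAME TO {quote_ident(live)}",
        f"DROP TABLE {quote_ident(old)}",
        "COMMIT",
    ]


# Step 1 (Spark side, not executed here): write to the staging table, e.g.
#   df.write.jdbc(url, "transactionStats_staging", mode="overwrite", properties=props)
# Step 2: run the statements below over a plain database connection.
for stmt in swap_statements("transactionStats", "transactionStats_staging"):
    print(stmt + ";")
```

Quoted identifiers are case-sensitive in PostgreSQL, so the names passed in must match how the tables were actually created.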
RamyaN
by New Contributor II
  • 2066 Views
  • 2 replies
  • 3 kudos

How to read enum[] (array of enum) datatype from postgres using spark

We are trying to read a column of array-of-enum datatype from Postgres as a string datatype in the target. We were able to achieve this by explicitly using the concat function while extracting, as below: val jdbcDF3 = spark.read .format("jdbc") .option(...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 3 kudos

You can try a custom schema for the JDBC read: .option("customSchema", "colname STRING")

1 More Replies
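
The `customSchema` suggestion above can be sketched in PySpark as follows. The helper name, the table name, and the column name `tags` are hypothetical, and whether the PostgreSQL driver accepts this mapping for an `enum[]` column is an assumption to verify against your own table.

```python
from typing import Dict


def jdbc_read_options(url: str, table: str, column: str,
                      user: str, password: str) -> Dict[str, str]:
    """Build JDBC reader options that ask Spark to read `column` as STRING."""
    return {
        "url": url,
        "dbtable": table,
        "user": user,
        "password": password,
        # customSchema overrides the type Spark would otherwise infer
        "customSchema": f"{column} STRING",
    }


opts = jdbc_read_options(
    "jdbc:postgresql://host:5432/db",  # placeholder URL
    "my_table",                        # hypothetical table
    "tags",                            # hypothetical enum[] column
    "user",
    "secret",
)

# Inside a Spark session (not executed here):
#   df = spark.read.format("jdbc").options(**opts).load()
```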
nadia
by New Contributor II
  • 887 Views
  • 1 reply
  • 0 kudos

Resolved! Connection Databricks Postgresql

I use Databricks and am trying to connect to PostgreSQL with the following code:
jdbcHostname = "xxxxxxx"
jdbcDatabase = "xxxxxxxxxxxx"
jdbcPort = "5432"
username = "xxxxxxx"
password = "xxxxxxxx"
jdbcUrl = "jdbc:postgresql://{0}:{1}/{2}".format(jdbcHostname, jd...

Latest Reply
Prabakar
Esteemed Contributor III
  • 0 kudos

Hi @Boumaza nadia, please check the Ganglia metrics for the cluster. This could be a scalability issue where the cluster is overloaded. This can happen when a large partition does not fit into the given executor's memory. To fix this we recommend bump...
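
If a single oversized partition is indeed the cause, one option besides resizing the cluster is to split the JDBC read into several smaller partitions. This is a different technique from the (truncated) recommendation above, shown only as a hedged sketch: the helper names, the `id` column, and the bounds are hypothetical, while `partitionColumn`, `lowerBound`, `upperBound`, and `numPartitions` are standard Spark JDBC options that must be set together.

```python
from typing import Dict


def jdbc_url(host: str, port: str, database: str) -> str:
    """Build a PostgreSQL JDBC URL in the same shape the question uses."""
    return "jdbc:postgresql://{0}:{1}/{2}".format(host, port, database)


def partitioned_read_options(url: str, table: str, partition_column: str,
                             lower: int, upper: int,
                             num_partitions: int) -> Dict[str, str]:
    """Build options that make Spark read the table in parallel ranges."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if lower >= upper:
        raise ValueError("lower must be smaller than upper")
    return {
        "url": url,
        "dbtable": table,
        "partitionColumn": partition_column,  # numeric, date, or timestamp column
        "lowerBound": str(lower),
        "upperBound": str(upper),
        "numPartitions": str(num_partitions),
    }


opts = partitioned_read_options(
    jdbc_url("example-host", "5432", "exampledb"),  # placeholder values
    "my_table", "id", 1, 1_000_000, 8,
)

# Inside a Spark session (not executed here):
#   df = spark.read.format("jdbc").options(**opts) \
#       .option("user", username).option("password", password).load()
```

The bounds only decide how the ranges are split; rows outside them are still read, in the first and last partitions.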

venkyv
by New Contributor II
  • 1320 Views
  • 1 reply
  • 3 kudos

Resolved! Can I use Databricks to join data from S3 and Postgres using SQL?

Hello, I'm very new to Databricks and I'm finding it hard to tell whether it's the right solution for our needs. Requirement: We have multiple data sources spread across AWS S3 and Postgres. We need a common SQL endpoint that can be used to write queries to join d...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 3 kudos

Yes, you can. You can ETL to data lake storage, register your tables in the metastore, and register your SELECT with JOINs as a VIEW, or, even better, create additional jobs and store your joined table. From BI you can connect to Databricks SQL or to data lake...
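
As a rough illustration of joining the two sources with SQL, the sketch below registers each source as a temporary view and joins them by a shared key. It is simpler than the metastore-backed setup described above, and everything specific in it is an assumption: the view names, the `customer_id` key, and the choice of Parquet as the S3 file format.

```python
from typing import Dict


def build_join_sql(s3_view: str, pg_view: str, key: str) -> str:
    """Return a Spark SQL query joining the two registered views on `key`."""
    return f"SELECT * FROM {s3_view} JOIN {pg_view} USING ({key})"


def register_views(spark, s3_path: str, jdbc_options: Dict[str, str],
                   s3_view: str = "s3_events",
                   pg_view: str = "pg_customers") -> None:
    """Expose an S3 dataset and a Postgres table to Spark SQL as temp views."""
    # Assumes the S3 data is Parquet; swap in .csv/.json as appropriate.
    spark.read.parquet(s3_path).createOrReplaceTempView(s3_view)
    spark.read.format("jdbc").options(**jdbc_options).load() \
        .createOrReplaceTempView(pg_view)


query = build_join_sql("s3_events", "pg_customers", "customer_id")
print(query)

# Inside a Spark session (not executed here):
#   register_views(spark, "s3://my-bucket/events/", jdbc_options)
#   joined = spark.sql(query)
```

To persist the result rather than recompute it on every query, the joined DataFrame could be written out with `joined.write.saveAsTable(...)`.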

longcao
by New Contributor III
  • 9306 Views
  • 5 replies
  • 0 kudos

Resolved! Writing DataFrame to PostgreSQL via JDBC extremely slow (Spark 1.6.1)

Hi there, I'm just getting started with Spark and I've got a moderately sized DataFrame created from collating CSVs in S3 (88 columns, 860k rows) that seems to be taking an unreasonable amount of time to insert (using SaveMode.Append) into Postgres. I...

Latest Reply
longcao
New Contributor III
  • 0 kudos

In case anyone was curious how I worked around this, I ended up dropping down to Postgres JDBC and using CopyManager to COPY rows in directly from Spark: https://gist.github.com/longcao/bb61f1798ccbbfa4a0d7b76e49982f84

4 More Replies
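
The linked gist uses the PostgreSQL JDBC driver's CopyManager from Scala. A rough Python analogue of the same COPY-based idea is sketched below, assuming psycopg2 is available on the executors; the function names are hypothetical and this is not a port of the gist.

```python
import csv
import io


def rows_to_csv(rows) -> io.StringIO:
    """Serialize an iterable of row tuples into an in-memory CSV buffer."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for row in rows:
        # None is written as an empty field, which COPY's csv format reads as
        # NULL. Empty strings are written the same way, so they also load as NULL.
        writer.writerow(row)
    buf.seek(0)
    return buf


def copy_partition(conn, table: str, columns, rows) -> None:
    """Stream one partition's rows into `table` with COPY ... FROM STDIN.

    `conn` is expected to be an open psycopg2 connection.
    """
    sql = f"COPY {table} ({', '.join(columns)}) FROM STDIN WITH (FORMAT csv)"
    with conn.cursor() as cur:
        cur.copy_expert(sql, rows_to_csv(rows))
    conn.commit()


# Inside a Spark job (not executed here), open one connection per partition:
#   def write_partition(rows):
#       conn = psycopg2.connect(dsn)
#       try:
#           copy_partition(conn, "my_table", ["id", "name", "note"], rows)
#       finally:
#           conn.close()
#   df.foreachPartition(write_partition)
```

COPY avoids per-row INSERT round trips, which is typically where the time goes with row-by-row JDBC appends.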