Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

kishorekumar
by New Contributor
  • 1338 Views
  • 1 replies
  • 0 kudos

Silent failure in DataFrameWriter when loading data to Redshift

Context: I'm using DataFrameWriter to load the dataset into Redshift. DataFrameWriter writes the dataset to S3 and then loads the data from S3 into Redshift by issuing a Redshift COPY command. Issue: Infrequently, we observe that the data is present in t...
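One way to catch a silent partial load like this is to compare the target table's row count before and after the append. The sketch below is hypothetical: the helper names `count_rows`, `check_append`, and `write_and_verify` are illustrative, and it assumes a single writer per table (a concurrent load would break the before/after arithmetic). It uses the connector's `query` option so Redshift computes `COUNT(*)` itself, and the `com.databricks.spark.redshift` format name used elsewhere on this board.

```python
def count_rows(spark, jdbc_url, table, tempdir,
               fmt="com.databricks.spark.redshift"):
    """Return the current row count of a Redshift table (computed by Redshift)."""
    rows = (spark.read
            .format(fmt)
            .option("url", jdbc_url)
            .option("query", f"SELECT COUNT(*) AS n FROM {table}")
            .option("tempdir", tempdir)
            .option("forward_spark_s3_credentials", True)
            .load()
            .collect())
    return int(rows[0]["n"])


def check_append(rows_before, rows_after, rows_written, table):
    """Raise if the number of rows added to `table` differs from what was written."""
    added = rows_after - rows_before
    if added != rows_written:
        raise RuntimeError(
            f"{table}: expected {rows_written} new rows, found {added}")
    return added


def write_and_verify(spark, df, jdbc_url, table, tempdir,
                     fmt="com.databricks.spark.redshift"):
    """Append `df` to `table` and fail loudly if the row delta does not match."""
    expected = df.count()
    before = count_rows(spark, jdbc_url, table, tempdir, fmt)
    (df.write
       .format(fmt)
       .option("url", jdbc_url)
       .option("dbtable", table)
       .option("tempdir", tempdir)
       .option("forward_spark_s3_credentials", True)
       .mode("append")
       .save())
    after = count_rows(spark, jdbc_url, table, tempdir, fmt)
    return check_append(before, after, expected, table)
```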

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Kishorekumar Somasundaram, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question. Thanks.

Edwin
by New Contributor II
  • 714 Views
  • 0 replies
  • 1 kudos

Unable to load data from Redshift

I've been trying to connect to Redshift following Databricks' documentation, and I've validated that I'm using runtime version 11.3 on my cluster and that I have read/write privileges on the tempdir bucket. But I'm unable to load data from Redshift to a S...
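To double-check the tempdir privileges from the cluster itself, a small write/read/delete probe can help. This is a hypothetical sketch: `parse_s3_uri` and `probe_tempdir` are illustrative names, and it assumes boto3 is available on the driver. Note that it only exercises the credentials boto3 picks up on the driver; when the connector is configured with `aws_iam_role`, Redshift's own UNLOAD/COPY access to the bucket goes through that role and is not tested here.

```python
from urllib.parse import urlparse


def parse_s3_uri(uri):
    """Split 's3://bucket/prefix' or 's3a://bucket/prefix' into (bucket, prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("s3", "s3a") or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    prefix = parsed.path.lstrip("/")
    if prefix and not prefix.endswith("/"):
        prefix += "/"  # treat the tempdir as a directory-like prefix
    return parsed.netloc, prefix


def probe_tempdir(uri):
    """Write, read back, and delete a tiny object under the tempdir prefix."""
    import boto3  # imported lazily so parse_s3_uri works without boto3

    bucket, prefix = parse_s3_uri(uri)
    key = prefix + "_redshift_tempdir_probe"  # hypothetical probe object name
    s3 = boto3.client("s3")
    s3.put_object(Bucket=bucket, Key=key, Body=b"ok")
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    s3.delete_object(Bucket=bucket, Key=key)
    return body == b"ok"
```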

Lonnie
by New Contributor
  • 1730 Views
  • 1 replies
  • 1 kudos

Recommended Redshift-2-Delta Migration Path

Hello all! My team is previewing Databricks and is contemplating the steps to take to perform one-time migrations of datasets from Redshift to Delta. Based on our understanding of the tool, here are our initial thoughts: Export data from Redshift-2-S...
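For comparison, a one-time copy can also go through the Redshift connector directly (it stages data in S3 via UNLOAD under `tempdir`) and then be written as a Delta table. This is a hypothetical sketch, not an official migration recipe: `plan_migration` and `migrate_table` are illustrative names, the target schema is a placeholder, and each run fully overwrites the target table.

```python
def plan_migration(source_tables, target_schema):
    """Map Redshift 'schema.table' names to '<target_schema>.table' Delta names."""
    plan, seen = [], set()
    for source in source_tables:
        target = f"{target_schema}.{source.split('.')[-1]}"
        if target in seen:
            # Two Redshift schemas with the same table name would collide.
            raise ValueError(f"two sources map to {target}; rename one first")
        seen.add(target)
        plan.append((source, target))
    return plan


def migrate_table(spark, jdbc_url, tempdir, source, target,
                  fmt="com.databricks.spark.redshift"):
    """Copy one Redshift table into a managed Delta table (full overwrite)."""
    df = (spark.read
          .format(fmt)
          .option("url", jdbc_url)
          .option("dbtable", source)
          .option("tempdir", tempdir)
          .option("forward_spark_s3_credentials", True)
          .load())
    df.write.format("delta").mode("overwrite").saveAsTable(target)
    return target
```

Usage would be a loop such as `for src, tgt in plan_migration(tables, "bronze"): migrate_table(spark, url, tmp, src, tgt)`.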

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Awesome!

LorenRD
by Contributor
  • 7150 Views
  • 11 replies
  • 13 kudos

Resolved! Is it possible to connect Databricks SQL with AWS Redshift DB?

I would like to know whether it's possible to connect the Databricks SQL module not only to the internal metastore DB and the tables from the Data Science & Engineering module, but also to an AWS Redshift DB, in order to run queries and create alerts.

Latest Reply
LorenRD
Contributor
  • 13 kudos

Hi @Kaniz Fatma, I contacted customer support about this issue, and they told me that this feature is not implemented yet but is on the roadmap with no ETA. It would be great if you could ping me back when it's possible to access Redshift tables from SQ...

Anonymous
by Not applicable
  • 2993 Views
  • 7 replies
  • 0 kudos

Resolved! How can a standalone Spark JAR running from IntelliJ IDEA use libraries installed in Databricks DBR?

Hello, I tried without success to use several libraries that we installed on a Databricks 9.1 cluster (not provided by default in DBR) from a standalone Spark application run from IntelliJ IDEA. For instance, for connecting to Redshift it works onl...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Unfortunately, I did not find any solution. We have to package the JAR and run it from a Databricks job for testing/debugging. It's not efficient, but no solution for remote debugging has been found or provided.

nicole_wong
by New Contributor II
  • 1907 Views
  • 2 replies
  • 1 kudos

Resolved! Best practices for working with Redshift

I have a customer with the following question; I'm posting on their behalf to introduce them to the community. For modeling in a Python environment, what is the best practice for getting data from Redshift? A "load" option seems to leave me...
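For modeling in Python, one common pattern is to let Redshift do the column selection and filtering through the connector's `query` option and only then convert to pandas. A minimal sketch, assuming the result fits in driver memory (`toPandas()` collects everything to the driver) and that table, column, and filter strings come from trusted code rather than user input, since the SQL is built by string formatting. `build_query` and `load_for_modeling` are hypothetical helper names.

```python
def build_query(table, columns, where=None, limit=None):
    """Build a SELECT that Redshift evaluates before data is unloaded to S3."""
    cols = ", ".join(columns) if columns else "*"
    sql = f"SELECT {cols} FROM {table}"
    if where:
        sql += f" WHERE {where}"
    if limit is not None:
        sql += f" LIMIT {int(limit)}"
    return sql


def load_for_modeling(spark, jdbc_url, tempdir, table, columns, where=None,
                      limit=None, fmt="com.databricks.spark.redshift"):
    """Read a filtered projection from Redshift and return a pandas DataFrame."""
    df = (spark.read
          .format(fmt)
          .option("url", jdbc_url)
          .option("query", build_query(table, columns, where, limit))
          .option("tempdir", tempdir)
          .option("forward_spark_s3_credentials", True)
          .load())
    return df.toPandas()
```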

Latest Reply
jose_gonzalez
Moderator
  • 1 kudos

Hi @Nicole Wong, have you checked the docs here? As far as I know, this might be the only way to read/write data from/to Redshift.

sajith_appukutt
by Honored Contributor II
  • 1163 Views
  • 1 replies
  • 0 kudos

Resolved! I'm using the Redshift data source to load data into Spark SQL DataFrames. However, I'm not seeing predicate pushdown for my queries run on Redshift. Is that expected?

I was expecting filter operations to be pushed down to Redshift by the optimizer. However, the entire dataset is getting loaded from Redshift.

Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

The Spark driver for Redshift pushes the following operators down into Redshift: Filter, Project, Sort, Limit, Aggregation, and Join. However, it does not currently support pushing down expressions that operate on dates and timestamps. If you have a similar requirement, please add a fea...
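Since date and timestamp expressions are not pushed down, one workaround is to write the date predicate into the SQL sent through the connector's `query` option, so Redshift applies it before unloading. A hypothetical sketch (`date_range_query` is an illustrative name; table and column names are assumed to be trusted, since they are formatted into the SQL). It uses a half-open `[start, end)` range and rejects `datetime` values so a time component is never silently dropped.

```python
from datetime import date, datetime


def date_range_query(table, column, start, end):
    """SELECT with a half-open [start, end) date filter evaluated by Redshift."""
    for value in (start, end):
        # datetime is a subclass of date, so reject it explicitly.
        if isinstance(value, datetime) or not isinstance(value, date):
            raise TypeError("start and end must be datetime.date (not datetime) values")
    if end <= start:
        raise ValueError("end must be after start")
    return (f"SELECT * FROM {table} "
            f"WHERE {column} >= '{start.isoformat()}' "
            f"AND {column} < '{end.isoformat()}'")
```

The resulting string would be passed as `.option("query", ...)` in place of `.option("dbtable", ...)`.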

sajith_appukutt
by Honored Contributor II
  • 987 Views
  • 1 replies
  • 1 kudos

Resolved! Is there a way to automatically clean up temporary files created in S3 by the Amazon Redshift connector?

The Amazon Redshift data source in Databricks seems to use S3 for storing intermediate results. Is there a way to automatically clean up the temporary files it creates in S3?

Latest Reply
sajith_appukutt
Honored Contributor II
  • 1 kudos

You could use an S3 lifecycle policy on the bucket used for storing intermediate results and configure expiration actions. This way, temporary/intermediate results are cleaned up automatically.
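A minimal sketch of such a policy with boto3, assuming the connector's `tempdir` lives under a dedicated prefix so nothing else is expired by mistake. The function names, rule ID, prefix, and one-day default are hypothetical. Be aware that `put_bucket_lifecycle_configuration` replaces the bucket's entire existing lifecycle configuration, so any existing rules would need to be merged into `Rules` first.

```python
def tempdir_lifecycle_config(prefix, days=1, rule_id="expire-redshift-tempdir"):
    """Lifecycle configuration that expires objects under `prefix` after `days`."""
    if days < 1:
        raise ValueError("days must be >= 1")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


def apply_lifecycle(bucket, prefix, days=1):
    """Install the rule; this overwrites any lifecycle rules already on the bucket."""
    import boto3  # imported lazily so the config builder works without boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=tempdir_lifecycle_config(prefix, days),
    )
```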

cfregly
by Contributor
  • 5100 Views
  • 4 replies
  • 0 kudos
Latest Reply
TianziCai
New Contributor II
  • 0 kudos

sample = (spark.read
    .format("com.databricks.spark.redshift")
    .option("url", jdbcUrl)
    .option("dbtable", "xx.xxx")  # schema, table
    .option("forward_spark_s3_credentials", True)
    .option("tempdir", tem...
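The reply's snippet is cut off at the `tempdir` value, so here is a self-contained variant that takes it as a parameter. It is a hypothetical sketch (`redshift_read_options` and `read_redshift` are illustrative names): it sets exactly one S3 authentication mode, either forwarding Spark's S3 credentials as in the reply or an `aws_iam_role` ARN, since the connector documentation describes these as mutually exclusive.

```python
def redshift_read_options(jdbc_url, dbtable, tempdir, aws_iam_role=None):
    """Connector options with exactly one S3 authentication mode set."""
    options = {"url": jdbc_url, "dbtable": dbtable, "tempdir": tempdir}
    if aws_iam_role:
        options["aws_iam_role"] = aws_iam_role  # Redshift uses this role for S3
    else:
        options["forward_spark_s3_credentials"] = "true"
    return options


def read_redshift(spark, jdbc_url, dbtable, tempdir, aws_iam_role=None):
    """Load a Redshift table ('schema.table') into a Spark DataFrame."""
    return (spark.read
            .format("com.databricks.spark.redshift")
            .options(**redshift_read_options(jdbc_url, dbtable, tempdir, aws_iam_role))
            .load())
```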
