Data Engineering

Notebook of Databricks's result

rt-slowth
Contributor

If there is no data anomaly in Redshift itself, but the data read through Spark from shared compute in Databricks suddenly decreases, what causes should I check? Also, is there any way to check the values of widgets or code variables on each execution?

1 REPLY

Kaniz_Fatma
Community Manager

Hi @rt-slowth

If the amount of data loaded from Amazon Redshift into Databricks decreases unexpectedly, or a Redshift table accessed by Spark suddenly appears smaller, there are several possible reasons.

Here are some possible causes you can investigate:

  1. Data updates or deletions: Check whether any data was updated or deleted in Redshift. A change in the Redshift source reduces the amount of data exposed to the Spark cluster.

  2. Redshift Spectrum limitations: Check whether the table in Redshift is a Spectrum external table, as the Spectrum format has limitations that may affect the amount of data available to Spark. For example, with the Parquet format, the data size in bytes may be significantly smaller if the data is heavily compressed or has a high number of empty or null values, even when the row count is unchanged.

  3. Query optimizations: Check whether there have been any optimizations or other changes in the Spark queries accessing the Redshift data that may have reduced the amount of data pulled from Redshift.
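One way to narrow down which of these causes applies is to compare three row counts: the count in Redshift itself, the count Spark sees, and the count from a previous run. The following is only a minimal sketch of that comparison. The `CountCheck` and `diagnose` names and the 5% tolerance are hypothetical, and it assumes you collect the counts yourself (for example with `SELECT COUNT(*)` in Redshift and `df.count()` in the notebook).

```python
from dataclasses import dataclass


@dataclass
class CountCheck:
    source_rows: int          # e.g. SELECT COUNT(*) run directly in Redshift
    spark_rows: int           # e.g. df.count() on the DataFrame read in Databricks
    previous_spark_rows: int  # count recorded from an earlier run (assumed to be stored somewhere)


def diagnose(check: CountCheck, tolerance: float = 0.05) -> str:
    """Return a rough hint about where a row-count drop comes from."""
    if check.previous_spark_rows == 0:
        return "no baseline"
    drop = (check.previous_spark_rows - check.spark_rows) / check.previous_spark_rows
    if drop <= tolerance:
        return "no significant drop"
    if check.source_rows == check.spark_rows:
        # Redshift itself holds fewer rows: look at updates or deletions (cause 1).
        return "source changed"
    # Redshift and Spark disagree: look at the table type, query, or filters (causes 2 and 3).
    return "read path differs"


print(diagnose(CountCheck(source_rows=1000, spark_rows=600, previous_spark_rows=1000)))
```

The returned hint is only a starting point for investigation, not a definitive diagnosis.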

To check the values of widgets or variables on each execution, you can use dbutils.widgets, or access the notebook context through dbutils.notebook.entry_point.getDbutils().

  1. Using dbutils.widgets: Use dbutils.widgets to create widgets and read their values. For example, to create a text widget named myVar with the default value myValue, you can use the following code:
dbutils.widgets.text("myVar", "myValue")
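To see which widget values were in effect on a given execution, one option is to read them with `dbutils.widgets.get` at the start of the run and print them as a timestamped record. The sketch below is an illustration only: `snapshot_widgets` is a hypothetical helper, and it takes the getter as a parameter so that it also runs outside Databricks, where `dbutils` is not defined.

```python
import json
from datetime import datetime, timezone
from typing import Callable, Dict, Iterable


def snapshot_widgets(names: Iterable[str], get_value: Callable[[str], str]) -> Dict[str, str]:
    """Collect the current value of each named widget plus a UTC timestamp."""
    snapshot = {name: get_value(name) for name in names}
    snapshot["_captured_at"] = datetime.now(timezone.utc).isoformat()
    return snapshot


# In a Databricks notebook (assumption: a widget named "myVar" already exists):
#   print(json.dumps(snapshot_widgets(["myVar"], dbutils.widgets.get)))

# Outside Databricks, a plain dict can stand in for the widgets:
fake_widgets = {"myVar": "myValue"}
print(json.dumps(snapshot_widgets(["myVar"], fake_widgets.__getitem__)))
```

Printing the snapshot in the first cell means the values appear in the output of each run, so they can be compared across executions.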
