cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Performance Issue : Create DELTA table form 2 TB PARQUET file

KuldeepChitraka
New Contributor III

We are trying to create a DELTA table (CTAS statement) from 2 TB PARQUET file and its taking huge amount of time around 12~ hrs.

is it normal.? What are option to tune/optimize this ? are we doing anything wrong

Cluster : Interactive/30 Cores / 320 GB Memory / 4 workers

3 REPLIES 3

shan_chandra
Databricks Employee
Databricks Employee

@Kuldeep Chitrakar​ - Please try to evaluate(explain plan) the physical plan on the CTAS query before creating the table. Below are a few things that can be validated before turning the cluster size.

  1. validate the join conditions used in CTAS query.
  2. will a plain select query work?
  3. Tuning spark.sql.shuffle.partitions to see if more number of tasks are spun in parallel to reduce the time taken.
  4. Is there a skew in the join?
  5. will AQE config help? (https://docs.databricks.com/optimizations/aqe.html)

Cami
Contributor III

I do not have experience with 2TB dataset but I recommend you check it out:

  1. spark.sql.shuffle.partitions ( doc examples: Link 1, Link 2 )
  2. Tune file size

Can you share with us a screen from SPARK UI for CTAS statement ( SPARK UI ->STAGES -> select CTAS -> Summary metrics and Aggregated metrics )?

Can you check the size of the parquet files created under the delta table.?

Hubert-Dudek
Esteemed Contributor III

Please use COPY INTO (first create an empty delta table) or CONVERT TO DELTA instead of CTAS it will be much more faster, and it process will be auto-optimized.

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group