Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Sen
by New Contributor
  • 9297 Views
  • 9 replies
  • 1 kudos

Resolved! Performance enhancement while writing dataframes into Parquet tables

Hi, I am trying to write the contents of a dataframe into a parquet table using the command below. df.write.mode("overwrite").format("parquet").saveAsTable("sample_parquet_table") The dataframe contains an extract from one of our source systems, which h...

Latest Reply
jhoon
New Contributor II
  • 1 kudos

Great discussion on performance optimization! Managing technical projects like these alongside academic work can be demanding. If you need expert academic support to free up time for your professional pursuits, Dissertation Help Services is here to a...

8 More Replies
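
The accepted resolution isn't visible in the excerpt above, but a common first lever for writes like this is controlling the number of output files. A minimal sketch in PySpark, assuming a stand-in dataframe for the poster's extract; the partition count of 64 is an arbitrary illustration, not a recommendation:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Stand-in for the poster's source-system extract (assumption for illustration).
df = spark.range(1_000_000).withColumnRenamed("id", "value")

# Fewer, larger output files generally write and scan faster than many small
# ones; repartition() controls how many files saveAsTable produces.
num_output_files = 64  # arbitrary illustrative value; tune to your data volume

(df.repartition(num_output_files)
   .write
   .mode("overwrite")
   .format("parquet")
   .saveAsTable("sample_parquet_table"))
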
tarente
by New Contributor III
  • 3524 Views
  • 3 replies
  • 3 kudos

Partitioned parquet table (folder) with different structure

Hi, We have a parquet table (folder) in an Azure Storage Account. The table is partitioned by column PeriodId (which represents a day in the format YYYYMMDD) and has data from 20181001 until 20211121 (yesterday). We have a new development that adds a new column ...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 3 kudos

I think the problem is in overwrite: when you overwrite, it overwrites all folders. The solution is to mix append with dynamic overwrite so it will overwrite only the folders which have data and doesn't affect old partitions: spark.conf.set("spark.sql.sources.pa...

2 More Replies
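
The config truncated in the reply is presumably spark.sql.sources.partitionOverwriteMode. A minimal sketch of the dynamic-overwrite approach in PySpark, assuming a Parquet folder partitioned by PeriodId as in the post; the path and sample row are placeholders:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# With "dynamic", an overwrite only rewrites the partitions present in the
# incoming dataframe; other PeriodId folders are left untouched (the default,
# "static", would wipe them all).
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

# Hypothetical new day's data for illustration.
new_day = spark.createDataFrame([(20211122, "some value")], ["PeriodId", "Value"])

(new_day.write
    .mode("overwrite")
    .partitionBy("PeriodId")
    .format("parquet")
    .save("/mnt/storage/my_parquet_table"))  # hypothetical path
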
User16826987838
by Contributor
  • 1618 Views
  • 1 reply
  • 0 kudos

Refreshing external tables

After I vacuum the tables, do I need to update the manifest table and parquet table to refresh my external tables for integrations to work?

Latest Reply
Taha
Databricks Employee
  • 0 kudos

Manifest files need to be re-created when partitions are added or altered. Since a VACUUM only deletes files from historical versions, you shouldn't need to create an updated manifest file unless you are also running an OPTIMIZE.

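For reference, a manifest can be re-created on demand with Delta Lake's GENERATE command. A minimal sketch, assuming a Delta table at a hypothetical path:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Re-create the symlink manifest after operations that add or alter partitions
# (e.g. OPTIMIZE); per the reply above, VACUUM alone should not require it.
spark.sql(
    "GENERATE symlink_format_manifest "
    "FOR TABLE delta.`/mnt/storage/my_delta_table`"  # hypothetical path
)
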
brickster_2018
by Databricks Employee
  • 1491 Views
  • 1 reply
  • 0 kudos
Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

The issue can happen if the Hive syntax for table creation is used instead of the Spark syntax. Read more here: https://docs.databricks.com/spark/latest/spark-sql/language-manual/sql-ref-syntax-ddl-create-table-hiveformat.html The issue mentioned in t...

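The original question is not visible above, but the syntax distinction the reply points to can be sketched; table and column names here are placeholders:

from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Hive-format syntax (STORED AS) creates a Hive SerDe table, which can behave
# differently from a native Spark datasource table:
spark.sql("CREATE TABLE hive_style_tbl (id INT, name STRING) STORED AS PARQUET")

# Spark datasource syntax (USING) creates a native Parquet table:
spark.sql("CREATE TABLE spark_style_tbl (id INT, name STRING) USING PARQUET")
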
aladda
by Databricks Employee
  • 1055 Views
  • 1 reply
  • 0 kudos
Latest Reply
aladda
Databricks Employee
  • 0 kudos

Yes, CONVERT TO DELTA allows for converting a parquet table into Delta format in place by adding a transaction log, inferring the schema, and also collecting stats to improve query performance - https://docs.databricks.com/spark/latest/spark-sql/languag...

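A minimal sketch of the in-place conversion the reply describes; the path and partition column are hypothetical:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Converts the Parquet files in place by writing a Delta transaction log next
# to them. For a partitioned table, the partition schema must be declared:
spark.sql(
    "CONVERT TO DELTA parquet.`/mnt/storage/my_parquet_table` "
    "PARTITIONED BY (PeriodId INT)"  # hypothetical path and partition column
)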