cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Forum Posts

Sen
by New Contributor
  • 6178 Views
  • 8 replies
  • 1 kudos

Resolved! Performance enhancement while writing dataframes into Parquet tables

Hi,I am trying to write the contents of a dataframe into a parquet table using the command below.df.write.mode("overwrite").format("parquet").saveAsTable("sample_parquet_table")The dataframe contains an extract from one of our source systems, which h...

  • 6178 Views
  • 8 replies
  • 1 kudos
Latest Reply
MichTalebzadeh
Valued Contributor
  • 1 kudos

Hi,I agree with the reply around the benefits of Delta tables, specifically Delta brings additional features,such as ACID transactions and schema evolution. However, I am not sure whether the problem below and I quote "The problem is, this statement ...

  • 1 kudos
7 More Replies
tarente
by New Contributor III
  • 2936 Views
  • 3 replies
  • 3 kudos

Partitioned parquet table (folder) with different structure

Hi,We have a parquet table (folder) in Azure Storage Account.The table is partitioned by column PeriodId (represents a day in the format YYYYMMDD) and has data from 20181001 until 20211121 (yesterday).We have a new development that adds a new column ...

  • 2936 Views
  • 3 replies
  • 3 kudos
Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 3 kudos

I think problem is in overwrite as when you overwrite it overwrites all folders. Solution is to mix append with dynamic overwrite so it will overwrite only folders which have data and doesn't affect old partitions:spark.conf.set("spark.sql.sources.pa...

  • 3 kudos
2 More Replies
User16826987838
by Contributor
  • 1381 Views
  • 1 replies
  • 0 kudos

Refreshing external tables

After I vacuum the tables, do i need to update the manifest table and parquet table to refresh my external tables for integrations to work?

  • 1381 Views
  • 1 replies
  • 0 kudos
Latest Reply
Taha
New Contributor III
  • 0 kudos

Manifest files need to be re-created when partitions are added or altered. Since a VACUUM only deletes all historical versions, you shouldn't need to create an updated manifest file unless you are also running an OPTIMIZE.

  • 0 kudos
brickster_2018
by Esteemed Contributor
  • 1209 Views
  • 1 replies
  • 0 kudos
  • 1209 Views
  • 1 replies
  • 0 kudos
Latest Reply
brickster_2018
Esteemed Contributor
  • 0 kudos

The issue can happen if the Hive syntax for table creation is used instead of the Spark syntax. Read more here: https://docs.databricks.com/spark/latest/spark-sql/language-manual/sql-ref-syntax-ddl-create-table-hiveformat.htmlThe issue mentioned in t...

  • 0 kudos
aladda
by Honored Contributor II
  • 909 Views
  • 1 replies
  • 0 kudos
  • 909 Views
  • 1 replies
  • 0 kudos
Latest Reply
aladda
Honored Contributor II
  • 0 kudos

Yes Convert to Delta allows for converting a parquet table into Delta format in place by adding a transaction log, infering the schema and also collecting stats to improve query performance - https://docs.databricks.com/spark/latest/spark-sql/languag...

  • 0 kudos
Labels