Data Engineering

Forum Posts

Sen
by New Contributor
  • 3493 Views
  • 8 replies
  • 1 kudos

Resolved! Performance enhancement while writing dataframes into Parquet tables

Hi, I am trying to write the contents of a dataframe into a parquet table using the command below:
df.write.mode("overwrite").format("parquet").saveAsTable("sample_parquet_table")
The dataframe contains an extract from one of our source systems, which h...

Latest Reply
MichTalebzadeh
Contributor
  • 1 kudos

Hi, I agree with the reply about the benefits of Delta tables; specifically, Delta brings additional features, such as ACID transactions and schema evolution. However, I am not sure whether the problem below and I quote "The problem is, this statement ...

tarente
by New Contributor III
  • 2197 Views
  • 3 replies
  • 3 kudos

Partitioned parquet table (folder) with different structure

Hi, we have a parquet table (folder) in an Azure Storage Account. The table is partitioned by column PeriodId (represents a day in the format YYYYMMDD) and has data from 20181001 until 20211121 (yesterday). We have a new development that adds a new column ...
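The question is cut off, but assuming the new column exists only in partitions written after the change, a common approach is Parquet schema merging (in PySpark, spark.read.option("mergeSchema", "true").parquet(path)), under which rows from older partitions come back with the new column set to null. The sketch below is a toy, pure-Python model of that behavior, not Spark code; the partition values and the column names id, amount and new_col are hypothetical.

```python
from typing import Any

Row = dict[str, Any]
Partitions = dict[str, list[Row]]  # PeriodId folder value -> rows stored in that folder


def merge_schemas(partitions: Partitions) -> list[str]:
    """Union of the column names seen across all partitions, in first-seen order."""
    columns: list[str] = []
    for rows in partitions.values():
        for row in rows:
            for col in row:
                if col not in columns:
                    columns.append(col)
    return columns


def read_merged(partitions: Partitions) -> list[Row]:
    """Project every row onto the merged schema; columns a partition lacks become None."""
    columns = merge_schemas(partitions)
    result: list[Row] = []
    for period_id, rows in partitions.items():
        for row in rows:
            merged = {col: row.get(col) for col in columns}
            merged["PeriodId"] = period_id  # partition column comes from the folder name
            result.append(merged)
    return result


# Hypothetical data: the older partition predates new_col.
table = {
    "20211121": [{"id": 1, "amount": 10}],
    "20211122": [{"id": 2, "amount": 5, "new_col": "x"}],
}
for r in read_merged(table):
    print(r)
```

In Spark itself the merge happens when the files are read, so the older partition folders do not need to be rewritten to gain the column.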

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 3 kudos

I think the problem is the overwrite: when you overwrite, it overwrites all folders. The solution is to mix append with dynamic overwrite, so that only the folders that have new data are overwritten and the old partitions are not affected: spark.conf.set("spark.sql.sources.pa...
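The reply's snippet is truncated. Assuming it refers to Spark's spark.sql.sources.partitionOverwriteMode setting ("static" by default, or "dynamic"), the difference between the two modes can be modeled in plain Python as below. This is a toy illustration of the semantics, not Spark's implementation; the table layout (partition value mapped to a list of rows) and the sample data are hypothetical.

```python
from typing import Any

# Hypothetical layout: partition value (e.g. PeriodId) -> rows in that folder.
Table = dict[str, list[dict[str, Any]]]


def overwrite(existing: Table, new_data: Table, mode: str = "static") -> Table:
    """Toy model of an overwrite write under each partitionOverwriteMode value.

    static:  all existing partitions are dropped; only new_data remains.
    dynamic: only partitions that appear in new_data are replaced; the rest are kept.
    """
    if mode == "static":
        return {part: list(rows) for part, rows in new_data.items()}
    if mode == "dynamic":
        result = {part: list(rows) for part, rows in existing.items()}
        result.update({part: list(rows) for part, rows in new_data.items()})
        return result
    raise ValueError(f"unknown mode: {mode!r}")


history = {"20211120": [{"id": 1}], "20211121": [{"id": 2}]}
todays_load = {"20211122": [{"id": 3}]}
print(sorted(overwrite(history, todays_load, "static")))   # only the new partition survives
print(sorted(overwrite(history, todays_load, "dynamic")))  # old partitions are kept
```

On a cluster, the dynamic behavior would typically be enabled with spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic") before a df.write.mode("overwrite").partitionBy("PeriodId") write, so that only the PeriodId folders present in df are replaced.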

User16826987838
by Contributor
  • 1094 Views
  • 1 replies
  • 0 kudos

Refreshing external tables

After I vacuum the tables, do I need to update the manifest table and parquet table to refresh my external tables for integrations to work?

Latest Reply
Taha
New Contributor III
  • 0 kudos

Manifest files need to be re-created when partitions are added or altered. Since VACUUM only deletes data files that are no longer referenced by the current table version (and are older than the retention threshold), you shouldn't need to generate an updated manifest file unless you are also running OPTIMIZE.
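To illustrate the reasoning: a symlink manifest is essentially a list of the data files that make up the current table version. VACUUM only deletes files that the current version no longer references, so that list does not change, whereas OPTIMIZE rewrites the current files and does change it. The toy model below is a hypothetical pure-Python sketch of that logic (the retention threshold is ignored, and the class and file names are made up); on Delta Lake itself the manifest is regenerated with GENERATE symlink_format_manifest FOR TABLE <table>.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDeltaTable:
    """Hypothetical model of a Delta table's data files.

    active:  files referenced by the current table version.
    removed: files still on storage but referenced only by older versions.
    """
    active: set[str] = field(default_factory=set)
    removed: set[str] = field(default_factory=set)

    def optimize(self, compacted_file: str) -> None:
        # OPTIMIZE rewrites the current files into a new one, so the
        # set of files in the current version changes.
        self.removed |= self.active
        self.active = {compacted_file}

    def vacuum(self) -> None:
        # VACUUM deletes only files the current version no longer references
        # (the retention threshold is ignored in this toy model).
        self.removed.clear()


def generate_manifest(table: ToyDeltaTable) -> list[str]:
    """A symlink manifest is essentially the list of files in the current version."""
    return sorted(table.active)


def manifest_is_stale(manifest: list[str], table: ToyDeltaTable) -> bool:
    return manifest != generate_manifest(table)


t = ToyDeltaTable(active={"part-0.parquet", "part-1.parquet"})
manifest = generate_manifest(t)
t.vacuum()
print(manifest_is_stale(manifest, t))  # VACUUM alone: manifest still matches
t.optimize("part-2.parquet")
print(manifest_is_stale(manifest, t))  # after OPTIMIZE: manifest is stale
```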

User16869510359
by Esteemed Contributor
  • 855 Views
  • 1 replies
  • 0 kudos
Latest Reply
User16869510359
Esteemed Contributor
  • 0 kudos

The issue can happen if the Hive syntax for table creation is used instead of the Spark syntax. Read more here: https://docs.databricks.com/spark/latest/spark-sql/language-manual/sql-ref-syntax-ddl-create-table-hiveformat.html The issue mentioned in t...
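The two syntaxes differ mainly in how the storage format is declared: Spark's data source syntax uses CREATE TABLE ... USING <format>, while the Hive format syntax uses clauses such as STORED AS or ROW FORMAT. The helper below is a hypothetical, regex-based heuristic (not a SQL parser) for flagging which form a DDL statement uses; when neither clause is present, the default format depends on the platform and its configuration.

```python
import re

# Hive format clauses (CREATE TABLE ... STORED AS / ROW FORMAT ...).
_HIVE_CLAUSES = re.compile(r"\b(STORED\s+AS|ROW\s+FORMAT)\b", re.IGNORECASE)
# Spark data source clause (CREATE TABLE ... USING <format>).
_USING_CLAUSE = re.compile(r"\bUSING\s+\w+", re.IGNORECASE)


def ddl_flavor(statement: str) -> str:
    """Rough heuristic, not a SQL parser: classify a CREATE TABLE statement.

    Returns "spark" for data source syntax, "hive" for Hive format syntax,
    or "unspecified" when neither clause appears.
    """
    if _USING_CLAUSE.search(statement):
        return "spark"
    if _HIVE_CLAUSES.search(statement):
        return "hive"
    return "unspecified"


print(ddl_flavor("CREATE TABLE sales (id INT) USING PARQUET"))
print(ddl_flavor("CREATE TABLE sales (id INT) STORED AS PARQUET"))
```

Requiring a word after USING means a JOIN ... USING (col) inside a CREATE TABLE ... AS SELECT is not mistaken for the data source clause.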

aladda
by Honored Contributor II
  • 654 Views
  • 1 replies
  • 0 kudos
Latest Reply
aladda
Honored Contributor II
  • 0 kudos

Yes, CONVERT TO DELTA converts a parquet table to Delta format in place by adding a transaction log, inferring the schema, and collecting statistics to improve query performance: https://docs.databricks.com/spark/latest/spark-sql/languag...
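A minimal sketch of building the statement for a path-based parquet table, assuming the CONVERT TO DELTA parquet.`<path>` [NO STATISTICS] [PARTITIONED BY (...)] syntax documented for Databricks. The helper name, the example path and the PeriodId column are hypothetical; the returned string would be run with spark.sql(...). As far as I know, a partitioned directory needs its partition columns and types declared with PARTITIONED BY, because the partition values live in the folder names rather than in the Parquet files.

```python
from typing import Optional, Sequence, Tuple


def convert_to_delta_sql(
    path: str,
    partitions: Optional[Sequence[Tuple[str, str]]] = None,
    collect_stats: bool = True,
) -> str:
    """Build a CONVERT TO DELTA statement for a Parquet directory (hypothetical helper).

    partitions: (column, SQL type) pairs for a partitioned directory.
    collect_stats: False appends NO STATISTICS to skip statistics collection.
    """
    escaped = path.replace("`", "``")  # backticks inside a quoted identifier are doubled
    stmt = f"CONVERT TO DELTA parquet.`{escaped}`"
    if not collect_stats:
        stmt += " NO STATISTICS"
    if partitions:
        cols = ", ".join(f"{name} {dtype}" for name, dtype in partitions)
        stmt += f" PARTITIONED BY ({cols})"
    return stmt


# Hypothetical path and partition column; on a cluster: spark.sql(stmt)
stmt = convert_to_delta_sql("/mnt/raw/sales", partitions=[("PeriodId", "INT")])
print(stmt)
```

Keeping statistics collection on (the default here) matches the reply's point that the collected stats help query performance; NO STATISTICS trades that for a faster conversion.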
