<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Delta table takes too long to write due to S3 full scan in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/delta-table-takes-too-long-to-write-due-to-s3-full-scan/m-p/83837#M37025</link>
    <description>&lt;P&gt;I also observe that both pipelines have "&lt;SPAN&gt;Metastore is down." events. However, the logs contain no stack traces explaining what this actually means.&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Wed, 21 Aug 2024 18:35:22 GMT</pubDate>
    <dc:creator>ivanychev</dc:creator>
    <dc:date>2024-08-21T18:35:22Z</dc:date>
    <item>
      <title>Delta table takes too long to write due to S3 full scan</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-table-takes-too-long-to-write-due-to-s3-full-scan/m-p/83836#M37021</link>
      <description>&lt;P&gt;DBR 14.3, Spark 3.5.0. We use AWS Glue Metastore.&lt;/P&gt;&lt;P&gt;On August 20th some of our pipelines started timing out while writing to a Delta table. The driver spends many hours executing post-commit hooks. We write dataframes to Delta with `mode=overwrite`, `mergeSchema=true`, `replaceWhere=&amp;lt;day partition&amp;gt;`.&lt;/P&gt;&lt;P&gt;Adding `DO_NOT_UPDATE_STATS=true` to the table properties didn't help. Adding `'spark.databricks.hive.stats.autogather': 'false', 'spark.hadoop.hive.stats.autogather': 'false'` to the options didn't help either.&lt;BR /&gt;&lt;BR /&gt;I opened the driver's thread dump and observed a curious stack trace (attached below).&lt;/P&gt;&lt;P&gt;# Question 1: why does `createTable` get invoked by `updateCatalog`?&lt;/P&gt;&lt;PRE&gt;app//com.databricks.sql.managedcatalog.ManagedCatalogSessionCatalog.createTable(ManagedCatalogSessionCatalog.scala:763) &lt;BR /&gt;app//com.databricks.sql.&lt;STRONG&gt;DatabricksSessionCatalog.createTable&lt;/STRONG&gt;(DatabricksSessionCatalog.scala:233) &lt;BR /&gt;app//com.databricks.sql.transaction.tahoe.commands.CreateDeltaTableCommand.updateCatalog(CreateDeltaTableCommand.scala:873) &lt;/PRE&gt;&lt;P&gt;This &lt;A href="https://github.com/delta-io/delta/blob/71b09f0027c2940806ad2022a6b9fcd10505f3fd/spark/src/main/scala/org/apache/spark/sql/delta/commands/CreateDeltaTableCommand.scala#L555" target="_self"&gt;open source&lt;/A&gt;&amp;nbsp;implementation of&amp;nbsp;updateCatalog creates the table only if it doesn't exist, but our table &lt;STRONG&gt;does exist&lt;/STRONG&gt;.&lt;/P&gt;&lt;P&gt;# Question 2:&amp;nbsp;updateTableStatsFast takes all the time and scans the whole table&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;com.amazonaws.glue.shims.&lt;STRONG&gt;AwsGlueSparkHiveShims.updateTableStatsFast&lt;/STRONG&gt;(AwsGlueSparkHiveShims.java:62) &lt;BR 
/&gt;com.amazonaws.glue.catalog.metastore.GlueMetastoreClientDelegate.alterTable(GlueMetastoreClientDelegate.java:444) &lt;/PRE&gt;&lt;P&gt;How do I opt out of updating Glue stats? They are mostly useless, but in this case updating them causes a full listing of the whole Delta table on S3 on every write.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;# Observed stack trace&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;java.base@17.0.12/jdk.internal.misc.Unsafe.park(Native Method) - waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@6fdb721 &lt;BR /&gt;java.base@17.0.12/java.util.concurrent.locks.LockSupport.park(LockSupport.java:341) &lt;BR /&gt;java.base@17.0.12/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block(AbstractQueuedSynchronizer.java:506) &lt;BR /&gt;java.base@17.0.12/java.util.concurrent.ForkJoinPool.unmanagedBlock(ForkJoinPool.java:3465) &lt;BR /&gt;java.base@17.0.12/java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3436) &lt;BR /&gt;java.base@17.0.12/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1625) &lt;BR /&gt;app//org.spark_project.apache.commons.pool2.impl.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:1323) &lt;BR /&gt;app//org.spark_project.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:306) &lt;BR /&gt;app//org.spark_project.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:223) &lt;BR /&gt;app//org.apache.spark.sql.hive.client.LocalHiveClientsPool.super$borrowObject(LocalHiveClientImpl.scala:131) &lt;BR /&gt;app//org.apache.spark.sql.hive.client.LocalHiveClientsPool.$anonfun$borrowObject$1(LocalHiveClientImpl.scala:131) &lt;BR /&gt;app//org.apache.spark.sql.hive.client.LocalHiveClientsPool$$Lambda$5925/0x00007fbcc5678f58.apply(Unknown Source) &lt;BR /&gt;app//com.databricks.backend.daemon.driver.ProgressReporter$.withStatusCode(ProgressReporter.scala:411) 
&lt;BR /&gt;app//com.databricks.backend.daemon.driver.ProgressReporter$.withStatusCode(ProgressReporter.scala:397) &lt;BR /&gt;app//com.databricks.spark.util.SparkDatabricksProgressReporter$.withStatusCode(ProgressReporter.scala:34) &lt;BR /&gt;app//org.apache.spark.sql.hive.client.LocalHiveClientsPool.borrowObject(LocalHiveClientImpl.scala:129) &lt;BR /&gt;app//org.apache.spark.sql.hive.client.PoolingHiveClient.retain(PoolingHiveClient.scala:181) &lt;BR /&gt;app//org.apache.spark.sql.hive.HiveExternalCatalog.maybeSynchronized(HiveExternalCatalog.scala:114) &lt;BR /&gt;app//org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$withClient$1(HiveExternalCatalog.scala:154) &lt;BR /&gt;app//org.apache.spark.sql.hive.HiveExternalCatalog$$Lambda$5854/0x00007fbcc5655bc0.apply(Unknown Source) &lt;BR /&gt;app//com.databricks.backend.daemon.driver.ProgressReporter$.withStatusCode(ProgressReporter.scala:411) &lt;BR /&gt;app//com.databricks.backend.daemon.driver.ProgressReporter$.withStatusCode(ProgressReporter.scala:397) &lt;BR /&gt;app//com.databricks.spark.util.SparkDatabricksProgressReporter$.withStatusCode(ProgressReporter.scala:34) &lt;BR /&gt;app//org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:153) &lt;BR /&gt;app//org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:333) &lt;BR /&gt;app//org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.$anonfun$databaseExists$1(ExternalCatalogWithListener.scala:93) &lt;BR /&gt;app//org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener$$Lambda$7640/0x00007fbcc5ae4868.apply$mcZ$sp(Unknown Source) &lt;BR /&gt;app//scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23) &lt;BR /&gt;app//org.apache.spark.sql.catalyst.MetricKeyUtils$.measure(MetricKey.scala:984) &lt;BR /&gt;app//org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.$anonfun$profile$1(ExternalCatalogWithListener.scala:54) &lt;BR 
/&gt;app//org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener$$Lambda$7544/0x00007fbcc5ab4478.apply(Unknown Source) &lt;BR /&gt;app//com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94) &lt;BR /&gt;app//org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.profile(ExternalCatalogWithListener.scala:53) &lt;BR /&gt;app//org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.databaseExists(ExternalCatalogWithListener.scala:93) &lt;BR /&gt;app//org.apache.spark.sql.catalyst.catalog.SessionCatalogImpl.databaseExists(SessionCatalog.scala:837) &lt;BR /&gt;app//org.apache.spark.sql.catalyst.catalog.SessionCatalogImpl.requireDbExists(SessionCatalog.scala:766) &lt;BR /&gt;app//org.apache.spark.sql.catalyst.catalog.SessionCatalogImpl.createTable(SessionCatalog.scala:930) &lt;BR /&gt;app//com.databricks.sql.managedcatalog.ManagedCatalogSessionCatalog.createTableInternal(ManagedCatalogSessionCatalog.scala:802) &lt;BR /&gt;app//com.databricks.sql.managedcatalog.ManagedCatalogSessionCatalog.createTable(ManagedCatalogSessionCatalog.scala:763) &lt;BR /&gt;app//com.databricks.sql.DatabricksSessionCatalog.createTable(DatabricksSessionCatalog.scala:233) &lt;BR /&gt;app//com.databricks.sql.transaction.tahoe.commands.CreateDeltaTableCommand.updateCatalog(CreateDeltaTableCommand.scala:873) &lt;BR /&gt;app//com.databricks.sql.transaction.tahoe.commands.CreateDeltaTableCommand.runPostCommitUpdates(CreateDeltaTableCommand.scala:279) &lt;BR /&gt;app//com.databricks.sql.transaction.tahoe.commands.CreateDeltaTableCommand.handleCommit(CreateDeltaTableCommand.scala:259) &lt;BR /&gt;app//com.databricks.sql.transaction.tahoe.commands.CreateDeltaTableCommand.$anonfun$run$2(CreateDeltaTableCommand.scala:169) &lt;BR /&gt;app//com.databricks.sql.transaction.tahoe.commands.CreateDeltaTableCommand$$Lambda$8751/0x00007fbcc5c96140.apply(Unknown Source) &lt;BR 
/&gt;app//com.databricks.sql.transaction.tahoe.metering.DeltaLogging.withOperationTypeTag(DeltaLogging.scala:225) &lt;BR /&gt;app//com.databricks.sql.transaction.tahoe.metering.DeltaLogging.withOperationTypeTag$(DeltaLogging.scala:212) &lt;BR /&gt;app//com.databricks.sql.transaction.tahoe.commands.CreateDeltaTableCommand.withOperationTypeTag(CreateDeltaTableCommand.scala:70) &lt;BR /&gt;app//com.databricks.sql.transaction.tahoe.metering.DeltaLogging.$anonfun$recordDeltaOperationInternal$2(DeltaLogging.scala:164) &lt;BR /&gt;app//com.databricks.sql.transaction.tahoe.metering.DeltaLogging$$Lambda$8219/0x00007fbcc5bd4c78.apply(Unknown Source) &lt;BR /&gt;app//com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94) &lt;BR /&gt;app//com.databricks.sql.transaction.tahoe.metering.DeltaLogging.recordFrameProfile(DeltaLogging.scala:294) &lt;BR /&gt;app//com.databricks.sql.transaction.tahoe.metering.DeltaLogging.recordFrameProfile$(DeltaLogging.scala:292) &lt;BR /&gt;app//com.databricks.sql.transaction.tahoe.commands.CreateDeltaTableCommand.recordFrameProfile(CreateDeltaTableCommand.scala:70) &lt;BR /&gt;app//com.databricks.sql.transaction.tahoe.metering.DeltaLogging.$anonfun$recordDeltaOperationInternal$1(DeltaLogging.scala:163) &lt;BR /&gt;app//com.databricks.sql.transaction.tahoe.metering.DeltaLogging$$Lambda$8217/0x00007fbcc5bd46d8.apply(Unknown Source) &lt;BR /&gt;app//com.databricks.logging.UsageLogging.$anonfun$recordOperation$1(UsageLogging.scala:573) &lt;BR /&gt;app//com.databricks.logging.UsageLogging$$Lambda$681/0x00007fbcc3eb28e8.apply(Unknown Source) &lt;BR /&gt;app//com.databricks.logging.UsageLogging.executeThunkAndCaptureResultTags$1(UsageLogging.scala:669) &lt;BR /&gt;app//com.databricks.logging.UsageLogging.$anonfun$recordOperationWithResultTags$4(UsageLogging.scala:687) &lt;BR /&gt;app//com.databricks.logging.UsageLogging$$Lambda$684/0x00007fbcc3eb3158.apply(Unknown Source) &lt;BR 
/&gt;app//com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$1(UsageLogging.scala:426) &lt;BR /&gt;app//com.databricks.logging.UsageLogging$$Lambda$591/0x00007fbcc3e53418.apply(Unknown Source) &lt;BR /&gt;app//scala.util.DynamicVariable.withValue(DynamicVariable.scala:62) &lt;BR /&gt;app//com.databricks.logging.AttributionContext$.withValue(AttributionContext.scala:216) &lt;BR /&gt;app//com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:424) &lt;BR /&gt;app//com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:418) &lt;BR /&gt;app//com.databricks.spark.util.PublicDBLogging.withAttributionContext(DatabricksSparkUsageLogger.scala:27) &lt;BR /&gt;app//com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:472) &lt;BR /&gt;app//com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:455) &lt;BR /&gt;app//com.databricks.spark.util.PublicDBLogging.withAttributionTags(DatabricksSparkUsageLogger.scala:27) &lt;BR /&gt;app//com.databricks.logging.UsageLogging.recordOperationWithResultTags(UsageLogging.scala:664) &lt;BR /&gt;app//com.databricks.logging.UsageLogging.recordOperationWithResultTags$(UsageLogging.scala:582) &lt;BR /&gt;app//com.databricks.spark.util.PublicDBLogging.recordOperationWithResultTags(DatabricksSparkUsageLogger.scala:27) &lt;BR /&gt;app//com.databricks.logging.UsageLogging.recordOperation(UsageLogging.scala:573) &lt;BR /&gt;app//com.databricks.logging.UsageLogging.recordOperation$(UsageLogging.scala:542) &lt;BR /&gt;app//com.databricks.spark.util.PublicDBLogging.recordOperation(DatabricksSparkUsageLogger.scala:27) &lt;BR /&gt;app//com.databricks.spark.util.PublicDBLogging.recordOperation0(DatabricksSparkUsageLogger.scala:68) &lt;BR /&gt;app//com.databricks.spark.util.DatabricksSparkUsageLogger.recordOperation(DatabricksSparkUsageLogger.scala:150) &lt;BR /&gt;app//com.databricks.spark.util.UsageLogger.recordOperation(UsageLogger.scala:68) &lt;BR 
/&gt;app//com.databricks.spark.util.UsageLogger.recordOperation$(UsageLogger.scala:55) &lt;BR /&gt;app//com.databricks.spark.util.DatabricksSparkUsageLogger.recordOperation(DatabricksSparkUsageLogger.scala:109) &lt;BR /&gt;app//com.databricks.spark.util.UsageLogging.recordOperation(UsageLogger.scala:429) &lt;BR /&gt;app//com.databricks.spark.util.UsageLogging.recordOperation$(UsageLogger.scala:408) &lt;BR /&gt;app//com.databricks.sql.transaction.tahoe.commands.CreateDeltaTableCommand.recordOperation(CreateDeltaTableCommand.scala:70) &lt;BR /&gt;app//com.databricks.sql.transaction.tahoe.metering.DeltaLogging.recordDeltaOperationInternal(DeltaLogging.scala:162) &lt;BR /&gt;app//com.databricks.sql.transaction.tahoe.metering.DeltaLogging.recordDeltaOperation(DeltaLogging.scala:152) &lt;BR /&gt;app//com.databricks.sql.transaction.tahoe.metering.DeltaLogging.recordDeltaOperation$(DeltaLogging.scala:142) &lt;BR /&gt;app//com.databricks.sql.transaction.tahoe.commands.CreateDeltaTableCommand.recordDeltaOperation(CreateDeltaTableCommand.scala:70) &lt;BR /&gt;app//com.databricks.sql.transaction.tahoe.commands.CreateDeltaTableCommand.run(CreateDeltaTableCommand.scala:148) &lt;BR /&gt;app//com.databricks.sql.transaction.tahoe.catalog.DeltaCatalog.$anonfun$createDeltaTable$1(DeltaCatalog.scala:335) &lt;BR /&gt;app//com.databricks.sql.transaction.tahoe.catalog.DeltaCatalog$$Lambda$8054/0x00007fbcc5b93988.apply(Unknown Source) &lt;BR /&gt;app//com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94) &lt;BR /&gt;app//com.databricks.sql.transaction.tahoe.metering.DeltaLogging.recordFrameProfile(DeltaLogging.scala:294) &lt;BR /&gt;app//com.databricks.sql.transaction.tahoe.metering.DeltaLogging.recordFrameProfile$(DeltaLogging.scala:292) &lt;BR /&gt;app//com.databricks.sql.transaction.tahoe.catalog.DeltaCatalog.recordFrameProfile(DeltaCatalog.scala:117) &lt;BR 
/&gt;app//com.databricks.sql.transaction.tahoe.catalog.DeltaCatalog.com$databricks$sql$transaction$tahoe$catalog$DeltaCatalog$$createDeltaTable(DeltaCatalog.scala:158) &lt;BR /&gt;app//com.databricks.sql.transaction.tahoe.catalog.DeltaCatalog$StagedDeltaTableV2.$anonfun$commitStagedChanges$1(DeltaCatalog.scala:1130) &lt;BR /&gt;app//com.databricks.sql.transaction.tahoe.catalog.DeltaCatalog$StagedDeltaTableV2$$Lambda$8051/0x00007fbcc5b92aa0.apply(Unknown Source) &lt;BR /&gt;app//com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94) &lt;BR /&gt;app//com.databricks.sql.transaction.tahoe.metering.DeltaLogging.recordFrameProfile(DeltaLogging.scala:294) &lt;BR /&gt;app//com.databricks.sql.transaction.tahoe.metering.DeltaLogging.recordFrameProfile$(DeltaLogging.scala:292) &lt;BR /&gt;app//com.databricks.sql.transaction.tahoe.catalog.DeltaCatalog.recordFrameProfile(DeltaCatalog.scala:117) &lt;BR /&gt;app//com.databricks.sql.transaction.tahoe.catalog.DeltaCatalog$StagedDeltaTableV2.commitStagedChanges(DeltaCatalog.scala:1089) &lt;BR /&gt;app//org.apache.spark.sql.execution.datasources.v2.V2CreateTableAsSelectBaseExec.$anonfun$writeToTable$2(WriteToDataSourceV2Exec.scala:674) &lt;BR /&gt;app//org.apache.spark.sql.execution.datasources.v2.V2CreateTableAsSelectBaseExec$$Lambda$7718/0x00007fbcc5b11b18.apply(Unknown Source) &lt;BR /&gt;app//org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1546) &lt;BR /&gt;app//org.apache.spark.sql.execution.datasources.v2.V2CreateTableAsSelectBaseExec.$anonfun$writeToTable$1(WriteToDataSourceV2Exec.scala:661) &lt;BR /&gt;app//org.apache.spark.sql.execution.datasources.v2.V2CreateTableAsSelectBaseExec$$Lambda$7717/0x00007fbcc5b11848.apply(Unknown Source) &lt;BR /&gt;app//com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94) &lt;BR /&gt;app//org.apache.spark.sql.execution.datasources.v2.V2CreateTableAsSelectBaseExec.writeToTable(WriteToDataSourceV2Exec.scala:679) &lt;BR 
/&gt;app//org.apache.spark.sql.execution.datasources.v2.V2CreateTableAsSelectBaseExec.writeToTable$(WriteToDataSourceV2Exec.scala:655) &lt;BR /&gt;app//org.apache.spark.sql.execution.datasources.v2.AtomicReplaceTableAsSelectExec.writeToTable(WriteToDataSourceV2Exec.scala:210) &lt;BR /&gt;app//org.apache.spark.sql.execution.datasources.v2.AtomicReplaceTableAsSelectExec.run(WriteToDataSourceV2Exec.scala:268) &lt;BR /&gt;app//org.apache.spark.sql.execution.datasources.v2.V2CommandExec.$anonfun$result$2(V2CommandExec.scala:48) &lt;BR /&gt;app//org.apache.spark.sql.execution.datasources.v2.V2CommandExec$$Lambda$7696/0x00007fbcc5affbe0.apply(Unknown Source) &lt;BR /&gt;app//org.apache.spark.sql.execution.SparkPlan.runCommandWithAetherOff(SparkPlan.scala:178) &lt;BR /&gt;app//org.apache.spark.sql.execution.SparkPlan.runCommandInAetherOrSpark(SparkPlan.scala:189) &lt;BR /&gt;app//org.apache.spark.sql.execution.datasources.v2.V2CommandExec.$anonfun$result$1(V2CommandExec.scala:48) &lt;BR /&gt;app//org.apache.spark.sql.execution.datasources.v2.V2CommandExec$$Lambda$7695/0x00007fbcc5aff910.apply(Unknown Source) &lt;BR /&gt;app//com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94) &lt;BR /&gt;app//org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:47) - locked org.apache.spark.sql.execution.datasources.v2.AtomicReplaceTableAsSelectExec@21108331 &lt;BR /&gt;app//org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:45) &lt;BR /&gt;app//org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:56) &lt;BR /&gt;app//org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$4(QueryExecution.scala:358) &lt;BR /&gt;app//org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1$$Lambda$5795/0x00007fbcc563ea80.apply(Unknown Source) &lt;BR 
/&gt;app//org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:166) &lt;BR /&gt;app//org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$3(QueryExecution.scala:358) &lt;BR /&gt;app//org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1$$Lambda$5226/0x00007fbcc5472800.apply(Unknown Source) &lt;BR /&gt;app//org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$9(SQLExecution.scala:392) &lt;BR /&gt;app//org.apache.spark.sql.execution.SQLExecution$$$Lambda$5239/0x00007fbcc54785b8.apply(Unknown Source) &lt;BR /&gt;app//org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:700) &lt;BR /&gt;app//org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$1(SQLExecution.scala:277) &lt;BR /&gt;app//org.apache.spark.sql.execution.SQLExecution$$$Lambda$5228/0x00007fbcc5472da0.apply(Unknown Source) &lt;BR /&gt;app//org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:1175) &lt;BR /&gt;app//org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId0(SQLExecution.scala:164) &lt;BR /&gt;app//org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:637) &lt;BR /&gt;app//org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$2(QueryExecution.scala:357) &lt;BR /&gt;app//org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1$$Lambda$5225/0x00007fbcc5472530.apply(Unknown Source) &lt;BR /&gt;app//org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:1103) &lt;BR /&gt;app//org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$1(QueryExecution.scala:353) &lt;BR 
/&gt;app//org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1$$Lambda$5224/0x00007fbcc54703f0.apply(Unknown Source) &lt;BR /&gt;app//org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$withMVTagsIfNecessary(QueryExecution.scala:312) &lt;BR /&gt;app//org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.applyOrElse(QueryExecution.scala:350) &lt;BR /&gt;app//org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.applyOrElse(QueryExecution.scala:334) &lt;BR /&gt;app//org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:505) &lt;BR /&gt;app//org.apache.spark.sql.catalyst.trees.TreeNode$$Lambda$3659/0x00007fbcc4def7d8.apply(Unknown Source) &lt;BR /&gt;app//org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:83) &lt;BR /&gt;app//org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:505) &lt;BR /&gt;app//org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:39) &lt;BR /&gt;app//org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:343) &lt;BR /&gt;app//org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:339) &lt;BR /&gt;app//org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:39) &lt;BR /&gt;app//org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:39) &lt;BR /&gt;app//org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:481) &lt;BR /&gt;app//org.apache.spark.sql.execution.QueryExecution.$anonfun$eagerlyExecuteCommands$1(QueryExecution.scala:334) &lt;BR 
/&gt;app//org.apache.spark.sql.execution.QueryExecution$$Lambda$3864/0x00007fbcc4eec870.apply(Unknown Source) &lt;BR /&gt;app//org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:400) &lt;BR /&gt;app//org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:334) &lt;BR /&gt;app//org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:271) - locked org.apache.spark.sql.execution.QueryExecution@febcbf6 &lt;BR /&gt;app//org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:268) &lt;BR /&gt;app//org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:429) &lt;BR /&gt;app//org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:1040) &lt;BR /&gt;app//org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:746) &lt;BR /&gt;app//org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:677) &lt;BR /&gt;java.base@17.0.12/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) &lt;BR /&gt;java.base@17.0.12/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) &lt;BR /&gt;java.base@17.0.12/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) &lt;BR /&gt;java.base@17.0.12/java.lang.reflect.Method.invoke(Method.java:569) &lt;BR /&gt;app//py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) &lt;BR /&gt;app//py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:397) &lt;BR /&gt;app//py4j.Gateway.invoke(Gateway.java:306) &lt;BR /&gt;app//py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) &lt;BR /&gt;app//py4j.commands.CallCommand.execute(CallCommand.java:79) &lt;BR /&gt;app//py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:199) &lt;BR 
/&gt;app//py4j.ClientServerConnection.run(ClientServerConnection.java:119) &lt;BR /&gt;java.base@17.0.12/java.lang.Thread.run(Thread.java:840)&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
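      <content:encoded><![CDATA[
For readers trying to reproduce the write described in this post, here is a minimal PySpark sketch of it. It is an assumption-laden illustration, not the poster's actual pipeline code: the partition column name `day`, the helper names `build_write_options` and `overwrite_day`, and the table name in the usage note are all hypothetical. Only `mode=overwrite`, `mergeSchema=true`, `replaceWhere` and the `saveAsTable` call come from the post and its stack trace.

```python
from typing import Dict


def build_write_options(day: str, partition_column: str = "day") -> Dict[str, str]:
    """Build the writer options for overwriting a single day partition.

    `partition_column` is an assumed name; adjust it to the real table layout.
    """
    return {
        "mergeSchema": "true",
        # replaceWhere limits the overwrite to rows matching this predicate.
        "replaceWhere": f"{partition_column} = '{day}'",
    }


def overwrite_day(df, table_name: str, day: str) -> None:
    """Overwrite one day partition of a Delta table.

    `df` is expected to be a pyspark.sql.DataFrame on a cluster with Delta
    available; this function is not exercised without a Spark session.
    """
    (
        df.write.format("delta")
        .mode("overwrite")
        .options(**build_write_options(day))
        .saveAsTable(table_name)
    )


if __name__ == "__main__":
    # Only the option-building part runs without Spark.
    print(build_write_options("2024-07-01"))
```

On a cluster this would be called as `overwrite_day(df, "some_db.some_table", "2024-07-01")`, where the table name is a placeholder.
]]></content:encoded>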
      <pubDate>Wed, 21 Aug 2024 18:02:17 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-table-takes-too-long-to-write-due-to-s3-full-scan/m-p/83836#M37021</guid>
      <dc:creator>ivanychev</dc:creator>
      <dc:date>2024-08-21T18:02:17Z</dc:date>
    </item>
    <item>
      <title>Re: Delta table takes too long to write due to S3 full scan</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-table-takes-too-long-to-write-due-to-s3-full-scan/m-p/83837#M37025</link>
      <description>&lt;P&gt;I also observe that both pipelines have "&lt;SPAN&gt;Metastore is down." events. However, the logs contain no stack traces explaining what this actually means.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 21 Aug 2024 18:35:22 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-table-takes-too-long-to-write-due-to-s3-full-scan/m-p/83837#M37025</guid>
      <dc:creator>ivanychev</dc:creator>
      <dc:date>2024-08-21T18:35:22Z</dc:date>
    </item>
    <item>
      <title>Re: Delta table takes too long to write due to S3 full scan</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-table-takes-too-long-to-write-due-to-s3-full-scan/m-p/83841#M37029</link>
      <description>&lt;P&gt;We also observe that the `METASTORE_DOWN` event correlates with the following log lines in `log4j` (all&amp;nbsp;&amp;lt;redacted_value&amp;gt;s are unique):&lt;BR /&gt;&lt;BR /&gt;```&lt;BR /&gt;24/08/21 12:27:02 INFO GenerateSymlinkManifest: Generated manifest partitions for s3://constructor-analytics-data/tables/delta_prod/query_item_pairs_from_qrl [379]:&lt;BR /&gt;day=2024-07-01/ac_key=&amp;lt;redacted_value&amp;gt;&lt;BR /&gt;day=2024-07-01/ac_key=&amp;lt;redacted_value&amp;gt;&lt;BR /&gt;day=2024-07-01/ac_key=&amp;lt;redacted_value&amp;gt;&lt;BR /&gt;day=2024-07-01/ac_key=&amp;lt;redacted_value&amp;gt;&lt;BR /&gt;day=2024-07-01/ac_key=&amp;lt;redacted_value&amp;gt;&lt;BR /&gt;day=2024-07-01/ac_key=&amp;lt;redacted_value&amp;gt;&lt;BR /&gt;day=2024-07-01/ac_key=&amp;lt;redacted_value&amp;gt;&lt;BR /&gt;day=2024-07-01/ac_key=&amp;lt;redacted_value&amp;gt;&lt;BR /&gt;day=2024-07-01/ac_key=&amp;lt;redacted_value&amp;gt;&lt;BR /&gt;day=2024-07-01/ac_key=&amp;lt;redacted_value&amp;gt;&lt;BR /&gt;day=2024-07-01/ac_key=&amp;lt;redacted_value&amp;gt;&lt;BR /&gt;day=2024-07-01/ac_key=&amp;lt;redacted_value&amp;gt;&lt;BR /&gt;day=2024-07-01/ac_key=&amp;lt;redacted_value&amp;gt;&lt;BR /&gt;day=2024-07-01/ac_key=&amp;lt;redacted_value&amp;gt;&lt;BR /&gt;day=2024-07-01/ac_key=&amp;lt;redacted_value&amp;gt;&lt;BR /&gt;day=2024-07-01/ac_key=&amp;lt;redacted_value&amp;gt;&lt;BR /&gt;day=2024-07-01/ac_key=&amp;lt;redacted_value&amp;gt;&lt;BR /&gt;day=2024-07-01/ac_key=&amp;lt;redacted_value&amp;gt;&lt;BR /&gt;day=2024-07-01/ac_key=&amp;lt;redacted_value&amp;gt;&lt;BR /&gt;day=2024-07-01/ac_key=&amp;lt;redacted_value&amp;gt;&lt;BR /&gt;day=2024-07-01/ac_key=&amp;lt;redacted_value&amp;gt;&lt;BR /&gt;day=2024-07-01/ac_key=&amp;lt;redacted_value&amp;gt;&lt;BR /&gt;day=2024-07-01/ac_key=&amp;lt;redacted_value&amp;gt;&lt;BR /&gt;day=2024-07-01/ac_key=&amp;lt;redacted_value&amp;gt;&lt;BR /&gt;day=2024-07-01/ac_key=&amp;lt;redacted_value&amp;gt;&lt;BR 
/&gt;day=2024-07-01/ac_key=&amp;lt;redacted_value&amp;gt;&lt;BR /&gt;day=2024-07-01/ac_key=&amp;lt;redacted_value&amp;gt;&lt;BR /&gt;day=2024-07-01/ac_key=&amp;lt;redacted_value&amp;gt;&lt;BR /&gt;day=2024-07-01/ac_key=&amp;lt;redacted_value&amp;gt;&lt;BR /&gt;day=2024-07-01/ac_key=&amp;lt;redacted_value&amp;gt;&lt;BR /&gt;```&lt;/P&gt;</description>
      <pubDate>Wed, 21 Aug 2024 19:07:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-table-takes-too-long-to-write-due-to-s3-full-scan/m-p/83841#M37029</guid>
      <dc:creator>ivanychev</dc:creator>
      <dc:date>2024-08-21T19:07:55Z</dc:date>
    </item>
    <item>
      <title>Re: Delta table takes too long to write due to S3 full scan</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-table-takes-too-long-to-write-due-to-s3-full-scan/m-p/85787#M37264</link>
      <description>&lt;P&gt;Setting &lt;STRONG&gt;spark.databricks.delta.catalog.update.enabled=true&lt;/STRONG&gt;&amp;nbsp;helped, but I still don't understand why the problem started occurring.&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.databricks.com/en/archive/external-metastores/external-hive-metastore.html#external-apache-hive-metastore-legacy" target="_self"&gt;https://docs.databricks.com/en/archive/external-metastores/external-hive-metastore.html#external-apache-hive-metastore-legacy&lt;/A&gt;&lt;/P&gt;</description>
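      <content:encoded><![CDATA[
As a rough sketch of how the setting mentioned in this reply might be applied: the key and the value `true` are taken from the post, while the helper names `render_spark_conf` and `apply_to_session` are hypothetical. Whether the setting has to be supplied in the cluster's Spark config or can be set on a running session is an assumption that this thread does not confirm.

```python
from typing import Dict

# Key and value as reported in the post above.
DELTA_CATALOG_UPDATE_KEY = "spark.databricks.delta.catalog.update.enabled"


def render_spark_conf(settings: Dict[str, str]) -> str:
    """Render settings as one space-separated `key value` pair per line,
    sorted by key (assumed format for a cluster Spark config text box)."""
    return "\n".join(f"{key} {value}" for key, value in sorted(settings.items()))


def apply_to_session(spark, settings: Dict[str, str]) -> None:
    """Set the same values on an existing SparkSession.

    Assumption: the setting is honored at session level; the thread does
    not say where the poster applied it.
    """
    for key, value in settings.items():
        spark.conf.set(key, value)


if __name__ == "__main__":
    print(render_spark_conf({DELTA_CATALOG_UPDATE_KEY: "true"}))
```

Running the module prints the single config line, which can then be pasted into the cluster configuration.
]]></content:encoded>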
      <pubDate>Wed, 28 Aug 2024 11:16:27 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-table-takes-too-long-to-write-due-to-s3-full-scan/m-p/85787#M37264</guid>
      <dc:creator>ivanychev</dc:creator>
      <dc:date>2024-08-28T11:16:27Z</dc:date>
    </item>
  </channel>
</rss>

