<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Accessing shallow cloned data through an External location fails in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/accessing-shallow-cloned-data-through-an-external-location-fails/m-p/76606#M35277</link>
    <description>&lt;P&gt;Also the delta log couldn't have been deleted (or vacuumed) since I am running all three commands one by one.&lt;/P&gt;</description>
    <pubDate>Wed, 03 Jul 2024 10:33:32 GMT</pubDate>
    <dc:creator>dream</dc:creator>
    <dc:date>2024-07-03T10:33:32Z</dc:date>
    <item>
      <title>Accessing shallow cloned data through an External location fails</title>
      <link>https://community.databricks.com/t5/data-engineering/accessing-shallow-cloned-data-through-an-external-location-fails/m-p/76216#M35163</link>
      <description>&lt;P&gt;I have two external locations. On both of these locations I have `ALL PRIVILEGES` access.&lt;/P&gt;&lt;P&gt;I am creating a table on the first external location using the following command:&lt;/P&gt;&lt;DIV&gt;&lt;PRE&gt;&lt;SPAN&gt;%sql&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;create or replace&lt;/SPAN&gt; &lt;SPAN&gt;table&lt;/SPAN&gt; &lt;SPAN&gt;delta&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;SPAN&gt;`s3://avinashkhamanekar/tmp/test_table_original12`&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;as&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;select&lt;/SPAN&gt; &lt;SPAN&gt;*&lt;/SPAN&gt; &lt;SPAN&gt;from&lt;/SPAN&gt; &lt;SPAN&gt;range&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;100000&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/PRE&gt;&lt;/DIV&gt;&lt;P&gt;Next, I am creating a shallow clone of this table in another external location.&lt;/P&gt;&lt;DIV&gt;&lt;PRE&gt;&lt;SPAN&gt;%sql&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;create or replace&lt;/SPAN&gt; &lt;SPAN&gt;table&lt;/SPAN&gt; &lt;SPAN&gt;delta&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;SPAN&gt;`s3://tupboto3harsh/tmp/test_table_cloned12`&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;shallow clone&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;delta.&lt;/SPAN&gt;&lt;SPAN&gt;`s3://avinashkhamanekar/tmp/test_table_original12`&lt;/SPAN&gt;&lt;/PRE&gt;&lt;/DIV&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Both of these commands run successfully. 
But when I try to access the shallow cloned table I get the following error:&lt;/P&gt;&lt;DIV&gt;&lt;PRE&gt;&lt;SPAN&gt;%sql&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;select&lt;/SPAN&gt; &lt;SPAN&gt;*&lt;/SPAN&gt; &lt;SPAN&gt;from&lt;/SPAN&gt;&lt;SPAN&gt; delta.&lt;/SPAN&gt;&lt;SPAN&gt;`s3://tupboto3harsh/tmp/test_table_cloned12`&lt;/SPAN&gt;&lt;/PRE&gt;&lt;/DIV&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;SparkException: Job aborted due to stage failure: Task 0 in stage 26.0 failed 4 times, most recent failure: Lost task 0.3 in stage 26.0 (TID 22) (ip-10-96-163-239.ap-southeast-1.compute.internal executor driver): com.databricks.sql.io.FileReadException: Error while reading file s3://avinashkhamanekar/tmp/test_table_original12/part-00000-36ee2e95-cfb1-449b-a986-21657cc01b22-c000.snappy.parquet. Access Denied against cloud provider
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anon$2.logFileNameAndThrow(FileScanRDD.scala:724)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anon$2.getNext(FileScanRDD.scala:691)
	at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:818)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.$anonfun$hasNext$1(FileScanRDD.scala:510)
	at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:501)
	at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:2624)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
	at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.$anonfun$encodeUnsafeRows$5(UnsafeRowBatchUtils.scala:88)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
	at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.$anonfun$encodeUnsafeRows$3(UnsafeRowBatchUtils.scala:88)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
	at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.$anonfun$encodeUnsafeRows$1(UnsafeRowBatchUtils.scala:68)
	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
	at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.encodeUnsafeRows(UnsafeRowBatchUtils.scala:62)
	at org.apache.spark.sql.execution.collect.Collector.$anonfun$processFunc$2(Collector.scala:214)
	at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$3(ResultTask.scala:82)
	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
	at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$1(ResultTask.scala:82)
	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:201)
	at org.apache.spark.scheduler.Task.doRunTask(Task.scala:186)
	at org.apache.spark.scheduler.Task.$anonfun$run$5(Task.scala:151)
	at com.databricks.unity.UCSEphemeralState$Handle.runWith(UCSEphemeralState.scala:45)
	at com.databricks.unity.HandleImpl.runWith(UCSHandle.scala:103)
	at com.databricks.unity.HandleImpl.$anonfun$runWithAndClose$1(UCSHandle.scala:108)
	at scala.util.Using$.resource(Using.scala:269)
	at com.databricks.unity.HandleImpl.runWithAndClose(UCSHandle.scala:107)
	at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:145)
	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
	at org.apache.spark.scheduler.Task.run(Task.scala:99)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$9(Executor.scala:958)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:105)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:961)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:853)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.spark.SparkRuntimeException: [CLOUD_PROVIDER_ERROR] Cloud provider error: AmazonS3Exception: The AWS Access Key Id you provided does not exist in our records.; request: GET https://avinashkhamanekar.s3.ap-southeast-1.amazonaws.com tmp/test_table_original12/part-00000-36ee2e95-cfb1-449b-a986-21657cc01b22-c000.snappy.parquet {} Hadoop 3.3.6, aws-sdk-java/1.12.390 Linux/5.15.0-1058-aws OpenJDK_64-Bit_Server_VM/25.392-b08 java/1.8.0_392 scala/2.12.15 kotlin/1.6.0 vendor/Azul_Systems,_Inc. cfg/retry-mode/legacy com.amazonaws.services.s3.model.GetObjectRequest; Request ID: XT85JHKSN1MJM6BE, Extended Request ID: QonH0Konw+DJAtrVirDgZ7m60L/8fKKXPfh3xZEWFbMgUeh7sM70XWM4tu/iotJdreravwGh0U4=, Cloud Provider: AWS, Instance ID: i-0bfe2e8c5979ba122 credentials-provider: com.amazonaws.auth.BasicAWSCredentials credential-header: AWS4-HMAC-SHA256 Credential=AKIA4MTWJV3MHJUGAQMT/20240701/ap-southeast-1/s3/aws4_request signature-present: true (Service: Amazon S3; Status Code: 403; Error Code: InvalidAccessKeyId; Request ID: XT85JHKSN1MJM6BE; S3 Extended Request ID: QonH0Konw+DJAtrVirDgZ7m60L/8fKKXPfh3xZEWFbMgUeh7sM70XWM4tu/iotJdreravwGh0U4=; Proxy: null) SQLSTATE: 58000
	at org.apache.spark.sql.errors.QueryExecutionErrors$.cloudProviderError(QueryExecutionErrors.scala:1487)
	... 49 more
Caused by: java.nio.file.AccessDeniedException: s3a://avinashkhamanekar/tmp/test_table_original12/part-00000-36ee2e95-cfb1-449b-a986-21657cc01b22-c000.snappy.parquet: open s3a://avinashkhamanekar/tmp/test_table_original12/part-00000-36ee2e95-cfb1-449b-a986-21657cc01b22-c000.snappy.parquet at 0 on s3a://avinashkhamanekar/tmp/test_table_original12/part-00000-36ee2e95-cfb1-449b-a986-21657cc01b22-c000.snappy.parquet: com.amazonaws.services.s3.model.AmazonS3Exception: The AWS Access Key Id you provided does not exist in our records.; request: GET https://avinashkhamanekar.s3.ap-southeast-1.amazonaws.com tmp/test_table_original12/part-00000-36ee2e95-cfb1-449b-a986-21657cc01b22-c000.snappy.parquet {} Hadoop 3.3.6, aws-sdk-java/1.12.390 Linux/5.15.0-1058-aws OpenJDK_64-Bit_Server_VM/25.392-b08 java/1.8.0_392 scala/2.12.15 kotlin/1.6.0 vendor/Azul_Systems,_Inc. cfg/retry-mode/legacy com.amazonaws.services.s3.model.GetObjectRequest; Request ID: XT85JHKSN1MJM6BE, Extended Request ID: QonH0Konw+DJAtrVirDgZ7m60L/8fKKXPfh3xZEWFbMgUeh7sM70XWM4tu/iotJdreravwGh0U4=, Cloud Provider: AWS, Instance ID: i-0bfe2e8c5979ba122 credentials-provider: com.amazonaws.auth.BasicAWSCredentials credential-header: AWS4-HMAC-SHA256 Credential=AKIA4MTWJV3MHJUGAQMT/20240701/ap-southeast-1/s3/aws4_request signature-present: true (Service: Amazon S3; Status Code: 403; Error Code: InvalidAccessKeyId; Request ID: XT85JHKSN1MJM6BE; S3 Extended Request ID: QonH0Konw+DJAtrVirDgZ7m60L/8fKKXPfh3xZEWFbMgUeh7sM70XWM4tu/iotJdreravwGh0U4=; Proxy: null), S3 Extended Request ID: QonH0Konw+DJAtrVirDgZ7m60L/8fKKXPfh3xZEWFbMgUeh7sM70XWM4tu/iotJdreravwGh0U4=:InvalidAccessKeyId
	at shaded.databricks.org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:292)
	at shaded.databricks.org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:135)
	at shaded.databricks.org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:127)
	at shaded.databricks.org.apache.hadoop.fs.s3a.S3AInputStream.reopen(S3AInputStream.java:277)
	at shaded.databricks.org.apache.hadoop.fs.s3a.S3AInputStream.lambda$lazySeek$2(S3AInputStream.java:466)
	at shaded.databricks.org.apache.hadoop.fs.s3a.Invoker.lambda$maybeRetry$3(Invoker.java:246)
	at shaded.databricks.org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:133)
	at shaded.databricks.org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:127)
	at shaded.databricks.org.apache.hadoop.fs.s3a.Invoker.lambda$maybeRetry$5(Invoker.java:370)
	at shaded.databricks.org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:434)
	at shaded.databricks.org.apache.hadoop.fs.s3a.Invoker.maybeRetry(Invoker.java:366)
	at shaded.databricks.org.apache.hadoop.fs.s3a.Invoker.maybeRetry(Invoker.java:244)
	at shaded.databricks.org.apache.hadoop.fs.s3a.Invoker.maybeRetry(Invoker.java:288)
	at shaded.databricks.org.apache.hadoop.fs.s3a.S3AInputStream.lazySeek(S3AInputStream.java:459)
	at shaded.databricks.org.apache.hadoop.fs.s3a.S3AInputStream.read(S3AInputStream.java:571)
	at java.io.DataInputStream.read(DataInputStream.java:149)
	at com.databricks.common.filesystem.LokiS3AInputStream.$anonfun$read$3(LokiS3FS.scala:254)
	at scala.runtime.java8.JFunction0$mcI$sp.apply(JFunction0$mcI$sp.java:23)
	at com.databricks.common.filesystem.LokiS3AInputStream.withExceptionRewrites(LokiS3FS.scala:244)
	at com.databricks.common.filesystem.LokiS3AInputStream.read(LokiS3FS.scala:254)
	at java.io.DataInputStream.read(DataInputStream.java:149)
	at com.databricks.spark.metrics.FSInputStreamWithMetrics.$anonfun$read$3(FileSystemWithMetrics.scala:90)
	at com.databricks.spark.metrics.FSInputStreamWithMetrics.withTimeAndBytesReadMetric(FileSystemWithMetrics.scala:67)
	at com.databricks.spark.metrics.FSInputStreamWithMetrics.read(FileSystemWithMetrics.scala:90)
	at java.io.DataInputStream.read(DataInputStream.java:149)
	at com.databricks.sql.io.HDFSStorage.lambda$fetchRange$1(HDFSStorage.java:88)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at com.databricks.sql.io.HDFSStorage.submit(HDFSStorage.java:119)
	at com.databricks.sql.io.HDFSStorage.fetchRange(HDFSStorage.java:82)
	at com.databricks.sql.io.Storage.fetchRange(Storage.java:183)
	at com.databricks.sql.io.parquet.CachingParquetFileReader$FooterByteReader.readTail(CachingParquetFileReader.java:345)
	at com.databricks.sql.io.parquet.CachingParquetFileReader$FooterByteReader.read(CachingParquetFileReader.java:363)
	at com.databricks.sql.io.parquet.CachingParquetFooterReader.lambda$null$1(CachingParquetFooterReader.java:231)
	at com.databricks.sql.io.caching.NativeDiskCache$.get(Native Method)
	at com.databricks.sql.io.caching.DiskCache.get(DiskCache.scala:568)
	at com.databricks.sql.io.parquet.CachingParquetFooterReader.lambda$readFooterFromStorage$2(CachingParquetFooterReader.java:234)
	at org.apache.spark.util.JavaFrameProfiler.record(JavaFrameProfiler.java:18)
	at com.databricks.sql.io.parquet.CachingParquetFooterReader.readFooterFromStorage(CachingParquetFooterReader.java:214)
	at com.databricks.sql.io.parquet.CachingParquetFooterReader.readFooter(CachingParquetFooterReader.java:134)
	at com.databricks.sql.io.parquet.CachingParquetFileReader.readFooter(CachingParquetFileReader.java:392)
	at org.apache.spark.sql.execution.datasources.parquet.SpecificParquetRecordReaderBase.prepare(SpecificParquetRecordReaderBase.java:162)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.apply(ParquetFileFormat.scala:415)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.apply(ParquetFileFormat.scala:259)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anon$2.getNext(FileScanRDD.scala:608)
	... 48 more
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: The AWS Access Key Id you provided does not exist in our records.; request: GET https://avinashkhamanekar.s3.ap-southeast-1.amazonaws.com tmp/test_table_original12/part-00000-36ee2e95-cfb1-449b-a986-21657cc01b22-c000.snappy.parquet {} Hadoop 3.3.6, aws-sdk-java/1.12.390 Linux/5.15.0-1058-aws OpenJDK_64-Bit_Server_VM/25.392-b08 java/1.8.0_392 scala/2.12.15 kotlin/1.6.0 vendor/Azul_Systems,_Inc. cfg/retry-mode/legacy com.amazonaws.services.s3.model.GetObjectRequest; Request ID: XT85JHKSN1MJM6BE, Extended Request ID: QonH0Konw+DJAtrVirDgZ7m60L/8fKKXPfh3xZEWFbMgUeh7sM70XWM4tu/iotJdreravwGh0U4=, Cloud Provider: AWS, Instance ID: i-0bfe2e8c5979ba122 credentials-provider: com.amazonaws.auth.BasicAWSCredentials credential-header: AWS4-HMAC-SHA256 Credential=AKIA4MTWJV3MHJUGAQMT/20240701/ap-southeast-1/s3/aws4_request signature-present: true (Service: Amazon S3; Status Code: 403; Error Code: InvalidAccessKeyId; Request ID: XT85JHKSN1MJM6BE; S3 Extended Request ID: QonH0Konw+DJAtrVirDgZ7m60L/8fKKXPfh3xZEWFbMgUeh7sM70XWM4tu/iotJdreravwGh0U4=; Proxy: null), S3 Extended Request ID: QonH0Konw+DJAtrVirDgZ7m60L/8fKKXPfh3xZEWFbMgUeh7sM70XWM4tu/iotJdreravwGh0U4=
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1879)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1418)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1387)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1157)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:814)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:781)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:755)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:715)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:697)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:561)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:541)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5456)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5403)
	at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1524)
	at shaded.databricks.org.apache.hadoop.fs.s3a.S3AInputStream.lambda$reopen$0(S3AInputStream.java:278)
	at shaded.databricks.org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:133)
	... 90 more

Driver stacktrace:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 26.0 failed 4 times, most recent failure: Lost task 0.3 in stage 26.0 (TID 22) (ip-10-96-163-239.ap-southeast-1.compute.internal executor driver): com.databricks.sql.io.FileReadException: Error while reading file s3://avinashkhamanekar/tmp/test_table_original12/part-00000-36ee2e95-cfb1-449b-a986-21657cc01b22-c000.snappy.parquet. Access Denied against cloud provider
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anon$2.logFileNameAndThrow(FileScanRDD.scala:724)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anon$2.getNext(FileScanRDD.scala:691)
	at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:818)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.$anonfun$hasNext$1(FileScanRDD.scala:510)
	at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:501)
	at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:2624)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
	at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.$anonfun$encodeUnsafeRows$5(UnsafeRowBatchUtils.scala:88)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
	at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.$anonfun$encodeUnsafeRows$3(UnsafeRowBatchUtils.scala:88)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have all the access permissions on both external locations so this error should not occur. Is this a limitation of shallow clone? Is there any documentation link for this limitation?&lt;BR /&gt;&lt;BR /&gt;The cluster DBR version is 14.3LTS.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 01 Jul 2024 07:23:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/accessing-shallow-cloned-data-through-an-external-location-fails/m-p/76216#M35163</guid>
      <dc:creator>dream</dc:creator>
      <dc:date>2024-07-01T07:23:53Z</dc:date>
    </item>
    <item>
      <title>Re: Accessing shallow cloned data through an External location fails</title>
      <link>https://community.databricks.com/t5/data-engineering/accessing-shallow-cloned-data-through-an-external-location-fails/m-p/76417#M35210</link>
      <description>&lt;P class="p1"&gt;Hello&amp;nbsp;,&amp;nbsp;&lt;/P&gt;
&lt;P class="p1"&gt;This is an underlying exception that should occur with any SQL statement that require access to this file:&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;part-00000-36ee2e95-cfb1-449b-a986-21657cc01b22-c000.snappy.parquet&lt;/LI-CODE&gt;
&lt;P class="p1"&gt;It looks like the Delta log is referencing a file that doesn't exist anymore. This could happen if the file was removed by vacuum or if the file was manually deleted.&lt;BR /&gt;&lt;BR /&gt;Could you please share the results of the commands below?&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;DIV&gt;
&lt;DIV&gt;&lt;LI-CODE lang="python"&gt;# Check added files
(
    spark
    .read
    .json("s3://avinashkhamanekar/tmp/test_table_original12/_delta_log/", 
     pathGlobFilter="*.json")
    .withColumn("filename", f.input_file_name())
    .where("add.path LIKE '%part-00000-36ee2e95-cfb1-449b-a986-21657cc01b22- 
c000.snappy.parquet%'")
    .withColumn("timestamp", f.expr("from_unixtime(add.modificationTime / 1000)"))
    .select(
        "filename",
        "timestamp",
        "add",
    )
).display()&lt;/LI-CODE&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;P class="p1"&gt;and&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;# Check removed files
(
    spark
    .read
    .json("s3://avinashkhamanekar/tmp/test_table_original12/_delta_log/", 
            pathGlobFilter="*.json")
    .withColumn("filename", f.input_file_name())
    .where("remove.path LIKE '%part-00000-36ee2e95-cfb1-449b-a986-21657cc01b22- 
c000.snappy.parquet%'")
    .withColumn("timestamp", f.expr("from_unixtime(remove.deletionTimestamp / 1000)"))
    .select(
        "filename",
        "timestamp",
        "remove"
    )
).display()&lt;/LI-CODE&gt;</description>
      <pubDate>Mon, 01 Jul 2024 22:03:12 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/accessing-shallow-cloned-data-through-an-external-location-fails/m-p/76417#M35210</guid>
      <dc:creator>raphaelblg</dc:creator>
      <dc:date>2024-07-01T22:03:12Z</dc:date>
    </item>
    <item>
      <title>Re: Accessing shallow cloned data through an External location fails</title>
      <link>https://community.databricks.com/t5/data-engineering/accessing-shallow-cloned-data-through-an-external-location-fails/m-p/76604#M35275</link>
      <description>&lt;P&gt;I got these errors while running both commands&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;SPAN class=""&gt;[&lt;A class="" href="https://docs.databricks.com/error-messages/error-classes.html#uc_command_not_supported.with_recommendation" target="_blank" rel="noopener noreferrer"&gt;UC_COMMAND_NOT_SUPPORTED.WITH_RECOMMENDATION&lt;/A&gt;] The command(s): input_file_name are not supported in Unity Catalog. Please use _metadata.file_path instead. SQLSTATE: 0AKUC&lt;BR /&gt;&lt;BR /&gt;Could you please also try this as a POC at your end using two different external locations?&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 03 Jul 2024 10:22:46 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/accessing-shallow-cloned-data-through-an-external-location-fails/m-p/76604#M35275</guid>
      <dc:creator>dream</dc:creator>
      <dc:date>2024-07-03T10:22:46Z</dc:date>
    </item>
    <item>
      <title>Re: Accessing shallow cloned data through an External location fails</title>
      <link>https://community.databricks.com/t5/data-engineering/accessing-shallow-cloned-data-through-an-external-location-fails/m-p/76606#M35277</link>
      <description>&lt;P&gt;Also the delta log couldn't have been deleted (or vacuumed) since I am running all three commands one by one.&lt;/P&gt;</description>
      <pubDate>Wed, 03 Jul 2024 10:33:32 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/accessing-shallow-cloned-data-through-an-external-location-fails/m-p/76606#M35277</guid>
      <dc:creator>dream</dc:creator>
      <dc:date>2024-07-03T10:33:32Z</dc:date>
    </item>
    <item>
      <title>Re: Accessing shallow cloned data through an External location fails</title>
      <link>https://community.databricks.com/t5/data-engineering/accessing-shallow-cloned-data-through-an-external-location-fails/m-p/79470#M35767</link>
      <description>&lt;P&gt;Regarding your statements:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;EM&gt;[&lt;A class="" href="https://docs.databricks.com/error-messages/error-classes.html#uc_command_not_supported.with_recommendation" target="_blank" rel="noopener noreferrer nofollow"&gt;UC_COMMAND_NOT_SUPPORTED.WITH_RECOMMENDATION&lt;/A&gt;] The command(s): input_file_name are not supported in Unity Catalog. Please use _metadata.file_path instead. SQLSTATE: 0AKUC&lt;/EM&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;EM&gt;Could you please also try this as a POC at your end using two different external locations?&lt;/EM&gt;&lt;/PRE&gt;
&lt;P&gt;input_file_name is a reserved function and used to be available on older DBRs.&amp;nbsp;In Databricks SQL and Databricks Runtime 13.3 LTS and above, this function is deprecated. Please use&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A class="reference internal" href="https://docs.databricks.com/en/ingestion/file-metadata-column.html" target="_blank" rel="noopener"&gt;&lt;SPAN class="doc"&gt;_metadata.file_path&lt;/SPAN&gt;&lt;/A&gt;. Source: &lt;A href="https://docs.databricks.com/en/sql/language-manual/functions/input_file_name.html#:~:text=Returns%20the%20name%20of%20the,above%20this%20function%20is%20deprecated." target="_self"&gt;input_file_name function&lt;/A&gt;.&lt;/P&gt;
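For illustration (a sketch, not from the original reply; the cloned-table path is reused from the question and DBR 13.3 LTS or above is assumed), the deprecated function call can be replaced by selecting the hidden _metadata column:

```sql
-- Sketch: read each row's source file path via the _metadata column, which
-- replaces the deprecated input_file_name() under Unity Catalog.
SELECT _metadata.file_path, *
FROM delta.`s3://tupboto3harsh/tmp/test_table_cloned12`
```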
&lt;P&gt;I can try to set up a POC, but first let's make sure we aren't facing any other exceptions.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;EM&gt;Also the delta log couldn't have been deleted (or vaccumed) since I am running all three commands one by one.&lt;/EM&gt;&lt;/PRE&gt;
&lt;DIV class="section"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="section"&gt;Yes the _delta_logs folder could've been deleted but this would throw a different exception. _delta_logs deletion won't be performed by any Databricks service unless a query is ran to delete the table or the folder is manually removed.&amp;nbsp;&lt;CODE class="docutils literal notranslate"&gt;&lt;SPAN class="pre"&gt;VACUUM&lt;/SPAN&gt;&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;shouldn't be the cause of a _delta_logs folder deletion as it will skip all directories that begin with an underscore (&lt;/SPAN&gt;&lt;CODE class="docutils literal notranslate"&gt;&lt;SPAN class="pre"&gt;_&lt;/SPAN&gt;&lt;/CODE&gt;&lt;SPAN&gt;), which includes the&amp;nbsp;&lt;/SPAN&gt;&lt;CODE class="docutils literal notranslate"&gt;&lt;SPAN class="pre"&gt;_delta_log&lt;/SPAN&gt;&lt;/CODE&gt;. Source:&amp;nbsp;&lt;A href="http://&amp;nbsp;https://docs.databricks.com/en/sql/language-manual/delta-vacuum.html#vacuum-a-delta-table" target="_self"&gt;Vacuum a Delta table&lt;/A&gt;.&lt;/DIV&gt;
&lt;P&gt;Please let me know if you're able to progress with your implementation.&lt;/P&gt;</description>
      <pubDate>Fri, 19 Jul 2024 16:08:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/accessing-shallow-cloned-data-through-an-external-location-fails/m-p/79470#M35767</guid>
      <dc:creator>raphaelblg</dc:creator>
      <dc:date>2024-07-19T16:08:43Z</dc:date>
    </item>
  </channel>
</rss>

