Data Engineering

Forum Posts

FabriceDeseyn
by Contributor
  • 3417 Views
  • 5 replies
  • 6 kudos

Resolved! What does autoloader's cloudfiles.backfillInterval do?

I'm using autoloader in directory listing mode (without incremental file listing), and sometimes new files are not picked up and found in the cloud_files-listing. I have found that using the 'cloudfiles.backfillInterval' option can resolve the detection ...

Latest Reply
Kiranrathod
New Contributor III
  • 6 kudos

Hi @Lakshay Goel, where can I set the backfillInterval property in the code? Do you have any sample code for this use case?
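A minimal sketch of where the option could go, assuming a PySpark Auto Loader stream: `cloudFiles.backfillInterval` is passed like any other `cloudFiles.*` reader option. The helper name, the `json` format, the `"1 day"` interval, and the path in the comments are hypothetical placeholders, not values from this thread.

```python
def autoloader_backfill_options(file_format="json", backfill_interval="1 day"):
    """Build reader options for an Auto Loader stream with periodic backfills.

    Both defaults are illustrative assumptions, not values from the thread.
    """
    return {
        "cloudFiles.format": file_format,
        # Interval string at which Auto Loader triggers an asynchronous
        # backfill, so files missed by normal discovery are picked up later.
        "cloudFiles.backfillInterval": backfill_interval,
    }


options = autoloader_backfill_options()

# On a Databricks cluster (where `spark` is already defined) the options
# would be applied roughly as follows. The path is a placeholder, and any
# schema or schema-location settings the stream needs are omitted here.
#
# df = (
#     spark.readStream.format("cloudFiles")
#     .options(**options)
#     .load("/path/to/landing/dir")
# )
```

Keeping the options in a plain dictionary makes it easy to log or unit-test the configuration before a stream is started.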

4 More Replies
logan0015
by Contributor
  • 2462 Views
  • 6 replies
  • 4 kudos

Resolved! Getting a key mismatch error with Delta Live Tables.

I am attempting to create a streaming Delta Live Table. The main issue I am experiencing is the error below: "com.databricks.sql.cloudfiles.errors.CloudFilesIllegalStateException: Found mismatched event: key". I have an AWS AppFlow that is creating a fold...

Latest Reply
VijaC_97468
New Contributor II
  • 4 kudos

Hi, I am also facing the same issue, but I found nothing in the documentation to fix it.

5 More Replies
MRTN
by New Contributor III
  • 747 Views
  • 1 replies
  • 1 kudos

Columns archive_time and commit_time always NULL when running cloud_files_state

I am attempting to find the commit_time for a given file for a Delta table using the cloud_files_state command. However, the archive_time and commit_time columns are always NULL. I am running Databricks Runtime 11.3 and have also verified ...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

@Morten Stakkeland: The issue you are facing with the cloud_files_state command is a known limitation in Delta Lake as of the latest stable release (Delta Lake 1.0). The commit_time and protocol columns are always null, and the archive_time column i...

Ria
by New Contributor
  • 657 Views
  • 1 replies
  • 1 kudos

py4j.security.Py4JSecurityException

I am getting this error while loading data with autoloader. Although table access control is already disabled, I am still getting this error: "py4j.security.Py4JSecurityException: Method public org.apache.spark.sql.streaming.DataStreamReader org.apache.spark.sql...

Latest Reply
jose_gonzalez
Moderator
  • 1 kudos

Hi, are you using a High Concurrency cluster? Which DBR version are you running?

Malcoln_Dandaro
by New Contributor
  • 1133 Views
  • 0 replies
  • 0 kudos

Is there any way to navigate/access cloud files using the direct abfss URI (no mount) with default python functions/libs like open() or os.listdir()?

Hello, today in our workspace we access everything via mount points. We plan to change this to "abfss://" for security, governance, and performance reasons. The problem is that we sometimes interact with files using "python only" code, and apparently ...

tej1
by New Contributor III
  • 1685 Views
  • 6 replies
  • 7 kudos

Resolved! Trouble accessing `_metadata` column using cloudFiles in Delta Live Tables

We are building a Delta Live Tables pipeline in which we ingest CSV files from AWS S3 using cloudFiles, and we need to access each file's modification timestamp. As documented here, we tried selecting `_metadata` column in a task in delta live p...

Latest Reply
tej1
New Contributor III
  • 7 kudos

Update: We were able to test `_metadata` column feature in DLT "preview" mode (which is DBR 11.0). Databricks doesn't recommend production workloads when using "preview" mode, but nevertheless, glad to be using this feature in DLT.

5 More Replies
Michael_Galli
by Contributor II
  • 2518 Views
  • 3 replies
  • 2 kudos

Resolved! Spark Streaming - only process new files in streaming path?

In our streaming jobs, we currently run streaming (cloudFiles format) on a directory with sales transactions coming every 5 minutes. In this directory, the transactions are ordered in the following format: <streaming-checkpoint-root>/<transaction_date>...

Latest Reply
Michael_Galli
Contributor II
  • 2 kudos

Update: It seems that maxFileAge was not a good idea. The following, with the option "includeExistingFiles" = False, solved my problem: streaming_df = ( spark.readStream.format("cloudFiles") .option("cloudFiles.format", extension) .option("...
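Since the excerpt above is cut off, here is a hedged sketch of the general pattern rather than the poster's actual code. It assumes the documented Auto Loader key `cloudFiles.includeExistingFiles`; the helper name, the `csv` format, and the path in the comments are hypothetical.

```python
def new_files_only_options(file_format):
    """Reader options for an Auto Loader stream that skips pre-existing files.

    `file_format` plays the role of the `extension` variable in the post.
    """
    return {
        "cloudFiles.format": file_format,
        # "false" asks Auto Loader to ignore files already present in the
        # input path when the stream is first started.
        "cloudFiles.includeExistingFiles": "false",
    }


options = new_files_only_options("csv")

# On a Databricks cluster (where `spark` is already defined), roughly:
#
# streaming_df = (
#     spark.readStream.format("cloudFiles")
#     .options(**options)
#     .load("/path/to/streaming/dir")  # placeholder path
# )
```

As far as I know, this option is only evaluated the first time a stream starts with a fresh checkpoint, so changing it on an existing stream would likely have no effect.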

2 More Replies