05-16-2022 10:07 AM
We are building a Delta Live Tables pipeline that ingests CSV files from AWS S3 using cloudFiles.
We need access to each file's modification timestamp.
As documented here, we tried selecting the `_metadata` column in a Delta Live Tables pipeline, without success. Are we doing something wrong?
The code snippet is below:
@dlt.table(
    name = "bronze",
    comment = f"New {SCHEMA} data incrementally ingested from S3",
    table_properties = {
        "quality": "bronze"
    }
)
def bronze_job():
    return spark \
        .readStream \
        .format("cloudFiles") \
        .option("cloudFiles.useNotifications", "true") \
        .option("cloudFiles.format", "csv") \
        .option("cloudFiles.region", "eu-west-1") \
        .option("delimiter", ",") \
        .option("escape", "\"") \
        .option("header", "false") \
        .option("encoding", "UTF-8") \
        .schema(cdc_schema) \
        .load("/mnt/%s/cdc/%s" % (RAW_MOUNT_NAME, SCHEMA)) \
        .select("*", "_metadata")

Thanks.
Tejas
Labels: CloudFiles