java.lang.IllegalArgumentException: java.net.URISyntaxException
08-10-2022 03:00 PM
I am using Databricks Autoloader to incrementally load JSON files from ADLS Gen2 in directory listing mode. All source file names contain a timestamp. The Autoloader works perfectly for a couple of days with the configuration below, then breaks the next day with the following error.
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 7.0 failed 4 times, most recent failure: Lost task 1.3 in stage 7.0 (TID 24) (10.150.38.137 executor 0): java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: 2022-04-27T20:09:00 (the complete error message is attached)
I deleted the checkpoint and the target Delta table and reloaded from scratch with the option "cloudFiles.includeExistingFiles":"true". All files loaded successfully, but after a couple of incremental loads the same error occurred.
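The fragment in the error (2022-04-27T20:09:00) looks like the timestamp portion of a file name, so one thing that may be worth checking is whether some of the source file names contain a colon. As far as I understand, Hadoop-style path parsing can treat the text before a colon as a URI scheme, which would be consistent with a "Relative path in absolute URI" message, but that is an assumption and not a confirmed cause. Below is a minimal sketch of such a check; the helper name and the sample paths are hypothetical and not taken from the actual job.

```python
from typing import Iterable, List


def names_with_colon(paths: Iterable[str]) -> List[str]:
    """Return the paths whose final component (the file name) contains ':'.

    Only the file name is inspected, so the 'dbfs:' scheme prefix at the
    start of a path does not count as a match.
    """
    flagged = []
    for path in paths:
        # Take everything after the last '/' as the file name.
        file_name = path.rstrip("/").rsplit("/", 1)[-1]
        if ":" in file_name:
            flagged.append(path)
    return flagged


# Hypothetical sample listing; real names would come from listing the raw folder.
sample_paths = [
    "dbfs:/mnt/raw/entity_2022-04-27T20:09:00.json",
    "dbfs:/mnt/raw/entity_20220427T200900.json",
]
flagged_paths = names_with_colon(sample_paths)
```

On a cluster, the input list could come from something like the paths returned by dbutils.fs.ls on the raw data folder; if any names are flagged, comparing them against the days on which the load failed may help confirm or rule out this idea.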
Autoloader configurations
{"cloudFiles.format":"json","cloudFiles.useNotifications":"false", "cloudFiles.inferColumnTypes":"true", "cloudFiles.schemaEvolutionMode":"addNewColumns", "cloudFiles.includeExistingFiles":"false"}
Path locations are passed as below
raw_data_location : dbfs:/mnt/DEV-cdl-raw/data/storage-xxxxx/xxxx/
target_delta_table_location : dbfs:/mnt/DEV-cdl-bronze/data/storage-xxxxx/xxxx/
checkpoint_location : dbfs:/mnt/DEV-cdl-bronze/configuration/autoloader/storage-xxxxx/xxxx/checkpoint/
schema_location : dbfs:/mnt/DEV-cdl-bronze/metadata/storage-xxxxx/xxxx/
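The read side that produces StreamDF is not shown in this post. For context, here is a minimal sketch of how a reader with the options above is typically wired up, assuming an active SparkSession is passed in as spark and that the schema location is supplied through cloudFiles.schemaLocation. The function name build_stream_df is hypothetical, and this is not the exact code from the job.

```python
# Auto Loader options as listed in the configuration above.
AUTOLOADER_OPTIONS = {
    "cloudFiles.format": "json",
    "cloudFiles.useNotifications": "false",  # directory listing mode
    "cloudFiles.inferColumnTypes": "true",
    "cloudFiles.schemaEvolutionMode": "addNewColumns",
    "cloudFiles.includeExistingFiles": "false",
}


def build_stream_df(spark, raw_data_location, schema_location):
    """Build the streaming DataFrame that the writeStream call consumes.

    `spark` is assumed to be an active SparkSession on a Databricks cluster;
    the two locations correspond to the path variables listed above.
    """
    return (
        spark.readStream.format("cloudFiles")
        .options(**AUTOLOADER_OPTIONS)
        # Auto Loader stores the inferred schema and its evolution here.
        .option("cloudFiles.schemaLocation", schema_location)
        .load(raw_data_location)
    )


# Usage on a cluster (not executed here):
# StreamDF = build_stream_df(spark, raw_data_location, schema_location)
```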
StreamingQuery = StreamDF.writeStream \
    .option("checkpointLocation", checkpoint_location) \
    .option("mergeSchema", "true") \
    .queryName(f"AutoLoad_RawtoBronze_{sourceFolderName}_{sourceEntityName}") \
    .trigger(availableNow=True) \
    .partitionBy(targetPartitionByCol) \
    .start(target_delta_table_location)
Can someone help me here?
Thanks in advance.
- Labels: Databricks autoloader