- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
01-15-2026 04:01 PM
I am noticing a difference between using autoloader in an interactive notebook vs using it in a Spark Declarative Pipeline (DLT Pipeline). This issue seems to be very similar to this other unanswered post from a few years ago. Bug report: the delimiter option does not work when run on DLT
The CSV I am trying to ingest uses the pipe character (|) as its separator.
```
df = (spark.readStream
.format("cloudFiles")
.option("cloudFiles.format", "csv")
.option("delimiter", "|")
.option("cloudFiles.schemaLocation", f"/Volumes/{location}/schema")
.load(f"/Volumes/{path to csv}")
.writeStream
.option("checkpointLocation", f"/Volumes/{location}/checkpoint")
.trigger(availableNow=True)
.toTable(f"{table_location}")
)
```
^ running this in a notebook serverless has no issues.
``` python
@Dlt.table(
name=f"{bronze_raw_schema}.{table_name}",
)
def load_table():
return spark.readStream.format("cloudFiles")
.format("cloudFiles")
.option("cloudFiles.format", "csv")
.option("delimiter", "|")
.option("header", "true")
.option("cloudFiles.includeExistingFiles", "true")
.load(f"/Volumes/{path to csv}")
```
When I run the above in a SDP (DLT) pipeline I get the following error. Which appears to be failing to delimit on the pipe (|) character and is thus hitting a length limit on the header column names.
com.databricks.sql.managedcatalog.UnityCatalogServiceException: [ErrorClass=INVALID_PARAMETER_VALUE] Invalid input: RPC UpdateTable Field managedcatalog.ColumnInfo.name: At columns.0: name "HiPortSystemCode|HiportCode|CashAccountCode|ValuationDate|ContractDate|CashTransactionId|DeletedIndicator|CashTransactionTypeCode|AccountingDate|EntryDate|SettlementDate|OriginalAccountingDate|SecurityCode|AnalysisTypeCode|TransferCashAccountCode|Transact..." too long. Maximum length is 255 characters.
- Labels:
-
Spark
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
01-18-2026 05:42 AM
Hi @ChrisLawford_n1, can you try doing a full refresh of this table? The old schema may still be in the schema location.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
01-19-2026 01:23 AM
Hey,
Okay thanks @nikhilj0421. I have now solved the issue but not with a full refresh of the table. I had tried this previously and even deleted the DLT pipeline hoping that would provide me the clean slate if this lingering schema was an issue but when re-running the pipeline it still failed with the same issue. The solution was to change the output schema(in my case to {schema_name}_2) of the table/pipeline and run again. I was then able to again alter the schema back to the original schema_name and this time everything worked without a problem.
Overall I have learned that DLT pipelines must store more information than I previously thought when deleting them or full refreshing them.