I am noticing a difference between using Auto Loader in an interactive notebook and using it in a Spark Declarative Pipeline (DLT pipeline). The issue seems very similar to this unanswered post from a few years ago: Bug report: the delimiter option does not work when run on DLT
The CSV I am trying to ingest uses the pipe character (|) as its separator.
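For reference, the header row starts like this (pipe-separated; the full column list appears in the error message further down):

```
HiPortSystemCode|HiportCode|CashAccountCode|ValuationDate|ContractDate|CashTransactionId|...
```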
``` python
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "csv")
      .option("delimiter", "|")
      .option("cloudFiles.schemaLocation", f"/Volumes/{location}/schema")
      .load(f"/Volumes/{path to csv}")
      .writeStream
      .option("checkpointLocation", f"/Volumes/{location}/checkpoint")
      .trigger(availableNow=True)
      .toTable(f"{table_location}")
)
```
^ Running this in a serverless notebook works without issue.
``` python
import dlt

@dlt.table(
    name=f"{bronze_raw_schema}.{table_name}",
)
def load_table():
    # Same reader options as the notebook version above
    return (
        spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .option("delimiter", "|")
        .option("header", "true")
        .option("cloudFiles.includeExistingFiles", "true")
        .load(f"/Volumes/{path to csv}")
    )
```
When I run the above in an SDP (DLT) pipeline, I get the error below. It appears the file is not being split on the pipe (|) character, so the entire header row is treated as a single column name, which then exceeds Unity Catalog's 255-character limit on column names.
```
com.databricks.sql.managedcatalog.UnityCatalogServiceException: [ErrorClass=INVALID_PARAMETER_VALUE] Invalid input: RPC UpdateTable Field managedcatalog.ColumnInfo.name: At columns.0: name "HiPortSystemCode|HiportCode|CashAccountCode|ValuationDate|ContractDate|CashTransactionId|DeletedIndicator|CashTransactionTypeCode|AccountingDate|EntryDate|SettlementDate|OriginalAccountingDate|SecurityCode|AnalysisTypeCode|TransferCashAccountCode|Transact..." too long. Maximum length is 255 characters.
```
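For what it's worth, the same oversized column name can be reproduced in a notebook by reading the file with the delimiter option left at its default. A minimal sketch (assuming the same volume path placeholder as above):

``` python
# Minimal sketch: with Spark's default "," delimiter, the pipe-separated
# header row (which contains no commas) collapses into a single column,
# matching the column name shown in the DLT error above.
df = (spark.read
      .format("csv")
      .option("header", "true")  # no delimiter option -> defaults to ","
      .load(f"/Volumes/{path to csv}"))
print(df.columns[0])  # 'HiPortSystemCode|HiportCode|CashAccountCode|...'
```

That is consistent with the delimiter option being accepted by Auto Loader in the notebook but ignored when the same code runs inside the DLT pipeline.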