Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Bug Report: SDP (DLT) with autoloader not passing through pipe delimiter/separator

ChrisLawford_n1
Contributor

I am noticing a difference between using Auto Loader in an interactive notebook and using it in a Spark Declarative Pipeline (DLT pipeline). This issue appears very similar to an older, unanswered post: Bug report: the delimiter option does not work when run on DLT
The CSV I am trying to ingest uses the pipe character (|) as its separator.

```python
df = (spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("delimiter", "|")
    .option("cloudFiles.schemaLocation", f"/Volumes/{location}/schema")
    .load(f"/Volumes/{path to csv}")
    .writeStream
    .option("checkpointLocation", f"/Volumes/{location}/checkpoint")
    .trigger(availableNow=True)
    .toTable(f"{table_location}")
)
```
Running this in a serverless notebook works without issue.

```python
import dlt

@dlt.table(
    name=f"{bronze_raw_schema}.{table_name}",
)
def load_table():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .option("delimiter", "|")
        .option("header", "true")
        .option("cloudFiles.includeExistingFiles", "true")
        .load(f"/Volumes/{path to csv}")
    )
```

When I run the above in an SDP (DLT) pipeline I get the following error. The reader appears not to be splitting on the pipe (|) character, so the entire header row is treated as a single column name and exceeds the 255-character limit.

```
com.databricks.sql.managedcatalog.UnityCatalogServiceException: [ErrorClass=INVALID_PARAMETER_VALUE] Invalid input: RPC UpdateTable Field managedcatalog.ColumnInfo.name: At columns.0: name "HiPortSystemCode|HiportCode|CashAccountCode|ValuationDate|ContractDate|CashTransactionId|DeletedIndicator|CashTransactionTypeCode|AccountingDate|EntryDate|SettlementDate|OriginalAccountingDate|SecurityCode|AnalysisTypeCode|TransferCashAccountCode|Transact..." too long. Maximum length is 255 characters.
```
