<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: ingest a .csv file with spaces in column names using Delta Live into a streaming table in Machine Learning</title>
    <link>https://community.databricks.com/t5/machine-learning/ingest-a-csv-file-with-spaces-in-column-names-using-delta-live/m-p/11059#M548</link>
    <description>&lt;P&gt;After some additional searching on "withColumnRenamed", I was able to replace the spaces in all column names with "_" at once by using select and alias instead:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;import dlt
from pyspark.sql.functions import col

@dlt.view(
  comment=""
)
def vw_raw():          
  return (
    spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "csv")
      .options(header='true')
      .option("inferSchema", "true")
      .load(path_to_load) 
  )
@dlt.table(
  comment=""
)
def table_raw():          
  # Read the view once, then alias every column so spaces become underscores
  df = dlt.readStream("vw_raw")
  return df.select([col(c).alias(c.replace(" ", "_")) for c in df.columns])&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;It also works with "cloudFiles.inferColumnTypes" = "true" and "cloudFiles.schemaHints" set in the view definition.&lt;/P&gt;</description>
    <pubDate>Thu, 11 Aug 2022 12:30:07 GMT</pubDate>
    <dc:creator>vaver_3</dc:creator>
    <dc:date>2022-08-11T12:30:07Z</dc:date>
    <item>
      <title>ingest a .csv file with spaces in column names using Delta Live into a streaming table</title>
      <link>https://community.databricks.com/t5/machine-learning/ingest-a-csv-file-with-spaces-in-column-names-using-delta-live/m-p/11058#M547</link>
      <description>&lt;P&gt;How do I ingest a .csv file with spaces in column names into a streaming table using Delta Live Tables? All fields should be read as strings, which is the default behavior for .csv files with DLT Auto Loader.&lt;/P&gt;&lt;P&gt;Running the pipeline gives me an error about invalid characters in the column names of my schema (&lt;I&gt;"Found invalid character(s) among " ,;{}()\n\t=" in the column names of your schema."&lt;/I&gt;). However, adding column mapping as a table property (as recommended in the full error message) then gives me the error &lt;I&gt;"com.databricks.sql.transaction.tahoe.ColumnMappingUnsupportedException: Schema change is detected:"&lt;/I&gt; and tells me &lt;I&gt;"Schema changes are not allowed during the change of column mapping mode."&lt;/I&gt;&lt;/P&gt;&lt;P&gt;I've even tried setting the schema both in the table definition and when reading the .csv:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;df = spark.read.format('csv').options(header='true').load(path_to_load)
tbl_schema = df.schema.add("_rescued_data","string",True)&lt;/CODE&gt;&lt;/PRE&gt;&lt;PRE&gt;&lt;CODE&gt;@dlt.table(
  comment="comment",
  schema=tbl_schema, 
  table_properties={
    'delta.minReaderVersion' : '2', 
    'delta.minWriterVersion' : '5', 
    'delta.columnMapping.mode' : 'name'
  }
)
def BB_EDIP_raw():          
  return (
    spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "csv")
      .options(header='true')
      # .option("inferSchema", "true")
      .schema(tbl_schema)
      .load(path_to_load) 
  )&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;I still get the same error: a schema change is detected between the old schema of just "root" and the new schema containing all the fields (see below; the list of fields is shortened):&lt;/P&gt;&lt;P&gt;&lt;I&gt;com.databricks.sql.transaction.tahoe.ColumnMappingUnsupportedException: &lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;Schema change is detected:&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;old schema:&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;root&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;new schema:&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;root&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;-- UniqueID: string (nullable = true)&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;-- FirstName: string (nullable = true)&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;-- MiddleName: string (nullable = true)&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;-- LastName: string (nullable = true)&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;-- HOME_BUSINESS: string (nullable = true)&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;-- BUSINESS_OWNER1: string (nullable = true)&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;-- Not in use: string (nullable = true)&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;-- EDUC_MODEL1: string (nullable = true)&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;-- Political Affiliation: string (nullable = true)&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;-- Working Couples Dual Income: string (nullable = true)&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;-- Online Score: string (nullable = true)&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;-- _rescued_data: string (nullable = true)&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;Schema changes are not allowed during the change of column mapping mode.&lt;/I&gt;&lt;/P&gt;&lt;P&gt;So, how do I ingest a .csv file with spaces in column names into a streaming table using Delta Live Tables? Is it possible? Should I be trying a different method? These files are provided to us by a vendor, so I would prefer not to pre-process them just to get the raw/bronze layer loaded. Thanks!&lt;/P&gt;&lt;P&gt;For reference, here is the first error about spaces in column names:&lt;/P&gt;&lt;P&gt;&lt;I&gt;org.apache.spark.sql.AnalysisException: &lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;Found invalid character(s) among " ,;{}()\n\t=" in the column names of your schema. &lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;Please enable column mapping by setting table property 'delta.columnMapping.mode' to 'name'.&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;For more details, refer to &lt;A href="https://docs.databricks.com/delta/delta-column-mapping.html" target="test_blank"&gt;https://docs.databricks.com/delta/delta-column-mapping.html&lt;/A&gt;&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;Or you can use alias to rename it.&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;org.apache.spark.sql.AnalysisException:  Column name "Not in use" contains invalid character(s). Please use alias to rename it.&lt;/I&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 05 Aug 2022 19:45:07 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/ingest-a-csv-file-with-spaces-in-column-names-using-delta-live/m-p/11058#M547</guid>
      <dc:creator>vaver_3</dc:creator>
      <dc:date>2022-08-05T19:45:07Z</dc:date>
    </item>
    <item>
      <title>Re: ingest a .csv file with spaces in column names using Delta Live into a streaming table</title>
      <link>https://community.databricks.com/t5/machine-learning/ingest-a-csv-file-with-spaces-in-column-names-using-delta-live/m-p/11059#M548</link>
      <description>&lt;P&gt;After some additional searching on "withColumnRenamed", I was able to replace the spaces in all column names with "_" at once by using select and alias instead:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;import dlt
from pyspark.sql.functions import col

@dlt.view(
  comment=""
)
def vw_raw():          
  return (
    spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "csv")
      .options(header='true')
      .option("inferSchema", "true")
      .load(path_to_load) 
  )
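# A minimal sketch, not part of the original answer: the Delta error message
# lists " ,;{}()\n\t=" as the invalid characters, so a rename that handles
# only spaces could still fail on other vendor headers. The names
# INVALID_DELTA_CHARS and sanitize_column_name are hypothetical.
import re

INVALID_DELTA_CHARS = re.compile(r"[ ,;{}()\n\t=]")

def sanitize_column_name(name):
  # Replace each invalid character with an underscore.
  return INVALID_DELTA_CHARS.sub("_", name)

# Possible use in the select: col(c).alias(sanitize_column_name(c))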
@dlt.table(
  comment=""
)
def table_raw():          
  # Read the view once, then alias every column so spaces become underscores
  df = dlt.readStream("vw_raw")
  return df.select([col(c).alias(c.replace(" ", "_")) for c in df.columns])&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;It also works with "cloudFiles.inferColumnTypes" = "true" and "cloudFiles.schemaHints" set in the view definition.&lt;/P&gt;</description>
      <pubDate>Thu, 11 Aug 2022 12:30:07 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/ingest-a-csv-file-with-spaces-in-column-names-using-delta-live/m-p/11059#M548</guid>
      <dc:creator>vaver_3</dc:creator>
      <dc:date>2022-08-11T12:30:07Z</dc:date>
    </item>
  </channel>
</rss>

