<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Space in Column names when writing to Hive in Get Started Discussions</title>
    <link>https://community.databricks.com/t5/get-started-discussions/space-in-column-names-when-writing-to-hive/m-p/46342#M1222</link>
    <description>&lt;P&gt;All,&lt;/P&gt;&lt;P&gt;I have the following code.&lt;BR /&gt;df_Warehouse_Utilization = (&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; spark.table("hive_metastore.dev_ork.bin_item_detail")&lt;BR /&gt;&amp;nbsp; &amp;nbsp; .join(df_DIM_Bins,col('bin_tag')==df_DIM_Bins.BinKey,'right')&lt;BR /&gt;&amp;nbsp; &amp;nbsp; .groupby(col('BinKey'))&lt;BR /&gt;&amp;nbsp; &amp;nbsp; .agg(count_distinct(when(col('serial_lot_tag')&amp;gt;0,col('serial_lot_tag'))).alias('SerialLotCount'),&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;count_distinct(when(col('tracking_tag')&amp;gt;0,col('tracking_tag'))).alias('Track Count'),&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp; &amp;nbsp;min(col('activation_time')).alias('OldestStorageTime')&lt;BR /&gt;&amp;nbsp; &amp;nbsp; )&lt;BR /&gt;&amp;nbsp; &amp;nbsp; .withColumn('BinUsedFlag',when(col('PartNumbers')&amp;gt;0,1).otherwise(0))&lt;/P&gt;&lt;P&gt;)&lt;BR /&gt;df_Warehouse_Utilization.write.mode("overwrite").format("delta").saveAsTable("yorkgold.df_DIM_Binsst")&lt;BR /&gt;display(df_Warehouse_Utilization)&lt;/P&gt;&lt;P&gt;when I write to table after the above transformation, the following error occurs.&lt;BR /&gt;AnalysisException: Found invalid character(s) among ' ,;{}()\n\t=' in the column names of your schema.&lt;/P&gt;&lt;P&gt;The issue is because I have the space in the column name (Track Count, please see the above highlighted), As soon as I rename column as (TrackCount), I am able to write to Hive.&lt;BR /&gt;So my question is, space is not allowed? Or is there any other option?&lt;/P&gt;</description>
    <pubDate>Wed, 27 Sep 2023 00:40:14 GMT</pubDate>
    <dc:creator>JohnJustus</dc:creator>
    <dc:date>2023-09-27T00:40:14Z</dc:date>
    <item>
      <title>Space in Column names when writing to Hive</title>
      <link>https://community.databricks.com/t5/get-started-discussions/space-in-column-names-when-writing-to-hive/m-p/46342#M1222</link>
      <description>&lt;P&gt;All,&lt;/P&gt;&lt;P&gt;I have the following code.&lt;BR /&gt;df_Warehouse_Utilization = (&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; spark.table("hive_metastore.dev_ork.bin_item_detail")&lt;BR /&gt;&amp;nbsp; &amp;nbsp; .join(df_DIM_Bins,col('bin_tag')==df_DIM_Bins.BinKey,'right')&lt;BR /&gt;&amp;nbsp; &amp;nbsp; .groupby(col('BinKey'))&lt;BR /&gt;&amp;nbsp; &amp;nbsp; .agg(count_distinct(when(col('serial_lot_tag')&amp;gt;0,col('serial_lot_tag'))).alias('SerialLotCount'),&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;count_distinct(when(col('tracking_tag')&amp;gt;0,col('tracking_tag'))).alias('Track Count'),&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp; &amp;nbsp;min(col('activation_time')).alias('OldestStorageTime')&lt;BR /&gt;&amp;nbsp; &amp;nbsp; )&lt;BR /&gt;&amp;nbsp; &amp;nbsp; .withColumn('BinUsedFlag',when(col('PartNumbers')&amp;gt;0,1).otherwise(0))&lt;/P&gt;&lt;P&gt;)&lt;BR /&gt;df_Warehouse_Utilization.write.mode("overwrite").format("delta").saveAsTable("yorkgold.df_DIM_Binsst")&lt;BR /&gt;display(df_Warehouse_Utilization)&lt;/P&gt;&lt;P&gt;when I write to table after the above transformation, the following error occurs.&lt;BR /&gt;AnalysisException: Found invalid character(s) among ' ,;{}()\n\t=' in the column names of your schema.&lt;/P&gt;&lt;P&gt;The issue is because I have the space in the column name (Track Count, please see the above highlighted), As soon as I rename column as (TrackCount), I am able to write to Hive.&lt;BR /&gt;So my question is, space is not allowed? Or is there any other option?&lt;/P&gt;</description>
      <pubDate>Wed, 27 Sep 2023 00:40:14 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/space-in-column-names-when-writing-to-hive/m-p/46342#M1222</guid>
      <dc:creator>JohnJustus</dc:creator>
      <dc:date>2023-09-27T00:40:14Z</dc:date>
    </item>
    <item>
      <title>Re: Space in Column names when writing to Hive</title>
      <link>https://community.databricks.com/t5/get-started-discussions/space-in-column-names-when-writing-to-hive/m-p/46524#M1291</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/89587"&gt;@JohnJustus&lt;/a&gt;&amp;nbsp;You can try using backticks (`) to escape a column names that contains spaces, but the suggestion here is to get rid of spaces or any special characters in a column names.&lt;/P&gt;</description>
      <pubDate>Thu, 28 Sep 2023 05:50:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/space-in-column-names-when-writing-to-hive/m-p/46524#M1291</guid>
      <dc:creator>daniel_sahal</dc:creator>
      <dc:date>2023-09-28T05:50:57Z</dc:date>
    </item>
    <item>
      <title>Re: Space in Column names when writing to Hive</title>
      <link>https://community.databricks.com/t5/get-started-discussions/space-in-column-names-when-writing-to-hive/m-p/46889#M1364</link>
      <description>&lt;P&gt;Thank you Daniel&lt;/P&gt;&lt;P&gt;So I have my code below with back stick but still doesn't work -&amp;nbsp;&amp;nbsp; col('bin_type').alias(`'Bin Type'`),&lt;/P&gt;&lt;P&gt;Can you please correct me?&lt;/P&gt;&lt;P&gt;df_Bins_Step1 = (&lt;BR /&gt;&amp;nbsp; &amp;nbsp; spark.table("hive_metastore.dev_fivetranpoc_york.bin_master")&lt;BR /&gt;&amp;nbsp; &amp;nbsp; .select(col('tag_number').alias('BinKey'),&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; col('bin').alias('Bin'),&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; col('bin_type').alias(`'Bin Type'`),&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; col('zone').alias('Zone'))&lt;BR /&gt;)&lt;BR /&gt;display(df_Bins_Step1)&lt;/P&gt;</description>
      <pubDate>Sat, 30 Sep 2023 00:44:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/space-in-column-names-when-writing-to-hive/m-p/46889#M1364</guid>
      <dc:creator>JohnJustus</dc:creator>
      <dc:date>2023-09-30T00:44:56Z</dc:date>
    </item>
    <item>
      <title>Re: Space in Column names when writing to Hive</title>
      <link>https://community.databricks.com/t5/get-started-discussions/space-in-column-names-when-writing-to-hive/m-p/98401#M4563</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I have faced this issue a few times. When we are overwriting the dataframes to hive catalog in databricks, it doesn't naturally allow for column names to have spaces or special characters. However, you can add an option statement to bypass that rule and save your dataframes as tables with column names having spaces or special characters.&lt;/P&gt;&lt;P&gt;&lt;U&gt;&lt;STRONG&gt;Solution:&lt;/STRONG&gt;&lt;/U&gt;&lt;/P&gt;&lt;P&gt;Let's say your dataframe name is &lt;STRONG&gt;'df'.&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;U&gt;The normal pyspark command for saving tables is:&amp;nbsp;&lt;/U&gt;&lt;/P&gt;&lt;DIV&gt;&amp;nbsp;&amp;gt;&amp;gt;&amp;nbsp;&lt;STRONG&gt;&lt;STRONG&gt;&lt;SPAN&gt;df.write.&lt;SPAN&gt;mode&lt;SPAN&gt;(&lt;SPAN&gt;"overwrite"&lt;SPAN&gt;).format("delta")&lt;SPAN&gt;.&lt;SPAN&gt;saveAsTable&lt;SPAN&gt;(&lt;SPAN&gt;"catalog.schema_name.table_name"&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/STRONG&gt;&lt;DIV&gt;&amp;nbsp;&lt;DIV&gt;This will give you the&amp;nbsp;&lt;SPAN&gt;&lt;SPAN&gt;AnalysisException: Found invalid character(s) among ' ,;{}()\n\t=' in the column names of your schema error.&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;DIV&gt;&amp;nbsp;&lt;DIV&gt;&lt;SPAN&gt;&lt;SPAN&gt;You need to add an 'option statement' - &lt;STRONG&gt;".option("delta.columnMapping.mode", "name")"&amp;nbsp;&lt;/STRONG&gt;to the above command after overwrite, to resolve it.&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;DIV&gt;&amp;nbsp;&lt;DIV&gt;&lt;U&gt;&lt;U&gt;The final working command would be:&lt;/U&gt;&lt;/U&gt;&lt;DIV&gt;&lt;SPAN&gt;&lt;SPAN&gt;&amp;gt;&amp;gt;&amp;nbsp;&amp;nbsp;&lt;STRONG&gt;df.write.mode("overwrite").option("delta.columnMapping.mode", "name").format("delta").saveAsTable("catalog.schema_name.table_name")&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;DIV&gt;&amp;nbsp;&lt;DIV&gt;&lt;DIV&gt;The above edit, resolves the issue for me and the table is perfectly saved even if it has spaces or special characters in the schema.&lt;DIV&gt;&amp;nbsp;&lt;DIV&gt;Hope it works for you as well. If it does, happy to help!!&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Tue, 12 Nov 2024 04:47:41 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/space-in-column-names-when-writing-to-hive/m-p/98401#M4563</guid>
      <dc:creator>KandyKad</dc:creator>
      <dc:date>2024-11-12T04:47:41Z</dc:date>
    </item>
  </channel>
</rss>

