cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

com.databricks.sql.transaction.tahoe.ColumnMappingException: Found duplicated column id `2` in colum

Poovarasan
New Contributor II

Hi,

Currently, I am using the below-mentioned query to create a materialized view. It was working fine until yesterday in the DLT pipeline, but from today on, the below-provided code throws an error (com.databricks.sql.transaction.tahoe.ColumnMappingException: Found duplicated column id `2` in colum). However, it works when I select only 4 columns in the select. Please let us know if anyone knows about this issue.

CREATE LIVE TABLE Sales
TBLPROPERTIES (
    'delta.minReaderVersion' = '2',
    'delta.minWriterVersion' = '5',
    'delta.columnMapping.mode' = 'name'
  )
AS
select Product_name as `Product Name`, id as `Order ID`,Date as `Order Date`, Price as `Total Amount`, Delivery_Date as 'Delivery Date'
from raw_Sales
2 REPLIES 2

Kaniz
Community Manager
Community Manager

Hi @PoovarasanThe error message you’re encountering, com.databricks.sql.transaction.tahoe.ColumnMappingException: Found duplicated column id 2 in column, indicates that there is a conflict related to column IDs in your query.

Let’s break down the issue and explore potential solutions:

  1. Column Mapping Mode:

    • You’ve set the delta.columnMapping.mode property to 'name', which means that column mapping is based on column names rather than IDs.
    • However, despite this setting, you’re still encountering a duplicate column ID issue.
  2. Possible Causes:

    • The error suggests that there are duplicate column IDs (2) in your query.
    • This could happen if there are multiple columns with the same name in your source table (raw_Sales).
  3. Resolution Steps:

    • Here are some steps to troubleshoot and resolve the issue:
      • Check Source Table Columns:
        • Verify the columns in your raw_Sales table. Ensure that there are no duplicate column names.
        • If there are duplicate columns, consider renaming them or removing one of the duplicates.
      • Explicitly Specify Columns:
        • Instead of using SELECT *, explicitly list the columns you need in your materialized view.
        • You mentioned that it works when you select only 4 columns. In that case, explicitly specify those 4 columns in your query.
      • Inspect Column IDs:
        • If you suspect that the issue is related to column IDs, check if there’s any custom logic or transformations that might be affecting the column IDs.
      • Review Schema Changes:
        • Ensure that there haven’t been any recent schema changes (e.g., column additions or deletions) in the source table.
      • Refresh Metadata:
        • Sometimes, refreshing the metadata for the tables involved can help resolve issues related to column mappings.
        • You can do this by running a REFRESH TABLE <table_name> command.
  4. Documentation Reference:

 

Poovarasan
New Contributor II

Hi @Kaniz ,

In my case I do not have any duplicated columns in raw_sales and the existing DLT pipeline was working fine until 27th February 2024. We have tried to reach out databricks support. but we couldn't get any solution.