cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

com.databricks.sql.transaction.tahoe.ColumnMappingException: Found duplicated column id `2` in colum

Poovarasan
New Contributor II

Hi,

Currently, I am using the below-mentioned query to create a materialized view. It was working fine until yesterday in the DLT pipeline, but from today on, the below-provided code throws an error (com.databricks.sql.transaction.tahoe.ColumnMappingException: Found duplicated column id `2` in colum). However, it works when I select only 4 columns in the select. Please let us know if anyone knows about this issue.

CREATE LIVE TABLE Sales
TBLPROPERTIES (
    'delta.minReaderVersion' = '2',
    'delta.minWriterVersion' = '5',
    'delta.columnMapping.mode' = 'name'
  )
AS
select Product_name as `Product Name`, id as `Order ID`,Date as `Order Date`, Price as `Total Amount`, Delivery_Date as 'Delivery Date'
from raw_Sales
2 REPLIES 2

Kaniz
Community Manager
Community Manager

Hi @PoovarasanThe error message you’re encountering, com.databricks.sql.transaction.tahoe.ColumnMappingException: Found duplicated column id 2 in column, indicates that there is a conflict related to column IDs in your query.

Let’s break down the issue and explore potential solutions:

  1. Column Mapping Mode:

    • You’ve set the delta.columnMapping.mode property to 'name', which means that column mapping is based on column names rather than IDs.
    • However, despite this setting, you’re still encountering a duplicate column ID issue.
  2. Possible Causes:

    • The error suggests that there are duplicate column IDs (2) in your query.
    • This could happen if there are multiple columns with the same name in your source table (raw_Sales).
  3. Resolution Steps:

    • Here are some steps to troubleshoot and resolve the issue:
      • Check Source Table Columns:
        • Verify the columns in your raw_Sales table. Ensure that there are no duplicate column names.
        • If there are duplicate columns, consider renaming them or removing one of the duplicates.
      • Explicitly Specify Columns:
        • Instead of using SELECT *, explicitly list the columns you need in your materialized view.
        • You mentioned that it works when you select only 4 columns. In that case, explicitly specify those 4 columns in your query.
      • Inspect Column IDs:
        • If you suspect that the issue is related to column IDs, check if there’s any custom logic or transformations that might be affecting the column IDs.
      • Review Schema Changes:
        • Ensure that there haven’t been any recent schema changes (e.g., column additions or deletions) in the source table.
      • Refresh Metadata:
        • Sometimes, refreshing the metadata for the tables involved can help resolve issues related to column mappings.
        • You can do this by running a REFRESH TABLE <table_name> command.
  4. Documentation Reference:

 

Poovarasan
New Contributor II

Hi @Kaniz ,

In my case I do not have any duplicated columns in raw_sales and the existing DLT pipeline was working fine until 27th February 2024. We have tried to reach out databricks support. but we couldn't get any solution.

Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.