cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

com.databricks.sql.transaction.tahoe.ColumnMappingException: Found duplicated column id `2` in colum

Poovarasan
New Contributor III

Hi,

Currently, I am using the below-mentioned query to create a materialized view. It was working fine until yesterday in the DLT pipeline, but from today on, the below-provided code throws an error (com.databricks.sql.transaction.tahoe.ColumnMappingException: Found duplicated column id `2` in colum). However, it works when I select only 4 columns in the select. Please let us know if anyone knows about this issue.

CREATE LIVE TABLE Sales
TBLPROPERTIES (
    'delta.minReaderVersion' = '2',
    'delta.minWriterVersion' = '5',
    'delta.columnMapping.mode' = 'name'
  )
AS
select Product_name as `Product Name`, id as `Order ID`,Date as `Order Date`, Price as `Total Amount`, Delivery_Date as 'Delivery Date'
from raw_Sales
2 REPLIES 2

Kaniz_Fatma
Community Manager
Community Manager

Hi @PoovarasanThe error message you’re encountering, com.databricks.sql.transaction.tahoe.ColumnMappingException: Found duplicated column id 2 in column, indicates that there is a conflict related to column IDs in your query.

Let’s break down the issue and explore potential solutions:

  1. Column Mapping Mode:

    • You’ve set the delta.columnMapping.mode property to 'name', which means that column mapping is based on column names rather than IDs.
    • However, despite this setting, you’re still encountering a duplicate column ID issue.
  2. Possible Causes:

    • The error suggests that there are duplicate column IDs (2) in your query.
    • This could happen if there are multiple columns with the same name in your source table (raw_Sales).
  3. Resolution Steps:

    • Here are some steps to troubleshoot and resolve the issue:
      • Check Source Table Columns:
        • Verify the columns in your raw_Sales table. Ensure that there are no duplicate column names.
        • If there are duplicate columns, consider renaming them or removing one of the duplicates.
      • Explicitly Specify Columns:
        • Instead of using SELECT *, explicitly list the columns you need in your materialized view.
        • You mentioned that it works when you select only 4 columns. In that case, explicitly specify those 4 columns in your query.
      • Inspect Column IDs:
        • If you suspect that the issue is related to column IDs, check if there’s any custom logic or transformations that might be affecting the column IDs.
      • Review Schema Changes:
        • Ensure that there haven’t been any recent schema changes (e.g., column additions or deletions) in the source table.
      • Refresh Metadata:
        • Sometimes, refreshing the metadata for the tables involved can help resolve issues related to column mappings.
        • You can do this by running a REFRESH TABLE <table_name> command.
  4. Documentation Reference:

 

Hi @Kaniz_Fatma ,

In my case I do not have any duplicated columns in raw_sales and the existing DLT pipeline was working fine until 27th February 2024. We have tried to reach out databricks support. but we couldn't get any solution.

Join 100K+ Data Experts: Register Now & Grow with Us!

Excited to expand your horizons with us? Click here to Register and begin your journey to success!

Already a member? Login and join your local regional user group! If there isn’t one near you, fill out this form and we’ll create one for you to join!