01-16-2025 04:46 AM - edited 01-16-2025 05:02 AM
Hello everyone. I am new to DLT and I am practicing with it by doing some basic ingestions. I have a query like the following, where I get data from two tables using UNION, and I have noticed that everything gets ingested into the first column as a comma-separated string. In my pipeline I am executing something like the following. Any suggestions would be appreciated. Cheers!
query = """
SELECT
    a.column_a as id_column
    a.column_b as val_column
FROM
    catalog_a.schema_a.table_a a
UNION ALL
SELECT
    b.column_a as id_column
    b.column_b as val_column
FROM
    catalog_b.shema_b.table_b b"""

@dlt.table
def dim_ship():
    return spark.sql(query)
Accepted Solutions
01-17-2025 01:21 AM - edited 01-17-2025 01:22 AM
Actually I found the solution: I used spark.readStream to read the external tables a and b into two dataframes, and then I just did combined_df = df_a.union(df_b) to create my DLT table. Thank you!
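In case it helps anyone later, a minimal sketch of what I mean (table names as in my original post; the selectExpr renames are just illustrative, not my exact code):

import dlt

@dlt.table
def dim_ship():
    # Read each external table as a stream and align the column names
    df_a = (spark.readStream.table("catalog_a.schema_a.table_a")
            .selectExpr("column_a as id_column", "column_b as val_column"))
    df_b = (spark.readStream.table("catalog_b.schema_b.table_b")
            .selectExpr("column_a as id_column", "column_b as val_column"))
    # union matches columns by position, so both inputs must share a schema
    return df_a.union(df_b)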
01-16-2025 05:19 AM
Can you try the following code?
query_a = """
SELECT
    a.column_a as id_column,
    a.column_b as val_column
FROM
    catalog_a.schema_a.table_a a
"""

query_b = """
SELECT
    b.column_a as id_column,
    b.column_b as val_column
FROM
    catalog_b.schema_b.table_b b
"""

@dlt.table
def table_a_data():
    return spark.sql(query_a)

@dlt.table
def table_b_data():
    return spark.sql(query_b)

@dlt.table
def dim_ship():
    return spark.sql("""
        SELECT * FROM table_a_data
        UNION ALL
        SELECT * FROM table_b_data
    """)
01-16-2025 05:32 AM
Unfortunately I am getting the same behavior.
01-16-2025 05:24 AM
You are missing the commas that separate the columns.
01-16-2025 07:10 AM
What is also weird is that it does not fetch only the specified columns but all the columns from the source tables.
01-16-2025 07:53 AM
Try running the same logic as a SQL-defined DLT pipeline, either in a SQL cell in a notebook or in a *.sql file, to see if you get the same problem.
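Something like this, for example (just a sketch, using the CREATE OR REFRESH LIVE TABLE syntax and the table names from your post):

-- Sketch only: table and column names copied from the original post
CREATE OR REFRESH LIVE TABLE dim_ship AS
SELECT
    a.column_a AS id_column,
    a.column_b AS val_column
FROM catalog_a.schema_a.table_a a
UNION ALL
SELECT
    b.column_a AS id_column,
    b.column_b AS val_column
FROM catalog_b.schema_b.table_b b;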
01-16-2025 08:25 AM
@Costas96 I would recommend verifying the SQL behavior in a notebook or the SQL editor first.
# SQL query with proper comma separation between columns
query = """
SELECT
    a.column_a as id_column,
    a.column_b as val_column
FROM
    catalog_a.schema_a.table_a a
UNION ALL
SELECT
    b.column_a as id_column,
    b.column_b as val_column
FROM
    catalog_b.schema_b.table_b b"""

# Define the Delta Live Table
@dlt.table
def dim_ship():
    return spark.sql(query)

# Optional: verify the output outside the pipeline
df = spark.sql(query)

# Check schema
print("Schema:")
df.printSchema()

# Preview data
print("\nData Preview:")
df.show(5)