01-16-2025 04:46 AM - edited 01-16-2025 05:02 AM
Hello everyone. I am new to DLT and I am practicing with it by doing some basic ingestions. I have a query like the following, where I get data from two tables using UNION, and I have noticed that everything gets ingested into the first column as a comma-separated string. In my pipeline I am executing something like the following. Any suggestions would be appreciated. Cheers!
query = """
SELECT
    a.column_a as id_column
    a.column_b as val_column
FROM
    catalog_a.schema_a.table_a a
UNION ALL
SELECT
    b.column_a as id_column
    b.column_b as val_column
FROM
    catalog_b.shema_b.table_b b"""

@dlt.table
def dim_ship():
    return spark.sql(query)
Accepted Solutions
01-17-2025 01:21 AM - edited 01-17-2025 01:22 AM
Actually I found the solution: I used spark.readStream to read the external tables a and b into two dataframes, and then I just did combined_df = df_a.union(df_b) to create my DLT table. Thank you!
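In case it helps anyone later, a minimal sketch of what I mean (table names as in my original post; the selectExpr renames are just illustrative, not my exact code):

import dlt

@dlt.table
def dim_ship():
    # Read each external table as a stream and align the column names
    df_a = (spark.readStream.table("catalog_a.schema_a.table_a")
            .selectExpr("column_a as id_column", "column_b as val_column"))
    df_b = (spark.readStream.table("catalog_b.schema_b.table_b")
            .selectExpr("column_a as id_column", "column_b as val_column"))
    # union matches columns by position, so both inputs must share a schema
    return df_a.union(df_b)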
01-16-2025 05:19 AM
Can you try the following code?
query_a = """
SELECT
    a.column_a as id_column,
    a.column_b as val_column
FROM
    catalog_a.schema_a.table_a a
"""

query_b = """
SELECT
    b.column_a as id_column,
    b.column_b as val_column
FROM
    catalog_b.schema_b.table_b b
"""

@dlt.table
def table_a_data():
    return spark.sql(query_a)

@dlt.table
def table_b_data():
    return spark.sql(query_b)

@dlt.table
def dim_ship():
    return spark.sql("""
        SELECT * FROM table_a_data
        UNION ALL
        SELECT * FROM table_b_data
    """)
01-16-2025 05:32 AM
Unfortunately I am getting the same behavior.
01-16-2025 05:24 AM
You are missing the commas that separate the columns.
01-16-2025 07:10 AM
What is also weird is that it does not fetch only the specified columns but all the columns from the source tables.
01-16-2025 07:53 AM
Try running the same logic as a SQL-defined DLT pipeline, either in a SQL cell in a notebook or in a *.sql file, to see if you get the same problem.
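Something like this, for example (just a sketch, using the CREATE OR REFRESH LIVE TABLE syntax and the table names from your post):

-- Sketch only: table and column names copied from the original post
CREATE OR REFRESH LIVE TABLE dim_ship AS
SELECT
    a.column_a AS id_column,
    a.column_b AS val_column
FROM catalog_a.schema_a.table_a a
UNION ALL
SELECT
    b.column_a AS id_column,
    b.column_b AS val_column
FROM catalog_b.schema_b.table_b b;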
01-16-2025 08:25 AM
@Costas96 I would recommend verifying the SQL behavior in a notebook or the SQL editor first.
# SQL query with proper comma separation between columns
query = """
SELECT
    a.column_a as id_column,
    a.column_b as val_column
FROM
    catalog_a.schema_a.table_a a
UNION ALL
SELECT
    b.column_a as id_column,
    b.column_b as val_column
FROM
    catalog_b.schema_b.table_b b"""

# Define the Delta Live Table
@dlt.table
def dim_ship():
    return spark.sql(query)

# Optional: verify the output outside the pipeline
df = spark.sql(query)

# Check schema
print("Schema:")
df.printSchema()

# Preview data
print("\nData Preview:")
df.show(5)