<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Delta Live Tables: Creating table with spark.sql and everything gets ingested at the first column in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/delta-live-tables-creating-table-with-spark-sql-and-everything/m-p/105906#M42306</link>
    <description>&lt;P&gt;Can you try with the following code?&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;query_a = """
SELECT
a.column_a as id_column,
a.column_b as val_column
FROM 
    catalog_a.schema_a.table_a a
"""

query_b = """
SELECT
b.column_a as id_column,
b.column_b as val_column
FROM 
    catalog_b.schema_b.table_b b
"""

@dlt.table
def table_a_data():
    return spark.sql(query_a)

@dlt.table
def table_b_data():
    return spark.sql(query_b)

@dlt.table
def dim_ship():
    return spark.sql("""
    SELECT * FROM table_a_data
    UNION ALL
    SELECT * FROM table_b_data
    """)&lt;/LI-CODE&gt;</description>
    <pubDate>Thu, 16 Jan 2025 13:19:09 GMT</pubDate>
    <dc:creator>Walter_C</dc:creator>
    <dc:date>2025-01-16T13:19:09Z</dc:date>
    <item>
      <title>Delta Live Tables: Creating table with spark.sql and everything gets ingested at the first column</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-tables-creating-table-with-spark-sql-and-everything/m-p/105896#M42305</link>
      <description>&lt;P&gt;Hello everyone. I am new to DLT and I am trying to practice with it by doing some basic ingestions. I have a query like the following where I am getting data from &lt;STRONG&gt;two tables using &lt;STRONG&gt;UNION. &lt;/STRONG&gt;&lt;/STRONG&gt;I have noticed that everything gets&lt;STRONG&gt;&lt;STRONG&gt; ingested at the &lt;STRONG&gt;first column&amp;nbsp;as a &lt;STRONG&gt;comma separated string. &lt;/STRONG&gt;&lt;/STRONG&gt;&lt;/STRONG&gt;&lt;/STRONG&gt;In my pipeline I am executing something like the following. Any suggestions would be appreciated. Cheers!&lt;STRONG&gt;&lt;STRONG&gt;&lt;STRONG&gt;&lt;STRONG&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/STRONG&gt;&lt;/STRONG&gt;&lt;/STRONG&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;/DIV&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;query = """
SELECT
a.column_a as id_column
a.column_b as val_column
FROM 
    catalog_a.schema_a.table_a a

UNION ALL

SELECT
b.column_a as id_column
b.column_b as val_column
FROM 
    catalog_b.shema_b.table_b b"""


@dlt.table
def dim_ship():
    return spark.sql(query)&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 16 Jan 2025 13:02:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-tables-creating-table-with-spark-sql-and-everything/m-p/105896#M42305</guid>
      <dc:creator>Costas96</dc:creator>
      <dc:date>2025-01-16T13:02:59Z</dc:date>
    </item>
    <item>
      <title>Re: Delta Live Tables: Creating table with spark.sql and everything gets ingested at the first column</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-tables-creating-table-with-spark-sql-and-everything/m-p/105906#M42306</link>
      <description>&lt;P&gt;Can you try with the following code?&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;query_a = """
SELECT
a.column_a as id_column,
a.column_b as val_column
FROM 
    catalog_a.schema_a.table_a a
"""

query_b = """
SELECT
b.column_a as id_column,
b.column_b as val_column
FROM 
    catalog_b.schema_b.table_b b
"""

@dlt.table
def table_a_data():
    return spark.sql(query_a)

@dlt.table
def table_b_data():
    return spark.sql(query_b)

@dlt.table
def dim_ship():
    return spark.sql("""
    SELECT * FROM table_a_data
    UNION ALL
    SELECT * FROM table_b_data
    """)&lt;/LI-CODE&gt;</description>
      <pubDate>Thu, 16 Jan 2025 13:19:09 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-tables-creating-table-with-spark-sql-and-everything/m-p/105906#M42306</guid>
      <dc:creator>Walter_C</dc:creator>
      <dc:date>2025-01-16T13:19:09Z</dc:date>
    </item>
    <item>
      <title>Re: Delta Live Tables: Creating table with spark.sql and everything gets ingested at the first column</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-tables-creating-table-with-spark-sql-and-everything/m-p/105908#M42308</link>
      <description>&lt;P&gt;You are missing the commas that separate the columns.&lt;/P&gt;</description>
      <pubDate>Thu, 16 Jan 2025 13:24:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-tables-creating-table-with-spark-sql-and-everything/m-p/105908#M42308</guid>
      <dc:creator>adriennn</dc:creator>
      <dc:date>2025-01-16T13:24:03Z</dc:date>
    </item>
    <item>
      <title>Re: Delta Live Tables: Creating table with spark.sql and everything gets ingested at the first column</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-tables-creating-table-with-spark-sql-and-everything/m-p/105909#M42309</link>
      <description>&lt;P&gt;Unfortunately I am getting the same behavior.&lt;/P&gt;</description>
      <pubDate>Thu, 16 Jan 2025 13:32:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-tables-creating-table-with-spark-sql-and-everything/m-p/105909#M42309</guid>
      <dc:creator>Costas96</dc:creator>
      <dc:date>2025-01-16T13:32:03Z</dc:date>
    </item>
    <item>
      <title>Re: Delta Live Tables: Creating table with spark.sql and everything gets ingested at the first column</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-tables-creating-table-with-spark-sql-and-everything/m-p/105928#M42317</link>
      <description>&lt;P&gt;The weird thing also is that it doesn't fetch only the specified columns but all the columns from the relevant tables.&lt;/P&gt;</description>
      <pubDate>Thu, 16 Jan 2025 15:10:27 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-tables-creating-table-with-spark-sql-and-everything/m-p/105928#M42317</guid>
      <dc:creator>Costas96</dc:creator>
      <dc:date>2025-01-16T15:10:27Z</dc:date>
    </item>
    <item>
      <title>Re: Delta Live Tables: Creating table with spark.sql and everything gets ingested at the first column</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-tables-creating-table-with-spark-sql-and-everything/m-p/105933#M42320</link>
      <description>&lt;P&gt;try to run the DLT pipeline with the code either as an SQL cell in a notebook or an *.sql file to see if you have the same problem:&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;SELECT&lt;/SPAN&gt;&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;a&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;SPAN&gt;column_a&lt;/SPAN&gt; &lt;SPAN&gt;as&lt;/SPAN&gt;&lt;SPAN&gt; id_column,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;a&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;SPAN&gt;column_b&lt;/SPAN&gt; &lt;SPAN&gt;as&lt;/SPAN&gt;&lt;SPAN&gt; val_column&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;FROM&lt;/SPAN&gt; &lt;SPAN&gt;catalog_a&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;SPAN&gt;schema_a&lt;/SPAN&gt;&lt;SPAN&gt;.table_a&lt;/SPAN&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;SPAN&gt;UNION ALL&lt;/SPAN&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;SPAN&gt;SELECT&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;column_a &lt;/SPAN&gt;&lt;SPAN&gt;as&lt;/SPAN&gt;&lt;SPAN&gt; id_column,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;column_b &lt;/SPAN&gt;&lt;SPAN&gt;as&lt;/SPAN&gt;&lt;SPAN&gt; val_column&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;FROM&lt;/SPAN&gt;&lt;SPAN&gt; &amp;nbsp;catalog_shema_table_b;&lt;BR /&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Thu, 16 Jan 2025 15:53:35 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-tables-creating-table-with-spark-sql-and-everything/m-p/105933#M42320</guid>
      <dc:creator>adriennn</dc:creator>
      <dc:date>2025-01-16T15:53:35Z</dc:date>
    </item>
    <item>
      <title>Re: Delta Live Tables: Creating table with spark.sql and everything gets ingested at the first column</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-tables-creating-table-with-spark-sql-and-everything/m-p/105938#M42322</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/121795"&gt;@Costas96&lt;/a&gt;&amp;nbsp;I would recommend to verify the sql behavior in a notebook/sql editor.&amp;nbsp;&lt;/P&gt;&lt;P&gt;# SQL query with proper comma separation between columns&lt;BR /&gt;query = """&lt;BR /&gt;SELECT&lt;BR /&gt;a.column_a as id_column,&lt;BR /&gt;a.column_b as val_column&lt;BR /&gt;FROM&lt;BR /&gt;catalog_a.schema_a.table_a a&lt;/P&gt;&lt;P&gt;UNION ALL&lt;/P&gt;&lt;P&gt;SELECT&lt;BR /&gt;b.column_a as id_column,&lt;BR /&gt;b.column_b as val_column&lt;BR /&gt;FROM&lt;BR /&gt;catalog_b.shema_b.table_b b"""&lt;/P&gt;&lt;P&gt;# Define the Delta Live Table&lt;BR /&gt;@dlt.table&lt;BR /&gt;def dim_ship():&lt;BR /&gt;return spark.sql(query)&lt;/P&gt;&lt;P&gt;# Optional: Verify the output&lt;BR /&gt;df = spark.sql(query)&lt;/P&gt;&lt;P&gt;# Check schema&lt;BR /&gt;print("Schema:")&lt;BR /&gt;df.printSchema()&lt;/P&gt;&lt;P&gt;# Preview data&lt;BR /&gt;print("\nData Preview:")&lt;BR /&gt;df.show(5)&lt;/P&gt;</description>
      <pubDate>Thu, 16 Jan 2025 16:25:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-tables-creating-table-with-spark-sql-and-everything/m-p/105938#M42322</guid>
      <dc:creator>MadhuB</dc:creator>
      <dc:date>2025-01-16T16:25:31Z</dc:date>
    </item>
    <item>
      <title>Re: Delta Live Tables: Creating table with spark.sql and everything gets ingested at the first column</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-tables-creating-table-with-spark-sql-and-everything/m-p/106044#M42364</link>
      <description>&lt;P&gt;Actually I found the solution by using&amp;nbsp;&lt;STRONG&gt;spark.readStream&amp;nbsp;&lt;/STRONG&gt;to read the external tables a and b into two dataframes and then I just did&amp;nbsp;&amp;nbsp;&lt;STRONG&gt;&lt;SPAN&gt;combined_df &lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;df_a.&lt;/SPAN&gt;&lt;SPAN&gt;union&lt;/SPAN&gt;&lt;SPAN&gt;(df_b)&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN&gt;to create my DLT table. Thank you!&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 17 Jan 2025 09:22:22 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-tables-creating-table-with-spark-sql-and-everything/m-p/106044#M42364</guid>
      <dc:creator>Costas96</dc:creator>
      <dc:date>2025-01-17T09:22:22Z</dc:date>
    </item>
  </channel>
</rss>

