How to implement Source to Target ETL Mapping sheet in PySpark using Delta tables

thewfhengineer
New Contributor III

Schema Design :

Source : Miltiple CSV Files like (SourceFile1 ,SourceFile2)

Target : Delta Table like (Target_Table)

Excel File : ETL_Mapping_Sheet

File Columns : SourceTable ,SourceColumn, TargetTable, TargetColum , MappingLogic

MappingLogic columns contains (SELECT * FROM TABLE OR

SELECT * FROM SourceFile1 A LEFT JOIN SourceFile2 B

ON A.ID = B.ID ) like SQL statements.

Que : How Can I use the MappingLogic cloumns values in dataframe to build the mapping Logic ??

Can I Directly execute SQL statement from using Column values??

My Approach :

  1. Load Excel file into dataframe (df_mapping)
  2. Assign values of MappingLogic cloumns(SQL Select statements) into a Variable
  3. Call spark.sql(variablename) , it will execute the SQL Query -- Not 100% sure how to do this

Updated a sample rows from a ETL mapping sheet :

image