How to implement Source to Target ETL Mapping sheet in PySpark using Delta tables
09-01-2022 04:46 PM
Schema design:
Source: multiple CSV files (e.g. SourceFile1, SourceFile2)
Target: Delta table (e.g. Target_Table)
Excel file: ETL_Mapping_Sheet
File columns: SourceTable, SourceColumn, TargetTable, TargetColumn, MappingLogic
The MappingLogic column contains SQL statements such as SELECT * FROM TABLE or
SELECT * FROM SourceFile1 A LEFT JOIN SourceFile2 B
ON A.ID = B.ID.
Question: How can I use the MappingLogic column values in a DataFrame to build the mapping logic?
Can I execute a SQL statement directly from a column value?
My approach:
- Load the Excel file into a DataFrame (df_mapping)
- Assign the values of the MappingLogic column (SQL SELECT statements) to a variable
- Call spark.sql(variablename) to execute the SQL query (not 100% sure how to do this)
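A minimal sketch of the approach above, under a few assumptions that are not stated in the post: the mapping sheet has already been read into a list of row dicts, every row for a given TargetTable carries the same MappingLogic statement, each source CSV has a header row, and results are appended to the target. The helper names `collect_mapping_sql` and `run_mappings` and the `source_paths` argument are hypothetical. `spark.sql` accepts any SQL string, so a value pulled from a column can be passed to it directly once the tables it names exist as views.

```python
def collect_mapping_sql(mapping_rows):
    """Return {TargetTable: SQL text}, one statement per target table.

    mapping_rows is a list of dicts keyed by the sheet's column names.
    Assumption: all rows for one target repeat the same MappingLogic.
    """
    statements = {}
    for row in mapping_rows:
        raw = row.get("MappingLogic")
        # Skip empty cells (None, blank, or the "nan" pandas yields for blanks).
        if raw is None or str(raw).strip().lower() in ("", "nan"):
            continue
        target = str(row["TargetTable"]).strip()
        sql = str(raw).strip().rstrip(";")
        if statements.setdefault(target, sql) != sql:
            raise ValueError(f"Conflicting MappingLogic for {target}")
    return statements


def run_mappings(spark, mapping_rows, source_paths):
    """Register the CSV sources as temp views, then run each mapping query.

    source_paths maps a view name (e.g. "SourceFile1") to a CSV path, so the
    names used inside the SQL text resolve to the loaded files.
    """
    for view_name, path in source_paths.items():
        df = spark.read.option("header", "true").csv(path)
        df.createOrReplaceTempView(view_name)

    written = []
    for target, sql in collect_mapping_sql(mapping_rows).items():
        result = spark.sql(sql)  # the SQL string comes straight from the sheet
        # Assumption: append to the Delta table; use "overwrite" for full reloads.
        result.write.format("delta").mode("append").saveAsTable(target)
        written.append(target)
    return written
```

One way to get `mapping_rows` is `pandas.read_excel(<path to ETL_Mapping_Sheet>).to_dict("records")`, then `run_mappings(spark, mapping_rows, {"SourceFile1": <csv path>, "SourceFile2": <csv path>})`. Two caveats: a `SELECT *` over a join on `A.ID = B.ID` returns two `ID` columns, which a Delta write is likely to reject as duplicates, so the MappingLogic may need explicit column lists; and because the sheet's SQL is executed as-is, the sheet should come from a trusted source.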
Update: sample rows from the ETL mapping sheet:
Labels: Azure Databricks, SQL