<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to implement Source to Target ETL Mapping sheet in PySpark using Delta tables in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-to-implement-source-to-target-etl-mapping-sheet-in-pyspark/m-p/32777#M23910</link>
    <description>&lt;P&gt;I think you can construct SQL queries and use the loop to fill them with your code.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;spark.sql(f"INSERT INTO {Target} ....&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Or even better, use MERGE INTO&lt;/P&gt;</description>
    <pubDate>Tue, 20 Sep 2022 11:01:05 GMT</pubDate>
    <dc:creator>Hubert-Dudek</dc:creator>
    <dc:date>2022-09-20T11:01:05Z</dc:date>
    <item>
      <title>How to implement Source to Target ETL Mapping sheet in PySpark using Delta tables</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-implement-source-to-target-etl-mapping-sheet-in-pyspark/m-p/32770#M23903</link>
      <description>&lt;P&gt;&lt;B&gt;Schema Design :&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;Source &lt;/B&gt;: Miltiple CSV Files like (SourceFile1 ,SourceFile2)&lt;/P&gt;&lt;P&gt;&lt;B&gt;Target &lt;/B&gt;: Delta Table like (Target_Table)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;Excel File&lt;/B&gt; : ETL_Mapping_Sheet&lt;/P&gt;&lt;P&gt;File Columns :&amp;nbsp;SourceTable ,SourceColumn, TargetTable, TargetColum , MappingLogic&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;MappingLogic columns contains (SELECT * FROM TABLE OR&lt;/P&gt;&lt;P&gt;SELECT * FROM SourceFile1 A LEFT JOIN SourceFile2 B&lt;/P&gt;&lt;P&gt;ON &lt;A href="https://a.id/" alt="https://a.id/" target="_blank"&gt;A.ID&lt;/A&gt;&amp;nbsp;= &lt;A href="https://b.id/" alt="https://b.id/" target="_blank"&gt;B.ID&lt;/A&gt; ) like SQL statements.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;Que &lt;/B&gt;: How Can I use the MappingLogic cloumns values in dataframe to build the mapping Logic ??&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Can I Directly execute SQL statement from using Column values??&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;My Approach :&lt;/B&gt;&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Load Excel file into dataframe (df_mapping)&lt;/LI&gt;&lt;LI&gt;Assign values of MappingLogic cloumns(SQL Select statements) into a Variable&lt;/LI&gt;&lt;LI&gt;Call spark.sql(variablename) , it will execute the SQL Query -- Not 100% sure how to do this &lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Updated a sample rows from a ETL mapping sheet :&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="image"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1534i26A0E409CE9EB35C/image-size/large?v=v2&amp;amp;px=999" role="button" title="image" alt="image" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 01 Sep 2022 23:46:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-implement-source-to-target-etl-mapping-sheet-in-pyspark/m-p/32770#M23903</guid>
      <dc:creator>thewfhengineer</dc:creator>
      <dc:date>2022-09-01T23:46:06Z</dc:date>
    </item>
    <item>
      <title>Re: How to implement Source to Target ETL Mapping sheet in PySpark using Delta tables</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-implement-source-to-target-etl-mapping-sheet-in-pyspark/m-p/32771#M23904</link>
      <description>&lt;P&gt;&lt;B&gt;@Aman Sehgal&lt;/B&gt;​&amp;nbsp;&lt;B&gt;  &lt;/B&gt;&amp;nbsp;@Hubert Dudek​&amp;nbsp;@Piper Wilson​&amp;nbsp;@Werner Stinckens​&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Can someone pls check this query ??&lt;/P&gt;</description>
      <pubDate>Mon, 19 Sep 2022 22:48:46 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-implement-source-to-target-etl-mapping-sheet-in-pyspark/m-p/32771#M23904</guid>
      <dc:creator>thewfhengineer</dc:creator>
      <dc:date>2022-09-19T22:48:46Z</dc:date>
    </item>
    <item>
      <title>Re: How to implement Source to Target ETL Mapping sheet in PySpark using Delta tables</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-implement-source-to-target-etl-mapping-sheet-in-pyspark/m-p/32772#M23905</link>
      <description>&lt;P&gt;I struggle to understand the question, so please correct me here:&lt;/P&gt;&lt;P&gt;If I understand correctly you have an excel filled with sql expressions (or field mappings source-sink) and want to use the content of that excel to insert it into spark code?&lt;/P&gt;&lt;P&gt;Technically I think it is possible, you could read the excel file into python or into a spark DF and extract the values (f.e. with the collect() function).&lt;/P&gt;&lt;P&gt;But is this really the way you want to go?  Because basically you put your mapping logic into an excel file, which is opening the gates to hell IMO.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I would rather go for a selectExpr() expression.  Like that the mappings reside into code, you can check it in into git, have versioning etc.&lt;/P&gt;</description>
      <pubDate>Tue, 20 Sep 2022 07:46:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-implement-source-to-target-etl-mapping-sheet-in-pyspark/m-p/32772#M23905</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2022-09-20T07:46:39Z</dc:date>
    </item>
    <item>
      <title>Re: How to implement Source to Target ETL Mapping sheet in PySpark using Delta tables</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-implement-source-to-target-etl-mapping-sheet-in-pyspark/m-p/32773#M23906</link>
      <description>&lt;P&gt;Following on @Werner Stinckens​&amp;nbsp;response, if you can give an example then it will be good.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Ideally you can read each row from excel file in python and pass each column as a parameter to a function.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Eg; def apply_mapping_logic(SourceTable ,SourceColumn, TargetTable, TargetColum , MappingLogic)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Within this function you can define what you would like to do with the mapping logic.&lt;/P&gt;&lt;P&gt;Again, to do this you'll have to come up with a logic based on different types of mapping logics you have in your excel file.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 20 Sep 2022 07:55:34 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-implement-source-to-target-etl-mapping-sheet-in-pyspark/m-p/32773#M23906</guid>
      <dc:creator>AmanSehgal</dc:creator>
      <dc:date>2022-09-20T07:55:34Z</dc:date>
    </item>
    <item>
      <title>Re: How to implement Source to Target ETL Mapping sheet in PySpark using Delta tables</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-implement-source-to-target-etl-mapping-sheet-in-pyspark/m-p/32774#M23907</link>
      <description>&lt;P&gt;Thanks for your response.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;your understanding is correct.&lt;/P&gt;&lt;P&gt;I updated the sample etl mapping in the que.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;As you can see , this mapping sheet contains sql statement to get target values and I have 500 mappings like this so I was thinking to use this mapping sheet directly for the logic&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Don't you think it will be a good approach ??&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 20 Sep 2022 10:48:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-implement-source-to-target-etl-mapping-sheet-in-pyspark/m-p/32774#M23907</guid>
      <dc:creator>thewfhengineer</dc:creator>
      <dc:date>2022-09-20T10:48:20Z</dc:date>
    </item>
    <item>
      <title>Re: How to implement Source to Target ETL Mapping sheet in PySpark using Delta tables</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-implement-source-to-target-etl-mapping-sheet-in-pyspark/m-p/32775#M23908</link>
      <description>&lt;P&gt;@Aman Sehgal​&amp;nbsp;&lt;/P&gt;&lt;P&gt; thanks for your response , I update the sample mapping example &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I already have mapping logic in the mappingsheet so do I still need this extra function now ... can I directly store this SQL logic in a variable and directly exeute like below.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Pyspark code :&lt;/P&gt;&lt;P&gt;variable = df.select("mappinglogic").collect()[0]&lt;/P&gt;&lt;P&gt;df_spark_sql = spark.sql(variable )&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;and after that if I want to perform any further operation , I can easily do it in df_spark_sql dataframe &lt;/P&gt;</description>
      <pubDate>Tue, 20 Sep 2022 10:55:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-implement-source-to-target-etl-mapping-sheet-in-pyspark/m-p/32775#M23908</guid>
      <dc:creator>thewfhengineer</dc:creator>
      <dc:date>2022-09-20T10:55:39Z</dc:date>
    </item>
    <item>
      <title>Re: How to implement Source to Target ETL Mapping sheet in PySpark using Delta tables</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-implement-source-to-target-etl-mapping-sheet-in-pyspark/m-p/32776#M23909</link>
      <description>&lt;P&gt;A wise man once said: violence and excel are never the answer &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The issue with the excel approach is that it will be hard to figure out the data lineage.&lt;/P&gt;&lt;P&gt;you also have to consult 2 locations: the notebook and the excel file.&lt;/P&gt;&lt;P&gt;Also, what if someone else opens the excel file and you have to edit it?  Stuff like that.&lt;/P&gt;&lt;P&gt;IMO excel is good for data analysis, it does not belong in data engineering.&lt;/P&gt;</description>
      <pubDate>Tue, 20 Sep 2022 10:57:54 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-implement-source-to-target-etl-mapping-sheet-in-pyspark/m-p/32776#M23909</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2022-09-20T10:57:54Z</dc:date>
    </item>
    <item>
      <title>Re: How to implement Source to Target ETL Mapping sheet in PySpark using Delta tables</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-implement-source-to-target-etl-mapping-sheet-in-pyspark/m-p/32777#M23910</link>
      <description>&lt;P&gt;I think you can construct SQL queries and use the loop to fill them with your code.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;spark.sql(f"INSERT INTO {Target} ....&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Or even better, use MERGE INTO&lt;/P&gt;</description>
      <pubDate>Tue, 20 Sep 2022 11:01:05 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-implement-source-to-target-etl-mapping-sheet-in-pyspark/m-p/32777#M23910</guid>
      <dc:creator>Hubert-Dudek</dc:creator>
      <dc:date>2022-09-20T11:01:05Z</dc:date>
    </item>
  </channel>
</rss>

