How to translate Apache Pig FOREACH GENERATE statement to Spark?

User15787040559
Databricks Employee

If you have the following Apache Pig FOREACH GENERATE statement:

XBCUD_Y_TMP1 = FOREACH (FILTER XBCUD BY act_ind == 'Y') GENERATE cust_hash_key,CONCAT(brd_abbr_cd,ctry_cd) as brdCtry:chararray,updt_dt_hash_key;

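the equivalent code in Apache Spark (PySpark DataFrame API) is:

from pyspark.sql.functions import col, concat

# FILTER ... BY becomes .filter(); FOREACH ... GENERATE becomes .select()
XBCUD_Y_TMP1_DF = (XBCUD_DF
    .filter(col("act_ind") == "Y")
    .select(col("cust_hash_key"),
            concat(col("brd_abbr_cd"), col("ctry_cd")).alias("brdCtry"),
            col("updt_dt_hash_key"))
)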
