How to translate Apache Pig FILTER statement to Spark?

User15787040559
New Contributor III

If you have the following Apache Pig FILTER statement:

XCOCD_ACT_Y = FILTER XCOCD BY act_ind == 'Y';

the equivalent PySpark DataFrame code in Apache Spark is:

from pyspark.sql.functions import col

# Keep only the rows whose act_ind column equals "Y"
XCOCD_ACT_Y_DF = XCOCD_DF.filter(col("act_ind") == "Y")

2 REPLIES

Kaniz
Community Manager

Hi @User15787040559729892342! My name is Kaniz, and I'm a technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers on the Forum have an answer to your question first. Otherwise, I will follow up shortly with a response.

FeliciaWilliam
New Contributor III

Translating an Apache Pig FILTER statement to Spark mostly means mapping Pig's condition syntax onto Spark's DataFrame API (RDDs are also an option, but DataFrames are the usual choice). The DataFrame filter method, also available as where, accepts either a Column expression such as col("act_ind") == "Y" or a SQL expression string such as "act_ind = 'Y'". In PySpark, Pig's AND, OR and NOT become &, | and ~, and each comparison must be wrapped in parentheses. For any Pig-specific functions used in your conditions, check the Spark documentation for the equivalent built-in function.
