Transformations and actions used in Apache Spark

Yogic24
Valued Contributor II

List of transformations and actions used in Apache Spark DataFrames for a Data Engineering role:

Transformations:
Transformations are operations on DataFrames that return a new DataFrame. They are lazily evaluated, meaning they do not execute immediately but build a logical plan that is executed when an action is performed.

1. Basic Transformations:
select(): Select specific columns or expressions.
filter(): Keep only the rows that satisfy a condition.
where(condition): Alias for filter(condition).
withColumn(): Add a column, or replace an existing column of the same name.
drop(*cols): Return a new DataFrame without the given columns.
distinct(): Remove duplicate rows.
sort(): Sort the DataFrame by one or more columns (ascending by default).
orderBy(): Alias for sort().

2. Aggregation and Grouping:
groupBy(): Group rows by column values. It returns a grouped object (GroupedData in PySpark), not a DataFrame.
agg(): Compute one or more aggregate expressions, per group or over the whole DataFrame.
count(): On grouped data, count the rows in each group. On a DataFrame, count() is an action instead.
sum(*cols): Compute the sum of each numeric column per group.
avg(*cols): Compute the average of each numeric column per group.
min(*cols): Compute the minimum of each numeric column per group.
max(*cols): Compute the maximum of each numeric column per group.

3. Joining DataFrames:
join(other, on=None, how=None): Join with another DataFrame using the given join expression. If how is not given, an inner join is performed.
union(): Combine two DataFrames with the same schema. Columns are matched by position and duplicates are kept (like SQL UNION ALL).
intersect(): Return the distinct rows present in both DataFrames.

4. Advanced Transformations:
withColumnRenamed(): Rename a column.
dropDuplicates(): Drop duplicate rows, optionally considering only a subset of columns.
sample(): Return a random sample of roughly the given fraction of rows.
limit(): Limit the result to at most N rows.

5. Window Functions:
over(windowSpec): Apply a window function over a window specification (built with Window.partitionBy() and Window.orderBy()).
row_number().over(windowSpec): Assign a sequential row number, starting at 1, within each window partition.
rank().over(windowSpec): Rank rows within each window partition. Tied rows share a rank and the ranks that follow are skipped.

Actions:
Actions trigger the execution of the transformations and return a result to the driver program or write data to an external storage system.

1. Basic Actions:
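show(): Print the first rows of the DataFrame (20 by default).
collect(): Return all rows to the driver as a list of Row objects. Avoid it on large DataFrames.
count(): Count the number of rows.
take(n): Return the first N rows as a list.
first(): Return the first row.
head(): Return the first row, or the first N rows as a list when N is given.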

2. Writing Data:
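write: Property that returns a DataFrameWriter for saving the DataFrame to external storage. Nothing is written until save() or a format method such as parquet() is called.
write.mode(): Specify the save mode (e.g., overwrite, append).
save(): Save the DataFrame to a specified path. This call triggers the write.
toJSON(): Convert each row to a JSON string. This is a lazy conversion rather than an action; in PySpark it returns an RDD of strings.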

3. Other Actions:
foreach(): Apply a function to each row on the executors, for its side effects.
foreachPartition(): Apply a function once per partition, receiving an iterator over that partition's rows.

 

1 REPLY

ramyabodapati
New Contributor II

Very informative, thanks 😊
