List of transformations and actions used in Apache Spark DataFrames for a Data Engineering role:
Transformations:
Transformations are operations on DataFrames that return a new DataFrame. They are lazily evaluated, meaning they do not execute immediately but build a logical plan that is executed when an action is performed.
1. Basic Transformations:
select(): Select specific columns.
filter(): Filter rows based on a condition.
withColumn(): Add or replace a column.
drop(*cols): Return a new DataFrame without the given columns.
where(condition): Equivalent to filter(condition).
distinct(): Remove duplicate rows.
sort(): Sort the DataFrame by columns.
orderBy(): Alias for sort().
2. Aggregation and Grouping:
groupBy(): Group rows by column values.
agg(): Aggregate data using functions.
count(): Count the rows in each group.
sum(*cols): Computes the sum of each numeric column for each group.
avg(*cols): Computes the average of each numeric column for each group.
min(*cols): Computes the minimum value of each numeric column for each group.
max(*cols): Computes the maximum value of each numeric column for each group.
3. Joining DataFrames:
join(other, on=None, how=None): Joins with another DataFrame using the given join expression.
union(): Combine two DataFrames with the same schema. Columns are matched by position and duplicate rows are kept.
intersect(): Return the distinct rows present in both DataFrames.
4. Advanced Transformations:
withColumnRenamed(): Rename a column.
dropDuplicates(): Drop duplicate rows, optionally considering only a subset of columns.
sample(): Sample an approximate fraction of rows.
limit(): Limit the number of rows.
5. Window Functions:
over(windowSpec): Applies a window function over the given window specification, which is built with Window.partitionBy() and Window.orderBy().
row_number().over(windowSpec): Assigns a row number starting at 1 within a window partition.
rank().over(windowSpec): Provides the rank of rows within a window partition. Tied rows share a rank, and the next rank is skipped.
Actions:
Actions trigger the execution of the transformations and return a result to the driver program or write data to an external storage system.
1. Basic Actions:
show(): Display the first rows of the DataFrame (20 by default).
collect(): Return all rows to the driver as a list. Use with care on large DataFrames.
count(): Count the number of rows.
take(n): Return the first n rows as a list.
first(): Return the first row.
head(): Return the first row, or the first n rows when n is given.
2. Writing Data:
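write: Entry point for writing the DataFrame to external storage. It is a property that returns a DataFrameWriter, and nothing runs until a save method is called.
write.mode(): Specify save mode (e.g., overwrite, append).
save(): Save the DataFrame to a specified path. This call triggers the job.
toJSON(): Convert each row to a JSON string. It is lazy like a transformation, so it needs an action such as collect() to run.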
3. Other Actions:
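foreach(): Apply a function to each row for its side effects. The function runs on the executors and nothing is returned to the driver.
foreachPartition(): Apply a function once per partition, receiving an iterator over that partition's rows.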