I am experimenting with Spark on my local machine. Is there a tool or API available to check the performance of the code I write?
For example, I write:
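```scala
import org.apache.spark.sql.functions.{avg, count, countDistinct, sum}

// invoicesDF is an existing DataFrame loaded from the invoices file
val startTime = System.nanoTime()
invoicesDF
  .select(
    count("*").as("Total Number Of Invoices"),
    sum("Quantity").as("Total Quantity"),
    avg("UnitPrice").as("Avg Unit Price"),
    countDistinct("InvoiceNo").as("Number Of Unique Invoices")
  )
  .show() // show() is the action that actually runs the query
// elapsed wall-clock time in seconds
println((System.nanoTime() - startTime) / 1000000000.0)
```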
I am using System.nanoTime to do these calculations, but I am not sure whether this is the correct way.
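A single System.nanoTime measurement around the first action also captures one-off costs such as JIT warm-up, so one common refinement is to run the workload a few times untimed and then report the median of several timed runs. The sketch below shows that idea in plain Scala; the `Timing` object, the `medianSeconds` name, and its default warm-up/run counts are hypothetical, not a Spark API. (Spark itself does provide `SparkSession.time`, which runs a block and prints its wall-clock time.) Because Spark evaluates lazily, whatever you pass in must include an action such as `show()`, `count()`, or `collect()`, or nothing will actually execute.

```scala
object Timing {
  /** Hypothetical helper: runs `block` `warmups` times without measuring,
    * then `runs` times under System.nanoTime, and returns the median
    * elapsed wall-clock time in seconds.
    */
  def medianSeconds[T](warmups: Int = 2, runs: Int = 5)(block: => T): Double = {
    require(runs > 0, "runs must be positive")
    // Untimed warm-up iterations let the JIT compile hot code paths first.
    for (_ <- 0 until warmups) block
    val samples = (0 until runs).map { _ =>
      val start = System.nanoTime()
      block
      (System.nanoTime() - start) / 1e9
    }.sorted
    // The median is less sensitive to one slow run (e.g. a GC pause) than the mean.
    val mid = samples.length / 2
    if (samples.length % 2 == 1) samples(mid)
    else (samples(mid - 1) + samples(mid)) / 2.0
  }

  def main(args: Array[String]): Unit = {
    // Stand-in workload; with Spark you would pass the DataFrame query
    // (ending in an action) instead.
    val secs = medianSeconds() {
      (1 to 1000000).map(_.toLong).sum
    }
    println(f"median: $secs%.4f s")
  }
}
```

With Spark, the call might look like `Timing.medianSeconds() { invoicesDF.select(...).collect() }`; `collect()` avoids printing the table on every run, and the Spark web UI (by default at http://localhost:4040 while the application runs) shows per-job and per-stage timings as well.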
By the way, this takes around 2.7 seconds using 3 threads on a file with 541,900 records. Is that good performance? The machine has 16 GB and a 4-core, 2.40 GHz Intel processor.
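For a rough sense of scale, the reported numbers work out to the following overall throughput. This is only an approximation, since the 2.7 s may also include reading the file and other one-off work triggered by the first action:

$$
\text{throughput} \approx \frac{541\,900\ \text{records}}{2.7\ \text{s}} \approx 2.0 \times 10^{5}\ \text{records/s}
$$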