Checking spark performance locally

mghildiy — Sat, 15 Oct 2022 04:57:00 GMT

I am experimenting with Spark on my local machine. Is there a tool or API available to check the performance of the code I write?

For example, I write:

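import org.apache.spark.sql.functions.{avg, count, countDistinct, sum}

val startTime = System.nanoTime()
// show() is an action, so it forces Spark to actually run the query
invoicesDF
  .select(
    count("*").as("Total Number Of Invoices"),
    sum("Quantity").as("Total Quantity"),
    avg("UnitPrice").as("Avg Unit Price"),
    countDistinct("InvoiceNo").as("Number Of Unique Invoices")
  )
  .show()

// elapsed wall-clock time in seconds
println((System.nanoTime() - startTime) / 1e9)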

I am using System.nanoTime to do these calculations, but I am not sure if this is the correct way.
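Timing with System.nanoTime around an action is a reasonable start, but a single run also includes one-off costs such as JVM JIT compilation and first-run setup. Here is a minimal sketch in plain Scala of a reusable helper that runs the block a few times untimed, then times several runs and reports the median. `Timing.measure`, `TimingDemo` and the default warm-up/run counts are hypothetical choices, not part of Spark.

```scala
object Timing {
  /** Runs `body` `warmups` times without timing it, then `runs` times with System.nanoTime.
    * Returns the result of the last timed run and the median elapsed time in seconds
    * (the upper of the two middle values when `runs` is even). */
  def measure[T](warmups: Int = 2, runs: Int = 5)(body: => T): (T, Double) = {
    require(runs > 0, "runs must be positive")
    for (_ <- 1 to warmups) body // warm-up: JIT compilation and other first-run costs
    var last: Option[T] = None
    val seconds = (1 to runs).map { _ =>
      val start = System.nanoTime()
      last = Some(body)
      (System.nanoTime() - start) / 1e9
    }.sorted
    (last.get, seconds(seconds.length / 2))
  }
}

object TimingDemo {
  def main(args: Array[String]): Unit = {
    // Any block works; with Spark it must end in an action.
    val (total, secs) = Timing.measure() { (1L to 1000000L).sum }
    println(f"sum=$total median=$secs%.6f s")
  }
}
```

With Spark, the timed block must end in an action such as `collect()`, `count()` or `show()`, for example `Timing.measure() { invoicesDF.count() }`. DataFrame transformations are lazy and would otherwise return almost immediately. For a quick one-off reading, `SparkSession` also provides `spark.time { ... }`, which runs the block and prints the time it took.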

By the way, this takes around 2.7 seconds using 3 threads, on a file with 541,900 records. Is this good performance? The machine has 16 GB of RAM and a 4-core 2.40 GHz Intel processor.
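Whether 2.7 seconds is good depends largely on what that single measurement includes: JVM warm-up, reading and parsing the file, and the aggregation itself. One way to get a steadier number is sketched below, assuming Spark 3.x with Scala. `LocalAggBenchmark`, `syntheticInvoices` and the generated column values are hypothetical stand-ins for `invoicesDF`, built with `spark.range` so the program runs without the original file. The sketch uses `local[3]` to match the 3 threads, caches the input so later runs time only the aggregation, prints the physical plan with `explain()`, and times several `collect()` runs. It also lowers `spark.sql.shuffle.partitions` from its default of 200, on the assumption that a few partitions suit a dataset this small better.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{avg, col, count, countDistinct, sum}

object LocalAggBenchmark {
  // Hypothetical stand-in for invoicesDF: generated values, not the real file.
  def syntheticInvoices(spark: SparkSession, rows: Long): DataFrame =
    spark.range(rows).select(
      (col("id") % 25900).cast("string").as("InvoiceNo"), // arbitrary number of distinct invoices
      (col("id") % 10 + 1).as("Quantity"),
      ((col("id") % 500) / 100.0).as("UnitPrice")
    )

  // The same aggregation as in the question.
  def aggregate(df: DataFrame): DataFrame =
    df.select(
      count("*").as("Total Number Of Invoices"),
      sum("Quantity").as("Total Quantity"),
      avg("UnitPrice").as("Avg Unit Price"),
      countDistinct("InvoiceNo").as("Number Of Unique Invoices")
    )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[3]") // 3 worker threads, as in the question
      .appName("local-agg-benchmark")
      // Assumption: a few shuffle partitions fit a small local dataset better than the default 200.
      .config("spark.sql.shuffle.partitions", "3")
      .getOrCreate()

    val invoices = syntheticInvoices(spark, 541900L).cache()
    invoices.count() // materialize the cache so the timed runs exclude data generation

    val result = aggregate(invoices)
    result.explain() // prints the physical plan; Exchange nodes are shuffles

    for (run <- 1 to 5) {
      val start = System.nanoTime()
      result.collect() // an action, so the query actually executes
      println(f"run $run: ${(System.nanoTime() - start) / 1e9}%.3f s")
    }
    spark.stop()
  }
}
```

Expect the first timed run to be slower than the rest. Comparing the later runs with and without `.cache()` on the real file gives a rough idea of how much of the original 2.7 seconds goes to reading the data rather than aggregating it.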

Re: Checking spark performance locally

Hubert-Dudek — Thu, 20 Oct 2022 16:45:28 GMT

Please check the details of your code (the tasks within each job) in the Spark UI.
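When running locally, the Spark UI is served by default at http://localhost:4040 (or the next free port if 4040 is taken), and only while the application is running. Its Stages tab shows per-task durations. The same task data can also be collected programmatically with a `SparkListener`. The sketch below assumes Spark 3.x. `TaskTimeListener` and `ListenerDemo` are hypothetical names, and the single RDD `reduce` is used only because it runs as exactly one job with one task per partition.

```scala
import java.util.concurrent.{Semaphore, TimeUnit}
import java.util.concurrent.atomic.AtomicLong
import org.apache.spark.scheduler.{SparkListener, SparkListenerJobEnd, SparkListenerTaskEnd}
import org.apache.spark.sql.SparkSession

/** Counts finished tasks and sums their executor run time (milliseconds). */
class TaskTimeListener extends SparkListener {
  val tasks = new AtomicLong(0)
  val executorRunTimeMs = new AtomicLong(0)
  // Released once per finished job; listener events arrive asynchronously,
  // so callers wait on this before reading the counters.
  val jobsEnded = new Semaphore(0)

  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    tasks.incrementAndGet()
    // taskMetrics may be null (e.g. for some failed tasks), so guard it.
    Option(taskEnd.taskMetrics).foreach(m => executorRunTimeMs.addAndGet(m.executorRunTime))
  }

  override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit = jobsEnded.release()
}

object ListenerDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[3]").appName("listener-demo").getOrCreate()
    val sc = spark.sparkContext
    // Address of the live Spark UI; None if the UI is disabled.
    println(s"Spark UI: ${sc.uiWebUrl.getOrElse("disabled")}")

    val listener = new TaskTimeListener
    sc.addSparkListener(listener)

    // A single RDD action over 4 partitions runs as one job with 4 tasks.
    val total = sc.parallelize(1 to 541900, numSlices = 4).map(_.toLong).reduce(_ + _)
    listener.jobsEnded.tryAcquire(30, TimeUnit.SECONDS)

    println(s"sum=$total tasks=${listener.tasks.get} executorRunTime=${listener.executorRunTimeMs.get} ms")
    spark.stop()
  }
}
```

DataFrame queries can run several jobs per action (for example with adaptive query execution), so for those it is simpler to read the totals after all expected jobs have finished, or to inspect them in the UI.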
