Numpy performance on different clusters

Maximus1
New Contributor II

I've been running some performance tests with Databricks, but I am struggling to make sense of the results.

import time
import numpy as np

def g(a, b):
    """Return the average wall-clock time of one evaluation of a**5 + 2*b."""
    N = 1000
    t1 = time.time()
    for _ in range(N):
        a**5 + 2 * b  # result is discarded; only the timing matters
    t2 = time.time()
    return (t2 - t1) / N

a = np.random.rand(2**24)
b = np.random.rand(2**24)

Evaluating g(a, b) returns around 0.15 seconds per evaluation on a modest "Standard_D8ds_v5" cluster with 8 cores on the driver, while it returns around 0.61 seconds on a more powerful "Standard_E32_v3" with 32 cores on the driver. In other words, the same calculation takes around four times longer on the powerful cluster.

Considering that NumPy uses BLAS, which is supposed to release Python's GIL, I am struggling to find an explanation for what I am seeing.

Can somebody make sense of this and suggest a way to improve the performance on the powerful cluster?

PS I am aware that I could distribute the work in the loop of my function over multiple workers, using a tool like Ray Clusters, but that is not what I am after here.
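One thing worth noting: an expression like a**5 + 2 * b on arrays this large is mostly memory-bandwidth-bound, and each evaluation allocates fresh temporary arrays. A minimal sketch of reducing that allocation traffic with NumPy's out= parameters (array names follow the question; the preallocated buf and out buffers are my own additions, and I use a smaller array size than the question's 2**24 just to keep the sketch light):

```python
import numpy as np

a = np.random.rand(2**22)  # smaller than the question's 2**24; same idea
b = np.random.rand(2**22)

# Preallocate scratch buffers once, outside any timing loop.
buf = np.empty_like(a)
out = np.empty_like(a)

def f_inplace(a, b, buf, out):
    # a**5 + 2*b without allocating fresh temporaries on each call:
    np.power(a, 5, out=buf)        # buf = a**5
    np.multiply(b, 2.0, out=out)   # out = 2*b
    np.add(buf, out, out=out)      # out = a**5 + 2*b
    return out

# Same numerical result as the original expression:
assert np.allclose(f_inplace(a, b, buf, out), a**5 + 2 * b)
```

Whether this actually helps depends on how much of the runtime is allocation versus raw memory traffic, so it is worth benchmarking on both cluster types.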

5 REPLIES

-werners-
Esteemed Contributor III

Multicore is not that relevant here; pure CPU power is. The D8ds has a more recent/more powerful CPU.

Maximus1
New Contributor II

Thanks for the reply!

Why is multicore not relevant here? NumPy does support multi-threading and uses multiple cores on a standard PC.

Where can I see the specs of the CPUs in the cluster that I am using? A factor-of-4 speed difference still seems quite large...
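As far as I understand, the multi-threading you see on a PC usually comes from the BLAS library behind linear-algebra calls like np.dot; plain elementwise ufuncs such as ** and + run single-threaded in a stock NumPy build. A quick sketch to check what your build links against (the output varies by install):

```python
import numpy as np

# Show which BLAS/LAPACK libraries this NumPy build is linked against
# (OpenBLAS and MKL are multi-threaded; the reference BLAS is not).
np.show_config()

# Elementwise work like a**5 + 2*b never goes through BLAS at all,
# so BLAS thread counts do not affect it:
a = np.random.rand(1000)
b = np.random.rand(1000)
elementwise = a**5 + 2 * b      # single-threaded ufunc loop
blas_bound = np.dot(a, b)       # dispatched to BLAS
```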

-werners-
Esteemed Contributor III

Because of the for loop.
If you vectorized the whole code, i.e. did not use a loop, NumPy could go wild.
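For reference, "vectorized" here means pushing the element loop into NumPy's compiled code instead of Python. A small sketch of the contrast (note that the question's per-iteration expression is already vectorized in this sense; its outer loop only repeats the measurement):

```python
import numpy as np

a = np.random.rand(10_000)
b = np.random.rand(10_000)

# Pure-Python element loop: one interpreted iteration per element.
slow = np.empty_like(a)
for i in range(len(a)):
    slow[i] = a[i]**5 + 2 * b[i]

# Vectorized: the same loop runs inside NumPy's compiled ufunc machinery.
fast = a**5 + 2 * b

assert np.allclose(slow, fast)
```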

About the VM specs: you can find all of this in the Azure docs.
The size of the difference depends, I guess, on what the 0.15 is: seconds, minutes, hours, days?
To get a reliable measurement you should run it several times and then compare.
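A sketch of a more reliable measurement with the standard library's timeit module, which runs several independent trials and lets you take the best one (the expression follows the question; I use a smaller array size here just to keep the sketch quick):

```python
import timeit
import numpy as np

a = np.random.rand(2**20)  # smaller than the question's 2**24; same idea
b = np.random.rand(2**20)

# repeat=5 runs five independent trials of `number` evaluations each;
# the minimum over trials is the least noise-contaminated estimate.
trials = timeit.repeat(lambda: a**5 + 2 * b, number=20, repeat=5)
per_call = min(trials) / 20
print(f"best per-call time: {per_call:.6f} s")
```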

Maximus1
New Contributor II

Well, that's the point. I'm doing a calculation with arrays of size 2^24 (roughly 17 million elements), and the loop ensures that the calculation is repeated many times (in this case 1000 times). In the end, the function returns the average time one calculation took.

Because of the array size, I would in fact expect Numpy to go wild, but it doesn't.

Alyceveum25
New Contributor III

Thank you very much.