I have a computationally demanding, CPU-bound code that takes 223 hours to finish on a workstation with a single Xeon E5-2667 v2. I would like to determine how many such CPUs I would need to finish the calculation in about 12 hours. Comparing the single- and multiple-CPU benchmark charts, I notice that the benchmark value does not simply scale by a factor of 2. Could anyone suggest how to determine the number of CPUs needed to get down to the desired time for this specific calculation? If we use more CPUs, more RAM will also be needed, so any suggestions on how to estimate the increase in RAM would be welcome too.
How to determine the number of CPUs needed
-
Is the software you are using single-threaded or multi-threaded? And if it is multi-threaded, how many threads does it support while running? If it is a single-threaded operation, then moving to multiple CPUs probably won't make a difference, and you'll just need to find a CPU with higher single-thread performance to decrease your current run time. The single-thread rating for the Xeon E5-2667 v2 is around 2027, but you won't find any single CPU fast enough to cut your run time by 95%, from 223 hours to 12 hours.
-
Just adding more cores and CPUs in the same box typically leads to diminishing returns. Eventually the memory cache and RAM bandwidth place a bottleneck on the overall throughput (if the disk or network bottlenecks don't do that first).
Switching to a cluster of machines typically means writing new network code, and then you have problems with network latency and throughput, as the network is typically vastly slower than internal RAM access.
If the task is purely computational and doesn't need the disk or network, then I would suggest looking at running the task on a GPU instead of the CPU. Four high-end video cards like the Tesla (with associated GPU code) could well give you the 20x performance increase you are looking for.
Intel Phi might also be an option: up to 72 cores and 384 GB of RAM.
-
Thank you for the comment. I'm aware of GPUs and analysed that option, but the code is not "1D", so it could not be distributed across the CUDA cores, as it is a time-dependent/dynamic problem.
The task is purely computational, and I'm able to program it for multiple workers/processors/cores. Intel Phi could be an option; I was not aware of it, so thank you. Is there a way to (at least roughly) estimate how much time I would gain by using the Intel Phi in comparison to my E5-2667 v2? After all, everything is a trade-off between time and money.
-
Your E5-2667 v2 is an 8-core CPU. Can you limit your code to do a short run on 1 thread? If so, take measurements of computations per second for 1 thread, then 2 threads, and so on up to 8 threads. Then draw a graph, which might turn out to be non-linear in shape (e.g. logarithmic). You could try going to 16 threads, but then you are into hyper-threaded cores, which complicates matters. Then extrapolate the line on your graph out to the performance level required. If it is strongly non-linear, you might find you just can't get a CPU with enough cores to meet your performance goals.
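As a rough illustration of the measure-and-extrapolate idea above, here is a minimal Python sketch. It assumes the scaling follows Amdahl's law (a fixed serial fraction `s` plus a perfectly parallel remainder), which is only one possible model and ignores cache, memory-bandwidth and hyper-threading effects. It also assumes the 223-hour run already used all 8 cores. The function names and the timings in the demo are hypothetical placeholders, not measurements from this thread.

```python
import math


def fit_serial_fraction(times):
    """Estimate the Amdahl serial fraction s from short benchmark runs.

    times: dict mapping thread count -> wall-clock time of the same short run.
    Must contain an entry for 1 thread and at least one other thread count.
    Model: T(n) / T(1) = s + (1 - s) / n, i.e.
           T(n) / T(1) - 1/n = s * (1 - 1/n),
    which is fitted by least squares through the origin.
    """
    t1 = times[1]
    num = 0.0
    den = 0.0
    for n, t in times.items():
        if n == 1:
            continue
        x = 1.0 - 1.0 / n
        y = t / t1 - 1.0 / n
        num += x * y
        den += x * x
    return num / den


def cores_needed(s, baseline_cores, required_speedup):
    """Cores needed to run required_speedup times faster than the baseline.

    Returns None when the serial part alone already exceeds the target time,
    i.e. no number of cores can reach the goal under this model.
    """
    t_base = s + (1.0 - s) / baseline_cores  # normalised time on the baseline
    t_target = t_base / required_speedup
    if t_target <= s:
        return None
    return math.ceil((1.0 - s) / (t_target - s))


if __name__ == "__main__":
    # Hypothetical timings in seconds for 1, 2, 4 and 8 threads.
    measured = {1: 800.0, 2: 410.0, 4: 215.0, 8: 118.0}
    s = fit_serial_fraction(measured)
    # Assumption: the 223 h run used all 8 cores; the target is 12 h.
    n = cores_needed(s, baseline_cores=8, required_speedup=223.0 / 12.0)
    print("estimated serial fraction:", round(s, 4))
    print("cores needed:", n if n is not None else "not reachable under this model")
```

Replace the placeholder timings with your own measurements. If the function returns `None`, the fitted serial fraction is too large for any core count to reach 12 hours, which corresponds to the non-linear case described above.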
We are not the best people to ask about Intel Phi; we've never actually used one. I only know that it exists and was designed for tasks similar to this.