c++ - Which loops should I parallelize, the outer or the inner ones?


I am writing an image processing filter, and I want to speed up the computation using OpenMP. My pseudocode structure is as follows:

    for (every pixel in the image) {
        // some stuff here
        for (every combination of parameters) {
            // do other stuff here and filter
        }
    }

The code filters every pixel using different parameters and selects the optimal ones.

My question is, which is faster: parallelizing the first loop, so that the pixels are split among the processors, or visiting the pixels sequentially and parallelizing the selection of the different parameters?

I think the question can be made more general: which is faster, giving a large amount of work to each thread, or creating many threads that each do a small amount of work?

I do not care about the implementation details for now, and I think I can handle them with my previous experience using OpenMP. Thank you!

Your goal is to distribute the data evenly across the available processors. You should split the image (the outer loop) evenly, with one thread per processor core. Experiment with fine-grained and coarse-grained parallelism to see which gives the best results. If your number of threads is greater than the number of available cores, you will start to see performance degrade.
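As a rough illustration of the coarse-grained option, here is a minimal sketch that parallelizes only the outer pixel loop and leaves the parameter loop serial inside each thread. The names `filter_cost` and `best_parameter_per_pixel`, the flat `std::vector<double>` image, and the squared-difference cost are all made up for this example; the real filter would take their place. It assumes the work per pixel is roughly uniform, which is why a static schedule is used.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-in for the real filter: scores how well
// parameter `p` suits pixel value `v` (lower is better).
inline double filter_cost(double v, double p) {
    double d = v - p;
    return d * d;
}

// For every pixel, try every parameter and record the index of the best one.
// Only the outer (pixel) loop is parallelized: each thread receives a
// contiguous block of pixels and runs the inner parameter loop serially.
std::vector<int> best_parameter_per_pixel(const std::vector<double>& image,
                                          const std::vector<double>& params) {
    std::vector<int> best(image.size(), -1);
    const long long n = static_cast<long long>(image.size());

    // Each iteration writes only to best[i], so no synchronization is needed.
    #pragma omp parallel for schedule(static)
    for (long long i = 0; i < n; ++i) {
        double best_cost = std::numeric_limits<double>::infinity();
        for (std::size_t k = 0; k < params.size(); ++k) {
            double c = filter_cost(image[i], params[k]);
            if (c < best_cost) {
                best_cost = c;
                best[i] = static_cast<int>(k);
            }
        }
    }
    return best;
}
```

With GCC or Clang this needs `-fopenmp` to actually run in parallel; without it the pragma is ignored and the loop runs serially with the same result. If the cost per pixel varies a lot, trying `schedule(dynamic)` with a chunk size is one way to experiment with finer granularity.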

