Parallelizing merge sort doesn’t help, or does it?
Merge sort not being a computationally heavy algorithm (lots of moving of data, but not computing much with it) you might expect not much improvement in parallelizing it. I did the experiment and found that on my 8 core processor dividing the work into 4 to 8 threads does speed things up considerably (about a factor 3). I believe this to be the case because it allows for more efficient use of the memory bus and the cache.
Then adding more and more threads leads to significant slowdown.
The linear slowdown as shown in the graph continues at least up to 750 threads. I was expecting an additional slowdown (an increase in slope of the line) when there were so many threads their pages of memory would get swapped out in between them being active (4 KB per page, 8192 KB cache size, at least 5 pages per thread [for the 3 different parts of the array it is sorting, code, and stack], means this would happen before 410 threads). I am not quite sure yet why that does not happen.
Code is again available in my github repository.
