dc.description.abstract | The problem addressed in this paper is that we want to sort an integer array a [] of length n on a multi core machine with k cores. Amdahl’s law tells us that the inherent sequential part of any algorithm will in the end dominate and limit the speedup we get from parallelisation of that algorithm. This paper introduces PARL, a parallel left radix sorting algorithm for use on ordinary shared memory multi core machines, that has just one simple statement in its sequential part. It can be seen as a major rework of the Partitioned Parallel Radix Sort (PPR) that was developed for use on a network of communicating machines with separate memories. The PARL algorithm, which was developed independently of the PPR algorithm, has in principle some of the same phases as PPR, but also many significant differences as described in this paper. On a 32 core server, a speedup of 5-12 times is achieved compared with the same sequential ARL algorithm when sorting more than 100 000 numbers and half that speedup on a 4 core PC work station and on two dual core laptops. Since the original sequential ARL algorithm in addition is 3-5 times faster than the standard Java Arrays.sort algorithm, this parallelisation translates to a significant speedup of approx. 10 to 30 for ordinary user programs sorting larger arrays. The reason that we don’t get better results, i.e. a speedup equal to the number of cores when the number of cores exceeds 2, is chiefly explained by a limited memory bandwidth. This thread pool implementation of PARL is also user friendly in that the user calling this PARL algorithm does not have to deal with creating threads themselves; to sort their data, they just create a sorting object and make a call to a thread safe method in that object. | eng |