This work analyzes the performance benefits that can be obtained by employing a self-similar GPU thread map on data-parallel $m$-simplex domains, which are the geometric representation of several interaction problems. The main contributions of this work are (1) the proposal of a new block-space map $H : \mathbb{Z}^m \rightarrow \mathbb{Z}^m$ based on a self-similar set of sub-orthotopes, and (2) its analysis in terms of performance and thread space, from which we find that $H(\omega)$ is both time and space efficient for 2-simplices, but only time efficient for 3-simplices unless the theoretical model is relaxed to allow concurrent parallel spaces. Experimental tests on a 2-simplex domain support the theoretical results, yielding speedups of up to 30% over the standard approach. We also show how the map can exploit GPU tensor cores, gaining further acceleration from their fast matrix-multiply-accumulate (MMA) operations. Finally, we show that extending the map to general $m$-simplices is a non-trivial optimization problem that depends on the choice of two parameters $r$ and $\beta$, for which we provide insights on how to obtain an $H(\omega)$ map that can be $m!$ times more space efficient than a bounding-box approach.
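As a point of reference, the following is a minimal CUDA sketch (our illustration, not code from the paper) of the bounding-box baseline on a 2-simplex, i.e., lower-triangular, domain: an $n \times n$ grid is launched and the threads that fall outside the simplex are discarded. The kernel name and the pairwise interaction computed are placeholders.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder pairwise interaction on a 2-simplex (lower-triangular) domain:
// each pair (i, j) with j <= i is processed exactly once, with results stored
// in packed triangular order of size n(n+1)/2.
__global__ void simplex2_bounding_box(float *out, const float *a, int n) {
    int j = blockIdx.x * blockDim.x + threadIdx.x;  // column
    int i = blockIdx.y * blockDim.y + threadIdx.y;  // row
    // Bounding-box approach: an n x n grid is launched and the roughly
    // n^2/2 threads outside the simplex are discarded here.
    if (i >= n || j > i) return;
    out[(size_t)i * (i + 1) / 2 + j] = a[i] * a[j];  // placeholder interaction
}

int main() {
    const int n = 1024;
    float *a, *out;
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&out, (size_t)n * (n + 1) / 2 * sizeof(float));
    for (int i = 0; i < n; ++i) a[i] = 1.0f;
    dim3 block(16, 16);
    dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);
    simplex2_bounding_box<<<grid, block>>>(out, a, n);
    cudaDeviceSynchronize();
    printf("out[0] = %f\n", out[0]);
    cudaFree(a);
    cudaFree(out);
    return 0;
}
```

For $m = 2$ this baseline wastes roughly half of its threads, which is consistent with the $m! = 2$ space-efficiency factor stated above; it is this discarded thread space that a block-space map such as $H(\omega)$ is designed to avoid.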