This work studies the problem of efficiently mapping GPU threads onto simplex domains. A non-linear map lambda(w) is formulated based on a block-space enumeration principle that reduces the number of thread blocks by a factor of approximately 2x and 6x for 2-simplex and 3-simplex domains, respectively, when compared to the standard approach. Performance results show that lambda(w) is competitive, and in some cases the fastest map, when run on recent GPU architectures such as the Tesla V100, where it reaches a speedup of up to 1.5x in 2-simplex tests. In 3-simplex tests, it reaches a speedup of up to 2.3x for small workloads and up to 1.25x for larger ones. These results make lambda(w) a useful GPU optimization technique for parallel problems that define all-pairs, all-triplets, or nearest-neighbor interactions in a 2-simplex or 3-simplex domain.
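
To make the 2x and 6x block-count factors concrete, the following minimal sketch counts the blocks in a full n x n (or n x n x n) grid and in the discrete simplex embedded in it, assuming the standard approach launches the full bounding-box grid and discards the blocks that fall outside the simplex. It also includes a familiar square-root enumeration of a lower-triangular block domain. This enumeration is only an illustration of the block-space idea and is not the lambda(w) map itself; the function names `bounding_box_blocks`, `simplex_blocks`, and `tri_map` are hypothetical.

```python
import math

def bounding_box_blocks(n, dim):
    # Assumption: the standard approach launches a full n^dim grid of blocks.
    return n ** dim

def simplex_blocks(n, dim):
    # Blocks in a discrete simplex of side n (diagonal included):
    # n(n+1)/2 for dim=2, n(n+1)(n+2)/6 for dim=3.
    return math.comb(n + dim - 1, dim)

def tri_map(k):
    # Illustrative enumeration (not lambda(w)): map a linear block index k
    # to (row, col) in a lower-triangular domain with col <= row.
    row = (math.isqrt(8 * k + 1) - 1) // 2
    col = k - row * (row + 1) // 2
    return row, col

if __name__ == "__main__":
    n = 1024
    for dim in (2, 3):
        ratio = bounding_box_blocks(n, dim) / simplex_blocks(n, dim)
        print(f"{dim}-simplex, n={n}: block ratio = {ratio:.3f}")
    print([tri_map(k) for k in range(6)])
```

The ratios approach 2 and 6 as n grows, which matches the approximate reduction factors stated above; the diagonal blocks account for the small gap at finite n.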