Still, note that the analysis as written is correct if you consider the case of clusters of different probability mass, p_1, p_2, ..., p_k (which add up > 1). In this case the masses at depth h are not all p^h, but all the product of h choices between the p_i's. Assuming again F(x)=x to gain intuition, the sum turns out to be (p_1+p_2+...+p_k)^h + rk^h, which is actually a natural extension of the uniform case, where the sum of the p_i's is pk > 1.
The difference, however, is that now the tree height is log_{1/pmax} n >= log_k n, and past this latter depth one can assume there are n clusters. Here the extra terms that were wrong appear, demonstrating that it is just a bad idea to have clusters of different size, and supporting the idea of balancing just as claimed in the paper.