Much of their success is due to the use of words as source symbols and a byte-oriented target alphabet. This approach represented a break with traditional statistical compressors, which use characters as source symbols and a bit-oriented target alphabet.
In this work, we go one step further by using phrases as source symbols. We present two new semistatic modelers, which we combine with a dense coding scheme to obtain two new compressors: Pair-Based End-Tagged Dense Code (PETDC), whose source symbols can be either words or pairs of words, and Phrase-Based End-Tagged Dense Code (PhETDC), which considers words and sequences of words (phrases). PETDC compresses English texts to 28-29% of their original size and PhETDC to around 23%, outperforming the optimal byte-oriented zero-order prefix-free word-based semistatic compressor by up to 8 percentage points. Moreover, both PETDC and PhETDC still permit random access and efficient direct searches on the compressed text using fast Boyer-Moore-type algorithms.
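The abstract does not spell out the dense code itself. The following is a minimal Python sketch of the standard End-Tagged Dense Code (ETDC) assignment. It assumes each byte carries 128 untagged values (0-127) for non-final positions and 128 tagged values (128-255) for the final byte, and that symbols are ranked by decreasing frequency. The PETDC and PhETDC modelers, which decide whether a source symbol is a word, a word pair, or a phrase, are not reproduced here: the hypothetical `build_model` just ranks whatever tokens it receives. All function names are illustrative, and Python's `bytes.find` stands in for a Boyer-Moore-type matcher.

```python
from __future__ import annotations

from collections import Counter


def etdc_encode(rank: int) -> bytes:
    """ETDC codeword for a 0-based frequency rank (hypothetical helper)."""
    if rank < 0:
        raise ValueError("rank must be non-negative")
    out = [rank % 128 + 128]      # final byte: high bit set (the end tag)
    rank //= 128
    while rank > 0:               # earlier bytes: high bit clear (0..127)
        rank -= 1
        out.append(rank % 128)
        rank //= 128
    return bytes(reversed(out))


def etdc_decode_stream(data: bytes) -> list[int]:
    """Split a compressed stream into ranks; a tagged byte ends each codeword."""
    ranks, acc = [], 0
    for b in data:
        if b >= 128:
            ranks.append(acc * 128 + (b - 128))
            acc = 0
        else:
            acc = acc * 128 + b + 1
    return ranks


def build_model(symbols: list[str]) -> dict[str, int]:
    """Semistatic model: rank symbols by decreasing frequency.
    Ties keep first-appearance order (guaranteed by Counter.most_common)."""
    return {s: r for r, (s, _) in enumerate(Counter(symbols).most_common())}


def compress(symbols: list[str], model: dict[str, int]) -> bytes:
    """Replace every symbol by the codeword of its rank."""
    return b"".join(etdc_encode(model[s]) for s in symbols)


def search(data: bytes, codeword: bytes) -> list[int]:
    """Direct search in compressed data. A byte-level hit counts only if it
    starts on a codeword boundary: at offset 0 or right after a tagged byte."""
    hits, pos = [], data.find(codeword)
    while pos != -1:
        if pos == 0 or data[pos - 1] >= 128:
            hits.append(pos)
        pos = data.find(codeword, pos + 1)
    return hits


if __name__ == "__main__":
    words = "to be or not to be".split()
    model = build_model(words)  # {'to': 0, 'be': 1, 'or': 2, 'not': 3}
    data = compress(words, model)
    inverse = {r: s for s, r in model.items()}
    print([inverse[r] for r in etdc_decode_stream(data)])  # ['to', 'be', 'or', 'not', 'to', 'be']
    print(search(data, etdc_encode(model["be"])))          # [1, 5]
```

The boundary check in `search` shows why the end tag matters for direct searching. The codeword `[0, 128]` (rank 128) also occurs inside `[5, 0, 128]` (rank 98432). That match is rejected because the byte before it, 5, is untagged and therefore cannot close a codeword.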