Compression Mode 07 — Frequency Map (since v2.0.0)
This compression mode applies frequency-based byte substitution to compress strings with many repeated characters.
It replaces the most frequent characters with compact single-byte indices and uses an escape sequence for all other characters.
How It Works
- Count character frequencies in the input string.
- Select up to 254 of the most frequent characters.
- Store them as a frequency table prefix at the beginning of the compressed payload.
- Encode the input as a byte stream:
  - frequent character → 1-byte index (`0–253`)
  - other character → escape byte `0xFF` + UTF-16 bytes
- Pack bytes into UTF-16 characters (2 bytes per character).
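The steps above can be sketched roughly as follows. This is an illustration only, not the library's implementation: the function names are hypothetical, "characters" are assumed to be UTF-16 code units, escaped characters are assumed to be written as two big-endian bytes, and an odd-length byte stream is assumed to be padded with a zero byte before packing.

```python
from collections import Counter

ESCAPE = 0xFF    # escape byte named in the steps above
MAX_TABLE = 254  # table indices run from 0 to 253


def code_units(text):
    """Split text into UTF-16 code units (assumed to be the 'characters')."""
    raw = text.encode("utf-16-be")
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]


def build_table(units):
    """Return up to 254 of the most frequent code units, most frequent first."""
    return [unit for unit, _ in Counter(units).most_common(MAX_TABLE)]


def encode_body(units, table):
    """Encode code units as table indices, escaping units missing from the table."""
    index = {unit: i for i, unit in enumerate(table)}
    out = bytearray()
    for unit in units:
        if unit in index:
            out.append(index[unit])      # frequent character: 1-byte index
        else:
            out.append(ESCAPE)           # other character: escape byte ...
            out += unit.to_bytes(2, "big")  # ... plus 2 bytes (byte order assumed)
    return bytes(out)


def pack(data):
    """Pack two bytes into each UTF-16 character (zero padding is an assumption)."""
    if len(data) % 2:
        data += b"\x00"
    return "".join(
        chr(int.from_bytes(data[i:i + 2], "big")) for i in range(0, len(data), 2)
    )


units = code_units("aaabbc")
table = build_table(units)
packed = pack(encode_body(units, table))
```

For `"aaabbc"` the six input characters become six index bytes, which pack into three UTF-16 characters. A real format would also need a way to tell a padding byte apart from index `0` when decoding; this sketch does not address that.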
A splitter string is inserted between the header and the encoded body and is chosen dynamically to avoid collisions with the input.
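One way to picture the dynamic choice is a scan over a fixed candidate list, with the index of the chosen candidate being what a field such as "Splitter index" would record. The candidate list and the `choose_splitter` name below are hypothetical; the actual candidates and selection rule are not given here.

```python
# Hypothetical candidate list; the real candidates are not documented here.
CANDIDATES = ["|", "#", "|#|"]


def choose_splitter(text, candidates=CANDIDATES):
    """Return (index, splitter) for the first candidate that does not occur in text."""
    for i, candidate in enumerate(candidates):
        if candidate not in text:
            return i, candidate
    raise ValueError("no collision-free splitter among the candidates")


index, splitter = choose_splitter("a|b")
```

This sketch only checks the input string. Whether the packed body also has to be checked against the splitter depends on how the payload is split during decoding, which is not specified here.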
Header Character Usage
| Name | Usage |
|---|---|
| Code #1 | 07 |
| Code #2 | Splitter index |
| Code #3 | default |
| i? | false |
| o? | false |
| s? | false |
| b? | default |