Customizing Character Sets and Text Encodings
Coded Character Sets
A coded character set is a mapping from code point values to the characters they represent. Pion defines coded
character sets as subclasses of the CharacterSet class. Several character sets are provided by default:
EmptyCharacterSet: A character set that has no characters.
UniversalCharacterSet: The character set of Unicode.
CharacterSet classes are responsible for validating code points, querying character properties, and transcoding
characters between character sets. Every character set must implement, at a minimum, transcoding to and from the
UniversalCharacterSet, which acts as the common intermediary between all other character sets. A character set
may also implement transcoding directly to and/or from other character sets, which optimizes transcoding by avoiding
the intermediate conversion to Unicode.
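The idea of transcoding through Unicode can be sketched as follows. This is a minimal illustration, not Pion's API: CodePoint, ToyCp1252, ToyLatin9, ToUniversal, FromUniversal, and TranscodeViaUniversal are all hypothetical names, and the two toy sets model only ASCII plus the euro sign.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-ins for illustration; these are not Pion's classes.
using CodePoint = std::uint32_t;

// Toy set modeled on Windows-1252: ASCII plus the euro sign at 0x80.
struct ToyCp1252 {
    static std::optional<CodePoint> ToUniversal(CodePoint cp) {
        if (cp <= 0x7F) return cp;                 // ASCII maps 1:1 onto Unicode
        if (cp == 0x80) return CodePoint{0x20AC};  // euro sign
        return std::nullopt;                       // not modeled in this toy set
    }
    static std::optional<CodePoint> FromUniversal(CodePoint ucp) {
        if (ucp <= 0x7F) return ucp;
        if (ucp == 0x20AC) return CodePoint{0x80};
        return std::nullopt;
    }
};

// Toy set modeled on ISO-8859-15: ASCII plus the euro sign at 0xA4.
struct ToyLatin9 {
    static std::optional<CodePoint> ToUniversal(CodePoint cp) {
        if (cp <= 0x7F) return cp;
        if (cp == 0xA4) return CodePoint{0x20AC};
        return std::nullopt;
    }
    static std::optional<CodePoint> FromUniversal(CodePoint ucp) {
        if (ucp <= 0x7F) return ucp;
        if (ucp == 0x20AC) return CodePoint{0xA4};
        return std::nullopt;
    }
};

// Generic transcoding that uses Unicode as the pivot between any two sets.
// Returns std::nullopt if the character is invalid in From or absent from To.
template <typename From, typename To>
std::optional<CodePoint> TranscodeViaUniversal(CodePoint cp) {
    const std::optional<CodePoint> universal = From::ToUniversal(cp);
    if (!universal) return std::nullopt;
    return To::FromUniversal(*universal);
}
```

With these toy sets, the euro sign at 0x80 in ToyCp1252 transcodes to 0xA4 in ToyLatin9, while an unmodeled code point such as 0x81 yields an empty result. A direct ToyCp1252-to-ToyLatin9 table would skip the intermediate step, which is the kind of optimization described above.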
Character sets can also declare that they contain, or are contained by, another character set. Treating sets as subsets or supersets of one another allows some transcoding and comparison operations between them to be short-circuited.
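One way such a containment declaration could enable short-circuiting is sketched below. The names are hypothetical (EmptySet, AsciiSet, UniversalSet, IsContainedBy, and NeedsRepresentabilityCheck are not Pion identifiers), and the trait-based approach is an assumption rather than a description of how Pion implements it.

```cpp
#include <type_traits>

// Hypothetical tag types standing in for character set classes.
struct EmptySet {};
struct AsciiSet {};
struct UniversalSet {};

// By default a set is only known to be contained by itself.
template <typename Sub, typename Super>
struct IsContainedBy : std::is_same<Sub, Super> {};

// The empty set is contained by every set.
template <typename Super>
struct IsContainedBy<EmptySet, Super> : std::true_type {};

// Declared relationship (assumed for this sketch): ASCII is a subset of Unicode.
template <>
struct IsContainedBy<AsciiSet, UniversalSet> : std::true_type {};

// If From is contained by To, every character of From exists in To, so a
// per-character "can To represent this?" check can be skipped entirely.
template <typename From, typename To>
constexpr bool NeedsRepresentabilityCheck() {
    return !IsContainedBy<From, To>::value;
}
```

Here NeedsRepresentabilityCheck<AsciiSet, UniversalSet>() evaluates to false at compile time, while the reverse direction still requires a check because no containment was declared.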
Text Encoding Schemes
The TextEncoding class serves as the base class for all text encoding schemes.
Using Custom Encodings
Custom text encodings can be used with SimpleString without any additional steps. OpaqueString can only use a
custom encoding that has been registered. An encoding is registered by constructing it in place via the
TextEncoding::Register call.
    int main() {
        // Register the custom encoding so that OpaqueString can use it.
        TextEncoding::template Register<MyCustomEncoding>();
        // Continue program.
        return 0;
    }
A maximum of 32 custom encodings can be registered. If a 33rd registration is attempted, a
MaximumTextEncodingSchemesReachedException is thrown. An Attempt tag instance can be passed to the registration
function to get a Try result carrying the success state instead of potentially throwing an exception. A text encoding
cannot be unregistered once it has been registered. Registration is thread-safe and reentrant; however, it is
strongly recommended that all registrations be done during program initialization.
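The registration rules above (a fixed limit of 32, a throwing and a non-throwing path, and no way to unregister) can be illustrated with a small standalone registry. This is only a sketch under stated assumptions: EncodingRegistry, TryRegister, Count, DemoEncoding, and RegisterMany are hypothetical names, std::length_error stands in for MaximumTextEncodingSchemesReachedException, a bool stands in for the Try result, and treating a repeated registration as success is an assumption. The sketch uses a plain mutex for thread safety and does not model reentrancy.

```cpp
#include <array>
#include <cstddef>
#include <initializer_list>
#include <mutex>
#include <stdexcept>
#include <typeinfo>
#include <utility>

// Hypothetical registry; names and layout are illustrative, not Pion internals.
class EncodingRegistry {
public:
    static constexpr std::size_t kMaxEncodings = 32;

    // Non-throwing path: returns false once all slots are taken.
    // Re-registering a known type is treated as success (an assumption).
    template <typename Encoding>
    bool TryRegister() {
        const std::type_info& id = typeid(Encoding);
        std::lock_guard<std::mutex> lock(mutex_);
        for (std::size_t i = 0; i < count_; ++i) {
            if (*slots_[i] == id) return true;
        }
        if (count_ == kMaxEncodings) return false;
        slots_[count_++] = &id;
        return true;
    }

    // Throwing path: std::length_error stands in for the library's exception.
    template <typename Encoding>
    void Register() {
        if (!TryRegister<Encoding>()) {
            throw std::length_error("maximum number of text encodings reached");
        }
    }

    std::size_t Count() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return count_;
    }

    // Deliberately no Unregister: entries live for the whole program.

private:
    mutable std::mutex mutex_;
    std::array<const std::type_info*, kMaxEncodings> slots_{};
    std::size_t count_ = 0;
};

// Hypothetical family of distinct encoding types, used only for the demo.
template <std::size_t N>
struct DemoEncoding {};

// Attempts to register DemoEncoding<I> for each index I, in order, and
// returns how many attempts succeeded.
template <std::size_t... I>
std::size_t RegisterMany(EncodingRegistry& registry, std::index_sequence<I...>) {
    std::size_t succeeded = 0;
    for (bool ok : {registry.TryRegister<DemoEncoding<I>>()...}) {
        succeeded += ok ? 1 : 0;
    }
    return succeeded;
}
```

Calling RegisterMany(registry, std::make_index_sequence<33>{}) on a fresh registry reports 32 successes: the 33rd type is rejected, and calling Register for it would throw instead. Performing every registration at the start of main keeps the limit easy to reason about, in line with the recommendation above.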
Custom text encodings cannot be used with OpaqueString in a constant-evaluated context; however, they can be used
with SimpleString in such a context.