Customizing Character Sets and Text Encodings

Coded Character Sets

A coded character set is a mapping from code point values to the characters they represent. In Pion, a coded character set is defined by subclassing the CharacterSet class. Several character sets are provided by default:

  • EmptyCharacterSet: A character set that has no characters.
  • UniversalCharacterSet: The character set of Unicode.

CharacterSet classes are responsible for validating code points, querying character properties, and transcoding characters between character sets. At a minimum, every character set must implement transcoding to and from the UniversalCharacterSet, which serves as the common intermediate between all other character sets. A character set may also implement transcoding directly to and/or from other character sets, which avoids the intermediate conversion to Unicode and can make transcoding faster.
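The role of Unicode as the common intermediate can be sketched as follows. This is a self-contained illustration under stated assumptions, not Pion's actual interface: SketchCharacterSet, Latin1Set, DigitSet, and Transcode are hypothetical stand-ins invented for this example, and DigitSet is a made-up set whose code points 0 through 9 map to U+0030 through U+0039.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

using CodePoint = std::uint32_t;

// Hypothetical stand-in for a character set interface (not Pion's CharacterSet).
struct SketchCharacterSet {
    virtual ~SketchCharacterSet() = default;
    // Validates a code point of this set.
    virtual bool IsValid(CodePoint cp) const = 0;
    // Mandatory pair: conversion to and from the Unicode intermediate.
    virtual std::optional<CodePoint> ToUnicode(CodePoint cp) const = 0;
    virtual std::optional<CodePoint> FromUnicode(CodePoint unicode) const = 0;
};

// ISO 8859-1: code points 0x00-0xFF coincide with U+0000-U+00FF.
struct Latin1Set final : SketchCharacterSet {
    bool IsValid(CodePoint cp) const override { return cp <= 0xFF; }
    std::optional<CodePoint> ToUnicode(CodePoint cp) const override {
        if (!IsValid(cp)) return std::nullopt;
        return cp;
    }
    std::optional<CodePoint> FromUnicode(CodePoint unicode) const override {
        if (unicode > 0xFF) return std::nullopt;
        return unicode;
    }
};

// Made-up set for illustration: code points 0-9 represent the digits '0'-'9'.
struct DigitSet final : SketchCharacterSet {
    bool IsValid(CodePoint cp) const override { return cp <= 9; }
    std::optional<CodePoint> ToUnicode(CodePoint cp) const override {
        if (!IsValid(cp)) return std::nullopt;
        return cp + 0x30;
    }
    std::optional<CodePoint> FromUnicode(CodePoint unicode) const override {
        if (unicode < 0x30 || unicode > 0x39) return std::nullopt;
        return unicode - 0x30;
    }
};

// Transcodes between any two sets by going through Unicode.
// Returns std::nullopt if the input is invalid or has no counterpart in `to`.
std::optional<CodePoint> Transcode(const SketchCharacterSet& from,
                                   const SketchCharacterSet& to,
                                   CodePoint cp) {
    const std::optional<CodePoint> unicode = from.ToUnicode(cp);
    if (!unicode) return std::nullopt;
    return to.FromUnicode(*unicode);
}

void PivotExample() {
    DigitSet digits;
    Latin1Set latin1;
    assert(Transcode(digits, latin1, 7) == CodePoint{0x37});   // digit 7 -> '7'
    assert(Transcode(latin1, digits, 0x39) == CodePoint{9});   // '9' -> digit 9
    assert(!Transcode(latin1, digits, 0x41).has_value());      // 'A' is not a digit
}
```

Because every set only has to know about Unicode, two sets that have never heard of each other can still be transcoded. A direct conversion between two specific sets would be an optimization layered on top of this path.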

Character sets can also declare that they contain, or are contained by, another character set. This allows a character set to be treated as a subset or superset of another, which makes it possible to short-circuit some transcoding and comparison operations between sets.
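One way such a declaration can enable short-circuiting is sketched below. The types here are hypothetical and deliberately simplified: SketchSet models a character set as a contiguous range of Unicode values with an optional declared superset, which is an assumption of this example and not how Pion represents character sets. The point is only that a declared subset relation lets a "can this text be represented in the target set?" check succeed without inspecting any characters.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using CodePoint = std::uint32_t;

// Hypothetical stand-in (not Pion's CharacterSet): a set is a contiguous
// range of Unicode values plus an optional declared superset.
struct SketchSet {
    CodePoint first;
    CodePoint last;
    const SketchSet* containedBy;  // declared superset, or nullptr

    bool Has(CodePoint unicode) const {
        return unicode >= first && unicode <= last;
    }

    // Follows the chain of declared supersets; a set contains itself.
    bool IsContainedBy(const SketchSet& other) const {
        for (const SketchSet* s = this; s != nullptr; s = s->containedBy) {
            if (s == &other) return true;
        }
        return false;
    }
};

const SketchSet kUnicode{0x0000, 0x10FFFF, nullptr};
const SketchSet kLatin1{0x0000, 0x00FF, &kUnicode};
const SketchSet kAscii{0x0000, 0x007F, &kLatin1};

struct CheckResult {
    bool representable;     // can every character be expressed in the target?
    std::size_t inspected;  // how many characters had to be examined
};

// `text` holds Unicode values of characters known to belong to `from`.
CheckResult CanRepresent(const std::vector<CodePoint>& text,
                         const SketchSet& from,
                         const SketchSet& to) {
    // Short circuit: a declared subset is always representable in its superset.
    if (from.IsContainedBy(to)) return {true, 0};

    std::size_t inspected = 0;
    for (CodePoint cp : text) {
        ++inspected;
        if (!to.Has(cp)) return {false, inspected};
    }
    return {true, inspected};
}

void ContainmentExample() {
    const std::vector<CodePoint> hi{0x48, 0x69};          // "Hi"
    const std::vector<CodePoint> hei{0x48, 0xE9, 0x69};   // "Héi"

    const CheckResult up = CanRepresent(hi, kAscii, kLatin1);
    assert(up.representable && up.inspected == 0);        // no scan needed

    const CheckResult down = CanRepresent(hi, kLatin1, kAscii);
    assert(down.representable && down.inspected == 2);    // full scan

    const CheckResult fail = CanRepresent(hei, kLatin1, kAscii);
    assert(!fail.representable && fail.inspected == 2);   // stops at U+00E9
}
```

Going from a subset to a superset skips the scan entirely, while the opposite direction still has to look at each character.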

Text Encoding Schemes

The TextEncoding class serves as the base class for text encoding schemes.

Using Custom Encodings

Custom text encodings can be used with SimpleString without any additional steps. OpaqueString can use a custom encoding only if the encoding has been registered. An encoding is registered by calling TextEncoding::Register, which constructs the encoding in place.

int main() {
    // Register the custom encoding before any OpaqueString uses it.
    TextEncoding::Register<MyCustomEncoding>();

    // Continue program.

    return 0;
}

A maximum of 32 custom encodings can be registered. If a 33rd registration is attempted, a MaximumTextEncodingSchemesReachedException is thrown. Alternatively, an Attempt tag instance can be passed to the registration function, which then returns a Try result carrying the success state instead of throwing. A text encoding cannot be unregistered once it has been registered. Registration is thread-safe and reentrant; however, it is strongly recommended that all registrations be done during program initialization.
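The difference between the throwing and the tag-based registration paths can be illustrated with a small stand-alone registry. Everything in this sketch is hypothetical: SketchRegistry, SketchEncoding, AttemptTag, and RegistryFullError are stand-ins written for this example, and std::optional takes the place of Pion's Try result, whose real interface is not shown here. Only the limit of 32 and the general behavior are taken from the description above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <optional>
#include <stdexcept>
#include <utility>

// Hypothetical stand-ins; Pion's TextEncoding, Attempt, and Try may differ.
struct SketchEncoding { virtual ~SketchEncoding() = default; };
struct MyCustomEncoding final : SketchEncoding {};

struct AttemptTag {};
inline constexpr AttemptTag Attempt{};

struct RegistryFullError : std::runtime_error {
    RegistryFullError() : std::runtime_error("encoding registry is full") {}
};

class SketchRegistry {
public:
    static constexpr std::size_t kMaxEncodings = 32;

    // Non-throwing path: returns the slot index, or std::nullopt when full.
    template <typename Encoding>
    std::optional<std::size_t> Register(AttemptTag) {
        // Construct outside the lock so a constructor that itself registers
        // another encoding does not deadlock.
        auto encoding = std::make_unique<Encoding>();
        std::lock_guard<std::mutex> lock(mutex_);
        if (count_ == kMaxEncodings) return std::nullopt;
        slots_[count_] = std::move(encoding);
        return count_++;
    }

    // Throwing path: same operation, but a full registry raises an exception.
    template <typename Encoding>
    std::size_t Register() {
        const std::optional<std::size_t> slot = Register<Encoding>(Attempt);
        if (!slot) throw RegistryFullError{};
        return *slot;
    }

    std::size_t Count() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return count_;
    }

private:
    // There is intentionally no way to remove an entry.
    mutable std::mutex mutex_;
    std::array<std::unique_ptr<SketchEncoding>, kMaxEncodings> slots_{};
    std::size_t count_ = 0;
};

void RegistrationExample() {
    SketchRegistry registry;
    for (std::size_t i = 0; i < SketchRegistry::kMaxEncodings; ++i) {
        const std::size_t slot = registry.Register<MyCustomEncoding>();
        assert(slot == i);
        (void)slot;
    }

    // The 33rd attempt reports failure through the result...
    const auto attempt = registry.Register<MyCustomEncoding>(Attempt);
    assert(!attempt.has_value());
    (void)attempt;

    // ...or through an exception when no tag is passed.
    bool threw = false;
    try {
        registry.Register<MyCustomEncoding>();
    } catch (const RegistryFullError&) {
        threw = true;
    }
    assert(threw);
    (void)threw;
}
```

The tag-based overload suits code that wants to handle a full registry as an ordinary outcome, while the throwing overload keeps the common case at program start short.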

Custom text encodings cannot be used with OpaqueString in a constant-evaluated context; they can, however, be used with SimpleString in such a context.