class Poco::UTF16Encoding
Overview
UTF-16 text encoding, as defined in RFC 2781.
#include <UTF16Encoding.h>

class UTF16Encoding: public Poco::TextEncoding
{
public:
    // enums
    enum ByteOrderType;

    // construction
    UTF16Encoding(ByteOrderType byteOrder = NATIVE_BYTE_ORDER);
    UTF16Encoding(int byteOrderMark);

    // methods
    ByteOrderType getByteOrder() const;
    void setByteOrder(ByteOrderType byteOrder);
    void setByteOrder(int byteOrderMark);
    virtual const char* canonicalName() const;
    virtual bool isA(const std::string& encodingName) const;
    virtual const CharacterMap& characterMap() const;
    virtual int convert(const unsigned char* bytes) const;
    virtual int convert(int ch, unsigned char* bytes, int length) const;
    virtual int queryConvert(const unsigned char* bytes, int length) const;
    virtual int sequenceLength(const unsigned char* bytes, int length) const;
};
Inherited Members
public:
    // typedefs
    typedef SharedPtr<TextEncoding> Ptr;
    typedef int CharacterMap[256];

    // enums
    enum
    {
        MAX_SEQUENCE_LENGTH = 6,
    };

    // fields
    static const std::string GLOBAL;

    // methods
    virtual const char* canonicalName() const = 0;
    virtual bool isA(const std::string& encodingName) const = 0;
    virtual const CharacterMap& characterMap() const = 0;
    virtual int convert(const unsigned char* bytes) const;
    virtual int queryConvert(const unsigned char* bytes, int length) const;
    virtual int sequenceLength(const unsigned char* bytes, int length) const;
    virtual int convert(int ch, unsigned char* bytes, int length) const;
    static TextEncoding& byName(const std::string& encodingName);
    static TextEncoding::Ptr find(const std::string& encodingName);
    static void add(TextEncoding::Ptr encoding);
    static void add(TextEncoding::Ptr encoding, const std::string& name);
    static void remove(const std::string& encodingName);
    static TextEncoding::Ptr global(TextEncoding::Ptr encoding);
    static TextEncoding& global();

protected:
    // methods
    static TextEncodingManager& manager();
Detailed Documentation
UTF-16 text encoding, as defined in RFC 2781.
When converting from UTF-16 to Unicode, surrogates are reported as they are. In other words, surrogate pairs are not combined into one Unicode character. When converting from Unicode to UTF-16, however, characters outside the 16-bit range are converted into a surrogate pair (a high surrogate followed by a low surrogate).
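The surrogate arithmetic behind that last sentence can be sketched in a few lines. This is a standalone illustration of the rule from RFC 2781, not code taken from Poco; the helper names toSurrogatePair and fromSurrogatePair are hypothetical.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical helper (not part of Poco): splits a code point in the
// range U+10000..U+10FFFF into its UTF-16 surrogate pair.
// Returns {high surrogate, low surrogate}. The caller is expected to
// pass a code point inside that range.
std::pair<std::uint16_t, std::uint16_t> toSurrogatePair(std::uint32_t codePoint)
{
    const std::uint32_t offset = codePoint - 0x10000;  // 20-bit value
    const std::uint16_t high = static_cast<std::uint16_t>(0xD800 + (offset >> 10));
    const std::uint16_t low = static_cast<std::uint16_t>(0xDC00 + (offset & 0x3FF));
    return {high, low};
}

// Hypothetical inverse: combines a high and a low surrogate back into
// the code point they represent.
std::uint32_t fromSurrogatePair(std::uint16_t high, std::uint16_t low)
{
    return 0x10000
        + ((static_cast<std::uint32_t>(high) - 0xD800) << 10)
        + (static_cast<std::uint32_t>(low) - 0xDC00);
}
```

For example, U+1F600 splits into the high surrogate 0xD83D and the low surrogate 0xDE00. Because this class reports surrogates as they are when decoding, a caller that needs full code points would have to apply something like fromSurrogatePair itself.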
Construction
UTF16Encoding(ByteOrderType byteOrder = NATIVE_BYTE_ORDER)
Creates and initializes the encoding for the given byte order.
UTF16Encoding(int byteOrderMark)
Creates and initializes the encoding for the byte-order indicated by the given byte-order mark, which is the Unicode character 0xFEFF.
Methods
ByteOrderType getByteOrder() const
Returns the byte-order currently in use.
void setByteOrder(ByteOrderType byteOrder)
Sets the byte order.
void setByteOrder(int byteOrderMark)
Sets the byte order according to the given byte order mark, which is the Unicode character 0xFEFF.
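As background for the byte-order-mark overloads, here is a minimal sketch of how the mark U+FEFF reveals the byte order of a UTF-16 stream. It works on raw bytes and is independent of Poco; the enum ByteOrderSketch and the function detectByteOrder are hypothetical names used only for this illustration.

```cpp
#include <cstddef>

// Hypothetical result type for this sketch (not Poco's ByteOrderType).
enum class ByteOrderSketch { BigEndian, LittleEndian, Unknown };

// Inspects the first two bytes of a buffer for the byte order mark.
// U+FEFF serialized as FE FF means big-endian data, and FF FE means
// little-endian data. Anything else is reported as Unknown.
ByteOrderSketch detectByteOrder(const unsigned char* bytes, std::size_t length)
{
    if (bytes == nullptr || length < 2)
        return ByteOrderSketch::Unknown;
    if (bytes[0] == 0xFE && bytes[1] == 0xFF)
        return ByteOrderSketch::BigEndian;
    if (bytes[0] == 0xFF && bytes[1] == 0xFE)
        return ByteOrderSketch::LittleEndian;
    return ByteOrderSketch::Unknown;
}
```

Put differently, a 16-bit unit that reads as 0xFEFF was read in the stream's own byte order, while one that reads as 0xFFFE was read with the bytes swapped.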
virtual const char* canonicalName() const
Returns the canonical name of this encoding, e.g. “ISO-8859-1”. Encoding name comparisons are case insensitive.
virtual bool isA(const std::string& encodingName) const
Returns true if the given name is one of the names of this encoding.
For example, the “ISO-8859-1” encoding is also known as “Latin-1”.
Encoding name comparisons are case insensitive.
virtual const CharacterMap& characterMap() const
Returns the CharacterMap for the encoding.
The CharacterMap should be kept in a static member. As characterMap() can be called frequently, it should be implemented in such a way that it just returns a static map. If the map is built at runtime, this should be done in the constructor.
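One possible way to follow this advice is sketched below, outside of Poco. The class IdentityMapSketch is hypothetical, the CharacterMap typedef is redeclared locally to mirror the inherited one, and the identity mapping is only a placeholder for a real table.

```cpp
// Local stand-in for the inherited typedef, so the sketch compiles alone.
typedef int CharacterMap[256];

// Hypothetical encoding-like class whose map is built once and shared
// by all instances.
class IdentityMapSketch
{
public:
    // Cheap to call repeatedly: only returns a reference to static data.
    const CharacterMap& characterMap() const
    {
        return table().values;
    }

private:
    struct Table
    {
        CharacterMap values;
        Table()
        {
            // Placeholder content: every byte maps to itself.
            for (int i = 0; i < 256; ++i) values[i] = i;
        }
    };

    // Function-local static: constructed on first use, and the
    // initialization is thread-safe since C++11.
    static const Table& table()
    {
        static const Table instance;
        return instance;
    }
};
```

Every instance returns the same array, so repeated calls neither copy nor rebuild the 256 entries.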
virtual int convert(const unsigned char* bytes) const
The convert function is used to convert multibyte sequences; bytes will point to a byte sequence of n bytes where sequenceLength(bytes, length) == -n, with length >= n.
The convert function must return the Unicode scalar value represented by this byte sequence or -1 if the byte sequence is malformed. The default implementation returns (int) bytes[0].
virtual int convert( int ch, unsigned char* bytes, int length ) const
Transform the Unicode character ch into the encoding’s byte sequence.
The method returns the number of bytes used. The method must not use more than length bytes. bytes and length can also be null (zero); in this case only the number of bytes required to represent ch is returned. If the character cannot be converted, 0 is returned and the byte sequence remains unchanged. The default implementation simply returns 0.
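The contract above allows a two-step calling pattern: ask for the required size first, then convert into a buffer of that size. The following standalone sketch imitates that contract for big-endian UTF-16. The function encodeUtf16BE is hypothetical and is not Poco's implementation. Two of its choices are assumptions of the sketch, since the text does not specify them: surrogate code points are rejected, and a buffer that is too small is left untouched while the required size is still returned.

```cpp
// Hypothetical big-endian UTF-16 encoder following the documented
// contract: returns the number of bytes needed for ch, or 0 if ch
// cannot be converted. Writes only when the buffer is large enough.
int encodeUtf16BE(int ch, unsigned char* bytes, int length)
{
    if (ch < 0 || ch > 0x10FFFF) return 0;       // outside Unicode
    if (ch >= 0xD800 && ch <= 0xDFFF) return 0;  // assumption: reject surrogates

    const int needed = (ch < 0x10000) ? 2 : 4;
    if (bytes == nullptr || length < needed)
        return needed;                           // size query, nothing written

    if (needed == 2)
    {
        bytes[0] = static_cast<unsigned char>((ch >> 8) & 0xFF);
        bytes[1] = static_cast<unsigned char>(ch & 0xFF);
    }
    else
    {
        const int offset = ch - 0x10000;
        const int high = 0xD800 + (offset >> 10);   // first unit
        const int low = 0xDC00 + (offset & 0x3FF);  // second unit
        bytes[0] = static_cast<unsigned char>(high >> 8);
        bytes[1] = static_cast<unsigned char>(high & 0xFF);
        bytes[2] = static_cast<unsigned char>(low >> 8);
        bytes[3] = static_cast<unsigned char>(low & 0xFF);
    }
    return needed;
}
```

A caller would typically call encodeUtf16BE(ch, nullptr, 0) to learn the size, make sure the buffer has that many bytes, and then call again with the real buffer.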
virtual int queryConvert( const unsigned char* bytes, int length ) const
The queryConvert function is used to convert single byte characters or multibyte sequences; bytes will point to a byte sequence of length bytes.
The queryConvert function must return the Unicode scalar value represented by this byte sequence, -1 if the byte sequence is malformed, or -n, where n is the number of bytes requested for the sequence, if length is shorter than the sequence. The length of the sequence might not be determined by the first byte, in which case the conversion becomes an iterative process: a first call with length == 1 might return -2, then a second call with length == 2 might return -4. Eventually, the third call with length == 4 should return either a Unicode scalar value, or -1 if the byte sequence is malformed. The default implementation returns (int) bytes[0].
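To make the return-value protocol concrete, here is a standalone sketch of a function with the same shape for big-endian UTF-16. The name queryConvertUtf16BE is hypothetical and the code is not Poco's implementation. In particular, combining a surrogate pair into one scalar value is an assumption of this sketch and may differ from what this class does.

```cpp
// Hypothetical decoder following the protocol described above:
//   >= 0  the decoded Unicode scalar value
//   -1    malformed sequence
//   -n    n bytes are needed before a decision can be made
int queryConvertUtf16BE(const unsigned char* bytes, int length)
{
    if (length < 2) return -2;                       // need one full 16-bit unit

    const int first = (bytes[0] << 8) | bytes[1];
    if (first >= 0xDC00 && first <= 0xDFFF) return -1;   // unpaired low surrogate
    if (first < 0xD800 || first > 0xDBFF) return first;  // single-unit character

    if (length < 4) return -4;                       // high surrogate: need a second unit

    const int second = (bytes[2] << 8) | bytes[3];
    if (second < 0xDC00 || second > 0xDFFF) return -1;   // high surrogate without low

    // Assumption of this sketch: combine the pair into one scalar value.
    return 0x10000 + ((first - 0xD800) << 10) + (second - 0xDC00);
}
```

With the bytes D8 3D DE 00, a call with length == 1 returns -2, a call with length == 2 returns -4, and a call with length == 4 returns 0x1F600, which mirrors the three-step sequence described above.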
virtual int sequenceLength( const unsigned char* bytes, int length ) const
The sequenceLength function is used to get the length of the sequence pointed to by bytes.
The length parameter should be greater than or equal to the length of the sequence.
The sequenceLength function must return the length of the sequence represented by this byte sequence, or a negative value -n if length is shorter than the sequence, where n is the number of bytes requested to determine the length of the sequence. The length of the sequence might not be determined by the first byte, in which case the conversion becomes an iterative process as long as the result is negative: a first call with length == 1 might return -2, then a second call with length == 2 might return -4. Eventually, the third call with length == 4 should return 4. The default implementation returns 1.
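The iterative process can be sketched as a small driver loop that keeps offering more bytes while the result is negative. Both functions below are hypothetical and standalone. sequenceLengthUtf16BE follows the -2, -4, 4 pattern from the description for big-endian data and is not Poco's implementation, and splitSequences is just one way a caller might use it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical length probe for big-endian UTF-16:
// returns 2 or 4, or -n when n bytes are needed to decide.
int sequenceLengthUtf16BE(const unsigned char* bytes, int length)
{
    if (length < 2) return -2;
    const int first = (bytes[0] << 8) | bytes[1];
    const bool isHighSurrogate = first >= 0xD800 && first <= 0xDBFF;
    if (!isHighSurrogate) return 2;
    return (length < 4) ? -4 : 4;
}

// Walks a buffer and records the length of each complete sequence.
// Stops at the first sequence that is cut off by the end of the data.
std::vector<int> splitSequences(const std::vector<unsigned char>& data)
{
    std::vector<int> lengths;
    std::size_t pos = 0;
    while (pos < data.size())
    {
        const int available = static_cast<int>(data.size() - pos);
        int result = sequenceLengthUtf16BE(&data[pos], 1);  // offer one byte first
        // While more bytes are requested and we have them, offer more.
        while (result < 0 && -result <= available)
            result = sequenceLengthUtf16BE(&data[pos], -result);
        if (result < 0) break;                               // truncated input
        lengths.push_back(result);
        pos += static_cast<std::size_t>(result);
    }
    return lengths;
}
```

For the bytes 00 41 D8 3D DE 00 followed by one stray byte, the loop records the lengths 2 and 4 and then stops, because the last byte cannot form a complete sequence.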