Lesson 3 | The Reader and Writer classes |
Objective | Examine the Reader and Writer classes for converting character-based data. |
The java.io.Reader and java.io.Writer classes are abstract superclasses for classes that read and write character-based data. A Java char is a two-byte (16-bit) Unicode code unit, but in other character sets that you may have to read or write, characters can have varying widths.
ASCII and ISO Latin-1 use one-byte characters. Unicode in its UTF-16 form, which Java uses for char, stores most characters in two bytes and the remainder as a pair of two-byte units. UTF-8 uses between one and four bytes per character. Readers and writers know how to handle all these character sets and many more.
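The difference in widths is easy to observe by encoding a few characters and counting the resulting bytes. The following is a minimal sketch, not part of the original lesson; the class name EncodedWidths and the helper encodedLength are made up for illustration, and it assumes a runtime that provides java.nio.charset.Charset.

```java
import java.nio.charset.Charset;

public class EncodedWidths {
    // Hypothetical helper: number of bytes needed to store the text
    // in the named encoding.
    static int encodedLength(String text, String encodingName) {
        return text.getBytes(Charset.forName(encodingName)).length;
    }

    public static void main(String[] args) {
        // A, e-acute, the euro sign, and a character outside the 16-bit
        // range (stored in a String as two chars, a surrogate pair).
        String[] samples = {"A", "\u00E9", "\u20AC", "\uD83D\uDE00"};
        for (String s : samples) {
            System.out.println(s.length() + " char(s): UTF-8 = "
                    + encodedLength(s, "UTF-8") + " byte(s), UTF-16BE = "
                    + encodedLength(s, "UTF-16BE") + " byte(s)");
        }
        // e-acute fits in a single byte in ISO Latin-1.
        System.out.println("ISO-8859-1: "
                + encodedLength("\u00E9", "ISO-8859-1") + " byte(s)");
    }
}
```

The same character can therefore occupy one, two, three, or four bytes depending on the encoding, which is why a reader or writer has to know which encoding it is dealing with.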
The encoding and decoding classes themselves are hidden in the sun packages. They are used internally by the InputStreamReader and OutputStreamWriter classes to convert the bytes used by the streams into the chars used by the readers and writers, and vice versa.
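As an illustration of that byte-to-char bridge, here is a small sketch that writes text through an OutputStreamWriter and reads it back through an InputStreamReader, using in-memory streams so it needs no files. The class name StreamBridge and its two helpers are hypothetical, and the canonical name "ISO-8859-9" is used on the assumption that the runtime supports that encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

public class StreamBridge {
    // Hypothetical helper: chars -> bytes through an OutputStreamWriter.
    static byte[] encode(String text, String encodingName) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(bytes, Charset.forName(encodingName))) {
            out.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Hypothetical helper: bytes -> chars through an InputStreamReader.
    static String decode(byte[] data, String encodingName) {
        StringBuilder result = new StringBuilder();
        try (Reader in = new InputStreamReader(
                new ByteArrayInputStream(data), Charset.forName(encodingName))) {
            int c;
            while ((c = in.read()) != -1) {
                result.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        String text = "\u011F"; // Turkish g with breve
        byte[] turkish = encode(text, "ISO-8859-9");
        byte[] utf8 = encode(text, "UTF-8");
        System.out.println("ISO-8859-9 bytes: " + turkish.length);
        System.out.println("UTF-8 bytes: " + utf8.length);
        System.out.println("Round trip ok: " + decode(turkish, "ISO-8859-9").equals(text));
    }
}
```

The calling code only ever sees chars; the choice of encoding is confined to the point where the reader or writer is constructed.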
Byte arrays can also be converted directly into Unicode strings with these two String constructors:
public String(byte bytes[], int offset, int length, String encoding) throws UnsupportedEncodingException
public String(byte bytes[], String encoding) throws UnsupportedEncodingException
For example, if you have a byte array t that you know is encoded with the ISO 8859-9 character set (essentially ASCII plus Turkish), you would convert it into Unicode like this:
String s = new String(t, "8859_9");
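A fuller sketch of the same conversion follows, including the checked exception that both constructors declare. The class TurkishBytes, its helpers, and the sample bytes are invented for illustration; the code uses the canonical name "ISO-8859-9" rather than the short name in the lesson, on the assumption that a current runtime recognizes it.

```java
import java.io.UnsupportedEncodingException;

public class TurkishBytes {
    // Hypothetical helper wrapping String(byte[], String).
    static String decode(byte[] t, String encoding) {
        try {
            return new String(t, encoding);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown encoding: " + encoding, e);
        }
    }

    // Hypothetical helper wrapping String(byte[], int, int, String).
    static String decodeRange(byte[] t, int offset, int length, String encoding) {
        try {
            return new String(t, offset, length, encoding);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown encoding: " + encoding, e);
        }
    }

    public static void main(String[] args) {
        // In ISO 8859-9 these four bytes spell a Turkish word:
        // dotless i, s-cedilla, dotless i, k.
        byte[] t = {(byte) 0xFD, (byte) 0xFE, (byte) 0xFD, 0x6B};

        String turkish = decode(t, "ISO-8859-9");
        String western = decode(t, "ISO-8859-1");

        // Print code points rather than glyphs to avoid console encoding issues.
        for (char c : turkish.toCharArray()) {
            System.out.printf("U+%04X ", (int) c);
        }
        System.out.println();
        for (char c : western.toCharArray()) {
            System.out.printf("U+%04X ", (int) c);
        }
        System.out.println();
    }
}
```

Decoding the same bytes as ISO 8859-1 yields different characters, so the encoding name passed to the constructor has to match how the bytes were actually produced.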
The following table lists the encoding names that can be used when converting bytes to chars.
Name | Character Set | Description |
---|---|---|
8859_1 | ISO 8859-1 | (Latin-1) Western European languages |
8859_2 | ISO 8859-2 | (Latin-2) Most Central and Eastern European languages that use the Latin alphabet |
8859_3 | ISO 8859-3 | (Latin-3) Southern European languages, including Maltese and Esperanto |
8859_4 | ISO 8859-4 | (Latin-4) Baltic languages |
8859_5 | ISO 8859-5 | Latin/Cyrillic |
8859_6 | ISO 8859-6 | Latin/Arabic |
8859_7 | ISO 8859-7 | Latin/Greek |
8859_8 | ISO 8859-8 | Latin/Hebrew |
8859_9 | ISO 8859-9 | Latin/Turkish |
Big5 | The Big 5 encoding for Chinese | |
CNS11643 | Chinese | |
Cp037 | EBCDIC American English | |
Cp273 | IBM273 | |
Cp277 | EBCDIC Danish/Norwegian | |
Cp278 | EBCDIC Finnish/Swedish | |
Cp280 | EBCDIC Italian | |
Cp284 | EBCDIC Spanish | |
Cp285 | EBCDIC UK English | |
Cp297 | EBCDIC French | |
Cp420 | EBCDIC Arabic 1 | |
Cp424 | EBCDIC Hebrew | |
Cp437 | Original DOS IBM PC character set | Mostly ASCII with some extra characters for drawing lines and boxes |
Cp500 | EBCDIC Flemish/Romansh | |
Cp737 | DOS Greek | |
Cp775 | DOS Baltic | |
Cp850 | DOS Latin-1 | |
Cp852 | DOS Latin-2 | |
Cp855 | DOS Cyrillic | |
Cp856 | IBM856 | |
Cp857 | DOS Turkish | |
Cp860 | DOS Portuguese | |
Cp861 | DOS Icelandic | |
Cp862 | DOS Hebrew | |
Cp863 | DOS Canadian French | |
Cp864 | DOS Arabic | |
Cp865 | IBM865 | |
Cp866 | IBM866 | |
Cp868 | EBCDIC Arabic | |
Cp869 | DOS modern Greek | |
Cp870 | EBCDIC Serbian | |
Cp871 | EBCDIC Icelandic | |
Cp874 | Windows Thai | |
Cp875 | IBM875 | |
Cp918 | EBCDIC Arabic 2 | |
Cp921 | IBM921 | |
Cp922 | IBM922 | |
Cp1006 | IBM1006 | |
Cp1025 | IBM1025 | |
Cp1026 | IBM1026 | |
Cp1046 | IBM1046 | |
Cp1097 | IBM1097 | |
Cp1098 | IBM1098 | |
Cp1112 | IBM1112 | |
Cp1122 | IBM1122 | |
Cp1123 | IBM1123 | |
Cp1124 | IBM1124 | |
Cp1250 | Windows Eastern European | Essentially ISO Latin-2 |
Cp1251 | Windows Cyrillic | |
Cp1252 | Windows Western European | Essentially ISO Latin-1 |
Cp1253 | Windows Greek | |
Cp1254 | Windows Turkish | |
Cp1255 | Windows Hebrew | |
Cp1256 | Windows Arabic | |
Cp1257 | Windows Baltic | |
Cp1258 | Windows Vietnamese | |
EUCJIS | Japanese EUC | |
GB2312 | Chinese | |
JIS | Japanese Hiragana | |
JIS0208 | Japanese | |
KSC5601 | Korean | |
MacArabic | The Macintosh Arabic character set | |
MacCentralEurope | The Macintosh Central European character set | |
MacCroatian | The Macintosh Croatian character set | |
MacCyrillic | The Macintosh Cyrillic character set | |
MacDingbat | Zapf Dingbats | |
MacGreek | The Macintosh modern Greek character set | |
MacHebrew | The Macintosh Hebrew character set | |
MacIceland | The Macintosh Icelandic character set | |
MacRoman | The standard Macintosh U.S. English character set | |
MacRomania | The Macintosh Romanian character set | |
MacSymbol | The Adobe Symbol font | Includes a complete Greek alphabet in place of the usual Roman letters |
MacThai | The Macintosh Thai character set | |
MacTurkish | The Macintosh Turkish character set | |
MacUkraine | The Macintosh Ukrainian character set | |
SJIS | Windows Japanese | |
UTF8 | UCS Transformation Format | 8-bit form |
Unicode | Normal Unicode | |
UnicodeBig | Unicode | With big-endian byte order |
UnicodeLittle | Unicode | With little-endian byte order |
UnicodeBigUnmarked | Unicode | With big-endian byte order but without an FEFF marking the start of Unicode text |
UnicodeLittleUnmarked | Unicode | With little-endian byte order but without an FFFE marking the start of Unicode text |
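Which of these names a particular runtime accepts can vary, so it may be worth checking before relying on one. The sketch below is an assumption-laden illustration rather than part of the lesson: it uses java.nio.charset.Charset, which is newer than the String constructors shown earlier, and the class EncodingLookup and helper canonicalName are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

public class EncodingLookup {
    // Hypothetical helper: the runtime's canonical name for an encoding,
    // or null if the name is malformed or not supported.
    static String canonicalName(String name) {
        try {
            return Charset.isSupported(name) ? Charset.forName(name).name() : null;
        } catch (IllegalCharsetNameException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // A few names from the table plus one that should not exist.
        String[] names = {"8859_1", "UTF8", "Cp1252", "MacRoman", "NoSuchEncoding"};
        for (String name : names) {
            String canonical = canonicalName(name);
            System.out.println(name + " -> "
                    + (canonical == null ? "not supported" : canonical));
        }
    }
}
```

Names that resolve are reported under the runtime's canonical spelling, which may differ from the spelling in the table.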