Lesson 3	The Reader and Writer classes
Objective	Examine the Reader and Writer classes for converting character-based data.

Java Readers and Writers

The java.io.Reader and java.io.Writer classes are abstract superclasses for classes that read and write character-based data.
The subclasses are notable for handling the conversion between different character sets.
Input and output streams are fundamentally byte-based, whereas readers and writers are based on characters. In Java, a char is a 16-bit UTF-16 code unit, so a character outside the Basic Multilingual Plane occupies two chars. Other character sets that you may have to read or write use different widths: ASCII and ISO Latin-1 use one byte per character, UTF-16 uses two or four bytes, and UTF-8 uses between one and four bytes. Readers and writers know how to handle all these character sets and many more. The encoding and decoding classes were originally hidden in the sun packages; since Java 1.4 they are also exposed through the java.nio.charset package. They are used internally by the InputStreamReader and OutputStreamWriter classes to convert the bytes used by the streams into the chars used by the readers and writers, and vice versa.
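The byte-to-char bridge described above can be sketched as follows. This is a minimal illustration, not the library's internal implementation: the class name StreamBridgeSketch and its helper methods are hypothetical, and in-memory byte array streams stand in for real files or sockets.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical class name, used only for this illustration.
public class StreamBridgeSketch {

    // Chars go in, UTF-8 bytes come out: OutputStreamWriter does the encoding.
    static byte[] encode(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)) {
            writer.write(text);
        } // closing the writer flushes any buffered bytes
        return bytes.toByteArray();
    }

    // UTF-8 bytes go in, chars come out: InputStreamReader does the decoding.
    static String decode(byte[] data) throws IOException {
        StringBuilder result = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            int c;
            while ((c = reader.read()) != -1) { // -1 signals end of stream
                result.append((char) c);
            }
        }
        return result.toString();
    }

    public static void main(String[] args) throws IOException {
        String text = "caf\u00e9"; // the last letter needs two bytes in UTF-8
        byte[] encoded = encode(text);
        System.out.println(text.length() + " chars, " + encoded.length + " bytes");
        System.out.println("Round trip ok: " + decode(encoded).equals(text));
    }
}
```

Naming the charset explicitly in both constructors keeps the result independent of the platform default.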
You can also convert byte arrays in a particular character set into Unicode strings using two String constructors.

Java String Constructors

public String(byte bytes[], int offset, int length, String encoding) throws UnsupportedEncodingException
 
public String(byte bytes[], String encoding) throws UnsupportedEncodingException

If you have a byte array t that you know is encoded in the ISO 8859-9 character set (essentially ASCII plus Turkish), you would convert it into a Unicode string like this:

String s = new String(t, "8859_9");
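To make the effect of the encoding argument concrete, the sketch below decodes the same four bytes twice, once as ISO 8859-9 and once as ISO 8859-1. It assumes the running JVM ships the ISO-8859-9 charset (it is not one of the charsets every Java platform is required to support), and the class name DecodeTurkishSketch and its decode helper are hypothetical.

```java
import java.io.UnsupportedEncodingException;

// Hypothetical class name, used only for this illustration.
public class DecodeTurkishSketch {

    // Wraps the String(byte[], String) constructor and converts the
    // checked exception into an unchecked one for brevity.
    static String decode(byte[] bytes, String encoding) {
        try {
            return new String(bytes, encoding);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Encoding not available: " + encoding, e);
        }
    }

    public static void main(String[] args) {
        // In ISO 8859-9, 0xFD is dotless i (U+0131) and 0xFE is s-cedilla (U+015F).
        byte[] t = {(byte) 0xFD, (byte) 0xFE, (byte) 0xFD, (byte) 0x6B};

        String turkish = decode(t, "ISO-8859-9");
        String latin1 = decode(t, "ISO-8859-1");

        // Same bytes, different characters, depending on the charset named.
        System.out.println(turkish.equals("\u0131\u015F\u0131k"));
        System.out.println(latin1.equals("\u00FD\u00FE\u00FDk"));
    }
}
```

Choosing the wrong encoding does not raise an error here; it silently produces different characters, which is why the source encoding has to be known in advance.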

The exact list of available encodings varies a little from platform to platform. The table below lists most of the important character set encodings. The same set of encodings is used by readers, which convert all the bytes they read into Unicode chars.

Character Set Encodings

Name	Character Set	Description
8859_1	ISO 8859-1	(Latin-1) Western European languages
8859_2	ISO 8859-2	(Latin Extended-A) In combination with the Latin-1 character set, this set provides for most Central European languages
8859_3	ISO 8859-3	(Latin Extended-B) Esperanto
8859_4	ISO 8859-4	(Latin Extended-C) Baltic
8859_5	ISO 8859-5	Latin/Cyrillic
8859_6	ISO 8859-6	Latin/Arabic
8859_7	ISO 8859-7	Latin/Greek
8859_8	ISO 8859-8	Latin/Hebrew
8859_9	ISO 8859-9	Latin/Turkish
Big5	The Big 5 encoding for Chinese
CNS11643	Chinese
Cp037	EBCDIC American English
Cp273	IBM273
Cp277	EBCDIC Danish/Norwegian
Cp278	EBCDIC Finnish/Swedish
Cp280	EBCDIC Italian
Cp284	EBCDIC Spanish
Cp285	EBCDIC UK English
Cp297	EBCDIC French
Cp420	EBCDIC Arabic 1
Cp424	EBCDIC Hebrew
Cp437	Original DOS IBM PC character set	Mostly ASCII with some extra characters for drawing lines and boxes
Cp500	EBCDIC Flemish/Romulsch
Cp737	DOS Greek
Cp775	DOS Baltic
Cp850	DOS Latin-1
Cp852	DOS Latin-2
Cp855	DOS Cyrillic
Cp856	IBM856
Cp857	DOS Turkish
Cp860	DOS Portuguese
Cp861	DOS Icelandic
Cp862	DOS Hebrew
Cp863	DOS Canadian French
Cp864	DOS Arabic
Cp865	IBM865
Cp866	IBM866
Cp868	EBCDIC Arabic
Cp869	DOS modern Greek
Cp870	EBCDIC Serbian
Cp871	EBCDIC Icelandic
Cp874	Windows Thai
Cp875	IBM875
Cp918	EBCDIC Arabic 2
Cp921	IBM921
Cp922	IBM922
Cp1006	IBM1006
Cp1025	IBM1025
Cp1026	IBM1026
Cp1046	IBM1046
Cp1097	IBM1097
Cp1098	IBM1098
Cp1112	IBM1112
Cp1122	IBM1122
Cp1123	IBM1123
Cp1124	IBM1124
Cp1250	Windows Eastern European	Essentially ISO Latin-2
Cp1251	Windows Cyrillic
Cp1252	Windows Western European	Essentially ISO Latin-1
Cp1253	Windows Greek
Cp1254	Windows Turkish
Cp1255	Windows Hebrew
Cp1256	Windows Arabic
Cp1257	Windows Baltic
Cp1258	Windows Vietnamese
EUCJIS	Japanese EUC
GB2312	Chinese
JIS	Japanese Hiragana
JIS0208	Japanese
KSC5601	Korean
MacArabic	The Macintosh Arabic character set
MacCentralEurope	The Macintosh Central European character set
MacCroatian	The Macintosh Croatian character set
MacCyrillic	The Macintosh Cyrillic character set
MacDingbat	Zapf Dingbats
MacGreek	The Macintosh modern Greek character set
MacHebrew	The Macintosh Hebrew character set
MacIceland	The Macintosh Icelandic character set
MacRoman	The standard Macintosh U.S. English character set
MacRomania	The Macintosh Romanian character set
MacSymbol	The Adobe Symbol font	Includes a complete Greek alphabet in place of the usual Roman letters
MacThai	The Macintosh Thai character set
MacTurkish	The Macintosh Turkish character set
MacUkraine	The Macintosh Ukrainian character set
SJIS	Windows Japanese
UTF8	UCS Transformation Format	8-bit form
Unicode	Normal Unicode
UnicodeBig	Unicode	With big-endian byte order
UnicodeLittle	Unicode	With little-endian byte order
UnicodeBigUnmarked	Unicode	With big-endian byte order but without an FEFF marking the start of Unicode text
UnicodeLittleUnmarked	Unicode	With little-endian byte order but without an FFFE marking the start of Unicode text
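Because the available encodings vary by platform, it can help to ask the running JVM what it supports. The following sketch uses the java.nio.charset.Charset class for that; the class name CharsetLookupSketch and the canonicalName helper are hypothetical. Many of the historical names in the table are accepted as aliases, but which ones resolve depends on the JVM in use.

```java
import java.nio.charset.Charset;
import java.util.SortedMap;

// Hypothetical class name, used only for this illustration.
public class CharsetLookupSketch {

    // Returns the canonical name for an encoding name or alias,
    // or null if this JVM does not support it.
    static String canonicalName(String name) {
        return Charset.isSupported(name) ? Charset.forName(name).name() : null;
    }

    public static void main(String[] args) {
        // Historical names from the table, resolved on this JVM.
        for (String name : new String[] {"8859_1", "UTF8", "Cp1252", "MacRoman"}) {
            System.out.println(name + " -> " + canonicalName(name));
        }

        // Every charset this JVM knows about, keyed by canonical name.
        SortedMap<String, Charset> all = Charset.availableCharsets();
        System.out.println(all.size() + " charsets available");
    }
}
```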

Java I/O

FileReader: This class is used to read character files. Its read() methods are fairly low-level, allowing you to read single characters, the whole stream of characters, or a fixed number of characters. FileReaders are usually wrapped by higher-level objects such as BufferedReaders, which improve performance and provide more convenient ways to work with the data.
BufferedReader: This class is used to make lower-level Reader classes like FileReader more efficient and easier to use. Compared to FileReaders, BufferedReaders read relatively large chunks of data from a file at once and keep this data in a buffer. When you ask for the next character or line of data, it is retrieved from the buffer, which minimizes the number of times that time-intensive, file-read operations are performed. In addition, BufferedReader provides more convenient methods, such as readLine(), that allow you to get the next line of characters from a file.
FileWriter: This class is used to write to character files. Its write() methods allow you to write character(s) or strings to a file. FileWriters are usually wrapped by higher-level Writer objects, such as BufferedWriters or PrintWriters, which provide better performance and higher-level, more flexible methods to write data.
BufferedWriter: This class is used to make lower-level classes like FileWriters more efficient and easier to use. Compared to FileWriters, BufferedWriters write relatively large chunks of data to a file at once, minimizing the number of times that slow, file-writing operations are performed. The BufferedWriter class also provides a newLine() method to create platform-specific line separators automatically.
PrintWriter: This class was enhanced significantly in Java 5. Thanks to the constructors added then (for example, building a PrintWriter directly from a File or from a file-name String), you can often use a PrintWriter on its own in places where you previously had to wrap a FileWriter and/or a BufferedWriter. Methods such as format(), printf(), and append() make PrintWriters very flexible and powerful.
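The wrapping pattern described for these five classes can be sketched as a small round trip. The class name FileRoundTripSketch and its method names are hypothetical, and the example writes to a temporary file so that it does not depend on any particular path. Note that FileReader and FileWriter are used here with the platform default encoding, so the sample text is kept to ASCII.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical class name, used only for this illustration.
public class FileRoundTripSketch {

    // BufferedWriter wraps FileWriter; newLine() emits the platform separator.
    static void writeLines(File file, List<String> lines) throws IOException {
        try (BufferedWriter writer = new BufferedWriter(new FileWriter(file))) {
            for (String line : lines) {
                writer.write(line);
                writer.newLine();
            }
        }
    }

    // PrintWriter adds printf-style formatting; 'true' opens the file in append mode.
    static void appendFormatted(File file, String name, int count) throws IOException {
        try (PrintWriter out = new PrintWriter(new FileWriter(file, true))) {
            out.printf("%s=%d%n", name, count);
        }
    }

    // BufferedReader wraps FileReader; readLine() returns null at end of file.
    static List<String> readLines(File file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("sketch", ".txt");
        file.deleteOnExit();
        writeLines(file, Arrays.asList("first line", "second line"));
        appendFormatted(file, "total", 2);
        System.out.println(readLines(file)); // [first line, second line, total=2]
    }
}
```

The try-with-resources blocks close the outermost wrapper, which in turn flushes and closes the FileWriter or FileReader underneath it.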

Reader Writer Classes
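Every platform has a default character set that is used when no other is explicitly specified. On Western-locale Windows systems this has traditionally been Cp1252, which is close to ISO Latin-1. On the classic Mac OS it was MacRoman. Since JDK 18, the standard Java APIs default to UTF-8 regardless of platform.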
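To see which default applies on a given machine, you can query it at run time. The sketch below is a minimal illustration; the class name DefaultCharsetSketch and the utf8Bytes helper are hypothetical. It also shows the usual way to avoid depending on the default, which is to name the charset explicitly.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical class name, used only for this illustration.
public class DefaultCharsetSketch {

    // Encodes with an explicitly named charset, so the result is the
    // same on every platform regardless of the default.
    static byte[] utf8Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The charset used when none is specified; the value varies by JVM and platform.
        System.out.println("Default charset: " + Charset.defaultCharset().name());

        String text = "\u00e9"; // e with acute accent
        // Depends on the default, so the length may differ between machines.
        System.out.println("Default encoding length: " + text.getBytes().length);
        // Always two bytes, because UTF-8 is named explicitly.
        System.out.println("UTF-8 encoding length: " + utf8Bytes(text).length);
    }
}
```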
