Character Encoding Operations in Java

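In the pipeline Java source file --> javac compilation --> class file --> Java runtime --> getBytes() --> new String() --> display,
an encoding conversion takes place at every step. The conversion always happens; sometimes it simply runs with default parameters.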

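When you write a Java source file you have to choose the source file's encoding, that is, the encoding in which the source text is saved as a file on the operating system.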

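When javac compiles a source file into a class file, it first has to read the source, which means decoding
the file's bytes with some encoding. You can specify it with javac -encoding; if you do not, javac uses the default
encoding (UTF-8 since JDK 18, the operating system's encoding on earlier versions). String constants are written into
the class file as modified UTF-8, and at run time they are exposed as UTF-16, which is what this article calls "unicode".
For example, suppose Test.java defines String str="中文"; and the source file is saved as utf-8.
The bytes of "中文" in Test.java are then in utf-8 form (-28 -72 -83 -26 -106 -121).
Compiling with javac -encoding utf-8 Test.java reads Test.java as utf-8, and once the class is loaded
the string "中文" is held in unicode (UTF-16) form (78 45 101 -121).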
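To compare the two byte forms directly, here is a minimal sketch. The class name SourceVsRuntimeBytes is only illustrative, and the string is written with \u escapes so that the result does not depend on how the snippet itself is saved.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class SourceVsRuntimeBytes {
    // The two characters U+4E2D and U+6587, written as escapes so that the
    // encoding used to save this file does not matter.
    static final String STR = "\u4E2D\u6587";

    // Bytes as they would appear in a source file saved as UTF-8.
    static byte[] utf8Bytes() {
        return STR.getBytes(StandardCharsets.UTF_8);
    }

    // Bytes of the UTF-16 code units, big-endian, without a byte order mark.
    static byte[] utf16beBytes() {
        return STR.getBytes(StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(utf8Bytes()));    // [-28, -72, -83, -26, -106, -121]
        System.out.println(Arrays.toString(utf16beBytes())); // [78, 45, 101, -121]
    }
}
```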
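At run time, "中文" is in unicode form (78 45 101 -121), while input and output use the default encoding unless told otherwise.
Calling str.getBytes() without an encoding converts the string from unicode to the default encoding;
with an encoding, for example str.getBytes("utf-8"), it converts from unicode to utf-8.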
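The difference between the two getBytes() forms can be sketched as follows. GetBytesDemo and its method names are made up for this illustration; the first printed array depends on the machine, which is exactly the point.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class GetBytesDemo {
    static final String STR = "\u4E2D\u6587";

    // Encodes with whatever charset this JVM treats as its default.
    static byte[] encodeWithDefault() {
        return STR.getBytes();
    }

    // Encodes with an explicitly chosen charset; the result is the same everywhere.
    static byte[] encodeWith(Charset cs) {
        return STR.getBytes(cs);
    }

    public static void main(String[] args) {
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println(Arrays.toString(encodeWithDefault()));
        System.out.println(Arrays.toString(encodeWith(StandardCharsets.UTF_8)));
    }
}
```

Passing a Charset constant instead of a name string also avoids the checked UnsupportedEncodingException.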
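When new String(bytes[, encode]) runs without an encoding, the bytes are interpreted using the default encoding;
with an encoding, they are interpreted using that encoding. Either way, the resulting String is still unicode inside Java.
A later String.getBytes([encode]) performs the conversion Unicode characters --> characters in encode --> bytes.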
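A small sketch of what happens when bytes are decoded with the right and the wrong encoding. DecodeDemo is a hypothetical name, and the lengths are printed instead of the strings so the output does not depend on the console encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeDemo {
    // The UTF-8 bytes of the two characters U+4E2D U+6587.
    static final byte[] UTF8_BYTES = {-28, -72, -83, -26, -106, -121};

    static String decode(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        String right = decode(UTF8_BYTES, StandardCharsets.UTF_8);
        String wrong = decode(UTF8_BYTES, StandardCharsets.ISO_8859_1);

        System.out.println(right.length()); // 2
        System.out.println(wrong.length()); // 6, one garbled char per byte

        // ISO-8859-1 maps every byte to exactly one char, so this particular
        // mistake can be undone by encoding back and decoding correctly.
        String repaired = new String(wrong.getBytes(StandardCharsets.ISO_8859_1),
                                     StandardCharsets.UTF_8);
        System.out.println(repaired.equals(right)); // true
    }
}
```

The repair only works because ISO-8859-1 loses no bytes; decoding with most other wrong encodings is not reversible.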
The following code makes these concepts concrete:

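import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStreamReader;

public class IOTest {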
	private static String str = "中文";
	public static void main(String[] args) throws Exception {
		
		System.out.println(System.getProperty("file.encoding"));
		testChar();
		
		printBytes(str.getBytes("utf-8"));
		printBytes(str.getBytes("unicode"));
		printBytes(str.getBytes("gb2312"));
		
		printBytes(str.getBytes("iso8859-1"));
		printBytes("ABC".getBytes("iso8859-1"));
		
		
		byte[] bytes = {-28, -72, -83, -26, -106, -121};
		System.out.println(getStringFromBytes(bytes,"utf-8"));
		
		byte[] bytes1 = { -2,-1,78, 45, 101, -121};
		System.out.println(getStringFromBytes(bytes1,"unicode"));
		System.out.println(new String(bytes1));
		
		
		readBytesFromFile("C:/D/charset/utf8.txt");
		readBytesFromFile("C:/D/charset/gb2312.txt");
		
		readStringFromFile("C:/D/charset/utf8.txt","utf8");
		readStringFromFile("C:/D/charset/gb2312.txt","gb2312");
		
	}

	public static void testChar() throws Exception {
		
		char c = '中';  
	    int i = c;  
	    System.out.println(i);	
	    System.out.println("\u4E2D");
	    printBytes("中".getBytes("unicode"));
	}

	public static void printBytes(byte[] bytes ) {
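		for (int i = 0; i < bytes.length; i++) {
			System.out.print(bytes[i] + " ");
		}
		System.out.println("");
	}

	// Reads the file in 10-byte chunks and prints every byte as a signed decimal.
	public static void readBytesFromFile(String fileName) throws Exception {

		File file = new File(fileName);
		FileInputStream fin = new FileInputStream(file);
		byte[] readBytes = new byte[10];
		while (true) {
			if (fin.available() >= 10) {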
				fin.read(readBytes);
				for (byte b : readBytes) {
					System.out.print(b + " ");
				}
			} else {
				byte[] lastbits = new byte[fin.available()];
				
				fin.read(lastbits);
				for (byte b : lastbits) {
					System.out.print(b + " ");
				}
				
				break;
			}
		}
		System.out.println("");
		fin.close();		
	}

	public static void readStringFromFile(String fileName,String charset) throws Exception {

		File file = new File(fileName);
		FileInputStream fis = new FileInputStream(file);
		InputStreamReader fr = new InputStreamReader(fis,charset);
		BufferedReader br = new BufferedReader(fr);
		String line;
		while ((line=br.readLine()) != null){
			System.out.println(line);
		}

		br.close();
		fr.close();
		fis.close();
	}

}
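
As a possible alternative to readStringFromFile, here is a sketch that uses java.nio.file.Files and try-with-resources. The names ReadWithCharset, readLines and writeThenRead are invented for this example, and it writes its own temporary file rather than assuming the C:/D/charset files exist.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class ReadWithCharset {
    // Decodes the file with the given charset; the reader is closed automatically.
    static List<String> readLines(Path path, Charset cs) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = Files.newBufferedReader(path, cs)) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Writes text to a temporary file in the given charset and reads it back.
    static String writeThenRead(String text, Charset cs) {
        try {
            Path tmp = Files.createTempFile("charset-demo", ".txt");
            try {
                Files.write(tmp, text.getBytes(cs));
                return String.join("\n", readLines(tmp, cs));
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "\u4E2D\u6587";
        System.out.println(writeThenRead(text, StandardCharsets.UTF_8).equals(text)); // true
    }
}
```

Unlike InputStreamReader, a reader obtained from Files.newBufferedReader throws an exception on bytes that are malformed for the charset instead of silently substituting replacement characters.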

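Java supports the encodings of many languages. Inside Java, characters and strings are stored as Unicode (UTF-16): each char takes two bytes, and a character outside the Basic Multilingual Plane takes two chars.
Consider the following code: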
char c = '中';
int i = c;
System.out.println(i); //20013
System.out.println("\u4E2D"); //中
printBytes("中".getBytes("unicode")); //-2 -1 78 45
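20013 in hexadecimal is 4E2D, and its two bytes 4E and 2D are 78 and 45 in decimal. The leading -2 -1 (FE FF) is the byte order mark that the "unicode" charset writes when encoding.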
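The arithmetic above can be checked with a short sketch. CodeUnitDemo, highByte and lowByte are illustrative names; the last lines show a character outside the Basic Multilingual Plane, which needs two chars.

```java
class CodeUnitDemo {
    // Upper 8 bits of a UTF-16 code unit.
    static int highByte(char c) {
        return (c >> 8) & 0xFF;
    }

    // Lower 8 bits of a UTF-16 code unit.
    static int lowByte(char c) {
        return c & 0xFF;
    }

    public static void main(String[] args) {
        char c = '\u4E2D';
        System.out.println((int) c);                        // 20013
        System.out.println(Integer.toHexString(c));         // 4e2d
        System.out.println(highByte(c) + " " + lowByte(c)); // 78 45

        // U+1F600 lies outside the BMP, so it is stored as a surrogate pair.
        String s = new String(Character.toChars(0x1F600));
        System.out.println(s.length());                      // 2
        System.out.println(s.codePointCount(0, s.length())); // 1
    }
}
```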
How to get the system default encoding:
System.out.println(System.getProperty("file.encoding"));
str can be converted from unicode to any other encoding that can represent its characters:
printBytes(str.getBytes("utf-8"));
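
What happens with an encoding that cannot represent the characters is sketched below. EncodabilityDemo and canEncode are names chosen for this example; getBytes() replaces each unmappable character with the charset's replacement byte, which for ISO-8859-1 is '?' (63).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class EncodabilityDemo {
    static final String STR = "\u4E2D\u6587";

    // Asks the charset's encoder whether every character of STR can be represented.
    static boolean canEncode(Charset cs) {
        return cs.newEncoder().canEncode(STR);
    }

    public static void main(String[] args) {
        System.out.println(canEncode(StandardCharsets.UTF_8));      // true
        System.out.println(canEncode(StandardCharsets.ISO_8859_1)); // false

        // Each unmappable character silently becomes a single '?' byte.
        System.out.println(Arrays.toString(STR.getBytes(StandardCharsets.ISO_8859_1))); // [63, 63]
    }
}
```

Checking canEncode first is one way to detect this data loss before it happens, since getBytes() itself reports nothing.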
