Character Encoding Operations in JAVA

); // -28 -72 -83 -26 -106 -121
printBytes(str.getBytes("unicode")); // -2 -1 78 45 101 -121
printBytes(str.getBytes("gb2312")); // -42 -48 -50 -60
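Converting to ISO-8859-1 fails because ISO-8859-1 cannot encode Chinese characters. Each character is replaced with '?' (byte value 63), so the output is 63 63.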
printBytes(str.getBytes("iso8859-1")); // 63 63
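The printBytes helper is called above but is not defined in this excerpt. Below is a minimal sketch that assumes printBytes simply prints each signed byte separated by spaces. The names EncodingDemo and formatBytes are hypothetical. The string is written with Unicode escapes so the result does not depend on the encoding of the .java source file.

```java
import java.io.UnsupportedEncodingException;

public class EncodingDemo {
    // Hypothetical helper: formats bytes as signed decimal values separated
    // by spaces, the format used in the comments above.
    static String formatBytes(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(bytes[i]); // appends the signed value, e.g. -28
        }
        return sb.toString();
    }

    // Assumed behavior of the article's printBytes: one line per byte array.
    static void printBytes(byte[] bytes) {
        System.out.println(formatBytes(bytes));
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String str = "\u4e2d\u6587"; // the two characters U+4E2D U+6587
        printBytes(str.getBytes("utf-8"));     // -28 -72 -83 -26 -106 -121
        printBytes(str.getBytes("unicode"));   // -2 -1 78 45 101 -121 (UTF-16, BOM FE FF first)
        printBytes(str.getBytes("gb2312"));    // -42 -48 -50 -60
        printBytes(str.getBytes("iso8859-1")); // 63 63 ('?' replaces each unmappable char)
    }
}
```

In Java, "unicode" is an alias of the UTF-16 charset. This is why that output begins with the byte order mark -2 -1 (FE FF) before the big-endian code units 4E 2D and 65 87.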
Given the bytes, specifying the correct encoding restores the original String:
byte[] bytes = {-28, -72, -83, -26, -106, -121};
System.out.println(getStringFromBytes(bytes,"utf-8"));
byte[] bytes1 = { -2,-1,78, 45, 101, -121};
System.out.println(getStringFromBytes(bytes1,"unicode"));
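System.out.println(new String(bytes1)); // garbled
// bytes1 is "中文" encoded as "unicode" (UTF-16 with a BOM). The system default encoding
// here is UTF-8, so decoding without specifying a charset produces garbled text.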
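This excerpt also does not define getStringFromBytes. The sketch below assumes it just wraps new String(byte[], Charset). The class name DecodingDemo is hypothetical. To show the garbling deterministically, it decodes the UTF-16 bytes explicitly as UTF-8 and does not rely on the platform default charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodingDemo {
    // Hypothetical sketch of getStringFromBytes: decode bytes with a named charset.
    // Charset.forName throws an unchecked exception for unknown names.
    static String getStringFromBytes(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        byte[] bytes = {-28, -72, -83, -26, -106, -121}; // U+4E2D U+6587 in UTF-8
        byte[] bytes1 = {-2, -1, 78, 45, 101, -121};     // same text in UTF-16 with BOM

        String fromUtf8 = getStringFromBytes(bytes, "utf-8");
        String fromUtf16 = getStringFromBytes(bytes1, "unicode");
        System.out.println(fromUtf8.equals(fromUtf16)); // true

        // Decoding the UTF-16 bytes as UTF-8 (what new String(bytes1) does when
        // the default charset is UTF-8): invalid bytes become U+FFFD.
        String garbled = new String(bytes1, StandardCharsets.UTF_8);
        System.out.println(garbled.contains("\uFFFD")); // true
    }
}
```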

Now let's look at the byte stream of a text file.
We have a UTF-8 encoded file whose content is "中文". Viewed in a hex editor, the file contains EF BB BF E4 B8 AD E6 96 87.

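readBytesFromFile("C:/D/charset/utf8.txt"); reads the following bytes: -17 -69 -65 -28 -72 -83 -26 -106 -121. Of these, -28 -72 -83 -26 -106 -121 are the UTF-8 bytes of "中文", and -17 -69 -65 is the header of the UTF-8 file (the byte order mark, BOM). A negative byte maps to the hex value shown in the editor by adding 256, e.g. E4 = 256 - 28 = 228.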
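The BOM is part of the file's bytes, so code that reads raw bytes may need to detect and skip it. The following small sketch checks for EF BB BF and strips it before decoding. The class BomDemo and its methods are hypothetical illustrations and do not come from the original article.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BomDemo {
    // UTF-8 byte order mark EF BB BF, i.e. -17 -69 -65 as signed Java bytes.
    static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // True if data begins with the UTF-8 BOM.
    static boolean hasUtf8Bom(byte[] data) {
        if (data.length < UTF8_BOM.length) {
            return false;
        }
        for (int i = 0; i < UTF8_BOM.length; i++) {
            if (data[i] != UTF8_BOM[i]) {
                return false;
            }
        }
        return true;
    }

    // Returns a copy without the leading BOM, or the original array if there is none.
    static byte[] stripUtf8Bom(byte[] data) {
        return hasUtf8Bom(data) ? Arrays.copyOfRange(data, UTF8_BOM.length, data.length) : data;
    }

    public static void main(String[] args) {
        // The bytes read from utf8.txt above.
        byte[] fileBytes = {-17, -69, -65, -28, -72, -83, -26, -106, -121};
        System.out.println(hasUtf8Bom(fileBytes)); // true
        String text = new String(stripUtf8Bom(fileBytes), StandardCharsets.UTF_8);
        System.out.println(text.length());         // 2
    }
}
```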

We also have a GB2312 encoded file whose content is "中文". Viewed in a hex editor, the file contains D6 D0 CE C4.

readBytesFromFile("C:/D/charset/gb2312.txt"); reads the bytes -42 -48 -50 -60, which are the GB2312 encoding of "中文". Unlike the UTF-8 file, there is no header.
D6 = 256 - 42 = 214, D0 = 256 - 48 = 208
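The 256 - x arithmetic converts a negative Java byte to the unsigned value that a hex editor displays. Code usually does the same thing by masking with 0xFF. The class ByteValues and its methods below are hypothetical names for illustration.

```java
public class ByteValues {
    // Java bytes are signed (-128..127). Masking with 0xFF gives the unsigned
    // value 0..255, which for a negative byte b equals 256 + b (e.g. 256 - 42 = 214).
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    // Formats bytes as two-digit uppercase hex, as a hex editor shows them.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", toUnsigned(b)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] gb2312Bytes = {-42, -48, -50, -60};
        System.out.println(toHex(gb2312Bytes));     // D6 D0 CE C4
        System.out.println(toUnsigned((byte) -42)); // 214
        System.out.println(toUnsigned((byte) -28)); // 228 (0xE4)
    }
}
```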

To read the text content, you must specify the correct encoding:
readStringFromFile("C:/D/charset/utf8.txt","utf8");
readStringFromFile("C:/D/charset/gb2312.txt","gb2312");
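This excerpt does not define readBytesFromFile or readStringFromFile either. The sketch below assumes they read the whole file with java.nio.file.Files and decode it with the named charset. It is an illustration, not the author's code. Instead of the author's C:/D/charset paths, it writes a temporary file so it can run anywhere. The class name FileEncodingDemo is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class FileEncodingDemo {
    // Hypothetical sketch of readBytesFromFile: returns the raw bytes of the file.
    static byte[] readBytesFromFile(String path) throws IOException {
        return Files.readAllBytes(Paths.get(path));
    }

    // Hypothetical sketch of readStringFromFile: decodes the file with the named charset.
    static String readStringFromFile(String path, String charsetName) throws IOException {
        return new String(readBytesFromFile(path), Charset.forName(charsetName));
    }

    public static void main(String[] args) throws IOException {
        // Recreate the UTF-8 file from above (BOM + U+4E2D U+6587) in a temp file.
        Path utf8File = Files.createTempFile("utf8", ".txt");
        Files.write(utf8File, new byte[]{-17, -69, -65, -28, -72, -83, -26, -106, -121});

        String text = readStringFromFile(utf8File.toString(), "utf8");
        // Java's UTF-8 decoder keeps the BOM as U+FEFF, so the String has 3 chars.
        System.out.println(text.length()); // 3
        Files.delete(utf8File);
    }
}
```

Java's UTF-8 decoder does not remove a leading BOM. Reading utf8.txt this way therefore produces a String that starts with U+FEFF. If that matters, strip the BOM bytes before decoding.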

Character Encoding Operations in JAVA (Part 2)