First character read from a text file: ï»¿

unicode java filereader character-encoding

2022-09-04 19:20:08

When I run this code, the output starts with ï»¿, followed by the remaining lines:

try {
    BufferedReader br = new BufferedReader(new FileReader(
            "myFile.txt"));

    String line;
    // Parenthesize the assignment so the result of readLine() is compared to null
    while ((line = br.readLine()) != null) {
        System.out.println(line);
    }
    br.close();

} catch (FileNotFoundException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}

How can I avoid this?

Answer 1

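You get the characters ï»¿ on the first line because the file begins with the bytes EF BB BF, the UTF-8 byte order mark (BOM). When those three bytes are decoded with a single-byte encoding such as Windows-1252 or ISO-8859-1, each one becomes a separate character, and together they display as ï»¿. A text file that starts with a BOM was probably produced by a Windows program such as Notepad.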
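To see where ï»¿ comes from, the following minimal sketch (class and method names are illustrative, not from the question) decodes the three BOM bytes twice: once with ISO-8859-1, standing in for a single-byte platform default, and once with UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BomBytesDemo {
    // The three bytes of the UTF-8 byte order mark.
    static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Decode the BOM bytes with the given charset and return the resulting string.
    static String decodeBom(Charset charset) {
        return new String(UTF8_BOM, charset);
    }

    public static void main(String[] args) {
        // ISO-8859-1 maps each byte to one character: 0xEF -> ï, 0xBB -> », 0xBF -> ¿
        System.out.println(decodeBom(StandardCharsets.ISO_8859_1));
        // UTF-8 combines the three bytes into a single character, U+FEFF
        String utf8 = decodeBom(StandardCharsets.UTF_8);
        System.out.printf("length=%d, code point=U+%04X%n", utf8.length(), (int) utf8.charAt(0));
    }
}
```

The second line printed is `length=1, code point=U+FEFF`. Java's UTF-8 decoder does not drop the BOM, which is why it still has to be skipped explicitly.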

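To fix the problem, read the file explicitly as UTF-8 instead of relying on the platform's default character encoding (for example Windows-1252 on many Windows systems):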

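// requires: import java.io.*; import java.nio.charset.StandardCharsets;
BufferedReader in = new BufferedReader(
    new InputStreamReader(
        new FileInputStream("myFile.txt"),
        StandardCharsets.UTF_8)); // decode explicitly as UTF-8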

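Decoded as UTF-8, the byte sequence EF BB BF (which previously showed up as ï»¿) becomes a single character, U+FEFF. This character is optional: a valid UTF-8 file may or may not begin with it. So we skip the first character only if it is U+FEFF: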

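in.mark(1);                // remember the start, allowing one character of read-ahead
if (in.read() != 0xFEFF) { // the first character is not a BOM...
  in.reset();              // ...so rewind and keep it
}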

You can now continue with the rest of your code.
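Putting both steps together, here is a minimal self-contained sketch that reads the question's `myFile.txt` as UTF-8 and drops a leading U+FEFF before printing the lines. The class and helper names are hypothetical; the helper takes a `Reader` so it can also be exercised without a file.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ReadWithoutBom {
    // Read all lines from a reader, dropping a leading U+FEFF (the decoded BOM) if present.
    static List<String> readLinesSkippingBom(Reader source) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            in.mark(1);                 // remember the start, allowing one char of read-ahead
            if (in.read() != 0xFEFF) {  // not a BOM (or empty input): rewind so nothing is lost
                in.reset();
            }
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Decode the file explicitly as UTF-8, as in Answer 1.
        Reader file = new InputStreamReader(
                new FileInputStream("myFile.txt"), StandardCharsets.UTF_8);
        for (String line : readLinesSkippingBom(file)) {
            System.out.println(line);
        }
    }
}
```

Try-with-resources closes the reader even if an exception is thrown, which replaces the explicit `br.close()` from the question.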

Answer 2

The problem is probably the encoding being used. Try this:

BufferedReader in = new BufferedReader(new InputStreamReader(
      new FileInputStream("yourfile"), "UTF-8"));
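Decoding as UTF-8 on its own changes how the BOM appears but does not remove it: the first line still starts with the invisible character U+FEFF, as Answer 1 points out. The sketch below illustrates this with hypothetical in-memory file contents (the BOM followed by `hello`) instead of a real file.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class Utf8KeepsBom {
    // Read the first line of the given bytes, decoding them as UTF-8 (the approach above).
    static String firstLineAsUtf8(byte[] data) throws IOException {
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8))) {
            return in.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file contents: the UTF-8 BOM followed by "hello".
        byte[] data = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'e', 'l', 'l', 'o'};
        String line = firstLineAsUtf8(data);
        // No more ï»¿, but the BOM survives as one U+FEFF character at index 0.
        System.out.println(line.length());              // 6
        System.out.println(line.charAt(0) == '\uFEFF');  // true
    }
}
```

If the BOM matters (for example when comparing or parsing the first line), it still has to be skipped, as in Answer 1.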