Write a file in UTF-8 using FileWriter (Java)?

2022-08-31 11:42:16

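I have the following code, but I want it to write the file as UTF-8 so that it handles foreign characters. Is there a way to do this, perhaps with some parameter?

I would really appreciate your help with this. Thanks.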

try {
  BufferedReader reader = new BufferedReader(new FileReader("C:/Users/Jess/My Documents/actresses.list"));
  writer = new BufferedWriter(new FileWriter("C:/Users/Jess/My Documents/actressesFormatted.csv"));
  while( (line = reader.readLine()) != null) {
    //If the line starts with a tab then we just want to add a movie
    //using the current actor's name.
    if(line.length() == 0)
      continue;
    else if(line.charAt(0) == '\t') {
      readMovieLine2(0, line, surname.toString(), forename.toString());
    } //Else we've reached a new actor
    else {
      readActorName(line);
    }
  }
} catch (IOException e) {
  e.printStackTrace();
}

Answer 1

Safe Encoding Constructors

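Getting Java to properly notify you of encoding errors is tricky. For each of InputStreamReader and OutputStreamWriter you must use the most verbose, and least used, of the four alternate constructors in order to receive a proper exception on an encoding failure.

For file I/O, always pass a CharsetEncoder or CharsetDecoder, not a charset name, as the second argument to OutputStreamWriter and InputStreamReader respectively. For the writer, that is: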

  Charset.forName("UTF-8").newEncoder()

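There are other, even fancier possibilities, but none of the three simpler constructor forms report encoding errors as exceptions. These do: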

 OutputStreamWriter char_output = new OutputStreamWriter(
     new FileOutputStream("some_output.utf8"),
     Charset.forName("UTF-8").newEncoder() 
 );

 InputStreamReader char_input = new InputStreamReader(
     new FileInputStream("some_input.utf8"),
     Charset.forName("UTF-8").newDecoder() 
 );
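Applied to the question's read-and-write loop, the two constructors above might be used as in the following sketch. The class name StrictUtf8Copy, the method copyLines, and the temporary files are hypothetical names chosen for illustration; StandardCharsets.UTF_8 (Java 7+) is used in place of Charset.forName("UTF-8"), which should be equivalent.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class StrictUtf8Copy {

    // Copies a text file line by line, decoding and encoding as strict UTF-8.
    // Returns the number of lines copied (hypothetical helper, not from the answer).
    static int copyLines(String inputPath, String outputPath) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                 new FileInputStream(inputPath),
                 StandardCharsets.UTF_8.newDecoder()));   // decoder form: reports bad input
             BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(
                 new FileOutputStream(outputPath),
                 StandardCharsets.UTF_8.newEncoder()))) { // encoder form: reports bad output
            int count = 0;
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(line);
                writer.newLine();
                count++;
            }
            return count;
        }
    }

    public static void main(String[] args) throws IOException {
        // Temporary files stand in for the question's actresses.list and CSV output.
        Path in = Files.createTempFile("actresses", ".list");
        Path out = Files.createTempFile("actressesFormatted", ".csv");
        Files.write(in, "Zo\u00eb\nAna\u00efs\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(copyLines(in.toString(), out.toString()));
    }
}
```

The try-with-resources statement closes both streams even if a CharacterCodingException is thrown partway through the copy.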

As for running with

 $ java -Dfile.encoding=utf8 SomeTrulyRemarkablyLongcLassNameGoeShere

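the problem is that this does not use the full encoder-argument form for the character streams, so you will again miss encoding problems.

Longer Example

Here is a longer example. It manages a process instead of a file, promoting two different input byte streams and one output byte stream to UTF-8 character streams with full exception handling: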

 // this runs a perl script with UTF-8 STD{IN,OUT,ERR} streams
 Process
 slave_process = Runtime.getRuntime().exec("perl -CS script args");

 // fetch his stdin byte stream...
 OutputStream
 __bytes_into_his_stdin  = slave_process.getOutputStream();

 // and make a character stream with exceptions on encoding errors
 OutputStreamWriter
   chars_into_his_stdin  = new OutputStreamWriter(
                             __bytes_into_his_stdin,
         /* DO NOT OMIT! */  Charset.forName("UTF-8").newEncoder()
                         );

 // fetch his stdout byte stream...
 InputStream
 __bytes_from_his_stdout = slave_process.getInputStream();

 // and make a character stream with exceptions on encoding errors
 InputStreamReader
   chars_from_his_stdout = new InputStreamReader(
                             __bytes_from_his_stdout,
         /* DO NOT OMIT! */  Charset.forName("UTF-8").newDecoder()
                         );

 // fetch his stderr byte stream...
 InputStream
 __bytes_from_his_stderr = slave_process.getErrorStream();

 // and make a character stream with exceptions on encoding errors
 InputStreamReader
   chars_from_his_stderr = new InputStreamReader(
                             __bytes_from_his_stderr,
         /* DO NOT OMIT! */  Charset.forName("UTF-8").newDecoder()
                         );

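Now you have three character streams that all raise exceptions on encoding errors, called chars_into_his_stdin, chars_from_his_stdout, and chars_from_his_stderr respectively.

This is only slightly more complicated than what your problem needs; I gave the solution to that in the first half of this answer. The key point is that this is the only way to detect encoding errors.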
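To make the difference concrete, here is a small sketch that decodes the same malformed bytes twice, once through the constructor that takes a Charset and once through the one that takes a CharsetDecoder. The class and method names are hypothetical, and an in-memory byte array stands in for a file or process stream; the byte 0xFF can never appear in valid UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

public class StrictVersusLenient {

    // Reads everything from the reader into a String.
    static String drain(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            sb.append((char) c);
        }
        return sb.toString();
    }

    // Charset form: malformed bytes are silently replaced with U+FFFD.
    static String decodeLenient(byte[] bytes) throws IOException {
        try (Reader r = new InputStreamReader(
                 new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            return drain(r);
        }
    }

    // CharsetDecoder form: malformed bytes raise a CharacterCodingException.
    static boolean strictRejects(byte[] bytes) throws IOException {
        try (Reader r = new InputStreamReader(
                 new ByteArrayInputStream(bytes), StandardCharsets.UTF_8.newDecoder())) {
            drain(r);
            return false;
        } catch (CharacterCodingException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] bad = { 'a', (byte) 0xFF, 'b' };
        System.out.println(decodeLenient(bad).equals("a\uFFFDb")); // data quietly altered
        System.out.println(strictRejects(bad));                    // error reported
    }
}
```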

Just don't get me started on PrintStream eating exceptions.
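For readers unfamiliar with that complaint: PrintStream's print methods never throw IOException; a failure only sets an internal flag that you have to poll with checkError(). A minimal sketch, using a hypothetical always-failing stream named BrokenStream:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintStream;

public class PrintStreamSwallows {

    // Hypothetical sink whose every write fails, to simulate an I/O error.
    static class BrokenStream extends OutputStream {
        @Override
        public void write(int b) throws IOException {
            throw new IOException("simulated write failure");
        }
    }

    // Prints to a failing stream and reports whether PrintStream noticed.
    static boolean printAndCheck() {
        PrintStream ps = new PrintStream(new BrokenStream());
        ps.println("this line is lost"); // no exception reaches the caller
        return ps.checkError();          // the only sign that something went wrong
    }

    public static void main(String[] args) {
        System.out.println(printAndCheck());
    }
}
```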


Answer 2

Ditch FileWriter and FileReader: before Java 11 they are useless here precisely because they do not let you specify the encoding. Instead, use

new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8)

new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8);
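On Java 7 and later, java.nio.file.Files offers a shorter route to the same result, and Java 11 added FileWriter and FileReader constructors that accept a Charset. The sketch below uses Files.newBufferedWriter and Files.newBufferedReader; the class and method names are hypothetical. According to the Javadoc, the reader returned by Files.newBufferedReader throws an IOException when it meets a malformed or unmappable byte sequence.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class NioUtf8 {

    // Writes one line of text to the file as UTF-8, replacing any existing content.
    static void writeLine(Path file, String text) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            writer.write(text);
            writer.newLine();
        }
    }

    // Reads the first line of the file, decoding it as UTF-8.
    static String readFirstLine(Path file) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("utf8-demo", ".txt"); // temporary file for the demo
        writeLine(file, "Zo\u00eb");
        System.out.println(readFirstLine(file).equals("Zo\u00eb"));
    }
}
```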