GZIPInputStream to String

2022-09-01 05:38:12

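I am trying to convert the gzip-compressed body of an HTTP response to plain text. I have the response as a byte array, which I wrapped in a ByteArrayInputStream and then in a GZIPInputStream. I now want to read the GZIPInputStream and store the final decompressed HTTP response body as a plain-text String.

This code stores the decompressed content in an OutputStream, but I want to store it as a String: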

final int sChunk = 8192;
ByteArrayInputStream bais = new ByteArrayInputStream(responseBytes);
GZIPInputStream gzis = new GZIPInputStream(bais);
byte[] buffer = new byte[sChunk];
int length;
// Copy the decompressed bytes into an existing OutputStream named out
while ((length = gzis.read(buffer, 0, sChunk)) != -1) {
    out.write(buffer, 0, length);
}

Answer 1

To decode the bytes from an InputStream, you can use an InputStreamReader. A BufferedReader then lets you read the stream line by line.

Your code would look like this:

ByteArrayInputStream bais = new ByteArrayInputStream(responseBytes);
GZIPInputStream gzis = new GZIPInputStream(bais);
// Pass the charset explicitly (UTF-8 is assumed here); otherwise the JVM default charset is used
InputStreamReader reader = new InputStreamReader(gzis, StandardCharsets.UTF_8);
BufferedReader in = new BufferedReader(reader);

String line;
while ((line = in.readLine()) != null) {
    System.out.println(line);
}
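
Since the question asks for a String rather than printed output, here is a minimal sketch of the same reader chain that collects the lines instead. The class name GzipLines and both method names are hypothetical, the gzip helper exists only to produce sample input, and wrapping IOException in UncheckedIOException is a choice made for brevity.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipLines {

    // Hypothetical helper: decompresses gzip bytes and joins the decoded lines with '\n'.
    static String decompressLines(byte[] gzipped, Charset charset) {
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(gzipped)), charset))) {
            return in.lines().collect(Collectors.joining("\n"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Sample-data helper (not part of the answer): gzip-compresses a string.
    static byte[] gzip(String text, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(charset));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The gzip stream is closed (and therefore finished) before the bytes are read.
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] responseBytes = gzip("first line\nsecond line", StandardCharsets.UTF_8);
        System.out.println(decompressLines(responseBytes, StandardCharsets.UTF_8));
    }
}
```

Note that reading line by line normalizes line terminators: "\r\n" becomes "\n" and a trailing newline is dropped. If the body must be preserved exactly, reading into a char buffer avoids that.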

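Answer 2

You should rather obtain the response as an InputStream instead of as a byte[]. Then you can decompress it with GZIPInputStream, read it as character data with InputStreamReader, and finally write it as character data into a String using StringWriter.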

String body = null;
String charset = "UTF-8"; // You should determine it based on response header.

try (
    InputStream gzippedResponse = response.getInputStream();
    InputStream ungzippedResponse = new GZIPInputStream(gzippedResponse);
    Reader reader = new InputStreamReader(ungzippedResponse, charset);
    Writer writer = new StringWriter();
) {
    char[] buffer = new char[10240];
    for (int length = 0; (length = reader.read(buffer)) > 0;) {
        writer.write(buffer, 0, length);
    }
    body = writer.toString();
}

// ...
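
The comment on charset says it should come from the response header. One way to do that, as a rough sketch, is to look for a charset parameter in the Content-Type header value. The class and method names below are hypothetical, and how you obtain the header string depends on your HTTP client; the parsing is deliberately simple and not a full media-type parser.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ContentTypeCharset {

    // Hypothetical helper: returns the charset named in a Content-Type value
    // such as "text/html; charset=ISO-8859-1", or the fallback if none is usable.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        String prefix = "charset=";
        for (String part : contentType.split(";")) {
            String param = part.trim();
            // Parameter names are matched case-insensitively.
            if (param.regionMatches(true, 0, prefix, 0, prefix.length())) {
                String name = param.substring(prefix.length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name: use the fallback.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=ISO-8859-1", StandardCharsets.UTF_8));
        System.out.println(charsetOf("application/json", StandardCharsets.UTF_8));
    }
}
```

InputStreamReader also has a constructor that takes a Charset, so the result can be passed to it directly in place of the charset name.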


If your ultimate goal is to parse the response as HTML, then I strongly recommend just using an HTML parser such as Jsoup. Then it is as simple as:

String html = Jsoup.connect("http://google.com").get().html();
