用于实际 UTF-8 兼容性检查的小型解决方案:
public static final boolean isUTF8(final byte[] inputBytes) {
final String converted = new String(inputBytes, StandardCharsets.UTF_8);
final byte[] outputBytes = converted.getBytes(StandardCharsets.UTF_8);
return Arrays.equals(inputBytes, outputBytes);
}
您可以检查测试结果:
@Test
public void testEnconding() {
byte[] invalidUTF8Bytes1 = new byte[]{(byte)0b10001111, (byte)0b10111111 };
byte[] invalidUTF8Bytes2 = new byte[]{(byte)0b10101010, (byte)0b00111111 };
byte[] validUTF8Bytes1 = new byte[]{(byte)0b11001111, (byte)0b10111111 };
byte[] validUTF8Bytes2 = new byte[]{(byte)0b11101111, (byte)0b10101010, (byte)0b10111111 };
assertThat(isUTF8(invalidUTF8Bytes1)).isFalse();
assertThat(isUTF8(invalidUTF8Bytes2)).isFalse();
assertThat(isUTF8(validUTF8Bytes1)).isTrue();
assertThat(isUTF8(validUTF8Bytes2)).isTrue();
assertThat(isUTF8("\u24b6".getBytes(StandardCharsets.UTF_8))).isTrue();
}
从 https://codereview.stackexchange.com/questions/59428/validating-utf-8-byte-array 复制测试用例