A simple approach seems to be this:
/**
 * HTML-encodes a UTF-8 string; characters with a code above 127 are left unencoded.
 * Prefer Apache Commons Text StringEscapeUtils if it is available.
 *
 * <pre>
 * escapeHtml("\tIt's time to hack & fun\r<script>alert(\"PWNED\")</script>")
 *    .equals("&#9;It&#39;s time to hack &amp; fun&#13;&lt;script&gt;alert(&quot;PWNED&quot;)&lt;/script&gt;")
 * </pre>
 */
public static String escapeHtml(String rawHtml) {
    int rawHtmlLength = rawHtml.length();
    // reserve 30% extra capacity for the added entities
    int capacity = (int) (rawHtmlLength * 1.3);
    StringBuilder sb = new StringBuilder(capacity);
    for (int i = 0; i < rawHtmlLength; i++) {
        char ch = rawHtml.charAt(i);
        if (ch == '<') {
            sb.append("&lt;");
        } else if (ch == '>') {
            sb.append("&gt;");
        } else if (ch == '"') {
            sb.append("&quot;");
        } else if (ch == '&') {
            sb.append("&amp;");
        } else if (ch < ' ' || ch == '\'') {
            // non-printable ASCII symbols are escaped as numeric entities
            // HTML 4 does not define &apos;, so the single quote is emitted as the numeric entity &#39;
            sb.append("&#").append((int) ch).append(';');
        } else {
            // any non-ASCII char (above 127) is passed through unchanged, so the output stays UTF-8
            sb.append(ch);
        }
    }
    return sb.toString();
}
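However, if you really do need to escape all non-ASCII symbols, i.e. you will transmit the encoded text over a 7-bit encoding, replace the last else branch with: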
        } else {
            // encode non-ASCII characters if needed
            int c = (ch & 0xFFFF);
            if (c > 127) {
                sb.append("&#").append(c).append(';');
            } else {
                sb.append(ch);
            }
        }
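One caveat with the 7-bit variant: it works char by char, so a character outside the Basic Multilingual Plane (an emoji, for example) is stored as a surrogate pair and would come out as two numeric references, neither of which is a valid character reference. A minimal sketch of a code-point-aware version follows. The class name HtmlEscapeSketch and the method name escapeHtmlAscii are made up for illustration and are not part of the answer above; the escaping rules are otherwise the same.

```java
// Hypothetical helper, for illustration only: same rules as escapeHtml above,
// but iterates over Unicode code points instead of UTF-16 chars.
final class HtmlEscapeSketch {
    private HtmlEscapeSketch() {}

    /** Escapes HTML metacharacters, control characters, and every code point above 127. */
    static String escapeHtmlAscii(String rawHtml) {
        StringBuilder sb = new StringBuilder((int) (rawHtml.length() * 1.3));
        int i = 0;
        while (i < rawHtml.length()) {
            // codePointAt joins a valid surrogate pair into a single code point
            int cp = rawHtml.codePointAt(i);
            i += Character.charCount(cp);
            switch (cp) {
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '&': sb.append("&amp;"); break;
                default:
                    if (cp < ' ' || cp == '\'' || cp > 127) {
                        // control chars, the single quote, and all non-ASCII: numeric entity
                        sb.append("&#").append(cp).append(';');
                    } else {
                        // printable ASCII is copied through unchanged
                        sb.append((char) cp);
                    }
            }
        }
        return sb.toString();
    }
}
```

With this version, HtmlEscapeSketch.escapeHtmlAscii("\uD83D\uDE00") yields a single reference, &#128512;, rather than two references to the individual surrogates.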