Remove ✅,
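I have some strings that contain all sorts of different emojis/images/signs.
Not all of the strings are in English; some of them are in other, non-Latin languages, for example: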
▓ railway??
→ Cats and dogs
I'm on
Instead of blacklisting some elements, how about creating a whitelist of the characters you do wish to keep? This way you don't need to worry about every new emoji being added.
String characterFilter = "[^\\p{L}\\p{M}\\p{N}\\p{P}\\p{Z}\\p{Cf}\\p{Cs}\\s]";
String emotionless = aString.replaceAll(characterFilter,"");
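For anyone who wants to try the two lines above directly, here is a minimal self-contained sketch using the same whitelist. The class name `EmojiFilter` and the method `strip` are hypothetical. It precompiles the pattern with `Pattern.compile`, because `String.replaceAll` compiles the regex again on every call.

```java
import java.util.regex.Pattern;

// Minimal sketch: EmojiFilter and strip() are hypothetical names.
class EmojiFilter {

    // The whitelist from the answer. The leading ^ negates the class, so the
    // pattern matches every code point that is NOT a letter, mark, number,
    // punctuation, separator, format character, surrogate or whitespace.
    private static final Pattern NOT_WHITELISTED =
            Pattern.compile("[^\\p{L}\\p{M}\\p{N}\\p{P}\\p{Z}\\p{Cf}\\p{Cs}\\s]");

    // Compiled once and reused; String.replaceAll would recompile the
    // regex on every call.
    static String strip(String input) {
        return NOT_WHITELISTED.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // Escaped sample: "Cats <U+2705> and dogs <U+1F525> <U+2192> " followed
        // by Japanese text. U+1F525 is stored as a surrogate pair in the String
        // but is matched as a single code point.
        String sample = "Cats \u2705 and dogs \uD83D\uDD25 \u2192 \u7686\u3055\u3093";
        System.out.println(strip(sample));
    }
}
```

With this whitelist the ✅ and 🔥 are removed and the Japanese text is kept, but the → arrow is removed too, because it is a math symbol (\p{Sm}). The same applies to ASCII symbols such as +, =, $ and <. Add \p{Sm} or the specific characters to the class if you need them. Invisible parts of emoji sequences, such as the variation selector U+FE0F (a mark, \p{M}) and the zero-width joiner U+200D (\p{Cf}), are on the whitelist, so they can be left behind after the visible emoji is deleted.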
So:
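[\\p{L}\\p{M}\\p{N}\\p{P}\\p{Z}\\p{Cf}\\p{Cs}\\s] is a character class covering letters (\\p{L}), marks (\\p{M}), numbers (\\p{N}), punctuation (\\p{P}), separators such as the space character (\\p{Z}), format characters (\\p{Cf}), surrogates (\\p{Cs}) and whitespace, including tabs and newlines (\\s). \\p{L} covers letters from every script, such as Latin, Cyrillic and Kanji, so non-English text survives. The ^ at the start of the class negates it, so the regex matches every character that is not on the whitelist, and replaceAll deletes each match. Most emoji belong to the symbol categories (\\p{S}), which are not whitelisted.
Note that \\p{Cs} does not mean "characters above U+FFFF": Java's regex engine matches whole code points, so an emoji such as U+1F525 is tested as one code point with its own category (So), and \\p{Cs} only matches unpaired surrogates.
Example: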
String str = "hello world _# 皆さん、こんにちは! 私はジョンと申します。