Transliterate any convertible UTF-8 character into its ASCII equivalent
Is there a good solution out there that handles this transliteration well?
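I've tried using iconv(), but it is very annoying and does not behave as one would expect.
- Using //TRANSLIT will try to replace whatever it can, leaving everything nonconvertible as "?"
- Using //IGNORE will not leave "?" in the text, but it also does not transliterate, and it raises E_NOTICE when a nonconvertible character is found, so you have to call iconv with the @ error suppressor
- Using //IGNORE//TRANSLIT (as some people suggested in the PHP forums) is actually the same as //IGNORE (tried it myself on PHP versions 5.3.2 and 5.3.13)
- Using //TRANSLIT//IGNORE is also the same as //TRANSLIT
iconv also uses the current locale settings to transliterate.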
Warning: lots of text and code follows!
Here are some examples:
$text = 'Regular ascii text + čćžšđ + äöüß + éĕěėëȩ + æø€ + $ + ¶ + @';
echo '<br />original: ' . $text;
echo '<br />regular: ' . iconv("UTF-8", "ASCII//TRANSLIT", $text);
//> regular: Regular ascii text + ????? + ???ss + ?????? + ae?EUR + $ + ? + @
setlocale(LC_ALL, 'en_GB');
echo '<br />en_GB: ' . iconv("UTF-8", "ASCII//TRANSLIT", $text);
//> en_GB: Regular ascii text + cczs? + aouss + eeeeee + ae?EUR + $ + ? + @
setlocale(LC_ALL, 'en_GB.UTF8'); // will this work?
echo '<br />en_GB.UTF8: ' . iconv("UTF-8", "ASCII//TRANSLIT", $text);
//> en_GB.UTF8: Regular ascii text + cczs? + aouss + eeeeee + ae?EUR + $ + ? + @
Ok, that did convert č ć ž š ä ö ü ß é ĕ ě ė ë ȩ and æ, but why not đ and ø?
// now specific locales
setlocale(LC_ALL, 'hr_Hr'); // this should fix croatian đ, right?
echo '<br />hr_Hr: ' . iconv("UTF-8", "ASCII//TRANSLIT", $text);
// wrong > hr_Hr: Regular ascii text + cczs? + aouss + eeeeee + ae?EUR + $ + ? + @
setlocale(LC_ALL, 'sv_SE'); // so this will fix swedish ø?
echo '<br />sv_SE: ' . iconv("UTF-8", "ASCII//TRANSLIT", $text);
// will not > sv_SE: Regular ascii text + cczs? + aouss + eeeeee + ae?EUR + $ + ? + @
//this is interesting
setlocale(LC_ALL, 'de_DE');
echo '<br />de_DE: ' . iconv("UTF-8", "ASCII//TRANSLIT", $text);
//> de_DE: Regular ascii text + cczs? + aeoeuess + eeeeee + ae?EUR + $ + ? + @
// actually this is what any German would expect, since ä ö ü really are the same as ae oe ue
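Since the //TRANSLIT output depends on the process-wide locale, switching locales per call as above leaks into everything else running in the same process. A minimal sketch of scoping the change is below; withLocale() is a hypothetical helper, and it assumes that LC_CTYPE is the category iconv consults for transliteration (glibc keeps its transliteration tables in LC_CTYPE, as far as I know; if your iconv behaves differently, use LC_ALL as in the examples above). It also checks setlocale()'s return value, which is false when the requested locale is not installed, a case that otherwise silently leaves the old locale in place.

```php
<?php
// Hypothetical helper: run $fn with LC_CTYPE temporarily set to $locale and
// restore the previous LC_CTYPE value afterwards, even if $fn throws.
// Assumption: LC_CTYPE is the category that iconv //TRANSLIT consults.
function withLocale(string $locale, callable $fn)
{
    // Passing "0" only queries the current setting; nothing is changed.
    $previous = setlocale(LC_CTYPE, '0');

    // setlocale() returns false if the locale is not installed on this system.
    if (setlocale(LC_CTYPE, $locale) === false) {
        throw new RuntimeException("Locale not available: $locale");
    }

    try {
        return $fn();
    } finally {
        // Put back whatever was active before the call.
        setlocale(LC_CTYPE, $previous);
    }
}

// The "C" locale always exists, so this runs anywhere and prints "C".
echo withLocale('C', function () {
    return setlocale(LC_CTYPE, '0');
}), "\n";
```

With a German locale actually installed, the de_DE example would become withLocale('de_DE.UTF-8', function () use ($text) { return iconv('UTF-8', 'ASCII//TRANSLIT', $text); }). Keep in mind that PHP's locale is per process, not per thread, so this does not make locale switching safe on multithreaded server APIs.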
Let's try //IGNORE:
echo '<br />ignore: ' . iconv("UTF-8", "ASCII//IGNORE", $text);
//> ignore: Regular ascii text + + + + + $ + + @
//+ E_NOTICE: "Notice: iconv(): Detected an illegal character in input string in /var/www/test.server.web/index.php on line 49"
// with translit?
echo '<br />ignore/translit: ' . iconv("UTF-8", "ASCII//IGNORE//TRANSLIT", $text);
//same as ignore only> ignore/translit: Regular ascii text + + + + + $ + + @
//+ E_NOTICE: "Notice: iconv(): Detected an illegal character in input string in /var/www/test.server.web/index.php on line 54"
// translit/ignore?
echo '<br />translit/ignore: ' . iconv("UTF-8", "ASCII//TRANSLIT//IGNORE", $text);
//same as translit only> translit/ignore: Regular ascii text + cczs? + aouss + eeeeee + ae?EUR + $ + ? + @
Using this guy's solution doesn't work as wanted either: Regular ascii text + YYYYY + aous + eYYYeY + aoY + $ + � + @
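Even the PECL intl Normalizer class (which is not always available even if you have PHP > 5.3.0, since the ICU package it uses internally may not be available to PHP, e.g. on some hosting servers) produces wrong results: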
echo '<br />normalize: ' .preg_replace('/\p{Mn}/u', '', Normalizer::normalize($text, Normalizer::FORM_KD));
//>normalize: Regular ascii text + cczsđ + aouß + eeeeee + æø€ + $ + ¶ + @
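The Normalizer result above is less wrong than it looks: NFKD splits a letter like č or ȩ into a base letter plus a combining mark, and \p{Mn} then removes the mark. đ (U+0111), ø (U+00F8), ß, æ, € and ¶, however, are single code points with no Unicode decomposition, so there is no separate mark to strip. A small sketch that makes this visible; stripCombiningMarks() and foldAccents() are hypothetical names, and the NFKD step only runs when the intl extension is loaded.

```php
<?php
// Strip nonspacing combining marks (Unicode category Mn). This only removes
// accents stored as separate code points, i.e. after decomposition (NFD/NFKD).
function stripCombiningMarks(string $decomposed): string
{
    // preg_replace() returns null on invalid UTF-8; fall back to the input.
    return preg_replace('/\p{Mn}/u', '', $decomposed) ?? $decomposed;
}

// Decompose with NFKD when intl is available, then strip the marks.
// Without intl, precomposed letters such as "é" pass through unchanged.
function foldAccents(string $text): string
{
    if (class_exists('Normalizer')) {
        $decomposed = Normalizer::normalize($text, Normalizer::FORM_KD);
        if ($decomposed !== false) {
            $text = $decomposed;
        }
    }
    return stripCombiningMarks($text);
}

// "e" followed by U+0301 COMBINING ACUTE ACCENT loses its accent: prints "e".
echo stripCombiningMarks("e\u{0301}"), "\n";
// đ and ø have no decomposition, so nothing is stripped: prints "đø".
echo foldAccents("\u{0111}\u{00F8}"), "\n";
```

So decomposition handles "accent on a base letter" cases, and anything like đ, ø, æ or ß still needs an explicit mapping.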
So, is there any other way to do this, or is the only proper approach to do it myself with preg_replace() or str_replace() and define my own transliteration tables?
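If a hand-written table turns out to be the answer, strtr() with an array may be a better fit than str_replace(): it matches the UTF-8 keys directly, tries the longest keys first, and never rescans text it has already replaced, whereas str_replace() with arrays applies the pairs one after another. A minimal sketch, assuming valid UTF-8 input; TRANSLIT_TABLE and transliterate() are hypothetical and only cover the characters from the example string. The mappings are language-dependent (German expects ä → ae, other languages would want ä → a), so treat the values as placeholders.

```php
<?php
// Hypothetical table covering only the example string; a real one would be
// much larger and probably chosen per language.
const TRANSLIT_TABLE = [
    'č' => 'c', 'ć' => 'c', 'ž' => 'z', 'š' => 's', 'đ' => 'dj',
    'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue', 'ß' => 'ss',
    'é' => 'e', 'ĕ' => 'e', 'ě' => 'e', 'ė' => 'e', 'ë' => 'e', 'ȩ' => 'e',
    'æ' => 'ae', 'ø' => 'o', '€' => 'EUR',
];

function transliterate(string $text, array $table = TRANSLIT_TABLE): string
{
    // strtr() tries the longest keys first and does not re-replace output.
    $ascii = strtr($text, $table);
    // Anything still outside ASCII becomes '?', like iconv //TRANSLIT does.
    // (string) turns preg_replace()'s null (invalid UTF-8) into ''.
    return (string) preg_replace('/[^\x00-\x7F]/u', '?', $ascii);
}

$text = 'Regular ascii text + čćžšđ + äöüß + éĕěėëȩ + æø€ + $ + ¶ + @';
echo transliterate($text), "\n";
// Regular ascii text + cczsdj + aeoeuess + eeeeee + aeoEUR + $ + ? + @
```

Unlike iconv //TRANSLIT, the result does not depend on which locales happen to be installed, at the cost of maintaining the table yourself.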
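Addendum: I found a 2008 debate on the ZF wiki about a proposal for Zend_Filter_Transliterate, but the proposal was dropped because conversion is not possible for some languages (e.g. Chinese). Still, IMO this option should exist for any Latin- or Cyrillic-based language.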