How to detect whether UTF-8 decoding or encoding must be applied to a string?

php encoding utf-8

2022-08-30 18:02:46

I have a feed taken from third-party sites. Sometimes I have to apply utf8_decode and other times utf8_encode to get the desired visible output.

If the same function is applied twice by mistake, or the wrong one is used, the output gets even uglier. That is what I want to fix.

How can I detect which of the two, if any, has to be applied to a given string?

The content is actually returned as UTF-8, but some parts inside it are not.

Answer 1

I can't say I can rely on mb_detect_encoding(). I had some freaky false positives a while back.

The most universal way I have found, which worked well in every case, is:

if (preg_match('!!u', $string))
{
   // This is UTF-8
}
else
{
   // Definitely not UTF-8
}
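
Building on that check, here is a minimal sketch of a normalizer: strings that pass the `u`-modifier test are left alone, and anything else is converted. The function name `ensure_utf8` and the ISO-8859-1 fallback are assumptions for illustration, not part of the answer above; a feed that uses a different legacy encoding would need a different fallback. It relies on the mbstring extension.

```php
<?php
// Hypothetical helper: return $string as UTF-8, converting only when needed.
// Assumption: any input that is not valid UTF-8 is ISO-8859-1.
function ensure_utf8(string $string, string $fallback = 'ISO-8859-1'): string
{
    // With the /u modifier, preg_match() returns false for an invalid UTF-8 subject.
    if (preg_match('//u', $string) === 1) {
        return $string; // already valid UTF-8, leave it untouched
    }
    return mb_convert_encoding($string, 'UTF-8', $fallback);
}

// Valid UTF-8 passes through unchanged.
var_dump(ensure_utf8("Chr\xC3\xA9tiens") === "Chr\xC3\xA9tiens"); // bool(true)
// A lone Latin-1 byte (0xE9) is converted to its two-byte UTF-8 form.
var_dump(ensure_utf8("Chr\xE9tiens") === "Chr\xC3\xA9tiens");     // bool(true)
```

Because valid input is returned as-is, calling the helper twice on the same string does no further damage, which is the double-application problem described in the question.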

Answer 2

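function str_to_utf8 ($str) {
    // Undo one layer of UTF-8 encoding (note: utf8_decode() is deprecated as of PHP 8.2).
    $decoded = utf8_decode($str);
    // If the result is no longer valid UTF-8, $str was not double-encoded: keep it.
    if (mb_detect_encoding($decoded , 'UTF-8', true) === false)
        return $str;
    // Otherwise $str had been encoded twice, so the decoded form is the real UTF-8 text.
    return $decoded;
}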

var_dump(str_to_utf8("« Chrétiens d'Orient » : la RATP fait marche arrière"));
//string '« Chrétiens d'Orient » : la RATP fait marche arrière' (length=56)
var_dump(str_to_utf8("Â« ChrÃ©tiens d'Orient Â» : la RATP fait marche arriÃ¨re"));
//string '« Chrétiens d'Orient » : la RATP fait marche arrière' (length=56)
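
Since utf8_decode() is deprecated as of PHP 8.2, the same idea can be sketched with mbstring functions only. The name `undo_double_utf8` is hypothetical, and the sketch assumes the extra encoding layer went through ISO-8859-1; text garbled via Windows-1252 is outside what it handles.

```php
<?php
// Hypothetical variant of str_to_utf8() that avoids the deprecated utf8_decode().
// Assumption: the extra encoding layer was ISO-8859-1 -> UTF-8.
function undo_double_utf8(string $str): string
{
    // Only valid UTF-8 can have been double-encoded; leave anything else untouched.
    if (!mb_check_encoding($str, 'UTF-8')) {
        return $str;
    }
    // Reverse one layer: map each code point back to a single Latin-1 byte.
    $decoded = mb_convert_encoding($str, 'ISO-8859-1', 'UTF-8');
    // If those bytes still form valid UTF-8, the input had been encoded twice.
    return mb_check_encoding($decoded, 'UTF-8') ? $decoded : $str;
}

// Double-encoded "é" (C3 83 C2 A9) collapses to the correct two bytes (C3 A9).
var_dump(undo_double_utf8("Chr\xC3\x83\xC2\xA9tiens") === "Chr\xC3\xA9tiens"); // bool(true)
// Correctly encoded input is returned unchanged.
var_dump(undo_double_utf8("Chr\xC3\xA9tiens") === "Chr\xC3\xA9tiens");         // bool(true)
```

Like the original function, this reverses a single layer per call, so it is safe to run on text that is already correct.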