How to unescape XML in Java

2022-09-01 10:30:36

I need to unescape an XML string that contains escaped XML tags:

&lt;
&gt;
&amp;
etc.

I did find some libraries that can do this, but I would rather use a single method that does the job.

Can anyone help?

Cheers, Bas Hendriks


Answer 1
StringEscapeUtils.unescapeXml(xml)

(from commons-lang)
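Answer 1 depends on Apache Commons. Newer Commons Lang releases deprecate StringEscapeUtils in favor of the same-named class in Apache Commons Text. If, as the question prefers, adding a library is not an option, one JDK-only alternative (swapped in here, not part of the original answer) is to let the built-in XML parser do the decoding. This is a minimal sketch, assuming the input is escaped text only. The class name JdkXmlUnescape and the wrapper element <r> are hypothetical. It only understands the five predefined entities and numeric references, so something like &nbsp; makes it fail.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class: decodes XML escapes with the JDK's DOM parser.
class JdkXmlUnescape {

    static String unescapeWithParser(String escaped) {
        // Wrap the text in a dummy <r> element so it forms a complete document
        String doc = "<r>" + escaped + "</r>";
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(doc)))
                    .getDocumentElement()
                    // The parser has already replaced &lt;, &#65; etc. with characters
                    .getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Malformed input (e.g. a stray '&' or an undeclared entity such as &nbsp;)
            throw new IllegalArgumentException("not well-formed escaped XML text", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(unescapeWithParser("a &lt;b&gt; &amp; &#65;&#x42;")); // prints: a <b> & AB
    }
}
```

The parser approach gets hexadecimal references for free, but it is heavier than a regex and rejects input that is not well-formed XML text.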


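Answer 2

Below is a simple method for unescaping XML. It handles the predefined XML entities and decimal numeric entities (&#nnnn;). Modifying it to handle hexadecimal entities (&#xhhhh;) should be straightforward.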

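import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

//Matches "&name;" and "&#nnnn;"; group 1 is "#" for numeric entities, empty otherwise
private static final Pattern XML_ENTITY_REGEX = Pattern.compile( "&(#?)([^;]+);" );
private static final Map<String,String> BUILTIN_ENTITIES = buildBuiltinXMLEntityMap();

public static String unescapeXML( final String xml )
{
    //Before Java 9, Matcher.appendReplacement only accepted a StringBuffer, not a StringBuilder
    StringBuffer unescapedOutput = new StringBuffer( xml.length() );

    Matcher m = XML_ENTITY_REGEX.matcher( xml );
    while ( m.find() ) {
        String hashmark = m.group(1);
        String ent = m.group(2);
        String entity;
        if ( !hashmark.isEmpty() ) {
            //decimal numeric entity such as &#169;
            try {
                int code = Integer.parseInt( ent );
                //toChars also handles code points above U+FFFF (surrogate pairs)
                entity = new String( Character.toChars( code ) );
            } catch ( IllegalArgumentException e ) {
                //not a valid decimal code point (e.g. a hex entity) - keep it as-is
                entity = m.group();
            }
        } else {
            //must be a non-numerical entity
            entity = BUILTIN_ENTITIES.get( ent );
            if ( entity == null ) {
                //not a known entity - keep it as-is
                entity = m.group();
            }
        }
        //quoteReplacement stops '$' and '\' in the result from being treated as group references
        m.appendReplacement( unescapedOutput, Matcher.quoteReplacement( entity ) );
    }
    m.appendTail( unescapedOutput );

    return unescapedOutput.toString();
}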

private static Map<String,String> buildBuiltinXMLEntityMap()
{
    Map<String,String> entities = new HashMap<String,String>(10);
    entities.put( "lt", "<" );
    entities.put( "gt", ">" );
    entities.put( "amp", "&" );
    entities.put( "apos", "'" );
    entities.put( "quot", "\"" );
    return entities;
}
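The answer leaves hexadecimal entities as an exercise. One way to add them is a small helper that decodes the text between "&#" and ";" (group 2 of the regex above, e.g. "x41" or "65"). This is a sketch under that assumption; the class and method name NumericReference.decode are hypothetical. It only accepts a lowercase x, which is what the XML 1.0 grammar allows for hexadecimal references.

```java
// Hypothetical helper: decodes the body of a numeric character reference,
// i.e. the text between "&#" and ";" (group 2 of the answer's regex).
class NumericReference {

    // Returns the decoded character(s), or null if the body is not a valid code point.
    static String decode(String body) {
        try {
            int code = body.startsWith("x")
                    ? Integer.parseInt(body.substring(1), 16) // hexadecimal, e.g. "x41"
                    : Integer.parseInt(body, 10);              // decimal, e.g. "65"
            // toChars returns a surrogate pair for code points above U+FFFF
            return new String(Character.toChars(code));
        } catch (IllegalArgumentException e) {
            // NumberFormatException (a subclass) or an out-of-range code point
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("65") + decode("x42")); // prints AB
    }
}
```

Replacing the decimal-only parsing in the numeric branch of unescapeXML with a call to this helper adds hex support; a null result means the reference is not valid and can be left in the output unchanged.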