Merging two XML files in Java

parsing java xml api

2022-09-03 14:22:45

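I have two XML files with a similar structure that I want to merge into one file. At the moment I am using EL4J XML Merge, which I came across in this tutorial. However, it does not merge the way I expect. The main problem is that it does not combine the results from both files into a single element containing results 1, 2, 3 and 4. Instead it discards either 1 and 2 or 3 and 4, depending on which file is merged first.

So I would be grateful if anyone with XML merge experience could tell me what I might be doing wrong, or if anyone knows of a good Java XML API that can merge the files as required.

Many thanks in advance for your help.

Edit:

I could really do with some good suggestions, so I have added a bounty. I have tried jdigital's suggestion but am still having problems with XML Merge.

Here is a sample of the kind of XML structure I am trying to merge.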

<run xmloutputversion="1.02">
    <info type="a" />
    <debugging level="0" />
    <host starttime="1237144741" endtime="1237144751">
        <status state="up" reason="somereason"/>
        <something avalue="test" test="alpha" />
        <target>
            <system name="computer" />
        </target>
        <results>
            <result id="1">
                <state value="test" />
                <service value="gamma" />
            </result>
            <result id="2">
                <state value="test4" />
                <service value="gamma4" />
            </result>
        </results>
        <times something="0" />
    </host>
    <runstats>
        <finished time="1237144751" timestr="Sun Mar 15 19:19:11 2009"/>
        <result total="0" />
    </runstats>
</run>

<run xmloutputversion="1.02">
    <info type="b" />
    <debugging level="0" />
    <host starttime="1237144741" endtime="1237144751">
        <status state="down" reason="somereason"/>
        <something avalue="test" test="alpha" />
        <target>
            <system name="computer" />
        </target>
        <results>
            <result id="3">
                <state value="testagain" />
                <service value="gamma2" />
            </result>
            <result id="4">
                <state value="testagain4" />
                <service value="gamma4" />
            </result>
        </results>
        <times something="0" />
    </host>
    <runstats>
        <finished time="1237144751" timestr="Sun Mar 15 19:19:11 2009"/>
        <result total="0" />
    </runstats>
</run>

Expected output

<run xmloutputversion="1.02">
    <info type="a" />
    <debugging level="0" />
    <host starttime="1237144741" endtime="1237144751">
        <status state="down" reason="somereason"/>
        <status state="up" reason="somereason"/>
        <something avalue="test" test="alpha" />
        <target>
            <system name="computer" />
        </target>
        <results>
            <result id="1">
                <state value="test" />
                <service value="gamma" />
            </result>
            <result id="2">
                <state value="test4" />
                <service value="gamma4" />
            </result>
            <result id="3">
                <state value="testagain" />
                <service value="gamma2" />
            </result>
            <result id="4">
                <state value="testagain4" />
                <service value="gamma4" />
            </result>
        </results>
        <times something="0" />
    </host>
    <runstats>
        <finished time="1237144751" timestr="Sun Mar 15 19:19:11 2009"/>
        <result total="0" />
    </runstats>
</run>

Answer 1

Not very elegant, but you can do this with the DOM parser and XPath:

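import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Result;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

public class MergeXmlDemo {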

  public static void main(String[] args) throws Exception {
    // proper error/exception handling omitted for brevity
    File file1 = new File("merge1.xml");
    File file2 = new File("merge2.xml");
    Document doc = merge("/run/host/results", file1, file2);
    print(doc);
  }

  private static Document merge(String expression,
      File... files) throws Exception {
    XPathFactory xPathFactory = XPathFactory.newInstance();
    XPath xpath = xPathFactory.newXPath();
    XPathExpression compiledExpression = xpath
        .compile(expression);
    return merge(compiledExpression, files);
  }

  private static Document merge(XPathExpression expression,
      File... files) throws Exception {
    DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory
        .newInstance();
    docBuilderFactory
        .setIgnoringElementContentWhitespace(true);
    DocumentBuilder docBuilder = docBuilderFactory
        .newDocumentBuilder();
    Document base = docBuilder.parse(files[0]);

    Node results = (Node) expression.evaluate(base,
        XPathConstants.NODE);
    if (results == null) {
      throw new IOException(files[0]
          + ": expression does not evaluate to node");
    }

    for (int i = 1; i < files.length; i++) {
      Document merge = docBuilder.parse(files[i]);
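      Node nextResults = (Node) expression.evaluate(merge,
          XPathConstants.NODE);
      // Fail clearly instead of with a NullPointerException below
      if (nextResults == null) {
        throw new IOException(files[i]
            + ": expression does not evaluate to node");
      }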
      while (nextResults.hasChildNodes()) {
        Node kid = nextResults.getFirstChild();
        nextResults.removeChild(kid);
        kid = base.importNode(kid, true);
        results.appendChild(kid);
      }
    }

    return base;
  }

  private static void print(Document doc) throws Exception {
    TransformerFactory transformerFactory = TransformerFactory
        .newInstance();
    Transformer transformer = transformerFactory
        .newTransformer();
    DOMSource source = new DOMSource(doc);
    Result result = new StreamResult(System.out);
    transformer.transform(source, result);
  }

}

This assumes that you can hold at least two of the documents in RAM at the same time.
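The same DOM approach can be extended to skip entries that the base document already contains. The following is only a minimal sketch under assumptions that are not part of the answer above: the class name `MergeByIdSketch` and its helpers are hypothetical, the documents are passed as strings to keep the example self-contained, and children are treated as duplicates when their `id` attribute matches.

```java
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class MergeByIdSketch {

  // Hypothetical helper: appends the element children found under parentXPath
  // in each later document to the first one, skipping any child whose "id"
  // attribute is already present. Children without an id are always appended.
  public static Document mergeById(String parentXPath, String... xmlDocs)
      throws Exception {
    DocumentBuilder builder =
        DocumentBuilderFactory.newInstance().newDocumentBuilder();
    XPath xpath = XPathFactory.newInstance().newXPath();

    Document base = builder.parse(new InputSource(new StringReader(xmlDocs[0])));
    Node target = (Node) xpath.evaluate(parentXPath, base, XPathConstants.NODE);
    if (target == null) {
      throw new IllegalArgumentException("no node at " + parentXPath);
    }

    // Collect the ids that the base document already contains.
    Set<String> seenIds = new HashSet<>();
    for (Node n = target.getFirstChild(); n != null; n = n.getNextSibling()) {
      if (n instanceof Element) {
        seenIds.add(((Element) n).getAttribute("id"));
      }
    }

    for (int i = 1; i < xmlDocs.length; i++) {
      Document other =
          builder.parse(new InputSource(new StringReader(xmlDocs[i])));
      Node source = (Node) xpath.evaluate(parentXPath, other, XPathConstants.NODE);
      if (source == null) {
        continue; // nothing to merge from this document
      }
      for (Node n = source.getFirstChild(); n != null; n = n.getNextSibling()) {
        if (!(n instanceof Element)) {
          continue; // ignore whitespace text and comments
        }
        String id = ((Element) n).getAttribute("id");
        if (id.isEmpty() || seenIds.add(id)) {
          // importNode copies the subtree into the base document
          target.appendChild(base.importNode(n, true));
        }
      }
    }
    return base;
  }

  // Counts the nodes that an XPath expression selects in a document.
  public static int count(Document doc, String expression) throws Exception {
    XPath xpath = XPathFactory.newInstance().newXPath();
    Double n = (Double) xpath.evaluate(
        "count(" + expression + ")", doc, XPathConstants.NUMBER);
    return n.intValue();
  }
}
```

For the files in the question this would be called with `"/run/host/results"` as the first argument. Merging other elements, such as the two `status` elements in the expected output, would need a second pass with a different expression.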

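Answer 2

I use XSLT to merge XML files. It lets me tune the merge operation, either simply concatenating the content or merging at a specific level. It is a little more work (and XSLT syntax is a bit unusual), but it is very flexible. A few things you need here:

a) Include an additional file
b) Copy the original file 1:1
c) Design your merge point, with or without duplicate avoidance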

a) At the beginning I have

<xsl:param name="mDocName">yoursecondfile.xml</xsl:param>
<xsl:variable name="mDoc" select="document($mDocName)" />

This makes it possible to refer to the second file using $mDoc.

b) The instructions to copy a source tree 1:1 are two templates:

<!-- Copy everything including attributes as default action -->
<xsl:template match="*">
    <xsl:element name="{name()}">
         <xsl:apply-templates select="@*" />
        <xsl:apply-templates />
    </xsl:element>
</xsl:template>

<xsl:template match="@*">
    <xsl:attribute name="{name()}"><xsl:value-of select="." /></xsl:attribute>
</xsl:template>
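To see what these two templates do on their own, they can be run from Java against a small input. This is a rough sketch, not part of the original answer: the class name `CopyTemplatesSketch` and the method `copy` are made up for illustration, and the stylesheet is embedded as a string so the example is self-contained.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class CopyTemplatesSketch {

  // The two copy templates from this answer, wrapped in a stylesheet element.
  static final String XSLT =
      "<xsl:stylesheet version='1.0'"
    + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
    + "<xsl:template match='*'>"
    + "<xsl:element name='{name()}'>"
    + "<xsl:apply-templates select='@*'/>"
    + "<xsl:apply-templates/>"
    + "</xsl:element>"
    + "</xsl:template>"
    + "<xsl:template match='@*'>"
    + "<xsl:attribute name='{name()}'>"
    + "<xsl:value-of select='.'/>"
    + "</xsl:attribute>"
    + "</xsl:template>"
    + "</xsl:stylesheet>";

  // Hypothetical helper: applies the copy templates to an XML string.
  public static String copy(String xml) throws Exception {
    Transformer trans = TransformerFactory.newInstance()
        .newTransformer(new StreamSource(new StringReader(XSLT)));
    // Leave out the XML declaration so only the copied tree is returned
    trans.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    StringWriter out = new StringWriter();
    trans.transform(new StreamSource(new StringReader(xml)),
        new StreamResult(out));
    return out.toString();
  }
}
```

Elements, attributes and text come through because text nodes are handled by XSLT's built-in rule. Comments are not matched by either template, so they do not appear in the output.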

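With only these two templates you get a 1:1 copy of the elements, attributes and text of your first source file (comments and processing instructions are not carried over). This works for any kind of XML. The merge part is specific to your files. Let's assume you have event elements with an event ID attribute, and you do not want duplicate IDs. The template would look like this: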

 <xsl:template match="events">
    <xsl:variable name="allEvents" select="descendant::*" />
    <events>
        <!-- copies all events from the first file -->
        <xsl:apply-templates />
        <!-- Merge the new events in. You need to adjust the select clause -->
        <xsl:for-each select="$mDoc/logbook/server/events/event">
            <xsl:variable name="curID" select="@id" />
            <xsl:if test="not ($allEvents[@id=$curID]/@id = $curID)">
                <xsl:element name="event">
                    <xsl:apply-templates select="@*" />
                    <xsl:apply-templates />
                </xsl:element>
            </xsl:if>
        </xsl:for-each>
    </events>
</xsl:template>

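Of course you can compare other things, such as tag names. How deep the merge goes is also up to you. If you have no key to compare, the construct becomes simpler, for example for logs: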

 <xsl:template match="logs">
     <xsl:element name="logs">
          <xsl:apply-templates select="@*" />
          <xsl:apply-templates />
          <xsl:apply-templates select="$mDoc/logbook/server/logs/log" />
    </xsl:element>
 </xsl:template>

To run the XSLT in Java, use this:

    Source xmlSource = new StreamSource(xmlFile);
    Source xsltSource = new StreamSource(xsltFile);
    Result xmlResult = new StreamResult(resultFile);
    TransformerFactory transFact = TransformerFactory.newInstance();
    Transformer trans = transFact.newTransformer(xsltSource);
    // Load Parameters if we have any
    if (ParameterMap != null) {
       for (Entry<String, String> curParam : ParameterMap.entrySet()) {
            trans.setParameter(curParam.getKey(), curParam.getValue());
       }
    }
    trans.transform(xmlSource, xmlResult);

Or you can download the Saxon XSLT processor and run it from the command line (Linux shell example):

#!/bin/bash
notify-send -t 500 -u low -i gtk-dialog-info "Transforming $1 with $2 into $3 ..."
# That's actually the only relevant line below
java -cp saxon9he.jar net.sf.saxon.Transform -t -s:"$1" -xsl:"$2" -o:"$3"
notify-send -t 1000 -u low -i gtk-dialog-info "Extraction into $3 done!"
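Applied to the files from the question, the pieces of this answer could be combined roughly as follows. This is a sketch under assumptions: the class name `XsltMergeSketch` is hypothetical, the stylesheet is embedded as a string rather than loaded from a file, the merge point is assumed to be `/run/host/results` with no duplicate check, and the second file is passed as an absolute URI because a stylesheet read from a string has no base URI against which to resolve a relative name.

```java
import java.io.File;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltMergeSketch {

  // Copy templates from this answer plus one merge point for <results>.
  static final String XSLT =
      "<xsl:stylesheet version='1.0'"
    + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
    + "<xsl:param name='mDocName'/>"
    + "<xsl:variable name='mDoc' select='document($mDocName)'/>"
    // default action: copy elements and attributes
    + "<xsl:template match='*'>"
    + "<xsl:element name='{name()}'>"
    + "<xsl:apply-templates select='@*'/>"
    + "<xsl:apply-templates/>"
    + "</xsl:element>"
    + "</xsl:template>"
    + "<xsl:template match='@*'>"
    + "<xsl:attribute name='{name()}'>"
    + "<xsl:value-of select='.'/>"
    + "</xsl:attribute>"
    + "</xsl:template>"
    // merge point: own children first, then the results of the second file
    + "<xsl:template match='results'>"
    + "<xsl:element name='results'>"
    + "<xsl:apply-templates select='@*'/>"
    + "<xsl:apply-templates/>"
    + "<xsl:apply-templates select='$mDoc/run/host/results/result'/>"
    + "</xsl:element>"
    + "</xsl:template>"
    + "</xsl:stylesheet>";

  // Transforms the first file and pulls in the results of the second one.
  public static String merge(File first, File second) throws Exception {
    Transformer trans = TransformerFactory.newInstance()
        .newTransformer(new StreamSource(new StringReader(XSLT)));
    // Absolute file: URI so that document() can locate the second file
    trans.setParameter("mDocName", second.toURI().toString());
    StringWriter out = new StringWriter();
    trans.transform(new StreamSource(first), new StreamResult(out));
    return out.toString();
  }
}
```

A call such as `XsltMergeSketch.merge(new File("merge1.xml"), new File("merge2.xml"))` would return the merged document as a string. Avoiding duplicate ids would need a test like the one in the events template above.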
