How to extract noun phrases using OpenNLP's chunking parser

java nlp stanford-nlp opennlp

2022-09-04 21:43:10

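I am new to natural language processing. I need to extract noun phrases from text. So far I have used OpenNLP's chunking parser to parse my text and get the tree structure, but I cannot work out how to extract the noun phrases from that tree. Is there any regular expression pattern in OpenNLP that I can use to extract the noun phrases?

Below is the code I am using: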

    // Load the chunking parser model and parse one line of text.
    InputStream is = new FileInputStream("en-parser-chunking.bin");
    ParserModel model = new ParserModel(is);
    Parser parser = ParserFactory.create(model);
    Parse[] topParses = ParserTool.parseLine(line, parser, 1);
    for (Parse p : topParses) {
        p.show(); // prints the bracketed parse tree
    }

The output I get is:

    (TOP (S (S (ADJP (JJ WELCOME) (PP (TO) (NP (NNP Big) (NNP Data.))))) (S (NP (PRP We)) (VP (VP (VBP are) (VP (VBG working) (PP (IN ON) (NP (NNP Natural) (NNP Language) (NNP Processing.can))) (NP (DT some) (CD one) (NN help)) (NP (PRP us)) (PP (IN) (S (VP (VBG extracting) (NP (DT the) (NN noun) (NNS phrases)) (PP (IN from) (NP (DT the) (NN tree) (WP stucture.))))

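Can someone help me get the noun phrases, i.e. the nodes tagged NP, NNP, NN and so on? Do I need to use some other NP chunker to get them? Is there a regular expression pattern that achieves the same thing?

Please help me.

Thanks in advance.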

Answer 1

The Parse object is a tree; you can use getParent() and getChildren() to navigate it, and getType() to check the label of each node.

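    import java.util.ArrayList;
    import java.util.List;
    import opennlp.tools.parser.Parse;

    // Collects every NP node found anywhere in the tree.
    List<Parse> nounPhrases = new ArrayList<>();

    public void getNounPhrases(Parse p) {
        if ("NP".equals(p.getType())) {
            nounPhrases.add(p);
        }
        // Recurse into the children so nested noun phrases are found as well.
        for (Parse child : p.getChildren()) {
            getNounPhrases(child);
        }
    }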
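
On the regex part of the question: the brackets in the printed tree nest, so a single regular expression is a poor fit, but a small stack-based scan over the bracketed string can pull out the NP nodes. Below is a rough sketch of that idea in plain Java with no OpenNLP calls. The class and method names are made up for illustration, and it assumes the input is the Penn-style bracketed text that p.show() prints. If you still have the Parse objects, walking them as above is the simpler route.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical helper: pulls the text of every (NP ...) node out of a bracketed parse string. */
final class BracketedNounPhrases {

    /**
     * Returns the words covered by each NP node, in the order the nodes close
     * (so an inner NP is listed before the NP that contains it).
     * Nodes that are never closed in the input are not reported.
     */
    static List<String> extract(String tree) {
        List<String> phrases = new ArrayList<>();
        Deque<String> labels = new ArrayDeque<>();       // labels of the currently open nodes
        Deque<StringBuilder> texts = new ArrayDeque<>(); // words seen so far under each open node

        // Pad the brackets with spaces so a plain whitespace split tokenises the tree.
        String[] tokens = tree.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");

        for (int i = 0; i < tokens.length; i++) {
            String tok = tokens[i];
            if (tok.equals("(")) {
                // The token right after "(" is the node label, e.g. NP or NN.
                String next = i + 1 < tokens.length ? tokens[i + 1] : ")";
                boolean hasLabel = !next.equals("(") && !next.equals(")");
                labels.push(hasLabel ? next : "");
                texts.push(new StringBuilder());
                if (hasLabel) {
                    i++; // consume the label
                }
            } else if (tok.equals(")")) {
                if (labels.isEmpty()) {
                    continue; // stray closing bracket: ignore it
                }
                String label = labels.pop();
                String text = texts.pop().toString();
                if (label.equals("NP") && !text.isEmpty()) {
                    phrases.add(text);
                }
                // The words of a closed node also belong to its parent.
                if (!texts.isEmpty()) {
                    append(texts.peek(), text);
                }
            } else if (!texts.isEmpty()) {
                append(texts.peek(), tok); // an ordinary word
            }
        }
        return phrases;
    }

    private static void append(StringBuilder sb, String s) {
        if (s.isEmpty()) {
            return;
        }
        if (sb.length() > 0) {
            sb.append(' ');
        }
        sb.append(s);
    }
}
```

Calling BracketedNounPhrases.extract on a string such as "(TOP (S (NP (PRP We)) (VP (VBP like) (NP (DT the) (NN noun) (NNS phrases)))))" gives the two phrases "We" and "the noun phrases". Changing the "NP" comparison to another label (or a set of labels) collects other node types the same way.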

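Answer 2

If you only want the noun phrases, use the sentence chunker rather than the tree parser. The code looks something like this (you need to download the chunker model from the same place you got the parser model):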

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import opennlp.tools.chunker.ChunkerME;
    import opennlp.tools.chunker.ChunkerModel;

    public void chunk() {
        ChunkerModel model;

        // Load the chunker model; try-with-resources closes the stream for us.
        try (InputStream modelIn = new FileInputStream("en-chunker.bin")) {
            model = new ChunkerModel(modelIn);
        } catch (IOException e) {
            // Model loading failed, handle the error
            e.printStackTrace();
            return;
        }

        // After the model is loaded a Chunker can be instantiated.
        ChunkerME chunker = new ChunkerME(model);

        // The tokens of the sentence, one array element per token.
        String[] sent = new String[]{"Rockwell", "International", "Corp.", "'s",
          "Tulsa", "unit", "said", "it", "signed", "a", "tentative", "agreement",
          "extending", "its", "contract", "with", "Boeing", "Co.", "to",
          "provide", "structural", "parts", "for", "Boeing", "'s", "747",
          "jetliners", "."};

        // The part-of-speech tag of each token, in the same order.
        String[] pos = new String[]{"NNP", "NNP", "NNP", "POS", "NNP", "NN",
          "VBD", "PRP", "VBD", "DT", "JJ", "NN", "VBG", "PRP$", "NN", "IN",
          "NNP", "NNP", "TO", "VB", "JJ", "NNS", "IN", "NNP", "POS", "CD", "NNS",
          "."};

        // One chunk tag per token, e.g. B-NP, I-NP, B-VP or O.
        String[] tag = chunker.chunk(sent, pos);
    }

Then look through the tag array for the chunk types you want.
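
To make that last step concrete: the chunker returns one tag per token in the usual B-/I-/O scheme, so a noun phrase is a B-NP token followed by any number of I-NP tokens. Here is a minimal sketch of the grouping in plain Java with no OpenNLP calls. The class and method names are made up for illustration; in a real run the tags array would be the result of chunker.chunk(sent, pos).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: groups B-NP / I-NP chunk tags back into noun phrase strings. */
final class ChunkPhrases {

    static List<String> nounPhrases(String[] tokens, String[] tags) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("tokens and tags must have the same length");
        }
        List<String> phrases = new ArrayList<>();
        StringBuilder current = null; // phrase being built, or null when outside an NP

        for (int i = 0; i < tokens.length; i++) {
            if (tags[i].equals("B-NP")) {
                // A new noun phrase starts here; flush the previous one, if any.
                if (current != null) {
                    phrases.add(current.toString());
                }
                current = new StringBuilder(tokens[i]);
            } else if (tags[i].equals("I-NP") && current != null) {
                // Continuation of the noun phrase that is currently open.
                current.append(' ').append(tokens[i]);
            } else {
                // Any other tag (or an I-NP with no open phrase) ends the current phrase.
                if (current != null) {
                    phrases.add(current.toString());
                }
                current = null;
            }
        }
        if (current != null) {
            phrases.add(current.toString()); // phrase that runs to the end of the sentence
        }
        return phrases;
    }
}
```

You would call it as ChunkPhrases.nounPhrases(sent, tag) with the arrays from the code above. ChunkerME also has a chunkAsSpans(sent, pos) method that returns the chunks as Span objects carrying a type, which may save you the manual grouping.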

http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#tools.parser.chunking.api