How to find out if a sentence is a question (interrogative)?

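Is there an open-source Java library or algorithm for determining whether a particular piece of text is a question?
I am working on a question answering system that needs to analyze whether the text entered by the user is a question.
I think the problem can probably be solved with open-source NLP libraries, but it is obviously more complicated than simple part-of-speech tagging. So if someone can describe an algorithm for it that uses an existing open-source NLP library, that would be good too.
Also let me know if you know of a library or toolkit that uses data mining to solve this problem. Although it will be difficult to get sufficient data for training purposes, I will be able to use Stack Exchange data for training.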


Answer 1

In a syntactic parse of a question, the correct structure will be of the form:

(SBARQ (WH+ (W+) ...)
       (SQ ...*
           (V+) ...*)
       (?))

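So, using any available syntactic parser, a tree that has an SBARQ node, optionally with an embedded SQ, is an indicator that the input is a question. The WH+ node (WHNP/WHADVP/WHADJP) contains the question stem (who/what/when/where/why/how), and the SQ holds the inverted phrase.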

For example:

(SBARQ 
  (WHNP 
    (WP What)) 
  (SQ 
    (VBZ is) 
    (NP 
      (DT the) 
      (NN question)))
  (. ?))

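Of course, a lot of preceding clauses will cause errors in the parse (which can be worked around), as will really poorly written questions. For example, the title of this post, "How to find out if a sentence is a question?", will have an SBARQ but not an SQ.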
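As a rough illustration of the check described above, here is a minimal sketch that takes a parse already printed in Penn Treebank bracket notation and reports whether it contains an SBARQ node, and whether an SQ is nested inside it. It does not run a parser; producing the bracketed string is left to whichever parser you choose. The class name `SbarqCheck`, the `Verdict` enum, and the second tree in `main` (hand-written to mimic an SBARQ without an SQ, not real parser output) are assumptions made for this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class SbarqCheck {
    enum Verdict { NOT_QUESTION, SBARQ_ONLY, SBARQ_WITH_SQ }

    // Scans a bracketed parse such as "(SBARQ (WHNP (WP What)) ...)".
    static Verdict classify(String tree) {
        Deque<String> open = new ArrayDeque<>(); // labels of currently open nodes
        boolean sawSbarq = false;
        boolean sawSqInside = false;
        int i = 0;
        while (i < tree.length()) {
            char c = tree.charAt(i);
            if (c == '(') {
                // The node label runs from just after '(' to the next space or paren.
                int start = ++i;
                while (i < tree.length()
                        && !Character.isWhitespace(tree.charAt(i))
                        && tree.charAt(i) != '(' && tree.charAt(i) != ')') {
                    i++;
                }
                String label = tree.substring(start, i);
                if (label.equals("SBARQ")) sawSbarq = true;
                if (label.equals("SQ") && open.contains("SBARQ")) sawSqInside = true;
                open.push(label);
            } else {
                if (c == ')' && !open.isEmpty()) open.pop();
                i++;
            }
        }
        if (!sawSbarq) return Verdict.NOT_QUESTION;
        return sawSqInside ? Verdict.SBARQ_WITH_SQ : Verdict.SBARQ_ONLY;
    }

    public static void main(String[] args) {
        // The example tree from this answer.
        String whatIs = "(SBARQ (WHNP (WP What)) (SQ (VBZ is) (NP (DT the) (NN question))) (. ?))";
        // Hand-written tree (assumption): an SBARQ with no SQ underneath.
        String howTo = "(SBARQ (WHADVP (WRB How)) (S (VP (TO to) (VP (VB find)))) (. ?))";
        // Hand-written declarative sentence (assumption).
        String statement = "(S (NP (PRP I)) (VP (VBP agree)) (. .))";

        System.out.println(classify(whatIs));    // SBARQ_WITH_SQ
        System.out.println(classify(howTo));     // SBARQ_ONLY
        System.out.println(classify(statement)); // NOT_QUESTION
    }
}
```

Returning three verdicts rather than a boolean keeps the "SBARQ but no SQ" case visible, so the caller can decide how much weight to give it.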


Answer 2

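Many quasi-questions and requests for information are posed in the grammatical form of a statement; for example: "I would like to know who stole my bicycle".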

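I would abandon all hope of determining from its structure whether the user's input is a question, and simply assume that it is a question unless it is unambiguously not one. You could take an iterative, interactive approach so that the system can refine its "understanding" of the user input: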

     User: I would like to know how many angels fit on the head of a pin. 
     System: Do you have a question about angels?
     User: Yes.          
     System: Do you want to know if angels are fit?
     User: No.
     System: Do you want to know if angels have heads?
     User: Possibly.         
     System: Do you want to know if angels have pins?
     User: No.         
     System: Do you want to know if angels are numerous?
     User: No.
     System: Do you want to know the dimensions of an angel?
     User: Yes.
     System: Do you mean angels in paintings?
     User: No.
     System: Do you mean angels in myth and religious writing?
     User: Yes.
     System: Angels are metaphysical beings.


     User: I hear that Pennsylvania was founded by William Penn. Is that true?
     System: Do you have a question about hearing?
     User: No.
     System: Do you have a question about Pennsylvania?
     User: Yes.         
     System: Pennsylvania was founded by William Penn.
     User: When?         
     System: 1682.
     User: What does the name mean?
     System: What name?
     User: Pennsylvania!
     System: Do you want to know the meaning of Pennsylvania?
     User: Yes.
     System: Pennsylvania means Penn's Woods.
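
The first step of such a dialogue, confirming what the user is asking about, might be sketched as below. This is only an illustration under assumptions: the candidate topics are supplied by hand (extracting them from the input is out of scope here), the user's replies are simulated with a predicate, and the names `ClarifyingDialogue`, `prompt`, and `confirmTopic` are made up for the example.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

class ClarifyingDialogue {
    // Builds the yes/no prompt used in the transcripts above.
    static String prompt(String topic) {
        return "Do you have a question about " + topic + "?";
    }

    // Asks about each candidate in turn and returns the first one the user confirms.
    // saysYes stands in for the user: it receives the prompt and answers yes (true) or no (false).
    static Optional<String> confirmTopic(List<String> candidates, Predicate<String> saysYes) {
        for (String topic : candidates) {
            if (saysYes.test(prompt(topic))) {
                return Optional.of(topic);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Hand-picked candidates (assumption) for the input
        // "I hear that Pennsylvania was founded by William Penn. Is that true?"
        List<String> candidates = List.of("hearing", "Pennsylvania");
        // Simulated user who only says yes to the Pennsylvania prompt.
        Predicate<String> user = question -> question.contains("Pennsylvania");

        System.out.println(confirmTopic(candidates, user).orElse("(no topic confirmed)"));
        // prints "Pennsylvania"
    }
}
```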