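The currently highest-ranked answer works, but it loads the whole list of results into memory. That can cause memory problems for large result sets, and it is unnecessary in any case.
I wrote a Java class that implements an Iterator over SearchHit objects, so you can iterate through all results. Internally, it handles pagination by issuing queries that set the from: field, and it keeps only one page of results in memory at a time.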
Usage:
// build your query here -- no need for setFrom(int)
SearchRequestBuilder requestBuilder = client.prepareSearch(indexName)
        .setTypes(typeName)
        .setQuery(QueryBuilders.matchAllQuery());
SearchHitIterator hitIterator = new SearchHitIterator(requestBuilder);
while (hitIterator.hasNext()) {
    SearchHit hit = hitIterator.next();
    // process your hit
}
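Note that when you create your SearchRequestBuilder you don't need to call setFrom(int), because SearchHitIterator does this internally. If you want to set the page size (the number of search hits per page), call setSize(int); otherwise ElasticSearch's default value is used.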
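Because SearchHitIterator is a plain java.util.Iterator, it can also be adapted to a for-each loop or a Stream using only the standard library. The sketch below is one possible way to do that; the class name IteratorAdapters and its two helper methods are hypothetical names introduced here, and a list iterator stands in for a SearchHitIterator so the example runs without an Elasticsearch cluster.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical helper class (not part of the original answer or of Elasticsearch).
class IteratorAdapters {

    /** Wraps a one-shot Iterator in a lazy, sequential Stream. */
    static <T> Stream<T> stream(Iterator<T> iterator) {
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(iterator, Spliterator.ORDERED),
                false); // false = sequential, so pages are still fetched one at a time
    }

    /** Lets a one-shot Iterator be used in a for-each loop (can be traversed only once). */
    static <T> Iterable<T> once(Iterator<T> iterator) {
        return () -> iterator;
    }

    public static void main(String[] args) {
        // Stand-in for a SearchHitIterator: any Iterator behaves the same way here.
        Iterator<String> hits = Arrays.asList("a", "b", "c").iterator();
        List<String> upper = stream(hits)
                .map(String::toUpperCase)
                .collect(Collectors.toList());
        System.out.println(upper); // prints [A, B, C]
    }
}
```

With the real class this would look like IteratorAdapters.stream(new SearchHitIterator(requestBuilder)). The stream stays lazy, so pages are requested only as hits are consumed.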
SearchHitIterator:
import java.util.Iterator;
import java.util.NoSuchElementException;
import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.search.SearchHit;

public class SearchHitIterator implements Iterator<SearchHit> {
    private final SearchRequestBuilder initialRequest;
    // number of hits returned so far; used as the "from" offset of the next page
    private int searchHitCounter;
    // the only page of results held in memory
    private SearchHit[] currentPageResults;
    // index of the last hit returned from the current page (-1 if none yet)
    private int currentResultIndex;

    public SearchHitIterator(SearchRequestBuilder initialRequest) {
        this.initialRequest = initialRequest;
        this.searchHitCounter = 0;
        this.currentResultIndex = -1;
    }

    @Override
    public boolean hasNext() {
        // fetch the next page if nothing is loaded yet or the current page is used up
        if (currentPageResults == null || currentResultIndex + 1 >= currentPageResults.length) {
            SearchRequestBuilder paginatedRequestBuilder = initialRequest.setFrom(searchHitCounter);
            SearchResponse response = paginatedRequestBuilder.execute().actionGet();
            currentPageResults = response.getHits().getHits();
            // an empty page means there are no more results
            if (currentPageResults.length < 1) return false;
            currentResultIndex = -1;
        }
        return true;
    }

    @Override
    public SearchHit next() {
        // the Iterator contract requires an exception, not null, when no elements remain
        if (!hasNext()) throw new NoSuchElementException();
        currentResultIndex++;
        searchHitCounter++;
        return currentPageResults[currentResultIndex];
    }
}
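The paging logic above does not depend on Elasticsearch, so it can be tried out without a cluster. The following is a minimal sketch of the same offset-based pattern, assuming a hypothetical page source passed in as an IntFunction that returns the page starting at a given offset (and an empty list once the data is exhausted). PagingIterator and its field names are illustrative and not part of any library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.IntFunction;

// Hypothetical, library-independent version of the paging pattern.
class PagingIterator<T> implements Iterator<T> {
    // Assumed page source: offset -> next page; an empty list signals the end.
    private final IntFunction<List<T>> fetchPage;
    private List<T> page = new ArrayList<>(); // the only page held in memory
    private int indexInPage = 0;              // next position to read in the page
    private int consumed = 0;                 // plays the role of searchHitCounter ("from")
    private boolean exhausted = false;        // set once an empty page has been seen

    PagingIterator(IntFunction<List<T>> fetchPage) {
        this.fetchPage = fetchPage;
    }

    @Override
    public boolean hasNext() {
        if (indexInPage < page.size()) return true; // items left in the current page
        if (exhausted) return false;                // do not query again after the end
        page = fetchPage.apply(consumed);           // load the page starting at the offset
        indexInPage = 0;
        exhausted = page.isEmpty();
        return !exhausted;
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        consumed++;
        return page.get(indexInPage++);
    }

    public static void main(String[] args) {
        // Fake "index" of 7 items, served in pages of 3.
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        int pageSize = 3;
        PagingIterator<Integer> it = new PagingIterator<>(from -> {
            int start = Math.min(from, data.size());
            return data.subList(start, Math.min(start + pageSize, data.size()));
        });
        int sum = 0;
        while (it.hasNext()) sum += it.next();
        System.out.println(sum); // prints 28
    }
}
```

One difference from the class above is the exhausted flag: once an empty page has been returned, further hasNext() calls return false without issuing another request.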
Indeed, having realized how convenient such a class is, I wonder why ElasticSearch's Java client does not offer something similar.