Return all records in one query in Elasticsearch

2022-09-04 05:50:00

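I have an Elasticsearch database and want to fetch all of its records on a page of my website. I wrote a bean that connects to an Elasticsearch node, searches for records, and returns a response. My simple Java code for the search is: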

SearchResponse response = getClient().prepareSearch(indexName)
    .setTypes(typeName)
    .setQuery(queryString("*:*"))
    .setExplain(true)
    .execute().actionGet();

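But Elasticsearch uses a default size of 10, so I only get 10 hits in the response, even though my database has more than 10 records. If I set the size to Integer.MAX_VALUE, the search becomes very slow, which is not what I want.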

How can I get all records in a single operation, in an acceptable amount of time, without setting the response size?


Answer 1
public List<Map<String, Object>> getAllDocs() {
    int scrollSize = 1000;  // page size; despite the name, this is from/size paging, not the scroll API
    List<Map<String, Object>> esData = new ArrayList<Map<String, Object>>();
    SearchResponse response = null;
    int i = 0;
    // Keep requesting the next page until an empty page comes back
    while (response == null || response.getHits().hits().length != 0) {
        response = client.prepareSearch(indexName)
                .setTypes(typeName)
                .setQuery(QueryBuilders.matchAllQuery())
                .setSize(scrollSize)
                .setFrom(i * scrollSize)
                .execute()
                .actionGet();
        for (SearchHit hit : response.getHits()) {
            esData.add(hit.getSource());
        }
        i++;
    }
    return esData;
}
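To check the loop's termination condition and request count without a running cluster, here is a minimal sketch of the same from/size pattern in plain Java. PageFetcher, FromSizePaging and inMemory are hypothetical names; the lambda over a List only stands in for the search request that the answer builds with setFrom and setSize, and the sketch assumes the index does not change while paging. Also note that Elasticsearch caps from + size with the index.max_result_window setting (10,000 by default in 2.1 and later), so this approach fails for deep pages unless that limit is raised.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one from/size search request; not an Elasticsearch API.
interface PageFetcher<T> {
    List<T> fetch(int from, int size);
}

final class FromSizePaging {
    // Mirrors the loop above: request page i at offset i * pageSize
    // until an empty page comes back. Assumes the data does not change meanwhile.
    static <T> List<T> fetchAll(PageFetcher<T> fetcher, int pageSize) {
        List<T> all = new ArrayList<>();
        int page = 0;
        while (true) {
            List<T> hits = fetcher.fetch(page * pageSize, pageSize);
            if (hits.isEmpty()) {
                break; // an empty page means every document has been read
            }
            all.addAll(hits);
            page++;
        }
        return all;
    }

    // Hypothetical in-memory "index", used only to exercise the loop without a cluster.
    static PageFetcher<Integer> inMemory(List<Integer> docs) {
        return (from, size) -> {
            if (from >= docs.size()) {
                return new ArrayList<>();
            }
            return new ArrayList<>(docs.subList(from, Math.min(from + size, docs.size())));
        };
    }
}
```

Wrapping the fetcher with a counter shows that 2,500 documents with a page size of 1,000 take four requests: three non-empty pages plus the final empty one that ends the loop.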

Answer 2

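The currently top-voted answer works, but it loads the entire result list into memory, which can cause memory problems for large result sets and is unnecessary in any case.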

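I created a Java class that implements a nice Iterator over SearchHits, so you can iterate through all results. Internally, it handles pagination by issuing queries that include the from field, and it keeps only one page of results in memory: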
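The core idea (hold only the current page and fetch the next one lazily when it runs out) can be sketched independently of the Elasticsearch client. In this sketch PagingIterator and the (from, size) -> page function are hypothetical stand-ins, not part of any Elasticsearch API. It also remembers when the results are exhausted so repeated hasNext() calls do not re-query, and next() throws NoSuchElementException as the java.util.Iterator contract requires.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.BiFunction;

// Generic version of the idea: only the current page is held in memory.
// fetchPage is a hypothetical (from, size) -> page function standing in for a search request.
final class PagingIterator<T> implements Iterator<T> {
    private final BiFunction<Integer, Integer, List<T>> fetchPage;
    private final int pageSize;
    private List<T> page = Collections.emptyList();
    private int indexInPage = 0;
    private int offset = 0;            // "from" value for the next request
    private boolean exhausted = false; // set once an empty page has been seen

    PagingIterator(BiFunction<Integer, Integer, List<T>> fetchPage, int pageSize) {
        this.fetchPage = fetchPage;
        this.pageSize = pageSize;
    }

    @Override
    public boolean hasNext() {
        if (exhausted) return false;
        if (indexInPage < page.size()) return true;
        // Current page used up: fetch the next one, replacing (not appending to) the old page.
        page = fetchPage.apply(offset, pageSize);
        offset += page.size();
        indexInPage = 0;
        if (page.isEmpty()) {
            exhausted = true; // remember the end so later calls don't query again
            return false;
        }
        return true;
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        return page.get(indexInPage++);
    }

    // Number of items currently buffered; never exceeds one page.
    int bufferedCount() {
        return page.size();
    }
}
```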

Usage:

// build your query here -- no need for setFrom(int)
SearchRequestBuilder requestBuilder = client.prepareSearch(indexName)
                                            .setTypes(typeName)
                                            .setQuery(QueryBuilders.matchAllQuery());

SearchHitIterator hitIterator = new SearchHitIterator(requestBuilder);
while (hitIterator.hasNext()) {
    SearchHit hit = hitIterator.next();

    // process your hit
}

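Note that when creating the SearchRequestBuilder, you don't need to call setFrom(int), because the SearchHitIterator does this internally. If you want to specify the page size (i.e. the number of search hits per page), call setSize(int); otherwise Elasticsearch's default value is used.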
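The page size matters because every page is a separate round trip. Assuming the loop stops at the first empty page, as both answers do, and that the index does not change while paging, the request count is ceil(total / pageSize) + 1. The small helper below (PageMath is a hypothetical name) just evaluates that formula.

```java
// Round trips needed to page through `total` hits when paging stops at the first
// empty page: ceil(total / pageSize) non-empty pages plus one final empty page.
// Assumes the result set does not change while paging.
final class PageMath {
    static long requestsNeeded(long total, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        return (total + pageSize - 1) / pageSize + 1; // integer ceiling division, then + 1
    }
}
```

For example, 10,000 hits at the default size of 10 take 1,001 requests, while a page size of 1,000 takes 11.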

SearchHitIterator:

import java.util.Iterator;
import java.util.NoSuchElementException;
import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.search.SearchHit;

public class SearchHitIterator implements Iterator<SearchHit> {

    private final SearchRequestBuilder initialRequest;

    private int searchHitCounter;          // hits returned so far, used as the next "from"
    private SearchHit[] currentPageResults;
    private int currentResultIndex;
    private boolean exhausted;             // set once an empty page has been seen

    public SearchHitIterator(SearchRequestBuilder initialRequest) {
        this.initialRequest = initialRequest;
        this.searchHitCounter = 0;
        this.currentResultIndex = -1;
        this.exhausted = false;
    }

    @Override
    public boolean hasNext() {
        if (exhausted) return false;
        if (currentPageResults == null || currentResultIndex + 1 >= currentPageResults.length) {
            // Current page consumed: request the next page, starting after the hits seen so far
            SearchRequestBuilder paginatedRequestBuilder = initialRequest.setFrom(searchHitCounter);
            SearchResponse response = paginatedRequestBuilder.execute().actionGet();
            currentPageResults = response.getHits().getHits();

            if (currentPageResults.length < 1) {
                exhausted = true;  // avoid re-querying on later hasNext() calls
                return false;
            }

            currentResultIndex = -1;
        }

        return true;
    }

    @Override
    public SearchHit next() {
        // Iterator contract: throw instead of returning null when there are no more hits
        if (!hasNext()) throw new NoSuchElementException();

        currentResultIndex++;
        searchHitCounter++;
        return currentPageResults[currentResultIndex];
    }

}

Having realized how convenient such a class is, I wonder why Elasticsearch's Java client doesn't provide something similar.

