How do I implement auto-suggest using Lucene's new AnalyzingInfixSuggester API?

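I'm new to Lucene and I want to implement auto-suggest the way Google does it: when I type a character such as "G", it gives me a list of suggestions. You can try it yourself.

I have searched all over the web. Nobody seems to have done this, and Lucene gives us some new tools in the suggest package.

But I need an example that shows me how to do it.

Can anyone help?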


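Answer 1

I'll give you a fairly complete example that shows how to use AnalyzingInfixSuggester. In this example we'll pretend that we're Amazon and that we want to autocomplete a product search field. We'll take advantage of features of the Lucene suggestion system to implement the following:

  1. Ranked results: we will suggest the most popular matching products first.
  2. Region-restricted results: we will only suggest products that we sell in the customer's country.
  3. Product photos: we will store product photo URLs in the suggestion index so we can display them in the search results without an additional database lookup.

First, I'll define a simple class to hold information about a product, in Product.java: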

// Serializable so that whole instances can be stored as suggestion payloads.
class Product implements java.io.Serializable
{
    String name;
    String image;
    String[] regions;
    int numberSold;

    public Product(String name, String image, String[] regions,
                   int numberSold) {
        this.name = name;
        this.image = image;
        this.regions = regions;
        this.numberSold = numberSold;
    }
}

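To index records with AnalyzingInfixSuggester's build method, you need to pass it an object that implements the org.apache.lucene.search.suggest.InputIterator interface. An InputIterator gives access to the key, contexts, payload and weight for each record.

The key is the text you actually want to search on and autocomplete against. In our example, it will be the name of the product.

The contexts are a set of additional, arbitrary data that you can use to filter records. In our example, the contexts are the set of ISO codes for the countries we will ship a particular product to.

The payload is additional arbitrary data you want to store in the index for the record. In this example, we will actually serialize each Product instance and store the resulting bytes as the payload. Then, when we later do lookups, we can deserialize the payload and access information in the Product instance, such as the image URL.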
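
The payload round trip can be tried without Lucene at all. The following is a minimal sketch that uses only the JDK; DemoProduct and PayloadRoundTrip are names made up for this illustration, and DemoProduct is a trimmed stand-in for the Product class. The decoder takes an offset and a length because a BytesRef exposes its data as a bytes array plus offset and length fields.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical stand-in for Product, reduced to two fields.
final class DemoProduct implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final String image;

    DemoProduct(String name, String image) {
        this.name = name;
        this.image = image;
    }
}

final class PayloadRoundTrip {
    // Serialize the product into the bytes that would be wrapped in a BytesRef.
    static byte[] toPayload(DemoProduct product) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(product);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Read the product back from a slice of a byte array.
    static DemoProduct fromPayload(byte[] bytes, int offset, int length) {
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes, offset, length))) {
            return (DemoProduct) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] payload = toPayload(new DemoProduct(
            "Electric Guitar", "http://images.example/electric-guitar.jpg"));
        DemoProduct copy = fromPayload(payload, 0, payload.length);
        System.out.println(copy.name + " -> " + copy.image);
    }
}
```

Java serialization is convenient for a demo; for a long-lived index a more compact or versioned encoding may be a better fit, since the payload bytes are stored with every record.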

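The weight is used to order suggestion results; results with a higher weight are returned first. We'll use the number of sales for a given product as its weight.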
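
To make the roles of key, contexts and weight concrete, here is a small plain-Java model of a lookup. It does not use Lucene and is not how AnalyzingInfixSuggester is implemented; SuggestEntry and SuggestModel are names invented for this sketch, and the model only handles a single-word query. It keeps entries whose contexts contain the requested region and whose key has a word starting with the typed prefix, then orders them by descending weight.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical record: the key to match, its contexts, and its weight.
final class SuggestEntry {
    final String key;
    final long weight;
    final Set<String> contexts;

    SuggestEntry(String key, long weight, String... contexts) {
        this.key = key;
        this.weight = weight;
        this.contexts = new HashSet<>(Arrays.asList(contexts));
    }
}

final class SuggestModel {
    // True if any whitespace-separated word of the key starts with the prefix,
    // ignoring case.
    static boolean matches(String key, String prefix) {
        String p = prefix.toLowerCase(Locale.ROOT);
        for (String word : key.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (word.startsWith(p)) {
                return true;
            }
        }
        return false;
    }

    // Filter by context and prefix, sort by weight (highest first), keep the top num.
    static List<String> lookup(List<SuggestEntry> entries, String prefix,
                               String context, int num) {
        List<SuggestEntry> hits = new ArrayList<>();
        for (SuggestEntry e : entries) {
            if (e.contexts.contains(context) && matches(e.key, prefix)) {
                hits.add(e);
            }
        }
        hits.sort(Comparator.comparingLong((SuggestEntry e) -> e.weight).reversed());
        List<String> keys = new ArrayList<>();
        for (SuggestEntry e : hits.subList(0, Math.min(num, hits.size()))) {
            keys.add(e.key);
        }
        return keys;
    }

    // Sample data: product name, number sold, regions.
    static List<SuggestEntry> sampleEntries() {
        return Arrays.asList(
            new SuggestEntry("Electric Guitar", 100, "US", "CA"),
            new SuggestEntry("Electric Train", 100, "US", "CA"),
            new SuggestEntry("Acoustic Guitar", 80, "US", "ZA"),
            new SuggestEntry("Guarana Soda", 130, "ZA", "IE"));
    }

    public static void main(String[] args) {
        List<SuggestEntry> entries = sampleEntries();
        System.out.println(lookup(entries, "Gu", "US", 2));  // [Electric Guitar, Acoustic Guitar]
        System.out.println(lookup(entries, "Gu", "ZA", 2));  // [Guarana Soda, Acoustic Guitar]
        System.out.println(lookup(entries, "Gui", "CA", 2)); // [Electric Guitar]
    }
}
```

The real suggester does the matching through an analyzer and an index rather than a linear scan, but the inputs it needs per record are the same.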

Here are the contents of ProductIterator.java:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UnsupportedEncodingException;
import java.util.Comparator;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
import org.apache.lucene.search.suggest.InputIterator;
import org.apache.lucene.util.BytesRef;


class ProductIterator implements InputIterator
{
    private Iterator<Product> productIterator;
    private Product currentProduct;

    ProductIterator(Iterator<Product> productIterator) {
        this.productIterator = productIterator;
    }

    public boolean hasContexts() {
        return true;
    }

    public boolean hasPayloads() {
        return true;
    }

    public Comparator<BytesRef> getComparator() {
        return null;
    }

    // This method needs to return the key for the record; this is the
    // text we'll be autocompleting against.
    public BytesRef next() {
        if (productIterator.hasNext()) {
            currentProduct = productIterator.next();
            try {
                return new BytesRef(currentProduct.name.getBytes("UTF8"));
            } catch (UnsupportedEncodingException e) {
                throw new Error("Couldn't convert to UTF-8");
            }
        } else {
            return null;
        }
    }

    // This method returns the payload for the record, which is
    // additional data that can be associated with a record and
    // returned when we do suggestion lookups.  In this example the
    // payload is a serialized Java object representing our product.
    public BytesRef payload() {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bos);
            out.writeObject(currentProduct);
            out.close();
            return new BytesRef(bos.toByteArray());
        } catch (IOException e) {
            throw new Error("Well that's unfortunate.");
        }
    }

    // This method returns the contexts for the record, which we can
    // use to restrict suggestions.  In this example we use the
    // regions in which a product is sold.
    public Set<BytesRef> contexts() {
        try {
            Set<BytesRef> regions = new HashSet<BytesRef>();
            for (String region : currentProduct.regions) {
                regions.add(new BytesRef(region.getBytes("UTF8")));
            }
            return regions;
        } catch (UnsupportedEncodingException e) {
            throw new Error("Couldn't convert to UTF-8");
        }
    }

    // This method helps us order our suggestions.  In this example we
    // use the number of products of this type that we've sold.
    public long weight() {
        return currentProduct.numberSold;
    }
}

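In our driver program, we will do the following:

  1. Create an index directory in RAM.
  2. Create a StandardAnalyzer.
  3. Create an AnalyzingInfixSuggester using the RAM directory and the analyzer.
  4. Index a number of products using ProductIterator.
  5. Print the results of some sample lookups.

Here is the driver program, SuggestProducts.java: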

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.UnsupportedEncodingException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester;
import org.apache.lucene.search.suggest.Lookup;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.Version;

public class SuggestProducts
{
    // Get suggestions given a prefix and a region.
    private static void lookup(AnalyzingInfixSuggester suggester, String name,
                               String region) {
        try {
            List<Lookup.LookupResult> results;
            HashSet<BytesRef> contexts = new HashSet<BytesRef>();
            contexts.add(new BytesRef(region.getBytes("UTF8")));
            // Do the actual lookup.  We ask for the top 2 results.
            results = suggester.lookup(name, contexts, 2, true, false);
            System.out.println("-- \"" + name + "\" (" + region + "):");
            for (Lookup.LookupResult result : results) {
                System.out.println(result.key);
                Product p = getProduct(result);
                if (p != null) {
                    System.out.println("  image: " + p.image);
                    System.out.println("  # sold: " + p.numberSold);
                }
            }
        } catch (IOException e) {
            System.err.println("Error");
        }
    }

    // Deserialize a Product from a LookupResult payload.
    private static Product getProduct(Lookup.LookupResult result)
    {
        try {
            BytesRef payload = result.payload;
            if (payload != null) {
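                // Honor offset and length: the BytesRef may be a slice of a
                // larger backing array.
                ByteArrayInputStream bis = new ByteArrayInputStream(
                    payload.bytes, payload.offset, payload.length);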
                ObjectInputStream in = new ObjectInputStream(bis);
                Product p = (Product) in.readObject();
                return p;
            } else {
                return null;
            }
        } catch (IOException|ClassNotFoundException e) {
            throw new Error("Could not decode payload :(");
        }
    }

    public static void main(String[] args) {
        try {
            RAMDirectory index_dir = new RAMDirectory();
            StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_48);
            AnalyzingInfixSuggester suggester = new AnalyzingInfixSuggester(
                Version.LUCENE_48, index_dir, analyzer);

            // Create our list of products.
            ArrayList<Product> products = new ArrayList<Product>();
            products.add(
                new Product(
                    "Electric Guitar",
                    "http://images.example/electric-guitar.jpg",
                    new String[]{"US", "CA"},
                    100));
            products.add(
                new Product(
                    "Electric Train",
                    "http://images.example/train.jpg",
                    new String[]{"US", "CA"},
                    100));
            products.add(
                new Product(
                    "Acoustic Guitar",
                    "http://images.example/acoustic-guitar.jpg",
                    new String[]{"US", "ZA"},
                    80));
            products.add(
                new Product(
                    "Guarana Soda",
                    "http://images.example/soda.jpg",
                    new String[]{"ZA", "IE"},
                    130));

            // Index the products with the suggester.
            suggester.build(new ProductIterator(products.iterator()));

            // Do some example lookups.
            lookup(suggester, "Gu", "US");
            lookup(suggester, "Gu", "ZA");
            lookup(suggester, "Gui", "CA");
            lookup(suggester, "Electric guit", "US");
        } catch (IOException e) {
            System.err.println("Error!");
        }
    }
}

Here is the output from the driver program:

-- "Gu" (US):
Electric Guitar
  image: http://images.example/electric-guitar.jpg
  # sold: 100
Acoustic Guitar
  image: http://images.example/acoustic-guitar.jpg
  # sold: 80
-- "Gu" (ZA):
Guarana Soda
  image: http://images.example/soda.jpg
  # sold: 130
Acoustic Guitar
  image: http://images.example/acoustic-guitar.jpg
  # sold: 80
-- "Gui" (CA):
Electric Guitar
  image: http://images.example/electric-guitar.jpg
  # sold: 100
-- "Electric guit" (US):
Electric Guitar
  image: http://images.example/electric-guitar.jpg
  # sold: 100

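Addendum

There is a way to avoid writing a full InputIterator that you might find easier. You can write a stub InputIterator that returns null from its next, payload and contexts methods. Pass an instance of it to AnalyzingInfixSuggester's build method: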

suggester.build(new ProductIterator(new ArrayList<Product>().iterator()));

Then, for each item you want to index, call the AnalyzingInfixSuggester add method:

suggester.add(text, contexts, weight, payload);

After you've indexed everything, call refresh:

suggester.refresh();

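If you're indexing large amounts of data, you can significantly speed up indexing by using this method with multiple threads: call build, then use multiple threads to add items, and finally call refresh.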
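
The shape of that multi-threaded approach can be sketched with the JDK's ExecutorService. This is only an outline under stated assumptions: FakeSuggester is a made-up stand-in so the example runs without Lucene, and ParallelIndexing is a hypothetical helper. In real code the worker tasks would call suggester.add(text, contexts, weight, payload) on the actual suggester and the final step would be suggester.refresh().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the suggester; only the add/refresh shape matters.
final class FakeSuggester {
    private final ConcurrentLinkedQueue<String> pending = new ConcurrentLinkedQueue<>();
    private volatile List<String> visible = new ArrayList<>();

    // May be called from many threads at once.
    void add(String text) {
        pending.add(text);
    }

    // Makes everything added so far visible to lookups.
    void refresh() {
        visible = new ArrayList<>(pending);
    }

    int visibleCount() {
        return visible.size();
    }
}

final class ParallelIndexing {
    // Adds all items from a fixed-size thread pool, waits for the workers to
    // finish, then refreshes once. Returns the number of visible items.
    static int indexAll(List<String> items, int threads) {
        final FakeSuggester suggester = new FakeSuggester();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (final String item : items) {
            pool.execute(() -> suggester.add(item));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("indexing timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        suggester.refresh();
        return suggester.visibleCount();
    }

    public static void main(String[] args) {
        List<String> items = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            items.add("item-" + i);
        }
        System.out.println(indexAll(items, 4) + " items indexed");
    }
}
```

The important detail is the ordering: all add calls finish before the single refresh, so one refresh covers every item.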

[Edited 2015-04-23 to demonstrate deserializing information from the LookupResult payload.]

