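How do I use Lucene's new AnalyzingInfixSuggester API to implement autosuggest?
I'm new to Lucene and I want to implement autosuggest the way Google does: when I type a character such as "G", it gives me a list of suggestions. You can try it yourself.
I've searched all over the web and found nobody who has done this. Lucene now gives us some new tools in the suggest package,
but I need an example that shows me how to use them.
Can anyone help?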
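I'll give you a fairly complete example that shows how to use AnalyzingInfixSuggester. In this example we'll pretend we're Amazon and want to autocomplete a product search field. We'll take advantage of features of the Lucene suggestion system to rank suggestions by popularity, restrict them to the regions where a product is sold, and return extra product data, such as an image URL, with each suggestion.
First, I'll define a simple class to hold information about a product, in Product.java: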
class Product implements java.io.Serializable
{
    String name;
    String image;
    String[] regions;
    int numberSold;

    public Product(String name, String image, String[] regions,
                   int numberSold) {
        this.name = name;
        this.image = image;
        this.regions = regions;
        this.numberSold = numberSold;
    }
}
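To index records with AnalyzingInfixSuggester's build method, you need to pass it an object that implements the org.apache.lucene.search.suggest.InputIterator interface. An InputIterator gives access to the key, contexts, payload, and weight of each record.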
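The key is the text you actually want to search on and autocomplete against. In our example, it will be the name of the product.
The contexts are a set of additional, arbitrary data that you can use to filter records. In our example, the contexts are the ISO codes of the countries we will ship a particular product to.
The payload is additional arbitrary data you want to store in the index for the record. In this example, we will actually serialize each Product instance and store the resulting bytes as the payload. Then, when we later do lookups, we can deserialize the payload and access information in the Product instance, such as the image URL.
The weight is used to order suggestion results; results with a higher weight are returned first. We'll use the number of sales of a given product as its weight.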
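To make the roles of key, contexts, and weight concrete, here is a toy in-memory model of the semantics described above. It is only a sketch: ToySuggester and its methods are hypothetical names, it handles single-word queries only, and it is not how Lucene implements AnalyzingInfixSuggester (which analyzes the text and stores it in an index).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model only (hypothetical class): NOT Lucene's implementation.
public class ToySuggester {
    // One record: key text, contexts for filtering, weight for ranking.
    static final class Entry {
        final String key;
        final Set<String> contexts;
        final long weight;

        Entry(String key, long weight, String... contexts) {
            this.key = key;
            this.weight = weight;
            this.contexts = new HashSet<>(Arrays.asList(contexts));
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    void add(String key, long weight, String... contexts) {
        entries.add(new Entry(key, weight, contexts));
    }

    // True if any word of the key starts with the query (case-insensitive).
    private static boolean matches(String key, String query) {
        String q = query.toLowerCase(Locale.ROOT);
        for (String word : key.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (word.startsWith(q)) {
                return true;
            }
        }
        return false;
    }

    // Filter by context, match on the key, then return the top `num` keys by weight.
    List<String> lookup(String query, String context, int num) {
        List<Entry> hits = new ArrayList<>();
        for (Entry e : entries) {
            if (e.contexts.contains(context) && matches(e.key, query)) {
                hits.add(e);
            }
        }
        hits.sort(Comparator.comparingLong((Entry e) -> e.weight).reversed());
        List<String> keys = new ArrayList<>();
        for (Entry e : hits.subList(0, Math.min(num, hits.size()))) {
            keys.add(e.key);
        }
        return keys;
    }

    public static void main(String[] args) {
        ToySuggester s = new ToySuggester();
        s.add("Electric Guitar", 100, "US", "CA");
        s.add("Electric Train", 100, "US", "CA");
        s.add("Acoustic Guitar", 80, "US", "ZA");
        s.add("Guarana Soda", 130, "ZA", "IE");
        System.out.println(s.lookup("Gu", "US", 2)); // prints [Electric Guitar, Acoustic Guitar]
        System.out.println(s.lookup("Gu", "ZA", 2)); // prints [Guarana Soda, Acoustic Guitar]
    }
}
```

The context acts as a hard filter, while the weight only decides the order among the records that survive the filter and match the text.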
Here are the contents of ProductIterator.java:
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UnsupportedEncodingException;
import java.util.Comparator;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

import org.apache.lucene.search.suggest.InputIterator;
import org.apache.lucene.util.BytesRef;

class ProductIterator implements InputIterator
{
    private Iterator<Product> productIterator;
    private Product currentProduct;

    ProductIterator(Iterator<Product> productIterator) {
        this.productIterator = productIterator;
    }

    public boolean hasContexts() {
        return true;
    }

    public boolean hasPayloads() {
        return true;
    }

    public Comparator<BytesRef> getComparator() {
        return null;
    }

    // This method needs to return the key for the record; this is the
    // text we'll be autocompleting against.
    public BytesRef next() {
        if (productIterator.hasNext()) {
            currentProduct = productIterator.next();
            try {
                return new BytesRef(currentProduct.name.getBytes("UTF8"));
            } catch (UnsupportedEncodingException e) {
                throw new Error("Couldn't convert to UTF-8");
            }
        } else {
            return null;
        }
    }

    // This method returns the payload for the record, which is
    // additional data that can be associated with a record and
    // returned when we do suggestion lookups. In this example the
    // payload is a serialized Java object representing our product.
    public BytesRef payload() {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bos);
            out.writeObject(currentProduct);
            out.close();
            return new BytesRef(bos.toByteArray());
        } catch (IOException e) {
            throw new Error("Well that's unfortunate.");
        }
    }

    // This method returns the contexts for the record, which we can
    // use to restrict suggestions. In this example we use the
    // regions in which a product is sold.
    public Set<BytesRef> contexts() {
        try {
            Set<BytesRef> regions = new HashSet<BytesRef>();
            for (String region : currentProduct.regions) {
                regions.add(new BytesRef(region.getBytes("UTF8")));
            }
            return regions;
        } catch (UnsupportedEncodingException e) {
            throw new Error("Couldn't convert to UTF-8");
        }
    }

    // This method helps us order our suggestions. In this example we
    // use the number of products of this type that we've sold.
    public long weight() {
        return currentProduct.numberSold;
    }
}
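In our driver program, we will create an index directory in RAM, create a StandardAnalyzer, create an AnalyzingInfixSuggester from the RAM directory and the analyzer, index a number of products using ProductIterator, and print the results of some sample lookups.
Here is the driver program, SuggestProducts.java: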
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.UnsupportedEncodingException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester;
import org.apache.lucene.search.suggest.Lookup;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.Version;

public class SuggestProducts
{
    // Get suggestions given a prefix and a region.
    private static void lookup(AnalyzingInfixSuggester suggester, String name,
                               String region) {
        try {
            List<Lookup.LookupResult> results;
            HashSet<BytesRef> contexts = new HashSet<BytesRef>();
            contexts.add(new BytesRef(region.getBytes("UTF8")));
            // Do the actual lookup. We ask for the top 2 results.
            results = suggester.lookup(name, contexts, 2, true, false);
            System.out.println("-- \"" + name + "\" (" + region + "):");
            for (Lookup.LookupResult result : results) {
                System.out.println(result.key);
                Product p = getProduct(result);
                if (p != null) {
                    System.out.println(" image: " + p.image);
                    System.out.println(" # sold: " + p.numberSold);
                }
            }
        } catch (IOException e) {
            System.err.println("Error");
        }
    }

    // Deserialize a Product from a LookupResult payload.
    private static Product getProduct(Lookup.LookupResult result)
    {
        try {
            BytesRef payload = result.payload;
            if (payload != null) {
                // A BytesRef is a slice of its byte array, so read only
                // the bytes between offset and offset + length.
                ByteArrayInputStream bis = new ByteArrayInputStream(
                    payload.bytes, payload.offset, payload.length);
                ObjectInputStream in = new ObjectInputStream(bis);
                Product p = (Product) in.readObject();
                return p;
            } else {
                return null;
            }
        } catch (IOException|ClassNotFoundException e) {
            throw new Error("Could not decode payload :(");
        }
    }

    public static void main(String[] args) {
        try {
            RAMDirectory index_dir = new RAMDirectory();
            StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_48);
            AnalyzingInfixSuggester suggester = new AnalyzingInfixSuggester(
                Version.LUCENE_48, index_dir, analyzer);

            // Create our list of products.
            ArrayList<Product> products = new ArrayList<Product>();
            products.add(
                new Product(
                    "Electric Guitar",
                    "http://images.example/electric-guitar.jpg",
                    new String[]{"US", "CA"},
                    100));
            products.add(
                new Product(
                    "Electric Train",
                    "http://images.example/train.jpg",
                    new String[]{"US", "CA"},
                    100));
            products.add(
                new Product(
                    "Acoustic Guitar",
                    "http://images.example/acoustic-guitar.jpg",
                    new String[]{"US", "ZA"},
                    80));
            products.add(
                new Product(
                    "Guarana Soda",
                    "http://images.example/soda.jpg",
                    new String[]{"ZA", "IE"},
                    130));

            // Index the products with the suggester.
            suggester.build(new ProductIterator(products.iterator()));

            // Do some example lookups.
            lookup(suggester, "Gu", "US");
            lookup(suggester, "Gu", "ZA");
            lookup(suggester, "Gui", "CA");
            lookup(suggester, "Electric guit", "US");
        } catch (IOException e) {
            System.err.println("Error!");
        }
    }
}
And here is the output from the driver program:
-- "Gu" (US):
Electric Guitar
image: http://images.example/electric-guitar.jpg
# sold: 100
Acoustic Guitar
image: http://images.example/acoustic-guitar.jpg
# sold: 80
-- "Gu" (ZA):
Guarana Soda
image: http://images.example/soda.jpg
# sold: 130
Acoustic Guitar
image: http://images.example/acoustic-guitar.jpg
# sold: 80
-- "Gui" (CA):
Electric Guitar
image: http://images.example/electric-guitar.jpg
# sold: 100
-- "Electric guit" (US):
Electric Guitar
image: http://images.example/electric-guitar.jpg
# sold: 100
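The payload round trip (serialize when indexing, deserialize at lookup time) can be tried on its own, without Lucene on the classpath. The sketch below is an illustration under assumptions: PayloadRoundTrip and Item are hypothetical names, and Item is a cut-down stand-in for Product. It decodes from an explicit offset and length because a BytesRef exposes its data as bytes, offset, and length, and the valid payload is only that slice of the array.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class PayloadRoundTrip {
    // Hypothetical stand-in for the Product class (regions omitted).
    static final class Item implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final String image;
        final int numberSold;

        Item(String name, String image, int numberSold) {
            this.name = name;
            this.image = image;
            this.numberSold = numberSold;
        }
    }

    // Serialize an item to the bytes that would be stored as the payload.
    static byte[] encode(Item item) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
                out.writeObject(item);
            }
            return bos.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException("Could not encode payload", e);
        }
    }

    // Deserialize from a slice of a (possibly larger) byte array.
    static Item decode(byte[] bytes, int offset, int length) {
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes, offset, length))) {
            return (Item) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Could not decode payload", e);
        }
    }

    public static void main(String[] args) {
        Item original = new Item("Electric Guitar",
            "http://images.example/electric-guitar.jpg", 100);
        byte[] payload = encode(original);

        // Simulate a payload that lives in the middle of a shared buffer.
        byte[] buffer = new byte[payload.length + 5];
        System.arraycopy(payload, 0, buffer, 3, payload.length);

        Item copy = decode(buffer, 3, payload.length);
        System.out.println(copy.name + " | " + copy.image + " | " + copy.numberSold);
    }
}
```

Java serialization keeps the example short; any compact encoding of the fields you need at lookup time would serve the same purpose.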
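There's a way to avoid writing a full InputIterator that you might find easier. You can write a stub InputIterator that returns null from its next, payload, and contexts methods. Pass an instance of it to AnalyzingInfixSuggester's build method (a ProductIterator over an empty list behaves the same way, since its next method returns null immediately):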
suggester.build(new ProductIterator(new ArrayList<Product>().iterator()));
Then, for each item you want to index, call AnalyzingInfixSuggester's add method:
suggester.add(text, contexts, weight, payload)
After indexing everything, call refresh:
suggester.refresh();
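If you're indexing a large amount of data, you can significantly speed up indexing by using this approach with multiple threads: call build, then use multiple threads to add items, and finally call refresh.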
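The build, parallel add, then refresh sequence can be sketched with a thread pool. To keep the sketch runnable without Lucene, SuggestIndex and InMemoryIndex below are hypothetical stand-ins for the suggester (they are not Lucene classes); with Lucene you would call suggester.add(text, contexts, weight, payload) inside the task and suggester.refresh() at the end, after an initial build on an empty iterator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelIndexing {
    // Hypothetical stand-in for the two suggester calls used here.
    interface SuggestIndex {
        void add(String text, long weight);
        void refresh();
    }

    // Thread-safe toy index: adds become visible only after refresh().
    static final class InMemoryIndex implements SuggestIndex {
        private final ConcurrentLinkedQueue<String> pending = new ConcurrentLinkedQueue<>();
        private volatile List<String> visible = new ArrayList<>();

        public void add(String text, long weight) {
            pending.add(text);
        }

        public void refresh() {
            visible = new ArrayList<>(pending);
        }

        int visibleCount() {
            return visible.size();
        }
    }

    // Add every item from a pool of worker threads, then refresh once.
    static void indexAll(SuggestIndex index, List<String> items, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String item : items) {
            pool.execute(() -> index.add(item, 1L));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("Indexing did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while indexing", e);
        }
        // Refresh only after all adds are done so every item is searchable.
        index.refresh();
    }

    public static void main(String[] args) {
        List<String> items = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            items.add("Product " + i);
        }
        InMemoryIndex index = new InMemoryIndex();
        indexAll(index, items, 4);
        System.out.println("Visible after refresh: " + index.visibleCount());
    }
}
```

Waiting for the pool to terminate before the single refresh call is the important part: a refresh issued while workers are still adding would expose only part of the data.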
[Edited 2015-04-23 to demonstrate deserializing information from the LookupResult payload.]