Secondary sorting in Hadoop

java mapreduce hadoop hadoop2 hadoop-partitioning

2022-09-03 08:15:55

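I am working on a Hadoop project, and after reading many blog posts and the documentation I realized that I need the secondary sort feature provided by the Hadoop framework.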

My input format is:

DESC(String) Price(Integer) and some other Text

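I want the values in the reducer to arrive in descending order of Price. In addition, when comparing DESC, I have a method that takes two strings and a percentage; if the similarity between the two strings is equal to or greater than the percentage, I should treat them as equal.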

The problem is that after the reduce job finishes, I can see some DESC values that are similar to other strings but ended up in different groups.

Here is the compareTo method of my composite key:

public int compareTo(VendorKey o) {
    int result = compare(token, o.token, ":") >= percentage ? 0 : 1;
    if (result == 0) {
        return pid > o.pid ? -1 : pid < o.pid ? 1 : 0;
    }
    return result;
}

and the compare method of the grouping comparator:

public int compare(WritableComparable a, WritableComparable b) {
    VendorKey one = (VendorKey) a;
    VendorKey two = (VendorKey) b;
    int result = ClusterUtil.compare(one.getToken(), two.getToken(), ":") >= one.getPercentage() ? 0 : 1;
    // if (result != 0)
    // return two.getToken().compareTo(one.getToken());
    return result;
}

Answer 1

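Your compareTo method appears to violate the general contract, which requires sgn(x.compareTo(y)) to be equal to -sgn(y.compareTo(x)). As written, two dissimilar keys each compare as 1 against the other, so neither is consistently "less than" the other.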
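To make the contract concrete, here is a minimal sketch in plain Java, with no Hadoop dependency. The class and method names are hypothetical, and plain string equality stands in for the similarity test from the question (an assumption, since that method is not shown). It checks whether a comparison satisfies sgn(x.compareTo(y)) == -sgn(y.compareTo(x)).

```java
final class CompareContractDemo {

    // Mirrors the shape of the question's comparison: 0 when the strings are
    // "similar", otherwise always 1. Equality stands in for similarity here.
    static int brokenCompare(String a, String b) {
        return a.equals(b) ? 0 : 1;
    }

    // A comparison that respects the contract: a total order on the strings.
    static int consistentCompare(String a, String b) {
        return a.compareTo(b);
    }

    // True when sgn(cmp(x, y)) == -sgn(cmp(y, x)) for the two given results.
    static boolean isAntisymmetric(int xy, int yx) {
        return Integer.signum(xy) == -Integer.signum(yx);
    }

    public static void main(String[] args) {
        String x = "apple";
        String y = "banana";
        // Both directions return 1, so the signs cannot be opposite.
        System.out.println(isAntisymmetric(brokenCompare(x, y), brokenCompare(y, x)));
        // String.compareTo returns opposite signs for swapped arguments.
        System.out.println(isAntisymmetric(consistentCompare(x, y), consistentCompare(y, x)));
    }
}
```

A sort that relies on the first comparison can place equal-looking keys far apart, which would be consistent with similar DESC values landing in different groups.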

Answer 2

After writing the custom Writable, provide a basic partitioner for the composite key with a NullWritable value. For example:

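import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Partitioner;

public class SecondarySortBasicPartitioner extends
    Partitioner<CompositeKeyWritable, NullWritable> {

    @Override
    public int getPartition(CompositeKeyWritable key, NullWritable value,
            int numReduceTasks) {

        // Partition on the natural key only; mask the sign bit so a negative
        // hashCode cannot yield a negative partition number.
        return (key.DEPT().hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}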

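After this, specify the key sort comparator and the grouping comparator, each of which compares two CompositeKeyWritable instances.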
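The division of labor between the two comparators can be illustrated without Hadoop. The following is a plain-Java simulation, not Hadoop code: the Key class, the field names, and sortAndGroup are all hypothetical. The sort comparator orders by natural key and then by price descending, while the grouping comparator looks at the natural key only, so each group sees its prices already in descending order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SecondarySortSketch {

    // Hypothetical composite key: the natural key plus the value to order by.
    static final class Key {
        final String desc;
        final int price;

        Key(String desc, int price) {
            this.desc = desc;
            this.price = price;
        }
    }

    // Role of the sort comparator: natural key ascending, then price descending.
    static final Comparator<Key> SORT = (a, b) -> {
        int byDesc = a.desc.compareTo(b.desc);
        return byDesc != 0 ? byDesc : Integer.compare(b.price, a.price);
    };

    // Role of the grouping comparator: only the natural key decides membership.
    static final Comparator<Key> GROUP = (a, b) -> a.desc.compareTo(b.desc);

    // Sorts the keys, then opens a new group whenever GROUP reports that two
    // neighbouring keys differ, similar to how a reducer sees grouped input.
    static Map<String, List<Integer>> sortAndGroup(List<Key> keys) {
        List<Key> sorted = new ArrayList<>(keys);
        sorted.sort(SORT);
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        Key previous = null;
        List<Integer> current = null;
        for (Key k : sorted) {
            if (previous == null || GROUP.compare(previous, k) != 0) {
                current = new ArrayList<>();
                groups.put(k.desc, current);
            }
            current.add(k.price);
            previous = k;
        }
        return groups;
    }
}
```

In an actual job, the corresponding classes would be registered on the Job with setPartitionerClass, setSortComparatorClass, and setGroupingComparatorClass. Because grouping only compares neighbouring keys after the sort, it only works when keys the grouping comparator treats as equal are also adjacent in the sort order.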