为什么Arrays.binarySearch与遍历数组相比没有提高性能?
我尝试了解决Hackerland无线电发射机编程挑战的机会。
总而言之,挑战如下:
Hackerland是一个拥有n栋房屋的一维城市,其中每个房屋i位于x轴上的某个x i处。市长想在城市房屋的屋顶上安装无线电发射器。每个发射器都有一个范围k,这意味着它可以将信号传输到≤k单位距离之外的所有房屋。
给定一张哈克兰的地图和k的值,你能找到覆盖每个房子所需的最小发射器数量吗?
我的实现如下:
package biz.tugay;
import java.util.*;
public class HackerlandRadioTransmitters {
public static int minNumOfTransmitters(int[] houseLocations, int transmitterRange) {
// Sort and remove duplicates..
houseLocations = uniqueHouseLocationsSorted(houseLocations);
int towerCount = 0;
for (int nextHouseNotCovered = 0; nextHouseNotCovered < houseLocations.length; ) {
final int towerLocation = HackerlandRadioTransmitters.findNextTowerIndex(houseLocations, nextHouseNotCovered, transmitterRange);
towerCount++;
nextHouseNotCovered = HackerlandRadioTransmitters.nextHouseNotCoveredIndex(houseLocations, towerLocation, transmitterRange);
if (nextHouseNotCovered == -1) {
break;
}
}
return towerCount;
}
public static int findNextTowerIndex(final int[] houseLocations, final int houseNotCoveredIndex, final int transmitterRange) {
final int houseLocationWeWantToCover = houseLocations[houseNotCoveredIndex];
final int farthestHouseLocationAllowed = houseLocationWeWantToCover + transmitterRange;
int towerIndex = houseNotCoveredIndex;
int loop = 0;
while (true) {
loop++;
if (towerIndex == houseLocations.length - 1) {
break;
}
if (farthestHouseLocationAllowed >= houseLocations[towerIndex + 1]) {
towerIndex++;
continue;
}
break;
}
System.out.println("findNextTowerIndex looped : " + loop);
return towerIndex;
}
public static int nextHouseNotCoveredIndex(final int[] houseLocations, final int towerIndex, final int transmitterRange) {
final int towerCoversUntil = houseLocations[towerIndex] + transmitterRange;
int notCoveredHouseIndex = towerIndex + 1;
int loop = 0;
while (notCoveredHouseIndex < houseLocations.length) {
loop++;
final int locationOfHouseBeingChecked = houseLocations[notCoveredHouseIndex];
if (locationOfHouseBeingChecked > towerCoversUntil) {
break; // Tower does not cover the house anymore, break the loop..
}
notCoveredHouseIndex++;
}
if (notCoveredHouseIndex == houseLocations.length) {
notCoveredHouseIndex = -1;
}
System.out.println("nextHouseNotCoveredIndex looped : " + loop);
return notCoveredHouseIndex;
}
public static int[] uniqueHouseLocationsSorted(final int[] houseLocations) {
Arrays.sort(houseLocations);
final HashSet<Integer> integers = new HashSet<>();
final int[] houseLocationsUnique = new int[houseLocations.length];
int innerCounter = 0;
for (int houseLocation : houseLocations) {
if (integers.contains(houseLocation)) {
continue;
}
houseLocationsUnique[innerCounter] = houseLocation;
integers.add(houseLocationsUnique[innerCounter]);
innerCounter++;
}
return Arrays.copyOf(houseLocationsUnique, innerCounter);
}
}
我很确定这个实现是正确的。但请参阅函数中的详细信息:findNextTowerIndex 和 nextHouseNotCoveredIndex:它们逐个遍历数组!
我的一个测试如下:
static void test_01() throws FileNotFoundException {
final long start = System.currentTimeMillis();
final File file = new File("input.txt");
final Scanner scanner = new Scanner(file);
int[] houseLocations = new int[73382];
for (int counter = 0; counter < 73382; counter++) {
houseLocations[counter] = scanner.nextInt();
}
final int[] uniqueHouseLocationsSorted = HackerlandRadioTransmitters.uniqueHouseLocationsSorted(houseLocations);
final int minNumOfTransmitters = HackerlandRadioTransmitters.minNumOfTransmitters(uniqueHouseLocationsSorted, 73381);
assert minNumOfTransmitters == 1;
final long end = System.currentTimeMillis();
System.out.println("Took: " + (end - start) + " milliseconds..");
}
输入.txt可以从此处下载。(这不是这个问题中最重要的细节,但仍然..)所以我们有一个73382个房子的数组,我故意设置了发射器范围,所以我有很多循环的方法:
以下是我的机器中此测试的示例输出:
findNextTowerIndex looped : 38213
nextHouseNotCoveredIndex looped : 13785
Took: 359 milliseconds..
我也有这个测试,它不会断言任何东西,但只是保持时间:
static void test_02() throws FileNotFoundException {
final long start = System.currentTimeMillis();
for (int i = 0; i < 400; i ++) {
final File file = new File("input.txt");
final Scanner scanner = new Scanner(file);
int[] houseLocations = new int[73382];
for (int counter = 0; counter < 73382; counter++) {
houseLocations[counter] = scanner.nextInt();
}
final int[] uniqueHouseLocationsSorted = HackerlandRadioTransmitters.uniqueHouseLocationsSorted(houseLocations);
final int transmitterRange = ThreadLocalRandom.current().nextInt(1, 70000);
final int minNumOfTransmitters = HackerlandRadioTransmitters.minNumOfTransmitters(uniqueHouseLocationsSorted, transmitterRange);
}
final long end = System.currentTimeMillis();
System.out.println("Took: " + (end - start) + " milliseconds..");
}
其中,我随机创建400个发射器范围,并运行程序400次。我将在我的机器中获得如下运行时间。
Took: 20149 milliseconds..
所以现在,我说,为什么我不使用二进制搜索而不是遍历数组,并按如下方式更改我的实现:
public static int findNextTowerIndex(final int[] houseLocations, final int houseNotCoveredIndex, final int transmitterRange) {
final int houseLocationWeWantToCover = houseLocations[houseNotCoveredIndex];
final int farthestHouseLocationAllowed = houseLocationWeWantToCover + transmitterRange;
int nextTowerIndex = Arrays.binarySearch(houseLocations, 0, houseLocations.length, farthestHouseLocationAllowed);
if (nextTowerIndex < 0) {
nextTowerIndex = -nextTowerIndex;
nextTowerIndex = nextTowerIndex -2;
}
return nextTowerIndex;
}
public static int nextHouseNotCoveredIndex(final int[] houseLocations, final int towerIndex, final int transmitterRange) {
final int towerCoversUntil = houseLocations[towerIndex] + transmitterRange;
int nextHouseNotCoveredIndex = Arrays.binarySearch(houseLocations, 0, houseLocations.length, towerCoversUntil);
if (-nextHouseNotCoveredIndex > houseLocations.length) {
return -1;
}
if (nextHouseNotCoveredIndex < 0) {
nextHouseNotCoveredIndex = - (nextHouseNotCoveredIndex + 1);
return nextHouseNotCoveredIndex;
}
return nextHouseNotCoveredIndex + 1;
}
我期待一个伟大的性能提升,因为现在我最多会循环log(N)次,而不是O(N)。。所以test_01输出:
Took: 297 milliseconds..
请记住,它是“已用:359毫秒”。以前。对于test_02:
Took: 18047 milliseconds..
因此,使用数组遍历实现,我总是在 20 秒左右获得值,在二进制搜索实现中始终获得 18 - 19 秒的值。
我期望使用Arrays.binarySearch获得更好的性能提升,但显然情况并非如此,为什么会这样?我错过了什么?我是否需要一个超过 73382 的数组才能看到好处,还是无关紧要?
编辑 #01
在@huck_cussler的评论之后,我尝试将我拥有的数据集(使用随机数)加倍和三倍,并尝试运行test02(当然,在测试本身中将数组大小增加三倍..)。对于线性实现,时代是这样的:
Took: 18789 milliseconds..
Took: 34396 milliseconds..
Took: 53504 milliseconds..
对于二进制搜索实现,我得到了如下值:
Took: 18644 milliseconds..
Took: 33831 milliseconds..
Took: 52886 milliseconds..