地图聚类算法

我目前的代码非常快,但我需要让它更快,这样我们才能容纳更多的标记。有什么建议吗?

笔记:

  • 当 SQL 语句按标记名称排序时,代码运行得最快 - 这本身在聚类标记方面执行非常部分的工作(同一位置的标记名称通常是,但并不总是相似的)。
  • 我无法对标记进行预聚类,因为它们可以动态搜索和筛选。
  • 我尝试过基于网格的聚类 - 但结果通常不是很好。
  • 我知道聚类在墨卡托投影上略微偏斜。
  • 我对商业群集服务不感兴趣。

代码:

$singleMarkers = array();
$clusterMarkers = array();

while (count($markers)) {
    $marker  = array_pop($markers);
    $cluster = array();

    // Compare marker against all remaining markers.
    foreach ($markers as $key => $compareMarker) {
        // This function returns the distance between two markers, at a defined zoom level.
        $pixels = pixelDistance($marker['lat'], $marker['lng'], $compareMarker['lat'], $compareMarker['lng'], $zoomLevel);
        // If two markers are closer than defined distance, remove compareMarker from array and add to cluster.
        if ($pixels < $distance) {
            unset($markers[$key]);
            $cluster[] = $compareMarker;
        }
    }

    // If a marker was added to cluster, also add the marker we were comparing to.
    if (count($cluster) > 0) {
        $cluster[] = $marker;
        $clusterMarkers[] = $cluster;
    } else {
        $singleMarkers[] = $marker;
    }
}

function pixelDistance($lat1, $lon1, $lat2, $lon2, $zoom) {
    $x1 = $lon1*10000000; //This is what I did to compensate for using lat/lon values instead of pixels.
    $y1 = $lat1*10000000;
    $x2 = $lon2*10000000;
    $y2 = $lat2*10000000;

    return sqrt(pow(($x1-$x2),2) + pow(($y1-$y2),2)) >> (21 - $zoom); //21 is the max zoom level
}

更新

下面是当前代码:

$singleMarkers = array();
$clusterMarkers = array();

// Minimum distance between markers to be included in a cluster, at diff. zoom levels
$DISTANCE = (10000000 >> $ZOOM) / 100000;

// Loop until all markers have been compared.
while (count($markers)) {
    $marker  = array_pop($markers);
    $cluster = array();

    // Compare against all markers which are left.
    foreach ($markers as $key => $target) {
        $pixels = abs($marker['lat']-$target['lat']) + abs($marker['lng']-$target['lng']);

        // If the two markers are closer than given distance remove target marker from array and add it to cluster.
        if ($pixels < $DISTANCE) {
            unset($markers[$key]);
            $cluster[] = $target;
        }
    }

    // If a marker has been added to cluster, add also the one we were comparing to.
    if (count($cluster) > 0) {
        $cluster[] = $marker;
        $clusterMarkers[] = $cluster;
    } else {
        $singleMarkers[] = $marker;
    }
}

答案 1

你真的需要计算欧几里得距离吗?如果您只是比较距离的相对幅度,则可以使用曼哈顿距离,这将节省您两个呼叫和一个:pow()sqrt()

function pixelDistance($lat1, $lon1, $lat2, $lon2, $zoom) {
    $x1 = $lon1*10000000; //This is what I did to compensate for using lat/lon values instead of pixels.
    $y1 = $lat1*10000000;
    $x2 = $lon2*10000000;
    $y2 = $lat2*10000000;

    return ($x1-$x2) + ($y1-$y2) >> (21 - $zoom);
}

不确定您是否需要该位进行计算,因此我将其保留了下来。但是,除非您确实需要在其他地方使用计算的距离值,否则您可能只需直接使用纬度/经度(无需乘以任何东西)并取曼哈顿距离,假设您预先计算以适合该度量值,这在计算方面将比强制所有距离以适合的单位和大小便宜得多。>> (21 - $zoom)$distance$distance

编辑:当我去年研究这个问题时,我在维基百科上发现了一些有用的东西 - 是的,它可能会发生;-)

我还可以强烈推荐《编程集体智慧:构建智能Web 2.0应用程序》一书,该书深入探讨了聚类,不仅应用于地理数据,还应用于数据分析的其他领域。


答案 2

扩展约翰所说的内容,我认为您应该尝试内联该函数。PHP中的函数调用非常慢,所以你应该从中获得一个不错的加速。


推荐