统一排列/分布数组项目

我有一个带有属性的多维关联数组。它看起来像这样:type

$data = array(
  array( "name" => "SomeName", "type" => "A"),
  array( "name" => "SomeName", "type" => "A"),
  array( "name" => "SomeName", "type" => "A"),
  array( "name" => "SomeName", "type" => "A"),
  array( "name" => "SomeName", "type" => "A"),
  array( "name" => "SomeName", "type" => "B"),
  array( "name" => "SomeName", "type" => "B"),
  array( "name" => "SomeName", "type" => "B"),
  array( "name" => "SomeName", "type" => "C"),
  array( "name" => "SomeName", "type" => "C")
);

我想重新排列它,使项目分布更均匀(如果可能的话,重复类型最少)。它应该看起来像这样:

array(
  array( "name" => "SomeName", "type" => "A"),
  array( "name" => "SomeName", "type" => "B"),
  array( "name" => "SomeName", "type" => "A"),
  array( "name" => "SomeName", "type" => "C"),
  array( "name" => "SomeName", "type" => "A"),
  array( "name" => "SomeName", "type" => "B"),
  array( "name" => "SomeName", "type" => "A"),
  array( "name" => "SomeName", "type" => "C"),
  array( "name" => "SomeName", "type" => "A"),
  array( "name" => "SomeName", "type" => "B")
);

到目前为止,我尝试的是找到每种类型的计数和总数:

$count_a = 5;
$count_b = 3;
$count_c = 2;
$total = 10;

以及每种类型的比率:

$ratio_a = 0.5; //(5/10)
$ratio_b = 0.3; //(3/10)
$ratio_c = 0.2; //(2/10)

我只是被困在这里。我是否应该尝试使用数字创建新属性,然后根据它进行排序?或者也许以某种方式使用模运算符?我还尝试将项目分成3个不同的数组,如果这样会更容易。index


答案 1

这是一个尽可能避免重复模式的解决方案。

因为它会产生AAAAABBBCCABABABACAC;

因为它会产生AAAAABBBCCCABCABABACAC;

除了按类型计数排序外,它还以线性时间运行(它接受未排序的数据数组)。结果在 中。有关说明,请参见下文。$distributed_data

法典

$data = array(
  array( "name" => "SomeName", "type" => "A"),
  array( "name" => "SomeName", "type" => "A"),
  array( "name" => "SomeName", "type" => "A"),
  array( "name" => "SomeName", "type" => "B"),
  array( "name" => "SomeName", "type" => "B"),
);

$distributed_data = array();
$counts = array();
$size = sizeof($data);

// Count values
foreach ($data as $entry) {
  $counts[$entry["type"]] = isset($counts[$entry["type"]]) ? $counts[$entry["type"]] + 1 : 1;
}

// Set counter
for ($i = 0; $i < $size; $i++) {
  $data[$i]["count"] = $counts[$data[$i]["type"]];
}

// Sort by count
usort($data, function($entry1, $entry2) {
    return $entry2["count"] <=> $entry1["count"];
});

// Generate the distributed array
$max_length = $data[0]["count"];
$rows = ceil($size / $max_length);
$last_row = ($size - 1) % $max_length + 1;
$row_cycle = $rows;

$row = 0;
$col = 0;
for ($i = 0; $i < $size; $i++) {
  if ($i == $rows * $last_row) {
    $row_cycle -= 1;
  }

  $distributed_data[$i] = $data[$row * $max_length + $col];

  $row = ($row + 1) % $row_cycle;
  if ($row == 0) {
    $col++;
  }
}

解释

首先,根据每种类型的重复次数对条目进行排序。例如: 成为。CBBCAABBBBAACC

然后想象一个表的列数与最频繁出现的列数一样多(例如,如果有,则最频繁出现的是 4,该表将有 4 列)。AAAABBCC

然后将所有条目从左到右写入表中,并根据需要跳转到新行。

例如,你会得到一个这样的表格:AAAAABBBCCC

Table example

要生成最终的分布式数组,只需自上而下地读取条目,并根据需要移动到新列。

在上面的示例中,您将获得 .ABCABABACAC

获取重复条目的唯一方法是在一列中有两个相同的字符,或者在移动到右侧的列时遇到相同的字符。

第一种情况不会发生,因为字符组需要换行,而这不会发生,因为没有比列数更长的字符组(这就是我们定义表的方式)。

第二种情况只有在第二行未满时才会发生。例如: 使第二行保留两个空单元格。AAAABB


答案 2

算法:

function distribute($data) {
    $groups = [];
    foreach ($data as $row) {
        $groups[$row['type']][] = $row;
    }
    $groupSizes = array_map('count', $groups);
    asort($groupSizes);

    $result = [];
    foreach ($groupSizes as $type => $groupSize) {
        if (count($result) == 0) {
            $result = $groups[$type];
        } elseif (count($result) >= count($groups[$type])) {
            $result = merge($result, $groups[$type]);
        } else {
            $result = merge($groups[$type], $result);
        }
    }
    return $result;
}

function merge($a, $b) {
    $c1 = count($a);
    $c2 = count($b);
    $result = [];
    $i1 = $i2 = 0;
    while ($i1 < $c1) {
        $result[] = $a[$i1++];
        while ($i2 < $c2 && ($i2+1)/($c2+1) < ($i1+1)/($c1+1)) {
            $result[] = $b[$i2++];
        }
    }
    return $result;
}

主要思想是将数据拆分为组,并将下一个最小的组合并到结果中(从空结果开始)。

合并两个数组时,项目按浮点键排序,该浮点键在此行中计算(在流上)

while ($i2 < $c2 && ($i2+1)/($c2+1) < ($i1+1)/($c1+1))

floatKey = (index + 1) / (groupSize + 1)

(然而,这部分可以改进,因此到“角”(和)的距离将是两个项目之间距离的一半)。01

在领带时,来自较大组的项目是第一位的。

示例:合并和 的键将是 - 的 anf。结果将是AAAABBA0.2, 0.4, 0.6, 0.8B0.33, 0.66

A(0.2), B(0.33), A(0.4), A(0.6), B(0.66), A(0.8)

测试:

$testData = [
    'AAAAABBBCC',
    'AAAAABBBCCC',
    'ABBCCC',
    'AAAAAABBC',
    'AAAAAABBBBCCD',
    'AAAAAAAAAABC',
    'hpp',
    'stackoverflow',
    'ACCD', // :-)
];

$results = [];

foreach ($testData as $dataStr) {
    $a = str_split($dataStr);
    $data = [];
    foreach ($a as $type) {
        $data[] = ['type' => $type];
    }
    $result = distribute($data);
    $resultStr = implode(array_column($result, 'type'));
    $results[$dataStr] = $resultStr;
}
var_export($results);

测试结果:

'AAAAABBBCC' => 'BACABACABA',
'AAAAABBBCCC' => 'CABACABACAB',
'ABBCCC' => 'BCACBC',
'AAAAAABBC' => 'ABAACAABA',
'AAAAAABBBBCCD' => 'BACABADABACAB',
'AAAAAAAAAABC' => 'AAACAAAABAAA',
'hpp' => 'php',
'stackoverflow' => 'sakeofwlrovct',
'ACCD' => 'ACDC',

测试演示:http://rextester.com/BWBD90255

您可以轻松地向演示中添加更多测试用例。