Sorting the characters of a UTF-16 string in Java

2022-09-03 06:11:26

TL;DR

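Java uses two chars to represent a UTF-16 character outside the Basic Multilingual Plane, such as an emoji. Sorting the chars with Arrays.sort (an unstable sort) scrambles those pairs. Should I convert the char[] to an int[], or is there a better way?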

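Java represents a character as UTF-16, but the Character class itself wraps a char (16 bits). A UTF-16 character outside the Basic Multilingual Plane is an array of two chars (32 bits), known as a surrogate pair.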
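To make the 16-bit versus 32-bit distinction concrete, here is a minimal sketch that inspects one of the code points from the example further down. The class name SurrogatePairDemo and the helper utf16Units are made up for illustration; Character.toChars and Character.charCount are standard library methods.

```java
final class SurrogatePairDemo {
    // Hypothetical helper: renders the UTF-16 code units of one code point as hex.
    static String utf16Units(int codePoint) {
        char[] units = Character.toChars(codePoint);
        StringBuilder sb = new StringBuilder();
        for (char unit : units) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04X", (int) unit));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int emoji = 128513; // U+1F601, the first code point used in the example
        System.out.println(Character.charCount(emoji)); // 2
        System.out.println(utf16Units(emoji));          // D83D DE01
        System.out.println(Character.charCount('A'));   // 1
    }
}
```

The two units D83D and DE01 only mean something together, which is why reordering individual chars can break the character.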

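Sorting a string of UTF-16 characters with the built-in sort scrambles the data. (Arrays.sort uses a dual-pivot quicksort for primitive arrays, and Collections.sort relies on Arrays.sort to do the heavy lifting.)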
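A rough sketch of what goes wrong, using the same three code points as the example further down: sorting the raw chars moves all the high surrogates to the front and all the low surrogates to the back, so most of them no longer form valid pairs. The class name CharSortDemo and its helpers are hypothetical.

```java
import java.util.Arrays;

final class CharSortDemo {
    // Hypothetical helper: renders UTF-16 code units as space-separated hex.
    static String hex(char[] units) {
        StringBuilder sb = new StringBuilder();
        for (char unit : units) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04X", (int) unit));
        }
        return sb.toString();
    }

    // Sorts a copy of the string's code units, ignoring surrogate pairing.
    static char[] sortedUnits(String text) {
        char[] units = text.toCharArray();
        Arrays.sort(units);
        return units;
    }

    public static void main(String[] args) {
        String emojis = new String(new int[] {128513, 128531, 128557}, 0, 3);
        System.out.println(hex(emojis.toCharArray()));
        // D83D DE01 D83D DE13 D83D DE2D
        char[] sorted = sortedUnits(emojis);
        System.out.println(hex(sorted));
        // D83D D83D D83D DE01 DE13 DE2D
        System.out.println(new String(sorted).codePointCount(0, sorted.length));
        // 5: only one valid pair survives, the other four units are unpaired
    }
}
```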

Specifically, do you convert the char[] to an int[], or is there a better way to sort?

import java.util.Arrays;

public class Main {
    public static void main(String[] args) {
        int[] utfCodes = {128513, 128531, 128557};
        String emojis = new String(utfCodes, 0, 3);
        System.out.println("Initial String: " + emojis);

        char[] chars = emojis.toCharArray();
        Arrays.sort(chars);
        System.out.println("Sorted String: " + new String(chars));
    }
}

Output:

Initial String: 😁😓😭

Answer 1

I looked around for a bit and couldn't find any clean ways to sort an array by groupings of two elements without the use of a library.

Luckily, the codePoints of the String are what you used to create the String itself in this example, so you can simply sort those and create a new String with the result.

public static void main(String[] args) {
    int[] utfCodes = {128531, 128557, 128513};
    String emojis = new String(utfCodes, 0, 3);
    System.out.println("Initial String: " + emojis);

    int[] codePoints = emojis.codePoints().sorted().toArray();
    System.out.println("Sorted String: " + new String(codePoints, 0, 3));
}

Initial String: 😓😭😁
Sorted String: 😁😓😭
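The same idea can be wrapped in a small reusable method that works on any String, not only one built from a known int[]. This is a sketch; the class name CodePointSort and the method sortByCodePoint are hypothetical, while String.codePoints() and the String(int[], int, int) constructor are standard.

```java
final class CodePointSort {
    // Hypothetical helper: returns the text with its code points in ascending order.
    // Surrogate pairs stay intact because sorting happens on int code points,
    // not on individual chars.
    static String sortByCodePoint(String text) {
        int[] sorted = text.codePoints().sorted().toArray();
        return new String(sorted, 0, sorted.length);
    }

    public static void main(String[] args) {
        String emojis = new String(new int[] {128531, 128557, 128513}, 0, 3);
        System.out.println("Sorted String: " + sortByCodePoint(emojis));
        System.out.println("Sorted String: " + sortByCodePoint("cba")); // abc
    }
}
```

Note that this orders characters by numeric code point value, which is not the same as locale-aware alphabetical ordering.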


Answer 2