Java library for URL encoding only when necessary (like a browser)

2022-09-03 01:14:53

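If I put the URL http://localhost:9000/space test into the address bar of a web browser, it calls the server with http://localhost:9000/space%20test. Likewise, http://localhost:9000/specÁÉÍtest is encoded to http://localhost:9000/spec%C3%81%C3%89%C3%8Dtest.

If I put already encoded URLs into the address bar (i.e. http://localhost:9000/space%20test and http://localhost:9000/spec%C3%81%C3%89%C3%8Dtest), they stay the same (they are not double-encoded).

Is there any Java API or library that does this encoding? The URLs come from the user, so I don't know whether they are already encoded.

(If not, would it be enough to search for % in the input string and encode only when it is not found, or is there a special case where that would not work?)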

Edit:

URLEncoder.encode("space%20test", "UTF-8") returns space%2520test, which is not what I want because it is double-encoded.
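To make the double-encoding problem concrete, here is a minimal sketch. The class and method names (DoubleEncodingDemo, formEncode) are made up for illustration, and it assumes Java 10 or later for the Charset overload of URLEncoder.encode.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
final class DoubleEncodingDemo {

    // Thin wrapper around the standard HTML form encoder.
    static String formEncode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A raw space becomes "+" (form encoding), not "%20".
        System.out.println(formEncode("space test"));   // space+test
        // The "%" of an existing escape is encoded again as "%25".
        System.out.println(formEncode("space%20test")); // space%2520test
    }
}
```

URLEncoder always treats its input as unencoded text, so it cannot tell an existing %20 apart from a literal percent sign.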

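Edit 2:

Furthermore, browsers handle partially encoded URLs well, for example http://localhost:9000/specÁÉ%C3%8Dtest, without double-encoding them. In this case the server receives the following URL: http://localhost:9000/spec%C3%81%C3%89%C3%8Dtest. It is the same as the encoded form of ...specÁÉÍtest.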


Answer 1

What every web developer must know about URL encoding

URL encoding explained

Why do I need URL encoding?

The URL specification RFC 1738 specifies that only a small set of characters 
can be used in a URL. Those characters are:

A to Z (ABCDEFGHIJKLMNOPQRSTUVWXYZ)
a to z (abcdefghijklmnopqrstuvwxyz)
0 to 9 (0123456789)
$ (Dollar Sign)
- (Hyphen / Dash)
_ (Underscore)
. (Period)
+ (Plus sign)
! (Exclamation / Bang)
* (Asterisk / Star)
' (Single Quote)
( (Open Bracket)
) (Closing Bracket)

How does URL encoding work?

All offending characters are replaced by a % and a two digit hexadecimal value 
that represents the character in the proper ISO character set. Here are a 
couple of examples:

$ (Dollar Sign) becomes %24
& (Ampersand) becomes %26
+ (Plus) becomes %2B
, (Comma) becomes %2C
: (Colon) becomes %3A
; (Semi-Colon) becomes %3B
= (Equals) becomes %3D
? (Question Mark) becomes %3F
@ (Commercial A / At) becomes %40
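The replacement rule quoted above can be sketched in a few lines of Java. This is only an illustration: the names PercentEncodeDemo and percentEncode are hypothetical, and it assumes UTF-8 as the character set (matching the browser behaviour shown in the question) rather than an ISO set.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
final class PercentEncodeDemo {

    // Encodes EVERY byte of the input as %XX, using UTF-8 (an assumption).
    static String percentEncode(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            // b & 0xFF turns the signed byte into 0..255 before hex formatting.
            sb.append(String.format("%%%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(percentEncode("$"));      // %24
        System.out.println(percentEncode("&"));      // %26
        System.out.println(percentEncode("@"));      // %40
        System.out.println(percentEncode("\u00C1")); // %C3%81 (two UTF-8 bytes)
    }
}
```

Note that a non-ASCII character such as Á produces one %XX group per UTF-8 byte, which is why it shows up as %C3%81 in the question.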

Simple example:

import java.util.logging.Level;
import java.util.logging.Logger;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

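public class TextHelper {
    // getEngineByName returns null when no JavaScript engine is installed
    // (the bundled Nashorn engine was removed in JDK 15).
    private static final ScriptEngine engine = new ScriptEngineManager()
        .getEngineByName("JavaScript");

    /**
     * Escapes a string with JavaScript's escape() and then restores the
     * URL delimiters % $ & + , / : ; = ? @ < > # so they stay unencoded.
     *
     * @param str the string to encode
     * @return the encoded result, or null if the script fails
     */
    public static String escapeJavascript(String str) {
        try {
            // Pass the input as a binding so quotes in it cannot break the script.
            engine.put("input", str.replace("%20", " "));
            return engine.eval("escape(input)").toString()
                    // replace() is literal. replaceAll("%24", "$") would throw,
                    // because "$" is a group reference in a regex replacement.
                    .replace("%3A", ":")
                    .replace("%2F", "/")
                    .replace("%3B", ";")
                    .replace("%40", "@")
                    .replace("%3C", "<")
                    .replace("%3E", ">")
                    .replace("%3D", "=")
                    .replace("%26", "&")
                    .replace("%24", "$")
                    .replace("%23", "#")
                    .replace("%2B", "+")
                    .replace("%2C", ",")
                    .replace("%3F", "?")
                    // Restore "%" last so it cannot form new matches above.
                    .replace("%25", "%");
        } catch (ScriptException ex) {
            Logger.getLogger(TextHelper.class.getName())
                .log(Level.SEVERE, null, ex);
            return null;
        }
    }
}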

Answer 2

Use Java's java.net.URLEncoder#encode():

String page = "space test";
// Note: URLEncoder does form encoding, so the space becomes "+" rather than "%20".
String encodedURL = "http://localhost:9000/" + URLEncoder.encode(page, "UTF-8");

Note: encoding the whole URL leads to unexpected results, for example http:// becomes http%3A%2F%2F!
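One way to encode only the path and leave the scheme and host alone is the multi-argument java.net.URI constructor. The sketch below is an illustration, with a hypothetical class name (UriPathEncodingDemo) and the host and port hard-coded from the question. It still treats the path as unencoded text, so an existing %20 gets double-encoded.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class, not part of any library.
final class UriPathEncodingDemo {

    // Builds http://localhost:9000<path>, quoting illegal characters in the path.
    static String build(String path) throws URISyntaxException {
        URI uri = new URI("http", null, "localhost", 9000, path, null, null);
        // toASCIIString() also percent-encodes non-ASCII characters as UTF-8.
        return uri.toASCIIString();
    }

    public static void main(String[] args) throws URISyntaxException {
        System.out.println(build("/space test"));
        // http://localhost:9000/space%20test
        System.out.println(build("/spec\u00C1\u00C9\u00CDtest"));
        // http://localhost:9000/spec%C3%81%C3%89%C3%8Dtest
        System.out.println(build("/space%20test"));
        // http://localhost:9000/space%2520test (the % is always quoted)
    }
}
```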

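Edit: To prevent encoding a URL twice, you can check whether the URL already contains a %, since that character is only valid as part of an encoding. However, if the user has messed up the encoding (for example by encoding the URL only partially, or by using a % that does not encode anything), there is not much you can do with this method...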
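A slightly finer-grained variant of that check works character by character instead of on the whole URL: keep any % that is followed by two hex digits, keep characters that are legal in a URL, and percent-encode everything else as UTF-8. The following is a rough sketch under those assumptions, not a full implementation of what browsers do. The names LenientUrlEncoder and encodeIfNeeded are hypothetical, and the set of characters left untouched is the RFC 3986 reserved and unreserved sets.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
final class LenientUrlEncoder {

    // Characters copied through unchanged (assumed: RFC 3986 unreserved + reserved).
    private static final String KEEP =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
        + "-._~:/?#[]@!$&'()*+,;=";

    private static boolean isHex(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    static String encodeIfNeeded(String url) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < url.length()) {
            char c = url.charAt(i);
            if (c == '%' && i + 2 < url.length()
                    && isHex(url.charAt(i + 1)) && isHex(url.charAt(i + 2))) {
                // Looks like an existing escape: copy it as-is.
                out.append(url, i, i + 3);
                i += 3;
            } else if (KEEP.indexOf(c) >= 0) {
                out.append(c);
                i++;
            } else {
                // Encode the whole code point (handles surrogate pairs) as UTF-8.
                int cp = url.codePointAt(i);
                byte[] bytes = new String(Character.toChars(cp))
                        .getBytes(StandardCharsets.UTF_8);
                for (byte b : bytes) {
                    out.append(String.format("%%%02X", b & 0xFF));
                }
                i += Character.charCount(cp);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodeIfNeeded("http://localhost:9000/space test"));
        System.out.println(encodeIfNeeded("http://localhost:9000/space%20test"));
        System.out.println(encodeIfNeeded("http://localhost:9000/spec\u00C1\u00C9%C3%8Dtest"));
    }
}
```

With this sketch a stray % that is not followed by two hex digits is encoded as %25, but a % that merely happens to be followed by two hex digits is still taken for an escape, so the ambiguity described above does not go away.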