URLConnection不允许我访问Http错误(404,500等)的数据
2022-09-01 21:48:53
我正在制作一个爬虫,并且需要从流中获取数据,无论它是否是200。CURL正在这样做,以及任何标准浏览器。
以下内容实际上不会获取请求的内容,即使存在一些请求,也会引发一个带有 http 错误状态代码的异常。无论如何,我想要输出,有没有办法?我更喜欢使用此库,因为它实际上会执行持久连接,这非常适合我正在执行的爬网类型。
package test;
import java.net.*;
import java.io.*;
public class Test {
public static void main(String[] args) {
try {
URL url = new URL("http://github.com/XXXXXXXXXXXXXX");
URLConnection connection = url.openConnection();
DataInputStream inStream = new DataInputStream(connection.getInputStream());
String inputLine;
while ((inputLine = inStream.readLine()) != null) {
System.out.println(inputLine);
}
inStream.close();
} catch (MalformedURLException me) {
System.err.println("MalformedURLException: " + me);
} catch (IOException ioe) {
System.err.println("IOException: " + ioe);
}
}
}
工作,谢谢:这是我想到的 - 只是作为一个粗略的概念证明:
import java.net.*;
import java.io.*;
public class Test {
public static void main(String[] args) {
//InputStream error = ((HttpURLConnection) connection).getErrorStream();
URL url = null;
URLConnection connection = null;
String inputLine = "";
try {
url = new URL("http://verelo.com/asdfrwdfgdg");
connection = url.openConnection();
DataInputStream inStream = new DataInputStream(connection.getInputStream());
while ((inputLine = inStream.readLine()) != null) {
System.out.println(inputLine);
}
inStream.close();
} catch (MalformedURLException me) {
System.err.println("MalformedURLException: " + me);
} catch (IOException ioe) {
System.err.println("IOException: " + ioe);
InputStream error = ((HttpURLConnection) connection).getErrorStream();
try {
int data = error.read();
while (data != -1) {
//do something with data...
//System.out.println(data);
inputLine = inputLine + (char)data;
data = error.read();
//inputLine = inputLine + (char)data;
}
error.close();
} catch (Exception ex) {
try {
if (error != null) {
error.close();
}
} catch (Exception e) {
}
}
}
System.out.println(inputLine);
}
}