Two Ways to Fetch a Web Page's Source with a Java Crawler (Bare-Bones Version)

Method 1: URL

package InternetTest;

import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class a44 {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://www.baidu.com");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        // The timeout is given in milliseconds: 5 seconds.
        conn.setConnectTimeout(5 * 1000);

        ByteArrayOutputStream outStream = new ByteArrayOutputStream();
        // try-with-resources closes the stream even if reading fails.
        try (InputStream inStream = conn.getInputStream()) {
            byte[] buffer = new byte[1024];
            int len;
            // Copy the response body into memory, 1 KB at a time.
            while ((len = inStream.read(buffer)) != -1) {
                outStream.write(buffer, 0, len);
            }
        } finally {
            conn.disconnect();
        }
        // Decode explicitly instead of relying on the platform default charset.
        // This assumes the page is served as UTF-8.
        String htmlSource = new String(outStream.toByteArray(), StandardCharsets.UTF_8);
        System.out.println(htmlSource);
    }
}
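
The read loop in Method 1 can be pulled out into a small reusable helper. The following is only a minimal sketch: the class name PageSourceReader and the method readAll are hypothetical, not from the original listing, and the example feeds it an in-memory stream in place of a real connection so that it runs without network access.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the names are illustrative only.
class PageSourceReader {
    // Reads the stream to its end and decodes the bytes with the given charset.
    // The caller stays responsible for closing the stream.
    static String readAll(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        int len;
        while ((len = in.read(buffer)) != -1) {
            out.write(buffer, 0, len);
        }
        return new String(out.toByteArray(), charset);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for conn.getInputStream(): a fake response body held in memory.
        byte[] fake = "<html><body>hello</body></html>".getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(fake)) {
            System.out.println(readAll(in, StandardCharsets.UTF_8));
        }
    }
}
```

With a real connection, the same call would take conn.getInputStream() as its first argument.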

Method 2: HttpClient

package InternetTest;

import org.apache.http.HttpStatus;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

// Requires Apache HttpClient 4.x on the classpath.
public class a45 {
    public static void main(String[] args) throws Exception {
        String url1 = "http://www.baidu.com";
        HttpGet request = new HttpGet(url1);
        // try-with-resources closes the response and the client even if the request fails.
        try (CloseableHttpClient closeableHttpClient = HttpClients.createDefault();
             CloseableHttpResponse closeableHttpResponse = closeableHttpClient.execute(request)) {
            int status = closeableHttpResponse.getStatusLine().getStatusCode();
            if (status != HttpStatus.SC_OK) {
                // Report the status, then print the body anyway (often an error page).
                System.err.println("Unexpected HTTP status: " + status);
            }
            // "utf-8" is the fallback charset, used when the response does not declare one.
            String html = EntityUtils.toString(closeableHttpResponse.getEntity(), "utf-8");
            System.out.println(html);
        }
    }
}
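
Method 2 passes "utf-8" to EntityUtils.toString, while with Method 1 the charset has to be chosen by hand. One way to choose it is to look at the Content-Type header, which HttpURLConnection exposes through getContentType(). The sketch below is an assumption-laden illustration, not part of the original listings: the class CharsetSniffer and its method fromContentType are hypothetical names, and the parsing is deliberately simple (it only looks for a charset= parameter).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper; the names are illustrative only.
class CharsetSniffer {
    // Returns the charset named in a Content-Type header value,
    // or the fallback if the header is missing, has no charset, or names an unknown one.
    static Charset fromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers both illegal and unsupported charset names.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        Charset fallback = StandardCharsets.UTF_8;
        System.out.println(fromContentType("text/html; charset=ISO-8859-1", fallback).name()); // prints ISO-8859-1
        System.out.println(fromContentType("text/html", fallback).name());                     // prints UTF-8
    }
}
```

In Method 1, the result of fromContentType(conn.getContentType(), StandardCharsets.UTF_8) could then be passed to the String constructor in place of a fixed charset.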
