Java Web Crawler Tutorial

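I. Introduction to Selenium

Selenium is a tool for automated testing of web applications. Selenium tests run directly in the browser, just as a real user would operate it. Supported browsers include IE (7, 8, 9, 10, 11), Mozilla Firefox, Safari, Google Chrome, and Opera. It is suited to automated testing and to crawling JavaScript-rendered pages (including getting around anti-crawler measures).

The original core, Selenium Core, was based on JsUnit and written entirely in JavaScript, so it could run in any browser that supports JavaScript.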

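II. Selenium Components

Selenium IDE: a Firefox plugin that records and plays back simple browser operations. It is mainly used to quickly create bug-reproduction scripts, which can be exported to several languages.
Selenium RC: the core component of early Selenium. It lets you write automated test scripts in many languages and reaches the application under test through its server, which acts as a proxy.
Selenium WebDriver (the focus here): a browser automation framework that accepts commands and sends them to the browser. It is implemented through browser-specific drivers, so it talks to the browser directly and controls it. Selenium WebDriver supports languages such as Java, C#, PHP, Python, Perl, and Ruby.
Selenium Grid: a supporting tool for distributed testing. It runs multiple test tasks in parallel to improve testing efficiency.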

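III. Selenium Features

Open source and free;
Multi-browser support: Firefox, Chrome, IE, Opera, Edge;
Multi-platform support: Linux, Windows, macOS;
Multi-language support: Java, Python, Ruby, C#, JavaScript, C++;
Good support for web pages;
Simple (a simple API) and flexible (driven from a programming language);
Supports distributed execution of test cases.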

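IV. Worked Examples

1. Download the driver
ChromeDriver download page: http://chromedriver.storage.googleapis.com/index.htm

If you cannot find a version that exactly matches your browser, use the closest available version.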
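
What counts as "close enough" is worth pinning down. As far as I know, ChromeDriver's versioning guidance is to match the first three components (major.minor.build) and let the last one differ, so treat that rule as an assumption here. Below is a minimal sketch of such a check; ChromeVersions is a hypothetical helper (not part of Selenium) and the version numbers are only illustrative.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of Selenium: compares Chrome and ChromeDriver versions. */
final class ChromeVersions {
    private ChromeVersions() {}

    /** Returns at most the first three dot-separated components, e.g. "114.0.5735". */
    static String buildPrefix(String version) {
        String[] parts = version.trim().split("\\.");
        int n = Math.min(3, parts.length);
        return String.join(".", Arrays.copyOf(parts, n));
    }

    /** Assumed rule: browser and driver are compatible when major.minor.build match. */
    static boolean sameBuild(String browserVersion, String driverVersion) {
        return buildPrefix(browserVersion).equals(buildPrefix(driverVersion));
    }

    public static void main(String[] args) {
        // Illustrative version strings only
        System.out.println(sameBuild("114.0.5735.90", "114.0.5735.16"));
        System.out.println(sameBuild("114.0.5735.90", "113.0.5672.63"));
    }
}
```

If the check fails, pick the driver whose leading components match your browser rather than the newest driver on the page.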

2. Create the project and add the dependency

<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-java</artifactId>
    <version>3.141.59</version>
</dependency>

3. Basic configuration

// Tell Selenium where the chromedriver executable is
System.setProperty("webdriver.chrome.driver","C:\\Users\\Administrator\\AppData\\Local\\Google\\Chrome\\Application\\chromedriver.exe");
// Create the driver (this launches Chrome)
WebDriver driver = new ChromeDriver();
// Open the site to crawl
driver.get("https://www.baidu.com/");
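
The hard-coded driver path only works on the author's machine. One way around that, sketched below, is to read the path from an environment variable and fall back to a default. The variable name CHROMEDRIVER_PATH and the DriverPath class are made up for this sketch; only the "webdriver.chrome.driver" property name comes from the tutorial.

```java
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical helper: picks the chromedriver path from the environment or a fallback. */
final class DriverPath {
    // Assumed variable name, chosen for this sketch only
    static final String ENV_NAME = "CHROMEDRIVER_PATH";

    private DriverPath() {}

    /** Returns the configured value when it is non-blank, otherwise the fallback. */
    static String choose(String configured, String fallback) {
        if (configured == null || configured.trim().isEmpty()) {
            return fallback;
        }
        return configured.trim();
    }

    public static void main(String[] args) {
        String fallback = "C:\\Users\\Administrator\\AppData\\Local\\Google\\Chrome\\Application\\chromedriver.exe";
        String path = choose(System.getenv(ENV_NAME), fallback);
        if (!Files.isRegularFile(Paths.get(path))) {
            // Fail early with a readable message instead of a driver start-up error
            System.err.println("chromedriver not found at: " + path);
            return;
        }
        System.setProperty("webdriver.chrome.driver", path);
        System.out.println("Using driver: " + path);
    }
}
```

After this runs, the rest of the setup (new ChromeDriver(), driver.get(...)) stays exactly as in the tutorial.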

4. Examples

1. Ways to select elements

class:

List<WebElement> elements = driver.findElements(By.className("hotsearch-item"));
for (WebElement element : elements) {
    System.out.println(element.getText());
}

ID:

WebElement kw = driver.findElement(By.id("kw"));
System.out.println(kw.getAttribute("name"));

Name:

WebElement tn = driver.findElement(By.name("tn"));
System.out.println(tn.getAttribute("value"));

tag:

List<WebElement> input = driver.findElements(By.tagName("input"));
for (WebElement webElement : input) {
    String value = webElement.getAttribute("value");
    System.out.println(value);
}

link:

List<WebElement> elements = driver.findElements(By.linkText("地图"));
for (WebElement element : elements) {
    System.out.println(element.getText());
}

Partial link (matches part of the text of an a tag):

List<WebElement> elements = driver.findElements(By.partialLinkText("中国"));
for (WebElement element : elements) {
    System.out.println(element.getText());
}

CSS selector:

List<WebElement> elements = driver.findElements(By.cssSelector("#hotsearch-content-wrapper > li:nth-child(even)"));
for (WebElement element : elements) {
    System.out.println(element.getText());
}

XPath:

WebElement element = driver.findElement(By.xpath("//*[@id=\"kw\"]"));
System.out.println(element.getAttribute("class"));
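
To try out the XPath expression above without launching a browser, here is a small sketch that uses the JDK's built-in javax.xml.xpath against a hand-written, well-formed snippet. The markup, its attribute values, and the XPathDemo class are invented for illustration. Real pages are HTML rather than XML, so this only shows how the expression picks a node by its id.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

/** Hypothetical demo class: evaluates an XPath expression on a tiny stand-in document. */
final class XPathDemo {
    // Invented stand-in markup, not the real page
    static final String PAGE =
            "<html><body><form>"
          + "<input id=\"kw\" name=\"wd\" class=\"s_ipt\"/>"
          + "<input id=\"su\" class=\"s_btn\"/>"
          + "</form></body></html>";

    private XPathDemo() {}

    /** Returns the attribute of the first node matched by the expression, or null if none matches. */
    static String attributeOf(String xpathExpr, String attribute) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            Element el = (Element) xpath.evaluate(
                    xpathExpr, new InputSource(new StringReader(PAGE)), XPathConstants.NODE);
            return el == null ? null : el.getAttribute(attribute);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + xpathExpr, e);
        }
    }

    public static void main(String[] args) {
        // Same expression as in the Selenium example above
        System.out.println(attributeOf("//*[@id=\"kw\"]", "class"));
    }
}
```

The expression reads as "any element, anywhere in the document, whose id attribute equals kw", which is the same thing By.xpath asks the browser to evaluate.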

2. Type text into an input box

WebElement kw = driver.findElement(By.id("kw"));
kw.sendKeys("java");
WebElement button=driver.findElement(By.id("su"));
button.click();

3. Get a single element

WebElement element = driver.findElement(By.name("ie"));
System.out.println(element.getAttribute("value"));

4. Get multiple elements

List<WebElement> elements = driver.findElements(By.partialLinkText("大"));
for (WebElement element : elements) {
    System.out.println(element.getText());
}

5. Crawl JD.com product information

1. Initial setup

public static void main(String[] args) {
    // Tell Selenium where the chromedriver executable is
    System.setProperty("webdriver.chrome.driver","C:\\Users\\Administrator\\AppData\\Local\\Google\\Chrome\\Application\\chromedriver.exe");
    // Initialize the driver
    WebDriver driver = new ChromeDriver();
    // Open the site to crawl
    driver.get("https://www.jd.com/");
}

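2. Fill in the search box on the JD.com home page and click the search button (the search runs automatically once the page opens)

// Find the search box on the JD.com home page and type the keyword
WebElement key = driver.findElement(By.id("key"));
key.sendKeys("人妻");
// Find the search button on the JD.com home page and click it
WebElement button = driver.findElement(By.cssSelector("button.button"));
button.click();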

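3. Scroll to the bottom of the page

// Pause briefly before scrolling so the results can load
// (sleep(int seconds) is the helper defined in the complete code)
sleep(3);
// Scroll to the bottom of the page
((JavascriptExecutor) driver).executeScript("window.scrollTo(0,document.body.scrollHeight)");
sleep(2);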
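
Fixed sleeps either waste time or turn out too short on a slow connection. A common alternative is to poll for a condition until a timeout expires. The sketch below shows the idea in plain Java; Waits and waitUntil are hypothetical names for this example. Selenium also ships its own WebDriverWait for the same purpose, which is probably the better choice in real code.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper: polls a condition instead of sleeping for a fixed time. */
final class Waits {
    private Waits() {}

    /**
     * Checks the condition every pollMillis until it holds or timeoutMillis has passed.
     * Returns true if the condition held, false on timeout or interruption.
     */
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag so callers can still see it
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in condition: becomes true after roughly 200 ms
        boolean ok = waitUntil(() -> System.currentTimeMillis() - start >= 200, 2000, 20);
        System.out.println("condition met: " + ok);
    }
}
```

With a driver in hand, the condition could be something like `() -> !driver.findElements(By.className("p-price")).isEmpty()`, so the crawl continues as soon as the products show up.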

4. Get the products

// XPath of a single product item, for reference: //*[@id="J_goodsList"]/ul/li[3]
// Get every product on the search results page
List<WebElement> elements = driver.findElements(By.xpath("//*[@id=\"J_goodsList\"]/ul/li"));
for (WebElement element : elements) {
    String price = element.findElement(By.className("p-price")).getText();
    String name = element.findElement(By.className("p-name")).getText();
    System.out.println("【"+price+"】-"+name);
}
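
The price comes back as display text, so if you want to sort or filter products you need a number. Here is a rough sketch, assuming the text looks like a currency sign followed by a decimal number (I have not checked every format the site uses). PriceText is a hypothetical helper.

```java
import java.math.BigDecimal;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts a numeric price from scraped display text. */
final class PriceText {
    // First run of digits with an optional decimal part
    private static final Pattern NUMBER = Pattern.compile("\\d+(?:\\.\\d+)?");

    private PriceText() {}

    /** Returns the first number found in the text, or empty if there is none. */
    static Optional<BigDecimal> parse(String text) {
        if (text == null) {
            return Optional.empty();
        }
        // Drop thousands separators before matching
        Matcher m = NUMBER.matcher(text.replace(",", ""));
        return m.find() ? Optional.of(new BigDecimal(m.group())) : Optional.empty();
    }

    public static void main(String[] args) {
        // Illustrative input, assumed format
        System.out.println(parse("￥1299.00").map(BigDecimal::toPlainString).orElse("n/a"));
    }
}
```

BigDecimal is used instead of double so that prices keep their exact decimal value.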

5. Complete code

package com.zhq.selenium;

import org.openqa.selenium.By;
import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;

import java.util.List;

public class Demo2 {
    public static void main(String[] args) {
        // Tell Selenium where the chromedriver executable is
        System.setProperty("webdriver.chrome.driver","C:\\Users\\Administrator\\AppData\\Local\\Google\\Chrome\\Application\\chromedriver.exe");
        /************************** Option 1: headless (no browser window) **************************/
        // Define the browser options
        //ChromeOptions chromeOptions = new ChromeOptions();
        // Run without opening a browser window
        //chromeOptions.addArguments("--headless");
        // Initialize the driver
        //WebDriver driver = new ChromeDriver(chromeOptions);

        /************************** Option 2: open a browser window **************************/
        // Initialize the driver
        WebDriver driver = new ChromeDriver();
        try {
            // Open the site to crawl
            driver.get("https://www.jd.com/");
            // Find the search box on the JD.com home page and type the keyword
            WebElement key = driver.findElement(By.id("key"));
            key.sendKeys("人妻");
            // Find the search button on the JD.com home page and click it
            WebElement button = driver.findElement(By.cssSelector("button.button"));
            button.click();
            // Pause briefly before scrolling so the results can load
            sleep(3);
            // Scroll to the bottom of the page
            ((JavascriptExecutor) driver).executeScript("window.scrollTo(0,document.body.scrollHeight)");
            sleep(2);

            // XPath of a single product item, for reference: //*[@id="J_goodsList"]/ul/li[3]
            // Get every product on the search results page
            List<WebElement> elements = driver.findElements(By.xpath("//*[@id=\"J_goodsList\"]/ul/li"));
            for (WebElement element : elements) {
                String price = element.findElement(By.className("p-price")).getText();
                String name = element.findElement(By.className("p-name")).getText();
                System.out.println("【"+price+"】-"+name);
            }
        } finally {
            // Close the browser and stop the chromedriver process
            driver.quit();
        }
    }

    // Sleep for the given number of seconds
    public static void sleep(int num){
        try{
            Thread.sleep(num * 1000L);
        }catch (InterruptedException e){
            // Restore the interrupt flag before reporting
            Thread.currentThread().interrupt();
            e.printStackTrace();
        }
    }
}

6. Crawl images

Some sites use anti-crawler techniques, so they may not be reachable this way.

package com.zhq.selenium;

import org.openqa.selenium.By;
import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

public class Demo3 {

    // The WebDriver instance
    public static WebDriver driver=null;

    // Image URLs collected from the crawled page
    public static List<String> imgs=new ArrayList<>();

    static{
        // Tell Selenium where the chromedriver executable is
        System.setProperty("webdriver.chrome.driver","C:\\Users\\Administrator\\AppData\\Local\\Google\\Chrome\\Application\\chromedriver.exe");
        // Initialize the driver
        driver = new ChromeDriver();
    }

    // Sleep for the given number of seconds
    public static void sleep(int num){
        try {
            Thread.sleep(num*1000L);
        } catch (InterruptedException e) {
            // Restore the interrupt flag before reporting
            Thread.currentThread().interrupt();
            e.printStackTrace();
        }
    }

    public static void getImgs(){
        // Open the site to crawl
        driver.get("http://www.gaoimg.com/");
        sleep(3);
        // Scroll to the bottom of the page
        ((JavascriptExecutor) driver).executeScript("window.scrollTo(0,document.body.scrollHeight)");
        sleep(2);

        // Candidate selectors noted while inspecting pages (kept for reference, unused):
        //   /html/body/div[8]/ul/li[2]/a/img
        //   //*[@id="inspiration__content-item-3"]/div/a/div[3]/img
        //   //*[@id="inspiration__content-item-0"]/div[2]/a[1]/div[3]/img
        //   #inspiration__content-item-0 > div.inspiration__content-item-list > a > div.s-c__ct > img

        List<WebElement> elements = driver.findElements(By.cssSelector("body > div.tuijiantupian > div.flex-images > div > a > img"));
        // Loop over every img element and collect its src
        for (WebElement element : elements) {
            String src = element.getAttribute("src");
            if(null!=src) {
                imgs.add(src);
            }
        }
    }

    public static void saveImg(){
        String path="D:\\images\\";
        // Create the target directory if it does not exist yet
        new File(path).mkdirs();
        for (String img : imgs) {
            String fileName=path+ UUID.randomUUID().toString().replace("-","")+".jpg";
            // try-with-resources closes both streams even if the copy fails
            try (InputStream is=new URL(img).openStream();
                 OutputStream out=new FileOutputStream(new File(fileName))) {
                byte[] bytes=new byte[1024];
                int len;
                while((len=is.read(bytes))!=-1){
                    out.write(bytes,0,len);
                }
            } catch (IOException e) {
                // Report the failure and carry on with the next image
                e.printStackTrace();
            }
        }
    }

    public static void main(String[] args) {
        try {
            // Collect the image URLs
            getImgs();
            // Print each image URL
            for (String img : imgs) {
                System.out.println(img);
            }
            // Save the images
            saveImg();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // Always release the driver once the downloads are done
            if(null!=driver) {
                driver.quit();
            }
        }
    }
}
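
Two things can trip up the download step: a src that is not an http(s) URL (inline data: images, for example), and saving every file as .jpg regardless of its real type. The sketch below filters the URLs and takes the extension from the URL path. ImageUrls is a hypothetical helper, example.com is a placeholder host, and the list of accepted extensions is my own assumption.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Optional;

/** Hypothetical helper: filters scraped src values and derives a file extension. */
final class ImageUrls {
    private ImageUrls() {}

    /** Returns the src as an http(s) URI, or empty for data: URIs, blanks and malformed input. */
    static Optional<URI> normalize(String src) {
        if (src == null) {
            return Optional.empty();
        }
        String s = src.trim();
        if (s.startsWith("//")) {
            // Protocol-relative URL: assume https
            s = "https:" + s;
        }
        try {
            URI uri = new URI(s);
            String scheme = uri.getScheme();
            if ("http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme)) {
                return Optional.of(uri);
            }
        } catch (URISyntaxException ignored) {
            // Fall through: treat malformed input as "not downloadable"
        }
        return Optional.empty();
    }

    /** Returns a known image extension from the URL path, defaulting to ".jpg". */
    static String extension(URI uri) {
        String path = uri.getPath();
        if (path != null) {
            int dot = path.lastIndexOf('.');
            int slash = path.lastIndexOf('/');
            if (dot > slash && dot < path.length() - 1) {
                String ext = path.substring(dot + 1).toLowerCase(Locale.ROOT);
                if (ext.matches("jpe?g|png|gif|webp|bmp")) {
                    return "." + ext;
                }
            }
        }
        return ".jpg";
    }

    public static void main(String[] args) {
        // Placeholder URL for illustration
        normalize("//img.example.com/a/b.PNG?x=1")
                .ifPresent(u -> System.out.println(u + " saved with extension " + extension(u)));
    }
}
```

In saveImg, you could skip any src for which normalize returns empty and use extension(...) in place of the fixed ".jpg".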

I'm not great at writing tutorial posts, so mistakes are bound to happen... (sigh, probably hardly anyone will read this anyway ( ´͈ ⌵ `͈ )σண♡

© 造梦空间论坛