Python: An Illustrated Guide to Web Scraping with a Chrome Extension

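In e-commerce, customer reviews of a product matter a great deal, but what if you can't write code? There is a Chrome extension that handles simple data scraping without a single line of code. Here is a sample of the scraped data: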

As you can see, the source URL, reviewer, review text, time, and product color have all been captured. So which tools does this take? Just two:

1. The Chrome browser;

2. The extension: Web Scraper

Extension download link: https://chromecj.com/productivity/2018-05/942.html

Finally, if you would like to try the scrape yourself, here is the detailed process used this time:

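1. First, copy the code below. True, you don't need to write any code, but copying this block makes it easier to get started. Afterwards you can customize it and pick selectors yourself, still without writing code.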

{
  "_id": "jdreview",
  "startUrl": [
    "https://item.jd.com/100000680365.html#comment"
  ],
  "selectors": [
    {
      "id": "user",
      "type": "SelectorText",
      "selector": "div.user-info",
      "parentSelectors": [
        "main"
      ],
      "multiple": false,
      "regex": "",
      "delay": 0
    },
    {
      "id": "comments",
      "type": "SelectorText",
      "selector": "div.comment-column > p.comment-con",
      "parentSelectors": [
        "main"
      ],
      "multiple": false,
      "regex": "",
      "delay": 0
    },
    {
      "id": "time",
      "type": "SelectorText",
      "selector": "div.comment-message:nth-of-type(5) span:nth-of-type(4), div.order-info span:nth-of-type(4)",
      "parentSelectors": [
        "main"
      ],
      "multiple": false,
      "regex": "",
      "delay": "0"
    },
    {
      "id": "color",
      "type": "SelectorText",
      "selector": "div.order-info span:nth-of-type(1)",
      "parentSelectors": [
        "main"
      ],
      "multiple": false,
      "regex": "",
      "delay": 0
    },
    {
      "id": "main",
      "type": "SelectorElementClick",
      "selector": "div.comment-item",
      "parentSelectors": [
        "_root"
      ],
      "multiple": true,
      "delay": "10000",
      "clickElementSelector": "div.com-table-footer a.ui-pager-next",
      "clickType": "clickMore",
      "discardInitialElements": false,
      "clickElementUniquenessType": "uniqueHTMLText"
    }
  ]
}
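If you later want to reuse the sitemap above for a different product, only the `startUrl` entry has to change. The following is a minimal Python sketch of that edit, not something Web Scraper provides: the function name `retarget_sitemap`, the product ID `100012345678`, and the new sitemap name `jdreview2` are all hypothetical values made up for illustration. It assumes product pages follow the `https://item.jd.com/<id>.html` pattern seen in the sitemap, and it always keeps the `#comment` fragment.

```python
import json


def retarget_sitemap(sitemap_json: str, product_id: str, sitemap_id: str) -> str:
    """Return a copy of a Web Scraper sitemap pointed at another product page.

    Assumes the URL pattern https://item.jd.com/<id>.html#comment, as in the
    sitemap above. The selectors are left untouched.
    """
    sitemap = json.loads(sitemap_json)
    # Giving the copy its own name is an assumption made here so that the
    # original and the copy can be told apart after import.
    sitemap["_id"] = sitemap_id
    # Keep the #comment fragment so the page opens at the review section.
    sitemap["startUrl"] = [f"https://item.jd.com/{product_id}.html#comment"]
    return json.dumps(sitemap, indent=2, ensure_ascii=False)


# Trimmed-down stand-in for the full sitemap shown above (selectors omitted).
original = json.dumps({
    "_id": "jdreview",
    "startUrl": ["https://item.jd.com/100000680365.html#comment"],
    "selectors": [],
})

# "100012345678" is a made-up product ID used only for this example.
print(retarget_sitemap(original, "100012345678", "jdreview2"))
```

The printed JSON can then be pasted into the extension in place of the original block.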

2. Next, open Chrome, press Ctrl+Shift+I on any page, and find the Web Scraper tab in the panel that opens, as shown:

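3. As shown below:

4. Paste the code from step 1, as shown:

5. As shown, if you want to scrape a different URL, replace it here. Note that the #comment at the end of the URL links straight to the review section and must not be removed:

6. As shown:

7. As shown: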

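8. As shown, after you click Scrape, the extension automatically opens the page to be scraped. Do not close that window; just wait for it to finish. A notification appears at the bottom right when the scrape is done. Runs of up to about 1,000 reviews generally complete without problems: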
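Once the scrape finishes, the results can be exported from Web Scraper as a CSV file and read with Python's standard `csv` module. The sketch below is an illustration under stated assumptions: the column names `user`, `comments`, `time`, and `color` are taken from the selector ids in the sitemap, the helper `count_colors` is a hypothetical name, and the sample rows are placeholder data rather than real scraped results.

```python
import csv
import io
from collections import Counter


def count_colors(csv_file) -> Counter:
    """Count how often each product color appears in an exported review CSV.

    Assumes a 'color' column, matching the selector id in the sitemap.
    Rows with an empty or missing color are skipped.
    """
    reader = csv.DictReader(csv_file)
    return Counter(row["color"].strip() for row in reader if row.get("color"))


# Placeholder rows standing in for a real export; not actual scraped data.
sample = io.StringIO(
    "user,comments,time,color\n"
    "user_a,placeholder comment 1,2019-01-01 10:00,black\n"
    "user_b,placeholder comment 2,2019-01-02 11:00,white\n"
    "user_c,placeholder comment 3,2019-01-03 12:00,black\n"
)

counts = count_colors(sample)
print(counts.most_common())
```

For a real export, pass an open file instead of the in-memory sample, for example `open("jdreview.csv", newline="", encoding="utf-8-sig")`, where the file name is whatever you saved the export as. The `utf-8-sig` encoding is chosen so that a leading byte-order mark, if the file has one, does not end up in the first column name.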
