
Python Scraping Practice: Downloading Anime Images from a Website

Category: python | Posted by 搞java代码 | 2022-05-21

Preface

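It's been a while since I last used Python. I'm not sure why my interest in it keeps fading; maybe I simply prefer Android. After all, Android was my first contact with programming: I learned Java for Android and only picked up Python after that. This time it's Android again that brings me back: I need Python to scrape some data for me.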

Walkthrough

Target site: https://divnil.com

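First, let's see how the site loads its data. After opening it, I found a "next page" button at the bottom, which suggests the listing is split into ordinary paginated HTML pages rather than loaded dynamically. That makes this site easy to scrape.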
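Because the listing is paginated, a crawler can simply walk the page URLs. The two listing URLs used later in this post hint at a pattern: page 1 ends in `...%E7%B4%99.html` and page 2 in `...%E7%B4%99_2.html`. A minimal sketch, assuming (unverified beyond page 2) that page N follows the same `_N.html` pattern; `BASE` and `page_url` are hypothetical names:

```python
# Hypothetical helper: build listing-page URLs for the iPhone 8 anime wallpaper category.
# ASSUMPTION: page 1 is BASE + ".html" and page N (N >= 2) is BASE + "_N.html",
# inferred only from the two URLs that appear in this post.
BASE = "https://divnil.com/wallpaper/iphone8/%E3%82%A2%E3%83%8B%E3%83%A1%E3%81%AE%E5%A3%81%E7%B4%99"


def page_url(n: int) -> str:
    """Return the URL of listing page n (1-based)."""
    if n < 1:
        raise ValueError("page numbers start at 1")
    return BASE + (".html" if n == 1 else "_%d.html" % n)


if __name__ == "__main__":
    for n in range(1, 4):
        print(page_url(n))
```

One plausible way to cover the whole category is to request `page_url(1)`, `page_url(2)`, ... and stop at the first page that does not return status 200, though how the site responds past its last page is also an assumption.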

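Our goal is to get the high-resolution source URL of every image and download the images locally. Let's open any image to look at its detail page. Hmm, the page holds just one image.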

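It looks fairly sharp. Click it to open the image in a new window.

Then download the image. Honestly, the file is quite small, and I worry it isn't the high-resolution original (oh well).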
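File size alone says little about resolution; a more direct check is to read the pixel dimensions from the image header. A minimal standard-library sketch, assuming the file really is a JPEG (the code in this post names every file `*.jpg` but never checks the format); `jpeg_size` is a hypothetical helper that scans the JPEG marker segments for the start-of-frame (SOF) segment, which stores height and width:

```python
import struct
from typing import Optional, Tuple


def jpeg_size(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (width, height) read from a JPEG's SOF segment, or None if not found."""
    if data[:2] != b"\xff\xd8":              # every JPEG starts with the SOI marker
        return None
    i = 2
    while i + 4 <= len(data):
        if data[i] != 0xFF:                  # lost sync with the marker stream
            return None
        marker = data[i + 1]
        if marker == 0xFF:                   # fill byte before a marker
            i += 1
            continue
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:   # standalone markers without a length
            i += 2
            continue
        if marker in (0xD9, 0xDA):           # end of image / start of scan: no SOF seen
            return None
        (length,) = struct.unpack(">H", data[i + 2:i + 4])
        # SOF0..SOF15 hold the frame size; C4 (DHT), C8 (JPG) and CC (DAC) are not SOF markers
        if 0xC0 <= marker <= 0xCF and marker not in (0xC4, 0xC8, 0xCC):
            if i + 9 > len(data):
                return None
            # layout: FF Cn, length(2), precision(1), height(2), width(2)
            height, width = struct.unpack(">HH", data[i + 5:i + 9])
            return width, height
        i += 2 + length                      # skip marker bytes + segment (length includes itself)
    return None
```

It can be called on `resp.content` before writing the file, or on the bytes of a saved file. Comparing the result with the target device's screen (750×1334 pixels for the iPhone 8) is one rough yardstick for whether the download is a full-size wallpaper.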

 

P.S.: Be sure to disable any ad-blocking extension, or the images won't load. That one tripped me up T_T

Next, let's work out where to start.

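1. First, get the link to each image's detail page from the main page

These links are easy to find: press F12 to inspect the elements, or right-click and view the page source. In Chrome and Firefox on a phone, prefix the URL with "view-source:".

For example: view-source:https://www.baidu.com/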
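Once the links are visible in the page source, pulling them out is mechanical. The walkthrough below uses BeautifulSoup; for comparison, here is a dependency-free sketch of the same idea using the standard library's `html.parser`. The class name and the sample markup are hypothetical: the markup only mirrors the selectors this post relies on (`<div id="contents">` containing `<a rel="wallpaper" href=...>` links), not the real page.

```python
from html.parser import HTMLParser


class WallpaperLinkParser(HTMLParser):
    """Collect href values of <a rel="wallpaper"> links inside <div id="contents">."""

    def __init__(self):
        super().__init__()
        self.depth = 0     # > 0 while inside div#contents; counts nested <div>s
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.depth:
                self.depth += 1
            elif attrs.get("id") == "contents":
                self.depth = 1
        elif tag == "a" and self.depth:
            # rel may hold several space-separated values, so test membership
            rel = (attrs.get("rel") or "").split()
            if "wallpaper" in rel and attrs.get("href"):
                self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1


SAMPLE = """
<div id="nav"><a rel="wallpaper" href="ignored.html">nav</a></div>
<div id="contents">
  <div class="item"><a rel="wallpaper" href="a.html"><img src="a_thumb.jpg"></a></div>
  <a rel="wallpaper" href="b.html">b</a>
</div>
<a rel="wallpaper" href="outside.html">outside</a>
"""

if __name__ == "__main__":
    parser = WallpaperLinkParser()
    parser.feed(SAMPLE)
    print(parser.links)  # ['a.html', 'b.html']
```

Only links inside `div#contents` are kept, which is the same scoping the BeautifulSoup version gets from `contents.find_all(...)`.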

 

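2. Get the large-image URL from the detail page
Open any image's detail page.

Press F12 to inspect the elements. To locate the image's link, first click the element-picker icon in the top-left corner of the developer tools (it looks like a mouse pointer).

Then just click the image on the web page, and the developer tools jump straight to its code.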
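The links found this way are relative: the code below joins listing hrefs onto a hard-coded directory prefix and handles the image's `original` attribute with `replace("../", "")`. A more general option is `urllib.parse.urljoin`, which resolves a relative link against the URL of the page it came from, the way a browser does. A small sketch; all paths here are made up for illustration, and whether the result matches the prefix-based approach depends on where the real detail pages live:

```python
from urllib.parse import urljoin

# Hypothetical URLs for illustration; the real divnil.com paths may differ.
listing_page = "https://divnil.com/wallpaper/iphone8/page.html"
detail_href = "detail/12345.html"        # an href as it might appear on the listing page
original_attr = "../img/12345.jpg"       # an "original" value as it might appear on a detail page

# Resolve the href against the listing page, then the image path against the detail page,
# exactly as a browser would.
detail_page = urljoin(listing_page, detail_href)
image_url = urljoin(detail_page, original_attr)

print(detail_page)  # https://divnil.com/wallpaper/iphone8/detail/12345.html
print(image_url)    # https://divnil.com/wallpaper/iphone8/img/12345.jpg
```

Because `urljoin` also passes absolute URLs through unchanged, it keeps working if the site ever serves images from a different host.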

 

3. Download the image from its large-image URL

This part is simple; just read the code.

First install the Requests and BeautifulSoup libraries:

pip install requests beautifulsoup4


Import the libraries

```python
import requests
from bs4 import BeautifulSoup
import sys
```

Request the listing page and fetch its HTML source

```python
# Page 2 of the iPhone 8 "アニメの壁紙" (anime wallpaper) listing
url = "https://divnil.com/wallpaper/iphone8/%E3%82%A2%E3%83%8B%E3%83%A1%E3%81%AE%E5%A3%81%E7%B4%99_2.html"
# Browser-like User-Agent header
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:58.0) Gecko/20100101 Firefox/58.0",
}
resp = requests.get(url, headers=headers, timeout=10)
if resp.status_code != requests.codes.OK:
    print("Request Error, Code: %d" % resp.status_code)
    sys.exit(1)
```

Then parse out the detail-page link of every image

```python
soup = BeautifulSoup(resp.text, "html.parser")
# All wallpaper links live inside <div id="contents">
contents = soup.find("div", id="contents")
wallpapers = contents.find_all("a", rel="wallpaper")
# Collect each wallpaper's detail-page link (a relative URL)
links = [wallpaper["href"] for wallpaper in wallpapers]
```

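Next, on each detail page, grab the link to that seemingly high-resolution image (which may or may not be the real original, heh) and download it.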

```python
import os

head = "https://divnil.com/wallpaper/iphone8/"
# Create the output folder if it doesn't exist yet
os.makedirs("./Divnil", exist_ok=True)

for url in links:
    # The listing links are relative, so prefix them with the directory URL
    url = head + url
    resp = requests.get(url, headers=headers, timeout=10)
    if resp.status_code != requests.codes.OK:
        print("URL: %s REQUESTS ERROR. CODE: %d" % (url, resp.status_code))
        continue
    soup = BeautifulSoup(resp.text, "html.parser")
    # The large image is <img id="main_content"> inside <div id="contents">
    img = soup.find("div", id="contents").find("img", id="main_content")
    # Its "original" attribute appears to hold a relative path with "../";
    # strip that and prepend the directory URL
    img_url = head + img["original"].replace("../", "")
    img_name = img["alt"]
    print("start download %s ..." % img_url)

    resp = requests.get(img_url, headers=headers, timeout=10)
    if resp.status_code != requests.codes.OK:
        print("IMAGE %s DOWNLOAD FAILED." % img_name)
        continue  # don't write an error page to disk

    # "/" cannot appear in a file name
    with open("./Divnil/" + img_name.replace("/", "_") + ".jpg", "wb") as f:
        f.write(resp.content)
```
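The `alt` text used as the file name may contain characters such as "/" that are not allowed in file names, and Windows rejects several more. A minimal sketch of a more general cleanup; `safe_filename` and its `untitled` fallback are hypothetical, illustrative choices:

```python
import re

# Characters not allowed in Windows file names, which also covers "/" (the Unix
# path separator), plus ASCII control characters.
_FORBIDDEN = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_filename(name: str, fallback: str = "untitled") -> str:
    """Turn arbitrary alt text into a usable file name (without extension)."""
    cleaned = _FORBIDDEN.sub("_", name).strip()
    # Windows also rejects names ending in a dot or space.
    # Reserved names such as CON or NUL are not handled here.
    cleaned = cleaned.rstrip(". ")
    return cleaned or fallback


if __name__ == "__main__":
    print(safe_filename("Re:Zero / Rem"))  # Re_Zero _ Rem
```

In the loop above it would slot in where the output path is built, for example `os.path.join("./Divnil", safe_filename(img_name) + ".jpg")`.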

 

Done. Here is the complete code:

```python
import os
import sys

import requests
from bs4 import BeautifulSoup


class Divnil:

    def __init__(self):
        # Page 1 of the iPhone 8 anime wallpaper listing
        self.url = "https://divnil.com/wallpaper/iphone8/%E3%82%A2%E3%83%8B%E3%83%A1%E3%81%AE%E5%A3%81%E7%B4%99.html"
        # Directory URL that the relative links are appended to
        self.head = "https://divnil.com/wallpaper/iphone8/"
        self.headers = {
            "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:58.0) Gecko/20100101 Firefox/58.0",
        }
        self.links = []

    def getImageInfoUrl(self):
        """Collect the detail-page link of every wallpaper on the listing page."""
        resp = requests.get(self.url, headers=self.headers, timeout=10)
        if resp.status_code != requests.codes.OK:
            print("Request Error, Code: %d" % resp.status_code)
            sys.exit(1)

        soup = BeautifulSoup(resp.text, "html.parser")
        contents = soup.find("div", id="contents")
        wallpapers = contents.find_all("a", rel="wallpaper")
        self.links = [wallpaper["href"] for wallpaper in wallpapers]

    def downloadImage(self):
        """Visit each detail page, find the large image and save it to ./Divnil."""
        os.makedirs("./Divnil", exist_ok=True)

        for url in self.links:
            url = self.head + url
            resp = requests.get(url, headers=self.headers, timeout=10)
            if resp.status_code != requests.codes.OK:
                print("URL: %s REQUESTS ERROR. CODE: %d" % (url, resp.status_code))
                continue

            soup = BeautifulSoup(resp.text, "html.parser")
            img = soup.find("div", id="contents").find("img", id="main_content")
            if img is None:
                print("URL: %s NO MAIN IMAGE FOUND." % url)
                continue
            img_url = self.head + img["original"].replace("../", "")
            img_name = img["alt"]

            print("start download %s ..." % img_url)

            resp = requests.get(img_url, headers=self.headers, timeout=10)
            if resp.status_code != requests.codes.OK:
                print("IMAGE %s DOWNLOAD FAILED." % img_name)
                continue

            # "/" cannot appear in a file name; replace it instead of dropping part of the name
            img_name = img_name.replace("/", "_")

            with open("./Divnil/" + img_name + ".jpg", "wb") as f:
                f.write(resp.content)

    def main(self):
        self.getImageInfoUrl()
        self.downloadImage()


if __name__ == "__main__":
    divnil = Divnil()
    divnil.main()
```

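The text and images in this article come from the internet and are for study and exchange only, with no commercial purpose. Copyright belongs to the original author; if there is any problem, please contact us promptly and we will deal with it.

Author | zckun

Source | Jianshu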


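Some resources on 搞代码网 (gaodaima.com) come from the internet. If anything here infringes your copyright or other rights, please explain the details, provide proof of the copyright or rights, and email the site, or contact QQ: 872152909, and it will be handled as soon as possible. This site is licensed under BY-NC-SA.
When reposting, please credit the original link: Python Scraping Practice: Downloading Anime Images from a Website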
