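This article shows how to scrape Bilibili's popular-video ranking with Python and export the results to an Excel file. The example code covers each step in detail and may be a useful reference for study or work.
The code is as follows: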
import os

import requests
import xlwt
from lxml import etree


# Scrape information about Bilibili's popular videos
def spider():
    video_list = []
    url = "https://www.bilibili.com/ranking?spm_id_from=333.851.b_7072696d61727950616765546162.3"
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
                      "(KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"
    }
    html = requests.get(url, headers=headers).text
    html = etree.HTML(html)
    infolist = html.xpath('//li[@class="rank-item"]')
    for item in infolist:
        rank = "".join(item.xpath('./div[@class="num"]/text()'))
        video_link = "".join(item.xpath('.//div[@class="info"]/a/@href'))
        title = "".join(item.xpath('.//div[@class="info"]/a/text()'))
        # The play count and comment count arrive as one concatenated string;
        # split it on the unit "万" (ten thousand)
        payinfo = "".join(item.xpath('.//div[@class="detail"]/span/text()')).split("万")
        play = payinfo[0] + "万"
        comment = payinfo[1]
        if not comment.isdigit():
            # The comment count was also given in units of "万"
            comment += "万"
        upname = "".join(item.xpath('.//div[@class="detail"]/a/span/text()'))
        uplink = "http://" + "".join(item.xpath('.//div[@class="detail"]/a/@href'))
        hot = "".join(item.xpath('.//div[@class="pts"]/div/text()'))
        video_list.append({
            "rank": rank,
            "videolink": video_link,
            "title": title,
            "play": play,
            "comment": comment,
            "upname": upname,
            "uplink": uplink,
            "hot": hot
        })
    return video_list


def write_Excel():
    # Write the scraped information to an Excel file
    video_list = spider()
    workbook = xlwt.Workbook()  # create the workbook
    sheet = workbook.add_sheet("b站热门视频")  # add a sheet with this name
    xstyle = xlwt.XFStyle()  # instantiate a cell style object
    xstyle.alignment.horz = 0x02  # center horizontally
    xstyle.alignment.vert = 0x01  # center vertically
    head = ["视频名", "up主", "排名", "热度", "播放量", "评论数"]
    for h, name in enumerate(head):
        sheet.write(0, h, name, xstyle)
    i = 1
    for item in video_list:
        # Add a hyperlink to the video in the "视频名" (title) cell.
        # A double quote inside the title would break the formula string,
        # so keep only the text between the first pair of quotes.
        if '"' in item["title"]:
            item["title"] = item["title"].split('"')[1]
        title_data = 'HYPERLINK("' + item["videolink"] + '";"' + item["title"] + '")'  # build the hyperlink
        # Set the column width; keep the widest value seen so far
        sheet.col(0).width = max(sheet.col(0).width, int(256 * len(title_data) * 3 / 5))
        sheet.write(i, 0, xlwt.Formula(title_data), xstyle)
        name_data = 'HYPERLINK("' + item["uplink"] + '";"' + item["upname"] + '")'
        sheet.col(1).width = max(sheet.col(1).width, int(256 * len(name_data) * 3 / 5))
        sheet.write(i, 1, xlwt.Formula(name_data), xstyle)
        sheet.write(i, 2, item["rank"], xstyle)
        sheet.write(i, 3, item["hot"], xstyle)
        sheet.write(i, 4, item["play"], xstyle)
        sheet.write(i, 5, item["comment"], xstyle)
        i += 1
    # If the file already exists, delete it first
    file = "b站热门视频信息.xls"
    if os.path.exists(file):
        os.remove(file)
    workbook.save(file)


if __name__ == "__main__":
    write_Excel()
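The play/comment parsing in `spider()` assumes that the play count always contains "万" and that a comment count is always present. If either is missing, `payinfo[1]` raises an `IndexError` or the comment becomes a bare "万". Here is a minimal sketch of a more defensive split. The helper name `split_counts` is hypothetical and not part of the original code, and the sample strings are assumptions about what the page returns:

```python
def split_counts(detail_text, unit="万"):
    """Split concatenated play/comment text such as '123.4万5678'.

    Returns a (play, comment) tuple of strings. Hypothetical helper,
    written as a drop-in alternative to the split("万") logic above.
    """
    # Drop any whitespace the XPath text() nodes may have carried along
    text = "".join(detail_text.split())
    head, sep, tail = text.partition(unit)
    if not sep:
        # No unit marker: there is nothing reliable to split on,
        # so return the whole text as the play count
        return text, ""
    # Everything up to and including the first unit is the play count;
    # the remainder (which may itself end in the unit) is the comment count
    return head + unit, tail


print(split_counts("123.4万5678"))
print(split_counts("123.4万1.2万"))
```

Inside `spider()`, this could replace the three lines that compute `payinfo`, `play`, and `comment` with a single `play, comment = split_counts(...)` call.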
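The code above handles a double quote in a title by keeping only the text between the first pair of quotes, which discards the rest of the title. An alternative is to double each embedded quote, which is how Excel formulas escape a quote inside a string literal. This is a sketch under that assumption: the helper name `hyperlink_formula` is hypothetical, and whether `xlwt.Formula` accepts the doubled quotes should be checked against your installed xlwt version before relying on it.

```python
def hyperlink_formula(url, label):
    """Build the text of an Excel HYPERLINK formula.

    Embedded double quotes are doubled, following Excel's convention
    for quotes inside string literals. Hypothetical helper; the
    original code concatenates the formula inline instead.
    """
    safe_url = url.replace('"', '""')
    safe_label = label.replace('"', '""')
    # The semicolon separator matches the formula strings in the code above
    return 'HYPERLINK("{}";"{}")'.format(safe_url, safe_label)


# example.com is a placeholder URL, not a real video link
print(hyperlink_formula("https://example.com/v/1", 'Say "hi"'))
```

The returned string would then be passed to `xlwt.Formula(...)` in place of `title_data` or `name_data`.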
Result: