
Downloading Videos with Python

python · 搞代码 · 2022-01-09

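What can Python be used for? At work it is mostly used to scrape data and then analyze and mine what comes back. We can also use it for ourselves, to fetch resources we want, such as a show we would like to watch. In this post I share the code for downloading a video. Save it and give it a try!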

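To download a file as a stream, just set stream=True on the request in the requests library. The requests documentation covers this.

Let's try it on a video URL first: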

# -*- coding: utf-8 -*-
import requests


def download_file(url, path):
    # stream=True defers fetching the body until it is iterated over
    with requests.get(url, stream=True) as r:
        chunk_size = 1024
        content_size = int(r.headers['content-length'])
        print('Download started')
        with open(path, "wb") as f:
            for chunk in r.iter_content(chunk_size=chunk_size):
                f.write(chunk)


if __name__ == '__main__':
    url = '(in the original post...)'
    path = '(anywhere you like)'
    download_file(url, path)

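And got hit right away with:

AttributeError: __exit__

So even the documentation can lie!

It looks like the response object in the requests version I had installed does not implement the __exit__ method that a context manager needs (later requests releases added context-manager support to Response, so this error is specific to older versions). Since the only goal is to make sure r is closed at the end so its connection is released back to the pool, contextlib's closing will do: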

# -*- coding: utf-8 -*-
import requests
from contextlib import closing


def download_file(url, path):
    # closing() calls r.close() when the block exits, even on error
    with closing(requests.get(url, stream=True)) as r:
        chunk_size = 1024
        content_size = int(r.headers['content-length'])
        print('Download started')
        with open(path, "wb") as f:
            for chunk in r.iter_content(chunk_size=chunk_size):
                f.write(chunk)
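
To see what closing is doing here, the following is a minimal sketch that does not touch the network. FakeResponse is a hypothetical stand-in (not part of requests) for any object that has a close() method but no __enter__/__exit__. Wrapping it in contextlib.closing is enough to use it in a with statement, and close() is called on exit even if the body raises.

```python
from contextlib import closing


class FakeResponse(object):
    """Hypothetical stand-in for an object that only offers close()."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def use_and_fail(resp):
    """Use resp inside closing() and raise partway through the body."""
    try:
        with closing(resp):
            raise ValueError("simulated failure while reading")
    except ValueError:
        pass
    return resp.closed


resp = FakeResponse()
with closing(resp) as r:
    same_object = r is resp       # closing() hands back the wrapped object
closed_after_with = resp.closed   # close() ran when the block ended

closed_after_error = use_and_fail(FakeResponse())
```

Both closed_after_with and closed_after_error end up True, which is the guarantee the download function relies on.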

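The program now runs normally, but as I stared at the file its size did not seem to change. How much had actually been downloaded? Better to have the downloaded content written to disk promptly, which should also save a bit of memory: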

# -*- coding: utf-8 -*-
import os
from contextlib import closing

import requests


def download_file(url, path):
    with closing(requests.get(url, stream=True)) as r:
        chunk_size = 1024
        content_size = int(r.headers['content-length'])
        print('Download started')
        with open(path, "wb") as f:
            for chunk in r.iter_content(chunk_size=chunk_size):
                f.write(chunk)
                f.flush()              # push Python's buffer to the OS
                os.fsync(f.fileno())   # ask the OS to commit it to disk
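
The reason the file size appeared stuck is buffering: a small write usually sits in Python's in-memory buffer until it is flushed. The sketch below is one way to observe this on a temporary file. The helper name sizes_before_and_after_flush is made up for illustration, and the "before" size depends on the interpreter's buffer size, so treat it as typical behaviour rather than a guarantee.

```python
import os
import tempfile


def sizes_before_and_after_flush(payload):
    """Return the file size seen on disk before and after flushing a write."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write(payload)
            before = os.path.getsize(path)   # small writes are usually still buffered
            f.flush()                        # hand the bytes to the operating system
            os.fsync(f.fileno())             # ask the OS to commit them to disk
            after = os.path.getsize(path)
        return before, after
    finally:
        os.remove(path)


before, after = sizes_before_and_after_flush(b"x" * 100)
```

After the flush, the size matches the payload length. Calling fsync on every 1 KB chunk is what makes the disk work so hard, so flushing less often (or not at all) is the usual trade-off.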

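Now the file grows visibly, but I feel sorry for my hard drive. Let's go back to letting the data reach the disk at the end and simply keep a counter in the program: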

import requests
from contextlib import closing


def download_file(url, path):
    with closing(requests.get(url, stream=True)) as r:
        chunk_size = 1024
        content_size = int(r.headers['content-length'])
        print('Download started')
        with open(path, "wb") as f:
            n = 1   # number of chunks received so far
            for chunk in r.iter_content(chunk_size=chunk_size):
                loaded = n * float(chunk_size) / content_size
                f.write(chunk)
                print('Downloaded {0:%}'.format(loaded))
                n += 1

The result is easy to read:

Downloaded 2.579129%
Downloaded 2.581255%
Downloaded 2.583382%
Downloaded 2.585508%
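
One caveat with counting chunks: the last chunk is usually shorter than the chunk size, so chunk count times chunk size can overshoot 100% at the very end. A minimal sketch of an alternative, assuming the same chunk iteration, is to add up the actual length of each chunk. The function name progress_fractions and the fake chunk data are illustrative only.

```python
def progress_fractions(chunks, total_size):
    """Yield the fraction downloaded after each chunk, counting real bytes."""
    loaded = 0
    for chunk in chunks:
        loaded += len(chunk)
        # Cap at 1.0 in case the server sends more than it announced
        yield min(loaded / float(total_size), 1.0)


# A 2500-byte "file": two full 1024-byte chunks and a short 452-byte tail.
fake_chunks = [b"a" * 1024, b"b" * 1024, b"c" * 452]
fractions = list(progress_fractions(fake_chunks, 2500))
lines = ["Downloaded {0:.2%}".format(f) for f in fractions]
```

With these chunks the last line reads exactly 100.00%, whereas three chunks times 1024 bytes would have reported a little over 100%.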

Being ambitious, I was not going to settle for just that, so let's write a class to use along with it:

# -*- coding: utf-8 -*-
import time
from contextlib import closing

import requests


def download_file(url, path):
    with closing(requests.get(url, stream=True)) as r:
        chunk_size = 1024 * 10
        content_size = int(r.headers['content-length'])
        print('Download started')
        with open(path, "wb") as f:
            p = ProgressData(size=content_size, unit='KB', block=chunk_size)
            for chunk in r.iter_content(chunk_size=chunk_size):
                f.write(chunk)
                p.output()


class ProgressData(object):

    def __init__(self, block, size, unit, file_name=''):
        self.file_name = file_name
        self.block = block / 1000.0   # chunk size in KB
        self.size = size / 1000.0     # total size in KB
        self.unit = unit
        self.count = 0
        self.start = time.time()

    def output(self):
        # Call once per chunk written; prints progress and per-chunk speed
        self.end = time.time()
        self.count += 1
        elapsed = self.end - self.start
        speed = self.block / elapsed if elapsed > 0 else 0
        self.start = time.time()
        loaded = self.count * self.block
        progress = round(loaded / self.size, 4)
        if loaded >= self.size:
            print(u'%s download complete\r\n' % self.file_name)
        else:
            print(u'{0} progress {1:.2f}{2}/{3:.2f}{4} ({5:.2%}), speed {6:.2f}{7}/s'.format(
                self.file_name, loaded, self.unit,
                self.size, self.unit, progress, speed, self.unit))
            # Bar of slashes for the portion still left to download
            print('%50s' % ('/' * int((1 - progress) * 50)))
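
The bar printed by the class above is made of slashes for the remaining portion, so it shrinks as the download proceeds. As a possible variation (not the author's code), the sketch below builds a bar that fills up instead, and keeps the formatting in pure functions so it can be checked without downloading anything. The names render_bar and format_progress are hypothetical, and the speed shown is an average over the elapsed time rather than the per-chunk speed used above.

```python
def render_bar(progress, width=50):
    """Return a text bar whose filled part grows with progress (0.0 to 1.0)."""
    progress = max(0.0, min(1.0, progress))   # clamp so the bar never overflows
    filled = int(progress * width)
    return "[" + "#" * filled + " " * (width - filled) + "]"


def format_progress(loaded_kb, total_kb, seconds):
    """Build one status line; speed is 0 when no time has elapsed."""
    speed = loaded_kb / seconds if seconds > 0 else 0.0
    progress = loaded_kb / float(total_kb) if total_kb > 0 else 0.0
    return "{0:.2f}KB/{1:.2f}KB {2:.2%} avg {3:.2f}KB/s".format(
        loaded_kb, total_kb, progress, speed)


bar = render_bar(0.5, width=10)
line = format_progress(512.0, 2048.0, 4.0)
```

Here bar is a 10-character bar that is half filled, and line reports 512 KB of 2048 KB at an average of 128 KB/s.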

