Preface
The text and images in this article come from the internet and are intended for learning and exchange only, with no commercial use. If there is any problem, please contact us promptly so that we can handle it.
This article comes from 青灯编程, author: 清风.
Free online video tutorials with Python examples covering web scraping, data analysis, and website development:
<code>https://space.bilibili.com/523606542</code>
Development environment
- Python 3.6
- PyCharm
Modules used
<code>import time
import os
import re
import requests
from selenium import webdriver
from selenium.webdriver.chrome.options import Options</code>
Analyzing the target page
How to get the video URL
Xigua Video (西瓜视频) serves two kinds of video:
1. Watermarked video
2. Watermark-free video
Watermarked video
The page source contains a link like this:
<code>https://www.ixigua.com/embed?group_id=6817258591586615812</code>
Opening this link takes you to the video playback page.
The rendered playback page already contains the real video URL:
<code>//v9-xg-web-s.ixigua.com/ac99e1bf75dd0faa6854d9e5367fac3f/5fe894d7/video/tos/cn/tos-cn-ve-4/626cf09c0830417da4b70982950cedd9/?a=1768&br=3891&bt=1297&cd=0%7C0%7C0&cr=0&cs=0&cv=1&dr=0&ds=3&er=0&l=20201227210214010204050203275E2F92&lr=default&mime_type=video_mp4&qs=0&rc=anQ3aWdzNjd2dDMzZjczM0ApPDQ2NjU8aGU3NzplMzZoNWdfMWguMmA0NWFfLS02LS9zczIwXjBfY2A2MmIvXjMyLjI6Yw%3D%3D&vl=&vr=</code>
Requesting this URL is all it takes to download and save the video.
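Note that the address shown above starts with `//`, with no scheme, and `requests` rejects URLs that have no scheme. The sketch below shows one way to resolve such an address against the page URL with the standard library. The helper name `to_absolute_url` and the default base URL are assumptions made for illustration; they are not part of the original code.

```python
from urllib.parse import urljoin


def to_absolute_url(src, base="https://www.ixigua.com/"):
    """Resolve a possibly protocol-relative URL against a base page URL.

    `base` is an assumed default; pass the URL of the page the address
    was actually read from.
    """
    # urljoin keeps already-absolute URLs unchanged and borrows the scheme
    # from `base` for addresses that start with "//".
    return urljoin(base, src)


print(to_absolute_url("//v9-xg-web-s.ixigua.com/path/?a=1768"))
```

When the address is read through Selenium's `get_attribute("src")`, as done later in this article, the browser usually returns an absolute URL already, so this step mainly matters when the address is copied from the page source by hand.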
Watermark-free video
Downloading the watermark-free video is more work, because the audio and the picture are delivered as separate streams.
The picture stream has no watermark, but it also has no sound.
How do we find the audio and video URLs?
Open the browser developer tools; the corresponding links appear under the XHR tab.
Audio URL:
<code>https://v9-xg-web-s.ixigua.com/79457295a8a89bf86bdcd157eb848175/5fe895f4/video/tos/cn/tos-cn-vd-0026/43771a1a38ea473d9cb5b8e7c0f651f3/media-audio-und-mp4a/?a=1768&br=0&bt=0&cd=0%7C0%7C0&cr=0&cs=0&cv=1&dr=0&ds=&er=0&l=20201227210659010028033025224FC377&lr=default&mime_type=video_mp4</code>
Video (picture) URL:
<code>https://v9-xg-web-s.ixigua.com/9b4e18f3b29244557c83b8e88f13dd1b/5fe895f4/video/tos/cn/tos-cn-vd-0026/86a41ef8ebd3496585db455ae56b3ff3/media-video-avc1/?a=1768&br=12159&bt=4053&cd=0%7C0%7C0&cr=0&cs=0&cv=1&dr=0&ds=4&er=0&l=20201227210659010028033025224FC377&lr=default&mime_type=video_mp4</code>
So to scrape the watermark-free version of a Xigua video, you have to download both the video and the audio and then merge the two files, much like the earlier article on scraping Bilibili videos.
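The article does not show the merge step. One common approach is to hand the two downloaded files to the `ffmpeg` command-line tool. The sketch below assumes `ffmpeg` is installed and on the PATH; the function names and file names are hypothetical.

```python
import subprocess


def build_merge_command(video_path, audio_path, output_path):
    """Build an ffmpeg command that muxes one video file and one audio file."""
    return [
        "ffmpeg",
        "-y",              # overwrite the output file if it already exists
        "-i", video_path,  # first input: picture only
        "-i", audio_path,  # second input: sound only
        "-c", "copy",      # copy both streams without re-encoding
        output_path,
    ]


def merge(video_path, audio_path, output_path):
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(build_merge_command(video_path, audio_path, output_path), check=True)


# Example call (assumes both files were downloaded beforehand):
# merge("picture.mp4", "sound.mp4", "merged.mp4")
```

Because the streams are copied rather than re-encoded, the merge is fast and does not change the quality of either stream.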
Downloading the watermarked version
1. Fetch the page source and extract the playback URL and the title
<code>def main(html_url):
    headers = {
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36"
    }
    # Fetch the page source
    response = requests.get(url=html_url, headers=headers)
    response.encoding = response.apparent_encoding
    # Extract the playback (embed) URL and the title with regular expressions
    play_url = re.findall('"embedUrl":"(.*?)"', response.text)[0]
    title = re.findall('<title data-react-helmet="true">(.*?)</title>', response.text)[0].replace(" - 西瓜视频", "")</code>
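To see what the two regular expressions capture without sending a request, they can be run against a small hand-written HTML string. The sample markup below is made up for illustration and only imitates the two fragments the patterns look for. The helper name `extract_play_url_and_title` is also hypothetical. Unlike the code above, it returns `None` instead of raising `IndexError` when a pattern is not found.

```python
import re

# Made-up sample markup; a real page is far larger.
SAMPLE_HTML = (
    '<html><head><title data-react-helmet="true">Demo clip - 西瓜视频</title></head>'
    '<body><script>{"embedUrl":"https://www.ixigua.com/embed?group_id=6817258591586615812"}'
    "</script></body></html>"
)


def extract_play_url_and_title(html):
    """Return (play_url, title), or None if either pattern is missing."""
    play_urls = re.findall(r'"embedUrl":"(.*?)"', html)
    titles = re.findall(r'<title data-react-helmet="true">(.*?)</title>', html)
    if not play_urls or not titles:
        return None
    # Strip the site suffix from the page title, as the original code does.
    return play_urls[0], titles[0].replace(" - 西瓜视频", "")


print(extract_play_url_and_title(SAMPLE_HTML))
```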
2. Get the real download URL of the video
Selenium is used here because the pattern behind the video URL is hard to work out: the request parameters differ on every page load. The rendered page, however, does show the real video URL, so Selenium can read it directly.
<code>def get_video_url(html_url):
    """Take the playback URL and return the video download URL."""
    chrome_options = Options()
    chrome_options.add_argument("--headless")
    # Windows only: kill any leftover chromedriver processes
    os.system("taskkill /f /im chromedriver.exe")
    # Selenium 3 style API; recent Selenium 4 releases removed
    # executable_path and the find_element_by_* helpers
    driver = webdriver.Chrome(executable_path="chromedriver.exe", options=chrome_options)
    driver.get(html_url)
    driver.implicitly_wait(10)
    video_url = driver.find_element_by_css_selector("#player_default video").get_attribute("src")
    driver.close()
    return video_url</code>
3. Download and save the video
Method 1: plain save
<code>def save(video_url, video_title):
    filename = "video" + video_title + ".mp4"
    video_headers = {
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36"
    }
    # Download the whole file into memory, then write it to disk
    video_response = requests.get(url=video_url, headers=video_headers).content
    with open(filename, mode="wb") as f:
        f.write(video_response)
    print("Downloaded and saved:", video_title)</code>
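Video titles can contain characters that are not allowed in file names (for example `/` or `?` on Windows), in which case `open()` fails. A minimal sketch of a clean-up step follows. The helper `safe_filename` is hypothetical and not part of the original code, and the character set is the one Windows rejects.

```python
import re


def safe_filename(title, replacement="_"):
    """Replace characters that Windows does not allow in file names."""
    cleaned = re.sub(r'[\\/:*?"<>|]', replacement, title).strip()
    # Fall back to a fixed name if nothing usable is left.
    return cleaned or "untitled"


print(safe_filename('What is "AI"? part 1/2') + ".mp4")
```

The cleaned title could then be used in place of `video_title` when building the file name in `save`.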
Method 2: download with a progress bar
<code>def progressbar(video_url, video_title):
    start = time.time()  # download start time
    response = requests.get(video_url, stream=True)  # stream=True is required for chunked download
    size = 0  # bytes downloaded so far
    chunk_size = 1024  # bytes per chunk
    try:
        if response.status_code == 200:  # check that the request succeeded
            content_size = int(response.headers["content-length"])  # total file size
            print("Start download,[File size]:{size:.2f} MB".format(
                size=content_size / chunk_size / 1024))  # show the file size before downloading
            filepath = "video" + video_title + ".mp4"  # file name; the extension is required
            with open(filepath, "wb") as file:
                for data in response.iter_content(chunk_size=chunk_size):
                    file.write(data)
                    size += len(data)
                    # "\r" moves back to the start of the line so the bar redraws in place
                    print("\r[Progress]:%s%.2f%%" % (
                        "▇" * int(size * 50 / content_size), float(size / content_size * 100)), end=" ")
            end = time.time()  # download end time
            print("Download completed!,times: %.2f s" % (end - start))  # elapsed time
            print(f"Video [ {video_title} ] has been saved")
    except Exception as e:
        print("Error:", e)</code>
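The bar itself is plain arithmetic: the fraction downloaded is scaled to a fixed number of block characters. Pulling that into a small pure function makes it easy to check in isolation. The sketch below uses a hypothetical helper, `render_progress`. It also covers the case where the total size is unknown (a server may omit `content-length`), which the code above does not handle.

```python
def render_progress(done, total, width=50):
    """Return a one-line progress string for `done` of `total` bytes."""
    if total <= 0:
        # Total size unknown: report only the byte count.
        return "[progress]: %d bytes" % done
    ratio = min(done / total, 1.0)  # clamp in case more data arrives than announced
    return "[progress]: %s %.2f%%" % ("▇" * int(ratio * width), ratio * 100)


print(render_progress(512, 1024, width=10))
```

Inside the download loop the string would be printed with a leading `"\r"` and `end=""` so that each update overwrites the previous one.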
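Result:
Enter a video ID and the video is downloaded. The script could later be wrapped in a simple GUI desktop application, similar to what earlier articles have covered.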
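A small convenience, not in the original: the input step could accept either a bare ID or a full page link that was pasted in. The sketch below assumes Xigua video IDs are purely numeric, as in the example URL earlier in this article. The helper name `build_page_url` is hypothetical.

```python
import re


def build_page_url(user_input):
    """Turn a numeric video ID or a pasted ixigua.com link into the page URL."""
    text = user_input.strip()
    if re.fullmatch(r"\d+", text):
        video_id = text
    else:
        match = re.search(r"ixigua\.com/(\d+)", text)
        if match is None:
            raise ValueError("could not find a numeric video ID in: %r" % user_input)
        video_id = match.group(1)
    return "https://www.ixigua.com/" + video_id


print(build_page_url(" 6817258591586615812 "))
```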
Complete code
<code>import time
import os
import re
import requests
from selenium import webdriver
from selenium.webdriver.chrome.options import Options


def get_video_url(html_url):
    """Take the playback URL and return the video download URL."""
    chrome_options = Options()
    chrome_options.add_argument("--headless")
    # Windows only: kill any leftover chromedriver processes
    os.system("taskkill /f /im chromedriver.exe")
    # Selenium 3 style API; recent Selenium 4 releases removed
    # executable_path and the find_element_by_* helpers
    driver = webdriver.Chrome(executable_path="chromedriver.exe", options=chrome_options)
    driver.get(html_url)
    driver.implicitly_wait(10)
    video_url = driver.find_element_by_css_selector("#player_default video").get_attribute("src")
    driver.close()
    return video_url


# def save(video_url, video_title):
#     filename = "video" + video_title + ".mp4"
#     video_headers = {
#         "user-agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36"
#     }
#     video_response = requests.get(url=video_url, headers=video_headers).content
#     with open(filename, mode="wb") as f:
#         f.write(video_response)
#     print("Downloaded and saved:", video_title)


def progressbar(video_url, video_title):
    start = time.time()  # download start time
    response = requests.get(video_url, stream=True)  # stream=True is required for chunked download
    size = 0  # bytes downloaded so far
    chunk_size = 1024  # bytes per chunk
    try:
        if response.status_code == 200:  # check that the request succeeded
            content_size = int(response.headers["content-length"])  # total file size
            print("Start download,[File size]:{size:.2f} MB".format(
                size=content_size / chunk_size / 1024))  # show the file size before downloading
            filepath = "video" + video_title + ".mp4"  # file name; the extension is required
            with open(filepath, "wb") as file:
                for data in response.iter_content(chunk_size=chunk_size):
                    file.write(data)
                    size += len(data)
                    # "\r" moves back to the start of the line so the bar redraws in place
                    print("\r[Progress]:%s%.2f%%" % (
                        "▇" * int(size * 50 / content_size), float(size / content_size * 100)), end=" ")
            end = time.time()  # download end time
            print("Download completed!,times: %.2f s" % (end - start))  # elapsed time
            print(f"Video [ {video_title} ] has been saved")
    except Exception as e:
        print("Error:", e)


def main(html_url):
    headers = {
        "cookie": "your own cookie here",
        "referer": "https://www.ixigua.com/?wid_try=1",
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36"
    }
    response = requests.get(url=html_url, headers=headers)
    response.encoding = response.apparent_encoding
    # Extract the playback (embed) URL and the title with regular expressions
    play_url = re.findall('"embedUrl":"(.*?)"', response.text)[0]
    title = re.findall('<title data-react-helmet="true">(.*?)</title>', response.text)[0].replace(" - 西瓜视频", "")
    video_url = get_video_url(play_url)
    progressbar(video_url, title)


if __name__ == "__main__":
    video_id = input("Enter the ID of the video to download: ")
    url = f"https://www.ixigua.com/{video_id}"
    main(url)</code>