Concurrent programming (mind map): https://www.processon.com/mindmap/5f636bac0791295dccc46f28
    from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
    import requests
    import os

    def get_page(url):
        # Runs in a worker process: download one page.
        print("<process %s> get %s" % (os.getpid(), url))
        response = requests.get(url)
        if response.status_code == 200:
            return {"url": url, "text": response.text}
        # Any other status code falls through and returns None.

    def parse_page(res):
        # parse_page receives a Future object, so call result() to get the return value.
        # With ProcessPoolExecutor the callback runs in the parent process.
        res = res.result()
        if res is None:  # get_page returned nothing (non-200 response)
            return
        print("<process %s> parse %s" % (os.getpid(), res["url"]))
        # One record per line in db.txt.
        parse_res = "url:<%s> size:[%s]\n" % (res["url"], len(res["text"]))
        with open("db.txt", "a") as f:
            f.write(parse_res)

    if __name__ == "__main__":
        urls = [
            "https://www.baidu.com",
            "https://www.python.org",
            "https://www.openstack.org",
            "https://help.github.com/",
            "http://www.sina.com.cn/",
        ]
        p = ProcessPoolExecutor(3)
        for url in urls:
            p.submit(get_page, url).add_done_callback(parse_page)
Concurrency: a small web crawler program
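The callback pattern used by the crawler can be tried without any network access. The following is a minimal sketch, not the original program: `fetch`, `on_done`, and `run` are hypothetical names, `fetch` only fakes a download, and a `ThreadPoolExecutor` is swapped in for the process pool so the example stays self-contained.

```python
from concurrent.futures import ThreadPoolExecutor

results = []

def fetch(url):
    # Hypothetical stand-in for a real download: fabricate a body
    # whose length equals the length of the URL string.
    return {"url": url, "text": "x" * len(url)}

def on_done(future):
    # The callback is handed the Future itself, not the return value,
    # so result() is needed to unwrap it.
    data = future.result()
    results.append((data["url"], len(data["text"])))

def run(urls):
    results.clear()
    # Leaving the with-block waits for all submitted tasks to finish.
    with ThreadPoolExecutor(max_workers=3) as pool:
        for url in urls:
            pool.submit(fetch, url).add_done_callback(on_done)
    return sorted(results)
```

With a thread pool the callback usually runs in the worker thread that finished the task; with `ProcessPoolExecutor` it runs in the process that submitted the task, which is why the "parse" lines of the crawler report the parent's pid.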
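#### What IO models are there?

blocking IO
nonblocking IO
IO multiplexing, also called event driven IO
signal driven IO
asynchronous IO

Signal driven IO is rarely used in practice, so these notes cover only the other four IO models.

#### In which two phases do the IO models differ?

1) wait for data: waiting for the data to be ready
2) copy data: copying the data from the kernel to the process

Blocking IO: the process is blocked in both phases of the IO operation (waiting for data and copying data).

Non-blocking IO: the wait-for-data phase does not block, so the user process has to keep asking the kernel whether the data is ready. Note that the process is still blocked for the whole copy-data phase. Plain non-blocking IO is generally not recommended, because the polling loop burns CPU while it waits.

IO multiplexing: its advantage is that one process can handle many connections; it offers no benefit for a single connection. When the user process calls select, the whole process blocks while the kernel "watches" every socket that select is responsible for. As soon as the data in any of those sockets is ready, select returns, and the user process then calls read to copy the data from the kernel into the user process. Each socket is normally set to non-blocking, yet the user process is in fact blocked the whole time. The difference is that it is blocked by the select call rather than by socket IO.

Asynchronous IO: neither phase blocks. After the process starts the IO operation, the call returns immediately and the process pays no further attention until the kernel sends a signal telling it that the IO has completed. The process is never blocked during the whole operation.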
IO models
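The difference between non-blocking IO and IO multiplexing can be sketched with local sockets. This is an illustration under assumptions, not part of the original notes: `demo` is a hypothetical name, and the two `socket.socketpair()` pairs stand in for two client connections.

```python
import select
import socket

def demo():
    a, b = socket.socketpair()  # a writes, b reads
    c, d = socket.socketpair()  # c writes, d reads
    try:
        # Non-blocking IO: with no data ready, recv returns at once
        # with an error instead of blocking in the wait-for-data phase.
        b.setblocking(False)
        d.setblocking(False)
        try:
            b.recv(1024)
            returned_immediately = False
        except BlockingIOError:
            returned_immediately = True

        a.sendall(b"hello")
        c.sendall(b"world")

        # IO multiplexing: block in select (not in recv) until at
        # least one of the watched sockets is readable.
        pending = {b, d}
        received = []
        while pending:
            readable, _, _ = select.select(list(pending), [], [], 1.0)
            if not readable:  # timed out, nothing became ready
                break
            for sock in readable:
                # The copy-data phase happens inside recv.
                received.append(sock.recv(1024))
                pending.discard(sock)
        return returned_immediately, sorted(received)
    finally:
        for sock in (a, b, c, d):
            sock.close()
```

A single `select` call covers both sockets here, which is the point of multiplexing: one blocked process watches many connections instead of polling each one in a loop.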