Scraping Email Addresses from Baidu Tieba Threads with PHP
Note: this program may be especially useful for anyone doing marketing on Baidu Tieba.
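When browsing Baidu Tieba, you often see an original poster (OP) share some resource and ask people to leave their email address so the OP can send it to them.
A popular thread collects a huge number of email addresses, and the OP has to copy each one from the replies and paste it into an outgoing email. If the tedium doesn't kill you, the exhaustion will. Out of sheer boredom I wrote a program that scrapes the email addresses from a Baidu Tieba thread; take it if you need it.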
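The program has two features: grab every email address in a thread with one click, or grab the addresses one page at a time. I didn't bother building a proper interface; the result looks like this:
As usual, here's the source code: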
```php
<?php
// Read the query-string parameters, falling back to defaults when absent
$url2 = (isset($_GET['url2']) && $_GET['url2'] !== '') ? $_GET['url2'] : 'http://tieba.baidu.com/p/2314539885?pn=1';
$page = (isset($_GET['page']) && $_GET['page'] !== '') ? $_GET['page'] : '1';
$type = isset($_GET['type']) ? $_GET['type'] : '';

// Download one page with cURL; returns the HTML, or false on failure
function fetchPage($url)
{
    $ch2 = curl_init();
    curl_setopt($ch2, CURLOPT_URL, $url);
    curl_setopt($ch2, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch2, CURLOPT_SSL_VERIFYHOST, 0);
    curl_setopt($ch2, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch2, CURLOPT_RETURNTRANSFER, true);
    $texts = curl_exec($ch2);
    curl_close($ch2);
    return $texts;
}

// Return every email-like string found in $str (case-insensitive)
function getEmail($str)
{
    $pattern = '/([a-z0-9\-_\.]+@[a-z0-9]+\.[a-z0-9\-_\.]+)/i';
    preg_match_all($pattern, (string) $str, $emailArr);
    return $emailArr[0];
}
?>
<!-- Form 1: grab every page of a thread (type=getAll) -->
<form method="get">
<input type="hidden" name="type" value="getAll" />
<table>
 <tr> <td>Thread URL:</td><td><input type="text" name="url" style="width:300px;" /></td> </tr>
 <tr> <td>Total pages:</td><td><input type="text" name="page" style="width:300px;" value="<?php echo htmlspecialchars($page); ?>" /></td> </tr>
 <tr> <td colspan="2"><input type="submit" value="Grab all pages" /></td> </tr>
</table>
</form>
<!-- Form 2: grab only the page at url2 (type=getNow) -->
<form method="get">
<input type="hidden" name="type" value="getNow" />
<table>
 <tr> <td>Thread URL:</td><td><input type="text" name="url2" value="<?php echo htmlspecialchars($url2); ?>" style="width:300px;" /></td> </tr>
 <tr> <td colspan="2"><input type="submit" value="Grab this page" /></td> </tr>
</table>
</form>
<?php
if ($type !== '') {
    $counts = 0;
    $dat = array();
    if ($type === 'getAll') {
        $pages = (int) $page;
        $url = isset($_GET['url']) ? $_GET['url'] : '';
        // Strip any existing query string (e.g. "?pn=1"), then request ?pn=1 .. ?pn=$pages
        $base = explode('?', $url, 2)[0];
        for ($i = 1; $i <= $pages; $i++) {
            $dat = array_merge($dat, getEmail(fetchPage($base . '?pn=' . $i)));
        }
    } elseif ($type === 'getNow') {
        $dat = getEmail(fetchPage($url2));
    }
    // Print each address on its own line and count them
    foreach ($dat as $email) {
        echo htmlspecialchars($email) . '<br />';
        $counts++;
    }
    echo '<h2>Collected ' . $counts . ' entries</h2>';
}
?>
```
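The listing mixes network access with two pure steps, building the per-page URLs and pulling addresses out of the HTML, and those two steps can be tried without touching the network. Below is a minimal sketch, assuming Tieba selects a thread's page with a 1-based `pn` query parameter (as the `?pn=1` on the default thread URL suggests). It reuses the `getEmail()` pattern, adds case-insensitive matching and drops duplicate addresses. `buildPageUrls()` and `extractEmails()` are hypothetical helper names, not part of the original program:

```php
<?php
// Hypothetical helper: build the URL of every page of a Tieba thread.
// Assumes the page is selected with a 1-based "pn" query parameter.
function buildPageUrls(string $threadUrl, int $pages): array
{
    // Drop any existing query string such as "?pn=1"
    $base = explode('?', $threadUrl, 2)[0];
    $urls = [];
    for ($i = 1; $i <= $pages; $i++) {
        $urls[] = $base . '?pn=' . $i;
    }
    return $urls;
}

// Hypothetical variant of getEmail(): same pattern as the original,
// but case-insensitive (/i) and with duplicate addresses removed.
function extractEmails(string $html): array
{
    $pattern = '/[a-z0-9\-_.]+@[a-z0-9]+\.[a-z0-9\-_.]+/i';
    preg_match_all($pattern, $html, $matches);
    // array_unique keeps the first occurrence; array_values reindexes from 0
    return array_values(array_unique($matches[0]));
}

// Usage: list the page URLs of a 3-page thread (...?pn=1, ?pn=2, ?pn=3)
foreach (buildPageUrls('http://tieba.baidu.com/p/2314539885?pn=1', 3) as $u) {
    echo $u, "\n";
}

// Usage: extract addresses from a snippet of reply HTML
$sample = '<p>Please send to Alice@QQ.com</p><p>me too: bob_01@163.com, Alice@QQ.com</p>';
foreach (extractEmails($sample) as $email) {
    echo $email, "\n"; // prints Alice@QQ.com, then bob_01@163.com
}
```

In a full scraper, each URL from `buildPageUrls()` would be fetched with cURL as in the listing, and the results of `extractEmails()` merged before printing. Deduplicating matters for a mailing list, since the same person often posts their address more than once in a thread.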