Straight to the code:
#!/usr/bin/python
# -*- coding: utf-8 -*-
# Python 2 script: collects category names, addresses and phone numbers from 58.com.

import sys
import urllib

from bs4 import BeautifulSoup

# Python 2 only: make utf-8 the default encoding so Chinese text can be written to the file.
reload(sys)
sys.setdefaultencoding('utf-8')

__BASEURL__ = 'http://bj.58.com/'
__INITURL__ = "http://bj.58.com/shoujiweixiu/"

# Level 1: the category links in the second row of the filter bar.
soup = BeautifulSoup(urllib.urlopen(__INITURL__))
lv1Elements = soup.html.body.find('div', 'selectbarTable').find('tr').find_next_sibling('tr')('a', href=True)

f = open('data1.txt', 'a')
for element in lv1Elements[1:]:  # skip the first link
    f.write(element.get_text() + '\n')
    url = __BASEURL__ + element.get('href')
    print url
    # Level 2: the listing table on each category page.
    soup = BeautifulSoup(urllib.urlopen(url))
    lv2Elements = soup.html.body.find('table', 'tblist').find_all('tr')
    for item in lv2Elements:
        addr = item.find('td', 't').find('a').get_text()
        phone = item.find('td', 'tdl').find('b', 'tele').get_text()
        f.write('Address: ' + addr + ' Phone: ' + phone + '\n')
f.close()
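The script builds each category URL by concatenating __BASEURL__ with the link's href. Here is a minimal Python 3 sketch of a more forgiving way to do that, assuming an href may be relative, root-relative, or already absolute. The helper name absolute_url and the sample href values are made up for illustration. urllib.parse.urljoin resolves all three forms, where plain concatenation could produce a doubled slash or a doubled host:

```python
from urllib.parse import urljoin

BASE_URL = "http://bj.58.com/"  # same value as __BASEURL__ in the script


def absolute_url(href, base=BASE_URL):
    """Resolve an href taken from the page against the site base URL."""
    return urljoin(base, href)


# Hypothetical href values, for illustration only.
samples = ["shoujiweixiu/", "/shoujiweixiu/", "http://bj.58.com/shoujiweixiu/"]
resolved = [absolute_url(h) for h in samples]
for url in resolved:
    print(url)
```

All three sample forms resolve to the same category URL, so the loop would not need to know which form the page happens to use.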
Run the script as is. When it finishes, data1.txt contains the category names along with each merchant's address and phone number.
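The output step can be sketched in Python 3 as follows. This is an illustration under assumptions, not the script's own code: the helper name append_records, the English labels, and the sample record are invented here. Opening the file with an explicit encoding means the reload(sys) / sys.setdefaultencoding workaround is not needed, and the with block closes the file even if a write fails:

```python
import io
import os
import tempfile


def append_records(path, category, records):
    """Append a category heading followed by its (address, phone) pairs."""
    # Explicit utf-8 encoding, so no default-encoding workaround is required.
    with io.open(path, "a", encoding="utf-8") as f:
        f.write(category + "\n")
        for addr, phone in records:
            f.write("Address: " + addr + " Phone: " + phone + "\n")


# Made-up sample data written to a temporary directory, for illustration only.
demo_path = os.path.join(tempfile.mkdtemp(), "data1.txt")
append_records(demo_path, "Screen repair", [("1 Example Road", "010-0000-0000")])
with io.open(demo_path, encoding="utf-8") as f:
    demo_text = f.read()
print(demo_text)
```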
The BeautifulSoup API documentation is at: http://www.crummy.com/software/BeautifulSoup/bs4/doc/
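The lookups the script relies on can be tried offline. The Python 3 sketch below runs them against a made-up HTML fragment that only mimics the table layout the script expects; the real page may differ, and the helper name extract_rows is hypothetical. It shows that a string passed as the second argument to find() is matched against the CSS class, and it skips rows that lack the expected cells, which is an assumption about header or ad rows rather than something the original script does:

```python
from bs4 import BeautifulSoup

# Made-up markup that mimics the listing table; not copied from the real site.
SAMPLE_HTML = """
<table class="tblist">
  <tr><th>Listing</th><th>Contact</th></tr>
  <tr>
    <td class="t"><a href="/x/1">1 Example Road</a></td>
    <td class="tdl"><b class="tele">010-0000-0000</b></td>
  </tr>
</table>
"""


def extract_rows(html):
    """Return (address, phone) pairs from every complete row of table.tblist."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", "tblist")  # string second argument matches the class
    pairs = []
    for row in table.find_all("tr"):
        addr_cell = row.find("td", "t")
        phone_tag = row.find("b", "tele")
        link = addr_cell.find("a") if addr_cell is not None else None
        if link is None or phone_tag is None:
            continue  # assumed: header or ad rows lack these cells
        pairs.append((link.get_text(strip=True), phone_tag.get_text(strip=True)))
    return pairs


rows = extract_rows(SAMPLE_HTML)
print(rows)
```

Passing "html.parser" explicitly keeps the result independent of which optional parsers happen to be installed.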