
Python crawler example: looking up the home location of a mobile number from its first 7 digits

Category: python · Posted by 搞代码 (gaodaima) on 2022-01-08

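This article walks through a Python crawler that looks up the home location of a mobile number from its first seven digits, with example code for a single-threaded and a multithreaded version.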

Requirements

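Our project needs the first seven digits of mobile numbers to check whether a number is valid and to look up its home location. The existing data was several years old and badly out of date, so I decided to re-crawl it with a Python crawler.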
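The crawlers in this article append one `number,province,mobile_area,postcode` line per 7-digit prefix to `result.txt`. Given such a file, validating a number and finding its home location come down to a dictionary lookup on its first seven digits. The sketch below assumes that file layout; `load_prefix_table` and `lookup` are hypothetical helper names, and the rule that a valid number is 11 digits starting with 1 is an assumption rather than something the article specifies.

```python
import re


def load_prefix_table(lines):
    """Build {7-digit prefix: (province, mobile_area, postcode)} from result.txt lines."""
    table = {}
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        number, province, mobile_area, postcode = parts
        # Key on the first 7 digits in case the stored number is longer than the prefix
        table[number[:7]] = (province, mobile_area, postcode)
    return table


def lookup(table, phone):
    """Return (province, mobile_area, postcode) for an 11-digit number, or None if invalid/unknown."""
    if not re.fullmatch(r"1\d{10}", phone):
        return None  # assumed format: 11 digits starting with 1
    return table.get(phone[:7])
```

In practice the table would be built once, e.g. `with open('./result.txt', encoding='utf-8') as f: table = load_prefix_table(f)`, and a number whose prefix is missing from the table can then be treated as invalid.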

Single-threaded version

    # coding:utf-8
    import requests
    from datetime import datetime


    class PhoneInfoSpider:
        def __init__(self, phoneSections):
            # 3-digit prefixes (e.g. '133') to crawl
            self.phoneSections = phoneSections

        def phoneInfoHandler(self, textData):
            text = textData.splitlines(True)
            # A complete response has at least 9 lines, each field on a fixed line as key:'value'
            if len(text) >= 9:
                number = text[1].split("'")[1]
                province = text[2].split("'")[1]
                mobile_area = text[3].split("'")[1]
                postcode = text[5].split("'")[1]
                line_text = number + "," + province + "," + mobile_area + "," + postcode
                print(line_text)
                try:
                    # Append one CSV line; "with" closes the file again after the write
                    with open('./result.txt', 'a', encoding='utf-8') as f:
                        f.write(line_text + '\n')
                except Exception as e:
                    print(type(e).__name__, ":", e)

        def requestPhoneInfo(self, phoneNum):
            try:
                url = 'https://tcc.taobao.com/cc/json/mobile_tel_segment.htm?tel=' + phoneNum
                # A timeout keeps one stalled request from hanging the whole crawl
                response = requests.get(url, timeout=10)
                self.phoneInfoHandler(response.text)
            except Exception as e:
                print(type(e).__name__, ":", e)

        def requestAllSections(self):
            # last resumes from the segment where the previous run exited abnormally
            last = 0
            # last = 4
            # Generate the numbers automatically: prefix + 4-digit segment, last four digits padded with 0
            for head in self.phoneSections:
                head_begin = datetime.now()
                print(head + " begin time:" + str(head_begin))
                for i in range(last, 10000):
                    middle = str(i).zfill(4)
                    phoneNum = head + middle + "0000"
                    self.requestPhoneInfo(phoneNum)
                last = 0
                head_end = datetime.now()
                print(head + " end time:" + str(head_end))


    if __name__ == '__main__':
        task_begin = datetime.now()
        print("phone check begin time:" + str(task_begin))
        # China Telecom, China Unicom, China Mobile, virtual network operators
        dx = ['133', '149', '153', '173', '177', '180', '181', '189', '199']
        lt = ['130', '131', '132', '145', '146', '155', '156', '166', '171', '175', '176', '185', '186']
        yd = ['134', '135', '136', '137', '138', '139', '147', '148', '150', '151', '152', '157', '158',
              '159', '172', '178', '182', '183', '184', '187', '188', '198']
        add = ['170']
        all_num = dx + lt + yd + add
        print(len(all_num))
        # Prefixes to crawl
        spider = PhoneInfoSpider(all_num)
        spider.requestAllSections()
        task_end = datetime.now()
        print("phone check end time:" + str(task_end))
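`phoneInfoHandler` depends on each field sitting on a fixed line of the response (`text[1]`, `text[2]`, `text[3]`, `text[5]`). If the endpoint ever reorders or reflows those lines, the indices silently pick up the wrong values. A slightly more defensive sketch extracts every `key:'value'` pair with a regular expression so fields can be picked by name. It assumes the response body consists of such pairs; the key names in the sample (`mts`, `province`, `catName`) are an assumption about the Taobao response, not taken from the article.

```python
import re

# Matches pairs such as  province:'somewhere'  with optional spaces around the colon
PAIR_RE = re.compile(r"(\w+)\s*:\s*'([^']*)'")


def parse_pairs(text_data):
    """Return a dict of every key:'value' pair found in the response text (empty if none)."""
    return dict(PAIR_RE.findall(text_data))


sample = "__GetZoneResult_ = {\n    mts:'1330000',\n    province:'X',\n    catName:'Y'\n}\n"
print(parse_pairs(sample))
```

An empty dict doubles as the "incomplete response" check that `len(text) >= 9` performs in the original handler.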

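Crawling one prefix takes 10,000 queries, and the single-threaded version needs more than an hour and a half per prefix, which is far too slow.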
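The 10,000 queries per prefix come from the 4-digit segment after the 3-digit prefix: `requestAllSections` queries each 7-digit prefix once, padded with `0000` to a full 11-digit number. At an hour and a half per prefix that is roughly 5400 s / 10,000 ≈ 0.54 s per request. A small generator makes the count explicit; `query_numbers` is a hypothetical name, and `start` plays the role of `last` in the original.

```python
def query_numbers(head, start=0):
    """Yield one 11-digit query number per 4-digit segment, beginning at `start` (for resuming)."""
    for i in range(start, 10000):
        yield head + str(i).zfill(4) + "0000"


nums = list(query_numbers("133"))
print(len(nums), nums[0], nums[-1])
```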

Multithreaded version

    # coding:utf-8
    import queue
    import threading
    from datetime import datetime

    import requests

    threadNum = 32
    # Serializes appends to result.txt across worker threads
    write_lock = threading.Lock()


    class MyThread(threading.Thread):
        def __init__(self, func, args=()):
            threading.Thread.__init__(self)
            self.func = func
            self.args = args

        def run(self):
            self.func(*self.args)


    def requestPhoneInfo(q, phone_head):
        while True:
            try:
                # queue.Queue is thread-safe, so taking a task needs no extra lock
                p = q.get_nowait()
            except queue.Empty:
                break
            # Use the task value itself as the 4-digit segment; deriving it from
            # q.qsize() races with other threads and skips or repeats numbers
            middle = str(p).zfill(4)
            phoneNum = phone_head + middle + "0000"
            print("phoneNum:" + phoneNum)
            try:
                url = 'https://tcc.taobao.com/cc/json/mobile_tel_segment.htm?tel=' + phoneNum
                response = requests.get(url, timeout=10)
                phoneInfoHandler(response.text)
            except Exception as e:
                print(type(e).__name__, ":", e)


    def phoneInfoHandler(textData):
        text = textData.splitlines(True)
        if len(text) >= 9:
            number = text[1].split("'")[1]
            province = text[2].split("'")[1]
            mobile_area = text[3].split("'")[1]
            postcode = text[5].split("'")[1]
            line_text = number + "," + province + "," + mobile_area + "," + postcode
            print(line_text)
            try:
                # Hold the lock so lines written by different threads do not interleave
                with write_lock, open('./result.txt', 'a', encoding='utf-8') as f:
                    f.write(line_text + '\n')
            except Exception as e:
                print(type(e).__name__, ":", e)


    if __name__ == '__main__':
        task_begin = datetime.now()
        print("phone check begin time:" + str(task_begin))
        dx = ['133', '149', '153', '173', '177', '180', '181', '189', '199']
        lt = ['130', '131', '132', '145', '155', '156', '166', '171', '175', '176', '185', '186']
        yd = ['134', '135', '136', '137', '138', '139', '147', '150', '151', '152', '157', '158',
              '159', '172', '178', '182', '183', '184', '187', '188', '198']
        all_num = dx + lt + yd
        print(len(all_num))
        for head in all_num:
            head_begin = datetime.now()
            print(head + " begin time:" + str(head_begin))
            q = queue.Queue()
            # One task per 4-digit segment, 0000 to 9999
            for p in range(10000):
                q.put(p)
            print(q.qsize())
            threads = []
            for i in range(threadNum):
                thread = MyThread(requestPhoneInfo, (q, head))
                thread.start()
                threads.append(thread)
            for thread in threads:
                thread.join()
            head_end = datetime.now()
            print(head + " end time:" + str(head_end))
        task_end = datetime.now()
        print("phone check end time:" + str(task_end))
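Threads help here because each query is network-bound: while a thread waits on the HTTP response, Python releases the GIL and other threads can run. The hand-rolled `MyThread` plus `queue.Queue` pool can also be expressed with the standard library's `concurrent.futures.ThreadPoolExecutor`, which takes care of distributing tasks and joining threads. This is a swapped-in technique, not the article's code. `crawl_prefix` is a hypothetical name, and `fetch` is an assumed caller-supplied function that would wrap `requests.get(url, timeout=10)` plus parsing in a real run, catching its own exceptions and returning `None` when there is no data.

```python
from concurrent.futures import ThreadPoolExecutor


def crawl_prefix(head, fetch, max_workers=32):
    """Query all 10,000 7-digit prefixes under `head` concurrently.

    `fetch(phone_num)` returns a result line or None. executor.map yields
    results in input order, so the output stays sorted by segment.
    """
    phone_nums = [head + str(i).zfill(4) + "0000" for i in range(10000)]
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        results = executor.map(fetch, phone_nums)
        return [r for r in results if r is not None]


def fake_fetch(num):
    """Offline stand-in for the HTTP call: pretend only even segments have data."""
    return num[:7] if int(num[3:7]) % 2 == 0 else None


print(len(crawl_prefix("133", fake_fetch, max_workers=8)))
```

Because results come back to the main thread, `result.txt` can be written once per prefix after `crawl_prefix` returns, which removes the need for a file lock.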

With the multithreaded version, one prefix (10,000 queries) finishes in about 2 to 3 minutes. CPU usage jumps and stays at around 70%.

There are 40-odd prefixes in total. Crawling all of them takes roughly 1 to 2 hours and yields about 410,000 records.
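The timings above can be sanity-checked with a little arithmetic, using the article's own estimates: 10,000 queries per prefix (the range the code iterates over), about 90 minutes per prefix single-threaded, 2 to 3 minutes per prefix with 32 threads, and the 42 distinct prefixes in the multithreaded version's lists. The helper names are hypothetical.

```python
QUERIES_PER_PREFIX = 10000


def throughput(minutes_per_prefix):
    """Requests per second when one prefix (10,000 queries) takes the given number of minutes."""
    return QUERIES_PER_PREFIX / (minutes_per_prefix * 60)


def total_hours(num_prefixes, minutes_per_prefix):
    """Wall-clock hours to crawl `num_prefixes` prefixes at the given per-prefix time."""
    return num_prefixes * minutes_per_prefix / 60


for label, minutes in [("single-threaded", 90), ("32 threads, slow", 3), ("32 threads, fast", 2)]:
    print(label, round(throughput(minutes), 1), "req/s,", round(total_hours(42, minutes), 1), "h")
```

That is roughly 1.9 requests/s single-threaded versus about 56 to 83 requests/s with 32 threads, a 30x to 45x speedup. It brings 42 prefixes from about 63 hours down to 1.4 to 2.1 hours, consistent with the observed 1 to 2 hours. The 420,000 queries are an upper bound on the roughly 410,000 records collected.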


