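Preface
The text and images in this article come from the internet and are intended only for study and discussion, not for any commercial use. If there is any problem, please contact us promptly so we can address it.
This article comes from Python爬虫数据分析挖掘; author: 李运辰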
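This article scrapes a company's detailed information from the Qichacha (企查查) website, given the company name as input.
- 1. Get the headers
- 2. After logging in, search by the input company name and retrieve the content you need.
- 3. Parse the retrieved text into fields and collect them in a DataFrame so it can be saved as CSV later
- 4. Enter the company names
- 5. Finally, run this code to look up the details of every company in the companys list and save them as CSV.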
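1. Get the headers
1. Go to the Qichacha website, register, and log in.
2. Press F12 to open the developer tools and click Network. You will see a request for the Qichacha address; click it.
There you will find the header to copy. This step is critical: the header must be the one obtained after logging in with your own registered account. Once saved, it lets you query the site repeatedly for a period of time without logging in again.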
```python
from bs4 import BeautifulSoup
import requests
import time

# Keep the session alive: create a new session object
sess = requests.session()

# Add the headers (the ones shown after you log in to Qichacha with your
# own account and password; see above for how to copy them)
afterLogin_headers = {"User-Agent": "paste the value copied as described above"}

# POST request (represents the login; log in once and the session is kept
# for the queries that follow)
login = {"user": "your registered account", "password": "your password"}
sess.post("https://www.qcc.com", data=login, headers=afterLogin_headers)
```
In short, this code poses as a user and performs the login; a returned status code of 200 is taken to mean the login succeeded.
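Typing headers into a dict by hand is easy to get wrong. A minimal sketch, assuming you paste the request headers from the Network tab as raw `Name: value` lines; `parse_raw_headers` and the sample values are hypothetical and not part of the original article:

```python
def parse_raw_headers(raw):
    """Turn raw 'Name: value' lines copied from the Network tab into a dict."""
    headers = {}
    for line in raw.splitlines():
        line = line.strip()
        # Skip blank lines and HTTP/2 pseudo-headers such as ':authority'
        if not line or line.startswith(":"):
            continue
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip()] = value.strip()
    return headers


# Placeholder values only; paste your own headers here.
raw = """
User-Agent: Mozilla/5.0 (placeholder)
Cookie: key1=value1; key2=value2
"""
afterLogin_headers = parse_raw_headers(raw)
print(afterLogin_headers)
```

The resulting dict can be passed as the headers argument in the same way as the hand-written one.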
2. After logging in, search by the input company name and retrieve the content you need.
```python
def get_company_message(company):
    # Fetch the search results page (full content)
    search = sess.get("https://www.qcc.com/search?key={}".format(company),
                      headers=afterLogin_headers, timeout=10)
    search.raise_for_status()
    search.encoding = "utf-8"  # linux utf-8
    soup = BeautifulSoup(search.text, features="html.parser")
    # Link of the first search result
    href = soup.find_all("a", {"class": "title"})[0].get("href")
    time.sleep(4)
    # Fetch the company detail page (full content)
    details = sess.get(href, headers=afterLogin_headers, timeout=10)
    details.raise_for_status()
    details.encoding = "utf-8"  # linux utf-8
    details_soup = BeautifulSoup(details.text, features="html.parser")
    message = details_soup.text
    time.sleep(2)
    return message
```
The code above performs two steps.
- ① Search for a company
- ② Follow the link of the first search result and return the text content of that page.
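The two steps depend on a well-formed search URL and an absolute link to the detail page. requests normally encodes URLs on its own, but doing it explicitly makes the behaviour visible. A small sketch using only the standard library; `build_search_url`, `resolve_result_link`, and the example path are illustrative assumptions, not part of the original code:

```python
from urllib.parse import quote, urljoin

BASE_URL = "https://www.qcc.com"


def build_search_url(company):
    """Percent-encode the company name before placing it in the query string."""
    return "{}/search?key={}".format(BASE_URL, quote(company))


def resolve_result_link(search_url, href):
    """Make the first result's href absolute in case the page uses a relative link."""
    return urljoin(search_url, href)


search_url = build_search_url("腾讯")
print(search_url)
# "/firm/example.html" is a made-up path used only for illustration.
print(resolve_result_link(search_url, "/firm/example.html"))
```

If the href is already absolute, urljoin returns it unchanged, so the helper is safe to apply in both cases.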
3. Parse the retrieved text into fields and collect them in a DataFrame so it can be saved as CSV later
```python
import pandas as pd

def message_to_df(message, company):
    # One single-element list per output column
    list_companys = []
    Registration_status = []
    Date_of_Establishment = []
    registered_capital = []
    contributed_capital = []
    Approved_date = []
    Unified_social_credit_code = []
    Organization_Code = []
    companyNo = []
    Taxpayer_Identification_Number = []
    sub_Industry = []
    enterprise_type = []
    Business_Term = []
    Registration_Authority = []
    staff_size = []
    Number_of_participants = []
    sub_area = []
    company_adress = []
    Business_Scope = []

    # Each value is taken from the text after its label: split on the label,
    # then on "\n", and strip the spaces from the piece that holds the value.
    list_companys.append(company)
    Registration_status.append(message.split("登记状态")[1].split("\n")[1].split("成立日期")[0].replace(" ", ""))
    Date_of_Establishment.append(message.split("成立日期")[1].split("\n")[1].replace(" ", ""))
    registered_capital.append(message.split("注册资本")[1].split("人民币")[0].replace(" ", ""))
    contributed_capital.append(message.split("实缴资本")[1].split("人民币")[0].replace(" ", ""))
    Approved_date.append(message.split("核准日期")[1].split("\n")[1].replace(" ", ""))
    try:
        credit = message.split("统一社会信用代码")[1].split("\n")[1].replace(" ", "")
        Unified_social_credit_code.append(credit)
    except IndexError:
        # Fall back to a later occurrence of the label
        credit = message.split("统一社会信用代码")[3].split("\n")[1].replace(" ", "")
        Unified_social_credit_code.append(credit)
    Organization_Code.append(message.split("组织机构代码")[1].split("\n")[1].replace(" ", ""))
    companyNo.append(message.split("工商注册号")[1].split("\n")[1].replace(" ", ""))
    Taxpayer_Identification_Number.append(message.split("纳税人识别号")[1].split("\n")[1].replace(" ", ""))
    try:
        sub = message.split("所属行业")[1].split("\n")[1].replace(" ", "")
        sub_Industry.append(sub)
    except IndexError:
        sub = message.split("所属行业")[1].split("为")[1].split(",")[0]
        sub_Industry.append(sub)
    enterprise_type.append(message.split("企业类型")[1].split("\n")[1].replace(" ", ""))
    Business_Term.append(message.split("营业期限")[1].split("登记机关")[0].split("\n")[-1].replace(" ", ""))
    Registration_Authority.append(message.split("登记机关")[1].split("\n")[1].replace(" ", ""))
    staff_size.append(message.split("人员规模")[1].split("人")[0].split("\n")[-1].replace(" ", ""))
    Number_of_participants.append(message.split("参保人数")[1].split("所属地区")[0].replace(" ", "").split("\n")[2])
    sub_area.append(message.split("所属地区")[1].split("\n")[1].replace(" ", ""))
    try:
        adress = message.split("经营范围")[0].split("企业地址")[1].split("查看地图")[0].split("\n")[2].replace(" ", "")
        company_adress.append(adress)
    except IndexError:
        adress = message.split("经营范围")[1].split("企业地址")[1].split()[0]
        company_adress.append(adress)
    Business_Scope.append(message.split("经营范围")[1].split("\n")[1].replace(" ", ""))

    df = pd.DataFrame({"公司": list_companys,
                       "登记状态": Registration_status,
                       "成立日期": Date_of_Establishment,
                       "注册资本": registered_capital,
                       "实缴资本": contributed_capital,
                       "核准日期": Approved_date,
                       "统一社会信用代码": Unified_social_credit_code,
                       "组织机构代码": Organization_Code,
                       "工商注册号": companyNo,
                       "纳税人识别号": Taxpayer_Identification_Number,
                       "所属行业": sub_Industry,
                       "企业类型": enterprise_type,
                       "营业期限": Business_Term,
                       "登记机关": Registration_Authority,
                       "人员规模": staff_size,
                       "参保人数": Number_of_participants,
                       "所属地区": sub_area,
                       "企业地址": company_adress,
                       "经营范围": Business_Scope})
    return df
```
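This code parses the retrieved text by splitting on the field labels. It handles most pages, but a few fields may occasionally come back empty; feel free to rewrite it if you are interested.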
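One way to rewrite the parsing so that a missing label yields a default value instead of an exception is a single helper applied to every field. This is only a sketch: `extract_field` is a hypothetical name, and it assumes each value sits on a line after its label, which may not hold for every field on the live page.

```python
def extract_field(message, label, default=""):
    """Return the first non-empty line that follows `label`, or `default`."""
    _, found, rest = message.partition(label)
    if not found:
        return default
    for line in rest.splitlines():
        value = line.replace(" ", "").strip()
        if value:
            return value
    return default


# Made-up text that only mimics the "label, newline, value" layout.
sample = "登记状态\n 存续 \n成立日期\n2000-01-01\n"
print(extract_field(sample, "登记状态"))
print(extract_field(sample, "成立日期"))
print(extract_field(sample, "注册资本", default="N/A"))
```

Note the limitation: if a label is present but its value is blank, the helper returns the next non-empty line, which may be the following label, so the results still need a sanity check.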
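4. Enter the company names
This is only an example, so the list is hard-coded. (In practice you would usually read the company-name column from your own CSV file and convert it to a list.)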
```python
# For testing
companys = ["深圳市腾讯计算机系统有限公司", "阿里巴巴(中国)有限公司"]

# For real use
# df_companys = pd.read_csv("/absolute/path/to/your/companies.csv")
# companys = df_companys["公司名称"].tolist()
```
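A real input file often contains blank cells and repeated names, and each repeat costs an extra request. The sketch below reads the name column with the standard csv module (instead of pandas, to keep it dependency-free) and drops blanks and duplicates; `load_company_names` and the demo names are hypothetical.

```python
import csv
import io


def load_company_names(csv_file, column="公司名称"):
    """Read one column of company names, dropping blanks and duplicates while keeping order."""
    seen = set()
    names = []
    for row in csv.DictReader(csv_file):
        name = (row.get(column) or "").strip()
        if name and name not in seen:
            seen.add(name)
            names.append(name)
    return names


# In-memory stand-in for a real file; with a file on disk, use
# open(path, newline="", encoding="utf-8") instead.
demo = io.StringIO("公司名称\n公司甲\n\n公司乙\n公司甲\n")
companys = load_company_names(demo)
print(companys)
```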
5. Finally, run this code to look up the details of every company in the companys list and save them as CSV.
```python
for company in companys:
    try:
        messages = get_company_message(company)
    except Exception:
        # Skip companies whose lookup fails
        pass
    else:
        df = message_to_df(messages, company)
        if company == companys[0]:
            # First company: create the file and write the header row
            df.to_csv("/absolute/path/to/your/output.csv", index=False, header=True)
        else:
            # Later companies: append without repeating the header
            df.to_csv("/absolute/path/to/your/output.csv", mode="a", index=False, header=False)
    time.sleep(1)
```
With that, you have the details of these two companies.
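Because the header row is written only for the first company in the list, a failed first lookup leaves the CSV without a header. A possible alternative, sketched here with the standard csv module rather than pandas, decides on the header by checking the file itself; `append_row`, the temporary path, and the records are illustrative assumptions.

```python
import csv
import os
import tempfile


def append_row(path, row, fieldnames):
    """Append one record, writing the header only when the file is new or empty."""
    need_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if need_header:
            writer.writeheader()
        writer.writerow(row)


# Demonstration in a temporary directory with placeholder records.
out_path = os.path.join(tempfile.mkdtemp(), "companies.csv")
fields = ["公司", "登记状态"]
append_row(out_path, {"公司": "公司甲", "登记状态": "存续"}, fields)
append_row(out_path, {"公司": "公司乙", "登记状态": "注销"}, fields)
with open(out_path, newline="", encoding="utf-8") as f:
    rows = list(csv.reader(f))
print(rows)
```

With this approach a rerun appends to the existing file, so delete the old file first if you want a fresh result.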
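PS: If soup.find_all("a", {"class": "title"})[0].get("href") raises an error, Qichacha may have updated its page markup. You can update the code as follows.
① Press F12 to open the developer tools.
② In the search results, right-click the "深圳市腾讯计算机系统有限公司" link and choose "Inspect"; the developer tools will jump to the corresponding element.
③ You can see that it is an a tag whose class is title. If the code raises an error, inspect the tag this way and update the selector. For example, if the class were changed to company_title, the code would become: soup.find_all("a", {"class": "company_title"})[0].get("href")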
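To make the lookup less brittle, the code can try several candidate class names and return nothing when none match, instead of failing on the [0] index. The sketch below uses the standard library's HTMLParser rather than BeautifulSoup so that it runs without extra packages; `FirstLinkFinder`, `find_first_result`, and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class FirstLinkFinder(HTMLParser):
    """Record the href of the first <a> whose class list contains a candidate name."""

    def __init__(self, candidates):
        super().__init__()
        self.candidates = set(candidates)
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag != "a" or self.href is not None:
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if self.candidates.intersection(classes):
            self.href = attr_map.get("href")


def find_first_result(html, candidates=("title", "company_title")):
    finder = FirstLinkFinder(candidates)
    finder.feed(html)
    return finder.href  # None when no candidate class matched


# Made-up markup standing in for a search results page.
page = '<div><a class="company_title" href="/firm/example.html">Example Co.</a></div>'
print(find_first_result(page))
print(find_first_result("<p>no results</p>"))
```

A None result is also a useful signal that the page may be a verification prompt rather than search results.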
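Finally, two things to keep in mind. First, set reasonable sleep intervals while scraping; otherwise the site may detect a bot and pop up a verification prompt, which interrupts the loop. Second, keep the number of requests in any given period small, or you will also be detected.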
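Both points can be sketched in a few lines: a pause with a random component so the requests are not evenly spaced, and a helper that splits the company list into small batches. The names `polite_sleep` and `batches`, and the numbers used, are assumptions for illustration only.

```python
import random
import time


def polite_sleep(base, jitter=1.0, sleep=time.sleep):
    """Pause for `base` seconds plus a random extra of up to `jitter` seconds."""
    delay = base + random.uniform(0, jitter)
    sleep(delay)
    return delay


def batches(items, size):
    """Split a list into chunks so a long run can be spread over several sessions."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Pass a no-op sleep here so the demo finishes instantly.
waited = polite_sleep(4, jitter=2, sleep=lambda seconds: None)
print(4 <= waited <= 6)
print(list(batches(["a", "b", "c", "d", "e"], 2)))
```

In the scraping loop, polite_sleep would take the place of the fixed time.sleep calls, with the real time.sleep left as the default.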
The complete script is simply the five snippets above run in order: the session and login setup, get_company_message, message_to_df, the companys list, and the final loop. It is a handy reference for learning how to use BeautifulSoup.
Article source
https://www.gaodaima.com/qq_40694671/article/details/110671900