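The re module
1. What is a regular expression? What is the re module?
(1) Regular expressions:
Regular expressions are a standalone technology that any programming language can use.
A regular expression is built from a combination of special characters.
① Metacharacters:
^: matches the start of the string
$: matches the end of the string
|: means "or"
(): groups alternatives, for example (13|14) checks whether the text is 13 or 14
{9}: repeats the preceding item exactly 9 times
[]: a character set that restricts the allowed values; [0-9] matches exactly one character from 0 to 9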
import re

# \w: matches letters, digits, and underscore
print(re.findall(r"\w", "hello 123_ */-="))
# Result: ['h', 'e', 'l', 'l', 'o', '1', '2', '3', '_']

# \W: matches anything that is not a letter, digit, or underscore
print(re.findall(r"\W", "hello 123_ */-="))
# Result: [' ', ' ', '*', '/', '-', '=']

# \s: matches any whitespace character
print(re.findall(r"\s", "hell o 12 3_ */-="))
# Result: [' ', ' ', ' ', ' ']

# \S: matches any non-whitespace character
print(re.findall(r"\S", "hell o 12 3_ */-="))
# Result: ['h', 'e', 'l', 'l', 'o', '1', '2', '3', '_', '*', '/', '-', '=']

# \d: matches any digit, equivalent to [0-9]
print(re.findall(r"\d", "hell o 12 3_ */-="))
# Result: ['1', '2', '3']

# \D: matches any non-digit
print(re.findall(r"\D", "hell o 12 3_ */-="))
# Result: ['h', 'e', 'l', 'l', ' ', 'o', ' ', ' ', '_', ' ', '*', '/', '-', '=']

# \n: matches a newline
print(re.findall(r"\n", "hell o 12 3_ \n*/-="))
# Result: ['\n']

# \t: matches a tab
print(re.findall(r"\t", "hell o 12 3_ \t*/-="))
# Result: ['\t']

# Matching literal characters
print(re.findall("l", "hell o 12 3_ */-="))
# Result: ['l', 'l']
print(re.findall("yangy", "yangy my name is yangy, oh yangy is my big baby"))
# Result: ['yangy', 'yangy', 'yangy']

# ^: the string must start with the pattern
print(re.findall("^yangy", "yangy my name is yangy, oh yangy is my big baby"))
# Result: ['yangy']

# $: the string must end with the pattern
print(re.findall("yangy$", "yangy my name is yangy, oh yangy is my big baby, yangy"))
# Result: ['yangy']
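② Character sets:
[0-9]: matches one character from 0 to 9
[9-0]: raises an error; a range must run from the smaller code to the larger one
[a-z]: lowercase a to z
[A-Z]: uppercase A to Z
[z-A]: an error; a range can only run from small to large, following the ASCII table
[A-z]: from uppercase A to lowercase z; this also includes the six punctuation characters that sit between Z and a in ASCII ([, \, ], ^, _, `)
Note: a range must be written in ascending order of ASCII code values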
import re

res = re.match("[A-Za-z0-9]{4}", "Yang9527")
print(res)
# A character set matches a single character by default; {n} makes it match exactly n characters
if res:
    print("Match succeeded")
Result:
<re.Match object; span=(0, 4), match='Yang'>
Match succeeded
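The ordering rule for character sets described above can be checked directly. The following is a minimal sketch; the sample string "aZ_^9" and the variable name reversed_range_error are made up for illustration.

```python
import re

# A reversed range is rejected when the pattern is compiled.
try:
    re.compile("[9-0]")
except re.error as exc:
    reversed_range_error = str(exc)
    print("invalid pattern:", reversed_range_error)

# [A-z] spans ASCII codes 65..122, so it also covers [ \ ] ^ _ ` (codes 91..96).
print(re.findall("[A-z]", "aZ_^9"))      # ['a', 'Z', '_', '^']

# Listing the two ranges separately keeps the set to letters only.
print(re.findall("[A-Za-z]", "aZ_^9"))   # ['a', 'Z']
```

When only letters are wanted, [A-Za-z] is usually the safer choice than [A-z].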
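③ Combined use
[\w\W]: letters, digits, and underscore together with everything else, so it matches any character
[\d\D]: matches any character, digit or not
\t: tab
\n: newline
\b: matches a word boundary (the start or end of a word)
^: like startswith, matches at the beginning
- "^" used outside brackets: the string must start with what follows
- "^" used as the first character inside []: negates the set
$: like endswith, matches at the end
^$: used together they give an exact match, which is how to constrain the length or content of the whole string
|: or. In ab|abc, once the first branch matches, abc is never tried. The fix is to always put the longer branch first: abc|ab
[^…]: negation
[^ab]: matches any character other than a and b
[^a-z]: matches any character outside a-z
(?:): a non-capturing group; the only difference from a capturing group is that the text it matches is not saved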
import re

# Keep "ies" or "y" joined to "compan" in the result
print(re.findall("compan(?:ies|y)", "Too many companies have gone bankrupt, and the next one is my company"))
Result:
['companies', 'company']
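Several of the combination rules above are easier to see side by side. This is a small sketch using the same sentence; the short test strings ("abc", "abcab-d", "bear", "bears") and the variable names are made up for illustration.

```python
import re

text = "Too many companies have gone bankrupt, and the next one is my company"

# Capturing group: findall returns only what the group captured.
captured = re.findall(r"compan(ies|y)", text)
print(captured)   # ['ies', 'y']

# Non-capturing group: findall returns the whole match.
whole = re.findall(r"compan(?:ies|y)", text)
print(whole)      # ['companies', 'company']

# Alternation tries branches left to right, so a shorter branch listed first wins.
print(re.findall(r"ab|abc", "abc"))   # ['ab']
print(re.findall(r"abc|ab", "abc"))   # ['abc']

# [^ab] matches any single character except a or b.
print(re.findall(r"[^ab]", "abcab-d"))   # ['c', '-', 'd']

# ^...$ pins the pattern to the whole string: exactly four lowercase letters.
print(bool(re.search(r"^[a-z]{4}$", "bear")))    # True
print(bool(re.search(r"^[a-z]{4}$", "bears")))   # False
```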
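Supplement: greedy and non-greedy matching
Greedy: .* keeps consuming and stops at the last position that satisfies the pattern
Non-greedy: .*? stops at the first position that satisfies the pattern, then the search continues from there; this makes it useful for filtering data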
import re

print(re.findall("a(.*)c", "akjfnvkjfcdnjkngasdfcfjsknasfc"))
print(re.findall("a(.*?)c", "akjfnvkjfcdnjkngasdfcfjsknasfc"))
Result:
['kjfnvkjfcdnjkngasdfcfjsknasf']  # greedy
['kjfnvkjf', 'sdf', 'sf']  # non-greedy
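To illustrate why the non-greedy form is handy for filtering data, here is a minimal sketch. The sample line and its <b>...</b> markers are hypothetical; real HTML is usually better handled by a dedicated parser.

```python
import re

# Hypothetical sample data: two values wrapped in <b>...</b> markers.
line = "<b>price</b> and <b>stock</b>"

# Greedy: .* runs to the last </b>, swallowing everything in between.
greedy = re.findall(r"<b>(.*)</b>", line)
print(greedy)   # ['price</b> and <b>stock']

# Non-greedy: .*? stops at the first </b>, so each value is extracted separately.
lazy = re.findall(r"<b>(.*?)</b>", line)
print(lazy)     # ['price', 'stock']
```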
(2) The re module:
In Python, regular expressions are used through the standard library's re module
import re
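2. Why use regular expressions?
For example, to pull certain characters out of a large body of text: a regular expression can filter the text and extract the data you want
Use cases:
(1) Web crawlers
(2) Filtering data for data analysis
(3) Usernames, passwords, and phone verification: checking that user input is valid
3. Three important functions in the re module
(1) findall(), returns a list:
Finds every non-overlapping match in the string and returns the results as a list.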
import re

str1 = "bear blue hung"
res1 = re.findall("[a-z]{4}", str1)
print(res1)
Result:
['bear', 'blue', 'hung']
(2) search(), returns a match object; use .group() to get the text:
Stops scanning as soon as the first match is found and does not look any further. If nothing matches, it returns None.
import re

str1 = "bear blue hung"
res2 = re.search("[a-z]{4}", str1)
print(res2.group())
Result:
bear
(3) match(), returns a match object; use .group() to get the text:
Matches only at the beginning of the string; if the beginning is not what the pattern describes, it returns None
import re

str1 = "bear blue hung"
res3 = re.match("[a-z]{4}", str1)
print(res3.group())
Result:
bear
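The difference between the three functions shows up once the string no longer begins with a match. The sketch below uses the same pattern on a made-up input with two leading spaces and guards against None before calling .group(); the variable names are illustrative.

```python
import re

text = "  bear blue hung"   # note the two leading spaces
pattern = r"[a-z]{4}"

all_words = re.findall(pattern, text)   # every non-overlapping match, as a list
first_hit = re.search(pattern, text)    # first match anywhere, or None
at_start = re.match(pattern, text)      # match only at position 0, or None

print(all_words)                                   # ['bear', 'blue', 'hung']
print(first_hit.group() if first_hit else None)    # bear
print(at_start.group() if at_start else None)      # None
```

Calling .group() directly on a None result raises AttributeError, so checking the return value first is a reasonable habit.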
Example: validating a mobile phone number with the re module
Requirement: 11 digits, starting with 13/14/15/17/18/19
① Validation without re
VALID_PREFIXES = ("13", "14", "15", "17", "18", "19")

while True:
    user_number = input("Enter your phone number: ").strip()
    # 11 characters, all digits, starting with one of the allowed prefixes
    if (len(user_number) == 11 and user_number.isdigit()
            and user_number.startswith(VALID_PREFIXES)):
        print("Phone number is valid!")
        break
    else:
        print("Phone number is invalid")
② Validation with re
import re

while True:
    user_number = input("Enter your phone number: ").strip()
    # Argument 1: the regular expression; argument 2: the string to check
    # ^: matches the start
    # $: matches the end
    # |: means "or"
    # (): groups the alternatives, checking whether the prefix is 13, 14, and so on
    # {9}: exactly 9 repetitions
    # []: character set; [0-9] matches exactly one character from 0 to 9
    if re.match("^(13|14|15|17|18|19)[0-9]{9}$", user_number):
        print("Phone number is valid!")
        break
    else:
        print("Phone number is invalid")
Result:
Enter your phone number: 12222222222
Phone number is invalid
Enter your phone number: 13333333333
Phone number is valid!
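The same check can also be wrapped in a reusable function so it can be tested without typing input. This is only a sketch under the stated requirement: the names PHONE_PATTERN and is_valid_phone are hypothetical, and re.fullmatch is used here in place of the explicit ^ and $ anchors.

```python
import re

# Precompiled pattern; the prefixes come from the requirement stated above.
PHONE_PATTERN = re.compile(r"(?:13|14|15|17|18|19)[0-9]{9}")


def is_valid_phone(number: str) -> bool:
    """Return True if number is 11 digits with one of the allowed prefixes."""
    # fullmatch succeeds only if the entire string fits the pattern.
    return PHONE_PATTERN.fullmatch(number.strip()) is not None


for candidate in ["13333333333", "12222222222", "1333333333", "13333333333x"]:
    print(candidate, is_valid_phone(candidate))
```

Of the four sample inputs, only the first passes: the second has a disallowed prefix, the third is one digit short, and the fourth has a trailing letter.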