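PHP String Handling: Converting Between Full-Width and Half-Width Characters
Converting between half-width and full-width characters is a common string-processing task. This article walks through one way to do it.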
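I. Concepts
Full-width characters occupy Unicode code points 65281 to 65374 (hex 0xFF01 to 0xFF5E).
Half-width characters occupy code points 33 to 126 (hex 0x21 to 0x7E).
The space is a special case: the full-width space is 12288 (0x3000) and the half-width space is 32 (0x20).
Apart from the space, the full-width and half-width characters appear in the same order, so each full-width character sits exactly 65248 (0xFEE0) above its half-width counterpart.
Non-space characters can therefore be converted by simply adding or subtracting that offset, with the space handled separately.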
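The offset arithmetic described above can be sketched on a single code point. This is only an illustration: the constant `FULLWIDTH_OFFSET` and the helper `toHalfWidthCodePoint` are hypothetical names, not part of the original article, and the last line assumes the mbstring extension (PHP 7.2 or later) for `mb_ord` and `mb_chr`.

```php
<?php
// Distance between a full-width character and its half-width counterpart.
const FULLWIDTH_OFFSET = 0xFF01 - 0x21; // 0xFEE0 = 65248

// Hypothetical helper: map one full-width code point to its half-width form.
function toHalfWidthCodePoint(int $cp): int
{
    if ($cp === 0x3000) {
        // The full-width space does not follow the common offset.
        return 0x20;
    }
    if ($cp >= 0xFF01 && $cp <= 0xFF5E) {
        return $cp - FULLWIDTH_OFFSET;
    }
    // Anything outside the full-width range is returned unchanged.
    return $cp;
}

// Assumes mbstring is available for mb_ord() and mb_chr().
echo mb_chr(toHalfWidthCodePoint(mb_ord('Ａ', 'UTF-8')), 'UTF-8'), PHP_EOL; // prints "A"
```

Going the other way (half-width to full-width) would add the same offset to code points 0x21 through 0x7E and map 0x20 to 0x3000.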
II. Approach
1. Find the characters in the target Unicode range; a regular expression works for this.
2. Change their Unicode code points.
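The two steps above might look roughly like the following for a whole string. This is a minimal sketch rather than the article's own implementation: `fullToHalfSketch` is a hypothetical name, and it swaps in the mbstring functions `mb_ord` and `mb_chr` (PHP 7.2 or later) for the code point conversion instead of hand-written UTF-8 helpers.

```php
<?php
// Hypothetical helper: convert full-width characters in a UTF-8 string to half-width.
function fullToHalfSketch(string $text): string
{
    // Step 1: find the target characters. The /u modifier makes PCRE treat
    // the pattern and subject as UTF-8, so \x{...} matches whole code points.
    $result = preg_replace_callback(
        '/[\x{3000}\x{FF01}-\x{FF5E}]/u',
        function (array $m): string {
            // Step 2: change the code point. The space has its own mapping;
            // every other match is shifted down by 0xFEE0.
            $cp = mb_ord($m[0], 'UTF-8');
            $cp = ($cp === 0x3000) ? 0x20 : $cp - 0xFEE0;
            return mb_chr($cp, 'UTF-8');
        },
        $text
    );

    // preg_replace_callback() returns null on error (for example, invalid UTF-8);
    // in that case this sketch falls back to the unmodified input.
    return $result ?? $text;
}

echo fullToHalfSketch('ＰＨＰ　７！'), PHP_EOL; // prints "PHP 7!"
```

Matching only the full-width range keeps the callback from running on characters that need no change, such as CJK text or existing ASCII.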
III. Implementation
1. First, two functions that convert between a Unicode code point and a character:
<pre>
/**
 * Convert a Unicode code point to a character.
 * @param int $unicode
 * @return string|false UTF-8 encoded character, or false if out of range
 **/
function unicode2Char($unicode){
    // One byte: 0xxxxxxx
    if($unicode < 128) return chr($unicode);
    // Two bytes: 110xxxxx 10xxxxxx
    if($unicode < 2048) return chr(($unicode >> 6) + 192) .
        chr(($unicode & 63) + 128);
    // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
    if($unicode < 65536) return chr(($unicode >> 12) + 224) .
        chr((($unicode >> 6) & 63) + 128) .
        chr(($unicode & 63) + 128);
    // Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    if($unicode < 2097152) return chr(($unicode >> 18) + 240) .
        chr((($unicode >> 12) & 63) + 128) .
        chr((($unicode >> 6) & 63) + 128) .
        chr(($unicode & 63) + 128);
    return false;
}

/**
 * Convert a character to its Unicode code point.
 * @param string $char must be a single UTF-8 encoded character
 * @return int|false
 **/
function char2Unicode($char){
    // strlen() counts bytes, so it gives the length of the UTF-8 sequence.
    switch (strlen($char)){
        case 1 : return ord($char);
        case 2 : return (ord($char[1]) & 63) |
            ((ord($char[0]) & 31) << 6);
        case 3 : return (ord($char[2]) & 63) |
            ((ord($char[1]) & 63) << 6) |
            ((ord($char[0]) & 15) << 12);
        case 4 : return (ord($char[3]) & 63) |
            ((ord($char[2]) & 63) << 6) |
            ((ord($char[1]) & 63) << 12) |
            ((ord($char[0]) & 7) << 18);
        default :
            trigger_error('Character is not UTF-8!', E_USER_WARNING);
            return false;
    }
}