Regular expressions – How to extract a specific HTML container with a regex in PHP

<code><div class="baby">
    <div class="another-shit">
        <h1>contont</h1>
        <p>ppppppp</p>
    </div>
    my link
    <div class="lie"></div>
</div>
<div class="baby">
    <div class="another-shit">
        <h1>contont</h1>
        <p>ppppppp</p>
    </div>
    my link
    <div class="lie"></div>
</div>
<div class="baby">
    <div class="another-shit">
        <h1>contont</h1>
        <p>ppppppp</p>
    </div>
    my link
    <div class="lie"></div>
</div>
<div class="nonono"></div></code>

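Given the code above, I need to extract the contents of every container whose class is baby. The structure will not always look like this. In other words, I need to extract the contents of a given container. Thanks.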

Replies:

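I don't know why everyone wants to use regex to pull things out of a DOM tree … that is plainly not what regex is for …

When you reach for it here … can you see how wronged the regex feels ..?

Putting a thoroughbred to work turning a millstone never ends well … If you want to extract from a DOM tree, the right way is as follows …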

<code><?php
/* in this situation you need DOM ... */
$doc = new DOMDocument();

/* load your html here ... */
$doc->loadHTML( <<<HTML_SECTION
<div class="baby">
    <div class="another-shit">
        <h1>contont</h1>
        <p>ppppppp</p>
    </div>
    my link
    <div class="lie"></div>
</div>
<div class="baby">
    <div class="another-shit">
        <h1>contont</h1>
        <p>ppppppp</p>
    </div>
    my link
    <div class="lie"></div>
</div>
<div class="baby">
    <div class="another-shit">
        <h1>contont</h1>
        <p>ppppppp</p>
    </div>
    my link
    <div class="lie"></div>
</div>
<div class="nonono"></div>
HTML_SECTION
);

/* make a result array ... */
$result = [];

/* go through all nodes which have class="baby" ... */
foreach( ( new DOMXPath( $doc ) )->query( '//*[@class="baby"]' )
     as $element )
    /* just push it into the result ... */
    $result[] = $doc->saveHTML( $element );

/* and print the result out ... */
print_r( $result );</code>
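
One caveat about the XPath used above: `@class="baby"` compares the whole attribute value, so an element written as `class="baby big"` would be skipped. The following is a minimal sketch of a token-based match, assuming PHP 7 or later. The helper name `extractByClass` and the sample markup are made up for illustration and are not part of the original answer. `libxml_use_internal_errors()` is there only to keep parser warnings about sloppy markup out of the output.

```php
<?php
/**
 * Hypothetical helper (not from the original answer): returns the outer HTML
 * of every element whose class list contains $class as a whole token.
 */
function extractByClass(string $html, string $class): array
{
    $doc = new DOMDocument();

    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Pad the class attribute with spaces so " baby " only matches a whole token.
    // Assumption: $class contains no quote characters.
    $query = sprintf(
        '//*[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );

    $result = [];
    foreach ((new DOMXPath($doc))->query($query) as $element) {
        $result[] = $doc->saveHTML($element);
    }
    return $result;
}

// Illustrative input: one multi-class container, one plain one, one unrelated.
$html = '<div class="baby big"><p>one</p></div>'
      . '<div class="baby"><div class="lie"></div></div>'
      . '<div class="nonono"></div>';

print_r(extractByClass($html, 'baby'));
```

With this input the helper should return the two `baby` containers, including the one that carries a second class, and leave `nonono` out.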

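I covered the rest of the DOM extension in detail in an earlier answer … so I won't repeat it here …

Have a look if you're interested …

As for the question in the title … in more than 90% of cases … the answer is that regex cannot do it …

If you absolutely insist on regex … and the documents you have to process look exactly like your example … then there is a way, as follows …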

<code><?php
/* crying regex matcher ... */
preg_match_all( '(^(\s*).*^\\1)ism',
<<<HTML_SECTION
<div class="baby">
    <div class="another-shit">
        <h1>contont</h1>
        <p>ppppppp</p>
    </div>
    my link
    <div class="lie"></div>
</div>
<div class="baby">
    <div class="another-shit">
        <h1>contont</h1>
        <p>ppppppp</p>
    </div>
    my link
    <div class="lie"></div>
</div>
<div class="baby">
    <div class="another-shit">
        <h1>contont</h1>
        <p>ppppppp</p>
    </div>
    my link
    <div class="lie"></div>
</div>
<div class="nonono"></div>
HTML_SECTION
, $result_tmp );

/* only the first element we need ... */
print_r( array_shift( $result_tmp ) );</code>

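This approach only works on well-formatted HTML documents … it relies on indentation to work out which closing tag belongs to which opening tag …

If the HTML is messy … then regex is simply powerless when it comes to extracting DOM content …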
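
The indentation idea can be made a little stricter. As written, the greedy `.*` in the pattern above can run through to the last line that starts with the same indentation, so on input where the containers are not indented it may come back as a single match spanning all of them. The sketch below is one possible variant, not the original author's pattern. It assumes the opening tag is written exactly as `<div class="baby">` and that its closing `</div>` sits on its own line at the same indentation. The shortened sample markup is illustrative.

```php
<?php
// Sample input: the question's markup, reduced to two containers.
$html = <<<HTML_SECTION
<div class="baby">
    <div class="another-shit">
        <h1>contont</h1>
    </div>
    my link
</div>
<div class="baby">
    <div class="lie"></div>
</div>
<div class="nonono"></div>
HTML_SECTION;

// ^([ \t]*)  remember the indentation of the opening tag
// .*?        lazily take what follows (the s flag lets . cross newlines)
// ^\1</div>  stop at the first closing tag that starts a line at that indentation
$pattern = '~^([ \t]*)<div class="baby">.*?^\1</div>~sm';

preg_match_all($pattern, $html, $matches);
print_r($matches[0]); // one entry per container: two for this input

// The same markup squeezed onto one line has no indentation to lean on.
$minified = preg_replace('~\s*\n\s*~', '', $html);
echo preg_match_all($pattern, $minified, $unused), "\n"; // prints 0
```

The second half shows the limit described above: once the line breaks and indentation are gone, the pattern has nothing to anchor on and finds no containers at all, whereas a DOM parser is unaffected by formatting.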

Yep … that's about it …

Give phpQuery a try: https://code.google.com/p/phpquery/
For an introduction, see http://www.cnblogs.com/in-loading/archive/2012/04/11/2442697.html

<code>$preg = '/\<div class\=\"baby\"\>(.*?)\<\/div\>/s';
preg_match_all($preg, $html, $match);</code>

Not sure whether this will do the job, though~
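
A quick check suggests it will not once the container has nested `<div>` children. The lazy `(.*?)` stops at the first `</div>` it meets, which belongs to the inner child rather than to the container. Below is a small sketch that runs the pattern, unchanged, against a single container from the question.

```php
<?php
// One container from the question, including its nested child <div>s.
$html = <<<HTML_SECTION
<div class="baby">
    <div class="another-shit">
        <h1>contont</h1>
        <p>ppppppp</p>
    </div>
    my link
    <div class="lie"></div>
</div>
HTML_SECTION;

// The pattern from the reply above, unchanged.
$preg = '/\<div class\=\"baby\"\>(.*?)\<\/div\>/s';
preg_match_all($preg, $html, $match);

// The capture ends at the nested child's closing tag, so "my link" and
// the "lie" div never make it into the result.
var_dump(strpos($match[1][0], 'my link'));        // bool(false)
var_dump(strpos($match[1][0], '<p>ppppppp</p>')); // an integer offset
```

So the pattern is only safe when the target container holds no other `<div>` elements, which the question says cannot be assumed.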
