PHP preg_split: Split Input by <br>, <br/>, <p> into Separate Paragraphs
I am curling a page that contains ill-formed code. There is a particular snippet of the page that I am trying to parse into paragraphs. The input snippet may be divided by <p> and </p> tags, or separated by one or more <br> or <br/> tags. In cases where there are two <br> tags one after another, I don't want two separate paragraphs.
My current code for parsing and displaying it is:
$paragraphs = preg_split('/(<\s*p\s*\/?>)|(<\s*br\s*\/?>)|(\s\s+)|(<\s*\/p\s*\/?>)/', $article, -1, PREG_SPLIT_NO_EMPTY);
$paragraphcount = count($paragraphs);
for ($x = 1; $x <= $paragraphcount; $x++) {
    echo "<p>" . $paragraphs[$x - 1] . "</p>";
}
However, it is not working as expected. An example input and output follows:
Input 1: first part </p> <p> second part </p> <p> third part </p> <p> fourth part <br/>
Output 1: <p>first part </p><p> </p><p>second part </p><p> </p><p> third part </p><p> </p><p>fourth part</p><p> </p>
My code is parsing the input into paragraphs; however, it is also adding paragraphs that contain only a space.
Any help is appreciated.
The input is UTF-8, if that makes a difference.
Here is a solution with preg_replace:
$article = "first part </p> <p> second part </p> <p> third part </p> <p> fourth part <br/> <br> fifth part";
$healed = substr(preg_replace('/(\s*<(\/?p|br)\s*\/?>\s*)+/u', "</p><p>", "<p>$article<p>"), 4, -3);
It first wraps the string in <p>, replaces (repetitions of) the variants of breaks with </p><p>, and then removes the starting </p> and the ending <p>. Note that this does not produce an (intermediate) array, only the final string.
echo $healed;
outputs:
<p>first part</p><p>second part</p><p>third part</p><p>fourth part</p><p>fifth part</p>
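To make the wrap, replace, and trim steps easier to follow, the one-liner can be unrolled into separate statements. This is only a sketch: the function name healParagraphs is made up for illustration, and the (string) cast is a shortcut that turns a failed preg_replace (which returns null) into an empty string.

```php
<?php
// Hypothetical helper; the three steps mirror the one-liner above.
function healParagraphs(string $article): string
{
    // Step 1: wrap the input in <p> on both sides so that the start and
    // the end of the text are also matched by the pattern.
    $wrapped = "<p>{$article}<p>";

    // Step 2: replace every run of <p>, </p>, <br> or <br/> (including the
    // surrounding whitespace) with a single paragraph boundary.
    $replaced = (string) preg_replace('/(\s*<(\/?p|br)\s*\/?>\s*)+/u', '</p><p>', $wrapped);

    // Step 3: cut off the leading "</p>" (4 characters) and the
    // trailing "<p>" (3 characters) that steps 1 and 2 produced.
    return substr($replaced, 4, -3);
}

$article = "first part </p> <p> second part </p> <p> third part </p> <p> fourth part <br/> <br> fifth part";
$healed = healParagraphs($article);
echo $healed, "\n";
```

The same function also handles "Input 1" from the question, where the text ends in <br/>: the trailing break and the wrapping <p> form one run, so no empty paragraph is left at the end.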
Note that you need the u modifier at the end of the regular expression for UTF-8 support.
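One thing to watch for with the u modifier when the page comes from curl: if the fetched text turns out not to be valid UTF-8, preg_replace returns null instead of a string. A minimal sketch of checking for that, using made-up sample strings (the variable names are only for illustration):

```php
<?php
$pattern = '/(\s*<(\/?p|br)\s*\/?>\s*)+/u';

// Valid UTF-8: "\xC3\xA9" is the two-byte encoding of "é".
$valid = "caf\xC3\xA9 <br> bar";
$okResult = preg_replace($pattern, '</p><p>', $valid);

// Invalid UTF-8: a lone "\xE9" is "é" in ISO-8859-1, not in UTF-8.
$invalid = "caf\xE9 <br> bar";
$badResult = preg_replace($pattern, '</p><p>', $invalid);
$badError = preg_last_error();

if ($badResult === null && $badError === PREG_BAD_UTF8_ERROR) {
    echo "Input is not valid UTF-8; convert it before applying the pattern.\n";
}
```

In the failing case the input would have to be converted to UTF-8 first; which source encoding to assume depends on the page being fetched.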
If, on the other hand, you need the paragraphs in an array, then preg_split is better suited (using the same regular expression):
$paragraphs = preg_split('/(\s*<(\/?p|br)\s*\/?>\s*)+/u', $article, -1, PREG_SPLIT_NO_EMPTY);
If you then write:
foreach ($paragraphs as $paragraph) { echo "$paragraph\n"; }
You get:
first part
second part
third part
fourth part
fifth part
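Building on the preg_split call, here is a sketch of a small helper that also trims each piece and drops anything left empty, which covers input that starts or ends with stray whitespace. The name splitParagraphs is hypothetical, and the trimming step is an addition that the original call does not perform.

```php
<?php
// Hypothetical helper: returns the paragraphs of $article as a clean array.
function splitParagraphs(string $article): array
{
    $parts = preg_split('/(\s*<(\/?p|br)\s*\/?>\s*)+/u', $article, -1, PREG_SPLIT_NO_EMPTY);
    if ($parts === false) {
        // preg_split returns false on failure, for example on invalid UTF-8.
        return [];
    }

    // Trim whitespace at the outer edges of each piece, then drop pieces
    // that are empty after trimming and reindex the array from 0.
    $trimmed = array_map('trim', $parts);
    return array_values(array_filter($trimmed, static fn (string $p): bool => $p !== ''));
}

$paragraphs = splitParagraphs("  intro <br> <br/> body ");
foreach ($paragraphs as $paragraph) {
    echo "<p>" . $paragraph . "</p>\n";
}
```

Working on the array makes it easy to process each paragraph individually before it is wrapped in <p> tags for display.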