
Fixing Shortcodes and Paragraph Tags in WordPress

Posted on: October 9, 2015
Posted in: Programming, Web Dev, WordPress

WordPress has a very handy function called wpautop() that inserts paragraphs and line breaks automatically. It’s not critical for WordPress to function, but it’s a major part of what makes it user friendly.

It causes problems, however, when you use a shortcode: WordPress will wrap the shortcode in a paragraph if it’s on its own line. If your shortcode creates a block element (like a DIV), the result is invalid markup, because a paragraph cannot contain a block element.
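To make the problem concrete, here is a minimal sketch that needs no WordPress install. It assumes a hypothetical [box] shortcode that renders a DIV, and uses str_replace() as a stand-in for the shortcode expansion:

```php
<?php
// What wpautop() hands on when a shortcode sits on its own line.
$afterAutop = "<p>[box]</p>";

// Stand-in for the shortcode handler (assumption: [box] renders a DIV).
$rendered = str_replace("[box]", "<div class=\"box\">Hello</div>", $afterAutop);

// The DIV ends up nested inside the paragraph, which is invalid markup.
echo $rendered;
```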

The best solution out there seems to be to build a list of shortcodes that shouldn’t be wrapped, then go through and clean them up. The problem is that shortcodes can take attributes – spotting them using a regular expression is pretty much out of the question.
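As an illustration of where a simple pattern goes wrong, the sketch below runs a naive regular expression against a hypothetical [box] shortcode whose attribute value contains a closing bracket. The pattern and the shortcode are assumptions made up for this example:

```php
<?php
// Naive pattern: "[box", then anything that is not "]", then "]".
$pattern = '/\[box[^\]]*\]/';

// Hypothetical shortcode; the quoted attribute value contains "]".
$text = '[box title="a]b"]';

preg_match($pattern, $text, $match);

// The match stops at the "]" inside the quotes: [box title="a]
echo $match[0];
```

A more elaborate pattern could account for quotes, but it gets hard to read quickly once escapes and both quote styles are involved.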

The answer I came up with is to parse the text character by character like a compiler would. First, create a truly global variable that will hold the codes you want to scrub.

$GLOBALS["MLTS_fix_codes"] = array();
global $MLTS_fix_codes;
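Filling the list might look something like this. The names "box" and "columns" are hypothetical; use whichever of your shortcodes produce block-level output:

```php
<?php
// Assumption: "box" and "columns" are shortcodes that output block elements.
$GLOBALS["MLTS_fix_codes"] = array("box", "columns");

// Names go in without brackets or a leading slash,
// so "box" is meant to cover both [box] and [/box].
var_dump(in_array("box", $GLOBALS["MLTS_fix_codes"]));
```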

Next, define the function itself.

function shortcode_empty_paragraph_fix($content) {
	global $MLTS_fix_codes;		// a global array of shortcodes to check
	$caret = 0;					// a caret to step through the content
	$inShortcode = false;		// whether or not the caret is currently inside a shortcode
	$inQuote = false;			// whether or not the caret is currently inside a quoted attribute value
	$currentCode = null;		// the current code (i.e. [foo] -> "foo")
	$codeStart = 0;				// where the current code started
	$quoteMark = "";			// the quotation mark used (" or ')
	$codeInArray = false;		// whether or not the code itself is one we're checking for

	// step through the entire content
	while ($caret < strlen($content)) {
		// if we're in the shortcode and also in a quote
		if ($inShortcode && $inQuote) {
			// if this character is the matching quotation mark, exit quote
			if ($content[$caret]==$quoteMark) {
				$inQuote = false;
				$quoteMark = "";
			// if this character is a backslash, skip the escaped character
			} else if ($content[$caret]=="\\") {
				$caret ++;
			}

		// if we're in the shortcode but entering a quote
		} else if ($inShortcode && ($content[$caret]=="\"" || $content[$caret]=="'")) {
			$quoteMark = $content[$caret];
			$inQuote = true;

		// if we're in the shortcode
		} else if ($inShortcode) {
			// if we don't have the current code
			if ($currentCode === null) {
				// if this character gets us the code
				if ($content[$caret]==" " || $content[$caret]=="]") {
					// store code
					$currentCode = substr($content, $codeStart+1, $caret-$codeStart-1);
					// clean up /code => code
					if ($currentCode[0]=="/") {
						$currentCode = substr($currentCode, 1);
					}
					$codeInArray = in_array($currentCode, $MLTS_fix_codes);
					// >= 3 so a <p> at the very start of the content is caught too
					if ($codeInArray && $codeStart>=3) {
						// if there's <p> directly before the code
						if (substr($content, $codeStart-3, 3)=="<p>") {
							// remove the <p>
							$content = substr($content, 0, $codeStart-3).substr($content, $codeStart);
							// move caret -3
							$caret -= 3;
						}
					}
				}
			}
			// if the caret is at "]"
			if ($content[$caret]=="]") {
				if ($codeInArray) {
					// check directly after caret for </p>
					if (substr($content, $caret+1, 4)=="</p>") {
						// splice out </p>
						$content = substr($content, 0, $caret+1).substr($content, $caret+5);
					}
				}
				// we are no longer in a shortcode
				$inShortcode = false;
				$currentCode = null;
				$codeStart = 0;
			}

		// if not in a shortcode at all
		} else {
			// look for the start of a code: "[" followed by a letter or "/"
			if ($content[$caret]=="[" && isset($content[$caret+1])) {
				$n = $content[$caret+1];
				if (($n>='a' && $n<='z') || ($n>='A' && $n<='Z') || ($n=="/")) {
					$inShortcode = true;
					$currentCode = null;
					$codeStart = $caret;
					$codeInArray = false;
				}
			}
		}

		// move to the next position
		$caret ++;
	}
	return $content;
}
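The quote tracking is the part of the function above that a regular expression struggles with. In isolation it can be sketched roughly as follows; find_shortcode_end() is a hypothetical helper written for this illustration, not something the filter itself uses:

```php
<?php
// Hypothetical helper: returns the offset of the "]" that closes the
// shortcode starting at $start, or -1 if no closing bracket is found.
function find_shortcode_end($text, $start) {
	$quote = "";					// active quotation mark, "" when outside a quote
	$len = strlen($text);
	for ($i = $start + 1; $i < $len; $i++) {
		$ch = $text[$i];
		if ($quote !== "") {
			if ($ch === "\\") {
				$i++;				// skip the escaped character
			} elseif ($ch === $quote) {
				$quote = "";		// closing quotation mark
			}
		} elseif ($ch === '"' || $ch === "'") {
			$quote = $ch;			// opening quotation mark
		} elseif ($ch === "]") {
			return $i;				// first bracket outside any quote
		}
	}
	return -1;
}

$text = '[box title="a]b"] after';
$end = find_shortcode_end($text, 0);
echo substr($text, 0, $end + 1);
```

Because the bracket inside the quoted attribute is skipped, the scan should stop at the real end of the shortcode rather than in the middle of the attribute value.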

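And last, register the function as a filter on the_content. At the default priority of 10 it runs after core's wpautop() (which is registered earlier at the same priority) and before do_shortcode() at priority 11: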

add_filter( 'the_content', 'shortcode_empty_paragraph_fix' );

So far this seems to be working okay but any comments or additions would be welcomed.