PHPRO.ORG

Highlight Search Words

With the temperature up on keywords and searches, many sites have opted to highlight the keywords from their searches. This can be useful for quickly finding relevant words within large pages of text. It can also be quite annoying, as the highlight colors are usually hot pink, lime green or yellow, like the highlighter marker pens after which the process was named.

When putting this code together, two considerations were taken into account: speed and functionality. On the one hand, this can simply be done with regular expressions (regex); however, regular expressions are notoriously slow. On the other hand, string functions lack the power of regex, and many string manipulation functions may be needed to achieve the same functionality. Chaining many PHP string functions can make the function slower than a single regular expression.

So, what is the functionality needed in the highlight function? Obviously, to highlight words within a text string. Do we wish to highlight all occurrences of a string? Do we wish for case sensitivity? Do we wish to highlight only whole words? The answers to these will dictate which method to use, regex or string functions. Here are two examples of achieving the highlighting with slightly different functionality, the first using string manipulation and the second using regular expressions.

String Manipulation

This first method highlights every occurrence of each search word and is case insensitive. It uses str_ireplace to do the required manipulation, but it cannot tell the difference between PHP and PHPRO: it highlights "PHP" on its own and also the "PHP" in PHPRO, which may or may not be the desired result. Note too that the replacement inserts the search word exactly as it was given, so a match on "PHP" is rewritten as "php" in the output.


<?php

/**
 * Highlight words in a string.
 *
 * @param string $string The text to search
 * @param array  $words  The words to highlight
 *
 * @return string
 */
function highlightWords($string, $words)
{
    foreach ($words as $word)
    {
        $string = str_ireplace($word, '<span class="highlight_word">'.$word.'</span>', $string);
    }
    /*** return the highlighted string ***/
    return $string;
}

/*** example usage ***/
$string = 'This text will highlight PHP and SQL and sql but not PHPRO or MySQL or sqlite';
/*** an array of words to highlight ***/
$words = array('php', 'sql');
/*** highlight the words ***/
$string = highlightWords($string, $words);

?>

<html>
<head> 
<title>PHPRO Highlight Search Words</title> 
<style type="text/css">
.highlight_word{
    background-color: pink;
}
</style> 
</head>
<body>
 <?php echo $string?>
</body>
</html>
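
The two side effects of this approach, partial-word matches and lost capitalization, are easy to see on a short string. The following is a minimal sketch rather than part of the original script: the function names highlightSubstrings and highlightPreservingCase are hypothetical, and square brackets stand in for the span markup so the output is readable on a console. The second function shows one way string functions alone could keep the original capitalization, at the cost of several more calls.

```php
<?php

// Same technique as the listing above, with "[...]" as stand-in markup (an assumption).
function highlightSubstrings(string $text, array $words): string
{
    foreach ($words as $word) {
        // The replacement contains $word as typed, so the original case is lost.
        $text = str_ireplace($word, '[' . $word . ']', $text);
    }
    return $text;
}

// Hypothetical variant: walk the text with stripos() and copy the matched
// characters from the text itself, so the original capitalization survives.
function highlightPreservingCase(string $text, string $word): string
{
    if ($word === '') {
        return $text;
    }
    $out    = '';
    $offset = 0;
    $len    = strlen($word);
    while (($pos = stripos($text, $word, $offset)) !== false) {
        $out   .= substr($text, $offset, $pos - $offset)
                . '[' . substr($text, $pos, $len) . ']';
        $offset = $pos + $len;
    }
    return $out . substr($text, $offset);
}

echo highlightSubstrings('PHP and PHPRO', array('php')), "\n";
// [php] and [php]RO
echo highlightPreservingCase('PHP and PHPRO', 'php'), "\n";
// [PHP] and [PHP]RO
```

Both versions still match inside PHPRO; only the capitalization handling differs.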

Regular Expression

This second method makes use of PHP's PCRE functions to achieve a better result. The search is case insensitive, which means it will match both php and PHP. This method has the added benefit of being able to use word boundaries, which allows it to highlight the word PHP but not PHPRO. The word boundary prevents partial matching of the search text, so parts of words are never highlighted. If this is the functionality you require, this is the method to choose.


<?php

/**
 * Highlight whole words in a string.
 *
 * @param string $text  The text to search
 * @param array  $words The words to highlight
 *
 * @return string
 */
function highlightWords($text, $words)
{
    /*** loop over the array of words ***/
    foreach ($words as $word)
    {
        /*** quote the text for regex, including the / delimiter ***/
        $word = preg_quote($word, '/');
        /*** highlight the words ***/
        $text = preg_replace("/\b($word)\b/i", '<span class="highlight_word">\1</span>', $text);
    }
    /*** return the text ***/
    return $text;
}


/*** example usage ***/
$string = 'This text will highlight PHP and SQL and sql but not PHPRO or MySQL or sqlite';
/*** an array of words to highlight ***/
$words = array('php', 'sql');
/*** highlight the words ***/
$string = highlightWords($string, $words);

?>

<html>
<head>
<title>PHPRO Highlight Search Words</title>
<style type="text/css">
.highlight_word{
        background-color: pink;
}
</style>
</head>
<body>
 <?php echo $string?>
</body>
</html>
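
One thing to watch with a word-by-word loop is that a later search word can match inside markup inserted for an earlier one, for example when a visitor searches for "span" or "class". A possible way around this, sketched below under assumptions, is to build a single alternation pattern and run preg_replace once. The function name highlightWordsOnce is hypothetical, and this variant was not part of the benchmark that follows.

```php
<?php

// Hypothetical single-pass variant of the regex method.
function highlightWordsOnce(string $text, array $words): string
{
    // Drop empty strings, which would otherwise match at every word boundary.
    $words = array_filter($words, function ($word) {
        return $word !== '';
    });
    if (count($words) === 0) {
        return $text;
    }

    // Escape regex metacharacters in each word, including the "/" delimiter.
    $quoted = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $words);

    // One pattern for all words: \b(php|sql)\b, case insensitive.
    $pattern = '/\b(' . implode('|', $quoted) . ')\b/i';

    // preg_replace() returns null on error; fall back to the untouched text.
    return preg_replace($pattern, '<span class="highlight_word">$1</span>', $text) ?? $text;
}

echo highlightWordsOnce('PHP inside a span tag', array('php', 'span')), "\n";
// <span class="highlight_word">PHP</span> inside a <span class="highlight_word">span</span> tag
```

Because the text is scanned only once, the inserted span tags are never searched themselves.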

Performance

Although these two functions achieve slightly different results, they use widely different methods to get there. Regular expressions have long been shunned by PHP coders due to the performance penalty suffered when using them, and string manipulation has always been favoured wherever possible. Recent performance tuning of PCRE means the penalty may not be as severe as once thought. Here we took the two scripts above and tested each with Apache Bench (ab) using 10,000 requests. The results may surprise you.

In the first test, we test the string manipulation method.

 ab -n 10000 http://localhost/search_replace.php
This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright 2006 The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Finished 10000 requests


Server Software:        Apache/2.2.3
Server Hostname:        localhost
Server Port:            80

Document Path:          /search_replace.php
Document Length:        479 bytes

Concurrency Level:      1
Time taken for tests:   13.926626 seconds
Complete requests:      10000
Failed requests:        0
Write errors:           0
Total transferred:      6710000 bytes
HTML transferred:       4790000 bytes
Requests per second:    718.05 [#/sec] (mean)
Time per request:       1.393 [ms] (mean)
Time per request:       1.393 [ms] (mean, across all concurrent requests)
Transfer rate:          470.47 [Kbytes/sec] received

At 13.9 seconds for 10,000 requests, this looks like a pretty good method of highlighting. Let's see how it shapes up when compared to PCRE.

ab -n 10000 http://localhost/search_regex.php
This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright 2006 The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Finished 10000 requests


Server Software:        Apache/2.2.3
Server Hostname:        localhost
Server Port:            80

Document Path:          /search_regex.php
Document Length:        362 bytes

Concurrency Level:      1
Time taken for tests:   14.135278 seconds
Complete requests:      10000
Failed requests:        0
Write errors:           0
Total transferred:      5540000 bytes
HTML transferred:       3620000 bytes
Requests per second:    707.45 [#/sec] (mean)
Time per request:       1.414 [ms] (mean)
Time per request:       1.414 [ms] (mean, across all concurrent requests)
Transfer rate:          382.73 [Kbytes/sec] received

So there you have it: the regex version took 14.1 seconds, a difference of 0.208652 seconds over 10,000 accesses. This equates to 0.000020865 seconds per access. A moral victory at best.
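
The arithmetic behind those figures can be checked with a few lines of PHP. This is only a worked calculation using the two "Time taken for tests" values reported by ab above; the variable names are illustrative.

```php
<?php

// Totals reported by ab in the two runs above (seconds for 10,000 requests).
$stringTotal = 13.926626;
$regexTotal  = 14.135278;
$requests    = 10000;

// Extra time the regex version needed, in total and per request.
$difference = $regexTotal - $stringTotal;
$perRequest = $difference / $requests;

printf("Difference: %.6f s total, %.1f microseconds per request\n", $difference, $perRequest * 1e6);
// Difference: 0.208652 s total, 20.9 microseconds per request

printf("Relative slowdown: %.1f%%\n", $difference / $stringTotal * 100);
// Relative slowdown: 1.5%
```

A gap of roughly 21 microseconds per request, or about 1.5 percent, is small enough that run-to-run variation on the test machine could account for much of it.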

Adding more colors

By keeping with the second method above, it is now a simple task to add an array of colors so that each word gets a different color. The array of colors is cycled through as the array of words is looped over. The script follows here. However, there is a further increase in the time taken to display the results, adding 2 seconds to 10,000 requests. A small penalty, but a penalty nonetheless.


<?php

/**
 * Highlight whole words in a string, cycling through a list of colors.
 *
 * @param string $text   The text to search
 * @param array  $words  The words to highlight
 * @param array  $colors Color names used as CSS class suffixes
 *
 * @return string
 */
function highlightWords($text, $words, $colors=null)
{
    if (is_null($colors) || !is_array($colors))
    {
        $colors = array('yellow', 'pink', 'green');
    }

    $i = 0;
    /*** the maximum key number ***/
    $num_colors = max(array_keys($colors));

    /*** loop over the array of words ***/
    foreach ($words as $word)
    {
        /*** quote the text for regex, including the / delimiter ***/
        $word = preg_quote($word, '/');
        /*** highlight the words ***/
        $text = preg_replace("/\b($word)\b/i", '<span class="highlight_'.$colors[$i].'">\1</span>', $text);
        /*** move to the next color, wrapping back to the first ***/
        if ($i == $num_colors) { $i = 0; } else { $i++; }
    }
    /*** return the text ***/
    return $text;
}


/*** example usage ***/
$string = 'This text will highlight PHP and SQL and sql but not PHPRO or MySQL or sqlite';
$words = array('php', 'sql', 'phpro', 'sqlite');
$string = highlightWords($string, $words);

?>

<html>
<head>
<title>PHPRO Highlight Search Words</title>
<style type="text/css">
.highlight_pink{
        background-color: pink;
}
.highlight_yellow{
        background-color: yellow;
}
.highlight_green{
        background-color: green;
}

</style>
</head>
<body>
 <?php echo $string?>
</body>
</html>
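
The counter that wraps back to zero can also be written with the modulo operator, which avoids tracking the maximum key by hand. The helper below is a hypothetical sketch, not part of the script above, and it assumes the colors array is not empty.

```php
<?php

// Hypothetical helper: pick the color for the n-th word, wrapping around.
// Assumes $colors contains at least one entry.
function colorForIndex(array $colors, int $index): string
{
    // Reindex so the keys run 0..n-1 even if the caller passed custom keys.
    $colors = array_values($colors);
    return $colors[$index % count($colors)];
}

$colors = array('yellow', 'pink', 'green');
foreach (array('php', 'sql', 'phpro', 'sqlite') as $i => $word) {
    echo $word, ' => highlight_', colorForIndex($colors, $i), "\n";
}
// php => highlight_yellow
// sql => highlight_pink
// phpro => highlight_green
// sqlite => highlight_yellow
```

With four words and three colors, the fourth word wraps around to the first color, which is the same result the counter in the script produces.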