Flare Tip: How the full-text search works

The following is an article I wrote about how Flare ranks topics in the full-text search.

 

In the late 90s, research showed that the majority of help users started looking for the answers to their questions using the index. Over the last 10 years, we believe that we’ve seen a shift towards the full-text search due to the influence of search engines on the way most users look for online information. This doesn’t mean that the index isn’t important (or less effective than a full-text search)—we’ll save that for another tip. The growing reliance on the search feature does mean that help authors need to know how the help authoring tool (HAT), in this case Flare, evaluates help topics when the user performs a search.

 

When users perform a full-text search in Flare, they enter a term or phrase and Flare returns a list of topics ranked so that the best matches should appear first. As you might expect, the search uses exact matches and frequency to rank certain topics higher than others; however, you may not realize that Flare also considers the formatting of the terms. Let’s cover each of these factors in more detail.

Flare ranks exact matches of the term the user enters higher than partial matches. An exact match uses the same form of the term (for instance, an exact match for “deleting files” would be “deleting” or “files”). A partial match may share the same root word as the original search term, but the terms aren’t exactly the same (for instance, the user might search for “deleting files” and the search would also find “delete”, “deletion”, “file”, and “filed”).

 

It’s worth noting that Flare does not consider the order of terms when evaluating topics. In other words, when searching for “deleting files” it doesn’t matter if the topic uses “deleting multiple files” or “files that need deleting”.
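The matching rules above can be sketched in a few lines of Python. This is only an illustration, not Flare's actual implementation: the WORD_FAMILIES table is a hypothetical, hand-written stand-in for whatever root-word matching Flare applies, and the function names are made up for this example.

```python
# Hypothetical word families standing in for Flare's root-word matching.
# Flare's real stemming rules are not described in this article.
WORD_FAMILIES = {
    "delete": {"delete", "deleting", "deletion", "deleted"},
    "file": {"file", "files", "filed"},
}


def family_of(word):
    """Return the family key that contains the word, or None if unknown."""
    for root, forms in WORD_FAMILIES.items():
        if word in forms:
            return root
    return None


def classify(search_term, topic_word):
    """Classify one topic word against one search term."""
    search_term, topic_word = search_term.lower(), topic_word.lower()
    if topic_word == search_term:
        return "exact"
    family = family_of(search_term)
    if family is not None and family == family_of(topic_word):
        return "partial"
    return "none"


def match_topic(query, topic_text):
    """Return the best match type per search term, ignoring word order."""
    words = topic_text.lower().split()
    best = {}
    for term in query.lower().split():
        kinds = {classify(term, word) for word in words}
        if "exact" in kinds:
            best[term] = "exact"
        elif "partial" in kinds:
            best[term] = "partial"
        else:
            best[term] = "none"
    return best


print(match_topic("deleting files", "deleting multiple files"))
print(match_topic("deleting files", "files that need deleting"))
print(match_topic("deleting files", "delete a file"))
```

The first two calls report an exact match for both terms, because only the words matter and not their order. The third call reports partial matches, since the topic uses different forms of the same root words.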

The next factor for ranking topics in a search is the number of times a search term is used in a topic. As you might expect, a topic that uses the word “deleting” 10 times will have a higher ranking than a topic that uses it only once. However, the exact match comes into play in an interesting way. If the user searches for “delete”, Flare searches for topics that use the word “delete” or any of its variations. A topic that uses the exact term just one time (in this example, “To delete a file”) will be ranked higher than a topic that uses the root word 10 times (for example, “deleting files”).

Formatting also affects how your topics are ranked during a search. A search term that appears in a heading style (H1, H2, H3, H4, H5, H6) will be ranked higher than the same term in regular body text. The heading styles are also hierarchical, so a term in an H1 has a higher ranking than the same term in an H2. After an exact match, the heading styles have the largest effect on the ranking of a topic. (H6 is the one heading style that ranks below bold and italic body text.)

 

In addition to heading styles, text markup (bold and italics) makes a small difference in the rankings. A search term that appears in bold will have a slightly higher ranking than the same term without bold. A bold term has a higher ranking than an italicized term.

A number of other items do not affect the topic ranking in a search. These non-factors include the inclusion of a topic in the TOC or a browse sequence, index entries, concept entries, hyperlinks, and drop-down and expanding links.

 

Flare also uses a stop list of words that the search ignores completely. The list includes:

 

a, an, the, to, of, is, for, and, or, do, be, by, he, she, on, in, at, it, not, no, are, as, but, her, his, its, non, only, than, that, then, they, this, we, were, which, with, you, into, about, after, all, also, been, can, come, form, from, had, has, have, me, made, many, may, more, most, near, over, some, such, their, there, these, under, use, was, when, where, against, among, became, because, between, during, each, early, found, however, include, late, later, med, other, several, through, until, who, your
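To see what the stop list means for a real query, here is a small Python sketch that strips the stop words listed above from a search phrase. The function name, the lowercasing, and the whitespace splitting are assumptions made for this example; the article does not describe how Flare tokenizes a query.

```python
# Stop words copied from the list above.
STOP_WORDS = frozenset("""
a an the to of is for and or do be by he she on in at it not no are as but
her his its non only than that then they this we were which with you into
about after all also been can come form from had has have me made many may
more most near over some such their there these under use was when where
against among became because between during each early found however
include late later med other several through until who your
""".split())


def searchable_terms(query):
    """Return the query words that survive the stop list (assumed lowercase match)."""
    return [word for word in query.lower().split() if word not in STOP_WORDS]


print(searchable_terms("deleting files from the archive"))
print(searchable_terms("how to use the form"))
```

Note that ordinary help vocabulary such as "use", "form", and "include" is on the list, so the second query is reduced to the single word "how". Keep this in mind when choosing keywords for headings.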

To understand how a topic will be evaluated in a search, you need to combine all three factors that we’ve discussed here: (1) exact matches, (2) frequency, and (3) formatting.

 

Here’s the basic ranking, from highest to lowest, based on exact matches and formatting.

 

H1 Exact match
H1 Root match
H2 Exact match
H2 Root match
H3 Exact match
H3 Root match
H4 Exact match
H4 Root match
H5 Exact match
H5 Root match
Bold in Body Exact match
Italics in Body Exact match
H6 Exact match
Regular in Body Exact match
Bold in Body Root match
Italics in Body Root match
H6 Root match
Regular in Body Root match
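The ranking above can be written down as an ordered Python list so that two kinds of match can be compared directly. This is just a lookup table built from the list in this article, not a model of Flare's scoring; it ignores frequency, and the names BASIC_RANKING, rank_position, and outranks are invented for the example.

```python
# H1 through H5: the exact match comes first, then the root match, per level.
BASIC_RANKING = [
    (level, match)
    for level in ("h1", "h2", "h3", "h4", "h5")
    for match in ("exact", "root")
]

# Body text and H6 are interleaved, as in the list above.
BASIC_RANKING += [
    ("bold", "exact"),
    ("italic", "exact"),
    ("h6", "exact"),
    ("regular", "exact"),
    ("bold", "root"),
    ("italic", "root"),
    ("h6", "root"),
    ("regular", "root"),
]


def rank_position(formatting, match):
    """Return the 1-based position in the basic ranking (1 is best)."""
    return BASIC_RANKING.index((formatting, match)) + 1


def outranks(first, second):
    """Return True if the first (formatting, match) pair ranks higher."""
    return rank_position(*first) < rank_position(*second)


print(rank_position("h1", "exact"))
print(outranks(("h5", "root"), ("bold", "exact")))
print(outranks(("h6", "exact"), ("bold", "exact")))
```

The last two lines show the two surprises in the list: a root match in an H5 still outranks an exact match in bold body text, while an exact match in an H6 does not.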

 

As you can see from this ranking, the best way to boost a topic in the full-text search is to use the exact term the users will search for in one of the heading styles.

 

Now let’s add some frequency to the ranking (entries without a count appear once in the topic):

 

Regular in Body Exact match (4 instances)
H1 Exact match
Regular in Body Exact match (2 instances)
Bold in Body Root match (44 instances)
H1 Root match
H2 Exact match
H2 Root match
H3 Exact match
H3 Root match
H4 Exact match
H4 Root match
H5 Exact match
H5 Root match
Bold in Body Exact match
Italics in Body Exact match
Bold in Body Root match (20 instances)
H6 Exact match
Regular in Body Exact match
Regular in Body Root match (100 instances)
Bold in Body Root match (19 instances)
Bold in Body Root match (2 instances)
Bold in Body Root match
Italics in Body Root match
Regular in Body Root match (50 instances)
H6 Root match
Regular in Body Root match

 

These frequency examples highlight two important points.

First, you can significantly boost the ranking of a topic by repeating the exact word that the user enters in the search. The good news is that this occurs naturally when you write. The bad news is that the root words do not have the same weight as the exact words.

 

Notice that it takes only 4 instances of the exact term in the body of a topic to move that topic above one that uses the exact term in the H1 style. In contrast, you can repeat the root term 500 times in the body of a topic with regular formatting and it will never move above a single instance of the exact term in the body of a topic. The root terms are clearly the weak link in the search. If the users search for the exact terms you used, then they will have a much more successful experience than if they search for the root terms. Even with bold formatting, it takes 20 instances of a root word in the body of a topic to give it a higher ranking than one exact match in the body with regular formatting.

 

Second, the inclusion of the exact term in a heading style is vital to giving a topic a high ranking in the search.

Here are a few tips for preparing your help topics for a full-text search:

  • Make sure you use the official heading styles (H1-H6) in your topics.

  • Include the most important keywords for the topic in these headings.

  • Repeat the most important keywords when possible.

  • Consider reserving the H6 style for entering additional search terms at the end of your topics. Set the font color of this style (the color property in the stylesheet) to match the background color of your topics. Type the important keywords using this style and repeat the most important words (with root variations) 4-10 times. This is similar to adding metadata to the topic.

Copyright 2007. User Assistance Group, Inc. All rights reserved.