What makes a good site search?
Search is the dominant form of navigation on the internet; when it works, it's the fastest way to get where you want to go. Yet when it comes to site search, many websites really phone it in. As a result, users have started ignoring site search boxes entirely.
So, how do you do site search properly?
Before you begin, ask yourself if building or deploying anything is really necessary. Do you need something that Google's or Yahoo's public site search features can't manage for you? Do you have domain-specific knowledge that you can apply to your specific situation?
Perhaps you've got information that Google can't, won't, or is not permitted to index. Perhaps your website is all about search, as is the case with online classifieds. Perhaps your needs don't align with the behaviours of the public search engines, for example, when your content has an extremely short shelf life.
Doing search right is all about achieving good results and communicating them well. There are many techniques you can use to tune your site search algorithm. Common ones include:
- Keyword relevance. This is the meat and potatoes of any text-based search, yet it's often poorly implemented. Make sure you're not capturing content that isn't relevant. If possible, exclude navigation, related links and other elements that don't describe the page's unique meaning.
- Identify the core elements on a page and weight their content more highly. For example, the content's title is important, while the names and home towns of satisfied customers probably aren't. Excess content can lead to false positives which pull attention away from relevant pages.
- Not all high-value content is on your page. You can sometimes find high-value keywords in unexpected places. For example, if you capture the keywords in Google referrals to each page, you could use them to automatically populate that page's keyword metadata. These are keywords used by real people who in all likelihood wanted to find that page.
- Build a keyword thesaurus, but use it cautiously. If possible, define the strength of each keyword relationship. Consider the following pairs: footwear and shoes; pills and drugs; knife and cutlery. Clearly these relationships are not all equally strong, and in many cases the relevance only goes in one direction: every shoe is footwear, but not all footwear is a shoe. Often these relationships follow a taxonomical structure; be aware of this.
- Link graph analysis. This was Google's secret weapon when they burst onto the scene in the late 1990s, but it is generally not useful for site search algorithms. You should be able to determine the relative importance of each page on your site with more objective methods.
- Content popularity. You can determine this in many ways, but it generally involves counting the number of hits the content receives. Frequently viewed pages are more likely to be what someone is looking for. You can refine this further by looking at what's popular this hour versus this month, for example.
- Age of content. When dealing with timely information such as news reports, you could assign higher importance to new content. Ask yourself how many searches you'd likely get for current versus back catalogue material.
- Removing outdated content. Increase your signal-to-noise ratio by making sure expired content isn't cluttering up good current content.
- Contextual relevance. If you can ascertain the country a user is in, you could tailor the results to their locale. In a similar vein, you could identify the user's computer platform and target results that way; this might be useful for a shareware library, for example. There are no hard and fast rules here; even the time of day could be used to improve search results for some niches.
- Active personalisation. Anything you might know about a user can prove useful. A makeup company might ask users to set their skin colour within their user profile; then in turn subtly change search results to favour products compatible with that complexion. Many sites could benefit from knowing the age or sex of a user. Even ethnicity isn't out of the question — yes, there really is a search engine whose algorithm favours 'black' related content.
- Passive personalisation. If you know what the user has viewed or interacted with in the past, you can bias the results towards (or away from) the same or similar items.
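To make a few of these techniques concrete, here is a minimal sketch in Python that combines field weighting, a one-directional thesaurus, content popularity and age of content into a single score. Everything in it is an assumption for illustration: the `Page` fields, the weights in `FIELD_WEIGHTS`, the `THESAURUS` entries, the 30-day half-life and the way the three factors are multiplied together are all hypothetical, and a real site would tune or replace each of them.

```python
import math
from dataclasses import dataclass

# Hypothetical field weights: a match in the title counts for more than the body.
FIELD_WEIGHTS = {"title": 3.0, "body": 1.0}

# Hypothetical one-directional thesaurus: query term -> {related term: strength}.
# "footwear" expands to "shoes", but "shoes" does not expand back to "footwear".
THESAURUS = {"footwear": {"shoes": 0.8}}


@dataclass
class Page:
    title: str
    body: str
    hits: int        # view count over some recent window
    age_days: float  # days since publication


def expand(query):
    """Map each query term (weight 1.0) and its thesaurus entries to a weight."""
    weights = {}
    for term in query.lower().split():
        weights[term] = max(weights.get(term, 0.0), 1.0)
        for related, strength in THESAURUS.get(term, {}).items():
            weights[related] = max(weights.get(related, 0.0), strength)
    return weights


def score(page, query, half_life_days=30.0):
    """Field-weighted keyword score, boosted by popularity and decayed by age."""
    terms = expand(query)
    fields = {"title": page.title, "body": page.body}
    keyword = 0.0
    for name, text in fields.items():
        words = text.lower().split()
        for term, weight in terms.items():
            keyword += FIELD_WEIGHTS[name] * weight * words.count(term)
    popularity = 1.0 + math.log1p(page.hits)               # dampen very popular pages
    freshness = 0.5 ** (page.age_days / half_life_days)    # halves every half-life
    return keyword * popularity * freshness


def search(pages, query):
    """Return the pages with a non-zero score, best first."""
    scored = [(score(p, query), p) for p in pages]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [p for s, p in ranked if s > 0]
```

Calling `search(pages, "footwear")` would also match pages that only mention "shoes", at a reduced weight, while a search for "shoes" would not pull in pages that only mention "footwear". Personalisation and contextual relevance could be added as further multipliers in `score` in the same way.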
Okay, so now you have great search results that consistently find exactly what users are looking for. Don't screw things up in the last innings by not conveying the content well on the results page.
- You can have exactly the page the user is looking for come up first in the results, but if the result doesn't look like what they're after, they might overlook it. Make sure your titles are clear and concise, and ensure descriptions take a relevant quote from the page.
- Highlight the user's terms within the search results. This greatly improves the user's comprehension of the results.
- Don't over-complicate the page design for search results. People are used to Google; don't be afraid to let its design help to inform your own.
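Highlighting the user's terms can be sketched in a few lines of Python. This is one possible approach rather than a definitive one: the function name `highlight` and the choice of `<strong>` as the wrapping tag are assumptions, and it matches whole words only, so terms that begin or end with punctuation would need extra handling.

```python
import html
import re


def highlight(snippet, query):
    """Wrap each whole-word query term found in snippet in <strong> tags."""
    terms = [re.escape(t) for t in query.split()]
    if not terms:
        return html.escape(snippet)
    pattern = re.compile(r"\b(" + "|".join(terms) + r")\b", re.IGNORECASE)
    out, last = [], 0
    for m in pattern.finditer(snippet):
        # Escape the text between matches so the snippet is safe to embed in HTML.
        out.append(html.escape(snippet[last:m.start()]))
        out.append("<strong>" + html.escape(m.group(0)) + "</strong>")
        last = m.end()
    out.append(html.escape(snippet[last:]))
    return "".join(out)
```

The snippet is escaped piece by piece, so any markup characters in the page text are neutralised while the highlighting tags themselves stay intact.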
How do you know if you've made any real-world improvements to the search quality? Measure it.
- For each search, log which results the user clicks on, and whether it was the first, second or 43rd result. If you graph the lowest-ranked result clicked for every search, you should see it trend towards the top of the results as the algorithm is improved.
- Log whenever a search fails to produce any results, or fails to produce any clickthroughs. Individual cases can be looked at for possible new thesaurus entries — you'll learn about many common spelling mistakes this way.
- Log whenever the user goes to page two, particularly when they never clicked on any page-one results. This can be a sign that results aren't sufficiently relevant.
- What are people finding via Google versus your own search engine? Reporting this can often prove insightful about how users interact with your search, and what they expect it to be capable of.
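As a rough illustration of the logging described above, the following Python sketch summarises a batch of logged searches. The `SearchEvent` record, its field names and the default page size of 10 are all hypothetical; the point is only to show which numbers might be worth tracking over time.

```python
from dataclasses import dataclass, field


@dataclass
class SearchEvent:
    query: str
    result_count: int
    clicked_positions: list = field(default_factory=list)  # 1-based ranks clicked
    max_page_viewed: int = 1


def summarise(events, page_size=10):
    """Collect the failure cases and the average lowest-ranked click."""
    zero_results = [e.query for e in events if e.result_count == 0]
    no_clicks = [
        e.query for e in events
        if e.result_count > 0 and not e.clicked_positions
    ]
    # Went past page one without clicking anything that was on page one.
    skipped_page_one = [
        e.query for e in events
        if e.max_page_viewed >= 2
        and not any(p <= page_size for p in e.clicked_positions)
    ]
    # Lowest-ranked (largest position number) click per search.
    lowest = [max(e.clicked_positions) for e in events if e.clicked_positions]
    mean_lowest = sum(lowest) / len(lowest) if lowest else None
    return {
        "zero_results": zero_results,
        "no_clicks": no_clicks,
        "skipped_page_one": skipped_page_one,
        "mean_lowest_click": mean_lowest,
    }
```

Running `summarise` over each day's or week's events and plotting `mean_lowest_click` gives one way to see whether clicks are moving towards the top of the results. The `zero_results` and `no_clicks` lists are the individual cases to review for spelling mistakes and thesaurus candidates.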
In a future article I'll talk about some practical implementations for many of these ideas.