
No 2, September 2008
Welcome to the 2nd edition of the Funnelback bi-annual newsletter. A great deal has happened here at Funnelback since our last newsletter.
Funnelback Version 8 was successfully released in June and Dr David Hawking has joined us in a full-time capacity as our Chief Scientist. David has had a distinguished career at CSIRO and is internationally recognized as a research leader in information retrieval. We are indeed fortunate to have David join our team full time.
We have also moved offices. Our new contact details are as follows:
401 b Clunies Ross St
Black Mountain, ACT, 2601
Ph: 1300 65 58 52
Fax: 1300 65 68 59
We hope you enjoy our second newsletter. All the best,
Stuart Beil, General Manager


Promoting diversity in search results
Author: Dr. David Hawking, Funnelback's Chief Scientist
In 1960, Maron and Kuhns proposed that search results from an information retrieval system should be ranked according to their probability of relevance to the searcher's request. This has become known as the Probability Ranking Principle (PRP) and it was the starting point for the theoretical models which underpin many of today's successful search engines.
Unfortunately, PRP falls down in two ways. The first is when the same request is submitted by multiple users with different criteria. Say, for example, that 80% of searchers submitting the query 'java' are interested in computer programming, 15% are interested in good coffee and the remaining 5% have the quaint notion that Java is an island! In this example, the PRP suggests that the first page of search results should be filled entirely with resources related to the programming language, giving a probability of 80% that a searcher is very happy, but leaving 20% of the searchers totally unsatisfied.
The second failing arises from the fact that what is useful to a searcher depends upon what they have already seen. A document, which would be highly useful if presented by itself, may be rendered redundant by higher ranked documents which provide the same information. In 1998, Carbonell and Goldstein proposed that the document presented at rank n+1 should be the one which provides the maximum marginal relevance, or MMR, relative to the previous n.
Unfortunately, query responsiveness may be slowed by MMR due to the need to assess at each rank the similarity of candidate documents against the documents already presented.
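The greedy selection behind MMR can be sketched as follows. This is only an illustration and not Funnelback's implementation: the function name mmr_rerank, the trade-off weight lam, the document names and the toy relevance and similarity scores are all assumptions made for the example. The inner loop over the documents already selected is where the extra cost at each rank comes from.

```python
from typing import Callable, Dict, List


def mmr_rerank(
    relevance: Dict[str, float],
    similarity: Callable[[str, str], float],
    k: int,
    lam: float = 0.5,
) -> List[str]:
    """Greedy MMR: at each rank, pick the candidate with the best trade-off
    between relevance to the query and similarity to documents already chosen."""
    selected: List[str] = []
    candidates = set(relevance)
    while candidates and len(selected) < k:

        def marginal(doc: str) -> float:
            # Redundancy is the highest similarity to any document already shown.
            redundancy = max((similarity(doc, s) for s in selected), default=0.0)
            return lam * relevance[doc] - (1.0 - lam) * redundancy

        # Sorting first makes tie-breaking deterministic.
        best = max(sorted(candidates), key=marginal)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical scores for the 'java' query; the middle part of each name is the topic.
toy_relevance = {
    "java_lang_1": 0.9,
    "java_lang_2": 0.85,
    "java_coffee": 0.6,
    "java_island": 0.5,
}


def toy_similarity(a: str, b: str) -> float:
    """Assume documents on the same topic are near-duplicates of each other."""
    return 0.9 if a.split("_")[1] == b.split("_")[1] else 0.1


top3 = mmr_rerank(toy_relevance, toy_similarity, k=3)
print(top3)
```

With these toy numbers, ranking by relevance alone would put both programming documents first, while the MMR ordering brings the coffee and island interpretations into the top three.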
In a talk I gave recently at the SIGIR conference in Singapore (view slides), I argued the need to promote diversity in search results, and canvassed various dimensions of diversity and ways to achieve it.
Funnelback provides several capabilities which address the problems described above:
1. Source diversity
Funnelback web site search promotes a diversity of sites in its search results by imposing a boredom penalty on the scores of subsequent results from sites already represented in the ranking. So, for example, the second result from a site may be downgraded to 75% of its original score, and the third result to 67%. In the screenshot shown here, two sets of results for the same query are shown, the left with suppression and the right without.
In my opinion, the left-hand set is more useful because it provides relevant results from ten different government sites, while the right-hand set shows 8 results from budget.gov.au -- the homepage of the site plus 7 subsidiary results which are prominently linked to from that homepage. Do you agree?
The degree of penalty imposed by Funnelback's "same site suppression" mechanism is configurable as is the definition of a 'site'. Each host (such as shop.abc.net.au) is considered a site, but subdirectories such as www.anu.edu.au/physics and www.anu.edu.au/physics/optics can be configured to be treated separately.
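The effect of such a penalty can be sketched in a few lines. This is a rough illustration under stated assumptions, not Funnelback's actual algorithm: the 75% and 67% multipliers are taken from the example above, while the function names, the reuse of the last multiplier for fourth and later results, and the example.org URLs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple
from urllib.parse import urlparse

# Multipliers for the 1st, 2nd and 3rd result from the same site. The 0.75 and
# 0.67 values come from the example in the text; reusing the last value for
# any later results is an assumption of this sketch.
PENALTIES: Sequence[float] = (1.0, 0.75, 0.67)


def site_of(url: str) -> str:
    """Treat each host as a site (the default definition described above)."""
    return urlparse(url).netloc


def suppress_same_site(results: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Down-weight repeated results from one site, then re-sort by score."""
    seen: Dict[str, int] = defaultdict(int)
    adjusted = []
    # Walk the results best-first so the strongest page of a site is unpenalised.
    for url, score in sorted(results, key=lambda r: r[1], reverse=True):
        site = site_of(url)
        factor = PENALTIES[min(seen[site], len(PENALTIES) - 1)]
        adjusted.append((url, score * factor))
        seen[site] += 1
    return sorted(adjusted, key=lambda r: r[1], reverse=True)


# Hypothetical results: two pages from one host and one page from another.
raw_results = [
    ("http://budget.example.org/", 1.0),
    ("http://budget.example.org/overview", 0.9),
    ("http://treasury.example.org/budget", 0.8),
]
ranked_urls = [url for url, _ in suppress_same_site(raw_results)]
print(ranked_urls)
```

Here the second page from budget.example.org drops to 0.9 x 0.75 = 0.675, so the page from the other host moves above it.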
2. Profiles
The need for diversity can be reduced if we know something about the person conducting the search or the reason they are conducting it.
Funnelback can be configured to define profiles such as "coffee connoisseur", "geographer" or "Indonesian", and to modify queries, promote sites and promote types of documents to suit the applicable profile. Provided the right profile was active, all of the people submitting the query 'Java' could potentially be made happy.
3. Exposing and giving access to diversity within a full results set
Funnelback provides two facilities by which the diversity present in a deep results set (not just the top-10) can be presented to the user in a way which allows them to easily choose the interpretation or source they want.
The first facility is Contextual Navigation, which mines deep search results for repeated text elements (such as phrases) related to the query and provides links which activate more specific queries. For example, after submitting the query 'jaguar', a searcher might be given links such as 'jaguar enclosure', 'wild jaguars', '10.2 Jaguar', 'Jaguar owners club', 'E-type Jaguar' and so on.
You can see Contextual Navigation in operation at www.abc.net.au/news and www.australia.gov.au
The second facility is faceting, in which key characteristics of documents and objects represented in metadata are exposed through counts.
When searching a collection of email for 'funnelback', the faceting facility might report 17 messages from 'sbeil', 10 from 'apritchard', and 19 from 'support'. It might also show 1 message in 2006, 12 in 2007 and 84 in 2008. If you clicked on '2007', the result counts for the items in the 'from' facet would be reduced and you might now see additional facets such as 'subject' and 'to'.
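The counting behind a facet display can be sketched as below. This is an illustrative sketch only: the tiny message list, the field names and the helper functions facet_counts and refine are made up for the example and do not reflect how Funnelback stores or counts metadata.

```python
from collections import Counter
from typing import Dict, Iterable, List

Message = Dict[str, str]


def facet_counts(messages: Iterable[Message], field: str) -> Counter:
    """Count how many messages carry each value of a metadata field."""
    return Counter(m[field] for m in messages if field in m)


def refine(messages: Iterable[Message], field: str, value: str) -> List[Message]:
    """Keep only the messages whose field has the chosen facet value."""
    return [m for m in messages if m.get(field) == value]


# Hypothetical metadata for five email messages matching a query.
messages = [
    {"from": "sbeil", "year": "2007"},
    {"from": "sbeil", "year": "2008"},
    {"from": "support", "year": "2008"},
    {"from": "apritchard", "year": "2007"},
    {"from": "support", "year": "2008"},
]

all_from = facet_counts(messages, "from")
by_year = facet_counts(messages, "year")
# Clicking on '2007' narrows the result set, so the 'from' counts shrink.
from_in_2007 = facet_counts(refine(messages, "year", "2007"), "from")
print(all_from, by_year, from_in_2007)
```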
This facility has obvious applicability in e-commerce sites, libraries, archives and wine clubs.
You can see faceting in action at CareerOne and Telstra's website www.nowwearetalking.com.au


Using Funnelback to search the J V Barry Library on the AIC website
Author: Ben Miskin and Peter Levan, Australian Institute of Criminology
The problem
The Australian Institute of Criminology (AIC) is Australia's leading national research and knowledge centre on crime and justice. Within the AIC is the J V Barry Library, which holds Australia’s most comprehensive collection of criminology and criminal justice resources. This collection comprises approximately 25,000 books and 1,440 serial titles, including journals, magazines, annual reports and other report series.
Providing web access to search this collection has always been problematic. The traditional method is via a web-based version of the Library’s internal on-line public access catalogue (OPAC), but this solution never worked as well as we had hoped.
The OPAC on the public website was driven by a secondary database which was updated daily using custom scripting. Synchronising changes between the two installations was sometimes tricky, and often required a considerable amount of liaison with the vendor. Exporting the relevant data from the main database to the database on the public web server was complicated by differences in the database management systems used. As a result we had ongoing problems and were keen to try a different approach.
The solution
We had been toying with the idea of using Funnelback to replace the OPAC’s search functionality, but it was only when the database schema was updated at the end of 2006, breaking the update process once again, that we got serious. At the same time the public web server was due for replacement and a decision was taken not to reinstall the OPAC onto it. We decided that, as we needed to upgrade and tune the AIC’s installation of Panoptic 5.5, it was time to revisit the idea of replacing the public OPAC with Funnelback.
Funnelback to the rescue
We had some concerns about moving to Funnelback: the loss of functionality, questions about whether the database (~95,000 records) could be exported and indexed effectively and how long this would take, and whether Funnelback could emulate the fielded search provided by the OPAC. However, we could also see some distinct advantages. These included faster searching and return of results (searches were taking 20-30 seconds on average), the addition of relevance-based searching, the removal of a lot of complexity from the system, as there would be only one database and OPAC to maintain, and, most importantly, the ability to federate the search with the AIC website search.
Funnelback sent Katherina Ng to take on the project, and the development process began. The first problem was determining how to get Funnelback to talk to our library database. After exploring various options it was decided that direct indexing would be a preferred solution if we could solve the problems mentioned above. Funnelback consulted with the vendor of our library system and a database view was created that included all the relevant information required for the external OPAC. Funnelback was then configured to index this view as a database collection; we provided them with direction on what we wanted the search forms and output to look like, and they did the rest.
Once we got to a point where Funnelback would search the data and present it, the next problem was how to tune the query processor to provide relevance searching.
The cues for weighting the data that are available on regular web pages do not exist when searching a database collection, and there are very few fields that can be used to weight the data. The following fields were perceived as the only candidates for weighting: Title, Subjects, Publication Date, Abstract, and Author, and we began tuning these. This was a time-consuming process, but a necessary one. The final result had the date field upweighted slightly from default, with the title field as the most important, followed by subject and record content. URL length weighting was discarded as it has no meaning, and stemming was also found to improve the search results.
Funnelback’s ‘Contextual Navigation’ was suggested as a feature that would improve search functionality for users, and once tested we quickly implemented it. The navigational structure of search results meant the user was presented with similar or related searches, as well as opening up new search paths. This refinement of search functionality also increased usability.
The result
The integration of Funnelback’s search functionality to provide access to the J V Barry Library’s holdings has been a big success. There has been a huge improvement in speed, with the database reliably updating overnight and search results returned at a speed previously unseen. Relevance tuning brings back good results, and the navigational structure provided through Contextual Navigation has proved to be a big hit.
Thanks to Funnelback’s adaptability, willingness to try new things, and professional staff, the solution they provided met our needs, and as a result the search functionality of the Library’s collection is now stable, reliable, speedy and adaptable, helping us to improve access to Australia’s most comprehensive collection of criminology and criminal justice resources.
The new catalogue can be viewed at:
http://search.aic.gov.au/search/search.cgi?collection=first


Our Products and Services department have recently been conducting product surveys with several of our clients to gather feedback on how the product is being used, what version of the product clients are using and improvements or suggestions for future versions.
In general responses were extremely positive. Some of the suggested improvements include:
Additional query reports
Reports on how people interact with Contextual Navigation results
More influence over result ranking
Better detection of navigation boilerplate and near duplicates (improved result summaries)
Improved featured pages (best bets)
Reports on query spikes (rapidly trending queries)
More DB-like querying functionality
Funnelback would like to thank all of our customers who participated. We greatly appreciate any feedback we receive and try to incorporate any suggestions / improvements in future releases of the product. Our Research and Development team are currently working on improving some of our reporting features to include in the next release.
What our clients had to say:
"Overall, we are very happy with the Funnelback product"
Peter Coppola, Web Content Services Manager
Australian Catholic University
"Extremely satisfied"
Peter Levan, Web Manager,
Australian Institute of Criminology
"Very satisfied"
Brett Sergeant, Web Team Leader, Information Management
Department of Infrastructure, Transport, Regional Development and Local Government
"It's very robust ...I like that Funnelback just works unlike some previous products. I don't have to really worry about it. ...Keep up the great work!"
Leon Wild, Web Manager, Public Affairs
Australian Human Rights Commission
"Very satisfied, easy to use"
Justin Bryce, Search Engine Marketing & Measurement - Website & Online Sales
Westpac
Take the survey
If you're currently using the Funnelback product and would like to complete our survey please click here. Funnelback thanks you in advance.
30 Second Survey
Funnelback is interested in the Content Management Systems our clients are currently using. If you’re interested in letting us know click here.


Funnelback Search Analytics
Author: Francis Crimmins, Manager, Research and Development, Funnelback
Funnelback provides site administrators and publishers with a set of reporting tools designed to help them analyse how end users are interacting with their Funnelback search service. This can be used to improve your site content and search service to help give the best possible experience for visitors to your site.
The Funnelback Reports system is part of your Funnelback administration interface. The Query Report provides a monthly summary of search activity, showing how many queries and clicks on results occurred during each month. Most links in the report are clickable, allowing you to "drill down" to get more detailed information. The table below shows which reports will help you answer some key questions about user search behaviour:
| Question | Report to look at |
| What are the most popular keywords used by visitors to your site? | Top overall query terms |
| What are the most popular results that people click on? | Top overall clicks |
| What queries return no results? | Query terms returning no matching results |
Table 1. Questions and Reports
By drilling down by month or selecting a specific date range you can see how searches change over time. Some searches will be perennially popular, others will be more seasonal. For example, administrators for a University site may notice that "timetable" is always popular, while "exam results" will trend upwards when exam results are due to be announced. The reports can be exported in CSV format for loading into Excel for further analysis.
Looking at what results users click on will help you decide if the search engine is returning useful content for specific queries. You may want to make sure that the content on these popular result pages is kept as up-to-date as possible.
The question "What queries return no results?" is probably one of the most important questions to get an answer for. The following table shows some suggestions for acting on this report:
| Query type in "No matching results" report | Suggested Action |
| Correctly spelled query | Add a featured page to ensure relevant content is displayed to the searcher and/or create a query expansion. If required, create some new content on your site or intranet. |
| Incorrectly spelled query | Run the query yourself and check if the search engine returns an appropriate "Did you mean ...?" spelling suggestion. If not, create a featured page or query expansion. |
Table 2. Actions for "No Matching Results"
Funnelback's "Featured Page" mechanism allows you to specify that certain results should be displayed whenever a set of trigger words is present in the query. For example, a University search engine administrator might specify the following featured page record:
Admissions -> http://www.uni.edu.au/enrolments/
which specifies that if a query contains the word "Admissions" then always display the given URL at the top of the results page.
The Funnelback "Query Expansion" feature allows you to specify that certain queries should be expanded to a different form. If the query term "enrolments" always returns the key Enrolments site or sub-site in the standard search engine results then you may decide you do not need a featured page for that query. However, if visitors to your site use the word "admissions" instead of "enrolments" then you might create a query expansion like:
admissions -> [admissions enrolments]
which specifies that when the query "admissions" is submitted it should be expanded to match resources that contain "admissions" or "enrolments".
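The behaviour of the two records above can be sketched as follows. This is a simplified model for illustration, not Funnelback's configuration format or matching logic: the dictionary layout, the function names, the case-insensitive matching and the choice to expand each query word separately are all assumptions.

```python
from typing import Dict, List

# Trigger word -> URL to pin at the top of the results (from the example above).
FEATURED_PAGES: Dict[str, str] = {
    "admissions": "http://www.uni.edu.au/enrolments/",
}

# Query word -> alternative words, any one of which may match a resource.
QUERY_EXPANSIONS: Dict[str, List[str]] = {
    "admissions": ["admissions", "enrolments"],
}


def featured_urls(query: str) -> List[str]:
    """Return the URLs to display first when a trigger word is in the query."""
    words = query.lower().split()
    return [url for trigger, url in FEATURED_PAGES.items() if trigger in words]


def expand_query(query: str) -> List[List[str]]:
    """Replace each query word with its list of acceptable alternatives."""
    return [QUERY_EXPANSIONS.get(word, [word]) for word in query.lower().split()]


print(featured_urls("Admissions deadline"))
print(expand_query("admissions"))
```

A query with no trigger word, such as "timetable", gets no featured page and is left unexpanded.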
By using a combination of:
Modifying your existing content (including page titles and headings)
Featured Pages (AKA "Best Bets")
Query Expansions
you can have a better match between a user's mental model of your organisation and the language and structure that you publish with.
Many users who come to your external web site (or employees to your intranet) will use the search box straight away, without relying on your existing site navigation structure. Using search analytics as described above will help ensure that these users find what they're looking for and thus improve their experience when using your site.
Funnelback continues to improve our analytics capabilities and these improvements will make their way into future versions of our software. We would be very interested in suggestions from our users on what improvements they would like to see in Funnelback's reporting capabilities.


Adding metadata to PDFs.
A great way to improve the accessibility of your PDFs and get the most out of Funnelback is to fill in the metadata fields of your document. Before converting your document to a PDF, open its document properties (in Microsoft Word, this is under the File menu) and fill in the Summary fields. Your conversion software will automatically pick up this information and include it in the resulting PDF. If you've already converted your document to a PDF and you have PDF editing software such as Adobe Acrobat, just open the file and edit the metadata from there.
This simple measure will allow you to control the title, description, author and a range of other fields that are displayed in the search results, as well as making your document easier to find.
URL Scopes.
You can filter search results by document URLs using the 'scope' parameter.
For example:
http://servername/search/search.cgi?query=drought&collection=mycollection&scope=news
This will search for the term 'drought' in the 'mycollection' collection. However, search results will be restricted to web pages with 'news' in the URL.
For example http://www.url.com/news/ or http://www.url2.com/section/news.html.
More information on the scope parameter is available from http://docs.funnelback.com/8.0/url_scope.html
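If you build search links from a script, the scoped URL can be assembled as in the sketch below. The query, collection and scope parameters and the search.cgi path come from the example above; the helper names search_url and in_scope are hypothetical, and the plain substring check is only an approximation of the restriction described here, so consult the documentation for the exact matching rules.

```python
from urllib.parse import urlencode


def search_url(server: str, query: str, collection: str, scope: str = "") -> str:
    """Build a search URL, adding the scope parameter only when one is given."""
    params = {"query": query, "collection": collection}
    if scope:
        params["scope"] = scope
    return f"http://{server}/search/search.cgi?{urlencode(params)}"


def in_scope(url: str, scope: str) -> bool:
    """Approximate the restriction as 'the scope text appears in the URL'."""
    return scope in url


scoped = search_url("servername", "drought", "mycollection", scope="news")
print(scoped)
print(in_scope("http://www.url2.com/section/news.html", "news"))
```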
XML interface.
Did you know that Funnelback has an XML search results interface? This can be retrieved by using the following search URL instead:
http://server/search/xml.cgi?query=test&collection=mycollection
Most Funnelback URL parameters are the same for the XML interface.
If you have any Funnelback tips or tricks you'd like to share, please email us with your suggestions: media@funnelback.com

We hope you've enjoyed our 2nd newsletter.
If you'd like to suggest how we can improve the Funnelback product, or you're interested in a search tuning and upgrade, please call Ph: 1300 65 58 52 or email: media@funnelback.com. We're always happy to hear from you.
Sincerely, The Funnelback Team