Research Commentary on the Spire Project

The Coming Confusion.
By David Novak

As the web grows larger and the tools we use begin to fail, there comes a need for more advanced internet research expertise. This is a growth industry.

I just finished a delightful article that I think will be published in the library periodical ONLINE. It was a piece tracking the evolution of research techniques, showing that internet research is in for a shake-up. I want to explore one of the themes that emerged in a little more detail.

The internet is growing. As it grows, the percentage of the web indexed by the giant global search engines falls. That is, coverage, and with it the opportunity to search directly for information, is shrinking.

Search engines have been dealing with this dilemma for several years, in several ways.

1) Search engine databases have grown in size. Google now holds just under 2.5 billion records.

2) Popularity has been co-opted as a measure of value. This popularity extends to the process of selecting pages to index. Pages that are frequently linked are more likely to be indexed.

3) I've heard that search engines' selection criteria are being refined to place more emphasis on resources with other hallmarks of value - like inbound links from a selection of respected websites.

4) I think I detect a growing emphasis on recently crafted webpages. Search engines drop, or index less frequently, the pages that don't change.

5) I also detect a wider distribution, with fewer pages taken from more domains. This thins the coverage of any one site but casts the net wider.

I am sure there are other clever approaches to extending an index. But these approaches are not sufficient in themselves to dispel the effect of growth. There is a tremendous amount of information in the world, emerging from different publishers, geographical regions, and purposes. I like to draw attention to how the number of people capable of publishing webpages is growing exponentially and will continue to do so for a few years yet. I have another article, "10 Billion or more", that explores this concept.

There will be a great many webpages on the internet. If we accept this, we can consider how it affects the way we find information now.

A year ago I pushed the need to use punctuation when you search the internet. Quotes are particularly important but field searching and Boolean are also significant. We are in a transition. Future information research will be more challenging. I can taste it in some of my searches now.

Take a search for personal details. I wanted a phone number for a lecturer I met a year ago who works at the University of Technology, Sydney (UTS). Google did not list it. A search for her name in quotes did dig up a single page about her work, but the page did not have any contact details. A visit to the UTS website seemed the next obvious step. We'll come back to why it was obvious. On the website there was a search function that defaulted to an Inktomi search restricted to the UTS website. This is clever, since I recall Inktomi does a deep search of Australia (or at least once did), so it may have indexed more UTS pages than Google. Again this search came up empty. But wait… Wouldn't UTS have a staff directory? Yes it does. Her phone number was listed there.

A similar situation emerged in a background search about someone reserving a seat at one of my seminars. I like to know something about who attends. In this case, her name was too common to search directly. I did have her email address though, so off I went to find the organization responsible for that address (hoping it was not an ISP). I could not visit the domain directly, so I found a way to phrase a Google field search (a site search) that included the important elements of her email address. This alerted me to the organization holding the domain - a government agency. A Google search for this name and the name of the attendee led to the details I was seeking. On this occasion, the page was indexed by Google but could not be found easily. This roundabout route succeeded where a direct search faltered.
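For readers who like to see the mechanics, here is a minimal sketch of that email-to-domain step. It assumes the search was phrased with Google's site: operator. The function names and the example address are hypothetical, made up for illustration, and are not the details of the search described above.

```python
from urllib.parse import urlencode


def domain_from_email(address: str) -> str:
    """Return the lowercased domain part of an email address."""
    local, sep, domain = address.strip().rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    return domain.lower()


def site_query(domain: str, phrase: str = "") -> str:
    """Build a field search restricted to one domain.

    An optional phrase is wrapped in quotes so it is matched exactly.
    """
    parts = [f"site:{domain}"]
    if phrase:
        parts.append(f'"{phrase}"')
    return " ".join(parts)


def search_url(query: str) -> str:
    """Turn a query string into a Google search URL."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    # Hypothetical address, for illustration only.
    domain = domain_from_email("jane.citizen@example.gov.au")
    print(site_query(domain))                  # step 1: who holds the domain?
    print(site_query(domain, "Jane Citizen"))  # step 2: narrow to the person
    print(search_url(site_query(domain)))
```

The first query simply lists whatever the search engine has indexed under the domain, which is usually enough to reveal the organization behind it. The second adds the quoted name once the organization is known.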

One more example: let me find another person who runs seminars on internet research. A blunt search, by which I mean throwing several words at a search engine, presents far too many online resources that are not relevant to the business of delivering seminars. The results are "muddy", as I would say. The use of quotes around "internet research" is useful but "internet research seminar" is too restrictive. What to do?

Where could they be? does not list others. I have stumbled across many training efforts as I browse library association websites. A note to the Buslib-l mailing list alerted me to a mini-conference, which was most helpful. I still seek more, but I have yet to find the middle ground that would help to resolve this search.

Middle ground. That is what these searches are about. And that is why I mentioned earlier that a visit to the UTS website was the next obvious step. Whereas last year it probably sufficed to teach the use of quotes and other punctuation, now we need to ask people to think about middle ground, nexus points, and existing structures that might help in their search.

The impetus behind the need for middle ground is the continued growth of the internet. This growth dissipates some of the delightful traits of Google and popularity ranking in general. We are leaving the period when a direct search was often sufficient. The future is certain to require more. We cannot avert this destiny.

For the lay public still learning to use quotes, this will be a difficult shift. For the segment of the population familiar with either internet research or library research, the changes are not so very cumbersome. On a superficial level these are the same skills used in library research. The specifics depend on insights into search tool bias and URL interpretation, knowledge of the invisible web and other internet structures, an understanding of the publishing process … and the list goes on.

Internet research is going to change dramatically. I look forward to this change. It will be exciting and will improve the productivity of those who understand the turbulence emerging from the mixing of research and the internet. Whereas now these skills may be helpful, they will soon be critical.

* * *

