Research Commentary on the Spire Project

Information Pollution
By David Novak

My old university has publicised the launch of a search engine called YourAmigo, which promises to index the web more deeply than ever before. At first glance, it sounds impressive: more webpages indexed, more depth, better search ranking and relevancy. The trouble is, we are ignoring the biggest trend to hit the web since its inception: the inevitable problem of information pollution.

If you find it difficult to search the Internet today, a year from now it will be much worse.

The web has several undeniable qualities we will live to regret. It will shortly bury us in billions of webpages on every topic imaginable. Dreams of organising the web will falter in the face of this mudslide.

Publishing online is not so difficult, and it is certainly getting easier. A far more important trend, however, is the growing number of Internet users able to publish a webpage. Once new Internet users become familiar with the Internet, they can learn how to add HTML to a document and publish a webpage for others to see. I would estimate the time lag at about 18 months to two years, which means that just as growth in the number of Internet users slows from its exponential rate, we will enter a phase of exponential growth in the number of people capable of publishing.

Secondly, many people previously separated by geographical distance come online and begin to publish in a shared environment where their expertise overlaps, often for the first time. But there can only be so many experts in each field.

Thirdly, as more information floods out of previously closed systems, such as association libraries, into the public domain, the true extent of the overlap will become apparent. Every state and national government in the world will have dealt with the issue of childcare, so how valuable is each additional document on the subject when it is published online? How valuable is the 53rd childcare study by a state government?

We see early evidence of the growth of the web in several ways. Firstly, the search engines are slowly losing the struggle to index the world. Two years ago, search engines were brilliant, routinely indexing up to 40% of the web. This year, even with much larger search engines, estimates hover around 20%. Next year this figure will surely drop below 10%. I don't judge search engines by these crudely guessed percentages, but they do underscore the trend.

This leads us to further trends that will help bury us in webpages. Firstly, digital information has no natural attrition rate the way print does. Newspapers get recycled. Books go out of print. On the web, information tends to stay.

The cruellest factor is the multiplication of overlapping content. There is a seemingly constant demand for more websites on every topic. There are never enough websites; there is always room for more.

This phenomenon is the gift of the web. Anyone can publish at minimal cost and potentially reach everyone. Make no mistake: with this gift come billions of webpages, dreams of fame and fortune, and, given time, absolute information pollution.

There are ways around this dilemma. The first is for all of us to take a crash course in Internet navigation. If we can locate the best information, then to a degree the better sites will get our attention and the worst sites will be ignored.

The technology to evade the worst effects of information pollution has always been in place. Boolean operators, proximity searching, truncation and other search techniques and tactics have long existed, but they require a little more study than most of us will want to undertake.

Alternatively, a truly effective way to navigate the web could emerge, one that somehow brings sites of merit to the audiences who want them. Nothing I have seen to date promises this, but it is a persistent hope. So far, every approach has either been overwhelmed by the sheer volume of the Internet or faltered before its rapid pace of change.

Most likely we will take shortcuts that eliminate a vast segment of the Internet from our attention. Appropriate or not, individual webpages, new websites and unpromoted websites will be shuffled to the side, out of the light. The logic goes that anything without a dozen websites pointing its way is probably not up to scratch. Some search engines, such as Google and AltaVista's Raging, use link analysis to determine site ranking, effectively placing this kind of filter over which webpages get re-indexed.

It would be unfortunate if this filtering took hold. Getting attention is already slowly taking precedence over content. Attention is a slippery slope down which webpages slide into the background noise.

My old university is launching a new search engine promising vast new coverage of the web. It is a noble idea, a worthy attempt to hold on to the past. It may even succeed, though not for these reasons. In a mere 12 months the Internet will grow to tens of billions of webpages. Searching the complete web will become impractical, if not impossible. The ideal of searching all of the web will pass into history.

You see, where information was once constrained by geography, social circles and publishing costs, we now have a single, unified, global information space.

David Novak manages The Spire Project, an Internet research resource and thinktank.

