The Spire Project: Searching the Web


Webpages are often of unknown age and uncertain quality, yet they are potentially the easiest information to retrieve. There are many points of entry to web resources, but search tools differ, so try to match your search tool to your question. To start, you will need to learn something of the different tools, introduced below, and four basic search techniques: Boolean, proximity, field searches and truncation.

      Internet 

Global Search Engines
When searching for a topic with precise descriptive terms, use a large international search engine. Always place the Boolean + symbol before each search word (like this: +word1 +word2) to insist that all words appear in the results. Quotes keep words together as a phrase ("word1 word2"). These two simple steps dramatically improve results. Keep adding words and search limits until the number of matches is reasonable.
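The two steps above can be sketched in a few lines of Python. This is only an illustration of the query syntax described here: the function name build_query is hypothetical, and the code merely assembles a query string to paste into a search box. It does not contact any search engine.

```python
def build_query(required_words, phrases=()):
    """Assemble a query string using the + and quote conventions above."""
    # A leading + insists that the word appears in every result.
    parts = [f"+{word}" for word in required_words]
    # Quotes keep the words of a phrase together.
    parts += [f'"{phrase}"' for phrase in phrases]
    return " ".join(parts)


if __name__ == "__main__":
    print(build_query(["word1", "word2"]))
    print(build_query(["spire"], phrases=["search engine"]))
```

Adding further words or phrases to the lists narrows the search, which mirrors the advice to keep adding terms until the number of matches is reasonable.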
Altavista, among other tools, has a very large, fast search engine. It allows full Boolean (AND +, NOT -, OR |), proximity (" " for phrases and ~ for near, meaning within 10 words of each other), several fields (title:"Spire Project" domain:gov url:edu link:spireproject.com) and truncation/wildcard (*). Note that capitals matter with Altavista. Read more here.
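As a rough sketch, the field searches listed above can be composed programmatically. The field names (title, domain, url, link) come from the examples in this article; the helper names field_term and field_query are hypothetical, and the code only builds a string rather than talking to Altavista.

```python
def field_term(field, value):
    """Format one field search, quoting values that contain spaces."""
    if " " in value:
        value = f'"{value}"'
    return f"{field}:{value}"


def field_query(words=(), **fields):
    """Combine required words with field searches such as title: or domain:."""
    terms = [f"+{word}" for word in words]
    terms += [field_term(name, value) for name, value in fields.items()]
    return " ".join(terms)


if __name__ == "__main__":
    print(field_query(["research"], title="Spire Project", domain="gov"))
```

Because capitals matter with Altavista, the sketch deliberately leaves the case of each word untouched.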
All-the-Web is important because it is large, really large, with a flexible search facility. All-the-Web appears to have little overlap with Altavista. It allows partial Boolean (+ -), simple proximity (" ") and several fields: a title field (normal.title:spire), a URL field (url.all:.au), link text (normal.atext:spire) and link URL (link.all:spireproject.com). All-the-Web is not case sensitive. Read more here.
HotBot is a search engine reputed to have a particularly large spread. HotBot also allows searches by region, by date, and more.

MSN search might be powered by Inktomi. The search is fine but interprets search syntax vaguely. I can't find any field searches, and proximity appears to be unavailable. In this way it has much in common with other search portal sites. I include it here as it should be a distinct database.
For more global search engines, consider visiting the W3 Search Engines page at the University of Geneva. The Industry Research Desk also has a good search engines page as does this site by Paul Hopper and this page from Search Engine Watch.

Meta-Search Engines & Google
If you already know something of the destination, like a title, company name or full name, try using a search tool that excels in finding named websites. There should be little difficulty in finding such sites with either Google or a meta-search engine, but do not get carried away and use these tools on other occasions.
Debriefing is our meta-search engine of choice. Use this to find names and named websites. It accepts partial Boolean (+ -) and simple proximity (" "). Capitals matter.
Google is a new style of search engine which ranks sites with more care and concern. This works well for sites you know a little about in advance. It allows partial Boolean (+ -) and simple proximity (" "). Unfortunately, there is no truncation, not even for plurals! Read more here. Google now accepts link: and site: field searching, with mixed results.
Regional Search Engines
EuroSeek SearchUK Woyaa NZ Explorer Anzwers.com.au Canada.com Iquana Orientation Asia Catcha.com.my Arianna Ecila Dónde? Web DE Trovantor Yupi Goo Khoj
A powerful search technique is to use a regional search engine for questions with a geographical dimension. A purely Australian search engine like Aussie.com.au indexes only Australian websites and has more complete coverage of Australian issues.

Beyond the few here, SearchEngineColossus has a more definitive list.

If you come upon a Latin-language site, this tool by Systran may be useful.

Categorized Lists
When searching for information that lends itself to a particular category or topic, start with resources which group information in categories. With few exceptions, these resources index websites, not webpages. Keep your search words simple and search more than one directory, as each organizes information differently.
Yahoo is the largest of this type of directory tree; the definitive site. It accepts partial Boolean (+ -), simple proximity (" "), truncation (*) and several fields: t: (for titles), u: (for URLs) and a date field through a form. Read more here.
The Open Directory Project is a Netscape effort to, presumably, mute the strength of Yahoo. It is very good, and very similar to Yahoo.
Looksmart is another very significant directory tree.
For an alternative, try the World Wide Web Virtual Library: Subject Catalogue, a distributed network of subject lists, not nearly as dominant as Yahoo but far more "scholarly", shall we say. This virtual directory has been around for many years and was previously well known at www.w3.org.

(and or not accepted but must be typed in.)

Reviewed Sites
When seeking specific fields of study, or when topics are clouded with many similar low-quality sites, start with resources that receive a greater degree of personal attention. Peer review and vetting produce resources of higher quality but limited coverage, better suited to this situation. Also, keep your search words simple.
The Scout Report is one of the oldest and most highly regarded e-newsletters introducing internet resources. Residing at the University of Wisconsin, the Scout Report describes research, education & topical sites. The Scout Report Archive includes a quick search of previously featured sites:
BUBL is a British site which reviews internet resources, then indexes them by Dewey decimal number. I prefer their Dewey presentation, but the collection is not large (though it is the largest of the library projects). Here is their search:
The Argus Clearinghouse is a vast collection of internet guidebooks. We can search the titles & descriptions, then click on the highlighted keywords to find related guides. I suspect Argus is not successfully keeping pace with internet development.
AlphaSearch is similar to Argus. This one indexes important nexus sites and should be browsed.
Britannica.com (as in Encyclopedia Britannica) has been remolded as a free guide to books, periodicals, the web and their encyclopedia. This encyclopedia is perhaps the best. Search from their search page.
FAQs are compilations from newsgroups and can be searched at the Internet FAQ Consortium. An advanced search rests here. This is further covered in Discussion Groups.

WebRings list sites by topic. Each webring is maintained by a volunteer at an uninvolved site using standard software. The search here is of Webring.com (now a part of Yahoo). Bomis.com is the other major webring site.

Specialty Tools
Not all search engines are global, and the regional or topic-specific search tools are valuable in searching a more tightly focused collection.
Topic-Specific Search Engines. An emerging approach is to search for a search engine which only indexes webpages on a particular topic. Finding such a search engine is not simple as, frankly, not many exist yet. Most lists mix in searchable collections and online databases. I recommend two sites to review: firstly, Search Engine Guide (searchengineguide.com), and secondly, a more technical collection of specialized searchable collections, direct search (gwis2.circ.gwu.edu/~gprice/direct.htm). Ideally you will find a search engine like ChemGuide, covering over a million chemistry-related pages.
Altavista can be limited to specific domains (gov edu au) with their "domain:domainname" field search. "url:url-segment" is also useful. Read the Altavista Fancy Features for Typical Searches.
FirstGov and Google:Uncle Sam have replaced the now discontinued GovBot. Both provide searches just of US government webpages.
Altavista also allows for a field search by language. Searching for a Japanese site? Consider searching only webpages in Japanese.

      Commercial 

Commercial Databases
There are commercial resources applicable to the study of the internet. NetFirst, produced by OCLC, delivers bibliographic data on internet resources. Further descriptions can be found from FirstSearch. Further databases cover PC magazine articles.
Newsbytes (www.newsbytes.com) is a newswire devoted solely to computer topics: computers, telecom and the online world. Their website includes a search engine, though past news items will cost you. Here is a description.

      Conclusion 

  5 Second Summary:
Search the web with several tools in succession.  
No search will find everything.
Different tools suit different questions.
For many of us, searching the web is simply typing words into a search engine. This works, until it doesn't, and then we need something more.

Contrary to myth, global search engines are not the best place to start most of the time. Just some of the time. On other occasions, start with a directory, a meta-search engine, a guide, an FAQ... There is no simple search of everything. Specific tools excel at locating different types of webpages. This page is arranged to follow this insight.

There is more to effective internet research than selecting the right tools and invoking Boolean syntax. Firstly, information clumps. Information is not established in isolation but instead develops in context, is reinforced, and becomes a trend. The publishing motivation and promotion purpose help us both pre-select the most likely types of sites and rapidly judge the content of a website. The webpage address can tell us a great deal about both the website structure and the type of publisher. These topics are covered in greater detail in Section 31 of the Information Research FAQ.
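To illustrate how much a webpage address can reveal, here is a minimal Python sketch. The mapping of domain endings to publisher types and the function name describe_address are assumptions made for this example; treat them as rough hints, not rules. A path segment starting with ~ is a common convention for a personal directory on a shared server, which is again only a heuristic.

```python
from urllib.parse import urlparse

# Assumed, illustrative mapping of domain endings to likely publisher types.
PUBLISHER_HINTS = {
    "gov": "government",
    "edu": "educational institution",
    "org": "association or non-profit",
    "com": "commercial",
}


def describe_address(url):
    """Return rough clues about the publisher and site structure from a URL."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    suffix = host.split(".")[-1] if host else ""
    path_parts = [part for part in parsed.path.split("/") if part]
    return {
        "host": host,
        "publisher_hint": PUBLISHER_HINTS.get(suffix, "unknown"),
        # A leading ~ usually marks a personal directory (heuristic).
        "personal_page": any(part.startswith("~") for part in path_parts),
        # How deep the page sits within the website.
        "depth": len(path_parts),
    }


if __name__ == "__main__":
    print(describe_address("http://gwis2.circ.gwu.edu/~gprice/direct.htm"))
```

For the address above, the sketch reports an educational host, a personal page and a depth of two, which suggests an individual's collection hosted on a university server.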

A skilled searcher can swiftly segment and search the most promising areas of the web. If answers are not quickly found, there may be other more appropriate resources to consider. Perhaps ask an appropriate discussion group or review printed literature instead. The Web always remains just one resource among many.

If your primary interest is Search Engines, consider reading A Higher Signal - To - Noise Ratio: Effective Use Of Web Search Engines by Bob Bocher & Kay Ihlenfeldt, Sink or Swim: Internet Search Tools & Techniques by Ross Tyner, and the recent The Search is Over by Adam Page. Read also Searching the Internet, a publication in the Scout Toolkit, and browse Search Engine Watch.

      Strategy 

Searching the web is more a skill than most of us acknowledge. The web is a manifestation of the demon professional researchers work with all the time in the commercial information market. There is constantly the fear you have missed that single important site with everything. Consider the researcher's motto:

Someone, somewhere, probably knows the answer.

How long do we search for gems and where do we look? To decide, we must learn about internet structure and organization. Why is information published on the web? Why is it promoted? There is so much more than putting words into search engines.

#1 Motivation
We can make some very astute generalizations about a webpage very quickly if we can judge the reason it was published. Not only is this an important step in analyzing any information but this tells us a great deal about the contents of a webpage.

Yes, merely determining a site belongs to an association actually specifies the quality, motivation and type of information we will find.

Associations either publish what is termed 'brochureware' (promotional material), or if well advanced, present research work previously restricted to the association library: important research studies and the like.

Commercial interests have much more difficulty delivering useful resources. The importance of projecting a corporate image comes first (lots of 'brochureware'), and service descriptions come second. On occasion, commercial interests will support a worthwhile service tied closely to their own service - thus banks present interest rates and bookstores present their book stock.

The certainty with which we can make these judgments will astound you. Corporate websites never publish "changes to patent law". They simply don't have the motivation. Only an individual would publish this, most likely not on the web but through a mailing list.

Information is not distributed randomly. Consider Format, Preparation, Motivation and Promotion. Consider this, then Visualize the information you seek.
#2 Promotion
We can make further snap judgments about web information from the way we get there. Promotion is very difficult on the web, and it is hard to find poorly promoted information. The tools we use to reach information pre-determine the type and quality of information we will find.

Search engines index webpages indiscriminately. Advertised websites must have a pay-off. Directories focus on established websites (not webpages). Link pages also link to established websites but put more thought into the selection of resources. Both usually focus on general sites. For specific or current resources, we need to move to mailing lists or active nexus points.

Yes, when we find a webpage through the Scout Report (a prominent resource discovery newsletter), we can assume the webpage has a high quality of information, is reasonably current and has a general appeal (within the interest of the newsletter readers).

Let us now put our car in reverse. If we are looking for a recent document by a prominent library committee, we will not find it through Altavista, Yahoo, or normal link pages (except accidentally). We will find it through specialist newsletters, active nexus points and library mailing lists.

#3 Visualize
This discussion continues in Chapter 3 of the Information Research FAQ.
When an artist begins to paint, they visualize the image. They already have a concept of the finished result. Internet research is no different. We start by building a vision of the information we seek. Who would publish it? What is their motivation? Who would promote it? How will I find it?

Information Clumps. Information is created, nurtured, develops, gets transplanted, gets arranged and becomes visible through a process which brings similar information together. Your understanding of this process, including motivation and promotion, must guide your search. Only then will we know where to look, and quickly know if the answers lie on the web.

The Spire Project - better ways to find information.
Like this? You should attend our public seminar and receive our bi-monthly update notice.
SpireProject.com | SpireProject.co.uk | Project Background | Feedback. Copyright © David Novak 2002.