From: firstname.lastname@example.org (David Novak)
Newsgroups: alt.internet.research,sci.research,alt.answers,sci.answers,news.answers
Subject: Information Research FAQ v.4.7 (Part 1/6)
Followup-To: poster
Approved: email@example.com
Summary: Information Research FAQ: Resources, Tools & Training
Archive-name: internet/info-research-faq/part1
Posting-Frequency: monthly
Last-modified: April 2002
URL: http://spireproject.com
Copyright: (c) 2001 David Novak
Maintainer: David Novak
Many of us unwittingly digest great amounts of information in the course of a day. Our information needs are more modest and usually repetitive. When we have questions, we reach for a small collection of preferred information sources close at hand with a collection of assessments as to what is credible and trusted.
For a child, these sources include the school library, an encyclopedia and parents. All the sources are trusted.
For an adult, these sources include the state library, the newspaper, bookstores and current magazines. Adults understand truth has become a little more relative, but when the evening news declares presidential hopeful George W Bush is ahead by 3% (on a sample of 707) we slip into thinking he is leading.
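To see why a 3% lead on a sample of 707 deserves caution, here is a minimal sketch of the usual margin-of-error arithmetic. It assumes a simple random sample, a 95% confidence level (z = 1.96) and the worst-case share of 50%; none of these details are given in the news report, so treat them as illustrative assumptions.

```python
import math

def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate margin of error for one candidate's share.

    Assumes a simple random sample; z=1.96 corresponds to 95% confidence
    and proportion=0.5 is the worst case (both are assumptions).
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

moe = margin_of_error(707)
print(f"Margin of error for n=707: +/- {moe * 100:.1f} points")
```

Under these assumptions the margin on a single candidate's share is about 3.7 points, already larger than the reported 3-point lead, and the uncertainty on the gap between two candidates is larger still.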
There is more to information literacy. It is, after all, a profession. There are tools you know nothing about and techniques you have never heard of. There is a specialized vocabulary just made to confuse you. Research, or rather information research (to distinguish it from lab-coat style research) is so very much more involved.
Yet there is great simplicity to research too. Just under the murky mist of confusing resources rests a solid platform to stand on. In any one field there are just a handful of databases, directories and periodicals to consider. After decades of library and information industry evolution, clearly valuable sources have already floated to the top, monopolizing their respective fields. Most cities have just one or two primary newspapers. Large industries like book publishing have few book databases and a handful of primary book distributors.
Enter the internet: not so much a change of information as a revolution in access to information. Previously you could justify having just a handful of preferred information sources because these were the sources easily available. Today, and in the future, information is everywhere close at hand. We are dropped into a morass of competing information just waiting to capture our attention, straining both our capacity to absorb information and our capacity to understand the differences between sources.
A great segment of our community will fall back to tried and true information sources they grew up with: state library, bookstore, local newspaper. The better alternative sources will be ignored for no particular reason. The rush of the information revolution will push past them. They will only hear of changes when their information needs suddenly change - and they are confronted with a vast collection of unfamiliar options, and struggle with understanding what sources they need.
A smaller segment of our community, by virtue of frequently tackling questions best answered with unfamiliar sources, will be driven to understand the information world: to become truly information literate.
There is another story here too. The way our society handles information is undergoing some very fascinating changes. Any predictions for the future should acknowledge the tension and flow of information in our society. Take, for example, the vast surplus of information emerging on the internet, and the convulsions of the commercial information industry in response. Rather than focusing on how information is organized, we can also focus on how information becomes organized. The who, where and why of information, the sociological perspective, adds meaning to the phrase "information revolution".
It was another warm day. The young Egyptian boy strode purposefully out the gate towards the river. The Nile was low this time of year. Very abundant with fish and bird life. With luck, Shakh would return at sunset with food for the pantry. Mother would be pleased with that.
Shakh knew fishing had changed little over the last hundred years. The walls of his family's ancestral home had just such a scene of his grandfather fishing on the Nile from a small reed boat. The thinly carved relief was complete with spear, fish, ducks and Shakh's grandmother nearby holding lotus flowers.
Shakh stopped by old-man Jacob on his short walk to the bank of the Nile. He liked the old trader. Years ago Jacob had traveled to the Levant and brought back many strange artifacts. Some even came from as far afield as the Harrapan people who were said to live beyond Sheba, across the waves, some three years' journey away. Shakh especially liked the small black head, carved in a style so unlike anything else he had seen.
The Harrapan people lived on the banks of the great Indus river in modern-day Pakistan. Theirs was a great civilization, almost on par with the Sumerians and the more distant Egyptians, yet very little remains today. They built vast cities of clay brick with rectangular city blocks. They built drains, public toilets and state granaries. They were the first to populate the Indus river valley. (see www.harappa.com/indus2/index.html)
Little remains. The Harrapan civilization fell with the arrival of the Aryan race and the intervening millennia treated their past poorly. The arrival of Islam erased much of their history as did the shifting Indus river itself. The British used the bricks from one ancient city in the construction of a great railway. Only today are the archaeological digs once again unearthing the past.
I search for Harrapa on the internet. Nothing special, just type 'Harrapa' into any of the popular search engines and I uncover harrapa.com, a website devoted to some recent information from these digs. Looks good. Pictures of ancient pots. Children's toys. A map to an ancient city.
Of course, Shakh would have known of the Harrapan civilization. While it is uncertain whether any ancient Egyptian ever visited in person, goods and rumors traveled far from trader to trader. Ancient Egyptians, while not accomplished conquerors abroad, did travel and mix with distant peoples.
Shakh lived in a civilization centuries distant from us, yet both you and Shakh know a similar amount about the Harrapan civilization. The intervening years have not made everything clear. Even the information revolution has not changed the facts. Both you and Shakh have just a single source of information about the Harrapan civilization. You have the pictures on harrapa.com and our short excerpt here. Shakh has the old-man's art object to look at, the old-man's myth of a civilization beyond the waves.
This story carves the act of searching in deep relief. Searching is a skill, a trade and to some a profession. It is also just a simple task of finding information - something we do every day, in so many ways, without any of the difficulties we will get into later in this FAQ.
The difficulties only emerge when you want to do something spectacular. Should you wish to know something specific about the Harrapan civilization, or understand something contentious - then we require a greater degree of expertise and experience. The search becomes a challenging adventure in its own right.
The Nile was always a slow river but three months out of the year it burst its banks and flooded the fields, bringing life on the banks of the Nile to a complete halt. For these three months Shakh's family would move into the ancestral home in the streets surrounding the great pyramids. It was an old home, centuries old. Well suited to their needs with a storeroom for food, separate rooms for the parents, and an active social life in close proximity to others. In many ways, this was the most exciting time for young Shakh. For the rest of the year he lived in relative isolation in the village by the Nile. For these three months, he lived in a city, bustling with activity, construction and recreation.
Shakh had expected this year to be like the last but his father secured Shakh an important position - he would be in training to become a scribe. Father had grand plans for young Shakh, plans that extended far beyond life as a scribe. What's more, with luck and further prosperity, Shakh's father had the means to secure his further advance.
Much of ancient Egypt is available for us to read off the walls of the many remaining buildings. They were not a literate nation, yet were able to adorn almost everything with writing and pictures. They lived in the most enlightened society of the day. Years later, Egypt would gift the fledgling Hellenic state a full third of their Greek vocabulary.
This is part of the reason for such an interest in travelling to Egypt. It is the visual symbols that inform us and draw us in so deeply. Standing before the great religious statues, we begin to feel how it was to live and work in that day. To run amok as a young student, waiting for the Nile to subside once again.
Yet, there is much more to knowing ancient Egypt than just the monuments and wall reliefs. Years of study have recovered their lost language of hieroglyphs. Years of archaeology have unearthed their daily lives.
History and Archaeology are fine examples of searching in practice. Both fields struggle openly with the bias and uncertainty each new fact brings forth. Malta is a small island off the coast of Sicily, close to Tunisia. Should evidence emerge of ancient Egyptians living on Malta, what does it mean? Was Malta an Egyptian conquest or an occasional station for their fishing fleet?
This uncertainty applies to all information, in all situations. One of the first events for the new regime in Pakistan was to acknowledge that important national statistics, like the national GDP figures, had been fudged to a serious and significant degree. Important national statistics are not intrinsically true because of their source. This is not a problem solely of underdeveloped nations. Rumor suggests that during the height of Singapore's land value bubble their national figures were unreliable too.
Searching is a skill and an attitude. In this FAQ we progressively unfold the way information is found. Initially, let's cover a simple way to find information; a structured approach to an everyday problem. Afterwards, we shall look more closely, and with more complexity, at the world of information.
Searching is simple. It starts with a question. It ends with an answer. Everything between is searching. Much of it has to do with the tools you use. Select the right tool and you can get to the answer almost by default. Luckily, for any given topic there tends to be just a handful of must-use tools. For more complicated questions, there are usually plenty of people to ask for assistance.
The answers you are seeking will be found in a selection of different formats. By this I mean books, articles, interviews, and more. This is a very convenient concept and forms the foundation of all our work both here and in the Spire Project. Few research tools cover more than a single format; those that do tend to cover each format poorly. Start a search by selecting the specific format you are seeking. Then, select your preferred search tool from a small collection specific to that format. To get the information, simply follow through and read, search or interview. Everything follows naturally.
There are just a few formats to consider.
. . . . . Dense, factual, comprehensive and a minimum of 6 months to a year old.
. . . . . Shorter than books but focused on one topic.
. . . . . Short and shallow. Immediate.
. . . . . Factual. More reliable.
. . . . . Very thick. Deeply researched. Esoteric.
. . . . . Immediate, mixed quality, with limited factual support.
. . . . . Immediate, varied quality, partly digested.
Each format has a selection of simple tools to find information. Many of these tools will be on the internet - which may mean easily accessible. A word of caution: try not to confuse search tools that happen to be on the internet with searching internet information. The Amazon.com book catalogue is a search tool useful in locating books. Though on the web, searching Amazon is part of a book search, not a web search. A search of the Reuters newswire is a news search, not a web search, even though Reuters releases current news on the web. Each format should remain distinct in your mind.
Tools to Find Books
1) Some books, particularly classics, are free on the internet through efforts like Project Gutenberg.
2) Libraries allow you to read books. Library catalogues are frequently online.
3) The largest libraries, like the Library of Congress and the British Library, list millions of books in their online catalogues.
4) Most currently available 'in print' books are listed in national Books-in-Print databases.
5) Each country maintains a special government publication database.
6) Lastly, online bookstore catalogues like that of Barnes & Noble, list a sizeable portion of current in-print books.
Tools to Find Webpages
1) Global search engines index hundreds of millions of webpages for free text searching. Consider Altavista and All-the-Web.
2) Global directories list resources by category. Consider Yahoo or the Open Directory Project.
3) Regional search engines and directories focus more tightly on regionally important topics.
4) Lastly, more specialized search tools can all take you further: search engines which focus on specific topics (like maths or government webpages), services which link you to important topic-specific websites, and services which manually review websites.
Tools to Find News
1) Current news is found in newspapers and the evening news. News clips can be delivered electronically, or purchased through specialist news clipping services.
2) Newswires redistribute regional news to a larger audience. Many newswires release their text news free online.
3) Specialized search engines like NewsBlip and TotalNews aggregate current online news.
4) State libraries archive past copies of regional papers.
5) Individual newspapers maintain libraries of previous articles. Many are available as commercial databases.
6) Larger commercial databases unite the news from many prominent newspapers. These databases of news articles stretch back many years.
This story is repeated with all the formats information comes in.
To drum this in with repetition, searching starts with a question. Select the format (book, news or webpage). Next, select one or more tools from our short list of search tools for that format. Want to understand the lifecycle of the spider? A book should prove useful. Let's look at either our local library book catalogue or a big commercial bookstore catalogue like Barnes & Noble (bn.com).
Search. Read. Voila, the lifecycle of the spider.
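The format-then-tool routine above can be sketched as a simple lookup table. This is only an illustration: the names TOOLS_BY_FORMAT and tools_for are hypothetical, and the entries are shorthand for the tool lists given earlier in this part of the FAQ.

```python
# Hypothetical lookup table; entries paraphrase the tool lists in this FAQ.
TOOLS_BY_FORMAT = {
    "book": [
        "Project Gutenberg",
        "library catalogues",
        "national library catalogues",
        "Books-in-Print databases",
        "government publication databases",
        "online bookstore catalogues",
    ],
    "webpage": [
        "global search engines",
        "global directories",
        "regional search engines and directories",
        "specialized search tools",
    ],
    "news": [
        "newspapers and news clipping services",
        "newswires",
        "news search engines",
        "state library archives",
        "individual newspaper databases",
        "commercial news databases",
    ],
}

def tools_for(fmt):
    """Return the short list of tools for a format, chosen before any searching."""
    try:
        return TOOLS_BY_FORMAT[fmt.lower()]
    except KeyError:
        raise ValueError(f"No tool list recorded for format: {fmt!r}")

# The spider question: decide on a book, then pick from the book tools.
for tool in tools_for("book"):
    print(tool)
```

The point of the sketch is the order of decisions: the format is chosen first, and only then a tool from the short list for that format.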
If searching appears a little boring at this point, you have not visited a library recently. The excitement comes in finding the information. The rest is dull indeed.
The information revolution washes over us, picks us up and pushes us forward like so much driftwood. From now on our lives will forever be awash with information. We will eat it. Breathe it. Live in it. Drown in it. Some of us will even learn to live for it. Those most capable will have the skills to search, sift and sort information.
The information revolution is not about primary research, lab coats and discovery. It is about a surplus of information. The searching we have just discussed is not a particularly creative process. Simple searching is not sufficient to deal with the great tide of information moving against us. But then, simple searching lacks finesse. Simple searching is, well, simple.
Searching is one of those most delightful tasks where skill is everything. A search without talent will give you just a taste. Like pottery perhaps. Anyone can get something but only an expert can accomplish wonders. Quality information, reliable answers, effective coverage of resources; it takes skill to get to this level.
Advances in technology and the delivery of search assistance have made searching easier than ever before. Many search tasks can be accomplished without any experience. With more challenging questions a novice will get results - results they will be proud of. But not results they should be proud of. With experience, you will recognize how much more is possible.
Let's proceed by adding a little more complexity.
Your value as a searcher is directly related to the number of resources you can reach for quickly, and your skill at phrasing a research question. Consequently, as a searcher, you will work hard at building ready access to a range of resources. You also work hard at understanding the special characteristics of collections of information.
The technical name for complex searching is 'Information Research'. I prefer to think of information research as an effort to locate answers, efficiently. Information Research is not vague browsing of available information for something that interests you. It is not browsing the library bookshelf or reading the newspaper, nor is it internet surfing. Information research is searching with a purpose ... and it is hard work.
Research is also an art form. The skills, tools, and resources we work with are only the canvas and paints of an artist. Research extends across commercial, legal and reporting work, through the skills of interviewing, database searching and research analysis, using books, articles, experts and patents. Research is so large a field, involving so many skills, tools and resources, you will quickly find you do not wish to learn it all.
At the heart of information research lies a simple motto: "Someone, somewhere, probably knows the answer."
To quote The Information Broker's Handbook (Sue Rugge and Alfred Glossbrenner): "As information brokers, we shouldn't consider ourselves capable of providing solutions... What we 'can' provide, and what sets a really good information broker apart from the rest, are resources. We can provide the client with the kinds of information he or she needs ... that make it possible for individuals to solve their problems."
Let this sink in. We are not experts in the field we are researching. Collecting information on the moons of Jupiter? Do not pretend to be an astronomer. We are only experts at the tools for gathering information.
A Quick Introduction to Effective Searching.
1) Searchers work hard to properly frame the question.
2) Searchers know the technology, know where to look.
3) Searchers know you can ask.
Step One: Properly Frame the Question
The preparation of your question is critical. There is a galaxy of difference between a young student asking, "I am interested in trees", and a specific, attainable question like "Where would I find a tree surgeon I can talk to?"
The information sphere is very large and rather confusing. Each item of information has aspects of authenticity, accuracy, reliability, and bias. Information comes in many formats: interviews, books, articles, statistics. We learn about information from many sources: literature, discussion, resource lists, experience. There are also personal issues: budget, time, depth and purpose.
With all this to think about, we must be very careful about each question we ask. This issue is vital once we start an article search, and can easily mean the difference between 5 concise articles, and hundreds of general articles. The essence of our question is the manner with which we approach the information sphere. The question directs our efforts.
One key is to treat searching as an art, much like painting or photography. The true mark of an artist, and the primary step wanna-be artists miss, is visualizing what you want before you begin.
When searching, sit down and visualize what a successful search would look like in this situation. How many pages? How many documents? What kind of authors and what kind of quality of document? Go through the whole gamut of different types of research tools and describe it. Would a simple three-line newspaper article be a success? Would a 20-year-old dissertation be acceptable? Would a short conversation with an expert suffice? Would all three together suffice? (This approach works exceptionally well with internet research too.)
If you can phrase a question in a way that lends itself to your resources, you are far more likely to get the answers desired. Oddly, this often means you are asking for places where the information resides rather than asking directly for the information.
A novice starts with a question like, "What can I do for my exceptional child?" You should rephrase this question immediately. "What resources will help me help my exceptional child?" These are both valid questions but the second question has a distinct answer - the first is far too vague. Other questions could be "What are other parents doing for their exceptional child?" or "Who can help advise me on how to teach my exceptional child?"
Now we shape the question to get precise answers. "Where do I find a definitive list of associations?" (or a search for "+association +directory") works much better than, "What association works with exceptional children?" What about, "Who would know of associations for exceptional children?" and, "Are there pamphlets of advice for parents of exceptional children?" and, "What umbrella organizations/specialist libraries exist for exceptional children?"
Questions are not right or wrong, just better or worse at illuminating certain aspects of the answer. Make sure your questions illuminate something useful.
There are ways to frame questions for commercial databases, for research assistance, for interviews, for getting the truth from your children. Your skill in phrasing the question has a lot to do with the results. Poor questions tend to come back and haunt us later when we miss relevant information. Set aside ample time to refresh and reframe your questions.
Step Two: Know the Technology, Know Where to Look.
Research rests on understanding the technology and an awareness of the resources. In the example above, a directory of associations does exist. Here in Australia it is the "Directory of Australian Associations", found in most important Australian libraries. The Australian "Department of Education" has a major interest in promoting exceptional children. In Western Australia, Infolink, a community information service, should have a record of major community groups for exceptional students. I have no direct knowledge of umbrella organizations or specialist libraries, though I expect both the education department and Infolink would. A quick search of some large libraries may help us find some of the pamphlets.
Knowing of specific resources is helpful. It is great if you live next door to the president of Mensa. You have easy access to someone knowledgeable, able to give his or her take on the situation.
Knowing the tools to help you find resources, the meta-resources, is vital. So what if we do not know exceptional students come under the Department of Education? Do we know who to ask to find the government department involved? If you do not know of the directory of associations, who would you ask or where would you look for one? Being unfamiliar with meta-resources is a serious handicap - you will find yourself searching hours for something a professional would do on the phone while drinking coffee.
Keep in mind the Spire Project is dedicated to providing you some of this experience. Our web articles should suggest directions to look. But there are limits to how we can help. At some point you simply must sit down with the Kompass Directory, or the Gale Directory of Databases, or the Australian Bureau of Statistics library, and become familiar with getting to all the relevant information.
Another must, for all searching, is experience searching electronic databases with complex research queries - a difficult task only made better with practice. As a general rule, if you don't use Fields, Proximity and Boolean search terms, you are doing it wrong. Most people do it wrong.
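For readers who have not met these three ideas, here is a minimal sketch of what Field, Boolean and Proximity searching each mean. It runs against three made-up records rather than a real database, and the function names (has, near, search) are hypothetical; commercial databases each have their own query syntax, which this does not attempt to reproduce.

```python
import re

# Hypothetical records with named fields, standing in for a commercial database.
RECORDS = [
    {"title": "Directory of associations for gifted children",
     "body": "A list of parent groups and umbrella organizations."},
    {"title": "Teaching exceptional students",
     "body": "Advice for parents; the association directory is in the appendix."},
    {"title": "Gardening for children",
     "body": "No associations here, only a directory of plants."},
]

def tokens(text):
    """Lowercase a string and split it into words."""
    return re.findall(r"[a-z]+", text.lower())

def has(record, field, term):
    """Field search: is the exact term present in this one field?"""
    return term in tokens(record[field])

def near(record, field, first, second, distance):
    """Proximity search: do both terms occur within `distance` words of each other?"""
    words = tokens(record[field])
    first_at = [i for i, w in enumerate(words) if w == first]
    second_at = [i for i, w in enumerate(words) if w == second]
    return any(abs(i - j) <= distance for i in first_at for j in second_at)

def search(predicate):
    """Return the titles of all records that satisfy the predicate."""
    return [r["title"] for r in RECORDS if predicate(r)]

# Boolean AND, restricted to the title field.
in_title = search(lambda r: has(r, "title", "directory") and has(r, "title", "associations"))
# Boolean AND NOT, in the body field.
not_plants = search(lambda r: has(r, "body", "directory") and not has(r, "body", "plants"))
# Proximity: the two words next to each other in the body.
adjacent = search(lambda r: near(r, "body", "association", "directory", 1))

print(in_title, not_plants, adjacent, sep="\n")
```

Each query narrows the same three records in a different way, which is the practical value of combining the three techniques instead of typing bare keywords.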
Step Three: Know You Can Ask.
There is very little mystery about professional research. Lots of people are experienced in different aspects of this field. My personal weak point is in direct interviewing, whereas I am a pioneer in secondary resource research. This is OK. In fact I use this liberally to determine the skill of professional researchers - do they know their own limits? The field is much too large to be an expert in all its aspects.
The positive side to this is many people welcome requests for help. I enjoy asking librarians questions. I also ask my customers, my suppliers and other professional researchers. Never get caught in the trap of feeling you know what to do. The joy in this profession is that most people do not expect you to be an expert in their field, just an expert in your field: particularly the meta-resources. Even if it requires a polite reminder, customers will appreciate you asking them for likely keywords in difficult searches. I always make a habit of asking librarians if I am missing something. A librarian is always fluent in their collections and I frequently locate real gems this way. (As an example, my state library arranges computer books in two sets, one Dewey and another in an alternative structure. Who would have guessed?)
Especially if you are just a student, always keep your ears open. You will frequently find yourself in the presence of some expert in some facet of research telling you something you already know. Consider carefully before you interject... Your expert may be about to explain something new to you.
Information research is a dedication to learning. At its heart is a collection of specific research skills, an awareness of research tools, and a gifted mind. - Oh, and a large amount of coffee. Without knowledge of and access to relevant research-worthy resources, your research will be severely limited and doubtful. This is why much of your work becoming an effective researcher involves learning about the resources and meta-resources for your field. Much of our work in the Spire Project is drawing your attention to relevant resources.
Before we progress to specific resources for specific formats (books, webpages, news), let us attack head on the role of the internet in information research. This should surprise you.
As Shakh became more proficient with writing, father wrote more frequently of the family deity. Horus, the falcon god, had long watched over his family. Horus sees all, his father would write, and even across the many miles separating you from us, Horus will watch over you and keep you close. It was a great comfort to Shakh to have the family deity looking after him.
Shakh too devoted himself to a life of watching and knowing.
We have discussed how information comes packaged in certain standardized formats like books, articles or news clips. Each format has particular qualities and standards that reflect the way the information is prepared. For example books are dense, factual, comprehensive and a minimum of 6 months to a year old.
So how can we apply this newfound wisdom to the internet?
Let's start at the beginning. The internet is an inexpensive and pervasive system for the delivery of data. It is also the medium of a dramatic shift in the way we access information.
A (1) dramatic drop in the cost of publishing is fuelling (2) the liberation of information from previously closed systems, leading to (3) an emergence of alternative funding for certain public resources and (4) an eagerly awaited 'direct to consumer' commercial information industry.
The first mental knot to untie is the separation of internet resources into distinct formats. Electronic books share most of the qualities of books published on paper. News stories found on the web share all of the qualities of news in your local newspaper. The fact they are electronic or appear as webpages has nothing to do with it. News is news. Electronic books are almost books.
But if online news is news, and online books are almost books, and both are not internet formats, what is an internet format?
The search-by-format method is a concept to simplify and understand the many information resources which exist in the world. The concept is only as valuable as it is successful at enlightening us. As to the internet, we have more to learn, but could safely divide the internet into several formats at this time, perhaps webpages, online discussion and ftp resources. Yet this is largely superficial. The real value comes from understanding the qualities of different types of webpages. We shall divide the webpage format further.
Must we really learn this?
You would be pardoned for equating searching and the internet. Much of the hype surrounding internet search tools builds the illusion that the skill of searching can somehow be distilled computationally then delivered to you electronically. Through the wonders of modern science, you can have the best information at your fingertips without having to learn anything of search technology.
This is a pervasive lie (or marketing fiction). The electronic research industry has been around for decades and has worked on this problem for some time. No upstart internet guru has invented a technique to suddenly transform the search process. Such thinking would work in section two (Searching is Easy) but is the first illusion we must shatter for you to progress.
Case in point: the Lycos and All-the-Web search engines use the same database of webpages. This database is growing rapidly. It stood at 350,000,000 webpages in June 2000 and hopes to reach one billion webpages by the end of 2001. It stands as a grand achievement in organization, right?
Wrong. Years ago I was using a unified database of news called Global Textline (no longer available but replaced by others). It had an astounding four billion news articles available for advanced text searching! Four billion news items, representing many years of news from all over the world. This was superficially 10 times the size of the current All-the-Web search engine.
No, the internet does not even hold the record for being the largest information field. Oh, it will surely surpass the quantity of commercial information, and superficially we could say it may already have achieved this. But the internet is not a new medium for information research. It is emerging as a new resource, not a new phenomenon.
The internet is a new medium for business - most businesses have never incorporated the immediacy or global nature of internet involvement, so considerable rethinking is required. The internet is a new medium for publishing for almost all of us; very few of us published electronically before the internet emerged. The internet is NOT a new medium for research. Information researchers have been working electronically for years. The internet is just a new resource we can reach for with strengths, weaknesses and peculiar traits we must appreciate.
By way of an example, let us compare Link Analysis as used in Google and Raging (of Altavista) with the process of editorial vetting as used in scientific journals.
Through the magic of link analysis, we can make certain assumptions about the value of a webpage by adding up the number of other pages linking to that page. In its simplest form, webpages with at least 100 inbound links from other websites are judged to be quality, valuable resources. A webpage without any inbound links has the suspicion of being of poorer quality. After all, no one has thought it valuable enough to add a link to their further resources page.
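The simplest form described above, counting inbound links from other websites and comparing the total to a threshold, can be sketched as follows. This is an illustration of that simple counting idea only, not the actual ranking method of Google or any other search engine; the function names and example URLs are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse

def inbound_link_counts(links):
    """Count, for each target page, the links arriving from other websites.

    `links` is an iterable of (source_url, target_url) pairs. Links between
    pages on the same host are ignored.
    """
    counts = Counter()
    for source, target in links:
        if urlparse(source).netloc != urlparse(target).netloc:
            counts[target] += 1
    return counts

def looks_valuable(page, counts, threshold=100):
    """Apply the simple rule: enough inbound links means a quality page."""
    return counts[page] >= threshold

# Hypothetical link data; a tiny threshold is used so the example stays small.
links = [
    ("http://a.example/links.html", "http://target.example/"),
    ("http://b.example/resources.html", "http://target.example/"),
    ("http://target.example/about.html", "http://target.example/"),  # same site
    ("http://a.example/links.html", "http://new.example/brilliant.html"),
]
counts = inbound_link_counts(links)
print(looks_valuable("http://target.example/", counts, threshold=2))
print(looks_valuable("http://new.example/brilliant.html", counts, threshold=2))
```

Even this toy version shows the behaviour of the rule: the established homepage passes, while the newer page with a single link does not.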
This logic has some serious shortcomings. Firstly, the process rewards long-term projects that have been online long enough to earn links. A brilliant new webpage would have few links - yet. It would be ranked poorly, undeservedly. Secondly, link analysis rewards websites over webpages. The pages with the most links are often homepages. Rating homepages over second level webpages works at odds with keyword searching. Our keywords will be found in specific, perhaps second-tier webpages. Links go to the top level. Thirdly, link analysis is a mass market, popular technique. You are banking on the intellectual finesse of a mass of mindless computer users much like yourself. It is the same kind of popular democratic selection that votes B-grade actors into the presidency.
Let's contrast this with the process of editorial vetting used in scientific journals. Each article is reviewed by a selection of knowledgeable peers who understand the topic in great depth. Each article is further improved by the editing of the journal editors, and by self-editing, for there is great competition and prestige at stake. Only a handful of the many submissions are judged worthy and appear in the printed journal. Success places the successful in the standard of record; stamped with an external statement of truth and importance.
Of course, the logic of editorial vetting also has shortcomings. Firstly, the process is time and effort intensive. Many of the most important journals will delay six months or more between submission and publication. In our digital era this is increasingly unacceptable. Secondly, the number of submissions accepted is at odds with the pace of development. So much more happens in the world than can be digested in this manner. Thirdly, editorial vetting supports the clannish behavior leveled against the upper echelons of science. New and novel developments have difficulty floating to the top if the peer review process should not be open to new ideas.
If link analysis is popular and democratic, editorial vetting is elitist and autocratic. Both approaches have pros and cons.
Once you have absorbed the drama between link analysis and editorial vetting, please do not retain the belief that your search needs will be completely solved for you. Searching is a complex, overgrown garden and it's time to get your hands dirty.
So what does the internet have to do with searching?
The internet changes searching in two ways. Firstly, the webpage is a new format to contend with.
"Webpages are often of unknown age, of only guessed at quality and potentially the easiest information to retrieve. There are many points of entry to web resources but search tools differ. Try to match your search tool to your question."
The internet is also a conduit to many of the pre-existing tools for searching other formats (books, news, interviews).
With an internet connection, we can reach database retailers and many commercial quality databases like LOCOC, ERIC, MOCAT and AGIP directly from the source. We can also remotely search the catalogue of most libraries in the world. These are not new resources, just new ways to reach them.
In this day of interconnectivity and change, it is too tempting to declare the information industry is in rapid flux. Everything I have learned suggests this is not so. There are some changes associated with new channels but by and large the process of searching for information remains the same.
Let's look briefly at news as an example. News articles are written by the reporter and sold to international newswires, which then distribute these stories to interested newspapers and news channels, which in turn incorporate the news into your newspaper or evening TV news.
News would also be added to commercial databases of past news. These databases are then provided to database retailers like Dialog or Lexis-Nexis who sell occasional access to you.
With the internet, newswires have also provided their text news to online sites. Text news is thus available for you to browse or search.
I draw your attention to several facts. The fundamental nature of the industry has not changed. Journalists and newswires still impart upon the news the same nature as before. It is short, shallow, immediate. It is created to journalistic standards.
If you wish to search past news, you must still reach for the commercial database, most likely through a database retailer. Searching for news online only goes back two weeks at most.
Lastly, to date only the text format for news is widely disseminated. Sometimes a couple of pictures are included but the visual news, as used in the evening news on TV, is sure to remain priced beyond public consumption.
So what has changed? There is another venue for you to pick up the news. There are opportunities for new databases to be created, some of limited time (like totalnews.com - a database of current news on other websites). Little else has changed. The creation and dissemination of news remains pretty much as before the internet arrived.
Let us look even more briefly at book publishing. Books are produced by authors, improved by editors, published by publishers, marketed by bookstores, then purchased by you.
Today we have a couple of new online bookstores - and a large number of new old online bookstores (existing bookstores now selling online). We have a collection of free books online (largely classics like Shakespeare, which strangely, were immediately published as really inexpensive paperback classics available in airports everywhere).
There is also a range of very useful commercial quality book databases which have become free to search online. I am thinking of the government publication catalogues (MOCAT [US], AGIP [Australia] and Stationery Office Online Catalogue [UK]) and the online catalogues for the Library of Congress (LOCOC) and the British Library.
Lastly, the online catalogue to the large bookstores like Barnes and Noble, Amazon and The Internet Bookshop (UK's WHSmith) can provide a free and fast database of books in print, though not as good as the commercial Books-in-Print databases. Of course, any local bookstore will offer to search books-in-print for you, so this is not as revolutionary as it might at first appear.
In summary, we have a collection of recently discounted book databases we can more easily search, we have additional sites to buy books, and little else. The creation and dissemination of books remains pretty much as before the internet arrived. Has the book industry changed? Not really.
The most remarkable change has been the emergence of group discussion online, the emergence of a new format for information (like the webpage) and the opportunities to connect faster to a whole range of pre-existing searchable resources.
This is the reason why we discuss searching-by-format. Later, at the end of this FAQ, we return to this topic and show that the real revolution is not in resources or industry or search tools but a revolution in immediate access. Access, it turns out, enriches the art of searching.
As a counterpoint, the internet as an information resource can still be much too limited for many situations. If we are not careful, searching the internet becomes no better than browsing the shelf of your state library.
What most impresses me about the internet is the promise of changes in the future. The internet as a system suggests radical improvements to the current decade-old systems that have attained their search-worthy status. What impresses me most are the improvements mostly still in the future, not yet proven, set to remain promising ventures for a time.
This is not to say internet research can not be rewarding. In some fields like computer studies, the internet has already surpassed parity with books, articles and associations. Just when you will consult the internet as a research-worthy resource depends on cost, effort, and the quality of the information returned. This judgement call requires more than a little experience.
Value is important. I sincerely hope we can suppress our enthusiasm for free information in favour of a truer appraisal of the value of information. Make no mistake, commercial information is brilliant. It is almost heresy to even compare commercial information with the results of a few hours on the internet.
Internet Information Theory
Let us agree the internet is great fun to surf but more challenging when you have a specific question in mind.
To improve our search skills, we begin by understanding how information is arranged on the internet. Contrary to myth, information is not disorganized but rather organized very carefully along clear patterns. Many patterns are specific to the information format (text document, webpage, email message, printed article). Further patterns match the way we become aware of information, or are specific to the information systems (mailing list, FAQ, peer-reviewed journal). Your understanding of the strengths and weaknesses of each pattern, each format, each system, guides your search for information. We shall start by shattering the internet, and commenting on the many pieces.
Three Definitions of the Internet
Do be careful when using the word 'internet'.
1_ The internet is a physical network; more than a million computers continuously exchanging information. The internet allows us to transfer information around the world.
2_ The internet is a landscape of information available on almost every topic imaginable. This information appears almost chaotically distributed to the world but holds clear patterns. For instance, linking information together are various structures like government web links, search engines and FAQ documents.
3_ The internet is a community of 500+ million individuals. These are real people who choose to interact, discuss and share information online.
In this example, let me just draw your attention to the way most of our research effort focuses on the second definition: a landscape of information. Much of the best information originates in the third definition: the internet is a community. Sometimes it is far more effective to ask real people than search the information cyberspace.
What I just mentioned is not so important as the technique I just used. I broke the large seemingly chaotic system into smaller pieces: pieces that hopefully make more sense. Eventually, when we've made sense of the little bits, perhaps we can comment astutely on the big-picture.
Information, transaction, entertainment
There is a triad of functions to all online activity:
Function - Activity - Unit
Information - Research - The Fact or Conclusion
Exchange - Business - The Transaction
Entertainment - Play - The Experience
Each internet function grows at a different rate and moves in a different direction. The development of forums is firmly in the smallest segment dealing with information. This segment is quite poorly organized and confusing. The entertainment function in contrast is well financed and graphically innovative with clear, profitable opportunities.
Much of the web is prepared with Exchange or Entertainment in mind. "Brochureware" (purely promotional webpages) is rarely required for research but is critical to securing a transaction. Entertainment related or just entertaining websites abound. Let us recognize just how few webpages are information & research related.
My own experience suggests we are just beginning to see the movements towards profiting from providing information. Direct selling of information is still chaotic and unrewarding.
The way information is packaged has a great bearing on the content, quality and use of the information. This theme is evident throughout the work of the Spire Project, and is particularly applicable to internet information. Webpages, text files, software, email and database entries each have particular qualities. Each shapes, constrains and restricts the informative content. These particular qualities apply irrespective of the information involved.
Books are dense, factual, a little old. Articles are short, sharp, more recent. News is puff, introductory, immediate. Each way the information is packaged, each format, presents the information to set standards.
Information formats on the internet are the same. Webpages are graphical, technical to produce, and not easily updated. FAQs are easier to maintain, text only, and attract more peer review. Mailing lists are simpler still, text, short, immediate, very peer-reviewed, characterized by discussion and resource discovery. Newsgroups are characterized by extremely low costs, vulnerable to trashing, poorly managed. Email is simple use, one-to-one discussion.
Let's look at books more closely. Books are created by authors who have something to write. Books are printed and marketed by Publishers to the bookstores that then provide it to the readers. Each facet of this process defines the resource. Books have quality, editorial vetting but minimal peer-review, marketable value and a potentially lengthy preparation time.
When it comes to research, why look for a book when investigating digital money? Books would just have the wrong qualities - would present the information poorly. We need a more current format (digital money is a fast moving topic), and a more peer-reviewed format (books have editorial vetting but not intrinsic peer-review). Why not search for a mailing list, an FAQ, or an association website. These formats have qualities more appropriate to our question.
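The reasoning above, matching the qualities of a format to the needs of a question, can be sketched as a small lookup. This is a rough illustration only. The True/False judgements are a simplification of the descriptions in this FAQ, and the names FORMAT_QUALITIES and rank_formats are invented for the example.

```python
# Qualities per format, loosely paraphrased from the descriptions in this FAQ.
# Reducing them to True/False is this sketch's simplification, not a fixed rule.
FORMAT_QUALITIES = {
    "book": {"current": False, "peer_reviewed": False},
    "webpage": {"current": False, "peer_reviewed": False},
    "faq": {"current": False, "peer_reviewed": True},
    "mailing list": {"current": True, "peer_reviewed": True},
}


def rank_formats(needs, qualities=FORMAT_QUALITIES):
    """Return format names ordered by how many of the needed qualities they offer."""
    def score(name):
        return sum(1 for need in needs if qualities[name].get(need, False))
    # sorted() is stable, so formats with equal scores keep their listed order.
    return sorted(qualities, key=score, reverse=True)


# The digital money question calls for a current, peer-reviewed format.
ranking = rank_formats(["current", "peer_reviewed"])
```

With these assumed judgements the mailing list ranks first and the FAQ second, while the book falls to the back of the list.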
Information flows also impress patterns on internet information. Most information is transplanted to the web - first created elsewhere. The source of information imparts as much pattern as the eventual format the information takes.
Information may appear as a webpage, and conform to our expectations for all webpages but the information may have been prepared from the discussion on a mailing list - and thus enjoy a more topical, specific, timely and peer-reviewed quality.
Let's look at FAQs. The best resource in the world on copyright law is the musings of a group of copyright lawyers who form the copyright mailing list. The copyright FAQ supported by this group is a logical document summarizing much of the discussion of this mailing list. FAQs are vetted by the news.answers team, then automatically mirrored around the world. From its origins in the mailing list, the FAQ is a peer-reviewed document, often full of links to further resources, topical, knowledgeable and factual. As an FAQ, the document is not immediate, graphical or financially rewarding (some FAQs stagnate).
Only some internet information is created within the internet environment. The concept of 'brochureware' describes the common traits to promotional webpages directly prepared from paper promotional brochures.
One of the more exciting trends is the movement of information from the dusty shelves of government offices and association libraries to their more accessible websites. The quality of information retained in your average government agency, from quality research reports, to detailed studies, to current industry monitoring is very high. These qualities are then brought over to the web format. Such web-documents tend to be isolated (not linked to other related resources) and perhaps a little behind the time line but of a generally high quality.
An exciting holistic view of the internet information landscape is based on these descriptions. Imagine, for a moment, information flowing through a collection of systems. At certain points, information groups together, and generates new, perhaps higher quality information, which then flows in a different system, a different direction, to different people.
The flow of information from one person to another, from one format to another, imprints qualities to the information along the way. Each organization, or subsequent re-organization, imparts specific styles and conventions and quality to the result.
Let us proceed to a third set of patterns. Information appears on the internet for one very specific reason. Someone Publishes (DUH). The motivation behind publishing colours the information. This is a pattern we can use to quickly judge the contents of a webpage.
Ask yourself who is publishing, and why.
One of the biggest publishing segments a year ago was individuals publishing documents derived from their personal expertise. A typical document would be one with minimal peer review, a list of aging links to further resources, simple graphics, variable to short length, prone to bias but moderately reliable because the publisher knows their topic well. These pages are often located in private sub-directories (usually starting /~name/).
Commercial sites publish mainly for the promotional value. Their secondary purpose is to provide sales information to prospective clients. Rarely do commercial sites go beyond this. Commercial webpages often reside on their own domain name, as a .com, or in sub-directories - without the tilde symbol. Commercial sites also tend to age badly. They are very noticeable from their front page.
Government agencies are emerging as valued publishers. Slowly their dormant information becomes available through this new medium. Currently almost all government documents on the internet also appear in print, meaning they are factual, exhaustively reviewed, tend to be a little old (but age well), and come from highly paid knowledgeable people who believe it is their duty to inform others. Such documents are lengthy and appear on .gov domains.
These patterns are simple to see.
Grant-funded projects create brilliant research resources and hold much promise in pushing the limits of this technology. I am eager to see the results of the US Patents project, and appreciate the value of having Supreme Court rulings on the internet. Often such projects focus deeply on content. Most projects reside on educational servers and are widely discussed within knowledgeable groups.
Associations publish association-kind-of-things. Most are initially just like the commercial webpages. With time such sites become much more factual and research-worthy. Most associations are dedicated to developing awareness of their chosen topic, albeit coloured by their chosen bias. Few associations are significant publishers but in time, this segment will begin to liberate dormant information within associations.
Let's summarize. The key is to always watch who is the publisher. We can assume a great deal, quickly. We are unlikely to find the latest changes to patent law from government or commercial publishers. Such organizations are simply not motivated to present such information.
Publishing is one achievement but you and I will never read any information until we learn it exists. This simple fact creates even more patterns to internet information. Knowledge of information moves through set routes on its way from writer to reader.
Promotion is not simple. It is a process that takes time, effort and perhaps money. Information without serious promotion tends not to be promoted far from the source. Another way to phrase this; you must search close to the source to find poorly promoted information.
A search engine indexes pages relatively indiscriminately. This also means a site of quality is not likely to reach your attention. The odds are not good, and from a promotion point of view, search engines generate minimal traffic to your webpage. Search engines also drop you rather randomly into a website. It is often necessary to move up a directory to understand the purpose and motivation of a site you find interesting.
Information published through advertising tends to have a financial payoff for the promoter. This kind of information tends to be promotional information. Brochureware.
The alternatives are to promote a webpage or website through one of the referral tools. Each such tool accepts links on some criterion. Each tool you use to locate information also selects particular types of information for your attention.
If you arrive at a document by recommendation through a mailing list, the document is likely to be recent, on-topic and specific to the purpose of the mailing list. Alternatively, (for poor mailing lists) it will be wildly off topic and trash. You are unlikely to see referrals to old documents or documents of historical importance. These are the qualities most acceptable to the mailing list environment.
Directory trees, FAQs, guidebooks and related promotion tools all work as historically important documents. In the past, such resources list, describe and alert people to relevant information for the field. Slowly, over time, this function becomes acknowledged, reinforced and promoted. Time is the essence of this fame.
Webpages or websites found through historically important documents, by their nature, tend to be long lasting websites with lasting importance in the field. Such documents point to other similar documents or websites that have achieved a long-lasting importance. You are unlikely to find specific documents but rather sites that focus or bring together information. In short, there is little motivation to link to specific webpages, when a link to an important website is just as good.
Similar generalizations can be made of each type of promotional tool. They become important in rapidly seeking out information which matches our intention, as well as in summarizing the likely motivation, and bias, of webpages we are interested in.
Information Clumps. Information is created, nurtured, develops, gets transplanted, gets arranged and then becomes visible through a process which brings similar information together.
As we have discussed, there are factors deeply affecting all information on the internet. Motivation, Preparation, Format and Promotion all define the quality and content of any given item of information. With so many influences, we should not be surprised to learn information naturally groups together. In reality, there is nothing natural involved - it is a social phenomenon reinforced each time you and I visit or read one resource but not another.
History can explain some aspects of internet development. As a small collection of sites become dominant in particular fields, by collecting and delivering better content to more people, new sites find it progressively more difficult to capture attention. This dynamic works for websites reaching out for visitors, and discussion groups reaching out for subscribers. In each case, seniority counts.
Seniority counts in several ways too. Promotion is directly related to quality, interest, traffic and time. The longer a site is active, the better the footpath develops, the more people visit. Secondly, quality content is directly related to access to quality content, peer review, and time/money. Important existing sites gain in every way.
This results in a grand system where the first-in, best-dressed, can capture the high ground and secure a grand lead in awareness and footpath over competitors who follow. Yahoo is a prime example of a directory tree, not even the best in most areas, which has achieved unparalleled traffic & awareness.
This competition is equally evident where no money is involved. Perhaps your association wishes to create a new referral website, or an open mailing list, or an informative guide. All sound concepts, effective projects. However, if older, established resources exist, the work will be long and arduous.
Despite the marketing message, the internet is not a world where the best information floats to the top. The internet will not let you reach millions. You must compete for attention, participation, devotion and assistance in a manner very similar to building a business.
In concrete terms, information clumps on the internet. The best resource could appear on any internet system (webpages, email mailing lists, ftp-archives, FAQs, online databases, newsgroups...) but we can be fairly certain the best information will congregate in just one or two. Consider this as an application of the 80:20 rule. 80% of the good information will be found on 20% of the formats, arranged concisely by 20% of the search tools.
Consider our article "Searching the Web" (spireproject.com/webpage.htm). We progressively search different web tools, looking for the most worthy. Searching the internet is the same. You must touch each system to see which system is dominant, where the information is congregating for your topic.
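The idea of touching each system to see where information congregates can be sketched as a simple tally. This is an illustration under stated assumptions: the hit counts below are invented, and dominant_systems is a hypothetical helper name, not part of any tool mentioned in this FAQ.

```python
def dominant_systems(useful_hits, share=0.8):
    """Return the fewest systems that together hold `share` of the useful hits.

    useful_hits maps a system name to the number of worthwhile resources
    found there during a first pass over each system.
    """
    total = sum(useful_hits.values())
    if total == 0:
        return []
    chosen, covered = [], 0
    # Walk through the systems from richest to poorest.
    for system, hits in sorted(useful_hits.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append(system)
        covered += hits
        if covered >= share * total:
            break
    return chosen


# Invented tallies for one imaginary topic, for illustration only.
tally = {"webpages": 3, "mailing lists": 30, "FAQs": 12, "newsgroups": 4, "ftp-archives": 1}
best = dominant_systems(tally)
```

For this made-up tally, two of the five systems account for over 80% of the useful material, and those are where the rest of the search effort should go.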
Bringing this together
In summary, we have broken down and discussed various qualities of published information and promoted information. We have made sweeping generalizations and educated guesses about information on the internet. Now what?
When a painter begins to paint, they have already visualized some of the image. They already have a concept of the finished result. Internet research is no different. We start by building a vision of the information we seek. Who would publish it? Where would I find it? What is its motivation? How would we find it? We now have a practical vision.
The address is one of the keys. The web address (or URL - Uniform Resource Locator) for any item of information gives us a surprising amount of information - particularly as we are making generalizations about information patterns. We can guess if information resides on a personal webpage, a funded university project, or a commercial project. The information resides on a .gov website? - the quality is likely to be higher and conform to our expectations of government resources.
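As a minimal sketch, the guesses we make from a web address can be written down as a few rules. The rules follow the rough patterns described in this FAQ (a tilde for personal pages, .gov for government, .com for commercial sites). Treating .edu as the mark of an educational server, and accepting country-style hosts such as .gov.au, are assumptions of this example. The function name guess_publisher is hypothetical.

```python
from urllib.parse import urlparse


def guess_publisher(url):
    """Guess the kind of publisher from a web address. A snap judgement only."""
    parts = urlparse(url)
    host = parts.hostname or ""
    if "/~" in parts.path:
        return "personal"       # private sub-directory, usually /~name/
    if host.endswith(".gov") or ".gov." in host:
        return "government"
    if host.endswith(".edu") or ".edu." in host:
        return "educational"    # assumed home of many grant-funded projects
    if host.endswith(".com") or ".com." in host:
        return "commercial"
    return "unknown"


# Hypothetical addresses, invented for illustration.
examples = [
    "http://www.example.edu/~jsmith/links.htm",
    "http://www.agency.gov/reports/study.htm",
    "http://www.example.com/products/",
]
guesses = [guess_publisher(u) for u in examples]
```

The tilde test comes first because personal pages often sit on university or commercial servers. A guess of this kind only sets our expectations; the page itself must still confirm them.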
We use this new-found experience in three ways. Firstly, we restrict our searches to the most likely sources. Secondly, we quickly jump through lists of resources (such as those generated by search engines) to the sources that match our expectations. Thirdly, our assessment of information quality can be guided by our snap-judgements of its origin and purpose.
Internet newcomers often expect to have instant access to the latest information at the touch of the button in beautiful colour and peer reviewed quality prose. Who is publishing this? Where is this information coming from? Who would help us find this? Such a vision is fantasy. If we were instead to look for an association website, dedicated to a certain type of research, or an informed newsgroup, maintained by people passionate about sharing this technology, then we have made four steps forward. We are clear about where to look for the answers we seek, and we will know quickly if the answers are online.
Let us now leave this discussion on internet organization and internet theory. This is tough newly discovered territory, more than a little rough. I fear it will make most sense to people with considerable experience with the internet. Let us now explore the fertile grounds of understanding more familiar formats like books and news.
On the second year of his training, Shakh began to piece together the many rules and guidelines to understanding hieroglyphs. He had thought the lessons would end once he learned the glyphs but no, there were long and convoluted rules governing the translation of sounds into glyphs. Simple rules govern the placement of glyphs on the wall - certain glyphs lose their meaning when placed apart.
Then, there was the art of writing. The glyphs had to be the right size and shape. If you were about to finish the line, you could squish certain glyphs just a little to make room for the next glyph. If you did not plan well, you would leave the line hanging, a word unfinished, a sentence incomplete.
Then Shakh started to learn hieratic - shorthand glyphs for less formal situations.
It was all very complicated and cumbersome. Shakh did not like the technical nature of writing. So much to learn and still so far from writing clear, interesting results. His seasons in training went very slowly. The Nile rose then fell then rose again.
A great deal of dull information must be comprehended, absorbed, internalized. Nothing spectacular. Nothing of particular interest. Just a mass of rules and guidelines to help you move within the world of information.
In the third year of medical school the aspiring doctor begins to memorize a vast linked-array of drugs, symptoms and afflictions. The next three years are spent developing this mental array; refining, building, adding experience, so that one day a doctor may look at a symptom, think of possible afflictions or drug reactions, then prescribe drugs or call for further tests. The whole process of learning this array is intensely dull.
In the first part of this FAQ we explained in detail how an information search involves first selecting a suitable format (book, webpage, news, interview ...) then searching a few important tools that help us find information in that format. The first format we will look at is the humble book.
Shakh arrived in Edfu on a small boat in the company of his father. It was a short walk from the dock to the Edfu temple complex. A fantastic sight. A noble sight. The temple included a vast library of books and manuscripts - a warehouse of knowledge about Egypt.
Not that there were many manuscripts in total. The time and expense it took to create even a single copy made the library a prohibitive expense open to only those in certain need. This was not a public library, but an elitist library, open only to those who could justify the gifts required to enter. There it was, open before them, long shelves of scrolls arranged by rough topic. Amazing indeed. Shakh shivered slightly in the cool air. This would be his life for the next few years.
Books have such meaning to us as a society. We have a vibrant emotional connection. Books exude a solid proof of value to a larger community. They are important resources but the additional awe is amazing to behold. Try ripping a chapter from a book you own in public. The stares and discomfort are almost tangible. Some book-lovers get upset about slight creases in books, treating books as if they were important museum quality manuscripts - something to hold with awe and treat gently.
Being a book writer is similarly impressive. It is a mark of an expert. A knowledgeable expert. A knowledgeable expert we should listen to, should pay money for the chance to listen to, should pay, listen and carefully not crease their work.
This attitude is silly.
A book is a package of information, prepared along certain guidelines, with a purpose. In research we look for books on a topic that may help us answer a question. These books tend to be large, lengthy, detailed, verbose, heavy. Books are not good at describing cutting edge developments. They generally summarize popular consensus. They avoid criticism. When searching, they can make horrible resources.
Books are also large and physical creations. They must be stored. They stick around. They have a limited shelf life but libraries are forever over-stocked with dated publications of limited use and value. They are also long - troublesome things to read.
Books come in different flavors. There are the books by industry insiders who tell the truth and rip the facade off a particular industry. Such books make brilliant resources. There are also books by journalists, prepared without insider knowledge, more of a novel of a newsworthy situation. Such books tend to be verbose, circumstantial, light on facts.
Certain questions simply beg to be answered by reading a book. Such questions are usually general, introductory, timeless. For such questions a stack of news articles would lack cohesion. A collection of articles would be too precise, not give you the larger picture. Such questions need the 100 pages of description, pictures and the considered framework that books embody.
Finding a Book
As an information format, there are certain tools and resources you need to be aware of to effectively search for books. Thankfully, many of these tools have emerged on the internet. These include:
- A database of the free books on the internet from projects like the Online Book Initiative and Project Gutenberg. Includes many copyright-free classics (but not ebooks - a different concept).
- Three government publication databases for the US, UK and Australia. The US and Australian databases are comprehensive. The UK database is incomplete. The complete database is commercially available.
- The book databases of large online bookstores are incomplete but useful as a fast search of current books. Some include background information. I use Barnes & Noble, Amazon, Borders and the UK Internet Bookshop (of the WHSmith bookstore chain).
- The largest libraries of the world, like the US Library of Congress and British Library hold more than 20 million publications stretching back many years. The online book catalogues are not good for the latest books, but are brilliant at earlier works.
- Local libraries and state libraries are noteworthy as finding a book in their database also means you have found access to these books.
- The definitive resource is the collection of national Books-in-Print databases like [US] Books in Print, Australian Books in Print, French Books in Print... These databases are commercially available online, as print directories (yuck) in libraries and often from publicly available to search from good bookstores
Information about new books is organized in a collection of national "Books in Print" databases. This information is publisher-verified, includes forthcoming titles, and is naturally updated far faster than the library and bookstore catalogues.
Books in Print, produced by Bowker, delivers publisher-verified information on US books. British Books in Print, produced by Whitaker & Sons, delivers publisher-verified information on UK books. Further national book indexes include Australian Books in Print (Thorpe), Canadian Books in Print (University of Toronto Press), Les Livres Disponibles/French Books in Print (Electre), Italian Books in Print, German Books in Print and others.
All these directories are available as print directories (not particularly convenient), as commercial databases (through database retailers), by subscription (bookstores frequently subscribe) or through Global Books in Print (though not really global, it is a group of book databases).
With regard to the print versions, there may be recent editions in your state library, but don't bother. The directory is not user-friendly as you must page through each month's subject categories. A more convenient alternative access point is your favorite large bookstore. For about Au$4500/year, many bookstores subscribe to Global Books in Print on CD-ROM, or to a national 'books in print' database. There should be no cost for searching, but ask for the date and the database name so you have a clearer idea of what is being searched.
Further Book Resources
Book Reviews are a viable tool in a book search. The tools mentioned above will give you very little information indeed - mainly title, author, format and price. You will usually want more than this before you buy a book.
Book reviews are published in a range of book-related journals and newspapers. These are compiled into commercial databases of book reviews, like the Book Review Digest by H.W.Wilson or the Book Review Index by Gale Research, or can be read individually from the likes of the New York Review of Books (www.nybooks.com/nyrev/). A state library may provide access to the Book Review Digest Database.
Online book reviews are further discussed in Locating Book Reviews (www.lib.monash.edu.au/hss/guides/fsreview.htm) by Monash University Library.
Barnes & Noble, and to a lesser degree Amazon, have additional information in their book database. Since it is free, it makes for a fine immediate alternative to searching book reviews.
Future developments in book-related discussion groups hold out more promise in harnessing the opinions of a book-reading public. Quality issues remain (and the anonymous musings listed in Amazon.com and Barnes & Noble
There are also book finding services with specialty book databases - like a database of second-hand books. Books on Demand is a directory of out-of-print books available for reprinting (and includes price and order information).
Obviously title searches are not effective tools to discover new books. Not all books on Vincent Van Gogh include Vincent in the title. Subject searches work well only if you can grasp the indexing.
Apply these effective search techniques:
1) Browse the subject listing and select the subjects which interest you.
2) Read the subject listings off a book you know interests you - then search for other books in those subjects.
3) Search for other publications from suggestive authors (especially when the author is an association).
Library catalogues like LOCIS can illustrate these techniques. Let's say a title or subject search lands you on one of the books listed in LOCIS. The catalogue record lists the applicable subject titles. Looking at other books placed in the same subject categories works well.
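The second technique (read the subjects off a book you know, then look for other books under those subjects) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the catalogue below is a hand-made dictionary of placeholder titles and subject headings, not real LOCIS data, and related_by_subject is a hypothetical helper name rather than a feature of any library catalogue.

```python
# Hypothetical catalogue records: the titles and subject headings below are
# invented placeholders, not real LOCIS entries.
CATALOGUE = {
    "Known book": {"Subject X", "Subject Y"},
    "Book B": {"Subject X", "Subject Y", "Subject Z"},
    "Book C": {"Subject Y"},
    "Book D": {"Subject Q"},
}


def related_by_subject(catalogue, known_title):
    """List other titles sharing a subject with a known book, ranked by overlap."""
    known_subjects = catalogue[known_title]
    # Count shared subjects for every other title that shares at least one.
    overlap = {
        title: len(subjects & known_subjects)
        for title, subjects in catalogue.items()
        if title != known_title and subjects & known_subjects
    }
    # Most shared subjects first; ties are broken alphabetically.
    return sorted(overlap, key=lambda title: (-overlap[title], title))


print(related_by_subject(CATALOGUE, "Known book"))  # ['Book B', 'Book C']
```

Books sharing more subjects with the known book float to the top, and a book with no subject in common (Book D here) never appears, however similar its title.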
A word about Book Types. Just as internet information comes in different qualities and formats, books also come in different styles and flavours. Books written by industry insiders are characterized by personal stories and expert wisdom from an author telling all the secrets. These books are worth looking for, and the short bio may give a clue. Books written by journalists have a different flavour, slightly more newsy and less factual. Compare these with, let's say, government books (far more factual than most) and frequently updated books (far more current than most). Try to find the style of book suited to your needs.
The book industry has reached a kind of plateau where fairly definitive databases exist for listing books. There are databases for government books, out-of-print books, second-hand books, current books. The internet has changed some elements of this mix, as business models try to support moving existing databases to free access, and others use this change to try to present more definitive databases. Book reviews have never properly been used by the book industry, so the big change appears to be a move from book titles (as in most book databases and library catalogues) to rich information (like Barnes & Noble) which includes reviews and readers' comments.
Articles hold a definitive value, a statement of quality and currency. Sometimes articles are long, unique and informative works. Sometimes articles are short, simple, trite; a rehash of common knowledge. There is a range of ways to access articles - though none are particularly inexpensive. We also have difficulties paying copyright - so most paid research assistance is restricted to certain, more expensive tools. In all, articles are cumbersome and time-consuming to work with. They can also be brilliantly rewarding.
There are three difficulties with article searches:
1) Finding the articles which interest us.
2) Getting our hands on a copy. (Many articles you locate may be impractical to access in person while electronic access can be expensive.)
3) Copyright permission (which can be potentially simple or exceedingly expensive).
Of course, the mainstay of article research is photocopying an article directly from a journal. Find a library nearby which holds the journal then read or photocopy it then and there. This process can be improved by using the online library catalogues (to see if they hold the journal) and by searching a database of library holdings (often available for free by asking or calling a librarian at your state library). As you could expect, some commercial businesses will undertake this work on your behalf, for a fee.
The difficulty with this process, of course, is that it does not help you discover which articles will interest you - it only works if you have a useful bibliography to work from.
In recent years, a concerted effort has been made to bring you full text articles electronically. Commercial databases in general have moved from being strictly bibliographic to including many full text articles. A system of full text articles on CD-ROM has a brilliant future. Up to 500 journals are updated frequently in this inexpensive format. (Most Research Libraries have this station.)
Some of the commercial full text databases have emerged online too. Northern Light presents one such database. Unfortunately, the better quality articles are not included in these databases. It is not an absolute rule but to date, many of these commercial databases are filled with regional business papers, newspapers or similar middle to low quality publications.
There is another system for accessing articles, which comes to us from a very long time ago. Inter-library loans are a system worked out between libraries so articles can be exchanged between libraries. Naturally you need the assistance of a library - and a great deal of patience. Such requests can take over a month to arrive.
Lastly, there is always the option of direct purchase of periodicals from the publisher.
Carl Uncover service (fax-back articles).
CARL (www.carl.org), one of the great library groups in North America, established a service to provide articles by post or fax. Carl promises to fax articles provided you use their system to check that one of their many libraries has the required document.
Northern Light - online database of articles
Northern Light (www.nlsearch.com) is a search engine of both the web and their own database of articles available for purchase. The rates are cheaper than Carl (up to $4.00 per downloaded document) and the articles are delivered over the internet (not faxed) but the range is smaller.
Many of the databases will begin to offer their services either as a pay-per-view, or through reasonable direct subscription methods on the internet. This has been predicted for years but depends on the emergence of a fine way to purchase cheap items on the internet: digital money. No effective digital money has emerged yet, and most databases will either wait, or try one of the existing incomplete methods. Essentially, critical mass has not yet arrived, and it now appears that the true fall in price of information is waiting on an effective digital money. In preparation, magazines and newspapers are purchasing all the rights possible - especially the electronic rights. More appears on this topic later.
Webpages are often of unknown age, of only guessed at quality and potentially the easiest information to retrieve. There are many points of entry to web resources, but search tools differ. Try to match your search tool to your question. To start, you will need to learn something of the different tools - described below - and four basic search techniques: Boolean, Proximity, Field Searches & Truncation.
Global Search Engines
Altavista (altavista.com) includes a very large, fast search engine. It allows for Basic Boolean (AND +, NOT -, OR |), Proximity (" " for phrases and ~ for near: within 10 words of each other), Several Fields (title:"Spire Project" domain:gov url:edu link:cn.net.au) and Truncation/Wildcard (*). Of import: capitals matter with Altavista.
All-the-Web (www.alltheweb.com) is important because it is large - really large - with a flexible search facility. It allows Partial Boolean (+ -), Simple Proximity (" ") and Several Fields: a title field search (normal.title:spire), a url field (url.all:.au), and link text and link url fields (normal.atext:spire link.all:cn.net.au). All-the-Web is not case sensitive. The same database supporting All-the-Web supports Lycos.
Inktomi (via hotbot.lycos.com) provides its substantial web directory through other companies, in this case HotBot. HotBot also allows searches by region, by date, and more.
Debriefing (www.debriefing.com) is our meta-search engine of choice. Use this to find names & named websites. It accepts Partial Boolean (+ -) and Simple Proximity (" "). Capitals matter.
Google (www.google.com/) is a new style of search engine which ranks sites with more care and concern. This works well for sites you know a little about in advance. Unfortunately, it has no useful field searches. It allows Partial Boolean (+ -) and Simple Proximity (" "). Unfortunately, there is no Truncation, not even for plurals!
When searching for a topic with precise descriptive terms, use a broad search engine. Always place the Boolean + symbol before each search word (like this: +word1 +word2) to insist all words appear in the results. Quotes keep words together ("word1 word2"). These two simple steps dramatically improve results. Keep adding words and search limits until the number of hits is reasonable.
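The two steps above are mechanical enough to sketch in Python. This is a minimal illustration, assuming an engine that accepts the + - " " syntax described for the tools above. The function name build_query and the sample search words are my own inventions for the example, and the query is only assembled as text, not sent to any search engine.

```python
def build_query(words, phrases=(), exclude=()):
    """Assemble a search string: +word for required words, "..." for phrases,
    -word for words that must not appear."""
    parts = [f"+{word}" for word in words]
    parts += [f'"{phrase}"' for phrase in phrases]
    parts += [f"-{word}" for word in exclude]
    return " ".join(parts)


# Start broad, then keep adding words and limits until the hit count is reasonable.
print(build_query(["word1", "word2"]))
# +word1 +word2
print(build_query(["research", "faq"], phrases=["spire project"], exclude=["chess"]))
# +research +faq "spire project" -chess
```

Remember that capitals matter on some engines (Altavista, Debriefing) and not on others, so the same string may behave differently from tool to tool.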
For more global search engines, there are numerous lists to consider like the W3 Search Engines page at the University of Geneva (cui.unige.ch/meta-index.html#INF) and the Industry Research Desk (www.rbbi.com/links/sengine.htm).
Meta-Search Engines & Google
If you know something of the destination already, like a title or company name or full name, try using a search tool that excels in finding named websites. There should be little difficulty in finding such sites with either Google or a Meta-Search engine, but don't get excited and use these on other occasions.
When searching for information that lends itself to a particular category or topic, start with resources which group information in categories. With few exceptions, these resources index websites, not webpages. Also, keep your search words simple as these are small databases.
Yahoo (yahoo.com) is the largest of this type of directory tree; the definitive site. It accepts Partial Boolean (+ -), Simple Proximity (" "), Truncation (*) and Several Fields: t: (for titles), u: (for urls) and a date field through a form.
The Open Directory Project (dmoz.org) is a Netscape effort to, presumably, mute the strength of Yahoo. It is very good, and very similar to Yahoo.
Looksmart (www.looksmart.com) is another significant directory.
For an alternative, try the World Wide Web Virtual Library: Subject Catalogue (vlib.org/Overview.html), a distributed network of subject lists, not nearly as dominant as Yahoo, but far more "scholarly" shall we say. This virtual directory has been around many years, previously famous from www.w3.org.
When seeking specific fields of study, when topics are clouded with many similar, low quality sites, start with resources with a greater degree of personal attention. Peer review and vetting produce resources with more quality but limited coverage, better suited to this situation. Also, keep your search words simple.
The Scout Report (wwwscout.cs.wisc.edu) is one of the oldest and most highly regarded e-newsletters introducing new internet resources. Residing at the University of Wisconsin, the Scout Report describes research, education & topical sites. The Scout Report Signpost provides a quick search of previously featured sites.
BUBL (www.bubl.ac.uk) is a British site which reviews internet resources then indexes by Dewey decimal number. I prefer their Dewey presentation but the collection is not large (though the largest of the library projects I have seen).
The Argus Clearinghouse (www.clearinghouse.net) is a vast collection of internet guidebooks. We can search the titles & descriptions, but then click on the highlighted keywords to find related guides. I suspect Argus is not successfully keeping pace with internet development.
AlphaSearch (www.calvin.edu/library/searreso/internet/as/) is similar to Argus. This one indexes important nexus sites and should be browsed.
The Britannica.com (as in Encyclopedia Britannica www.britannica.com) has been remolded as a free guide to books, periodicals, web and their encyclopedia. This encyclopedia is perhaps the best.
FAQs can be searched from an FAQ database like the one at www.faqs.org.
WebRings list sites by topic. Each webring is maintained by a volunteer at an uninvolved site using standard software. The primary sites are currently Webring.com and bomis.com.
For issues with a particular government, url or language origin, consider using tools designed with this in mind.
* Altavista can be limited to specific domains (gov edu au) with their "domain:domainname" field search. "url:url-segment" is also useful. Read the Altavista Fancy Features for Typical Searches.
* GovBot (ciir2.cs.umass.edu/Govbot/) as developed by The Center for Intelligent Information Retrieval (CIIR) is a search engine which indexes exclusively a great number of government webpages, a unique resource.
* Altavista also allows for a field search by language. Searching for a Japanese site? Consider searching only webpages in Japanese.
* Purely regional search engines may also be the answer. Aussie.com.au, for example, is a search engine indexing only Australian websites. There are fine lists of regional search engines and directories like SearchEngineCollossus, Search Engines WorldWide, SearchEngineWatch and Yahoo.
* Topic-specific search engines, a new arrival, have a very promising future. Ideally you will find a search engine like ChemGuide (www.fiz-chemie.de/en/datenbanken/chemguide/) covering over a million chemistry related pages. Search Engine Guide (searchengineguide.com) and Gary Price's Direct Search (gwis2.circ.gwu.edu/~gprice/direct.htm) list topical search engines.
* Lastly, there are some commercial databases aimed at the software and internet industries. Consider OCLC's NetFirst (articles from magazines describing the internet).
For many of us, searching the web is simply typing words into a search engine. I hope I have shown there is more to it than this. What may not be clearly evident from a brief overview of resources is that each resource has a particular difference, a particular focus, a particular angle that helps us answer certain questions faster than other tools and searches.
Yes, if in the simple world of Yahoo and Altavista you pay no attention to the specific differences between alternatives, you are left with the worst of these two tools. Your results are general, timeless and imprecise.
Contrary to myth, global search engines are not the best place to start most of the time - just some of the time. On other occasions, start with a directory, a meta-search engine, a guide, an FAQ... We should be able to identify which tools excel at locating what kinds of webpages. (There is no simple search of everything.)
There are more insights into effective internet research. Information clumps: information is not established in isolation but instead develops in context, is reinforced, and becomes a trend. The publishing motivation & promotion purpose can help us rapidly judge the content of a website. The webpage address can tell us a great deal about both the website structure and the type of publisher.
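To show how much a webpage address gives away, here is a rough Python sketch. Treat it as a set of assumptions, not rules: the table of domain hints, the function name describe_address and the idea that a path segment starting with ~ marks a personal page are all my own simplifications of common conventions, and plenty of sites break them.

```python
from urllib.parse import urlparse

# Assumed hints only: common domain conventions, not guarantees.
PUBLISHER_HINTS = {
    "gov": "government",
    "edu": "educational",
    "ac": "academic",
    "org": "organisation or association",
    "com": "commercial",
}


def describe_address(address):
    """Guess the type of publisher and the page's place in the site from its address."""
    if "://" not in address:
        address = "http://" + address  # addresses are often quoted without a scheme
    parsed = urlparse(address)
    labels = (parsed.hostname or "").split(".")
    publisher = "unknown"
    # Read the host from the right, skipping labels (such as country codes) we do not know.
    for label in reversed(labels):
        if label in PUBLISHER_HINTS:
            publisher = PUBLISHER_HINTS[label]
            break
    segments = [s for s in parsed.path.split("/") if s]
    return {
        "host": parsed.hostname,
        "publisher_hint": publisher,
        "depth": len(segments),  # how far below the site's front page
        "personal_page": any(s.startswith("~") for s in segments),
    }


print(describe_address("gwis2.circ.gwu.edu/~gprice/direct.htm"))
print(describe_address("www.bubl.ac.uk"))
```

The first address reads as one individual's page two levels down on an educational server. The second reads as the front page of an academic site. Neither guess replaces a look at the page itself, but both are made before a single byte is downloaded.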
Once skilled, you can segment and search the most promising areas of the web quickly and efficiently. If you do not quickly find your answers there may be other, more appropriate resources. Consider asking for help in an appropriate discussion group, or reviewing printed literature instead. The Web is only one resource among many.
If your primary interest is Search Engines, consider reading A Higher Signal - To - Noise Ratio (www.dpi.state.wi.us/dpi/dlcl/lbstat/search1.html) by Bob Bocher & Kay Ihlenfeldt, Sink or Swim: Internet Search Tools & Techniques (www.lboro.ac.uk/info/training/finding/sink.htm) by Ross Tyner and The Search is Over (www.zdnet.com/pccomp/features/fea1096/sub2.html) by Adam Page. For even more, read Searching the Internet (wwwscout.cs.wisc.edu/toolkit/searching/) a publication in the Scout Toolkit and browse Search Engine Watch.
Searching the web is more skill than most of us acknowledge. The web is a manifestation of the demon that professional researchers work with all the time in the commercial information market. There is constantly the fear you have missed that single important site with everything. Consider the researcher's motto:
Someone, somewhere, probably knows the answer.
But how long do we search for gems, and where do we look? To decide, we must learn about internet structure and organization. Why is information published on the web? Why is it promoted? Let's review the reasoning behind effective internet research. There is so much more than putting words into search engines.
We can make some very astute generalizations about a webpage very quickly if we can judge the reason it was published. Not only is this an important step in analyzing any information, but this tells us a great deal about the contents of the webpage.
Yes, merely determining a site belongs to an association actually specifies the quality, motivation and type of information we will find.
Associations either publish what is termed 'brochureware' (promotional material), or if well advanced, present research work previously restricted to the association library: important research studies & the like. Commercial interests have much more difficulty delivering useful resources. The importance of projecting a corporate image comes first (lots of 'brochureware'), and service descriptions come second. On occasion, commercial interests will support a worthwhile service tied closely to their own service - thus banks present interest rates - bookstores present their book database.
The certainty with which we can make these judgments will astound you. Corporate websites never publish "changes to patent law". They simply don't have the motivation. Only an individual would publish this, most likely not on the web but through a mailing list.
Information is not distributed randomly. Consider Format, Preparation, Motivation and Promotion. Consider this, then Visualize the information you seek.
We can make further snap judgments about web information from the way we get there. Promotion is very difficult on the web, and it is hard to find poorly promoted information. The tools you use to reach information predetermine the type and quality of information you will find.
Search engines index webpages indiscriminately. Advertised websites must have a pay-off. Directories focus on established websites (not webpages). Link pages also link to established websites but put more thought into the selection of resources. Both usually focus on general sites. For specific or current resources, we need to move to mailing lists or active nexus points.
Yes, when we find a webpage through the Scout Report (a prominent resource discovery newsletter), we can assume the webpage has a high quality of information, is reasonably current and has a general appeal (within the interest of the newsletter readers).
Let's put this in reverse. If we are looking for a recent document by a prominent library committee, we will not find it through Altavista, Yahoo, or normal link pages (except accidentally). We may find it through specialist newsletters, active nexus points, or through mailing lists.
When an artist begins to paint, they visualize the image. They already have a concept of the finished result. Internet research is no different. We start by building a vision of the information we seek. Who would publish it? What is their motivation? Who would promote it? Where would I find it?
Information Clumps. Information is created, nurtured, develops, gets transplanted, gets arranged and becomes visible through a process which brings similar information together. Your understanding of this process, including motivation and promotion, must guide your search of the web. Only then will we know where to look, and quickly know if the answers are on the web.
Shakh was invited to travel with the army on the conquest of Nubia. The Egyptian army was not in need of further soldiers but there was a need for a witness. Shakh would write the official chronicles of the army's exploits. He would be expected to send a simple diary on papyrus back to the palace and then to compose numerous descriptions for memorial walls. He might also be consulted for paintings on the pharaoh's tomb. It was a fine offer, and he relished the prospect of increasing his value exposure.
The war was not swift, nor was it entirely one-sided. In the end, superior numbers had their effect and Nubia was once again reunited with Greater Egypt. Reporting was initially a challenge, since very little happened from day to day. Slowly, Shakh got a handle on the process and focussed on the grandness of the venture. Two years after floating upstream, Shakh was able to do his finest work, the parade of captured soldiers past the Pharaoh's representative.
News articles are typically light and biased. Do not believe a news item is a great critical analysis of current events. Most news is produced under time restrictions, for prompt consumption. In research, news often proves particularly useful for locating information about individuals or businesses. News is also critical in creating a timeline of events, in recording events of regional/national/international importance.
News prepared by individual reporters is collected together by large news organizations, then delivered to other news organizations around the world. Your local news organization does not have a reporter in Iran, but rather buys the story off a newswire, then packages it in your evening news hour or morning newspaper.
You have probably heard of: United Press International (UPI), Reuters Global News, Agence France Presse, Associated Press and Xinhua Chinese Newswire. These very large organizations make their information available to you in a variety of ways. News collects in commercial databases of past news, some single source, others, large multi-source databases. Current news is also packaged into large multi-source systems delivered by email or newsgroups. Many newswires are available online free of charge.
Critical to the changes on the internet is the emergence of free access to text news. Individual newspapers present news free. Newswires present news free. News sections to larger sites like Yahoo present news from many sources, free. News-only search engines will help you find information from a great many sites with news.
The process of finding current news is about as slick as imaginable. Here are a few players in the market:
* Yahoo News (www.yahoo.com/headlines/) is leading this field with web delivery of current news from Reuters, Associated Press, and others. Yahoo also includes a free search for one week's news.
* Voice of America Newswire (VoA and now voanews.com) delivers news in English & many other languages.
* The Washington Post (www.washingtonpost.com) offers their own current news for searching, as well as the Associated Press wire, each searched separately for the past week.
* Fox News (www.foxnews.com) presents current news online (both current events and sport news). CNN news (www.cnn.com) is another searchable site. Both repackage some newswires and present them online. C|news (www.news.com) does this too.
* Newsbytes (www.newsbytes.com) is a newswire solely on computer topics: computing, telecom and the online world. InternetWire and other specialty newswires also present news from their websites.
* United Nations Radio: The World in Review is one of many news shows with the transcripts online. Unusually, the Vatican's newswire is not free online.
* Obviously many more exist - and thankfully we don't need to create a list or manage the sources. The Spire Project has a clickable map of English language newspapers. There are definitive lists of global newspapers like Gary Price's (gwis2.circ.gwu.edu/~gprice/newscenter.htm#International), dailyearth.com and ipl.org/reading/news/.
The commercial segment of the news market is obviously being squeezed by the copious quantities of free news online. There are, however, still some viable markets, principally enterprise solutions (companies are willing to pay for slight improvements), past database access, and surprisingly the Wall Street Journal (US$49/yr).
Serving these markets we have Clarinet and Newspage. World News Connection is a US Government service presenting translated news (quite a gem) as a searchable database. Unusually, prices start at US$25/7days - yes, one price for the news!
Of course news alerts can be arranged from the commercial news databases through the database retailers. Newswires like Agence France Newswire, Canada Newswire, Xinhua News and Associated Press are each unique databases, and all stretch back many years. Further databases like Newswire ASAP and what used to be Global Textline are massive databases of multiple newswires and newspapers. I recall at one stage Textline had over 4 billion pages.
News articles are typically light and biased. Still, the sheer quantity of news in the large news databases makes this a useful resource to fall back on for any tightly focused research topic. I once discovered an obscure scientist working in a unique field from a small 3 paragraph article in a local farmer's newspaper in England (Global Textline Database).
Newswires and News Databases are just two elements of a large industry which extends to your local newspaper and to further specialty databases. Most newspapers maintain their own local news database, and some make this available electronically. A manual clipping service may also be an option - certain firms manually page through local papers looking for advertisements or articles.
While on the topic, certain newswires like Business Wire and PR Newswire essentially distribute certain types of news for money. Yes, anything in these newswires is there because the company paid for it to be there - $500 and up most likely. Other newswires earn money in the reverse process: from the media who read or publish their work. Associated Press or Reuters are created from news organizations. Others like Voice of America (VOA) are alternatively funded, but with reasonable reliability.
There is also a range of focused newswires such as Newsbytes (computer issues), PR Newswire (product releases), and Middle Eastern newswires. Further newswires can be found at Yahoo.
I can think of four ways to use this information for research:
1) As an alternative to your evening news or morning newspaper. Online news is available 24 hours a day, in more detail, from respected news organizations.
2) Search past news to locate information unlikely to emerge in journals or magazines. News includes a great deal of local detail and personal information unlikely to be found elsewhere.
3) As a historical record of events, perhaps the basis of a timeline.
4) Current Awareness and Alerts so articles come to you as they are reported. News stories by email will become a large industry over the next two years.
Just how inexpensive can news become? US$25 gets you access to past translated news! VoaNews.com keeps a searchable directory back a month for free. Many newspapers still have extensive archives of news, though they hope to one day charge for them. In a way, no-one is making money from news. It is only worth the advertising revenue for distracting you from reading the news - and that is falling too. With the freedom of moving information through the internet, several free services will send you email when a news article matches your interests (an Alert).
The future will see much more "compile your own" newspaper - especially since it could conceivably be compiled at minimal to no expense depending on the technology (frames anyone?). An intriguing lawsuit recently stopped TotalNews (a news only search engine) from displaying news articles in a frame.
If allowed to speculate for a moment, News-for-Pay may also become a viable business. Perhaps this is just being cynical of journalistic standards and the accepted standards of promotion. Perhaps it is also recognition that Business Wire and PR Newswire are just two of several newswires where you pay to have your news included. Obviously news today is biased towards advertisers (through advertorials) and promoters. Perhaps this will become automated some day - like Yahoo's "we will look at your site right away for $200".
Naturally, the links and many of the forms to news resources discussed here can be found at spireproject.com/newswire.htm and also our All-in-one page: spireproject.com/spir.htm
Theses and dissertations are professional papers completed for higher degrees. That is to say, they are long, dense and often very esoteric and convoluted. Trouble is, most theses and dissertations have no more than 12 copies ever - one always to the University Library, one with the author, but others scatter to the wind.
All University Libraries hold a copy of past theses undertaken at their university. This gives rise to the unfortunate but necessary pastime of searching each local university library for relevant theses. The advantage here is masters and occasionally honours theses are indexed. Most often, just undertake a keyword search then add "thes*" (truncation of theses or thesis).
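Truncation is easy to picture with a small Python sketch. This is only an imitation of what a catalogue does: it uses the standard fnmatch module to match a truncated term such as thes* against each word of a record, and the sample records are invented placeholders, not real catalogue entries. The function name matches_truncation is hypothetical.

```python
from fnmatch import fnmatchcase


def matches_truncation(term, record):
    """True if any word in the record matches a truncated term such as 'thes*'."""
    pattern = term.lower()
    # Lower-case each word and strip surrounding punctuation before matching.
    words = (word.strip('.,;:()[]"').lower() for word in record.split())
    return any(fnmatchcase(word, pattern) for word in words)


# Invented placeholder records for illustration only.
records = [
    "Soil salinity in the wheatbelt [thesis]",
    "Collected theses on landscape design",
    "The synthesis of polymers",
    "A thesaurus of geology terms",
]

for record in records:
    print(matches_truncation("thes*", record), "-", record)
```

The first two records match, and the third does not (synthesis does not begin with thes). The fourth matches too, which is the price of truncation: thes* also catches thesaurus, so expect a few false hits to weed out by eye.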
Electronic Theses Databases:
Dissertation Abstracts Online, produced by UMI, delivers abstracts to most every doctoral dissertation/thesis in North America, some master's theses and some international theses. This is the definitive site to search, though you will need the help of your library to see more than the abstract. Some libraries will have subscribed to Dissertations Abstracts OnDisc - the CD-version of this database.
The [British] Index to Theses with Abstracts is a print directory by ASLIB. This publication is also available as a database, available for site licenses through Theses.com (www.theses.com). This source is quite comprehensive as can be seen with the University List.
Several other national databases do exist. Here in Australia, a list of theses was maintained from 1966 to 1991. The Gale Directory of Databases also lists THESA, a database of French theses, and Dissertations and Theses of the ROC (Taiwan).
The Australian Education Index (1978+), produced by ACER (Australian Council for Educational Research), is a directory listing citations and some abstracts to Australian work in education. Also available as a commercial database, AEI is bundled into Austrom, a common collection of Australian databases.
Digital Archives of Theses
In theory, some theses should be available on the internet, particularly theses lodged electronically. There is a push for universities to accept electronic thesis submission, and to build digital archives of theses. The embryonic National Digital Library of Theses and Dissertations (NDLTD - www.theses.org) is just one such project. There is a distributed and sequential keyword search of participating universities, though it is not particularly functional. In theory, this is an incremental improvement on searching library catalogues.
Getting a thesis can be very difficult. You will need the help of a document delivery service through a library, and many theses will not be available to borrow. You can also buy theses. Read Obtaining Copies of Dissertations (www.library.yale.edu/ref/err/disscops.htm) by Yale University Library for more. For an alternative look at theses, consider Locating Theses (www.lib.monash.edu.au/hss/guides/fstheses.htm) by the Monash University Library.
A note on developments in this field: some theses abstracts are emerging online already. Projects like the LA Theses Database (Landscape Architecture Theses Archive) have much promise but poor coverage. Full-text thesis presentation also has promise, with the US Department of Education funding a National Digital Library of Theses and Dissertations and Virginia Tech starting to request electronic submission of all theses.
UMI (the producers of Dissertation Abstracts Online) has backed this move with a direct delivery service of electronic theses to US libraries for $26, but only theses held in their digital archives are available. Eventually, large digital Theses archives will be the norm, but until then, very little will happen in this field.
A thesis is a tightly constrained information package, produced in the university environment with limited appeal. For economic reasons, we should not be surprised theses databases are incomplete. The emergence of theses archives sounds interesting - a good use of the internet - but does not represent a financial opportunity that could be explored without government assistance. Consequently, this small area of the information sphere is government grant-driven.
A patent discloses certain facts about a commercially important invention in exchange for certain rights to exploit the invention. This is a little simplistic, but explains why patents are factual, unique from other research resources, and a little vague in certain specifics. If you have never seen a patent before, see a sample US patent, Australian patent, and this brief description (www.ipaustralia.gov.au/patents/P_home.htm).
There are three primary resources involved in patent research. Firstly, we have the free internet resources. Secondly, we have the national patent agency resources. Thirdly, we have the commercial patent databases.
Free Patent Databases
The time for free patent databases has surely come, and while many countries are only slowly moving in this direction, the movement is inevitable.
* The US Patent and Trademark Office (USPTO) provides a US Patent Bibliographic database at patents.uspto.gov with full use of fields, date and abstract text searching. Choose between their Boolean search, advanced (field) search or by US patent number. They also maintain a fulltext [US] Aids Patent Database and other resources.
* IBM's Patent Server is a public service providing a different patent database of US Patent abstracts. The IBM service is similar to but distinct from the USPTO service - certainly not less powerful.
* The Canadian Intellectual Property Office (CIPO) maintains the Canadian Patent Fulltext Database from '89. This database is on par with the US Patent Database, with perhaps even better searching technology.
* The Japanese Patent Office (www.jpo-miti.go.jp) has a searchable database of Japanese patent abstracts, including patent number, title, inventor, company, and abstract of the patent.
Patent Authority Services
Patent libraries are an important and cost-effective patent resource.
* IP Australia (www.ipaustralia.gov.au) (formerly the Australian Industrial Property Organisation (AIPO)) has a patent library in each Australian state capital. Each library provides free access to the APAS database (Australian Patent Abstract Search) and includes a complete microfiche copy of all Australian patents and the Australian Official Journal of Patents, Trademarks & Designs (the official Australian patent gazette).
Most offices also hold US Patents on microfiche! Staff will help you use the APAS database, arranged for free text searching by International Patent Classification. A particularly useful service by IP Australia is the delivery of copies of many foreign patents for AU$15. You will need the patent number, country and title for this.
* The US Patent and Trademark Office (USPTO) has the Patent and Trademark Depository Library Program (PTDLs), placing the CASSIS database (the USPTO patent abstract database on CD-ROM) and US patents in libraries around the US.
The US patent libraries also hold the Official Gazette of the U.S. Patent and Trademark Office, the official US patent gazette. Importantly, the gazette is fully online and searchable from 1995.
* The [UK] Patent Office (www.patent.gov.uk) provides for the Patents Information Network (PIN) which hosts patent information in the UK. The British Library is just one listed source of UK patents (further information online) and delivers some patent services.
* The Canadian Intellectual Property Office (CIPO) (cipo.gc.ca) produces the Canadian Patent Index (CPI). They also publish The Patent Office Record, Canada's official patent gazette.
* There are many more national & international patent organizations like Institut National de la Propriete Industrielle [France], World Intellectual Property Organization (WIPO) and European Patent Office. Thankfully there are fine lists of patent libraries and patent websites.
Commercial Patent Services
One of the most invaluable resources in serious patent research is access to several of the very large commercial patent databases.
* Lexis-Nexis (www.lexis-nexis.com) retails several patent databases. Thanks to Patscan (University of British Columbia), we also have a guide to searching patents on Lexis-Nexis.
* The Dialog Corporation (www.dialog.com) retails a collection of patent databases including: Derwent World Patents Index, Inpadoc, Claims/U.S. Patents and European Patents FullText.
* CASSIS is the USPTO database. For a little more information on this, consider the Patent Guide to Using CASSIS, at the University of Michigan.
* Derwent Scientific and Patent Information (www.derwent.co.uk) is a prominent publisher of patent and scientific information including commercial databases.
* Questel-Orbit (www.questel.orbit.com) also retails patent databases.
* CAS/STN (www.cas.org) retails a collection of patent databases including Chemical Patents Plus for U.S. Chemical patents.
In addition to the database retailers and producers, there is a lively industry of patent services.
* The Patent Libraries will assist you with some services. IP Australia, for example, will retrieve most full patents from other countries for AU$15.
Until recently, the legal profession has had a complete monopoly on patent work. As you can see, this need no longer be the case. Casual researchers will find the free patent databases easy to use, and more experienced researchers should not be dissuaded from searching the commercial databases or patent libraries themselves. The very large commercial databases, like Inpadoc, are particularly easy to use.
Of course, there are occasions when patent searches are critical, and experts should be sought. Certainly legal assistance is required if you are preparing to lodge your own patent, but patent data as a source of information is another matter.
As an industry, patent research is still deeply entrenched in the high-price commercial database and database-centered services. I am mildly surprised the emergence of free databases like the USPTO's patent database has not led to a fall in the costs of the high-end databases (which remain some of the most expensive databases publicly accessible). It appears this industry, as indeed several others, has no intent to drop the price of retail database access to a more supportable level. I can only predict this rests on economic grounds: patent information purchases are price insensitive.
Statistics allow us to lie with confidence. Dense and factual, carefully interpreted statistics are also far more reliable than personal experience. The expense of collecting meaningful statistics limits the types of organizations involved in this work. This divide is also a very elegant way to divide this field.
#1 National Statistical Agencies,
#2 Government Agency Statistics,
#3 Commercial Statistics,
#4 Association Statistics.
Statistical Abstracts (statistical bibliographies and statistical directories) describe sources of statistics.
Instat publishes "International Statistics Sources: subject guide to Sources of International Comparative Statistics" but I found this less than brilliant. A better link is Statistical Sources (by Gale Research), a basic and very large statistical abstracts directory.
On the internet, US government statistics are well recorded in Statistical Abstract of the United States 1999 (www.census.gov/stat_abstract) a 1000+ page document made available online in pdf format by the US Census Bureau.
Many statistics appear regularly in journals, annual reports and newspapers. Specialty libraries, particularly specialty librarians, may be aware of additional statistics.
If an expert goes through the effort to collect statistics, you are far more likely to locate them by undertaking an article search (looking particularly for journal articles) and a book search. In both cases, limit your search to only the last couple of years or you will locate very old, dated statistics. A particularly sophisticated approach could be to ask BusLib-l (Business Librarians' Electronic Discussion List) since this is a mailing list of librarians. Use this resource sparingly, and only after having exhausted other avenues.
National Statistical Agencies
Most every country in the world has a single government agency dedicated to collecting, collating and publishing national statistics. Statistics Canada, Australian Bureau of Statistics, The US Census Bureau, The (UK) Office for National Statistics; we have a fine page on national statistical agencies (spireproject.com/bureau.htm).
These organizations manage the census, watch the movement of money and goods in and out of the country, and undertake a wide range of other surveys. Finding these statistics is relatively straightforward, with several directories on the internet.
Government Agency Statistics
Most government agencies collect reams of data on the industries they monitor. Sometimes these statistics are published, sometimes you have to ask for them, only rarely are they considered private or unavailable.
Here in Western Australia, the government departments for Tourism, Labour, Small Business and Big Business all publish top-rate statistics free to interested parties. Our Dept of Tourism keeps a directory of future tourism related projects.
When government statistics are bound and published, try the government book databases. Remember MOCAT, AGIP and part of UKOP are free online. Again, some US government statistics are well recorded in Statistical Abstract of the United States 1999 by the US Census Bureau, online in pdf format.
Valuable statistics only come from motivated sources, and associations are certainly motivated. Start with a list of likely associations, then call up and either explain your needs or ask for their price list for publications and statistics. For AU$25, the Australian Booksellers Association publishes a brilliant analysis of the book industry. Association statistics are financially informative, as the intended audience is association members.
Statistics created for sale are frequent in the financial sector but exist in a number of further situations. Banks use more professionally prepared market reports such as reports by the Australian economic consultancy firm Syntec Economic Services, Guide to Growth, which examines Australian industries financially with forecasts. IBIS (www.ibis.com), another economic consultancy, also publishes to this market.
Professionally prepared market reports are also emerging, with the full text immediately available from the commercial information market. Each database retailer has several such databases, but often these databases are focused globally or on a different country. Sheila Webber (www.dis.strath.ac.uk/people/sheila) has a very good list of firms which market research reports.
Central to the Internet Revolution is the liberation of just this kind of information. Increasingly, we will see the publishing of such documents on the internet, but for the few statistics currently online, there is no effective search. You can only browse government websites. Away from the internet, you must either contact the agencies directly (in the hope they do collect statistics), look at the statistical directories or seek agency statistics in other documents: books, pamphlets, newsletters.
Once you have proceeded this far, it is wise to stop looking for statistics, and begin again at sophisticated commentary - which is likely to include supporting statistics or references to statistics anyway. Seek expert guidance from others who would know of hard-to-find statistics.
One approach to finding statistics is to reverse the process. Who would prepare the statistic? Statistics are created in a logical manner, in a very expected manner. Tourism statistics? - most likely undertaken by either the government tourism authority, a tourism association or the national statistical agency. There are few others who could even consider preparing tourism statistics. If you can think through the preparation process, you can usually identify who would have created the statistic. (Internet statistics are the exception - too many organizations are creating statistics of worth.)
Let's move on to specific fields of statistics.
National Statistical Bureau
The Spire Project has a fine html article on the National Statistical Agencies (spireproject.com/bureau.htm). Australia (www.abs.gov.au), United Kingdom (www.ons.gov.uk), Canada (www.statcan.ca) and United States (www.census.gov) all have national statistical agencies. Each organization collects and publishes statistics on many facets of their respective countries. This article should simplify your work in searching, selecting and appraising these sources.
Each statistical agency organizes its statistics in a distinct way. The Australian Bureau of Statistics (ABS) has an annual Catalogue of Publications but also a search function, specialized statistical category guides and several periodicals on new resources. The UK Office for National Statistics (ONS) has a statistical overview, product catalog and a search. The US Census Bureau has a collection of very large publication catalogues, directories and periodicals. Statistics Canada has several searches, publications and a catalogue.
The two further elements to the statistical agencies are the statistical libraries and the unreported commercial statistics. The ABS has a dedicated statistical library within each Australian state, and collections of ABS documents within most public and school libraries. While the ABS documents within libraries are limited, the ABS libraries are very detailed with most every publication they create available for review. This is standard throughout the world.
While publications are sold by each statistical agency, and the publication catalogues are available online, each agency has data they sell in other formats. CD-ROMs of popular geographical and statistical distribution have become very popular, as have small area population statistics. Some of these services are packaged and sold for specific purposes, like 4-site by the ABS used in describing business locations. Even further, statistics can be generated specific to your needs. This might include ABS import and export statistics for specific commodities, or specific results from any of their surveys.
Lastly, Usinfostore.com presents a collection of economic indicators as time-series data. The statistics originate from several government agencies, and the service is best considered a value-added service: an intriguing beneficial trend?
National Statistical Agencies are certainly not the only source of statistics. They are, however, some of the easiest to access. These agencies also have several traits that distinguish them from other information sources.
Firstly, these agencies are legally required to disguise their statistics to protect the identity of specific businesses and individuals (with the exception of the Business Register). If there are only one or two timber exporters in Western Australia, the ABS will not give you timber exports from Western Australia. Specifics are found in directories like Kompass, commercial databases, or insider information (experts and articles by experts).
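The effect of this confidentiality rule can be sketched in Python as a simple small-cell suppression. This is a minimal illustration only: the threshold of three contributors, the function name suppress_small_cells and the export figures are all assumptions invented for the example, and the actual rules applied by the ABS or any other agency are not described in this FAQ.

```python
from typing import Dict, Optional, Tuple

# Assumed threshold for illustration; real agencies set their own rules.
MIN_CONTRIBUTORS = 3


def suppress_small_cells(
    cells: Dict[str, Tuple[int, float]],
    min_contributors: int = MIN_CONTRIBUTORS,
) -> Dict[str, Optional[float]]:
    """Withhold (return None for) any figure built from too few businesses.

    Each cell maps a label to (number of contributing businesses, value).
    """
    return {
        label: (value if count >= min_contributors else None)
        for label, (count, value) in cells.items()
    }


# Hypothetical timber export figures (count of exporters, AU$ million).
timber_exports = {
    "Western Australia": (2, 41.0),
    "Tasmania": (9, 130.5),
}

published = suppress_small_cells(timber_exports)
for state, value in published.items():
    print(state, "not available" if value is None else value)
```

With only two exporters behind the Western Australian figure, publishing it would let each exporter work out the other's trade, so that cell is withheld while the Tasmanian figure is released.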
Secondly, national statistical agency data has a tendency to be old. Most surveys are not completed annually, but rather every two, three or more years. Census data is older still. The analysis process also adds a delay. The ABS tends to take a year or more to collate and analyze statistics. For Legal and Accounting Services Australia we have '92-'93 statistics, and the '95-'96 statistics are due to be released early Nov 1997. Certain statistics like National Indicators are rapidly produced, but most are not.
Thirdly, national statistical agency publications are detailed - far more than most statistical publications. Commercial statistical sources often neglect supporting information like sample size and demographic breakdown, but expect these publications to include this and more. Publications may still require further analysis, and may occasionally come from inferior sources of information, but they are professionally delivered.
There are several ways to search each agency:
(1) Each agency has thoughtfully provided their catalogue of publications online. The links are above.
(2) Each agency collects certain information for analysis. It is helpful to become familiar with the various surveys and information sources used by each agency.
Besides the Census, the ABS conducts surveys of weekly household expenditure, agricultural land-use surveys, R&D surveys, and periodic surveys of various segments of the economy (like Legal and Accounting Services, Australia 1992-93). They also collect landing cards (tourism information), export and import documentation, regional hotel occupancy rates and more. Each statistical agency is similar.
If the Australian Bureau of Statistics (ABS) has not yet conducted a survey of hospital occupancy, they will not have this information.
(3) Agencies publish guides to information on a particular topic. They also publish various newsletters of recent releases and annual yearbooks too.
National Statistical Agencies are not the only source of statistics, nor necessarily the best. They are, however, often the best source for demographic data, widely used by government and frequently re-published in other government documents. These agencies also provide a range of sample and national summary data directly from their websites. Online statistics have not yet been organized, so I rather expect browsing the website for free information will be unwise, unless you are looking for simple national data.
At the successful completion of his work in Nubia, Shakh was invited to travel to Babylon as the assistant to the new ambassador. It had been many years since Egyptians were in official contact with the residents of the two rivers. All trade had been conducted through the Phoenicians living along the Mediterranean coast. With these cities captured by the Assyrians, new trade links were needed.
The journey took much longer than Shakh had expected. Leaving Egypt in a simple boat, it took many months to reach the shores of Lebanon, where the tall cedar trees grew. These trees, essential to crafting fine sea-worthy ships, were just one of the items sought by the Egyptians.
Within two weeks of their arrival in the Assyrian capital Nineveh, the Ambassador fell ill and died. Without guidance, 18 months journey from Egypt, Shakh stepped into the position.
His first task was to gather information both of the officials best to approach, and of Egyptian goods most likely to interest the Assyrians. With few local contacts, Shakh set about building connections with other governments, dining with export officials, collecting information about how other governments had succeeded and failed in their trade requests with the Assyrians. Shakh knew success would depend on approaching the most practical of officials while delicately side-stepping the wishes of the officials who threatened, or felt threatened, by Egypt.
While it may be practical to divide all information into a collection of formats, information is also organized by others for our benefit. Libraries, commercial databases, journals, information archives, each of these venues will assist you to find particular information. The information is already gathered together, classified and organized for your benefit. As a skilled researcher, you must be proficient in finding information from these resources.
"The United Nations is involved in every aspect of international life - from peace-keeping to the environment, from children's rights to air safety. ... The UN system generates an enormous amount of information on some of the most pressing issues the world faces ... press releases, video and photographic footage, publications, briefing papers, etc."
Samir Sanbar, A Guide to Information at the United Nations.
United Nations documents are a recognized authority for any number of international issues: social, legal and political. You certainly will not be chastised for quoting United Nations statistics. Critical to research, the UN is a collection of almost autonomous organizations (called organs) with occasionally overlapping responsibilities, distinct websites, and recorded as distinct publishers. As you approach UN information, remember this is not a monolithic organization with clearly defined roles. All drug efforts are not coordinated by the UNDCP and all statistical work is not undertaken by the UN Statistical Division.
UN Internet Resources
The UN website at www.un.org is just one entry point to UN information. Of note, it contains a searchable archive of UN press releases stretching back to 1995, 7 days of press briefings, an archive section and information about UN publications. The real tool to use is UNIONS (www3.itu.int/unions/search.cgi), a meta-search engine for many of the larger UN organ websites.
UN Library Resources
The UN is an accomplished publisher, though their sales list is not particularly large. It is just that anything they do publish is of a very high standard. Many documents are generated by the numerous meetings and efforts, so there is a second style of publishing, called Masthead or UNDoc documents, that are usually just photocopies. UNDoc documents are found in a collection of UN depository libraries around the world. (There is a good list at www.un.org/MoreInfo/Deplib/). Thus we have the UNDoc primary source documents and UN Sales Documents, given a sales document number and sold and shelved in libraries as books.
S/1997/742/Add.1, Report of the Secretary-General on the situation concerning Western Sahara: a brief breakdown of the estimated costs for completing the voter identification process in Western Sahara.
Other documents have wider appeal...
E.96.I.5, The United Nations and the International Tribunals for the former Yugoslavia and Rwanda - UN Blue Book Series
S/1997/742/Add.1, Abortion Policies: A Global Review, Population studies No. 129: A three volume, 650 page country-by-country look at abortion.
You can use the US Library of Congress Online Catalogue for a good approximate search of UN Sales documents. A search of UNDoc documents requires one of three comprehensive databases, like UN-Bis Plus, though you can also get the numbers to specific documents through UN periodicals like the Yearbook of the United Nations and the United Nations Chronicle.
With 300+ shelves of UN documents at depository libraries, the UNDoc files are excellent records of history. The UNDoc Current Index (ceased publication in 1996) is an extensive quarterly directory (of the non-cumulative kind) just for this purpose.
Further tools are available to help the dedicated searcher, like focused indexes and an annual list of current sales documents (also online).
Trouble with Age
United Nations publications do suffer time lags. The best documents appear well after the curve of public interest. Primary UNDoc documents will take up to 6 months before becoming available at a UN depository library and the Sales Documents are compiled after this. On the positive side, UN archives frequently extend back to the 1950s.
The UN has existed since 1945. The systems established to manage and distribute access to UN publications are at once both highly sophisticated and out-of-date. It is truly amazing to see 300 shelves of UN documents (a very big room mainly filled with stapled photocopies).
At the same time, it is only a matter of time before the whole concept of the UN depository library is translated online. There are such potential savings (there are 359 depository libraries in the world but the UN pays for one in each country) and such an improvement in access.
All the links and a few of the forms for searching UN information reside at spireproject.com/un.htm
We pay a high price in both direct and indirect taxes for our government. These are intelligent people, paid to be informed. Government experts and documents are thus generally detailed, factual and reliable ... and helpful. It should not surprise you that government documents have a high quality, but tend to have a little problem with timeliness.
Central to finding government information on the web is the way the clear organizational structure is replicated online. Each country will have a primary website with links to the websites of each national government department. Each state will have a primary website with links to the websites of each state government department. Each department website will link to all sub-departments. If you wanted to see the website for the New Zealand statistical agency, just visit the New Zealand government website, then look for the statistical agency. If you wanted to see the website for the Mississippi government agency responsible for childcare, just visit the US government website, find Mississippi, then look for an agency that might be responsible for the family, then keep clicking till you find the page you need.
With a little more maturity, many corporate websites were redesigned to present answers as they are needed by the visitors - instead of having marketing, accounting and distribution directories, websites were rearranged to have sections for customer sales, investor relations and distributor relations. Government websites have begun the transformation too, with websites serving the perceived needs of visitors. Clever sites will present both structures but some will have an alternative structure linking you through to the agency website.
* There are two fine internet directories of international government websites, one by the University of Michigan Documents Center, another by the University of Southern California.
* There is a specialized, government-only webpage search engine called GovBot as developed by The Center for Intelligent Information Retrieval (CIIR). Altavista and All-the-Web also let you restrict a large global search to a specific domain. This allows you to search just for .gov sites.
* Government Publications are effectively organized in a national publication database. The US MOCAT database (Monthly Catalog of US Government Publications), the Australian AGIP (Australian Government Index of Publications) and the United Kingdom Stationery Office publications list are all free online.
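The idea of restricting a search to a specific domain, mentioned above, can be sketched in Python. This is not how Altavista or All-the-Web work internally; it simply filters a list of result URLs by host name suffix. The function name in_domain is invented for the example, and the URLs are ones already mentioned in this FAQ.

```python
from urllib.parse import urlparse


def in_domain(url: str, suffix: str) -> bool:
    """Return True if the URL's host is the given domain or falls under it."""
    host = (urlparse(url).hostname or "").lower()
    suffix = suffix.lower().lstrip(".")
    return host == suffix or host.endswith("." + suffix)


# A few result URLs taken from elsewhere in this FAQ.
results = [
    "http://www.census.gov/stat_abstract",
    "http://www.abs.gov.au",
    "http://spireproject.com/bureau.htm",
]

# Keep only US government sites.
gov_only = [u for u in results if in_domain(u, "gov")]
print(gov_only)

# Other countries use their own government domain, such as gov.au.
gov_au_only = [u for u in results if in_domain(u, "gov.au")]
print(gov_au_only)
```

Note that a plain .gov restriction finds only US government sites; the Australian Bureau of Statistics sits under gov.au and needs its own restriction.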
For information not available, many nations permit Freedom of Information (FOI) requests. This essentially forces government agencies to release information they cannot justify keeping secret. FOI requests may cost you a token fee (and it is often less for members of the media). The Electronic Frontier Foundation (EFF) maintains a good FOI archive (www.eff.org/Activism/FOIA/), as does the Society of Professional Journalists (spj.org/foia/index.htm).
Commercial databases are simply collections of information presented electronically. Databases range in size from simple books made searchable, to several billion records in the larger news databases. The retail database industry is obscure. Costs are highly variable and difficult to determine in advance. Products with the same name may contain different information. Databases are frequently combined into larger collections of databases (also called databases), often several times, so an individual magazine or database may exist within several databases and several collections.
Within this confusion is a collection of definitive, must-search databases. Definitive databases are determined by successful marketing. Not necessarily the 'best', nor most useful, but the market-successful become definitive resources. From there, success breeds further value. Such databases will be invaluable in your search for answers. More discussion on the database industry can be found in section 9 of this FAQ.
At the edge of the database industry are a number of prominent databases that have emerged as free databases, delivered over the internet directly from their source. Look briefly at some of these databases:
* ERIC (Education Resources Information Center) is presented by the [US] National Library of Education. Established in 1966, ERIC is one of the cornerstone databases for the education field and provides citations & abstracts to education-related literature.
* CRIS (Current Research Information System) is produced by the US Dept of Agriculture (USDA) and includes Canadian, USDA, and Czech agriculture, food and forestry research. Projects sponsored by these or affiliated agencies are included.
* Agricola is produced by the [US] National Agricultural Library and its cooperators. This is an important bibliographic database covering agriculture and all the related disciplines (including forestry & agri-business & alternative agriculture). Started in 1970, this has become an important database limited only by its bibliographic nature.
* Thomas, presented by the [US] Library of Congress, delivers US legislative information (including Congress, Representatives, Senate & the many committee reports).
* EDGAR, produced by the [US] Securities and Exchange Commission, delivers all public US company submissions as required by law. The information is factual and numerical - and includes both current and past submissions.
* MOCAT, UKOP and AGIP are the US, UK and Australian government publication databases.
* The Library of Congress, The British Library, and The National Library of Australia card catalogues can be searched online.
* Medline is produced by the [US] National Library of Medicine and delivers references to all areas of medicine (including nursing, dentistry, nutrition), with some abstracts.
* The United States Department of Energy (DOE) publishes The DOE Information Bridge, a database with full-text and bibliographic records of DOE-sponsored research and development. Covers research projects in energy sciences and technology.
* BIOGRAPHY(r) Online is published at www.biography.com and includes 15000+ biographical abstracts - but most are really really short.
For more free bibliographic databases, I strongly suggest you read Bases de données gratuites (urfist.univ-lyon1.fr/gratuits.html) by Jean-Pierre Lardy. This directory has over 200 entries! Use the Altavista Babelfish to have a look at it.
Gale Research produces the Gale Directory of Databases (in 2 volumes). This is the definitive listing of databases in the world, for the moment. Most large libraries will have a copy. New editions are released every 6 months.
There are also smaller, more focused directories like Fulltext Sources Online published by Information Today or The Directory of Australian and New Zealand Databases by the Australian Database Development Association (ADDA).
You will access commercial databases through one of five basic sources.
1_ From a Commercial Database Retailer,
2_ From alternatively funded (free) internet sources,
3_ Through a Library or other venue with a site license,
4_ With the help of an Information Professional (searching for you),
5_ Directly from the source with a personal subscription.
Consider the Commercial Database Retailer as the department store of the information market. The industry is dominated by a handful of dedicated retailers like The Dialog Corporation, Lexis-Nexis, and InfoMart. Other retailers focus on certain types of databases.
Retailers select the databases they carry, and enjoy mark-ups in the region of 300% to 400% from which they provide customer service, support and promotion. So very much service and promotion is provided that these retail giants hold a pivotal role in the distribution of commercial databases.
The most important selection tool for databases is the database description. These are factual, accurate descriptions of what each database includes and how they can be searched.
Many of the database descriptions are online. To facilitate finding these, we have added links here and in other articles. Further descriptions may be available from retailer websites.
A list of database retailers follows.
* The Dialog Corporation (www.dialog.com), a merger of Dialog, Datastar and M.A.I.D. The largest database retailer by far, the databases are general.
* Lexis-Nexis especially carries full text and legal research databases.
* Questel/Orbit specializes in patent and technical science databases.
* EINS (European Information Network Services) appears to offer discount access to technical databases.
* Infomart Dialog (Canada) has Canadian coverage with many of the Dialog databases.
* FT Profile is the information wing of Financial Times (UK).
There are further database retailers specifically focused on the library market like OCLC's FirstSearch. Further databases are focused on business needs, like DowJones and Dun & Bradstreet.
In addition, there are always the individual databases which undertake the difficult task of retailing by themselves.
Databases are complex structures based on the inverted index and on a range of search technologies including Boolean terms, truncation, complex limits, descriptors, filters, ranking and more. Certainly the technology is becoming easier to use (look at the Reuters Business Briefing for state of the art), but there is still much to learn. An experienced searcher will locate far better results than a novice. However, an uninvolved searcher has a handicap, both in price and language. Sometimes it is wise to get help searching a database, sometimes it is not.
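To make the terms above less abstract, here is a minimal sketch of an inverted index with Boolean AND, OR and NOT, plus truncation using a trailing `*`. It is only an illustration: the sample records, the function names and the `*` convention are assumptions made for this example, and no commercial retailer's search language is being described.

```python
import re
from collections import defaultdict

# Hypothetical records, invented for illustration only.
RECORDS = {
    1: "Tuna exports to Japan rise",
    2: "Japanese import tariffs on fish",
    3: "Exporting timber from Canada",
    4: "Tuna fishing quotas in Australia",
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def build_index(records):
    """Inverted index: map each word to the set of record ids containing it."""
    index = defaultdict(set)
    for rec_id, text in records.items():
        for word in tokenize(text):
            index[word].add(rec_id)
    return index

def lookup(index, term):
    """Return ids matching one term; a trailing '*' requests truncation."""
    term = term.lower()
    if term.endswith("*"):
        stem = term[:-1]
        hits = set()
        for word, ids in index.items():
            if word.startswith(stem):
                hits |= ids
        return hits
    return set(index.get(term, set()))

def search_and(index, *terms):
    """Records containing every term."""
    results = [lookup(index, t) for t in terms]
    return set.intersection(*results) if results else set()

def search_or(index, *terms):
    """Records containing at least one term."""
    results = [lookup(index, t) for t in terms]
    return set.union(*results) if results else set()

def search_not(index, include, exclude):
    """Records containing `include` but not `exclude`."""
    return lookup(index, include) - lookup(index, exclude)

INDEX = build_index(RECORDS)
```

With these sample records, `search_and(INDEX, "tuna", "japan")` matches only record 1, while the truncated term `export*` picks up both "exports" and "exporting". A novice who types only the exact word misses the second record, which is one small reason an experienced searcher gets better results.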
The commercial database industry is shifting to use the internet as the preferred delivery vehicle. Considerable changes are coming too - not the least a tumble in the price of information.
Another change is a move towards full text databases. Some databases include only bibliographic information, many provide abstracts, but only a small fraction include full text. This will frustrate you deeply as full text databases are so very very convenient.
Researching databases is incredibly difficult and cumbersome. They challenge the mind, stretch far beyond the simple skills of searching the internet, and since every minute is expensive, there is much added pressure.
But this is a skill like any other. Practice with the databases of your local research university at an off-peak time (mornings are good), using the CD-ROM versions - learn on something free and not $2 a minute.
A database is a collection of anything - meaning a database blissfully passes on the chaos for us to deal with rather than presenting a more logical/understandable front like the web (humour intended). This character has also blurred the contours of a database. Most small databases are merely digested versions of small books and directories, often made available to you at 50 cents a page. Of course, large databases are just hard to conceive, let alone describe. Word-searchable libraries? World knowledge snapshots? Commercial information marketing firms go further and group similar databases together into massive multi-database topic searches with phenomenal power.
A Myriad of Databases
A primary difficulty comes from the sheer number of databases in existence today. To get a feel for the size of this industry, stop by a large library and ask for the Gale Directory of Databases Volume 1: the partially definitive listing of global databases. The absolute number will astound you. This also explains why some of us are so excited about internet development. Just making the existing databases more easily available will transform our society. The Information age is just starting.
All research is guided by the resources at hand. Most amateur researchers suffer because they have very few resources at hand (or think they do). Research is also guided by the budget, the time and perhaps the skill. When selecting research databases, try to be aware of three further factors:
Research here is easiest on Australian, British and American resources. This may be unfortunate or of little consequence, but does bear consideration. Many large databases are also large only because of their range of information. Which is better, searching 6000 magazines or 600 business magazines? That depends on the research topic.
There are many databases which can claim definitive coverage but there are many more which should be kept in reserve. Just like the internet, a researcher is not expected to look at everything relevant, just enough to get to the solution.
Global Textline was a database of phenomenal size, indexing text from over a hundred newspapers globally, reaching back many years. Australian Education Index (AEI) includes the contents of a small book of Education related theses abstracts. Each topic may only include 10 relevant theses over 5 years. Size is thus linked to database value. Searching Global Textline will always turn up leads. AEI will not.
Selecting a Database
Despite the factual nature of information research, word of mouth appears to be tremendously important in choosing databases. Some guides do describe the quality of various databases, and make valuable suggestions, but such guides also age rapidly as new products emerge. A rough understanding may emerge with practice. Our advice appears in other articles.
Mailing Lists, Newsgroups, Associations - each is a focal point of discussion, exchange of information and professional development. Sometimes called Special Interest Groups (SIGs), these are the original sources of many fine research resources. Brilliant research sites in their own right, a mailing list, newsgroup or association can also be a fine contact point for experts, or the site of focused, specialized libraries.
The copyright mailing list is a group of more than 100 lawyers who focus on copyright. This list, and their Copyright FAQ, are the best resources on copyright law in the world; current, factual, and peer-reviewed. This is not unusual for a mailing list. As a source of experts, I once found an accomplished but poorly published scientist from an old message in a mailing list archive.
Having said this, discussion groups are not organized for casual searching. Even when discussion is archived and searchable, finding and searching past discussion tends to be difficult. There is more to this resource than just asking a question but the other options are not simple.
* Tile.Net/Lists (tile.net/lists/) has a fine index of mailing lists.
* Liszt is the second place to look.
* The Directory of Scholarly and Professional E-Conferences, known also as the Kovacs Lists is third.
* subject guides listed in the Argus Clearinghouse also refer to relevant mailing lists.
Search several list directories for more rewarding results. Also keep in mind some lists have too little or too much traffic for your purpose. Find a list with a manageable number of messages and a wide enough membership. This takes a little effort in interrogating the list management software for the number of forum members, a look at past discussion, perhaps a look for supporting websites.
If you have a newsgroup reader, you have a file called news.rc on your computer which lists all the available newsgroups. List.com also has a searchable list of newsgroups. Duke University can help you find additional newsgroups that exist but that you must ask your ISP to bring in.
A more effective approach is to undertake a search of past newsgroup posts and select from the response a list of likely newsgroups to consider. Altavista allows searches of recent newsgroup messages. Deja.com has an even larger archive (to before March '95).
Another option is to search for an FAQ (like this one). Most summarize past discussion on successful newsgroups. The FAQ may be a brilliant informative document in itself, or the definitive pointer to further tools and resources. By virtue of their public origin, FAQs are far more likely to attract the peer review so often lacking from other resources. They are also open invitations to communicate with the knowledgeable FAQ maintainers.
* FAQs can be searched by title by sites like Oxford University and Universiteit Utrecht (Netherlands), or if you know a newsgroup, visit an html FAQ archive like the one at www.faqs.org
Associations are more involved than their internet companion. Associations are also more into paper publishing, conferencing and collating specialist statistics. As an example, the Australian Booksellers Association publishes the best benchmark statistics on this topic. When approaching an association, consider asking for their publications list.
Directories of associations are national directories. The [US] Encyclopedia of Associations is produced by Gale Research. The Directory of Australian Associations is the definitive Australian source. There are also the Directory of Associations in Canada and the Directory of Association of Asia.
Some association directories have emerged online, like Directory of the American Society of Association Executives. Unfortunately, the database is small & Americanocentric. A search for 'book' did get me the address of the American Booksellers Association, but not others. Of course if you have a name, you could also use a meta-search engine like Debriefing. Alternatively, the Library of Congress Online Catalogue allows us to search for association as an author.
There are three important research applications for mailing lists: 1) Research through past discussion, 2) Directly ask members for assistance, 3) Become a participative member to pick up and exchange information. On a personal side, mailing lists are easy to use and a minimal investment in time (the information comes to you). However, mailing lists are difficult to develop and maintain. Few reach the potential brilliance of this form of communication, so many of the forums you come across will be non-existent or on their death-bed.
Mailing lists depend on four vital ingredients - Content, Participation, IT-support, and Management. Often, one of these goes wrong and the forum dies. As a member, there are important obligations starting with participation, and ending with forum etiquette.
The better forums are private. Membership is not automatic, the list manager has more control, and often, more control and effort is expended developing interesting content and discussion. If you find a closed or private forum, persevere.
When a group of like-minded individuals come together to achieve an aim, they often create an association. What better place to research. Even better, associations often interpret their purpose as a place to pool and distribute information. Larger associations often maintain a small library of their own and many associations publish documents about their area of interest. Furthermore, if you are seeking an expert in a given field, associations are sure to have one, or two, or many. For the smaller associations, be polite but firm in describing your interest and be ready to buy whatever small book they do publish in your quest for further information.
An FAQ is created to enhance the discussion of a newsgroup. After a time, the initial members of a newsgroup would have discussed many of the standard topics to death, which newcomers will still find interesting. To prevent only discussing introductory topics (and annoying long-term members) an FAQ is created to record answers to standard questions.
Because one of the primary functions of a special interest group is resource discovery - and because FAQs are collectively created, they are valuable and generally reliable. I consider the Official Copyright FAQ the best document in the world on copyright law.
As an aside, many FAQs are also available as web pages. Trouble is, without a system to vet true newsgroup FAQs, you are far more likely to encounter FAQs which have not been vetted by the news.answers team. The Official Copyright FAQ is 70+ pages of topical and factual detail with links to further information. There are several other copyright FAQs with less than 10 pages (and not particularly concerned with providing information). Access an established FAQ archive for your FAQs. www.faqs.org has a small list (www.faqs.org/#FAQHTML). Another longer list resides midway down this document (www.faqs.org/faqs/news-answers/introduction).
Special interest groups are problematic because the task of preparing and presenting guidance is secondary to their main aims. Those that do actively publish do so through books (with the association as the author) or articles or newsprint... Sometimes, as in mailing lists, almost as an afterthought, past discussion is indexed and searchable.
This situation is not likely to change. Technology could potentially aggregate past discussion from many mailing lists, but too much commercialism would swiftly kill open discussion. Then again, existing efforts like the archive of the business librarians list have taken a very proprietary view of messages within their discussion. Notice also that a database of newsletters failed commercially a few years back for lack of interest. No dramatic improvements are likely to emerge from this direction.
Libraries are integral to the research process if for no other reason than public funds are used to buy the expensive research tools you will occasionally use. More and more libraries are extending their reference collections to include CD-ROMs and computer resources.
Specialty libraries are special. Focus allows for far greater expertise and innovative research resources. Specialty libraries are prime research venues, and specialty librarians are considerable reservoirs of research expertise. All government agencies, and many large corporations & wealthy associations, have specialty libraries. While many may not invite public access, almost all are open to you.
* Very large libraries, by virtue of their sheer size, become important research resources. This would include the US Library of Congress, the British Library, the [UK] COPAC unified library catalogue, the National Library of Australia, and the National Library of Canada.
* To find specific library websites, visit either Libweb (sunsite.Berkeley.edu/Libweb/) or Libdex (www.libdex.com) or a few other link sites.
* A directory of specialist libraries will direct you to the highly focused libraries found within corporate, association or government organizations. An Australian directory exists online. The Directory of Special Libraries in Australia by ALIA is the definitive source. American Library Directory is a commercial database and probably a print directory too.
Note: All these libraries will probably let you access information - if you come asking kindly with specific information in mind. Always ask how you would gain access, and assume access is possible (though not policy).
There is also a collection of mixed information directories which are research-worthy. Croner's A-Z of [UK] Business Information Sources and the Aslib Directory of Information Sources in the United Kingdom are prominent examples. These directories appear to be less than definitive but the Aslib Directory (the larger of the two at 1500+ pages) is certainly something to behold. Aslib, under the subject "Egypt", lists the British Museum, the Egypt Exploration Society, the Tutankhamun Exhibition, and the York College of Further & Higher Education - all with really good contact details.
Zines, Magazines, Journals and Newsletters; each incorporate the valuable services of quality control, editorial input, and focus. Newsprint, though similar in concept, is best dealt with separately.
The trouble with using periodicals in research is their unfocussed view of the world. Reading through a topical periodical is such a passive approach to finding information. The information is likely to be interesting, but hardly likely to answer your questions. At best, you are 'keeping up-to-date' in your field.
The solution to this is the database search of either full-text or bibliographic/abstract information from a great many periodicals.
Before we reach for the database search, let us run through the ways to find periodicals.
* Zines are listed in three primary online directories (John Labovitz's E-Zine-list, the NewJour mailing list, and the ARL Directory of Electronic Journals) and can also be found by browsing some of the university zine collections.
* Print periodicals are listed in three primary directories (Ulrich's International Periodical Directory, EBSCO's Serial Directory, and Newsletters in Print) and can also be found by browsing the periodical collections of primary libraries like the Library of Congress.
* A few further online lists of periodicals exist like one for US magazines and another for Australian Magazines.
Since periodicals are a passive form of research, a search for promising periodicals is not the usual way of doing a search. Organizations will often subscribe to promising periodicals then circulate them among interested parties, facilitating the passive collection of information.
The directories above represent one way to find promising periodicals. A better way is to search the databases for promising articles, then pay attention to promising periodicals which appear frequently.
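The tallying step can be done by hand, but as a rough sketch it might look like the following. The search results, the periodical names and the `min_hits` threshold are all invented for illustration; how you export results from a real database will differ from one retailer to the next.

```python
from collections import Counter

# Hypothetical search results as (article title, periodical name) pairs.
RESULTS = [
    ("Tuna quotas tighten", "Fishing Industry Weekly"),
    ("Japan raises fish tariffs", "Pacific Trade Review"),
    ("Longline fleets expand", "Fishing Industry Weekly"),
    ("Cold storage costs climb", "Logistics Monthly"),
    ("Export licences reviewed", "Pacific Trade Review"),
    ("Sashimi demand grows", "Fishing Industry Weekly"),
]

def rank_periodicals(results, min_hits=2):
    """Count articles per periodical and keep those seen at least min_hits times."""
    counts = Counter(periodical for _title, periodical in results)
    return [(name, n) for name, n in counts.most_common() if n >= min_hits]

for name, hits in rank_periodicals(RESULTS):
    print(f"{hits:3d}  {name}")
```

Periodicals that clear the threshold are the ones worth a subscription or a closer look; a single stray article says little about a title.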
Certain questions require country specific data. The internet is a fine source for this kind of information, dominated by data from large international organizations (the UN, World Bank and WHO) and government departments (CIA, UK Foreign Consular Office, Health Canada, Australian Department of Foreign Affairs). This works in our favour: such information attains a higher standard of quality than might otherwise be expected on the internet. The down side: current information is difficult to locate. Further commercial compilations exist with particular strengths in economic analysis.
The Spire Project maintains a very fine html article on country profiles, in many ways a flagship for our approach to assisted research. All the links are on this article, so we will merely describe available resources here. Start at spireproject.com/country.htm
As a fine example of liberating information from previously limited circulation, country-specific data has flowed from many a government and quasi-government institution. So much information, of such high quality, has become available that several commercial interests have abandoned the field altogether.
* International Travel Advisory Reports from USA, Canada, Australia and the UK cover details of importance to travelers like health care, crime, current security issues. These travel advisories only mildly overlap so try to read each one and take note of the preparation date.
* Country Health Reports are released online from the CDC, Health Canada, World Health Organization (WHO) and the Pan American Health Organization (PAHO).
* General and Demographic Country Profiles originate from the CIA, [US] Library of Congress, US Department of State, UNICEF, US Census Bureau, World Bank and the UN Statistical Division.
* Social profiles and detailed social incident reporting originate from Amnesty International, the Red Cross, the US Committee for Refugees, the United Nations High Commissioner for Refugees (UNHCR), the US Department of State and Refugees.org. These cover Human Rights, Refugees and Armed Conflict in great detail.
* Economic Country Profiles are released by the governments of New Zealand, Australia, United States, The OECD and the World Bank. More market related profiles also exist from the EU, the US and the World Trade Organization (WTO).
What this means:
The list of publishers above is literally a Who's Who of international diplomacy and observation. Embedded within this field is also a story of the liberation of information previously published in different and predominantly closed systems. As each individual publication emerges online, it adds to the wealth of information from other sources. Taken collectively, we have a powerful trend giving rise to very high quality information - a trend not unique to country profiles. In time we will see this trend transform many information fields.
For years I was aware of a small binder by the front desk of the US consulate help desk. The binder contained the latest bulletins and alerts thought relevant to overseas travelers. Today, you are far more likely to see this as the US International Travel Advisory Reports, delivered electronically at travel.state.gov/travel_warnings.html
Almost all of these electronic resources, with the notable exception of the Country Indicators for Foreign Policy (CIFP) by the Canadian Department of Foreign Affairs and International Trade and Norman Paterson School of International Affairs, were previously published in paper. So the above list is really a list of pre-existing publications now released on the internet. This is both delightful, since we now have rapid access to very fine publications, and delightful, since we can look forward to a future with country profiles specifically designed for the web.
The library resources, like the "Europa World Year Book" (now in its 37th edition) and the "Compendium of Social Statistics and Indicators" by the United Nations, publish data very similar to other publications currently online. The notable exceptions are the publications of the Far Eastern Economic Review and the Economist. These two financial papers publish economic profiles both in print, and through their periodical. This kind of data is a little higher quality than that found online, and does not suffer the time-lag which is the one accusation we can level against government information.
The commercial country profiles include PERC (Political and Economic Risk Consultancy), the Economist Intelligence Unit (EIU), Bank of America World Information Services, and then a number of quasi-government or government publications for sale from the Australian Dept of Foreign Affairs, US Embassies and the OECD. Additional publications exist and fall into one of these two categories.
The initial alternative information includes reading regional papers and periodicals or reading and searching current news. For more depth, there are international policy journals and scholarly journals with expert commentary under peer review, or for simple questions, the Ambassador, Consulate and Representatives both of your country and the target country can help you answer specific questions.
Country Profiles makes for a very good microcosm of information organization in action. Let us focus on how available country profiles have changed over the last few years. We have a few commercial publications, being offset by a range of free publications emerging from government and quasi-government sources, and encroached by other information resources of related information.
Once you have decided to reach for trade statistics, reach for the best. All the general statistics and trade links are of limited relevance compared to knowing the volume of tuna exported to Japan. We can try to identify specific exporting firms, potential markets and existing trade patterns. We list here statistics prepared by the national statistical agencies, certain directories of possible interest, and a database of port traffic.
Trade Data Online
Trade Data Online (strategis.ic.gc.ca/sc_mrkti/tdst/engdoc/tr_homep.html) is a service by Industry Canada, presenting trade information from Statistics Canada and the US Bureau of the Census. This free database presents trade data for both the US and Canada. Results list imports and exports either by product (down to the level of "pulp of wood and the like", or "footwear") or by industry ("fruit farms" or "contract logging industry").
In every way, this is a brilliant tool, except the depth of categories. Results can be as specific as exports from British Columbia to Afghanistan, divided by month in CA$ or US$. For more detail, we need to reach for the paid services below.
Kompass directories list manufacturing firms by product. If you are looking for the manufacturer of plastic disk slips - here is where you go. They are a bit tricky to use, so read our simple guide first. Kompass directories list manufacturing companies, which may suggest potential exporters.
Kompass is produced by Kompass [US] or Kompass International. Print directories exist for most countries while Kompass databases cover regions (e.g. Kompass Asia/Pacific). Large libraries will have some of the print directories. Further descriptions can be found from Dialog.
Australian Exports by Austrade, gives the names of major firms divided by product and service. Volume of trade is not provided, but this directory, and directories like this, provide the names responsible for the trade numbers you can determine using other resources (like export statistics from the Australian Bureau of Statistics). The American Export Register provides similar information.
US Trade Statistics
The US Customs Service collects import and export information, but the information is developed by the US Census Bureau and Stat-USA (a commercial wing of the Dept of Commerce). The Trade Data Online listed above is a free version of this information but at a shallow level.
The National Trade Data Bank (NTDB) is a subscription service to US import and export statistics offered through Stat-USA. Costs are US$50/quarter or US$150/yr. This data is accessed through the Stat-USA website. The database extends down to the level of "0105190020 Turkeys, Live, Weighing Not Over 185 G Each (SIC0259)".
The subscription price also entitles you to a range of further economic data, so you will want to investigate this a little further.
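To show how deep a ten-digit commodity code like the turkey example reaches, here is a small sketch that splits such a code into its levels. It assumes the code follows the usual layout built on the international Harmonized System, where the first two digits are the chapter, the first four the heading, the first six the subheading, and the last four a national extension. The function name and dictionary keys are made up for this example.

```python
def split_commodity_code(code):
    """Split a ten-digit commodity code into its classification levels.

    Assumes a Harmonized System based layout: 2-digit chapter,
    4-digit heading, 6-digit subheading, then a 4-digit national suffix.
    """
    digits = code.replace(".", "").replace(" ", "")
    if len(digits) != 10 or not digits.isdigit():
        raise ValueError("expected a ten-digit commodity code")
    return {
        "chapter": digits[:2],
        "heading": digits[:4],
        "subheading": digits[:6],
        "national_suffix": digits[6:],
    }

levels = split_commodity_code("0105190020")
for level, value in levels.items():
    print(f"{level:16s} {value}")
```

A free, shallow service typically stops at one of the upper levels, while a paid service reports down to the full ten digits.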
The US Census Bureau also sells trade data collected by the US Customs Service. Start at USA Trade Statistics.
Canadian Trade Statistics
Canadian customs information is available either through Trade Data Online (a free but shallow trade database), or through the Canadian International Merchandise Trade Database, also by Statistics Canada.
The Canadian International Merchandise Trade Database delivers specific imports and exports from Canada - and provides you with a quote for the cost. Works like a shopping trolley, and Statistics Canada accepts payment by credit card.
All the Australian Trade Statistics are prepared by the Australian Bureau of Statistics (ABS). Import and Export statistics are collected by the customs authority, then released as a paid service directly from the ABS prepared to the level of classification you need. Prices are arranged by quote.
Due to privacy concerns you will not be able to pinpoint who is exporting/importing but you will get totals, by state if you wish, for commodities. This is a paid service. To start, contact the ABS by phone.
PIERS - Port traffic database.
PIERS (www.piers.com) is a database of port traffic. Based upon the port documents (manifest & bill of lading), the complete database compiles this information into specific categories, countries and the like. The PIERS database covers imports and exports from the US, Mexico and a collection of South and Latin American countries. Of particular interest, summary data is also available through the website. A report detailing the top importers of olives from Italy cost US$87 when I looked. Databases are organized as US or Mexico, Import or Export.
As each national statistical bureau records and monitors imports and exports, read the National Statistical Agencies article for directions to other country statistics. For those tempted to trawl for internet resources, consider International Trade Web Resources by the Federation of International Trade Associations (www.fita.org/webindex.html), a site recommended by Argus.
Business Benchmarks are statistical descriptions of the running costs of comparable businesses.
There are several ways to use benchmarks. Accountants use them frequently, as do bankers and investment advisors, to judge the health of a business. Certainly anyone buying a business will reach for business benchmarks as one measurement of business health and value. Equally as often, your accountant will do this work for you.
A standard business benchmark will describe various costs as a percentage of total turnover. They may include figures like turnover per staff, gross profit as a percentage of turnover, staffing costs as a percentage of turnover and such. Some benchmarks give more. These are the ones we are aware of.
* Small Business Advancement Electronic Resource
The SBAER (www.sbaer.uca.edu) publishes a collection of 33 small business profiles, free on the net but unfortunately slightly dated now. Start at www.sbaer.uca.edu/sbaer/publications/#industry
* US Industry and Trade Outlook 2000 (USA)
US Industry and Trade Outlook 2000 is an NTIS publication compiled by industry analysts from Dept of Commerce. Their blurb describes a 650 page volume, reviewing most important sectors of the US economy. If your library does not have a copy, the book is inexpensive at about US$70. See their webpage description (www.ntis.gov/product/industry-trade.htm).
* Australian Bureau of Statistics (ABS) (Australia)
The ABS publishes business benchmarks in their industry analyses. If the ABS has undertaken surveys, and you search their online catalogue to determine this, then they will have compiled information that can be used as business benchmarks. You may have to calculate the percentages yourself, the ABS tends to have older data than other sources, and focus more on industry. The ABS collects their data from surveys sent to businesses. Start with the current ABS Catalogue of Publications.
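Where a source gives only raw figures, the percentages are simple to work out yourself. The sketch below is one way to do it; the function name, the field names and the sample figures are invented for illustration, and published benchmarks may define their ratios differently.

```python
def benchmark_ratios(turnover, gross_profit, staff_costs, staff_count):
    """Express business figures as the ratios a standard benchmark reports.

    All money amounts must be in the same currency and cover the same period.
    """
    if turnover <= 0 or staff_count <= 0:
        raise ValueError("turnover and staff_count must be positive")
    return {
        "gross_profit_pct": round(100 * gross_profit / turnover, 1),
        "staff_costs_pct": round(100 * staff_costs / turnover, 1),
        "turnover_per_staff": round(turnover / staff_count, 2),
    }

# Hypothetical small bookshop: figures made up for this example.
ratios = benchmark_ratios(
    turnover=500_000,
    gross_profit=175_000,
    staff_costs=90_000,
    staff_count=4,
)
for name, value in ratios.items():
    print(f"{name:20s} {value}")
```

Once your own figures are in this form they can be set beside a published benchmark for the same industry, keeping in mind the age and sample size of the benchmark.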
Other benchmarks are published as books.
* The [Australian] Bureau of Industry Economics publishes a series of studies on various Australian infrastructure industries. Each study compares between states and against best work practice, including costs, services and operating efficiency. All have the titles "International Performance Indicators ..." and you can get a list by entering this in the AGIP database of Australian Government Publications.
* The Locating Books article will help you find alternative books.
Commercial Benchmark Compilations
* FMRC Benchmarking Team (Australia)
The FMRC Business Benchmarks (www.benchmarking.au.com) are Australian business benchmarks, recording the expected costs as a percentage and certain business ratios for a range of mostly small business industries.
I have not had time to review their new website but previously they came in two formats: a single sheet and a small pamphlet, which is little more than the single sheet with an explanation attached. Accountants use benchmarks frequently, and this may well be the easiest place to go to get them. The State Library in Western Australia has an aging collection in a binder held behind the business help desk, and the Small Business Development Corporation's Free Advisory service in WA incorporates this information into their advice. You could also purchase these directly from the SBDC (formerly $250 for hard or softcopy for complete information or about A$40 each.)
Be careful of their age. Each industry is only analyzed every few years, and the libraries may not have the most recent version. Further, these do require some understanding of business ratios.
* Westralian Business Ratios (Western Australia)
John Watson, from the Economics Department of the University of Western Australia, has created a very professional set of business benchmarks on Western Australian businesses. Unlike most business benchmarks, these are annual, present quartile information and describe the statistics in a most professional manner (including sample size !). You may need the help of your accountant to get a copy.
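To see why quartile information and a stated sample size are worth having, here is a minimal sketch of how such a summary could be produced with Python's standard statistics module. The input values are invented for illustration and have nothing to do with the Westralian Business Ratios themselves; the choice of the "inclusive" quartile method is likewise an assumption.

```python
import statistics


def quartile_summary(values):
    """Summarize one benchmark ratio as quartiles plus sample size."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    # n=4 returns the three cut points: lower quartile, median, upper quartile.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"sample_size": len(values), "q1": q1, "median": median, "q3": q3}


# Hypothetical staffing costs as a percentage of turnover for five businesses.
staff_cost_pct = [10, 20, 30, 40, 50]
summary = quartile_summary(staff_cost_pct)
print(summary)
```

A business sitting above the upper quartile on a cost ratio is spending more than three quarters of the sample, which says far more than a single average would, and the sample size tells you how much weight the comparison can bear.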
We have listed just a few benchmarks here but information about benchmarks is so poorly distributed, and we get asked so frequently, we thought it worthwhile publishing this article anyway. If you know of further benchmarks, do inform us.
One further opportunity is Purposeful Benchmarking. Ideally you arrange an amicable invitation to peruse the best practice of, not your competitor, but a business unit which performs similar functions in a different industry. Thus, compare airplane turnaround times with a racecar pit crew.
The Benchmark Self-Help Manual is a guide to the concept of creating benchmarks. Best Practice manuals and journals also cover this activity.
Company information forms the backbone to the information industry. There is real money here. Investors are eager, customers & suppliers are eager, competitors are eager to find good information. As a result, a wide collection of very client-centered research resources has grown up to deliver to this market.
Your research may take you into competitive intelligence and private investigation - talking to competitors, customers, suppliers, past employees and more. Another direction leads to information specific to an industry: perhaps locating export logs or chemical patents. For the purpose of this article, let's restrict ourselves to public, general and readily available resources: publications from the company itself, government disclosure documents, directory information, business news articles, compiled company profiles, and related profiles like credit reports or investment profiles.
Let's start with the obvious. Companies publish information about themselves - some of it quite useful & factual. Look for a company website.
* Use Altavista to find a specific commercial website. Specifically use the url:name function (like url:nike).
* Alternatively, use Debriefing (www.debriefing.com), a meta-search engine optimized for finding names and named websites.
* If you still have difficulties, consider a local or national search engine.
Government Disclosure Documents
Governments require all companies to release some information - some of this is made public. Much greater information is released from public companies.
* EDGAR (www.sec.gov/cgi-bin/srch-edgar), a database produced by the (US) Securities and Exchange Commission, delivers all public US company submissions as required by law. The information is factual and numerical - and includes both current and past submissions. Access is free on the net.
* SEDAR (www.sedar.com), produced for the Canadian Depository for Securities, is the Canadian counterpart to the US EDGAR database. SEDAR delivers the public securities filings and public/mutual fund profiles. SEDAR also includes some press releases. The search is very user-friendly.
EDGAR (and presumably SEDAR) are also basic ingredients to other commercial databases like EDGAR Plus on Dialog or company profiles like Hoovers Company Profiles. EDGAR Plus and Disclosure (another database) contain very similar data to the free EDGAR database but include better fields and standardized financials.
Basic Directory Information
Address, contact numbers and basic size may be all you need initially. Such information can be found through numerous book directories. Most directories are created from questionnaires, so the information is suggestive - not absolute.
Directories come in different forms; general information, businesses in specific industries or regions, registers like American Export Register & Australian Exports, and serialized directories like Kompass & Who's Who (i.e. Who's Who of Business in Australia). The commercial databases to these serials usually cover a far larger area that may be very useful. Kompass comes in national directories; one of the databases covers S.E.Asia.
Every library will have numerous directory titles available, though not always the most recent editions. Especially in recent years, a vast collection of directories has emerged with titles like Lloyds Shipping Register, Radio Airtime Sales, and National Directory of Multicultural Research - clearly a great range exists.
Some of the more popular directories have previously become available as commercial databases. A small collection of directories like Thomas Register of American Manufacturing, American Export Register and CompaniesOnline (Dun & Bradstreet with Lycos) are emerging free online.
The humble phone book is certainly available. Another option is to reach for phone numbers on CD-ROM: Australian Businesses on CD, American Business Information - A Business Directory (Dialog) and more.
Directories may also be used to determine what the companies produce and sell. The Kompass Directories index manufacturers by product. Australian Exports (by Austrade) lists exporters by product. Directories have other innovative uses too.
Corporate structure can be found using, again, a collection of directories: America's Corporate Families and International Affiliates, Directory of Corporate Affiliations (Dialog), and Who Owns Who (by Dun & Bradstreet).
Company Annual Reports
Annual reports are brilliant at giving a concise review of a business or government operation and they usually don't lie too directly (though they do put quite a spin on the statistics from time to time).
Annual reports will be found in one of five sources:
* State Public Libraries,
* Stock Exchange Libraries,
* Direct from the Company,
* Purchased through Annual Report Providers,
* Published on the company website.
Wall Street Journal and Public Register's Annual Report Service (PRARS) are reported as commercial annual report providers.
The Simon Fraser University Library has compiled a fine resource for company annual reports: Business - Annual Reports (www.lib.sfu.ca/kiosk/mbodnar/anrpt.htm).
News Coverage and Press Releases
Many newswires contain copious amounts of information about companies - and describe products, mergers and fiascoes. Prominent newspapers specialize in covering business. In active research, this means searching the commercial databases of past & recent news. This is described in more detail in our news article.
News is generated locally, then distributed globally through the newswires. Associated Press, Reuters and the top of the line Bloomberg Business News all deliver business news targeted to the investor.
Press releases are released through BusinessWire and PR Newswire and a selection of national wire services. Current press releases are usually free online but past press releases are again archived as commercial databases. This information is also rather ubiquitously used in the preparation of company profiles.
Prominent business investigation also occurs through specific newspapers. The Financial Times and the Wall Street Journal can be very useful resources in this regard. Of course, these newspapers are also available as searchable databases. Business Electronic Newspapers (www.libraries.rutgers.edu/rulib/socsci/busi/busenews.htm) lists many of the business-related electronic news sources available on the internet.
Business & Trade Articles
Companies are also profiled in the trade periodicals. There are three ways to approach this. Firstly, you can attempt a broad search for articles about a company in a wide collection of commercial article databases. Secondly, you can seek articles in specific, topical trade publications by searching databases specific to the field the company works in. Thirdly, you can use what is close at hand, perhaps access to ABI/Inform or another popular business article database, and see what appears.
These alternative approaches each have pros & cons. ABI/Inform has a deep North American bias (as do many commercial databases) and indexes many of the more trashy/newsy local business magazines. Tightly focused databases may simply have nothing on the target company - or have only technical matters. Certain databases will allow you to specify during the search exactly what company you are interested in: you will read of these in the database descriptions.
To find trade periodicals, consider searching on a broad business database, then noting the titles that repeat themselves.
Commercial Company Profiles
A wide range of potted histories, financial histories and current information is available. The market is not necessarily centered in the US, but North American products are better promoted. This information comes in the form of small reports about a given business, prepared with investors in mind.
* Hoover's Online (www.hoovers.com)
* Standard & Poor
* Dun & Bradstreet
* Moody's - Moody's Corporate Profiles
* Disclosure (www.disclosure.com)
* Value Line Investment Survey
* Worldscope (www.worldscope.com), a global database.
For a fine, European dominated list of company profile retailers, read Sheila Webber's article: Company Profiles and Financial Information (www.dis.strath.ac.uk/business/financials.html).
A holistic approach: the most powerful tools present a variety of resources for your attention.
* Lexis-Nexis Company Library
* Dow Jones News/Retrieval Service
* Investext (www.investext.com) provides in-depth business research: access to collections of investment research, market research, and trade association research, authored by analysts at investment banks, brokerages and related consulting firms. The work is also available through EINS, Dialog and Datastar.
Company research need not stop here. There are many avenues of further research:
* Directly ask the company for sales literature: catalogue, price list, local sales agents.
* Monitor company employment advertisements.
* Articles in the trade and specialized press.
* Company registers: in addition to anonymous statistical compilations, the national statistical bureau will also have a register of businesses - by name - with address coded by industry code. This is used firstly with site analysis, but may also be useful for geographical analysis of businesses.
* Background information on company leaders: their history, experience and age.
* Patent research.
* Industry level research - see Industry Research.
* Large international firms may have books written about them - consider a book search.
* Interview past employees of the company.
* Interview their suppliers or customers.
* Local newspapers where the firm is located.
The task of finding information about companies is really a task of finding information thrown off in the process of running a business. Some of it is mandated by government (EDGAR & SEDAR), some of it is generated by newspapers, some by the company itself (websites, price lists). In each case, some organization has stepped forward to collect and organize the information. Annual reports on the web gave rise to web directories of annual reports. Corporate ownership gave rise to the directory "Who Owns Who" by Dun & Bradstreet.
Industry research will encompass many of the research tools and vectors described more fully in our other articles. Your research into the information industry (as an example) will certainly include a book search, an article search, perhaps some patent research, statistics and discussion groups.
What we have in this article are the resources specifically for industry level research - and leads to further promising directions like patent research, statistics and discussion groups.
With few exceptions, you will need to search for specific facets of an industry when you continue your research beyond this article. You will get nowhere trying to search for "information industry" - but will find very factual information about the proposed changes to intellectual property of database contents (an issue critical to the information industry).
The web is a fine example of this: with the exception of Industry Canada & the US Census Bureau, I can think of no other sites devoted to 'industry'; few organizations package information this way.
There are numerous gems to be unearthed free from the internet. Industry news flows through news sources like AnchorDesk & Clarinet. Discussion groups may inform and dissect developments in industries with great resource and collective skill. Associations may occasionally feel it is in their interest to publish industry briefs & white papers describing their position. Without exception, you will have better success searching for specific facets of an industry which interest you.
Online Industry Information
Market Access Database (mkaccdb.eu.int), a project by the Commission of the European Union, presents some sharp analysis about market access for a collection of 30+ countries. Coverage extends from overviews of barriers to specific barriers in specific industries. Query the database by country.
The US Census Bureau publishes Current Industrial Reports. Just a few are online, and this is just one resource here, so it is better to search their website or review their catalogue.
Industry Canada, working with Statistics Canada, publishes a fine site devoted to Canadian industry statistics. These organizations are also responsible for Trade Data Online (strategis.ic.gc.ca/sc_mrkti/tdst/engdoc/tr_homep.html), a free database presenting US & Canadian trade broken down by industry (SIC & NAICS).
Government Publication Databases
One of the first tasks to undertake is a search of the government publication databases. Governments spend an inordinate portion of their time monitoring industries - and write exhaustively. This will be one of your most promising sources of Industry data and description. Publications undertaken at a national level should appear in their respective government publication databases: AGIP, MOCAT & the publication catalogue of the UK Stationery Office.
National Statistical Agency Data
A second invaluable resource will be the national statistical agencies: the US Census Bureau, Statistics Canada, the Australian Bureau of Statistics (ABS), the UK Office for National Statistics (ONS). Some of their data is published on the web and each has its publications catalogue online. Links and forms are prepared for you in our article: National Statistical Bureau.
Further Statistical Resources:
Association Statistics are usually tightly focussed on the industry itself. A case in point, the Australian Booksellers Association prepares an annual analysis of business benchmarks, and industry size, growth and development. Such publications are usually inexpensive and timely. Start by locating an association particular to the industry.
Benchmark Studies, undertaken by accountancy firms and associations, focus on the financial ratios involved in business. The FMRC Business Benchmarks and the US Industry and Trade Outlook (www.ntis.gov/yellowbk/1nty752.htm) are examples. Both present descriptions of business operating costs, risk and margins compiled by comparing financial data from various companies within an industry. The results are anonymous, but factual and again, relatively timely.
The Statistical Abstract of the US (www.census.gov/stat_abstract), free online from the US Census Bureau, gives you another avenue for finding industry related statistics. There are several statistical resource directories in most libraries, like Statistical Sources (by Gale Research).
Further Government Industry Studies
Governments do not always publish their work widely. Non-statistical agencies create vast quantities of government studies on all manner of industry, but this work is primarily undertaken as part of their industry supervisory role. Of course, this information is available to you if you can find it. If the information has arrived on the web, you may find it with a web search limited to government webpages.
If your industry analysis is local, approach the appropriate state government organizations. Here in Western Australia, for example, the state tourism agency maintains a list of all planned large tourism projects. This is a fine example of the potential value to be found here. Of course, this list is not widely published - or known - but one should not underestimate the industry information prepared by government agencies.
Further avenues could include researching changes to industry regulation, perhaps with congressional discussion or legal commentary. Such research may be internet based for the US (I am thinking of the Library of Congress Thomas Database). Consider reading sections of The Virtual Chase (www.virtualchase.com/coinfo/index.htm).
Industry research has also grown into a very active industry in its own right. There are many organizations who have built considerable expertise in analyzing and preparing research reports both as a retail and consultancy service.
Market Research & Industry Research Reports
Many of the larger market research firms also prepare market/industry reports for sale. These reports are only as good as their age, depth and reputation, and may be prohibitively expensive. They are, however, also a very accessible way to read an encapsulated account of an industry's changes and movement - and may save you from undertaking some of the work yourself.
* Find/SVP (www.findsvp.com) is a good example.
* Here in Australia, IBIS and Syntec Economic Services both specialize in preparing industry research reports - often for government. Again, some of this work becomes available to purchase.
For a fine list of such market research retailers, consider reading Sheila Webber's 1998 list: Commercial market research companies (www.dis.strath.ac.uk/business/marketres.html)
Your national embassies and trade organizations also provide international industry and marketing reports. This is undertaken as paid consultancy work.
Business Magazines and Trade Periodicals
Industry analysts are not the only ones involved in research. Considerable broad industry analysis occurs in the trade and business press. The most effective tool here, of course, is the article search.
There are two ways to approach this. Firstly, if you can refine your concept to a specific phrase which interests you, then try a broad search of business & industry periodicals. Alternatively, you can select a specific database particular to the industry you want to cover. For example: Aluminum Industry Abstracts (Dialog). This is covered in a little more detail in our articles on Finding Articles & Commercial Databases.
There are also collections of databases focused on 'industry' in general. Industry Trends and Analysis: (Dialog) a mixed index/abstract/text for "broad coverage of industries, technologies, and management topics", and Predicasts Prompt: a "multi-industry bibliographic database, offering access to over 1500 trade journals, newspapers and special reports in relation to over 60 industries".
Many of the resources used in company research will describe the industry too.
* Annual Reports for industry giants will include information useful for industry analysis.
* The same directories, like Kompass, which can be used to identify the address of a company can also be used to identify the companies which are active in a particular industry.
* Patents may be critical in certain industries. Thankfully, the US & Canada have considerable patent data free online. Patent research is covered separately in Searching Patents.
* Interview key analysts within the industry. These are the people writing the articles and the industry reports, the government analysts and, perhaps, critical managers & past managers from the industry.
* Import & Export statistics may help you understand and quantify the international nature of an industry. This is described separately in our article: Imports & Exports. Of particular interest will be the free internet access to US and Canadian trade statistics by SIC & NAICS thanks to Industry Canada.
As with corporate research, there are very many rewarding avenues to search for industry information. The challenge will be in structuring your approach in a way that suits both your budget and desired depth. If we are successful, we will have compiled a collection of industry specific data from a range of sources, including a range of bias and background. A simple pitfall: collecting various resources which all depend on SEC financial data. You are equally likely to collect resources featuring data pulled primarily from the company's annual report or website. In this field, numerous references do not necessarily lend additional credence to information.
Industry Research could either be research into industry-groups (banking or transport industries) or research into specific industries (wholesale furniture or retail butchers). This is a good distinction to make as very different resources are involved. Industry-group trends may be found with national statistics, government trade reports and general market reports. Researching specific industries may better be served with association statistics, specific market reports, trade articles and business benchmarks. Select only the resources you feel match your research goals.
Secondly, collecting industry research need not be constrained to your national border. There are very good reasons to consider statistics collected from foreign governments or associations. Industries do not develop uniformly in different countries. Foreign industries may be predictive of industry developments yet to flow through to your country, or indicative of different standards and legislation.
There is considerable expertise in drawing conclusions from industry data: a skill beyond the initial scope of our work here. This is often the domain of experienced consultancy - though there is certainly no miracle to it. May I recommend a book: The New Competitor Intelligence by Leonard Fuld. Lastly, we have not yet described the categorization of industries using standard SIC or NAICS coding. In simple terms, each industry is divided into specific codes, similar to the international patent classification or the Dewey decimal system. The two systems, SIC and NAICS, are inter-related and will not cause undue difficulty. Trade statistics, digital business directories, and national statistical bureau industry data will all use the industry codes.
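The hierarchical nature of these codes can be sketched in a few lines of Python. NAICS codes run from two digits (a sector) to six digits (a national industry), each extra digit narrowing the category, so a longer code always begins with the code of its parent. The function names below are hypothetical, and the code 451211 is used only to illustrate the digit structure; look up real codes in the official NAICS manual.

```python
# Names of the NAICS levels, keyed by number of digits.
NAICS_LEVELS = {
    2: "sector",
    3: "subsector",
    4: "industry group",
    5: "industry",
    6: "national industry",
}


def naics_hierarchy(code):
    """Return the chain of parent codes for a NAICS code, broadest first."""
    if not (code.isdigit() and 2 <= len(code) <= 6):
        raise ValueError("expected a NAICS code of 2 to 6 digits")
    return [(NAICS_LEVELS[n], code[:n]) for n in range(2, len(code) + 1)]


def in_category(code, prefix):
    """True if `code` falls within the broader category `prefix`."""
    return code.startswith(prefix)


# Illustrative six-digit code only.
for level, prefix in naics_hierarchy("451211"):
    print(f"{level:18} {prefix}")
```

This prefix property is what lets you widen or narrow a search in a coded trade database or business directory: truncate the code to broaden the category, or add digits to narrow it.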
There are tools to assist you to either locate someone you know, or dig up background information. The internet has email directories and phone directories aplenty as well as tools to trace internet communication. Beyond this, there are tools to find silent numbers, business and asset ownership, newspaper articles and more. You will start with a name or email address.
Finding an Email Address:
* The Yahoo People Search (people.yahoo.com) is an important and flexible tool for finding email & address information.
* Switchboard (www.switchboard.com) also offers several people search tools.
* You may need to search the people databases from several internet websites to be successful. For further assistance, consider the FAQ: How to find people's E-mail addresses (www.cs.queensu.ca/FAQs/email/bigfinding.html) and the phone & address references on Yahoo.
People who Publish Online
Has the person published anything on the internet? The simple way is to search the internet for the full name of the individual in the hope they included their email address or real name on the webpage. Use Altavista and Debriefing for this task. For more depth, read the article: Searching the Web. Altavista has a very large, fast search engine. Type the name using quotes to keep the words together. Add in further information if you know it, using url:edu or keywords (use the + sign). Also, capitals matter with Altavista. Debriefing is a meta-search engine optimized for finding people & named websites.
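The query syntax just described (quotes around the name, a + sign before required keywords, and url: to restrict by address) can be assembled mechanically. This is only a sketch: the function and parameter names are hypothetical, and the syntax is taken from the description above rather than from any search engine's documentation, so check the help pages of whichever engine you use.

```python
def build_query(full_name, required=(), url_part=None):
    """Build a search string: quoted name, +keywords, optional url: limit."""
    parts = ['"{}"'.format(full_name)]          # quotes keep the words together
    parts += ["+" + word for word in required]  # + marks a required keyword
    if url_part:
        parts.append("url:" + url_part)         # restrict to matching addresses
    return " ".join(parts)


query = build_query("David Novak", required=["research"], url_part="edu")
print(query)
```

The same function covers the company website search described earlier: build_query("Nike", url_part="nike") adds the url:nike restriction.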
Finger is a lesser known internet protocol which sometimes reveals information about a person given an email address. It used to be more common and may give name & perhaps if a person is currently logged in. It is easy to make a finger request from a Unix command line (finger email@host). Some web-browsers will allow you to enter a finger request directly (as finger://username@host). Alternatively, use a finger gateway like this one from MIT (www.mit.edu:8001/finger?).
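If you have neither a Unix command line nor a gateway to hand, a finger request is simple enough to make yourself: the protocol (RFC 1288) consists of opening a TCP connection to port 79 on the host, sending the username followed by a carriage return and line feed, and reading the reply until the server closes the connection. A minimal sketch in Python follows; the function names are hypothetical, and most hosts today no longer run a finger service, so expect refused connections.

```python
import socket

FINGER_PORT = 79  # TCP port assigned to the finger protocol (RFC 1288)


def build_finger_query(address):
    """Split 'user@host' into the host to contact and the bytes to send."""
    user, sep, host = address.partition("@")
    if not sep or not host:
        raise ValueError("expected an address of the form user@host")
    return host, (user + "\r\n").encode("ascii")


def finger(address, timeout=10.0):
    """Send a finger request and return the server's reply as text."""
    host, request = build_finger_query(address)
    chunks = []
    with socket.create_connection((host, FINGER_PORT), timeout=timeout) as conn:
        conn.sendall(request)
        while True:
            data = conn.recv(4096)
            if not data:  # server closed the connection: reply is complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")


# Example (requires a host that still answers finger requests):
# print(finger("username@host"))
```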
Tracing Online Communication
Deja.com usenet archive (www.deja.com) maintains a very large database of newsgroup discussion. Deja.com's power search is a must-see and will give you a brilliant author profile.
Searching mailing list discussion is more difficult. If you know a forum a person is active in, see our article: Discussion Groups. Alternatively, search the web for the email address. Hopefully you will catch list discussion picked up by zines or directly by search engines. Use Altavista for this.
There are several tools available to you here:
* Printed Directories: White pages - if you know the name but not the address or phone number. Yellow pages & other business listings - if you know the business, but not address or number. Sometimes libraries and post offices will have the white pages to different states. A better alternative may be to search the white pages through the internet. For a very complete list, visit Telephone Directories on the Web (www.teldir.com).
* Directory Assistance - if you know an approximate name/address combination, but not number. Directory Assistance is a service provided by your phone company.
* Phone directory databases - usually prepared as a CD-ROM, listing all the phone numbers in Australia. This is particularly good for a reverse search: seeking the name and address from the phone number.
Biographical Directories and Databases
If the person is famous, newsworthy or historically important, this may be a worthwhile option. Directories like the series of Who's Who directories will list some basic biographical details, most likely prepared by the person involved. Who's Who directories exist for many categories and countries like Longman Who's Who, Marquis Who's Who or Who's Who in European Business.
Alternatively, consider the collection of biographical directories and databases like Wilson Biography Index (see SilverPlatter or FirstSearch), Wilson Current Biography (SilverPlatter), Bowker Biographical Directory or Biography Master Index. The Wilson Biography Index, for example, cites a large number of periodicals & books which include biographies.
There is also a simple biographical database online: Biography Online (www.biography.com), with 15000+ biographical abstracts - but most are really really short. Of course, for well-known people, consider an encyclopedia.
Local newspapers are a brilliant resource for information about individuals, and most anyone running a business will try to be featured in their local newspapers. The key here is local newspapers, and historical databases (not current news).
There is no shortage of electronic access to news either. DataTimes presents a single access point to many of the North American newspapers. Global Textline includes access to a wide range of different countries. With both these news archive databases, you must be careful to specify exactly what you are looking for. You would be surprised how many David Novaks there are in my state alone. Use the full text databases in particular.
The asset search involves searching a selection of government databases for home and business ownership. The presence of a mortgage on a house is public knowledge (though the information is not particularly current). National business ownership databases, like ASCOT in Australia, will give you the ownership of businesses and association management. For a small fee through the department of business registration, or a collection of commercial retailers, you can search the ASCOT database by name.
One elegant suggestion is to seek help from a professional information broker from the area where a person lives. The mailing list InfoPro is a particularly large collection of brokers who routinely distribute this kind of information. Consider emailing a request for assistance to the list manager James and ask that your request be circulated to the mailing list.
Reverse Telephone Directories.
Previously these were primarily police resources, but today they have become tools for telephone marketing. CDs are pressed with all the phone numbers in Australia, or all the numbers in the US. The search function lets you run this as a reverse directory just by searching for the phone number. Look in the yellow pages, or perhaps ask a librarian for leads to these resources.
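A reverse directory is nothing more than an ordinary directory indexed by number instead of by name. The sketch below shows the idea in Python; the listings are invented placeholders, and the function names are hypothetical rather than part of any CD-ROM product.

```python
def normalize(phone):
    """Keep only the digits so that formatting differences do not matter."""
    return "".join(ch for ch in phone if ch.isdigit())


def build_reverse_directory(listings):
    """Index (name, address, phone) listings by phone number."""
    reverse = {}
    for name, address, phone in listings:
        # One number may be shared by several listings, so keep a list.
        reverse.setdefault(normalize(phone), []).append((name, address))
    return reverse


# Invented sample listings, for illustration only.
listings = [
    ("A. Example", "1 Sample St", "(08) 5550 0101"),
    ("B. Example", "1 Sample St", "08 5550 0101"),
    ("C. Sample", "9 Test Rd", "08 5550 0199"),
]

reverse = build_reverse_directory(listings)
print(reverse.get(normalize("08-5550-0101"), []))
```

Normalizing the number before the lookup matters in practice, because the same number is written with different spacing and punctuation from one directory to the next.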
Commercial Personal Information Profiles
There are commercial products supporting the needs of human resource departments, legal research and the police. Information is collected and distributed as credit reports or personal profiles. As an example, running a level three Missing Links search on CDB (for about US$15.00) will usually return a US silent phone number.
* CDB Infotek (www.cdb.com/public/) maintains a selection of commercial databases of personal information.
Further firms have been mentioned as active in this industry, including American Information Network (www.ameri.com), Know-X and IRB OnLine (www.irb-online.com).
There is a serious issue as to the morality of easy access to personal information. There is an equally important moral value in empowerment: what is publicly available should be publicly known.
Beyond these resources we have the tools available to private investigators: rummaging through garbage cans, following the suspect, etc. There are also computer files and databases with better controlled access: drivers databases, police arrest records, voters registration, medical records, passport and immigration records, banking records. Most of the latter resources will only be available to you with the direct permission of the one involved. Further databases, like a database of known pedophiles, while available, would only be useful if you had previous suspicions.
A patent protects your investment in an invention. Copyright covers your effort in a literary or artistic work. Trademarks protect your investment in identifying a product or service to the marketplace.
Consider the striped IBM logo and the slogan Coke is it. A trademark is a word, phrase, symbol or combination identifying a product or service in the marketplace. This covers logos, marketing slogans, brand and trade names. In some circumstances, the trademark can cover colors or smells. Registered trademarks are trademarks granted additional legitimacy by the appropriate government agency. Common Law trademarks ('unregistered') are also protected, to a lesser degree. Both can be used to stop others using identical or similar marketing slogans, logos, brand and trade names.
This article delves into the task of trademark research, that is, finding comparable trademarks. Nothing in here pertains to the legal aspects of trademark protection or infringement.
Registered Trademark Databases
The first step in trademark research is to search the national registered trademark databases. These databases are freely searchable online:
* IP Australia (www.ipaustralia.gov.au) has the very user-friendly ATMOSS database online, and their more definitive (but nightmarish) Trade Marks Mainframe Database.
* The US Patent and Trademark Office (USPTO) provides US Trademarks online. Read the description/disclaimers/options for the US Trademark Database, or jump directly to the Boolean Search Page.
* The Canadian Intellectual Property Office CIPO (cipo.gc.ca) delivers, free online, the Canadian Trade-marks Database - all pending and registered trade-marks in Canada. Canada also publishes some of the best advice regarding trademarks.
* Further countries are preparing English access to registered trademarks. Start with Rossco's WWW Corner which has a fine list of Patent Offices (www.pcug.org.au/~rossco/poffices.htm).
IP Australia (www.ipaustralia.gov.au) is the government organization responsible for Australian trademark concerns. Australia has about 800,000 registered trademarks, and access is freely available online through either the simple graphical interface of ATMOSS (Australian Trade Marks Online Search System), or through the slightly superior but difficult and non-graphical Trade Marks Mainframe Database (and the associated trademark viewer).
The ATMOSS database allows you to search using either the description of the trademark, or the trade mark number. It returns similar trademarks, with trademark number, class, description, date, status, and perhaps an image of the trademark.
The [Australian] Trade Marks Mainframe Database is technically superior to ATMOSS as it is more current (about 3 days rather than about 2 weeks), has better field searching (by owners or phonetic) and includes references to correspondence regarding trademark registration. Unfortunately, the Trade Marks Mainframe Database is not graphical, and is probably not worth your time in learning. I am led to believe the superior field searching will gradually migrate to ATMOSS anyway. If you do wish to persevere, there is a manual online.
Common Law Searching
In most countries, but not all, registration of a trademark is not required to gain legal protection. Most trademarks are not registered, and enjoy considerable 'common law' legal protection under trade practices or fair dealing legislation. For this reason a trademark search must reach beyond the national registered trademark database, to search brand names, business names, and other sources of trademark usage.
To quote the Trademark FAQ by the USPTO: "A common law search involves searching records other than the federal register and pending application records. It may involve checking phone directories, yellow pages, industrial directories, state trademark registers, among others, in an effort to determine if a particular mark is used by others when they have not filed for a federal trademark registration."
Frequently Asked Questions About Trademarks (USPTO) (www.uspto.gov/web/offices/tac/tmfaq.htm)
The premise of a search is to find possible sources of trademark similarity. We search sites where trademarks appear.
Business names and trademarks are not the same, but are often used interchangeably. A business name search may give you leads to possible trademark similarities. Phone directories (white and yellow), and national business name registers list business names.
The internet is a fine site to search, especially since the search engines are prepared in a useful manner. I would search for a word fragment in AltaVista, Debriefing, and Deja.com's usenet archive. See our articles: Searching the Web and Discussion Groups.
Of course, this does not account for similar pronunciation, or the graphical elements of trademarks.
Trademarks appear in trade magazines, but not often in the database formats, so this gives rise to the unenviable task of paging through likely magazines for similar trademarks.
One uncertain resource is the Lycos: Pictures and Sounds search facility. By indexing the alt=" " text from html pages, Lycos compiles a list of pictures on the web. A search for butterfly, for example, locates 100+ pictures labeled 'butterfly'. This might work to your benefit if the graphical element you are searching for is simple and distinct. Altavista has a similar service.
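To make the idea of indexing alt text concrete, here is a minimal sketch in Python using the standard html.parser module. It is not how Lycos or Altavista actually work, only an illustration of the principle; the page fragment and file names are invented for the example.

```python
from html.parser import HTMLParser


class AltTextCollector(HTMLParser):
    """Collect (alt text, image source) pairs from <img> tags."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        alt = (attributes.get("alt") or "").strip()
        if alt:  # images with missing or empty alt text cannot be indexed
            self.images.append((alt.lower(), attributes.get("src", "")))


def find_images(html, word):
    """Return the sources of images whose alt text contains the word."""
    collector = AltTextCollector()
    collector.feed(html)
    return [src for alt, src in collector.images if word.lower() in alt]


# Hypothetical page fragment, invented for illustration.
SAMPLE_PAGE = """
<p>Our range</p>
<img src="logo.gif" alt="Blue butterfly logo">
<img src="spacer.gif" alt="">
<img src="photo.jpg">
<img src="bird.gif" alt="Bird on a branch">
"""

print(find_images(SAMPLE_PAGE, "butterfly"))  # ['logo.gif']
```

Note the weakness this exposes: a picture is only found if the page author bothered to label it, and labeled it with the word you thought of.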
Should you want to learn how trademarks are created, used and defended, consider these fine resources:
* Trademark References by the Canadian Intellectual Property Office (CIPO), including: What's in a Name? Using trade-marks as a business tool, Glossary of Intellectual Property Terms, Trade-mark FAQ and Guide to Trade-marks (start at strategis.ic.gc.ca/sc_mrksv/cipo/tm/tm_main-e.html)
* All about Trademarks by Gregory H. Guillot at www.ggmark.com (unusual clarity on trademark law) including: A Guide to Proper Trademark Use, How are Marks Protected
* General Information Concerning Trademarks by the USPTO (www.uspto.gov/web/menu/tm.html) including: Frequently Asked Questions about Trademarks.
In the countries with internet access to the trademark database, the libraries could be said to be redundant - except as a source for ample and personal assistance with your search. In other countries these libraries may be able to assist with searching.
IP Australia has a patent & trademark library in each state capital. These libraries provide free access to the ATMOSS database but also offer the much-needed assistance for the troublesome Trade Marks Mainframe Database. The US has The Patent and Trademark Depository Library Program (PTDL's). In Canada, consider visiting Intellectual Property Links: Canadian by CIPO for possible sources of trademark assistance. In the UK, we presume the Patents Information Network (PIN) provides trademark assistance, though there is no freely searchable database of UK trademarks.
Commercial Trademark Resources
One of the most valuable resources in serious trademark research is access to several of the very large commercial trademark databases.
Lexis-Nexis (www.lexis-nexis.com) retails several trademark related databases.
The Dialog Corporation (www.dialog.com) retails a collection of TRADEMARKSCAN databases to European countries, Canada, and US (federal & state).
MicroPatent (www.micropat.com) offers access to a proprietary trademark database. More information coming.
In addition to the database retailers and producers, there is a lively industry of trademark search assistance.
There are numerous commercial firms on the internet selling trademark services; much of this is little more than an ad for trademark related litigation.
Watching services are another possibility: these are not expensive but following the leads suggested will be. I cannot yet advise you on a reliable trademark researcher.
As a case in point, IP Australia provides a Business Names Applicant Search Service. A$40 buys you a search of the Australian registered trademark database by their trained staff. Contact IP Australia directly for this (Tel: +61 1300 651010) - they accept credit cards & fax/postal applications.
The Pharaoh called on Shakh to negotiate the annual royal donation with the priests of Karnak temple complex. The Pharaoh was not wise in such matters and had previously given far too much to the detriment of the state. It was not wise to voice such sentiments. Shakh instead set about negotiating a figure ample to their needs but insufficient to further expand the temple complex.
Shakh wisely chose to negotiate up river at the Kom Ombo temple - away from Karnak. Choosing words carefully, he deftly rejected the initial estimate of the temple's needs, then spoke calmly, eyes tight, that the Pharaoh had decided Karnak should supply the priests to the Egyptian army - at current expenses.
It was a clever ruse. The negotiated royal donation was significantly reduced and the priests were happy to be excluded from military duty.
If searching be a combination of science, art and experience, then the science of searching is the easiest of the three. There are just a few search elements to remember and search techniques to apply.
Firstly, there are the tactics associated with free text searching; that of Boolean, proximity, truncation, field searching, target searching and further enhancements.
Secondly, there are the basic classification schemes: the Dewey decimal system (for books), the WIPO and US Patent Classification Systems (for patents), the Standard Industrial Classification (SIC) Codes (for industry) and a number of additional classification systems founded on the same principles.
Thirdly, there is the way information is organized. A book has a table-of-contents and an index, large directories like Kompass and Gale Directory of Databases are arranged with so many indexes (geographic, subject, product, name) that the contact information is often separated and numbered, then referenced as a number. The results are initially confusing. Statistics similarly have ways of presenting information (pie charts, line charts, charts with ranges which do not reach zero) and again, this can be confusing the first time you see them.
Let's start with the technique associated with searching a text database.
Straight Word Searching:
All search situations allow you to ask for the presence of words in a block of text. If you ask for the right words, you will quickly locate the information you desire, so choose a word or words which accurately describe what you are looking for. Prepare to search the text several times with different terms, and consider the possibility of different spellings for the same words.
Straight word searching is fairly ubiquitous on the internet. You can always search a webpage with the search function of your web browser. Alternatively, you can search by placing a large amount of text into a word processor and using the in-built search functions. Your word-processor can handle large files like website traffic logbooks and archived files of past mailing list discussion. There are also specialist tools like the shareware WinGrep (www.mindspring.com/~bgrigsby/wingrep.html) for searching many files on your computer hard drive. (Alternatively, consider AgentRansack www.agentransack.com).
The simplest refinement to straight searching involves searching for parts of a word - if you are interested in surfing, search for surf; better yet, search for " surf" with the space in front of the word.
Some search engines don't allow searches for text fragments, and you must explain your intention by adding a truncation mark (usually * or ?) to the ends of words. For most professional research databases, alga? will include both algae and algal (as in algal bloom). I was once badly lost because of the spelling difference between aging and ageing. There are a number of improvements on this concept too. Sometimes there are special symbols for a single non-space character (car?a), sometimes there is automatic awareness of multiple spellings (colour & color). Sometimes there is even automatic awareness of synonyms. Often you are initially unaware important information is indexed under a slightly different spelling, so truncation is strongly suggested for most searching.
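For those who like to see the mechanics, truncation can be sketched in a few lines of Python. The convention assumed here is my own simplification (a trailing * or ? stands for any further letters, a ? inside the term stands for exactly one character); real databases each define their own symbols, so check the help file.

```python
import re


def truncation_to_regex(term):
    """Translate a truncated search term into a compiled regular expression.

    Assumed convention for this sketch: a trailing * or ? matches any run of
    further letters, and a ? inside the term matches exactly one character.
    """
    open_ended = term.endswith(("*", "?"))
    stem = term[:-1] if open_ended else term
    pattern = "".join(r"\w" if ch == "?" else re.escape(ch) for ch in stem)
    if open_ended:
        pattern += r"\w*"
    return re.compile(r"\b" + pattern + r"\b", re.IGNORECASE)


def matching_words(term, text):
    """Return the distinct words in the text matched by the term, sorted."""
    regex = truncation_to_regex(term)
    return sorted({m.group(0).lower() for m in regex.finditer(text)})


TEXT = "An algal bloom is a rapid growth of algae. Ageing or aging? Alga is singular."

print(matching_words("alga?", TEXT))   # ['alga', 'algae', 'algal']
print(matching_words("ag?", TEXT))     # ['ageing', 'aging']
print(matching_words("ag?ing", TEXT))  # ['ageing']
```

The last line shows the trap: a single-character wildcard finds ageing but misses aging, whereas the shorter, open-ended stem catches both spellings.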
An improvement on truncation is the opportunity to look directly at a list of words, either keywords, or descriptors. This allows you to see the range of spellings before you search. This is also ideal for searches of company names or proper places so you can select only the words you are interested in. In a simple way, some library catalogues present subject searches in this way: a list of subject categories arranged alphabetically.
Changing tack, searching for multiple words calls for "and, or, not" concepts. I want this word and that word, but not another word. It is simple enough. Many of the search engines allow for this with the - sign, and commercial databases often add brackets. Use of the not symbol is frowned upon in textbooks (it is said to be too easy to dismiss information you are interested in), but 'and' & 'or' are absolutely necessary for complex questions like I want [(spaghetti or noodle) and pasta] or (Italian and cuisine). With most internet search engines, but not all commercial searches, you will find 'and' is assumed.
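The bracketed question above maps directly onto set operations, which is roughly how a database resolves it. A minimal Python sketch follows; the five records are invented for illustration, and a real system would use a far more elaborate index.

```python
import re

# Hypothetical records, invented for illustration.
RECORDS = {
    1: "Fresh pasta: making spaghetti at home",
    2: "Italian cuisine of the northern provinces",
    3: "Noodle dishes of Asia",
    4: "Italian motor racing",
    5: "Pasta shapes from noodle to shell",
}


def build_index(records):
    """Map each lowercase word to the set of record numbers containing it."""
    index = {}
    for number, text in records.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index.setdefault(word, set()).add(number)
    return index


INDEX = build_index(RECORDS)


def has(word):
    """Records containing the word (an empty set if the word never appears)."""
    return INDEX.get(word.lower(), set())


# [(spaghetti or noodle) and pasta] or (Italian and cuisine)
# 'or' is set union (|), 'and' is intersection (&), 'not' is difference (-).
hits = ((has("spaghetti") | has("noodle")) & has("pasta")) | (has("Italian") & has("cuisine"))
print(sorted(hits))  # [1, 2, 5]

# Italian not cuisine: the difference silently drops record 2.
print(sorted(has("Italian") - has("cuisine")))  # [4]
```

Record 3 mentions noodles but not pasta, so the 'and' removes it. The second query shows why 'not' is treated with suspicion: a relevant record vanishes without any warning.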
The next dramatic improvement fixes the position of words relative to one another. In this category we have adjacent (often written as adj, next, or "inserted in quotes"), near (by how many words), or in the same sentence. Often it is wise to stretch the distance a little (within two), but where available, proximity is the best way to remove the dross without affecting the value of information. "Patent near Research" is much more precise than "Patent and Research".
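A proximity operator is easy to picture as a comparison of word positions. The sketch below is a simplified Python illustration, not any vendor's implementation; the function name near and the sample sentences are my own.

```python
import re


def tokenize(text):
    """Split text into lowercase words and numbers."""
    return re.findall(r"[a-z0-9]+", text.lower())


def near(text, first, second, distance=1):
    """True if the two words occur within `distance` words of each other, in either order."""
    words = tokenize(text)
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    return any(abs(i - j) <= distance for i in first_positions for j in second_positions)


# Hypothetical sentences, invented for illustration.
A = "Patent research begins with classification."
B = "The patent was granted in 1998 after years of market research."
C = "Research into patent law"

print(near(A, "patent", "research"))              # True: the words are adjacent
print(near(B, "patent", "research", distance=2))  # False: nine words apart
print(near(C, "patent", "research", distance=1))  # False: one word intervenes
print(near(C, "patent", "research", distance=2))  # True: within two
```

Sentence B would satisfy "Patent and Research" yet has nothing to do with patent research, which is exactly the dross proximity removes. Sentence C shows why stretching the distance to two is often worthwhile.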
By separating information into different fields, we can selectively search different portions of the information. I want the title to show the word "Patent" and the abstract to include the words "Patent Research". Field searching is a common way to refine a search, but be aware searching titles is very likely to remove some desired information, whereas searching descriptors and not abstracts may dramatically improve the content.
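Here is a small Python sketch of field searching. The three records and the field names (title, abstract, descriptors) are invented for illustration; commercial databases have their own field labels.

```python
# Hypothetical records, invented for illustration.
RECORDS = [
    {"title": "Patent searching for beginners",
     "abstract": "An introduction to patent research using free databases.",
     "descriptors": ["patents", "information retrieval"]},
    {"title": "Finding prior art",
     "abstract": "How to conduct patent research before filing.",
     "descriptors": ["patents", "prior art"]},
    {"title": "Patent leather shoes",
     "abstract": "A history of glossy footwear.",
     "descriptors": ["footwear", "fashion"]},
]


def field_contains(record, field, phrase):
    """True if the named field of the record contains the phrase (case-insensitive)."""
    value = record[field]
    text = " ".join(value) if isinstance(value, list) else value
    return phrase.lower() in text.lower()


def search(records, **criteria):
    """Return titles of records where every named field contains its phrase."""
    return [r["title"] for r in records
            if all(field_contains(r, field, phrase) for field, phrase in criteria.items())]


print(search(RECORDS, title="patent"))
print(search(RECORDS, title="patent", abstract="patent research"))
print(search(RECORDS, descriptors="patents"))
```

The title search returns the shoe article and misses "Finding prior art"; adding the abstract field narrows it to one record; the descriptor search returns both patent articles and nothing else. This is the pattern described above: titles lose desired information, descriptors improve the content.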
Are you really interested in information more than 15 years old? Library catalogues frequently have many aging books, and date limiting is very wise.
Ranking and the ability to search multiple databases are some of the further enhancements that select databases permit. There are also advances that do not have a grand impact - like natural language. Natural language interpretation allows the searcher to phrase a question with common sentence structure. The computer then interprets what you want. In theory natural language is liberating but in practice the strengths of Boolean, proximity and field searching far exceed the benefits of natural language searching. Lastly, there are special techniques like target searching available on a few systems that bear discussing. Sorting allows you to shape the presentation of the information. When applied to financial information, this is particularly valuable. Alerts allow you to automatically repeat a previous search and have the information sent to you. Multiple database searching allows you to search a collection of databases concurrently. Ranking positions certain information at the top. These techniques can be valuable in certain circumstances.
These technical options improve the blunt system of simply asking for a word. You will find most search functions allow for some of these options and all commercial quality databases provide for numerous functions. The good news is an experienced searcher can accomplish wonders - regularly collecting articles of 70%+ interest on expensive databases. The bad news is most of the best of search technology is not implemented on all the databases you will search and only occasionally on databases free on the internet.
There are several search techniques associated with library catalogues. Beyond the simple author/title/subject search, we should also consider searching by Dewey number, and searching first for any title - then selecting the subject fields.
The Dewey decimal system is similar in many ways to the patent classification system. Each step is divided into 10 - getting more and more specific. See this CAL State Dewey list (www.calstatela.edu/library/guides/Dclass.htm) to get an idea of its structure. This number here refers to a book called Australian government assistance to local government projects:
The Dewey system is arranged by Discipline, not subject groupings. Each digit to the right becomes progressively more detailed. The system works well in organizing books - and libraries expand it to suit their needs - but it is different from a subject catalogue. Because it is arranged by discipline, subject fields may be split.
In searching, we want to duplicate the walk to the shelves and browsing other publications that share similar numbers. We do this electronically by searching/browsing books that share most of a number. Drop a digit - expand the field of interest.
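The 'drop a digit' idea amounts to matching on a shorter and shorter prefix of the number. A minimal Python sketch, with a made-up catalogue (the call numbers are chosen only to show the mechanics, and the titles are placeholders):

```python
# Hypothetical catalogue of (Dewey number, title) pairs, invented for illustration.
CATALOGUE = [
    ("304.2", "Book A"),
    ("304.28", "Book B"),
    ("304.6", "Book C"),
    ("307.76", "Book D"),
    ("577", "Book E"),
]


def shelf_neighbours(catalogue, number, digits_dropped=0):
    """List titles whose Dewey number shares the leading digits of `number`.

    The decimal point is ignored; dropping digits from the right widens the field.
    """
    digits = number.replace(".", "")
    prefix = digits[: len(digits) - digits_dropped]
    return sorted(title for call, title in catalogue
                  if call.replace(".", "").startswith(prefix))


print(shelf_neighbours(CATALOGUE, "304.28"))                    # ['Book B']
print(shelf_neighbours(CATALOGUE, "304.28", digits_dropped=1))  # ['Book A', 'Book B']
print(shelf_neighbours(CATALOGUE, "304.28", digits_dropped=2))  # ['Book A', 'Book B', 'Book C']
print(shelf_neighbours(CATALOGUE, "304.28", digits_dropped=3))  # ['Book A', 'Book B', 'Book C', 'Book D']
```

Many catalogues offer the same thing as a 'browse by call number' function, so you will rarely need to do this yourself; the sketch only shows what that function is doing.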
The Dewey system is a bit congested in certain areas, giving rise to very long numbers. For this and historical reasons, several national libraries do not use the Dewey system. The Library of Congress, for example, has its own classification scheme (Outlined here lcweb.loc.gov/catdir/cpso/lcco/lcco.html ).
We can do better than searching the subject index of a library catalogue. Try instead to search for a book which interests you - which you can usually find easily with a simple title search - and then select the subjects that book is indexed under.
Many of the library catalogues are making this particularly easy by incorporating links into the catalogue results. A quick look at the Library of Congress, for example, will show how all the subject fields are linked to further searching.
We can show this in action by looking at the book Earth Time by David Suzuki, at my State Library. As you can see down the bottom, it is indexed under Social Ecology and Human Ecology.
This kind of 'locate then expand' is an effective search technique used in a number of situations. In commercial databases, we may search for a company then expand to make sure we catch any different company spellings. We may also wish to search for a book, then search for books by the same publisher.
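As a sketch of 'locate then expand' in Python: find one known record, then use its subject headings to pull in the neighbours. Only the Earth Time entry comes from the example above; the other catalogue entries are placeholders invented for illustration.

```python
# Hypothetical catalogue; only the first entry reflects the example in the text.
CATALOGUE = [
    {"title": "Earth Time", "author": "David Suzuki",
     "subjects": {"Social Ecology", "Human Ecology"}},
    {"title": "Hypothetical Book B", "author": "Author B",
     "subjects": {"Human Ecology"}},
    {"title": "Hypothetical Book C", "author": "Author C",
     "subjects": {"Social Ecology", "Economics"}},
    {"title": "Hypothetical Book D", "author": "Author D",
     "subjects": {"Cookery"}},
]


def locate(catalogue, title):
    """Step one: a simple title search for a book you already know."""
    for record in catalogue:
        if record["title"].lower() == title.lower():
            return record
    return None


def expand(catalogue, record):
    """Step two: group the other titles under each subject heading of that book."""
    related = {}
    for subject in sorted(record["subjects"]):
        related[subject] = [r["title"] for r in catalogue
                            if subject in r["subjects"] and r is not record]
    return related


book = locate(CATALOGUE, "earth time")
print(expand(CATALOGUE, book))
```

The advantage is that you never have to guess the cataloguer's vocabulary: the located book tells you which subject headings are actually in use.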
All patents are given a special number. Unfortunately, each country has a distinct numbering scheme: US patents are assigned a consecutive patent number (currently 6 million+). Australian patents have an alphanumeric number which includes the year. Canadian patents are numbered.
Above these numbering systems, we have the International Patent Classification (IPC), by the World Intellectual Property Organization (WIPO). Most every country uses the IPC to classify patents, save the US. US Patent Classification is similar in many ways.
International Patent Classification
Thanks to the World Intellectual Property Organization (WIPO), the International Patent Classification (IPC) works as a universal classification for patents. Started in 1975 and periodically updated, we currently use IPC 7th Edition.
Section, Class & Group. The International Patent Classification looks like this: A 02 J 1/00
At the heart of the IPC is the unique coding of every invention by its specific form or function. The system is highly specific and logical, and includes numerous cross-references to other codes of similar form or function. Think of this as the Dewey Decimal System for patents.
The first letter is the section - one of eight broad categories labeled A through H. 'A' represents Human Necessities. 'B' covers Performing Operations and Transporting.
Each section is divided into classes, each identified by a two-digit number. Each class is in turn divided into subclasses, identified by the letter which follows the class number.
Each subclass is then divided into groups and subgroups. The number before the slash is the group, the number after the slash is the subgroup. Subgroups only have two digits, with further numbers considered as resting behind a decimal point: 3/46 then 3/464, then 3/47.
Thus A 47 J 27/09 includes the safety device on your rice cooker and B 63 G 11/00 covers your various aircraft carriers.
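If you are handling many IPC symbols, the structure just described can be pulled apart mechanically. The Python sketch below is my own illustration (the names IPCSymbol, parse_ipc and subgroup_sort_key are hypothetical, and the pattern only covers the plain "A 47 J 27/09" layout, not every variant you will meet in databases).

```python
import re
from typing import NamedTuple


class IPCSymbol(NamedTuple):
    section: str    # one letter, A to H
    ipc_class: str  # two digits
    subclass: str   # one letter
    group: str      # number before the slash
    subgroup: str   # number after the slash


IPC_PATTERN = re.compile(r"^([A-H])\s*(\d{2})\s*([A-Z])\s*(\d{1,4})/(\d{2,})$")


def parse_ipc(symbol):
    """Split a symbol such as 'A 47 J 27/09' into its five parts."""
    match = IPC_PATTERN.match(symbol.strip().upper())
    if match is None:
        raise ValueError(f"not an IPC symbol: {symbol!r}")
    return IPCSymbol(*match.groups())


def subgroup_sort_key(subgroup):
    """Order subgroups as if the digits sat behind a decimal point."""
    return float("0." + subgroup)


print(parse_ipc("A 47 J 27/09"))
print(sorted(["47", "464", "46"], key=subgroup_sort_key))  # ['46', '464', '47']
```

The sort key matters when listing subgroups: sorted as whole numbers, 464 would land after 47, when it actually belongs between 46 and 47.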
The IPC system is fully described in these published directories:
The Official Catchword Index by World Intellectual Property Organization.
International Patent Classification: Guide, Survey of Classes & Summary of Main Groups
International Patent Classification: Section G - Physics
International Patent Classification: Guide
Thanks to the World Intellectual Property Organization (WIPO), these full documents are online. We now have direct access to the International Patent Classification (7th Edition): Official Catchword Index, Guide to the IPC, and the complete Class and Section books.
Note: The International Patent Classification includes plenty of internal references - indicating this group is similar to another group; motorized boats take precedence over boat function. These internal references are important to effectively searching databases. There is more to the IPC, and we strongly recommend you read the Introductory Manual to the International Patent Classification (IPC) found on the WIPO website.
US Patent Classification
US Patents are classified with 400+ main classes and thousands of subclasses. Sound similar to the International Patent Classification? It is. US patents are numbered sequentially.
This means you can find US patents:
- by full text searching through the USPTO database CASSIS (found at US patent libraries),
- by bibliographic & abstract text searching online through the USPTO or IBM Patent Library,
- by US Patent number,
- by US Patent Classification class & subclass - to list similar patents,
- by an effective combination search,
- by searching recent notices in the Official Gazette... available online.
The USPTO allows you to search or browse the US Manual of Classification online. The Internet Patent Search System lets you browse US Patent titles by class/subclass.
A little more information can be found with the Patent Guide to using CASSIS, at the University of Michigan.
Patent Search Strategies
Here are the avenues open to you:
1_ Full text search and retrieval through a commercial database.
2_ Free bibliographic & abstract searching online followed by selective patent perusal/ordering.
3_ Paging manually through the relevant official gazette (the US gazette is searchable).
4_ Retrieval of the titles & abstracts within appropriate class/subclass then selective review and patent perusal/ordering.
This last avenue is particularly resourceful and swift. Start by reaching for The Official Catchword Index, a book by World Intellectual Property Organization (WIPO). This will tell you the possible class/subclasses that will interest you. You could word-search a patent database and note all the class/subclasses found. Lastly, you can always reach for the three separate printed guides that lead you from section to subclass.
The result should be a collection of class/subclasses that may interest you.
With this information, you can now browse all the patents in the class/subclass. This process will help you locate all the patents that may interest you since patent classification is more reliable than free text search. (Note, both British and American spelling appears in patent databases.) This also allows you to quickly review the patents in other countries.
If you are undertaking a novelty search - is a patent sufficiently unique from other existing patents - then you must review more than one country. There can be a significant delay before patent applications reach other countries without affecting the protection. Case in point: Australia only accounts for 7% of the world's patents.
Further Search Strategy
Patent search strategy is further discussed in the Introductory Manual to the International Patent Classification (IPC) found on the WIPO website. You may also wish to read "Searching for Patents" (www.ummu.umich.edu/library/PTO/newpatsearch.html) from the University of Michigan, and "Patents" by Simon Fraser University Libraries (www.lib.sfu.ca/kiosk/nelles/patents.htm).
Trademark law is designed to protect consumers from confusion. The law can work to protect business investment in brands & slogans, but only if the business behaves in particular ways which protect consumers from confusion: actively using the trademark, working to restrict the trademark from becoming generic, routinely searching for unauthorized use. For a very clear description of trademark use, and the responsibilities of trademark owners, read the short webpages A Guide to Proper Trademark Use, and How are Marks Protected both by Gregory Guillot.
Trademark Law has implications for searching: Just because a potentially conflicting trademark has been found does not mean it should concern you. It may be simple to show or argue that trademark ownership has lapsed and become abandoned unintentionally.
A common law search involves searching records other than the federal register and pending application records. It may involve checking phone directories, yellow pages, industrial directories, state trademark registers, among others, in an effort to determine if a particular mark is used by others when they have not filed for a federal trademark registration.
The system may appear particularly legalistic, and it is. Recent Australian Trade Marks Office Decisions (www.austlii.edu.au/au/cases/cth/ATMO/recent-cases.html), information ultimately supplied by IP Australia, displays this vividly. However, much trademark activity is self-evident. In Australia, A$350 and a minimum of seven and a half months will usually earn you a registered trademark. Should you choose a trademark and find another has used it, you will most likely receive a 'cease & desist' letter and forfeit the value you may have invested in the trademark.
This leads us to the importance of commercial trademark databases, watching services and other commercial services. Searching both prevents investment in an unusable trademark and inadvertent infringement by others - a responsibility of trademark owners.
A concise list of the 42 classes of the International Trademark Classification codes courtesy of Master-McNeil Inc. WIPO is in charge of the full class description, currently The 7th edition of the Nice Classification, but this is rather lengthy. IP Australia has a simple search feature of classification terminology.
Trademarks are assigned to a particular class of product or service. A slogan or mark, for example, could be registered for use in movies but not computer products. The situation has changed recently but let us explain the difference down the page a bit.
Originally, all goods and services were broken down into 42 classes. These classes are international divisions organized by WIPO (World Intellectual Property Organization), so are the same from country to country. Registered trademark documents will explain at length the types of products & services covered by a particular trademark.
There is some bleeding between categories, and trademark examiners are unlikely to grant requests for nearly identical trademarks in similar categories, but class plays a role in granting trademarks.
Recently it became necessary to list specifically the products or services to be covered, and the 42 classes have been expanded to a collection of specific sub-classes, which is reminiscent of patent classification, but far less useful.
Class is important as trademarks are class-specific. You can search by class in certain registered trademark databases, but this is not particularly a good search technique: you are far too likely to miss a comparable trademark.
Trademark Picture Descriptors
Search Image Descriptors, by IP Australia, here abbreviated, needs basic words - simple like bird or butterfly.
One difficulty with trademark searches is that all the tools apply best to words which appear in trademarks. What of the picture? The solution appears to be image descriptors. I am uncertain of the international nature of image descriptors, but at least in Australia, there is a standard set of image descriptors. IP Australia allows you to search for other trademarks with a particular picture element - irrespective of the words involved. But to do this, you must first select the appropriate image descriptor.
Trademarks are just one element of intellectual property rights, alongside patents, copyright, industrial design rights, circuit layout rights and plant breeders rights. As certain registered trademark databases are free online, some trademark research can be accomplished quite simply by the novice.
1_ To find existing trademarks similar to one you plan to register.
2_ To find existing trademarks similar to one you plan to use as a trademark.
3_ To see if a trademark is similar to a business name you consider using.
4_ To search for possible infringing trademarks.
This is further explained in this help file by IP Australia.
Misc.int-property has a lively usenet discussion on Intellectual Property. Access the newsgroup directly: misc.int-property (or search the past discussion through Deja.com's usenet archive).
For a lively discussion of how trademark law affects internet domain names, consider the trademarks-l mailing list at Washburn University (read the Scout Report description scout7.cs.wisc.edu/pages/00000138.html).
Lastly, we have not yet researched the categorization of industries using standard SIC or NAICS codes. In simple terms though, all industries are given a specific code. Sub-industry is given a more specific code. More and more specific codes refer to the production of more and more specific items. Of course, some companies will be involved in a collection of industries.
Two competing standards, the SIC and NAICS, have different codes but the same coding system. Each code system can be mapped on the other, so will cause you no undue concern. Trade statistics, digital business directories, and national statistical bureau industry data will all use the industry codes.
Information has value. It also has other qualities that will assist you to judge information you may consider buying.
Accuracy: the factual nature of the information presented. If the statistics purport to show a particular trend - how large is the margin of error? How large is the sample size? How likely are there to have been factual errors in their development? The measurement of statistical error is now a refined science in some fields. A statistical result can be inaccurate when the sample size is too small, if the margin of error is too large, the sample collection procedure incorrect, or a number of other situations.
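To put a number on sample size, here is the textbook approximation for the margin of error of a percentage, sketched in Python. It assumes a simple random sample and a 95% confidence level (z = 1.96); real polls use more elaborate designs, so treat the result as a rough guide only.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion from a simple random sample."""
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


def required_sample_size(margin, proportion=0.5, z=1.96):
    """Smallest simple random sample giving at most the stated margin of error."""
    return math.ceil(z ** 2 * proportion * (1 - proportion) / margin ** 2)


# The poll of 707 people mentioned at the start of this FAQ.
print(f"{margin_of_error(707):.1%}")  # 3.7%

# Sample needed to bring the margin down to 3 points.
print(required_sample_size(0.03))     # 1068
```

On these assumptions a sample of 707 carries a margin of roughly 3.7 points either way, so a reported lead of 3% sits inside the margin of error.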
Reliability: the support for trusting the solutions, both from additional resources and from being able to duplicate the conclusions. This includes the reputation of the researchers. No matter how inaccurate and biased you may believe certain facts to be, successful independent support of a suggested fact does improve its value.
Bias: conscious or subconscious influences that affect information. Bias can occur in collection, preparation and presentation of information. Most information you find will be tainted. Secondary information is deeply affected. Statistics are not necessarily less biased.
We counter bias in several ways. Firstly, we try to be aware of bias. Where is bias likely? Which direction would the bias affect the information? Secondly, we try to collect information with different bias. This is why research based solely on government research, no matter how accurate and reliable, is less valuable. Often information from different countries can counter bias. Thirdly, we need to accept bias is likely to exist. This is why primary sources are often more valuable than secondary sources. This is why tertiary sources, like experts, can rarely stand alone.
Age: The date information was created or compiled will feature prominently in the value of information. Dates given sometimes mean the date information was created, or the date information was compiled. How old is a book compiled in 1995, which took the author 10 years to finish? I find statistics often forecast information, prominently displaying recent compilation dates but still use old census data or the like to draw their conclusions. Information on the internet typically has no date, and can be severely challenged because of this.
Purpose: purpose merits further discussion. When you are uncertain about potential bias, you can look for reasons to distrust the information instead. Suspicion is not equivalent to bias, but it can be thought provoking. Privately, I have heard repeated rumours important national statistics have been fudged in different countries. A government research report investigating the price of books in Australia would have a political purpose, a purpose that provides the climate for some potentially significant bias. A tell-all book by industry experts often includes a tremendous quality of insider experience difficult to find elsewhere. While there may be a purpose of self-aggrandizement, the purpose is less a climate for significant bias. Medical research has perhaps the greatest climate for significant bias, and this suggests the greatest standard of proof and external, reliable support.
Accuracy, reliability, bias, age and purpose are very important in research. Together they lead us to an appraisal of value. For years, the tobacco industry funded 'independent' research finding smoking minimally harmful to health. It now seems likely this research suffered from errors of accuracy and from bias. Certainly, purpose was in doubt. As new studies show smoking is harmful, we can also say the original research lacked reliability. In some topics, like the internet, research is perpetually suspect because it also ages so quickly.
I have seen further discussions that add 'Coverage' and 'Authority' to this checklist. Both have bearing on the value of the information contained. By coverage, we mean how much detail is invested in covering a specific topic. Sparse or shallow coverage is closely tied to missing critical aspects of information. News stories frequently have limited coverage.
Once you are acclimatized to these elements, you begin to see potential for error in a whole range of information. Real-estate association figures, expert opinions, toothpaste advertisements and national GDP figures all occasionally display some degree of warping and manipulation, clouding the truth. The solution is awareness, comparison and careful analysis. As a personal aside, this is part of the reason for my dislike of market research: it is often taken far more seriously than warranted and means far less than suggested.
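To make the accuracy questions above concrete, here is a minimal sketch of the usual margin-of-error calculation, applied to the sample of 707 from the poll example earlier in this FAQ. It assumes a simple random sample, a 95% confidence level (z = 1.96) and the worst-case proportion of 0.5; real polls often use weighting and other designs this does not cover. The function name is my own.

```python
import math

def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate margin of error for a proportion from a simple random sample.

    Assumptions: simple random sampling, normal approximation,
    z = 1.96 for roughly 95% confidence, worst case proportion = 0.5.
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

moe = margin_of_error(707)
print(f"Sample of 707: about +/- {moe * 100:.1f} percentage points")

# Quadrupling the sample only halves the margin of error.
print(f"Sample of 2828: about +/- {margin_of_error(2828) * 100:.1f} percentage points")
```

Under these assumptions the margin for 707 respondents is larger than a 3% lead, which is why such a lead should not be read as a clear result.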
Of interest to you now, the internet offers you a very good look at the information industry. Most organizations involved in the information industry publish exhaustive product descriptions on the net. Most commercial products are delivered electronically.
Professional Search Resources
As a profession, researchers have diverse skills and needs. Constantly working with information, in a competitive market, professional information seekers are often starved for high quality information about new research techniques, skills and sources. This can be found through discussion groups like BusLib-l, websites on library science like LisNews.com, associations like the Association of Independent Information Professionals (AIIP) and the Society of Competitive Intelligence Professionals (SCIP), and events and conferences as listed in the journal Online & CDROM Review.
As more introductory resources, start with a selection of books and webpages like:
- The Intelligence Cycle, courtesy of the CIA library - a single-page summary of the research process.
- The Information Broker's Handbook by Sue Rugge and Alfred Glossbrenner, McGraw-Hill. Third Edition (1997) - a must-read for those interested in the business side of information research.
- Secrets of the Super Searchers by Reva Basch. Unfortunately a 1993 book, but unique as a look into the field of information brokers. Published by Eight Bit Books. (Dewey 025.524 BAS)
- Online is a good bimonthly magazine for information brokers. (Dewey 025.04).
There are a number of interesting periodicals, most owned and marketed by Information Today Inc. BUBL lists a number more. Others are electronic publications, like LIBRES: Library and Information Science Research Electronic Journal, a biannual scholarly journal, and Information Research.
The commercial databases of interest are LISA (Library and Information Science Abstracts), ALISA (Australian LISA), Information Science and Library Literature.
The links for these resources and more are on the Spire Project at spireproject.com/links.htm#3
The Professional Search
Professional research demands a more effective, timely use of resources at hand. It is challenging, and it is an occupation.
Unlike research undertaken for your own needs, professional researchers often know little about the topic they are asked to investigate. We may not know the phrases which accurately describe a specific concept, we sometimes don't recognize gold if it's labeled copper, but we have to do everything fast - lest the cost escalate above the expectation of the client.
Client? Yes, professional research starts with the client.
Professional research involves far less book and library work, and far more interviewing, database access and online article purchasing. When money is involved, time becomes very precious. The first luxury lost: the luxury to get to know the topic in leisurely detail.
Instead, professional research starts with a careful description of exactly what information is desired (and why). You must quickly build a good plan about who you will ask and where you will look. This is, after all, the primary skill that others have great difficulty in duplicating - traversing the information sphere swiftly and skillfully.
Many researchers today can search databases. Most researchers are familiar with library work. Personal research has the added benefit of being part of the learning process. So why reach for a professional?
The first unique skill we must refine is our knowledge of the research tools. Computer databases may be easily accessible, but are not easy to search. Interviewing is conceptually simple, but is not simple in practice. Each aspect of research can and must be refined.
The second unique skill: interpretation. Working with information frequently allows us to better judge the reliability and bias of the information we retrieve.
Most information you find will be tainted. Secondary expertise almost always presents information in a biased way. You will counter this bias both by being aware of the bias and by interviewing someone with a different view. An inventor proclaims a device is near completion - do we believe? Obviously it requires further study. This is often lost on amateur researchers - by collecting information from a variety of different resources, with a range of bias, we can create a superior assessment of the value of each item of information. Research based solely on government research, no matter how well done, is unprofessional.
The third unique skill is speed. We must be able to provide research as a service, as a business, quickly. This goes beyond research to the banal work of copyright and legal protection, selecting effective research tools, finding fast expertise to supplement your own.
The skills of professional research are like those of the artist. They take a lifetime to learn. The work is just business.
The Database Industry
The commercial information sphere existed in the 1970's and earlier. It is far more developed, far better organized, far better funded, almost always far more valuable and expensive than every other research resource.
For the most part, commercial information is arranged reasonably uniformly in large databases of full-text or bibliographic information. Some databases are small, single source documents, while others are vast unfocused collections of, for example, all the news from the last 15 years.
Most directories and journals can be made into a database, but single-source databases do not enjoy much financial success. The market is too limited and the cost of promotion too high (except in a local market with newspapers). To overcome this difficulty, single sources are grouped together into larger collections of databases on a particular topic. These large database groups have become primary tools in commercial research.
Developing these databases requires considerable expertise and expense. Sometimes data requires abstracting, interpreting, and as with some Lexis-Nexis and WestLaw databases, even expert legal interpretation. Sometimes firms develop a portfolio of databases. Sometimes firms build just one.
The marketing and consumer billing of such databases is then provided by a relatively small collection of large database retailers. A list can be found in our "Commercial Databases" article. As an indication of the size of this market, Knight-Ridder sold Dialog & Datastar for a figure approaching half a billion dollars.
This industry consists of a wide collection of players, each improving and developing the information from individual periodicals, journals, news items - all very confusing for the end user. This is elegantly illustrated by the database descriptions for Lexis-Nexis databases (their preferred term is libraries). See www.lexis-nexis.com/lncc/sources/ as an example of specific databases. In particular, see their library on patents.
Many single-sources appear in different commercial databases. Further, different databases sometimes include different information from the same single-source. One database may include just abstracts, another may include fulltext, chemical indexing and more.
As a result, most researchers are unfamiliar with what exactly is being searched.
This state of affairs is not unproductive. Searching a 'Database about Patents' is uncomplicated. You receive information on patents. It is simple, informative and incomplete. Of course, researchers are busy people. Time is critical. Results matter. We are familiar with this system from searching the web too. Just what are the differences between All-the-Web, Lycos and Altavista? If we fully understood the complexities of each available database, yet still had a few databases to consider - would our search be better? Often not. This system of incomplete information also leads to great customer loyalty to database retailers. Comparative information is dropped in favour of simplicity. Ultimately, I am hard pressed to compare prices let alone describe the differences between information products.
Prices actually model many a developed industry, remarkably similar to the telephone or banking industry. As one friend commented, "bullshit baffles the brains". The prices are complex on purpose. It becomes very unrewarding to compare prices, and any conclusions are only valid in specific circumstances - and will not hold in others. This trend, familiar to us as a multitude of banking changes and telephone pricing schedules, reinforces our need to stop price hunting and trust our favoured information retailers.
This is not to say we should not compare prices, just that you will find comparing prices a most unrewarding experience. It really requires you to search and retrieve the same information on different systems - and this does not even begin to touch different databases, or database groupings, or variables that change over time like download speeds.
Optimistically, there are actually very few important databases in each field. It may be simple to browse each of the databases in your field and compare directly. You may never need to know more than a few databases intimately.
Realistically, you will yearn for a simpler solution.
The commercial information industry has distributed information this way for several decades. It is both sophisticated and quite difficult. You will need to become experienced with inverted indexes, search techniques (Boolean, truncation, proximity, field limits ...) and properly phrasing the question in a way that will be answered by a database search. I have always found the value of a database search directly proportional to the length of the search query.
If you are incompletely skilled at database research, you will take longer, pay more and locate far more information (or unwisely discard more) than desired.
This is very different from searching Altavista and Webcrawler.
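For readers new to the terms above, the following is a minimal sketch of an inverted index with Boolean operators and truncation. It is a toy under stated assumptions, not how any commercial retailer implements searching: the three records are invented, the function names are my own, and proximity searching and field limits are not shown.

```python
from collections import defaultdict

# Hypothetical three-record "database", invented purely for illustration.
RECORDS = {
    1: "patent law and patent searching",
    2: "searching newspaper databases",
    3: "patents in chemical research",
}

def build_index(records):
    """Inverted index: map each word to the set of record ids containing it."""
    index = defaultdict(set)
    for rec_id, text in records.items():
        for word in text.lower().split():
            index[word].add(rec_id)
    return index

def lookup(index, term):
    """Return matching record ids; a trailing '*' means truncation (prefix match)."""
    term = term.lower()
    if term.endswith("*"):
        stem = term[:-1]
        hits = set()
        for word, ids in index.items():
            if word.startswith(stem):
                hits |= ids
        return hits
    return set(index.get(term, set()))

index = build_index(RECORDS)

# Boolean operators map directly onto set operations.
both = lookup(index, "patent*") & lookup(index, "searching")      # patent* AND searching
either = lookup(index, "patent*") | lookup(index, "searching")    # patent* OR searching
excluded = lookup(index, "searching") - lookup(index, "patent*")  # searching NOT patent*

print(sorted(both), sorted(either), sorted(excluded))
```

Note how the truncated term patent* picks up both "patent" and "patents", and how each added operator narrows or widens the result set. This is the sense in which a longer, more carefully built query tends to return a more useful answer.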
Doing your own research offers an opportunity to more closely influence the research process. Sometimes only you understand the topic and sometimes you can more quickly discard unimportant details. Certainly it is becoming simpler to undertake some work yourself.
Many of the commercial databases are also available in a CD format. Substantial subscription costs limit their availability to large research institutions and libraries, but exceptions exist. I believe world books in print costs AU$5000+. Provided you can find casual access, it will cost you far less. Keep an eye on the age, though. Sometimes (and only sometimes) online information is more recent.
The decision between undertaking research on your own or seeking external help is really a decision based on your research expertise, your budget, your access to information, your time, and the importance of finding all the information available. It also depends on your access to some decent research assistance. I will soon be able to help with this.
What I do know is a newcomer to the commercial information sphere will seriously underestimate the difficulty involved in searching, and underestimate both the cost of research and the cost of research assistance. Keep in mind this same system serves the needs of large commercial conglomerates, professional legal research, and well financed government studies. The commercial information sphere contains far more valuable information than you need. Sometimes the internet is just an interesting sneeze in comparison.
¤ Article: The State of Databases Today:2000 by Martha E Williams tracks the development of this industry with survey results. Found as the foreword of the Gale Directory of Databases.
Squeezing the Info-Broker
I was reading an interesting article by Anthea Stratigos in ONLINE that stirred me to thinking about the future of Information Brokerage. The article in question outlined the shift of information brokers into the marketing department, towards new roles in negotiating information access licenses, helping people understand and select appropriate resources - and oddly, in overseeing the intranet development process so as to deliver the information people need.
The article premise is rather accurate - as far as it goes. But I wonder if the true message behind this shift is the decline and death of information brokering as a profession? If information brokers (also known as information professionals) are moving to new roles, are they vacating the old roles, the traditional roles in the research process?
In my library, I reach for the Information Broker's Handbook for a relevant quote:
"The heart and soul of the information broker's job is information retrieval. But many individuals offer information organization services as well."
So, Information Retrieval, and Information Organization. Anyone who has seen the simple information retrieval options incorporated in recent information packages can be in no doubt that the information retailing industry is minimizing the need to reach for an intermediary. Technology is certainly closing the gap - but this development has always been in the cards.
A central difficulty for information brokers is a simple maxim: provide better results than clients doing the search themselves. Often working in unfamiliar territory, a researcher may find it very difficult to excel. There are two dilemmas here. Firstly, while we may pride ourselves in accomplishing unique requests, we have expensive costs associated with one-off searches. There is little likelihood someone else will ask a similar question. There are simply no possible economies of scale.
Secondly, our search difficulty is not shared by the client. The client has difficulty with the technology - certainly. The client does not have difficulty separating the wheat from the chaff, recognizing the gold embedded in the articles and, at a basic level, knowing the search words you will need to get to the right stuff.
There is a very good reason why university students are pushed to learn basic and sophisticated search technologies.
There is another take on this story.
Creating Value in the Network Economy includes a chapter by Philip Evans and Thomas Wurster.
"emerging open standards and the explosion in the number of people and organizations connected by networks are freeing information from the channels that have been required to exchange it, making those channels unnecessary or uneconomical."
"Newspapers and banking are not special cases. The value chains of scores of other industries will become ripe for unbundling. The logic is most compelling - and therefore likely to strike soonest - in information businesses ... All it will take to deconstruct a business is a competitor that focuses on the vulnerable sliver of information in its value chain."
And in the back of my mind comes the thought that maybe the information retrieval function we have been providing is just one such information business. This business, attempting to be the pinnacle of the research process, is ripe for unbundling. Not only can our function be incorporated directly into the advertising and technology of the information resources we use, but our skill can also be coded into simpler and simpler guides and resources like my work on the Spire Project.
Perhaps as an industry we never managed to secure our captive market.
Initially, this will affect that mainstay of information brokerage: commercial database retrieval. And like the newspapers that will begin to lose the profit center of classified advertising (ripe for unbundling and electronic delivery), the business of providing information research services will come under additional pressure.
Eventually, we retreat to other areas as information professionals: Information Organization, Research Education and Training.
Somewhere amidst this story lies a new role for researchers. The need for research certainly exists and is forecast to grow dramatically as the information age develops. What is lost, sadly, is an understanding of the ease with which this work will be done. This is certainly destined to move away from being an industry for professionals working at $50/hr to $150/hr + costs! Others can provide this work, more easily than now. People we will most likely call researchers - and not information brokers.
This is more than a push towards specialization. There is another way to see this transformation. The information broker was a retail point for wholesalers who are now firmly selling directly to the consumer. There is much less of a need for an intermediary between database retailers and information consumers - and there is a firm trend in this direction.
Information brokers defined their role in the information industry as masters of the difficult technology of research, capable of finding most anything. Come to us when you are lost and we will find the answers - for a price. We know the technology, the meta-resources, the tricks used to find information. We routinely retrieve a higher quality of information, far faster, than you can yourself. The standard model: a library-run service offering primarily database search & retrieval for its patrons.
This business model is coming to an end.
Yes, perhaps the information broker is dead. Soon to be replaced with low-wage researchers and research assistants, and high-end information executives and research trainers. Like it or not, most of us will incorporate a little more research into our current work, and reach for a little more intelligible research resources. Everything else will be accomplished by true specialists.
Online (a periodical with some coverage of library & information research). July/August 1999 p71-73, by Anthea Stratigos of Outsell Inc.
The Information Broker's Handbook p.21, by Sue Rugge and Alfred Glossbrenner. Windcrest/McGraw-Hill. 1992.
Creating Value in the Network Economy, Edited by Don Tapscott. Chapter 2: Strategy and the New Economics of Information by Philip Evans & Thomas Wurster. p.18 & 25. A Harvard Business Review Book.
The Information Service Industry
Private Detectives, Professional Database Researchers, Library Researchers, Legal Researchers, Commercial Database Producers, Commercial Database Retailers, Magazines, News Organizations, Libraries: this is a big industry. Information Research is just a process linking together people seeking information with people who provide it.
It seems in vogue to reconsider all businesses as being in the information business. My accountant and your stockbroker both provide information services. While I agree these two professions are intensive users of information, I purchase their interpretation of information. It is not a trivial difference but nonetheless serves to cloud the true size of the industry just involved in selling you access to information.
From university days, I was aware of the large commercial database retail giants (Dialog, Dun&Bradstreet) and the database producers. I also met with some of the firms distributing largely to the library market (like SilverPlatter). Little further information about these businesses leaks beyond the research industry.
Some of the businesses are aimed primarily towards the library community. Database subscriptions are unlikely to interest an individual. Few are appropriate to businesses. Let us briefly scan just the products and services intended for a consumer.
Commercial Database Retailers - These organizations devote their effort to bringing commercial database information to individuals. Dialog, Datastar, Infomart, Lexis-Nexis and others will assist you to access information only available through commercial databases. (See our article, "Commercial Databases".)
Current News and Current Awareness - If you want to know of new articles and news important to you as it is reported, then there is a selection of services available: news by email, news by newsgroup, news by periodic automated database search, and other novel approaches. Costs for this service have fallen dramatically: effective solutions start at about US$10/month and are not strictly dependent on range & quality of information. (See our article, "Newswires & News Databases".)
Information Brokers - There is a whole industry of specialized researchers who will try to locate and compile research to your specifications. The backbone of this industry is payment for access to commercial databases, but different information brokers will gladly enter into any effort required to locate information. Information brokers, business librarians, legal researchers and others all use the tools described in this website, as a service for their clientele. (See our article, "Research as a Discipline".)
Patent Assistance - Patent searching is one of the more difficult branches of serious research. Some of the resources are free on the internet, and commercial patent databases are readily available through the database retailers. If there is serious money at stake, you must consider legal assistance. Certainly use lawyers for patent applications (beyond the scope of the Spire Project). But a patent can also be a research tool. Patent research can provide you with what is often the first appearance of costly commercial research. This is both a source of cutting edge solutions and competitive intelligence.
Media Monitoring - Certain firms solely focus on monitoring TV, radio & newspapers. These firms typically run teams who page through newspapers looking for matching articles, then post or fax to the client. New technologies are also advancing into this field.
Document Delivery - Most local bookstores will gladly help you locate a book from their directories but if you want a book from abroad, or an article from a journal or magazine, you will need the assistance of another set of information workers. A distinct but similar approach assists with the distribution of journal articles. Many of the document delivery firms are closely tied to information organizations. Little information is available about these organizations.
Trends in the Information Sphere
For the past few years, individual database owners/maintainers have been flirting with the idea of making paid access available through the internet, rather than the existing system of allowing database retailing firms to promote and market their databases. I have heard rumours most database producers earn up to 30% of retail price when delivered through database retailers - 70% being retained by the database retailer.
The internet is not a commercially viable alternative...yet, but some databases have emerged with alternative funding despite this (Library of Congress, ERIC, Medline). Others are creeping in around the edges by offering subscribers access at a much reduced flat annual fee (Computer Select at one time). I expect most database producers are waiting for a meaningful way to charge. Digital money holds the key but despite the hype, practical use appears to be a medium to long-term reality.
A second trend is internet publishing itself. Gradually, the information is getting easier to locate. (Don't laugh please - it's undignified.) We are also getting better at using the internet as a tool to disseminate information. We have the very visible, if perhaps short-lived, search engines but also other efforts like archives of FAQs, archives of guidebooks, applying the Dewey decimal system to the internet, specialist directories, subject guides, specialist search engines. This will be a lively field for several years to come. As it gets easier to locate the good information, perhaps the lines between commercial quality and internet quality will begin to merge in places.
The third trend is the very promising prospect of paying for information by the page through the internet - viewing the results in a web page immediately. There are some technical hurdles yet, but certain elements are already appearing in ventures like DialogWeb. This step may prove profitable for ATM vendors and owners of internet cafes, pubs and kiosks. It will also herald a dramatic drop in the cost of information.
Are We Developing an Informative Internet?
Several serious glitches have delayed the further improvement of the internet as an effective information resource. Oh, sure it is the world's largest library and thousands of new webpages are published every hour. But this trite statement disguises how slowly the informative value of the internet is developing.
The internet holds so very much promise. Marketing mantras tell us so, but few of us grasp this technology will completely rewrite the rules of community, government and the exchange of intellectually valuable information.
One of the hurdles is vision. We are not yet delivering the information pertaining to community, government and the exchange of intellectually valuable (improved) information. We are only proceeding quickly with market information and computer-related information. We are still toying with further ways the internet can transform other areas of our life.
We should have achieved more by now.
The net is still very disorganized. A number of developments promise to eventually make the internet less confusing and better organized. To date, we have several cumbersome techniques, a large collection of search tools and a great deal of potentially interesting links.
As mentioned, thinking about who is publishing assists us with our search. Applying this to where information is emerging, we learn much of the best information is not reaching the internet. Certainly, the commercially generated information is not reaching the internet (covered below). The large research studies paid for by public funds and slowly aging on the shelves of government and non-government organizations are also not coming online. Government, institutional and commercial organizations primarily publish brochure-ware - as befitting the presentation of market information. (Even offering to publish such documents freely does not appreciably affect this trend, as the restrictions are not financial but a matter of mindset. See our past work.)
We should recognize few of the more valuable documents emerge online.
Further Reading: Socially Responsible Publishing on the Internet ('97)
(Available on request)
A Census of Regionally Important Documents on the Web ('96)
(Available on request)
The internet excites me with the promise of a real community rebirth arising from this technology. For the first time in history we should be able to discuss in an informed manner any number of issues from crime to taxation. Tied into this are issues of government transparency, international assistance, anti-corporate market reform and community involvement. Unfortunately, my experience with mailing lists and more recently with a newsgroup confirms the difficulties in developing discussion. Discussion groups function as notice boards. The difficulties in developing participation, and in moderation, are just a little too cumbersome for success. For many discussion groups, the chaff overwhelms the wheat, and the information content is far from considerable.
The financial rewards are also minimal for establishing and maintaining discussion groups. Dramatic improvement to the informative value of the internet is unlikely to emerge here.
Further Reading: How to build a discussion on the Internet (by David Novak - available on request).
We have alluded to the importance of editorial and organization on the internet. There are several severe limitations to this - first and foremost the difficulty in gathering financial rewards for meaningful work improving and organizing information.
I am being circumspect here. There is money available - just not where it is needed. The most important resources in professional research are the contents of the commercial information sphere. This sphere existed decades before the internet, is far better funded, and is far larger. To compare commercial and internet information is almost heresy. A bridge between these two, internet and commercial, emerges slowly.
Digital money should grease the exchange of information by dropping the cost of exchange considerably. Today, credit cards provide this service. This works, at times, but digital money would allow for small amounts of money to change hands. This appears to be a critical threshold for bringing much of the commercial information to the net.
About 5 years ago I was introduced to the Theseus Model - an economic model to pay for the intellectual investment in publishing and organizing interactive multimedia. Years earlier there was Xanadu. While I have serious reservations about both, they do illustrate the intellectual foundations for effective use of a tool for exchanging small amounts of money. It opens the doors to direct delivery of copyright work - which in turn opens an effective economic model for publishing improved information on the internet.
Without digital money, proprietary information can only be exchanged digitally by gift (that is, free - the initial driving force of the internet information sphere), or by credit-card purchase of access to passwords to external networks - the current method of accessing database retailers.
This has the unfortunate effect of limiting the interest both of internet users in the commercial information sphere and the commercial information retailers in the internet. Oh, there is movement in both directions, but not at the scale experienced in other industries.
Further Reading: The UWA Theseus Project (www.arts.uwa.edu.au/TheseusWWW/)
The Xanadu project (www.xanadu.com or concise summary - www.sfc.keio.ac.jp/~ted/XU/XuPageKeio.html)
A Look at Information Congestion
Finding information on the internet is a skill. Finding information on the commercial information sphere is also a skill. There is a great degree of overlap. The awareness of the general public as measured by use of commercial resources is very limited. This is further seen from the simple use of search engines & the abundance of simple web search.
To hammer this point in, let's take a momentary look at search engines. Most searches end in thousands of results: here are the first 10. Do you really think the first 10 or 20 or 100 sites listed are particularly better than the next? No - you have a random selection of resources, a selection generated by computer based on the simplest of criteria. (We should also mention how some search engines sell placement in search results.)
Remarkably, the search engine is the much-vaunted entryway to the world of information!?! Clearly search engines will not dramatically improve the informative value of the net - not by themselves.
Multiplication of Information
One complication of poor information organization is an inflation of overlapping nuggets of information. Information on the internet is so difficult to locate we have almost a continual need for more publishing. Information must exist in numerous locations to reach an intended audience. Promotion of the simplest nature - recognition for the best for a given topic - becomes exceedingly difficult. Only when 20 sites publish or report a given fact does it become accessible.
Curiously, this is the state of affairs in the wider community. Promotion is an expensive specialty. Numerous copies, distributors and references are required to generate any kind of significant awareness. Why should the internet be different?
Actually, why should the internet be the same? Definitive sites like the US Census Bureau have no need to duplicate their information or to have alternative presentation sites. Yet such sites appear to be the exception. Consider a search for the best resources for patent research: we are greeted with 954 websites (Altavista search for "patent research" Jan-19-2001). Presumably, most of these sites discuss patent research - Right? There is no technical or theoretical need for such confusion. I wonder if such duplication may be more of an affliction than a natural tendency.
It is relatively difficult to earn money from publishing improved information, or organizing information already on the internet. Given the intense interest in this technology, a collection of models have emerged. A brief tour of these models will highlight the financial limitations to improving the internet as an informative resource.
- - - Working for fame (but not payment)
This model works well in open source software programming, and some of this ethic certainly extends to publishing information.
- - - Simple altruism/complete lack of justification
School students and internet novices in particular may not need to justify anything. Unfortunately, such work is usually neither consistent nor persistent.
- - - Commercial promotion
Promotional funds can be used to publish information. Most promotion is short-sighted, limited to presenting market information (like product information), but in time government and associations will fund publishing in-house information for purely promotional reasons.
- - - Invested commercial businesses
There are certain commercial opportunities to earn money through banner advertising and sponsorship.
Direct payment for improved information (perhaps with digital money), direct payment to authors (Theseus model, royalty systems), and direct state sponsorship need not be necessary to fundamentally improve the internet as an information resource. Academic peer-reviewed journals do not pay for articles. Commercial periodicals are supported by advertising, and the token subscription costs of magazines usually just covers distribution costs. Fame motivates many efforts, not just online, and we do not feel the need to habitually justify everything we do.
In no small way, as more people become adept at publishing quickly, important information will move on the net faster. Similarly, information will also gradually become better organized. Economic models will not improve the informative value of the internet like direct payment. Most current limitations have economic solutions. Unfortunately, my reasoned opinion is no economic system will arrive in time to make a difference.
We know something of how information gets published, and how many important documents do not reach the internet. We have described how information is organized on the internet and how limited editorial vetting and organization have given rise to traits like superficial indexing, information duplication, and a need for research skills.
Financial rewards and financial tools are unlikely to solve these difficulties. We can only hope for a gradual growing out of our current difficulties. We will have more of the same for several years to come. It is simply the nature of the internet (as currently constructed).
For you, a greater understanding of the internet will assist you to judge the worth, likely source and likely venues of the information you seek. The same is true in the larger world... database, book & article. Each has different traits and qualities, reinforced over time. Your understanding of these traits and qualities in part defines your skill as a researcher.
As to the future of the internet, on the positive side, there are certain qualities to internet communication that make it uniquely valuable. Internet communication is inexpensive, relatively rapid, and increasingly accessible. On the negative side, the internet is badly vetted, potentially very time consuming, and up against very well entrenched systems that have been running for either decades or millenniums (considering databases or books). Elements like a promised but functionally absent digital money, and the lack of a meaningful way to recoup the costs of vetting online information, make matters worse. Despite this, despite ALL the teething and fundamental difficulties, the internet is sufficiently superior to ensure considerable continued effort to improve the informative value of the net.
The Multiplication of Information Effect.
Just as the internet permits a multitude of voices and perspectives, so it permits - and promotes - a multitude of the same information. Yes. For several reasons, which we shall explore first, the internet multiplies the amount of information there is on a topic. This insight can be used to improve searching for information, as I will show at the end of this article.
The internet is a system of communication. Like all other systems (books, articles) the internet systems affect the way we communicate in different ways. The absolute number of books depends on what is thought can be commercially viable. We could say books permit, and promote a limited number of books on the same topic.
The internet does the opposite.
The sheer ease of publishing information on the net is one factor in information overkill. The net is an easy place to publish information, requiring only individual effort. There are no budgetary concerns, nor does attracting an audience initially enter into the publishing process, as they would with articles or books.
The ageless state of the internet also rapidly builds information. Old information is not removed from the web automatically as in mailing lists. Old books go out of print and past magazine articles are shelved, indexed and categorized so we must intentionally include them in our search. The web is not built this way, and information well past its natural expiry date remains.
A dramatic change is also occurring as our society becomes digital. In the pre-internet economy experts and specialists in every field are distributed to meet needs. In the networked world, expertise is not only shared more rapidly, but is required in fewer places - whether we speak geographically or intellectually. Said another way, in cyberspace, competition for expertise is most fierce. To be an expert, you need to be more expert than others within reach - and since gradually more and more experts are within reach - digitally - we form a glut of experts.
Oh, this is not a doomsday message - merely a middle ground on the way to increased specialization and focus. Historically we can easily see Newton was a Scientist but Einstein was a nuclear theorist. Today we have quantum theorists. The future is full of very long job titles.
A by-product of this movement is a current glut of experts - perhaps a permanent glut of experts. With more people connected and satisfied with distant communication, a vet who writes about immunizing your dog becomes one of many you can reach for, in several countries. Previously we may have been limited to those in your state - but no longer! Now we can pick up immunization recommendations from any number of experts previously separated by distance or with minimal overlapping media outlets.
We can see this clearly on the web. I wrote an article on country profiles and yes, as expected, the UK, US, Canada & Australia all write and publish traveler advice notices on the web. Are they different? Occasionally. Is this a case of multiplication of information? Yes. We have reached beyond the applauded internet trait of permitting a multitude of communication and reached a state where similar information is interpreted by different organizations, and distributed electronically.
This is not unique to the internet. News stories also contain considerable overlap from one newspaper to another. A search for dog immunization on one of the large news databases will result in numerous articles all presenting essentially similar information. Business periodicals also have considerable overlap, and while each may attempt to differentiate their articles from others, there are severe limits - and besides, most likely articles do not have an overlapping clientele.
But on the internet, there are overlapping readers. An article written for the web is an article written for everyone. Anyone can read it. Thanks to the popularity of search engines, it can be available to anyone. At least in theory.
This leads us to internet promotion. Information on the web is sometimes so difficult to locate we have an almost continual need for more publishing. Real traffic is difficult to promote normally, so websites devoted primarily to delivering information have a real difficulty reaching their audience. This translates either to the need for expensive commercial promotion, which often can not be justified, or into reaching only those who search carefully for your information. The latter means multiplication of the same information.
In writing this article, I see the effects mentioned will lead to changes in the future. As I write "attracting an audience initially enter into the publishing process", I think to myself this will obviously change. Attracting an audience will emerge in time as the primary step in publishing. There are many places to take this discussion, but my job is a researcher, or rather an internet-focused search theorist. (Long job titles will be in vogue). Let us focus on how these changes affect the internet as an information resource.
1) Any effort to organize the internet is diluted because of these efforts.
2) Any effort by the researcher to find different perspectives will be confounded by the number of people with the same perspective publishing in the same medium.
3) Certain fields are more heavily hit than others. Internet advice on what search engines to use is ubiquitous. Java Programming hints are numerous. More specialized topics (like internet-focused search theory) are less affected.
4) Viral marketing - a catchword for sure - hopes to achieve promotion by seeding many sites with information. Perhaps it is an innovative way around accepting the multiplication of sites delivering the same or similar information.
In phrasing the question you wish to answer, before the search, experienced researchers will focus on what information is likely to be available in numerous overlapping versions. These questions can be answered with the search tools that cover information in a more random manner: Search Engines do this very well. Tightly focused questions, less likely to be distributed so completely, should be approached with different tools: mailing lists and nexus points, long complex search queries and index points.
In conclusion, the internet will become far more cluttered than we had expected. I had previously predicted that search engines would grow to meet the needs, but this is not to be. Search engines will continue to serve up answers available from multiple places in the world. There is market enough in this, and minimal need to tackle anything more.
A search for information on the internet is not essentially different from the standard information search process. You still need to start by outlining carefully just what you are hoping to locate. You also need to be aware of the peculiarities of the internet as a researchable resource (or rather a collection of resources). If you expect instant delivery of exactly what you require, free, then you need a reality check (and I am sure you will get one real soon). Sadly, the printed media tends to overlook this.
As with all resources, the more familiar you are with a given resource, the more efficiently you will work. Get to know the internet for a time first. Understand how it works. Then re-adjust your expectations and file it as just another collection of resources, perhaps preferable in certain circumstances.
A Structured Approach to Searching
Much of this book has been devoted to describing what we could call a structural approach to finding information. We build a question, select a format and then search in an essentially static manner. There are only a few resources of interest for each format.
On the internet, we again do the same. If you want to search online periodicals (a specific format for information with specific qualities that might be appropriate) there are just a few sites to review. The search is simple and straightforward. Search then read then reassess if it helped answer your question.
The structured approach has been a simpler way to introduce a far more important application. Searchers know where answers are already - without ever having read the answer before - without having studied the topic. This is, after all, one of the few reasons to even consider paying for professional search assistance.
How does a searcher know where answers lie?
By building up a clear understanding of what information is out there, where it resides, and how to get to it, a searcher learns to anticipate the location of answers. Anticipation is everything.
Know Where to Look
Let's look at information itself. Information passes from producer, to organizer, to consumer. It travels many paths in this journey. Superficially, we can observe internet communication travels via email, newsgroups, and webpages (and others). Let's call these tools.
Looking deeper, we observe information emerges from just a few generalized sources: knowledgeable individuals, informed government employees, grant funded educational projects, commercial organizations and a few others. Each source produces a particular type of information, distributes (publishes & promotes) in particular channels, and hopes to pay for (or justify) their effort in a particular way.
Efficient internet research is infused with an understanding of who publishes, where and why.
Before information reaches the consumer, it passes through a vetting which organizes and filters both the quality and the presentation style of the information. Let us call these systems. The FAQ is a pivotal piece of a system that may start with a post to a mailing list or newsgroup, involves the vetting of the FAQ maintainer, then proceeds to an FAQ archive then to the end consumer. The webpage is published by someone who has justified their time and expense, is indexed by a search engine or definitive-topic-website or webring or what have you, and then is found and read by the end consumer. The internet has many such systems.
Each system again defines many of the traits of the resulting information. FAQs are semi-authoritative, collaborative pieces, often dense and factual. Private mailing lists are sometimes more informative and discursive, as well as serving as a notice board. Newsgroups involve far less natural vetting and quality control, but excel in distributing popular volume resources like graphics. Search engines don't vet, but can be searched.
Each system reinforces the uniqueness it brings to the whole internet. When I blindly declare "Information Clumps" at the start of this FAQ, I am really describing a trend whereby certain information accumulates in a particular location, others out of self-interest add to the pile, and further information reinforces both the logic and uniqueness of that pile of information.
It is just a short jump from this to understanding how FAQ archives grow but maintain a good quality, how the grand internet search engines began to lose value about 15 months ago then recently began regaining a position of strength, and how ftp archives still exist for many computer topics.
The internal logic to the organization of information is based on simple principles. It defines the environment within which we strive to improve the internet as an effective information resource. We take this understanding and build sophisticated expectations about what kind of information rests in which format.
Further Reading: Searching the Web: Strategy (spireproject.com/webpage.htm#5)
Make your browser work for you. All browsers allow you to open multiple window panes. Open a few and send them off in different directions fetching information. You do not have to wait for each page to return to you before you read. With a little practice, you can juggle four window panes, collecting information from different tools, following different trains of thought, reading your way through four websites as they are downloaded.
The technique is a little like reading four books at once. It certainly keeps your mind nimble. Worked successfully, multiple windows will double the speed of searching and free you from the speed of your internet connection.
Three technical tips are involved. Firstly, a second window pane is opened by selecting File : New : New Window. The shortcut key for this is Control+N. Secondly, in Microsoft Internet Explorer, holding down your shift key as you click a link will open the distant file in a new window. In Netscape, hold down the control key as you click a link. Thirdly, if you are running Windows, the Alt+Tab key combination jumps between window panes.
Taken together you can read down a page, find something interesting, shift+click a link, continue reading the original page, then flip over to reading the second page in a new window.
Keep in mind, juggling windows is difficult and requires practice. If you do this in public, be prepared to lose novice surfers who are not ready to use more than one window.
Bookmarks are a fine tool for beginners to build. It is not, however, the best organization of tools for a searcher. One of the roles of the Spire Project has been the construction of a far more effective tool, based on having the more common search tools and supporting information close together, on your own computer.
Beyond being a plug for you to look at our free shareware SpireProject.zip (spireproject.com/spire_latest_version.zip) and single-page shortcut "Spire Project Light" (spireproject.com/spir.htm), there is a serious issue here.
If you are familiar with the use of search engines - and you have fast access to the search box for the search engines - you no longer need the Urls for specific resources. With a name, you can always quickly locate a page. Besides, Urls change. Far better to just keep a list of resources by name.
At the start of this FAQ, we mentioned a searcher knows where to find information.
"Knowing of specific resources is helpful. Knowing the tools to help you find resources, the meta-resources, is vital."
Fast access to information resources is valuable. Fast access to the tools to find information is critical. Build your launch pages with these tools in mind.
Pharaoh: There is mutiny afoot. I must kill these insolent heretics.
Shakh: Good Idea. So who is involved?
Pharaoh: I don't know. You must find this out.
Shakh: Find out what?
Pharaoh: Who my enemies are, of course.
Pharaoh: People who want me dead.
Shakh: But not those who want a better ruler...
Pharaoh: No not them.
Shakh: What about the ones that want a better ruler, and would not mind you dead.
Pharaoh: That sounds like everyone.
Shakh: And those that want you dead but would never do anything about it.
Pharaoh: Well, so long as they don't help anyone else.
Shakh: Then you just want the ones who will try to kill you.
Shakh: Good. Now we know exactly what we are searching for. We are seeking those who will try to kill you. I shall straight away investigate.
Napoleon was an expert tactician, except at Waterloo. The recreation of past battles is not a favorite pastime of mine but is an exciting topic all the same. The battle terrain was set. The troops have known abilities and limitations. The movement and direction of the army units is your responsibility. Do you have the strategy involved?
Early in his career in an important fight against the Prussians, Napoleon employed a dramatic tactic where he initially held an important hill in the center of the battlefield, then surrendered the hill to the Prussians. The Prussians, confident at this stage, marched the majority of their army around the hill to the right, between the hill and a lake, to push the fight on to Napoleon. Napoleon, however, retook the hill with a costly attack up the hill by some of his best units. Success left him in control of the high ground, with much of the Prussian army below, moving between the hill and the lake. The Prussians were unable to dislodge Napoleon from the hill a second time, and unable to withdraw their army from its exposed position. Napoleon pushed on to defeat them most decisively.
The armies were almost evenly matched prior to this conflict and success seemed unlikely. An average general would have fought in a bland way, retreating or perhaps fighting to a stalemate. Napoleon inflicted a decisive defeat. Such generalship goes beyond technical skill to encompass a vision, a strategy, an art.
If I have not been careful, I will have presented searching as shopping in a supermarket. The goods are in a large store but there is a decent enough structure to find it. Third aisle for baby food. Go there and look around.
Of course, we have discussed two further types of search improvements.
There is the skill of properly asking questions. You want a question which accurately describes what you are looking for but you also want the question to be framed in a way which the resources can answer.
There is also the awareness of where information SHOULD be. If you know what kinds of information exist and you ruminate long enough on the likely motivations of publishing, you can make some fairly detailed judgements on the whereabouts of the answers you are looking for.
There is further skill in dealing with the technical difficulty of information overload. You have limited time and limited resources. Finding information is often a hit or miss affair, so there is an art to selecting the right words to search, the right Boolean prefixes to attach to search terms, the right search tactics to employ to get the most out of each situation.
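To make "Boolean prefixes" concrete, here is a minimal sketch of how a + (must appear) and - (must not appear) prefix could be evaluated against a handful of documents. The function names, the three sample documents and the exact rules are my own assumptions for illustration; each real search tool has its own syntax, so check its help pages.

```python
def matches(query, text):
    """Return True if text satisfies a query written with + and - prefixes.

    Assumed, simplified convention (not any particular engine's rules):
      +term  must appear, -term  must not appear,
      bare terms are optional; with no +terms, at least one must appear.
    """
    words = set(text.lower().split())
    required, excluded, optional = [], [], []
    for term in query.lower().split():
        if term.startswith("+") and len(term) > 1:
            required.append(term[1:])
        elif term.startswith("-") and len(term) > 1:
            excluded.append(term[1:])
        else:
            optional.append(term)
    if any(t in words for t in excluded):
        return False
    if not all(t in words for t in required):
        return False
    if not required and optional:
        return any(t in words for t in optional)
    return True


def search(query, documents):
    """Return the keys of the documents that match, in their original order."""
    return [key for key, text in documents.items() if matches(query, text)]


# Hypothetical sample documents, invented for this illustration.
DOCS = {
    "a": "government publishing on the web",
    "b": "government tenders and contracts",
    "c": "desktop publishing software review",
}

print(search("government publishing", DOCS))    # loose: either word will do
print(search("+government +publishing", DOCS))  # tight: both words required
print(search("+publishing -software", DOCS))    # required word, minus an unwanted one
```

The loose query returns all three documents, while each prefixed query returns only the first. That is the whole point of the prefixes: the same words, attached differently, give a very different pile to sift.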
For much of this, you need only experience. If you know in advance a skilled searcher can handle the task of sifting reams of data for useful information, then you can focus on how it's done, practice, and learn. The search technology itself is simple.
The trouble lies in retrieving from databases with far too much information for simple word selection. It also flares when you are dealing with databases charging upwards of $2 a minute and an additional cost per item retrieved. You decide very quickly to get good at searching once you receive a bill for $200 of irrelevant information.
The simplest solution to this difficulty is to practice. You will find all research libraries provide access to slightly older articles through CD-ROM databases. Search these to hone your skills.
I saw a small book on search techniques from an early course in my state library - but it is very basic. Most librarians build experience in using search systems either internally, or through a series of courses given by travelling database officers like the periodic training by Dialog-Insearch. These are expensive, but include some free time searching the expensive databases (no, they don't let you take information back with you).
Now, there must be something else I can share with you on this topic. First, learn something about how the databases are built in the first place. It helps if you know what an inverted text database looks like.
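For those who have never seen one, here is a rough sketch of an inverted text database (an inverted index) in its simplest form: instead of storing "document -> words", it stores "word -> documents containing it". The three sample articles and the function names are assumptions made up for this illustration; commercial systems add fields, positions and much more.

```python
from collections import defaultdict

PUNCTUATION = ".,;:!?()\"'"


def build_inverted_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for raw in text.lower().split():
            word = raw.strip(PUNCTUATION)
            if word:
                index[word].add(doc_id)
    return dict(index)


def lookup_all(index, *terms):
    """Return sorted ids of documents containing every term (a Boolean AND)."""
    postings = [index.get(term.lower(), set()) for term in terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


# Hypothetical sample articles, invented for this illustration.
ARTICLES = {
    1: "Dog immunization schedules for puppies.",
    2: "Immunization records and travel advice.",
    3: "Travel advice for dog owners.",
}

index = build_inverted_index(ARTICLES)
print(sorted(index["immunization"]))            # ids of articles using the word
print(lookup_all(index, "dog", "immunization"))  # articles using both words
```

Notice the search never reads the articles themselves. It looks up each word, then intersects the lists. This is why exact word choice matters so much: a word that is not in the index simply finds nothing.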
Second, something personal about technique... I always find the uglier the search query, the better the result. Honestly. A search combining numerous elements improves your chances of getting it right.
Third, I always try to change my search techniques to match the medium. I am likely to be more careful with broad searches of expensive databases, whereas free databases often lead me to gather 50 articles, then weed them out by hand (most CD-ROMs allow you to select only the ones you want). Always bring a 3.5'' floppy with you when visiting a library on the off-chance you want to download and look at results another time.
Fourth, I almost always find the initial challenge is in locating those specific terms that appear in 80% of the documents that interest you. When searching the internet for information about government use of the web, the specific terms required were government and publishing (not even government publish was close). All other search terms gave far too much garbage. Yes, of course, being an expert in a particular field is an edge in already knowing these special terms.
There are two escape hatches here. If you can find one or two articles that interest you, often you can browse these articles for those special words. Sometimes even, the descriptors of an interesting article will give you a specific subject heading. I've heard this technique called the "Pearl Development Technique" but I just think of it as a good idea. The second escape hatch is the use of free databases to prepare you for going online. If you have ready access to a CD-ROM database, search this first - get the right search words on the free databases, then go online.
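The first escape hatch can be partly mechanized. The sketch below takes a few articles you already like and lists the words that appear in at least 80% of them. The five sample sentences, the stopword list and the function name are all assumptions for illustration, not a record of my actual search.

```python
from collections import Counter

PUNCTUATION = ".,;:!?()\"'"
# A tiny, assumed stopword list; a real one would be much longer.
STOPWORDS = {"a", "the", "to", "on", "is", "for", "and", "of", "in"}


def candidate_terms(seed_articles, min_share=0.8):
    """Return words found in at least min_share of the seed articles."""
    counts = Counter()
    for text in seed_articles:
        words = {raw.strip(PUNCTUATION).lower() for raw in text.split()}
        words -= STOPWORDS
        words.discard("")
        counts.update(words)  # each article counts a word at most once
    total = len(seed_articles)
    return sorted(w for w, n in counts.items() if n / total >= min_share)


# Hypothetical seed articles, invented for this illustration.
SEEDS = [
    "Government publishing on the web is growing.",
    "A guide to government web publishing.",
    "Publishing policy for government agencies.",
    "Local government and electronic publishing.",
    "Government departments publish reports online.",
]

print(candidate_terms(SEEDS))
```

With these made-up seeds only "government" and "publishing" clear the 80% bar. "Publish" appears in just one article, which echoes the experience above: the near-miss word is not close at all.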
Oh, of course, there is also the issue of just asking someone involved for the proper words. I like to ask my clients if they know what words are likely to be used. It's not a mark of an amateur to be asked, by the way.
A couple of side issues
1) Keep an eye on the type of document you are searching. If you want full text - don't go looking in bibliography databases. More to the point, don't start word searching databases with really big files without using the proximity indicators and descriptive fields. I hated paying for that 20-page document which included all the words I was interested in - but on different pages.
2) Also, keep an eye on the quality of the documents you are retrieving. I know a search of newspapers sounds impressive, but newspapers are rarely capable of explaining anything in depth and are notorious for running advertorials. I try to keep newsprint for locating experts - not for information. I have also been trapped by obscure magazines with appealing articles, only to learn the magazine is one of a large number of very basic business magazines which use fillers or just don't like to pay for good journalism. A single article of 5 pages from Scientific American blows 20 small fillers out of the water. In fact the length of an article is a hint of depth.
Oh, if you are looking for some really good books on this issue, try the manuals Dialog sends you to start, look for text databases in your library, then proceed to one of the search books recommended at the end of our 'research as a discipline' article.
Basic research techniques change slowly, though the technology is improving and specific information resources are in rapid flux. It makes for interesting times.
So many resources. So many techniques. It's strange to have written down so very much that is dull and tiring yet get it right. You simply must muddle through all those links to get a decent result.
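To illustrate the first side issue, here is a small sketch of what a proximity indicator does: it insists that two words appear within a few words of each other, rather than anywhere in the document. The function name, the max_gap parameter and the sample texts are assumptions for illustration; each database has its own proximity syntax, so consult its manual.

```python
PUNCTUATION = ".,;:!?()\"'"


def within(text, first, second, max_gap=5):
    """Return True if first and second occur at most max_gap words apart."""
    words = [raw.strip(PUNCTUATION).lower() for raw in text.split()]
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    return any(
        abs(i - j) <= max_gap
        for i in first_positions
        for j in second_positions
    )


# Hypothetical texts: a short relevant one, and a long one where the two
# words are present but separated by fifty words of padding.
NEAR = "Patent research requires patience."
FAR = (
    "Patent law is covered in chapter one. "
    + "filler " * 50
    + "Market research is covered later."
)

print(within(NEAR, "patent", "research", max_gap=2))
print(within(FAR, "patent", "research", max_gap=5))
```

A plain word search would retrieve both texts, since both contain "patent" and "research". The proximity test keeps the first and rejects the second, which is exactly the 20-page document I hated paying for.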
Yet the end result is to portray searching as an intensely dull experience. We have very few choices. The information exists in certain clearly marked places. We merely need collect it.
If we are not careful we will present you the idea that searching is more like shopping in a supermarket. The goods are in a large store but there is a decent enough structure to find it. Third aisle for baby food. Go there and look around.
Actually, this is the general approach to searching. There is no art, no talent, just skill and knowledge of the technology. Want a webpage on dogs - go to Yahoo and type in dogs. Want a telephone number - take out the white pages and remember the alphabet. Want a book and you are near the library, walk in and ask a librarian. Alternatively, walk in and type a few words in the library book database.
But there is more - so very much more. And all of this makes for exceptional searching.
Let's look at an example. We want information on how to improve the schooling of your exceptionally gifted child. A simple request. What do we do?
The art is a kind of magic, of choosing just the right words at the right times, and in phrasing your request for information in a way that tightly describes your interest without removing information that should interest you. The art of searching relies heavily on an understanding of what is possible within a given system. Much of this, you guessed it, involves creative visualizing.
Searching is an attitude. It is a way of looking at the world, and at information, quite distinct from the norm. Statistics are mentioned on TV and you subconsciously weigh the value. You listen to experts and wonder who pays them, and so where the potential purpose bias could come from. Searching is an attitude with little tolerance for spin, puffery or questionable interpretation of statistics.
Searching can be a very negative attitude - and this is our last lesson. Search with a critical mind, but also know at some point you must say enough. Enough searching, it is time to make a decision. This line is not defeat, but acceptance that decisions are made on incomplete information. Make your decision when you are ready.
Shakh admired the art on the wall. Meaning within meaning. The divine representations stood offering the pharaoh recognition. In exchange the pharaoh offered a just reign. The scene worked well. Such work was one of the few ways the pharaoh could communicate with the gods.
Yet there were other layers to the picture. The gods were depicted as pleased with the work of the pharaoh. Their recognition was a reward for the years of ruling Egypt.
There, further in the picture, was reference to the accomplishments of the pharaoh. Much of the writing was dictated by tradition, and the individual scribes were all instructed in the tale, so meaning was particularly important in what was different from other tombs. It was the small differences that made this work unique, that elevated the work from that suitable for any important person to that fit for a king. Birth in a village close to the Nile. References to the pharaoh's re-conquest of Nubia. The special position of Horus, the falcon god.
Then there was the technology. Sparkling stars on blue covered the ceiling. This was a new development, unseen before in crypt or building. It had a pleasant effect, expanding the space within the tomb, making it look larger than it really was.
And then there was the artistry to the carving. These were fine scribes, clean and precise. The work satisfied him well.
Walking out of the half-completed tomb, Shakh sighed, wiped the gathering sweat from his brow, then gave a small thought to the poor sap he used to work for. The old pharaoh had never learned information was power, thought Shakh, sighing regally.
Acknowledgements: I would like to thank my wife Fiona, whom I love and cherish dearly. The Spire Project is a great effort several years in the making. I trust you enjoyed the results.
David Novak - firstname.lastname@example.org - SpireProject.com and SpireProject.co.uk
Copyright (c) 1998-2001 by David Novak, all rights reserved. This FAQ may be posted to any USENET newsgroup, on-line service, website, or BBS as long as it is posted unaltered in its entirety including this copyright statement. This FAQ may not be included in commercial collections or compilations without express permission from the author. Please post permission requests to email@example.com