From: david@spireproject.com (David Novak)
Newsgroups: alt.internet.research,sci.research,alt.answers,sci.answers,news.answers
Subject: Information Research FAQ v.4.7 (Part 1/6)
Followup-To: poster
Approved: news-answers-request@mit.edu
Summary: Information Research FAQ: Resources, Tools & Training

Archive-name: internet/info-research-faq/part1
Posting-Frequency: monthly
Last-modified: April 2002
URL: http://spireproject.com
Copyright: (c) 2001 David Novak
Maintainer: David Novak
_______________________________
Many of us unwittingly digest great amounts of information in the course of a day. Our information needs are more modest and usually repetitive. When we have questions, we reach for a small collection of preferred information sources close at hand, with a collection of assessments as to what is credible and trusted. For a child, these sources include the school library, an encyclopedia and parents. All the sources are trusted. For an adult, these sources include the state library, the newspaper, bookstores and current magazines. Adults understand truth has become a little more relative, but when the evening news declares presidential hopeful George W Bush is ahead by 3% (on a sample of 707) we slip into thinking he is leading, even though a lead that small is within the sampling error of a poll this size.

There is more to information literacy. It is, after all, a profession. There are tools you know nothing about and techniques you have never heard of. There is a specialized vocabulary just made to confuse you. Research, or rather information research (to distinguish it from lab-coat style research), is so very much more involved.

Yet there is great simplicity to research too. Just under the murky mist of confusing resources rests a solid platform to stand on. In any one field there are just a handful of databases, directories and periodicals to consider. After decades of library and information industry evolution, clearly valuable sources have already floated to the top, monopolizing their respective fields. Most cities have just one or two primary newspapers. Large industries like book publishing have few book databases and a handful of primary book distributors.

Enter the internet: not so much a change of information as a revolution in access to information. Previously you could justify having just a handful of preferred information sources because these were the sources easily available. Today, and in the future, our lives are filled with information close at hand.
We are dropped into a morass of competing information just waiting to capture our attention, straining both our capacity to absorb information and our capacity to understand the differences between sources.

A great segment of our community will fall back on the tried and true information sources they grew up with: state library, bookstore, local newspaper. The better alternative sources will be ignored for no particular reason. The rush of the information revolution will push past them. They will only hear of changes when their information needs suddenly change, and they are confronted with a vast collection of unfamiliar options and struggle to understand what sources they need.

A smaller segment of our community, by virtue of frequently tackling questions best answered with unfamiliar sources, will be driven to understand the information world: to become truly information literate.

There is another story here too. The way our society handles information is undergoing some very fascinating changes. Any predictions for the future should acknowledge the tension and flow of information in our society. Take, for example, the vast surplus of information emerging on the internet, and the convulsions of the commercial information industry in response. Rather than focusing only on how information is organized, we can also focus on how information becomes organized. The who, where and why of information, the sociological perspective, adds meaning to the phrase "information revolution".

It was another warm day. The young Egyptian boy strode purposefully out the gate towards the river. The Nile was low this time of year, and abundant with fish and bird life. With luck, Shakh would return at sunset with food for the pantry. Mother would be pleased with that. Shakh knew fishing had changed little over the last hundred years. The walls of his family's ancestral home had just such a scene of his grandfather fishing on the Nile from a small reed boat.
The thinly carved relief was complete with spear, fish, ducks and Shakh's grandmother nearby holding lotus flowers.

Shakh stopped by old-man Jacob on his short walk to the bank of the Nile. He liked the old trader. Years ago Jacob had traveled to the Levant and brought back many strange artifacts. Some even came from as far afield as the Harappan people, who were said to live beyond Sheba, across the waves, some three years' journey away. He especially liked the small black head carved in a style so unlike anything else Shakh had seen.

The Harappan people lived on the banks of the great Indus river in modern-day Pakistan. Theirs was a great civilization almost on par with the Sumerians and the more distant Egyptians, yet very little remains today. They built vast cities of clay brick with rectangular city blocks. They built drains, public toilets and state granaries. They were the first to populate the Indus river valley. (see www.harappa.com/indus2/index.html)

Little remains. The Harappan civilization fell with the arrival of the Aryan race and the intervening millennia treated their past poorly. The arrival of Islam erased much of their history as did the shifting Indus river itself. The British used the bricks from one ancient city in the construction of a great railway. Only today are the archaeological digs once again unearthing the past.

I search for Harappa on the internet. Nothing special, just type 'Harappa' into any of the popular search engines and I uncover harappa.com, a website devoted to some recent information from these digs. Looks good. Pictures of ancient pots. Children's toys. A map to an ancient city.

Of course, Shakh would have known of the Harappan civilization. While it is uncertain whether ancient Egyptians ever visited in person, goods and rumors traveled far from trader to trader. Ancient Egyptians, while not accomplished conquerors abroad, did travel and mix with distant peoples.
Shakh lived in a civilization centuries distant from us, yet both you and Shakh know a similar amount about the Harappan civilization. The intervening years have not made everything clear. Even the information revolution has not changed the facts. Both you and Shakh have just a single source of information about the Harappan civilization. You have the pictures on harappa.com and our short excerpt here. Shakh has the old-man's art object to look at, and the old-man's myth of a civilization beyond the waves.

This story carves the act of searching in deep relief. Searching is a skill, a trade and to some a profession. It is also just a simple task of finding information - something we do every day, in so many ways, without any of the difficulties we will get into later in this FAQ.

The difficulties only emerge when you want to do something spectacular. Should you wish to know something specific about the Harappan civilization, or understand something contentious, then you require a greater degree of expertise and experience. The search becomes a challenging adventure in its own right.

The Nile was always a slow river but three months out of the year it burst its banks and flooded the fields, bringing life on the banks of the Nile to a complete halt. For these three months Shakh's family would move into the ancestral home in the streets surrounding the great pyramids. It was an old home, centuries old. It was well suited to their needs, with a storeroom for food, separate rooms for the parents, and an active social life in close proximity to others.

In many ways, this was the most exciting time for young Shakh. For the rest of the year he lived in relative isolation in the village by the Nile. For these three months, he lived in a city, bustling with activity, construction and recreation. Shakh had expected this year to be like the last but his father secured Shakh an important position - he would be in training to become a scribe.
Father had grand plans for young Shakh, plans that extended far beyond life as a scribe. What's more, with luck and further prosperity, Shakh's father had the means to secure his further advance.

Much of ancient Egypt is available for us to read off the walls of the many remaining buildings. They were not a literate nation, yet were able to adorn almost everything with writing and pictures. They lived in the most enlightened society of the day. Years later, Egypt would gift the fledgling Hellenic state a full third of their Greek vocabulary.

This is part of the reason for such an interest in travelling to Egypt. It is the visual symbols that inform us and draw us in so deeply. Standing before the great religious statues, we begin to feel how it was to live and work in that day. To run amok as a young student, waiting for the Nile to subside once again.

Yet, there is much more to knowing ancient Egypt than just the monuments and wall reliefs. Years of study have recovered their lost language of hieroglyphs. Years of archaeology have unearthed their daily lives.

History and archaeology are fine examples of searching in practice. Both fields struggle openly with the bias and uncertainty each new fact brings forth. Malta is a small island off the coast of Sicily, close to Tunisia. Should evidence emerge of ancient Egyptians living on Malta, what does it mean? Was Malta an Egyptian conquest or an occasional station for their fishing fleet?

This uncertainty applies to all information, in all situations. One of the first acts of the new regime in Pakistan was to acknowledge that important national statistics, like the national GDP figures, had been fudged to a serious and significant degree. Important national statistics are not intrinsically true because of their source. This is not a problem solely of underdeveloped nations. Rumor suggests that during the height of Singapore's land value bubble their national figures were unreliable too.
Searching is a skill and an attitude. In this FAQ we progressively unfold the way information is found. Initially, let's cover a simple way to find information; a structured approach to an everyday problem. Afterwards, we shall look more closely, and with more complexity, at the world of information.
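The opinion-poll example earlier in this section (a 3% lead on a sample of 707) can be checked with a little arithmetic. The sketch below is only a rough illustration: it assumes a simple random sample, a 95% confidence level (z = 1.96) and the worst-case share of 50%, none of which the newsreader tells us. The function name is made up for this example.

```python
import math

def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate margin of error for one reported share.

    Assumes a simple random sample; z=1.96 corresponds to 95% confidence.
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

moe = margin_of_error(707)   # sample size quoted in the text
lead = 0.03                  # the reported 3% lead

print(f"margin of error: +/-{moe:.1%}")
if lead > moe:
    print("the lead is larger than the margin of error")
else:
    print("the lead is within the margin of error")
```

Under these assumptions the margin on each candidate's share is already larger than the reported lead, which is the reason for caution about who is "leading".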
Searching is simple. It starts with a question. It ends with an answer. Everything between is searching.

Much of it has to do with the tools you use. Select the right tool and you can get to the answer almost by default. Luckily, for any given topic there tends to be just a handful of must-use tools. For more complicated questions, there are usually plenty of people to ask for assistance.

The answers you are seeking will be found in a selection of different formats. By this I mean books, articles, interviews, and more. This is a very convenient concept and forms the foundation of all our work both here and in the Spire Project. Few research tools cover more than a single format; those that do tend to cover each format poorly.

Start a search by selecting the specific format you are seeking. Then, select your preferred search tool from a small collection specific to that format. To get the information, simply follow through and read, search or interview. Everything follows naturally.

There are just a few formats to consider.

Books . . . . . Dense, factual, comprehensive and a minimum of 6 months to a year old.
Articles . . . . . Shorter than books but focused on one topic.
News . . . . . Short and shallow. Immediate.
Statistics . . . . . Factual. More reliable.
Theses . . . . . Very thick. Deeply researched. Esoteric.
Webpages . . . . . Immediate, mixed quality, with limited factual support.
Interviews . . . . . Immediate, varied quality, partly digested.

Each format has a selection of simple tools to find information. Many of these tools will be on the internet - which may mean easily accessible.

A word of caution: try not to confuse search tools that happen to be on the internet with searching internet information. The Amazon.com book catalogue is a search tool useful in locating books. Though on the web, searching Amazon is part of a book search, not a web search.
A search of the Reuters newswire is a news search, not a web search, even though Reuters releases current news on the web. Each format should remain distinct in your mind.

Tools to Find Books

1) Some books, particularly classics, are free on the internet through efforts like Project Gutenberg.
2) Libraries allow you to read books. Library catalogues are frequently online.
3) The largest libraries, like the Library of Congress and the British Library, list millions of books in their online catalogues.
4) Most currently available 'in print' books are listed in national Books-in-Print databases.
5) Each country maintains a special government publication database.
6) Lastly, online bookstore catalogues, like that of Barnes & Noble, list a sizeable portion of current in-print books.

Tools to Find Webpages

1) Global search engines index hundreds of millions of webpages for free text searching. Consider Altavista and All-the-Web.
2) Global directories list resources by category. Consider Yahoo or the Open Directory Project.
3) Regional search engines and directories focus more tightly on regionally important topics.
4) Lastly, more specialized search tools can all take you further: search engines which focus on specific topics (like maths or government webpages), services which link you to important topic-specific websites, and services which manually review websites.

Tools to Find News

1) Current news is found in newspapers and the evening news. News clips can be delivered electronically, or purchased through specialist news clipping services.
2) Newswires redistribute regional news to a larger audience. Many newswires release their text news free online.
3) Specialized search engines like NewsBlip and TotalNews aggregate current online news.
4) State libraries archive past copies of regional papers.
5) Individual newspapers maintain libraries of previous articles. Many are available as commercial databases.
6) Larger commercial databases unite the news from many prominent newspapers. These databases of news articles stretch back many years.

This story is repeated with all the formats information comes in.

To drum this in with repetition, searching starts with a question. Select the format (book, news or webpage). Next, select one or more tools from our short list of search tools for that format.

Want to understand the lifecycle of the spider? A book should prove useful. Let's look at either our local library book catalogue or a big commercial bookstore catalogue like Barnes & Noble (bn.com). Search. Read. Voila, the lifecycle of the spider.

If searching appears a little boring at this point, you have not visited a library recently. The excitement comes in finding the information. The rest is dull indeed.

The information revolution washes over us, picks us up and pushes us forward like so much driftwood. From now on our lives will forever be awash with information. We will eat it. Breathe it. Live in it. Drown in it. Some of us will even learn to live for it. Those most capable will have the skills to search, sift and sort information.

The information revolution is not about primary research, lab coats and discovery. It is about a surplus of information. The searching we have just discussed is not a particularly creative process. Simple searching is not sufficient to deal with the great tide of information moving against us. But then, simple searching lacks finesse. Simple searching is, well, simple.

Searching is one of those delightful tasks where skill is everything. A search without talent will give you just a taste. Like pottery perhaps. Anyone can get something but only an expert can accomplish wonders. Quality information, reliable answers, effective coverage of resources; it takes skill to get to this level.

Advances in technology and the delivery of search assistance have made searching easier than ever before.
Many search tasks can be accomplished without any experience. With more challenging questions a novice will get results - results they will be proud of. But not results they should be proud of. With experience, you will recognize how much more is possible.

Let's proceed by adding a little more complexity.

Your value as a searcher is directly related to the number of resources you can reach for quickly, and your skill at phrasing a research question. Consequently, as a searcher, you will work hard at building ready access to a range of resources. You will also work hard at understanding the special characteristics of collections of information.

The technical name for complex searching is 'Information Research'. I prefer to think of information research as an effort to locate answers, efficiently.

Information Research is not vague browsing of available information for something that interests you. It is not browsing the library bookshelf or reading the newspaper, nor is it internet surfing. Information research is searching with a purpose ... and it is hard work.

Research is also an art form. The skills, tools, and resources we work with are only the canvas and paints of an artist. Research extends from commercial, legal and reporting work through the skills of interviewing, database searching, and research analysis using books, articles, experts and patents. Research is so large a field, involving so many skills, tools and resources, you will quickly find you do not wish to learn it all.

At the heart of information research lies a simple motto: "Someone, somewhere, probably knows the answer."

To quote The Information Broker's Handbook (Sue Rugge and Alfred Glossbrenner): "As information brokers, we shouldn't consider ourselves capable of providing solutions... What we 'can' provide, and what sets a really good information broker apart from the rest, are resources. We can provide the client with the kinds of information he or she needs ...
that make it possible for individuals to solve their problems."

Let this sink in. We are not experts in the field we are researching. Collecting information on the moons of Jupiter? Do not pretend to be an astronomer. We are only experts at the tools for gathering information.

A Quick Introduction to Effective Searching.

1) Searchers work hard to properly frame the question.
2) Searchers know the technology, know where to look.
3) Searchers know you can ask.

Step One: Properly Frame the Question

The preparation of your question is critical. There is a galaxy of difference between a young student's vague "I am interested in trees" and a specific, attainable question like "Where would I find a tree surgeon I can talk to?"

The information sphere is very large and rather confusing. Each item of information has aspects of authenticity, accuracy, reliability, and bias. Information comes in many formats: interviews, books, articles, statistics. We learn about information from many sources: literature, discussion, resource lists, experience. There are also personal issues: budget, time, depth and purpose. With all this to think about, we must be very careful about each question we ask. This issue is vital once we start an article search, and can easily mean the difference between 5 concise articles and hundreds of general articles.

The essence of our question is the manner in which we approach the information sphere. The question directs our efforts.

One key is to treat searching as an art, much like painting or photography. The true mark of an artist, and the primary step wanna-be artists miss, is visualizing what you want before you begin. When searching, sit down and visualize what a successful search would look like in this situation. How many pages? How many documents? What kind of authors and what quality of document? Go through the whole gamut of different types of research tools and describe it. Would a simple three-line newspaper article be a success?
Would a 20-year-old dissertation be acceptable? Would a short conversation with an expert suffice? Would all three together suffice? (This approach works exceptionally well with internet research too.)

If you can phrase a question in a way that lends itself to your resources, you are far more likely to get the answers desired. Oddly, this often means you are asking for places where the information resides rather than asking directly for the information.

A novice starts with a question like, "What can I do for my exceptional child?" You should rephrase this question immediately. "What resources will help me help my exceptional child?" These are both valid questions but the second question has a distinct answer - the first is far too vague. Other questions could be "What are other parents doing for their exceptional child?" or "Who can help advise me on how to teach my exceptional child?"

Now we shape the question to get precise answers. "Where do I find a definitive list of associations?" (or a search for "+association +directory") works much better than, "What association works with exceptional children?" What about, "Who would know of associations for exceptional children?" and, "Are there pamphlets of advice for parents of exceptional children?" and, "What umbrella organizations/specialist libraries exist for exceptional children?"

Questions are not right or wrong, just better or worse at illuminating certain aspects of the answer. Make sure your questions illuminate something useful. There are ways to frame questions for commercial databases, for research assistance, for interviews, for getting the truth from your children.

Your skill in phrasing the question has a lot to do with the results. Poor questions tend to come back and haunt us later when we miss relevant information. Set aside ample time to refresh and reframe your questions.

Step Two: Know the Technology, Know Where to Look.
Research rests on understanding the technology and an awareness of the resources. In the example above, a directory of associations does exist. Here in Australia it is the "Directory of Australian Associations", found in most important Australian libraries. The Australian "Department of Education" has a major interest in promoting exceptional children. In Western Australia, Infolink, a community information service, should have a record of major community groups for exceptional students. I have no direct knowledge of umbrella organizations or specialist libraries, though I expect both the education department and Infolink would. A quick search of some large libraries may help us find some of the pamphlets.

Knowing of specific resources is helpful. It is great if you live next door to the president of Mensa. You have easy access to someone knowledgeable, able to give his or her take on the situation.

Knowing the tools to help you find resources, the meta-resources, is vital. So what if we do not know exceptional students come under the Department of Education? Do we know who to ask to find the government department involved? If you do not know of the directory of associations, whom would you ask, or where would you look, for one? Being unfamiliar with meta-resources is a serious handicap - you will find yourself searching for hours for something a professional would do on the phone while drinking coffee.

Keep in mind the Spire Project is dedicated to providing you some of this experience. Our web articles should suggest directions to look. But there are limits to how we can help. At some point you simply must sit down with the Kompass Directory, or the Gale Directory of Databases, or the Australian Bureau of Statistics library, and become familiar with getting to all the relevant information.

Another must, for all searching, is experience searching electronic databases with complex research queries - a difficult task only made better with practice.
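Complex queries of this kind usually combine three ideas: restricting a term to one field, combining terms with Boolean operators, and requiring terms to sit close together. The sketch below illustrates the ideas only. The records are made up, the helper names (`words`, `has`, `near`) are hypothetical, and real commercial databases each have their own query syntax.

```python
import re

# Made-up records standing in for a bibliographic database.
RECORDS = [
    {"title": "Teaching the gifted child at home",
     "body": "Advice for parents of exceptional children."},
    {"title": "Directory of associations",
     "body": "A list of community associations and support groups."},
    {"title": "School sports results",
     "body": "The children played well. Exceptional weather all week."},
]

def words(text):
    """Lower-case a text and split it into words."""
    return re.findall(r"[a-z]+", text.lower())

def has(record, field, term):
    """Field search: is the term present in one named field?"""
    return term in words(record[field])

def near(record, field, a, b, distance):
    """Proximity search: do a and b occur within `distance` words of each other?"""
    tokens = words(record[field])
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    return any(abs(i - j) <= distance for i in pos_a for j in pos_b)

# Boolean AND alone: both words appear somewhere in the body.
loose = [r["title"] for r in RECORDS
         if has(r, "body", "exceptional") and has(r, "body", "children")]

# Proximity: the two words must be adjacent.
tight = [r["title"] for r in RECORDS
         if near(r, "body", "exceptional", "children", 1)]

# Field search with Boolean OR and NOT, restricted to the title.
by_title = [r["title"] for r in RECORDS
            if (has(r, "title", "directory") or has(r, "title", "associations"))
            and not has(r, "title", "sports")]

print("AND only: ", loose)
print("proximity:", tight)
print("title:    ", by_title)
```

In this tiny sample the plain AND query also returns the sports report, because both words happen to appear in it, while the proximity query keeps only the record that actually discusses exceptional children.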
As a general rule, if you don't use Fields, Proximity and Boolean search terms, you are doing it wrong. Most people do it wrong.

Step Three: Know You Can Ask.

There is very little mystery about professional research. Lots of people are experienced in different aspects of this field. My personal weak point is direct interviewing, whereas I am a pioneer in secondary resource research. This is OK. In fact I use this liberally to determine the skill of professional researchers - do they know their own limits? The field is much too large to be an expert in all its aspects.

The positive side of this is that many people welcome requests for help. I enjoy asking librarians questions. I also ask my customers, my suppliers and other professional researchers. Never get caught in the trap of feeling you know what to do.

The joy in this profession is that most people do not expect you to be an expert in their field, just an expert in your field: particularly the meta-resources. Even if it requires a polite reminder, customers will appreciate you asking them for likely keywords in difficult searches.

I always make a habit of asking librarians if I am missing something. A librarian is always fluent in their collections and I frequently locate real gems this way. (As an example, my state library arranges computer books in two sets, one Dewey and another in an alternative structure. Who would have guessed?)

Especially if you are just a student, always keep your ears open. You will frequently find yourself in the presence of some expert in some facet of research telling you something you already know. Consider carefully before you interject... Your expert may be about to explain something new to you.

Information research is a dedication to learning. At its heart is a collection of specific research skills, an awareness of research tools, and a gifted mind. - Oh, and a large amount of coffee.
Without knowledge of and access to relevant research-worthy resources, your research will be severely limited and doubtful. This is why much of your work in becoming an effective researcher involves learning about the resources and meta-resources for your field. Much of our work in the Spire Project is drawing your attention to relevant resources.

Before we progress to specific resources for specific formats (books, webpages, news), let us attack head on the role of the internet in information research. This should surprise you.
As Shakh became more proficient with writing, father wrote more frequently of the family deity. Horus, the falcon god, had long watched over his family. Horus sees all, his father would write, and even across the many miles separating you from us, Horus will watch over you and keep you close. It was a great comfort to Shakh to have the family deity looking after him. Shakh too devoted himself to a life of watching and knowing.

We have discussed how information comes packaged in certain standardized formats like books, articles or news clips. Each format has particular qualities and standards that reflect the way the information is prepared. For example, books are dense, factual, comprehensive and a minimum of 6 months to a year old. So how can we apply this newfound wisdom to the internet?

Let's start at the beginning. The internet is an inexpensive and pervasive system for the delivery of data. It is also the medium of a dramatic shift in the way we access information. A (1) dramatic drop in the cost of publishing is fuelling (2) the liberation of information from previously closed systems, leading to (3) an emergence of alternative funding for certain public resources and (4) an eagerly awaited 'direct to consumer' commercial information industry.

The first mental knot to untie is the separation of internet resources into distinct formats. Electronic books share most of the qualities of books published on paper. News stories found on the web share all of the qualities of news in your local newspaper. The fact they are electronic or appear as webpages has nothing to do with it. News is news. Electronic books are almost books.

But if online news is news, and online books are almost books, and neither is an internet format, what is an internet format?

The search-by-format method is a concept to simplify and understand the many information resources which exist in the world. The concept is only as valuable as it is successful at enlightening us.
As to the internet, we have more to learn, but could safely divide the internet into several formats at this time, perhaps webpages, online discussion and ftp resources. Yet this is largely superficial. The real value comes from understanding the qualities of different types of webpages. We shall divide the webpage format further.

Must we really learn this? You would be pardoned for equating searching and the internet. Much of the hype surrounding internet search tools builds the illusion that the skill of searching can somehow be distilled computationally then delivered to you electronically. Through the wonders of modern science, you can have the best information at your fingertips without having to learn anything of search technology.

This is a pervasive lie (or marketing fiction). The electronic research industry has been around for decades and has worked on this problem for some time. No upstart internet guru has invented a technique to suddenly transform the search process. Such thinking would work in section two (Searching is Easy) but is the first illusion we must shatter for you to progress.

Case in point: the Lycos and All-the-Web search engines use the same database of webpages. This database is growing rapidly: it stood at 350,000,000 webpages in June 2000 and hopes to reach one billion webpages by the end of 2001. It stands as a grand achievement in organization, right? Wrong.

Years ago I was using a unified database of news called Global Textline (no longer available but replaced by others). It had an astounding four billion news articles available for advanced text searching! Four billion news items, representing many years of news from all over the world. This was superficially more than 10 times the size of the current All-the-Web search engine.

No, the internet does not even hold the record for being the largest information field. Oh, it will surely surpass the quantity of commercial information, and superficially we could say it may already have achieved this.
But the internet is not a new medium for information research. It is emerging as a new resource, not a new phenomenon.

The internet is a new medium for business - most businesses have never incorporated the immediacy or global nature of internet involvement, so considerable rethinking is required. The internet is a new medium for publishing for almost all of us; very few of us published electronically before the internet emerged.

The internet is NOT a new medium for research. Information researchers have been working electronically for years. The internet is just a new resource we can reach for, with strengths, weaknesses and peculiar traits we must appreciate.

By way of an example, let us compare Link Analysis as used in Google and Raging (of Altavista) with the process of editorial vetting as used in scientific journals.

Through the magic of link analysis, we can make certain assumptions about the value of a webpage by adding up the number of other pages linking to that page. In its simplest form, webpages with at least 100 inbound links from other websites are judged to be quality, valuable resources. A webpage without any inbound links falls under suspicion of being of poorer quality. After all, no one has thought it valuable enough to add a link to their further resources page.

This logic has some serious shortcomings. Firstly, the process rewards long-term projects that have been online long enough to earn links. A brilliant new webpage would have few links - yet. It would be ranked poorly, undeservedly.

Secondly, link analysis rewards websites over webpages. The pages with the most links are often homepages. Rating homepages over second level webpages works at odds with keyword searching. Our keywords will be found in specific, perhaps second-tier webpages. Links go to the top level.

Thirdly, link analysis is a mass market, popular technique. You are banking on the intellectual finesse of a mass of mindless computer users much like yourself.
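The simplest form of link analysis described above can be sketched as a count of inbound links from other websites. The link data, URLs and function name below are invented for illustration, and this is not how Google or any real search engine computes its ranking. The threshold of 100 is the figure used in the text.

```python
from collections import Counter
from urllib.parse import urlparse

def inbound_link_counts(links):
    """Count links pointing at each page, ignoring links from the same website."""
    counts = Counter()
    for source, target in links:
        if urlparse(source).netloc != urlparse(target).netloc:
            counts[target] += 1
    return counts

# Hypothetical (source page, target page) pairs.
LINKS = [
    ("http://a.example/links.htm", "http://old.example/"),
    ("http://b.example/resources.htm", "http://old.example/"),
    ("http://c.example/", "http://old.example/"),
    ("http://old.example/", "http://old.example/deep/article.htm"),
]

QUALITY_THRESHOLD = 100  # figure from the text; far above anything in this tiny sample

counts = inbound_link_counts(LINKS)
pages = [
    "http://old.example/",                  # long-established homepage
    "http://old.example/deep/article.htm",  # second-tier page holding the keywords
    "http://new.example/brilliant.htm",     # brilliant but brand new
]
for page in pages:
    n = counts[page]
    verdict = "quality" if n >= QUALITY_THRESHOLD else "unproven"
    print(f"{n:3d} inbound  {verdict:8s}  {page}")
```

Even in this toy sample the first two shortcomings show up: the established homepage collects every external link, while the second-tier article and the new page both score zero.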
It is the same kind of popular democratic selection that votes B-grade actors into the presidency. Let's contrast this with the process of editorial vetting used in scientific journals. Each article is reviewed by a selection of knowledgeable peers who understand the topic in great depth. Each article is further improved by the editing of the journal editors, and by self-editing, for there is great competition and prestige at stake. Only a handful of the many submissions are judged worthy and appear in the printed journal. Success places the successful in the standard of record; stamped with an external statement of truth and importance. Of course, the logic of editorial vetting also has shortcomings. Firstly, the process is time and effort intensive. Many of the most important journals will delay six months or more between submission and publication. In our digital era this is increasingly unacceptable. Secondly, the number of submissions accepted is at odds with the pace of development. So much more happens in the world than can be digested in this manner. Thirdly, editorial vetting supports the clannish behavior of which the upper echelons of science are accused. New and novel developments have difficulty floating to the top if the peer review process is not open to new ideas. If link analysis is popular and democratic, editorial vetting is elitist and autocratic. Both approaches have pros and cons. Once you have absorbed the drama between link analysis and editorial vetting, please do not retain the belief that your search needs will be completely solved for you. Searching is a complex, overgrown garden and it's time to get your hands dirty. So what does the internet have to do with searching? The internet changes searching in two ways. Firstly, the webpage is a new format to contend with. "Webpages are often of unknown age, of only guessed at quality and potentially the easiest information to retrieve. 
There are many points of entry to web resources but search tools differ. Try to match your search tool to your question." (See spireproject.com/webpage.htm) The internet is also a conduit to many of the pre-existing tools for searching other formats (books, news, interviews). With an internet connection, we can reach database retailers and many commercial quality databases like LOCOC, ERIC, MOCAT and AGIP directly from the source. We can also remotely search the catalogue of most libraries in the world. These are not new resources, just new ways to reach them. In this day of interconnectivity and change, it is too tempting to declare the information industry is in rapid flux. Everything I have learned suggests this is not so. There are some changes associated with new channels but by and large the process of searching for information remains the same. Let's look briefly at news as an example. News articles are written by the reporter, sold to international newswires which then distribute these stories to interested newspapers and news channels, that incorporate the news into your newspaper or evening TV news. News would also be added to commercial databases of past news. These databases are then provided to database retailers like Dialog or Lexis-Nexis who sell occasional access to you. With the internet, newswires have also provided their text news to online sites. Text news is thus available for you to browse or search. I draw your attention to several facts. The fundamental nature of the industry has not changed. Journalists and newswires still impart upon the news the same nature as before. It is short, shallow, immediate. It is created to journalistic standards. If you wish to search past news, you must still reach for the commercial database, most likely through a database retailer. Searching for news online only goes back two weeks at most. Lastly, to date only the text format for news is widely disseminated. 
Sometimes a couple of pictures are included but the visual news, as used in the evening news on TV, is sure to remain priced beyond public consumption. So what has changed? There is another venue for you to pick up the news. There are opportunities for new databases to be created, some of limited time (like totalnews.com - a database of current news on other websites). Little else has changed. The creation and dissemination of news remains pretty much as before the internet arrived. Let us look even more briefly at book publishing. Books are produced by authors, improved by editors, published by publishers, marketed by bookstores, then purchased by you. Today we have a couple of new online bookstores - and a large number of new old online bookstores (existing bookstores now selling online). We have a collection of free books online (largely classics like Shakespeare, which strangely, were immediately published as really inexpensive paperback classics available in airports everywhere). There is also a range of very useful commercial quality book databases which have become free to search online. I am thinking of the government publication catalogues (MOCAT [US], AGIP [Australia] and Stationery Office Online Catalogue [UK]) and the online catalogues for the Library of Congress (LOCOC) and the British Library. Lastly, the online catalogues of the large bookstores like Barnes and Noble, Amazon and The Internet Bookshop (UK's WHSmith) can provide a free and fast database of books in print, though not as good as the commercial Books-in-Print databases. Of course, any local bookstore will offer to search books-in-print for you, so this is not as revolutionary as it might at first appear. In summary, we have a collection of recently discounted book databases we can more easily search, we have additional sites to buy books, and little else. The creation and dissemination of books remains pretty much as before the internet arrived. Has the book industry changed? Not really. 
The most remarkable change has been the emergence of group discussion online, the emergence of a new format for information (like the webpage) and the opportunities to connect faster to a whole range of pre-existing searchable resources. This is the reason why we discuss searching-by-format. Later, at the end of this FAQ, we return to this topic and show that the real revolution is not in resources or industry or search tools but a revolution in immediate access. Access, it turns out, enriches the art of searching. Pessimistically. As a counterpoint, as an information resource, the internet can still be much too limited for many situations. If we are not careful, searching the internet becomes no better than browsing the shelf of your state library. What most impresses me about the internet is the promise of changes in the future. The internet as a system suggests radical improvements to the current decade-old systems that have attained their search-worthy status. What impresses me most are the improvements mostly still in the future, not yet proven, set to remain promising ventures for a time. This is not to say internet research cannot be rewarding. In some fields like computer studies, the internet has already surpassed parity with books, articles and associations. Just when you will consult the internet as a research-worthy resource depends on cost, effort, and the quality of the information returned. This judgement call requires more than a little experience. Value is important. I sincerely hope we can suppress our enthusiasm for free information in favour of a truer appraisal of the value of information. Make no mistake, commercial information is brilliant. It is almost heresy to even compare commercial information with the results of a few hours on the internet.

Internet Information Theory

Let us agree the internet is great fun to surf but more challenging when you have a specific question in mind. 
To improve our search skills, we begin by understanding how information is arranged on the internet. Contrary to myth, information is not disorganized but rather organized very carefully along clear patterns. Many patterns are specific to the information format (text document, webpage, email message, printed article). Further patterns match the way we become aware of information, or are specific to the information systems (mailing list, FAQ, peer-reviewed journal). Your understanding of the strengths and weaknesses of each pattern, each format, each system, guides your search for information. We shall start by shattering the internet, and commenting on the many pieces.

Three Definitions of the Internet

Do be careful when using the word 'internet'.

1_ The internet is a physical network; more than a million computers continuously exchanging information. The internet allows us to transfer information around the world.

2_ The internet is a landscape of information available on almost every topic imaginable. This information appears almost chaotically distributed to the world but holds clear patterns. For instance, linking information together are various structures like government web links, search engines and FAQ documents.

3_ The internet is a community of 500+ million individuals. These are real people who choose to interact, discuss and share information online.

In this example, let me just draw your attention to the way most of our research effort focuses on the second definition: a landscape of information. Much of the best information originates in the third definition: the internet is a community. Sometimes it is far more effective to ask real people than search the information cyberspace. What I just mentioned is not so important as the technique I just used. I broke the large seemingly chaotic system into smaller pieces: pieces that hopefully make more sense. 
Eventually, when we've made sense of the little bits, perhaps we can comment astutely on the big-picture.

Information, transaction, entertainment

There is a triad of functions to all online activity:

Function      - Activity - Unit
-----------------------------------------------
Information   - Research - The Fact or Conclusion
Exchange      - Business - The Transaction
Entertainment - Play     - The Experience

Each internet function grows at a different rate and moves in a different direction. The development of forums is firmly in the smallest segment dealing with information. This segment is quite poorly organized and confusing. The entertainment function in contrast is well financed and graphically innovative with clear, profitable opportunities. Much of the web is prepared with Exchange or Entertainment in mind. "Brochureware" (purely promotional webpages) is rarely required for research but is critical to securing a transaction. Entertainment related or just entertaining websites abound. Let us recognize just how few webpages are information & research related. My own experience suggests we are just beginning to see the movements towards profiting from providing information. Direct selling of information is still chaotic and unrewarding.

Information Formats

The way information is packaged has a great bearing on the content, quality and use of the information. This theme is evident throughout the work of the Spire Project, and is particularly applicable to internet information. Webpages, text files, software, email and database entries each have particular qualities. Each shapes, constrains and restricts the informative content. These particular qualities apply irrespective of the information involved. Books are dense, factual, a little old. Articles are short, sharp, more recent. News is puff, introductory, immediate. Each way the information is packaged, each format, presents the information to set standards. Information formats on the internet are the same. 
Webpages are graphical, technical to produce, and not easily updated. FAQs are easier to maintain, text only, and attract more peer review. Mailing lists are simpler still, text, short, immediate, very peer-reviewed, characterized by discussion and resource discovery. Newsgroups are characterized by extremely low costs, vulnerable to trashing, poorly managed. Email is simple to use, one-to-one discussion. Let's look at books more closely. Books are created by authors who have something to write. Books are printed and marketed by publishers to the bookstores that then provide them to the readers. Each facet of this process defines the resource. Books have quality, editorial vetting but minimal peer-review, marketable value and a potentially lengthy preparation time. When it comes to research, why look for a book when investigating digital money? Books would just have the wrong qualities - would present the information poorly. We need a more current format (digital money is a fast moving topic), and a more peer-reviewed format (books have editorial vetting but not intrinsic peer-review). Why not search for a mailing list, an FAQ, or an association website? These formats have qualities more appropriate to our question.

Information Preparation

Information flows also impress patterns on internet information. Most information is transplanted to the web - first created elsewhere. The source of information imparts as much pattern as the eventual format the information takes. Information may appear as a webpage, and conform to our expectations for all webpages, but the information may have been prepared from the discussion on a mailing list - and thus enjoy a more topical, specific, timely and peer-reviewed quality. Let's look at FAQs. The best resource in the world on copyright law is the musings of a group of copyright lawyers who form the copyright mailing list. 
The copyright FAQ supported by this group is a logical document summarizing much of the discussion of this mailing list. FAQs are vetted by the news.answers team, then automatically mirrored around the world. From its origins in the mailing list, the FAQ is a peer-reviewed document, often full of links to further resources, topical, knowledgeable and factual. As an FAQ, the document is not immediate, graphical or financially rewarding (some FAQs stagnate). Only some internet information is created within the internet environment. The concept of 'brochureware' describes the common traits to promotional webpages directly prepared from paper promotional brochures. One of the more exciting trends is the movement of information from the dusty shelves of government offices and association libraries to their more accessible websites. The quality of information retained in your average government agency, from quality research reports, to detailed studies, to current industry monitoring is very high. These qualities are then brought over to the web format. Such web-documents tend to be isolated (not linked to other related resources) and perhaps a little behind the time line but of a generally high quality. An exciting holistic view of the internet information landscape is based on these descriptions. Imagine, for a moment, information flowing through a collection of systems. At certain points, information groups together, and generates new, perhaps higher quality information, which then flows in a different system, a different direction, to different people. The flow of information from one person to another, from one format to another, imprints qualities to the information along the way. Each organization, or subsequent re-organization, imparts specific styles and conventions and quality to the result. Publishing Motivation Let us proceed to a third set of patterns. Information appears on the internet for one very specific reason. Someone Publishes (DUH). 
The motivation behind publishing colours the information. This is a pattern we can use to quickly judge the contents of a webpage. Ask yourself who is publishing, and why. One of the biggest publishing segments a year ago was individuals publishing documents derived from their personal expertise. A typical document would be one with minimal peer review, a list of aging links to further resources, simple graphics, variable to short length, prone to bias but moderately reliable because the publisher knows their topic well. These pages are often located in private sub-directories on a web server (usually starting /~name/). Commercial sites publish mainly for the promotional value. Their secondary purpose is to provide sales information to prospective clients. Rarely do commercial sites go beyond this. Commercial webpages often reside on their own domain name, as a .com, or in sub-directories - without the tilde symbol. Commercial sites also tend to age badly. They are very noticeable from their front page. Government agencies are emerging as valued publishers. Slowly their dormant information becomes available through this new medium. Currently almost all government documents on the internet also appear in print, meaning they are factual, exhaustively reviewed, tend to be a little old (but age well), and come from highly paid knowledgeable people who believe it is their duty to inform others. Such documents are lengthy and appear on .gov domains. These patterns are simple to see. Grant-funded projects create brilliant research resources and hold much promise in pushing the limits of this technology. I am eager to see the results of the US Patents project, and appreciate the value of having Supreme Court rulings on the internet. Often such projects focus deeply on content. Most projects reside on educational servers and are widely discussed within knowledgeable groups. Associations publish association-kind-of-things. Most are initially just like the commercial webpages. 
With time such sites become much more factual and research-worthy. Most associations are dedicated to developing awareness of their chosen topic, albeit coloured by their chosen bias. Few associations are significant publishers but in time, this segment will begin to liberate dormant information within associations. Let's summarize. The key is to always watch who is the publisher. We can assume a great deal, quickly. We are unlikely to find the latest changes to patent law from government or commercial publishers. Such organizations are simply not motivated to present such information. Promoting Information Publishing is one achievement but you and I will never read any information until we learn it exists. This simple fact creates even more patterns to internet information. Knowledge of information moves through set routes on its way from writer to reader. Promotion is not simple. It is a process that takes time, effort and perhaps money. Information without serious promotion tends not to be promoted far from the source. Another way to phrase this; you must search close to the source to find poorly promoted information. A search engine indexes pages relatively indiscriminately. This also means a site of quality is not likely to reach your attention. The odds are not good, and from a promotion point of view, search engines generate minimal traffic to your webpage. Search engines also drop you rather randomly into a website. It is often necessary to move up a directory to understand the purpose and motivation of a site you find interesting. Information published through advertising tends to have a financial payoff for the promoter. This kind of information tends to be promotional information. Brochureware. The alternatives are to promote a webpage or website through one of the referral tools. Each such tool accepts links on some criterion. Each tool you use to locate information also selects particular types of information for your attention. 
If you arrive at a document by recommendation through a mailing list, the document is likely to be recent, on-topic and specific to the purpose of the mailing list. Alternatively (for poor mailing lists), it will be wildly off topic and trash. You are unlikely to see referrals to old documents or documents of historical importance. These are the qualities most acceptable to the mailing list environment. Directory trees, FAQs, guidebooks and related promotion tools all work as historically important documents. In the past, such resources listed, described and alerted people to relevant information for the field. Slowly, over time, this function becomes acknowledged, reinforced and promoted. Time is the essence of this fame. Webpages or websites found through historically important documents, by their nature, tend to be long lasting websites with lasting importance in the field. Such documents point to other similar documents or websites that have achieved a long-lasting importance. You are unlikely to find specific documents but rather sites that focus or bring together information. In short, there is little motivation to link to specific webpages, when a link to an important website is just as good. Similar generalizations can be made of each type of promotional tool, and become important in rapidly seeking out information which matches our intention, as well as summarizing the likely motivation, and bias, of webpages we are interested in.

Information Clumps

Information clumps. Information is created, nurtured, develops, gets transplanted, gets arranged and then becomes visible through a process which brings similar information together. As we have discussed, there are factors deeply affecting all information on the internet. Motivation, Preparation, Format and Promotion all define the quality and content of any given item of information. With so many influences, we should not be surprised to learn information naturally groups together. 
In reality, there is nothing natural involved - it is a social phenomenon reinforced each time you and I visit or read one resource but not another. History can explain some aspects of internet development. As a small collection of sites become dominant in particular fields, by collecting and delivering better content to more people, new sites find it progressively more difficult to capture attention. This dynamic works for websites reaching out for visitors, and discussion groups reaching out for subscribers. In each case, seniority counts. Seniority counts in several ways too. Promotion is directly related to quality, interest, traffic and time. The longer a site is active, the better the footpath develops, the more people visit. Secondly, quality content is directly related to access to quality content, peer review, and time/money. Important existing sites gain in every way. This results in a grand system where the first-in, best-dressed, can capture the high ground and secure a grand lead in awareness and footpath over competitors who follow. Yahoo is a prime example of a directory tree, not even the best in most areas, which has achieved unparalleled traffic & awareness. This competition is equally evident where no money is involved. Perhaps your association wishes to create a new referral website, or an open mailing list, or an informative guide. All sound concepts, effective projects. However, if older, established resources exist, the work will be long and arduous. Despite the marketing message, the internet is not a world where the best information floats to the top. The internet will not simply let you reach millions. You must compete for attention, participation, devotion and assistance in a manner very similar to building a business. In concrete terms, information clumps on the internet. The best resource could appear on any internet system (webpages, email mailing lists, ftp-archives, FAQs, online databases, newsgroups...) 
but we can be fairly certain the best information will congregate in just one or two. Consider this as an application of the 80:20 rule. 80% of the good information will be found on 20% of the formats, arranged concisely by 20% of the search tools. Consider our article "Searching the Web" (spireproject.com/webpage.htm). We progressively search different web tools, looking for the most worthy. Searching the internet is the same. You must touch each system to see which system is dominant, where the information is congregating for your topic. Bringing this together In summary, we have broken down and discussed various qualities of published information and promoted information. We have made sweeping generalizations and educated guesses about information on the internet. Now what? When a painter begins to paint, they have already visualized some of the image. They already have a concept of the finished result. Internet research is no different. We start by building a vision of the information we seek. Who would publish it? Where would I find it? What is its motivation? How would we find it? We now have a practical vision. The address is one of the keys. The web address (or URL - Uniform Resource Locator) for any item of information gives us a surprising amount of information - particularly as we are making generalizations about information patterns. We can guess if information resides on a personal webpage, a funded university project, or a commercial project. The information resides on a .gov website? - the quality is likely to be higher and conform to our expectations of government resources. We use this new-found experience in three ways. Firstly, we restrict our searches to the most likely sources. Secondly, we quickly jump through lists of resources (such as those generated by search engines) to the sources that match our expectations. Thirdly, our assessment of information quality can be guided by our snap-judgements of its origin and purpose. 
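The habit of reading an address for clues can be written down as a small Python sketch. This is only an illustration of the generalizations made in this FAQ (a tilde directory suggests a personal page, .gov suggests government, .com suggests a commercial site). The function name guess_publisher, the category labels, the treatment of .edu and .ac as "educational servers", and the example addresses are assumptions made for this example, not part of any existing tool.

```python
from urllib.parse import urlparse


def guess_publisher(url: str) -> str:
    """Make a snap judgement about who publishes a page, from its address alone.

    The rules are rough heuristics, not facts about any particular site.
    """
    # Accept addresses written without a scheme, e.g. "spireproject.com/webpage.htm".
    if "://" not in url:
        url = "http://" + url
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()

    # A private sub-directory starting with a tilde (/~name/) usually marks
    # an individual publishing from personal expertise.
    if "/~" in parsed.path:
        return "personal"

    # Look only at the last two labels so that "gov.au" or "com.au" also match.
    tail = host.split(".")[-2:]
    if "gov" in tail:
        return "government"
    if "edu" in tail or "ac" in tail:  # assumption: these mark educational servers
        return "educational"
    if "com" in tail:
        return "commercial"
    return "unknown"


if __name__ == "__main__":
    # Hypothetical list of search results, filtered to the sources we expect.
    results = [
        "http://www.example.edu/~jsmith/links.html",
        "http://www.loc.gov/catalog/",
        "spireproject.com/webpage.htm",
    ]
    for address in results:
        print(guess_publisher(address), address)
```

A guess like this only narrows the field. It helps you jump through a long list of results to the sources that match your expectations; the page itself must still be judged on who wrote it and why.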
Internet newcomers often expect to have instant access to the latest information at the touch of the button in beautiful colour and peer reviewed quality prose. Who is publishing this? Where is this information coming from? Who would help us find this? Such a vision is fantasy. If we were instead to look for an association website, dedicated to a certain type of research, or an informed newsgroup, maintained by people passionate about sharing this technology, then we have made four steps forward. We are clear about where to look for the answers we seek, and we will know quickly if the answers are online. Let us now leave this discussion on internet organization and internet theory. This is tough newly discovered territory, more than a little rough. I fear it will make most sense to people with considerable experience with the internet. Let us now explore the fertile grounds of understanding more familiar formats like books and news.

___________________________________________________

In the second year of his training, Shakh began to piece together the many rules and guidelines to understanding hieroglyphs. He had thought the lessons would end once he learned the glyphs but no, there were long and convoluted rules governing the translation of sounds into glyphs. Simple rules govern the placement of glyphs on the wall - certain glyphs lose their meaning when placed apart. Then, there was the art of writing. The glyphs had to be the right size and shape. If you were about to finish the line, you could squish certain glyphs just a little to make room for the next glyph. If you did not plan well, you would leave the line hanging, a word unfinished, a sentence incomplete. Then Shakh started to learn hieratic - shorthand glyphs for less formal situations. It was all very complicated and cumbersome. Shakh did not like the technical nature of writing. So much to learn and still so far from writing clear, interesting results. His seasons in training went very slowly. 
The Nile rose then fell then rose again. A great deal of dull information must be comprehended, absorbed, internalized. Nothing spectacular. Nothing of particular interest. Just a mass of rules and guidelines to help you move within the world of information. In the third year of medical school the aspiring doctor begins to memorize a vast linked-array of drugs, symptoms and afflictions. The next three years are spent developing this mental array; refining, building, adding experience, so that one day a doctor may look at a symptom, think of possible afflictions or drug reactions, then prescribe drugs or call for further tests. The whole process of learning this array is intensely dull. In the first part of this FAQ we explained in detail how an information search involves first selecting a suitable format (book, webpage, news, interview ...) then searching a few important tools that help us find information in that format. The first format we will look at is the humble book. Shakh arrived in Edfu on a small boat in the company of his father. It was a short walk from the dock to the Edfu temple complex. A fantastic sight. A noble sight. The temple included a vast library of books and manuscripts - a warehouse of knowledge about Egypt. Not that there were many manuscripts in total. The time and expense it took to create even a single copy made the library a prohibitive expense open to only those in certain need. This was not a public library, but an elitist library, open only to those who could justify the gifts required to enter. There it was, open before them, long shelves of scrolls arranged by rough topic. Amazing indeed. Shakh shivered slightly in the cool air. This would be his life for the next few years. Books have such meaning to us as a society. We have a vibrant emotional connection. Books exude a solid proof of value to a larger community. They are important resources but the additional awe is amazing to behold. 
Try ripping a chapter from a book you own in public. The stares and discomfort are almost tangible. Some book-lovers get upset about slight creases in books, treating books as if they were important museum quality manuscripts - something to hold with awe and treat gently. Being a book writer is similarly impressive. It is a mark of an expert. A knowledgeable expert. A knowledgeable expert we should listen to, should pay money for the chance to listen to, should pay, listen and carefully not crease their work. This attitude is silly. A book is a package of information, prepared along certain guidelines, with a purpose. In research we look for books on a topic that may help us answer a question. These books tend to be large, lengthy, detailed, verbose, heavy. Books are not good at describing cutting edge developments. They generally summarize popular consensus. They avoid criticism. When searching, they can make horrible resources. Books are also large and physical creations. They must be stored. They stick around. They have a limited shelf life but libraries are forever over-stocked with dated publications of limited use and value. They are also long - troublesome things to read. Books come in different flavors. There are the books by industry insiders who tell the truth, rip the facade off a particular industry. Such books make brilliant resources. There are also books by journalists, prepared without insider knowledge, more of a novel of a newsworthy situation. Such books tend to be verbose, circumstantial, light on facts. Certain questions simply beg to be answered by reading a book. Such questions are usually general, introductory, timeless. For such questions a stack of news articles would lack cohesion. A collection of articles would be too precise, not give you the larger picture. Such questions need the 100 pages of description, pictures and the considered framework that books embody. 
Finding a Book

As an information format, there are certain tools and resources you need to be aware of to effectively search for books. Thankfully, many of these tools have emerged on the internet. These include:

- A database of the free books on the internet from projects like the Online Book Initiative and Project Gutenberg. Includes many copyright-free classics (but not ebooks - a different concept).

- Three government publication databases for the US, UK and Australia. The US and Australian databases are comprehensive. The UK database is incomplete. The complete database is commercially available.

- The book databases of large online bookstores are incomplete but useful as a fast search of current books. Some include background information. I use Barnes & Noble, Amazon, Borders and the UK Internet Bookshop (of the WHSmith bookstore chain).

- The largest libraries of the world, like the US Library of Congress and British Library, hold more than 20 million publications stretching back many years. The online book catalogues are not good for the latest books, but are brilliant at earlier works.

- Local libraries and state libraries are noteworthy, as finding a book in their database also means you have found access to that book.

- The definitive resource is the collection of national Books-in-Print databases like [US] Books in Print, Australian Books in Print, French Books in Print... These databases are commercially available online, as print directories (yuck) in libraries, and are often publicly available to search at good bookstores.

Book Databases

Information about new books is organized in a collection of national "Books in Print" databases. This information is publisher-verified, includes forthcoming titles, and is naturally updated far faster than the library and bookstore catalogues. Books in Print, produced by Bowker, delivers publisher-verified information on US books. 
British Books in Print, produced by Whitaker & Sons, delivers publisher-verified information on UK books. Further national book indexes include Australian Books in Print (Thorpe), Canadian Books in Print (University of Toronto Press), Les Livres Disponibles/French Books in Print (Electre), Italian Books in Print, German Books in Print and others.

All these directories are available as print directories (not particularly convenient), as a commercial database (through database retailers), for subscription (bookstores frequently subscribe) or through Global Books in Print (which, though not really global, is a group of book databases).

With regard to the print versions, there may be recent editions in your state library but don't bother. The directory is not user-friendly as you must page through each month's subject categories.

A more convenient alternative access point is your favorite large bookstore. For about Au$4500/year, many bookstores subscribe to Global Books in Print on CD-ROMs, or a national 'books in print' database. There should be no cost for searching, but ask for the date and the database name so you have a clearer idea of what is being searched.

Further Book Resources

Book Reviews are a viable tool in a book search. The tools mentioned above will give you very little information indeed - mainly title, author, format and price. You will usually want more than this before you buy a book.

Book reviews are published in a range of book-related journals and newspapers. These are compiled into commercial databases of book reviews, like the Book Review Digest by H.W.Wilson or Book Review Index by Gale Research, or are available as individual book reviews from the likes of the New York Review of Books (www.nybooks.com/nyrev/). A state library may provide access to the Book Review Digest Database. Online book reviews are further discussed in Locating Book Reviews (www.lib.monash.edu.au/hss/guides/fsreview.htm) by Monash University Library.
Barnes & Noble, and to a lesser degree Amazon, have additional information in their book database. Since it is free, it makes for a fine immediate alternative to searching book reviews. Future developments in book-related discussion groups hold out more promise in harnessing the opinions of a book-reading public. Quality issues remain (and the anonymous musings listed in Amazon.com and Barnes & Noble

There are also book finding services with specialty book databases - like a database of second-hand books. Books on Demand is a directory of out-of-print books available for reprinting (and includes price and order information).

Strategy

Obviously title searches are not effective tools to discover new books. Not all books on Vincent Van Gogh include Vincent in the title. Subject searches work well only if you can grasp the indexing. Apply these effective search techniques:

1) Browse the subject listing and select the subjects which interest you.
2) Read the subject listings off a book you know interests you - then search for other books in those subjects.
3) Search for other publications from suggestive authors (especially when the author is an association).

Library catalogues, like LOCIS, can illustrate these techniques. Let's say a title or subject search lands you with one of the books listed in LOCIS. This catalogue lists the applicable subject titles. Looking at books placed in the same subject category works well.

A word about Book Types. Just as internet information comes in different qualities and formats, books also come in different styles and flavours. Books written by industry insiders are characterized by personal stories and expert wisdom from an author telling all the secrets. These books are worth looking for, and the short bio may give a clue.
Books written by journalists have a different flavour: slightly more newsy and less factual than, let's say, government books (far more factual than most) and frequently updated books (far more current than most). Try to find the style of book suited to your needs.

Information Theory

The book industry has reached a kind of plateau where fairly definitive databases exist for listing books. There are databases for government books, out-of-print books, second-hand books, current books. The internet has changed some elements of this mix, as business models try to support moving existing databases to free access, and others use this change to try to present more definitive databases. Book reviews have never properly been used by the book industry, so the big change appears to be a move from book titles (as in most book databases and library catalogues) to rich information (like Barnes & Noble) which includes reviews and readers' comments.

___________________________________________________

Articles hold a definitive value, a statement of quality and currency. Sometimes articles are long, unique and informative works. Sometimes articles are short, simple, trite; a rehash of common knowledge.

There is a range of ways to access articles - though none are particularly inexpensive. We also have difficulties paying copyright - so most paid research assistance is restricted to certain, more expensive tools. In all, articles are cumbersome and time-consuming to work with. They can also be brilliantly rewarding.

There are three difficulties with article searches:
1_ Finding the articles which interest us.
2_ Getting our hands on a copy. (Many articles you locate may be impractical to access in person while electronic access can be expensive.)
3_ Copyright permission (which can be potentially simple or exceedingly expensive).

Of course, the mainstay of article research is photocopying an article directly from a journal.
Find a library nearby which holds the journal, then read or photocopy it then and there. This process can be improved by using the online library catalogues (to see if they hold the journal) and by searching a database of library holdings (often available for free by asking or calling a librarian at your state library). As you could expect, some commercial businesses will undertake this work on your behalf, for a fee. The difficulty with this process, of course, is that it does not help you discover which articles will interest you - it only works if you have a useful bibliography to work from.

In recent years, a concerted effort has been made to bring you full text articles electronically. Commercial databases in general have moved from being strictly bibliographic to including many full text articles. A system of full text articles on CD-ROM has a brilliant future. Up to 500 journals are updated frequently in this inexpensive format. (Most research libraries have such a station.)

Some of the commercial full text databases have emerged online too. Northern Light presents one such database. Unfortunately, the better quality articles are not included in these databases. It is not an absolute rule, but to date many of these commercial databases are filled with regional business papers, newspapers or similar middle to low quality publications.

There is another system for accessing articles, which comes to us from a very long time ago. Inter-library loans are a system worked out between libraries so articles can be exchanged between them. Naturally you need the assistance of a library - and a great deal of patience. Such requests can take over a month to arrive.

Lastly, there is always the option of direct purchase of periodicals from the publisher.

Commercial Services

Carl Uncover service (faxback articles)
CARL (www.carl.org), one of the great library groups in North America, established a service to provide articles by post or fax.
Carl promises to fax articles provided you use their system to check that one of their many libraries has the required document.

Northern Light - online database of articles
Northern Light (www.nlsearch.com) is a search engine of both the web and their own database of articles available for purchase. The rates are cheaper than Carl (up to $4.00 per downloaded document) and the articles are delivered over the internet (not faxed) but the range is smaller.

Information Theory

Many of the databases will begin to offer their services either as pay-per-view, or through reasonable direct subscription methods on the internet. This has been predicted for years but depends on the emergence of a fine way to purchase cheap items on the internet: digital money. No effective digital money has emerged yet, and most databases will either wait, or try one of the existing incomplete methods. Essentially, critical mass has not yet arrived, and it now appears that the true fall in price of information is waiting on an effective digital money.

In preparation, magazines and newspapers are purchasing all the rights possible - especially the electronic rights. More appears on this topic later.

___________________________________________________

Webpages are often of unknown age, of only guessed-at quality, and potentially the easiest information to retrieve.

There are many points of entry to web resources, but search tools differ. Try to match your search tool to your question. To start, you will need to learn something of the different tools - described below - and four basic search techniques: Boolean, Proximity, Field Searches & Truncation.

Global Search Engines

Altavista (altavista.com) includes a very large, fast search engine. It allows for:
Basic Boolean: AND +, NOT -, OR |
Proximity: " " and ~ (near - within 10 words of each other)
Several Fields: title:"Spire Project" domain:gov url:edu link:cn.net.au
Truncation/Wildcard: *
Of import, capitals matter with Altavista.
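
As a rough illustration of how the Altavista operators listed above combine into one search string, here is a minimal Python sketch. The function name build_query and its parameters are hypothetical, invented for this example; the output only follows the syntax as described in this FAQ (+word, -word, "phrase", field:value, and * for truncation) and has not been checked against any search engine.

```python
def build_query(required=(), excluded=(), phrases=(), fields=None):
    """Assemble a search string from the operators described in the text.

    required -> +word       (word must appear)
    excluded -> -word       (word must not appear)
    phrases  -> "a phrase"  (words kept together)
    fields   -> name:value  (value is quoted when it contains a space)

    Hypothetical helper: it only formats text and contacts no search engine.
    """
    parts = [f"+{word}" for word in required]
    parts += [f"-{word}" for word in excluded]
    parts += [f'"{phrase}"' for phrase in phrases]
    for name, value in (fields or {}).items():
        if " " in value:
            value = f'"{value}"'
        parts.append(f"{name}:{value}")
    return " ".join(parts)


if __name__ == "__main__":
    # A truncated term (databas*) stands for database, databases, and so on.
    print(build_query(required=["patent", "databas*"],
                      excluded=["trademark"],
                      fields={"domain": "gov"}))
```

Writing the query as separate pieces makes it easier to add or drop one limit at a time until the number of hits is reasonable.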
All-the-Web (www.alltheweb.com) is important because it is large - really large - with a flexible search facility. Allows:
Partial Boolean: + -
Simple Proximity: " "
Several Fields:
  a title field search: normal.title:spire
  url field: url.all:.au
  link text and link url fields: normal.atext:spire link.all:cn.net.au
All-the-Web is not case sensitive. The same database supporting All-the-Web supports Lycos.

Inktomi (via hotbot.lycos.com) provides its substantial web directory through other companies, in this case, HotBot. It also allows searches by region, by date, and more.

Debriefing (www.debriefing.com) is our meta-search engine of choice. Use this to find names & named websites. Accepts Partial Boolean + - and Simple Proximity " ". Capitals matter.

Google (www.google.com/) is a new style of search engine which ranks sites with more care and concern. This works well for sites you know a little about in advance. Unfortunately, it has no useful field searches. Allows Partial Boolean + - and Simple Proximity " ". Unfortunately, there is no truncation, not even for plurals!

When searching for a topic with precise descriptive terms, use a broad search engine. Always place the Boolean + symbol before each search word (like this: +word1 +word2) to insist all words appear in the results. Quotes keep words together ("word1 word2"). These two simple steps dramatically improve results. Keep adding words and search limits until the number of hits is reasonable.

For more global search engines, there are numerous lists to consider like the W3 Search Engines page at the University of Geneva (cui.unige.ch/meta-index.html#INF) and the Industry Research Desk (www.rbbi.com/links/sengine.htm).

Meta-Search Engines & Google

If you know something of the destination already, like a title or company name or full name, try using a search tool that excels in finding named websites.
There should be little difficulty in finding such sites with either Google or a Meta-Search engine, but don't get excited and use these on other occasions.

Categorized Lists

When searching for information that lends itself to a particular category or topic, start with resources which group information in categories. With few exceptions, these resources index websites, not webpages. Also, keep your search words simple as these are small databases.

Yahoo (yahoo.com) is the largest of this type of directory tree; the definitive site. Accepts Partial Boolean + -, Simple Proximity " ", Truncation * and Several Fields: t: (for titles), u: (for urls) and a date field through a form.

The Open Directory Project (dmoz.org) is a Netscape effort to, presumably, mute the strength of Yahoo. It is very good, and very similar to Yahoo.

Looksmart (www.looksmart.com) is another significant directory.

For an alternative, try the World Wide Web Virtual Library: Subject Catalogue (vlib.org/Overview.html), a distributed network of subject lists, not nearly as dominant as Yahoo, but far more "scholarly" shall we say. This virtual directory has been around many years, previously famous from www.w3.org.

Reviewed Sites

When seeking specific fields of study, when topics are clouded with many similar, low quality sites, start with resources with a greater degree of personal attention. Peer review and vetting produce resources with more quality but limited coverage, better suited to this situation. Also, keep your search words simple.

The Scout Report (wwwscout.cs.wisc.edu) is one of the oldest and most highly regarded e-newsletters introducing new internet resources. Residing at the University of Wisconsin, the Scout Report describes research, education & topical sites. The Scout Report Signpost provides a quick search of previously featured sites.

BUBL (www.bubl.ac.uk) is a British site which reviews internet resources then indexes by Dewey decimal number.
I prefer their Dewey presentation but the collection is not large (though the largest of the library projects I have seen).

The Argus Clearinghouse (www.clearinghouse.net) is a vast collection of internet guidebooks. We can search the titles & descriptions, but then click on the highlighted keywords to find related guides. I suspect Argus is not successfully keeping pace with internet development.

AlphaSearch (www.calvin.edu/library/searreso/internet/as/) is similar to Argus. This one indexes important nexus sites and should be browsed.

The Britannica.com (as in Encyclopedia Britannica www.britannica.com) has been remolded as a free guide to books, periodicals, web and their encyclopedia. This encyclopedia is perhaps the best.

FAQs can be searched from an FAQ database like the one at www.faqs.org.

WebRings list sites by topic. Each webring is maintained by a volunteer at an uninvolved site using standard software. The primary sites are currently Webring.com and bomis.com.

Specialty Tools

For issues with a particular government, url or language origin, consider using tools designed with this in mind.

* Altavista can be limited to specific domains (gov edu au) with their "domain:domainname" field search. "url:url-segment" is also useful. Read the Altavista Fancy Features for Typical Searches.
* GovBot (ciir2.cs.umass.edu/Govbot/) as developed by The Center for Intelligent Information Retrieval (CIIR) is a search engine which indexes exclusively a great number of government webpages, a unique resource.
* Altavista also allows for a field search by language. Searching for a Japanese site? Consider searching only webpages in Japanese.
* Purely regional search engines may also be the answer. Aussie.com.au, for example, is a search engine indexing only Australian websites. There are fine lists of regional search engines and directories like SearchEngineCollossus, Search Engines WorldWide, SearchEngineWatch and Yahoo.
* Topic-specific search engines, a new arrival, have a very promising future. Ideally you will find a search engine like ChemGuide (www.fiz-chemie.de/en/datenbanken/chemguide/) covering over a million chemistry-related pages. Search Engine Guide (searchengineguide.com) and Gary Price's Direct Search (gwis2.circ.gwu.edu/~gprice/direct.htm) list topical search engines.
* Lastly, there are some commercial databases aimed at the software and internet industries. Consider OCLC's NetFirst (articles from magazines describing the internet).

Conclusion

For many of us, searching the web is simply typing words into a search engine. I hope I have shown there is more to it than this.

What may not be clearly evident from a brief overview of resources is that each resource has a particular difference, a particular focus, a particular angle that helps us answer certain questions faster than other tools and searches. Yes, in the simple world of Yahoo and Altavista you pay no attention to the specific differences between alternatives - you are left with the worst of these two tools. Your results are general, timeless and imprecise.

Contrary to myth, global search engines are not the best place to start most of the time - just some of the time. On other occasions, start with a directory, a meta-search engine, a guide, an FAQ... We should be able to identify which tools excel at locating what kinds of webpages. (There is no simple search of everything.)

There are more insights into effective internet research. Information clumps: information is not established in isolation but instead develops in context, is reinforced, and becomes a trend. The publishing motivation & promotion purpose can help us rapidly judge the content of a website. The webpage address can tell us a great deal about both the website structure and the type of publisher. Once skilled, you can segment and search the most promising areas of the web quickly and efficiently.
If you do not quickly find your answers there may be other, more appropriate resources. Consider asking for help in an appropriate discussion group, or reviewing printed literature instead. The Web is only one resource among many.

If your primary interest is Search Engines, consider reading A Higher Signal - To - Noise Ratio (www.dpi.state.wi.us/dpi/dlcl/lbstat/search1.html) by Bob Bocher & Kay Ihlenfeldt, Sink or Swim: Internet Search Tools & Techniques (www.lboro.ac.uk/info/training/finding/sink.htm) by Ross Tyner and The Search is Over (www.zdnet.com/pccomp/features/fea1096/sub2.html) by Adam Page. For even more, read Searching the Internet (wwwscout.cs.wisc.edu/toolkit/searching/), a publication in the Scout Toolkit, and browse Search Engine Watch.

Strategy

Searching the web takes more skill than most of us acknowledge. The web is a manifestation of the demon that professional researchers work with all the time in the commercial information market. There is constantly the fear you have missed that single important site with everything. Consider the researcher's motto: Someone, somewhere, probably knows the answer.

But how long do we search for gems, and where do we look? To decide, we must learn about internet structure and organization. Why is information published on the web? Why is it promoted? Let's review the reasoning behind effective internet research. There is so much more than putting words into search engines.

#1 Motivation

We can make some very astute generalizations about a webpage very quickly if we can judge the reason it was published. Not only is this an important step in analyzing any information, but this tells us a great deal about the contents of the webpage. Yes, merely determining a site belongs to an association actually specifies the quality, motivation and type of information we will find.
Associations either publish what is termed 'brochureware' (promotional material), or if well advanced, present research work previously restricted to the association library: important research studies & the like.

Commercial interests have much more difficulty delivering useful resources. The importance of projecting a corporate image comes first (lots of 'brochureware'), and service descriptions come second. On occasion, commercial interests will support a worthwhile service tied closely to their own service - thus banks present interest rates and bookstores present their book database.

The certainty with which we can make these judgments will astound you. Corporate websites never publish "changes to patent law". They simply don't have the motivation. Only an individual would publish this, most likely not on the web but through a mailing list.

Information is not distributed randomly. Consider Format, Preparation, Motivation and Promotion. Consider this, then Visualize the information you seek.

#2 Promotion

We can make further snap judgments about web information from the way we get there. Promotion is very difficult on the web, and it is hard to find poorly promoted information. The tools you use to reach information pre-determine the type and quality of information you will find.

Search engines index webpages indiscriminately. Advertised websites must have a pay-off. Directories focus on established websites (not webpages). Link pages also link to established websites but put more thought into the selection of resources. Both usually focus on general sites. For specific or current resources, we need to move to mailing lists or active nexus points.

Yes, when we find a webpage through the Scout Report (a prominent resource discovery newsletter), we can assume the webpage has a high quality of information, is reasonably current and has a general appeal (within the interest of the newsletter readers).

Let's put this in reverse.
If we are looking for a recent document by a prominent library committee, we will not find it through Altavista, Yahoo, or normal link pages (except accidentally). We may find it through specialist newsletters, active nexus points, or through mailing lists.

#3 Visualize

When an artist begins to paint, they visualize the image. They already have a concept of the finished result. Internet research is no different. We start by building a vision of the information we seek. Who would publish it? What is their motivation? Who would promote it? Where would I find it?

Information Clumps. Information is created, nurtured, develops, gets transplanted, gets arranged and becomes visible through a process which brings similar information together. Your understanding of this process, including motivation and promotion, must guide your search of the web. Only then will we know where to look, and quickly know if the answers are on the web.

___________________________________________________

Shakh was invited to travel with the army on the conquest of Nubia. The Egyptian army was not in need of further soldiers but there was a need for a witness. Shakh would write the official chronicles of the army's exploits. He would be expected to send a simple diary on papyrus back to the palace and then to compose numerous descriptions for memorial walls. He may also be consulted for paintings on the pharaoh's tomb. It was a fine offer, and he relished the prospect of increasing his value exposure.

The war was not swift, nor was it entirely one-sided. In the end, superior numbers had their effect and Nubia was once again reunited with Greater Egypt.

Reporting was initially a challenge, since very little happened from day to day. Slowly, Shakh got a handle on the process and focussed on the grandness of the venture. Two years after floating upstream, Shakh was able to do his finest work, the parade of captured soldiers past the Pharaoh's representative.
News articles are typically light and biased. Do not believe a news item is a great critical analysis of current events. Most news is produced under time restrictions, for prompt consumption. In research, news often proves particularly useful for locating information about individuals or businesses. News is also critical in creating a timeline of events, in recording events of regional/national/international importance. News prepared by individual reporters is collected together by large news organizations, then delivered to other news organizations around the world. Your local news organization does not have a reporter in Iran, but rather buys the story off a newswire, then packages it in your evening news hour or morning newspaper. You have probably heard of: United Press International (UPI), Reuters Global News, Agence France Presse, Associated Press and Xinhua Chinese Newswire. These very large organizations make their information available to you in a variety of ways. News collects in commercial databases of past news, some single source, others, large multi-source databases. Current news is also packaged into large multi-source systems delivered by email or newsgroups. Many newswires are available online free of charge. Free News Critical to the changes on the internet is the emergence of free access to text news. Individual newspapers present news free. Newswires present news free. News sections to larger sites like Yahoo present news from many sources, free. News-only search engines will help you find information from a great many sites with news. The process of finding current news is about as slick as imaginable. Here are a few players in the market: * Yahoo News (www.yahoo.com/headlines/) is leading this field with web delivery of current news from Reuters, Associated Press, and others. Yahoo also includes a free search for one week's news. * Voice of America Newswire (VoA and now voanews.com) delivers news in English & many other languages. 
* The Washington Post (www.washingtonpost.com) offers their own current news for searching, as well as the Associated Press wire, each searched separately for the past week.
* Fox News (www.foxnews.com) presents current news online (both current events and sport news). CNN news (www.cnn.com) is another searchable site. Both repackage some newswires and present them online. C|news (www.news.com) does this too.
* Newsbytes (www.newsbytes.com) is a newswire solely on computer topics: computers, telecom and the online world. InternetWire and other specialty newswires also present news from their websites.
* United Nations Radio: The World in Review is one of many news shows with the transcripts online. Unusually, the Vatican's newswire is not free online.
* Obviously many more exist - and thankfully we don't need to create a list or manage the sources. The Spire Project has a clickable map of English language newspapers. There are definitive lists of global newspapers like Gary Price's gwis2.circ.gwu.edu/~gprice/newscenter.htm#International, dailyearth.com and ipl.org/reading/news/.

Commercial Resources

The commercial segment of the news market is obviously being squeezed by the copious quantities of free news online. There are, however, still some viable markets, principally enterprise solutions (companies are willing to pay for slight improvements), past database access, and surprisingly the Wall Street Journal (US$49/yr). Serving these markets we have Clarinet and Newspage.

World News Connection is a US Government service presenting translated news (quite a gem) as a searchable database. Unusually, prices start at US$25/7days - yes, one price for the news!

Of course news alerts can be arranged from the commercial news databases through the database retailers, and newswires like Agence France Newswire, Canada Newswire, Xinhua News and Associated Press are each unique databases, all stretching back many years.
Further databases like Newswire ASAP and what used to be Global Textline are massive databases of multiple newswires and newspapers. I recall at one stage Textline had over 4 billion pages.

Conclusion

News articles are typically light and biased. The sheer quantity of news in the large news databases makes this a useful resource to fall back on for any tightly focused research topic. I once discovered an obscure scientist working in a unique field from a small 3 paragraph article in a local farmer's newspaper in England (Global Textline Database).

Newswires and News Databases are just two elements of a large industry which extends to your local newspaper and to further specialty databases. Most newspapers maintain their own local news database, and some make this available electronically. A manual clipping service may also be an option - certain firms manually page through local papers looking for advertisements or articles.

While on the topic, certain newswires like Business Wire and PR Newswire essentially distribute certain types of news for money. Yes, anything in these newswires is there because the company paid for it to be there - $500 and up most likely. Other newswires earn money in the reverse process: from the media who read or publish their work. Associated Press or Reuters are created from news organizations. Others like Voice of America (VOA) are alternatively funded, but with reasonable reliability.

There is also a range of focused newswires such as Newsbytes (computer issues), PR Newswire (product releases), and Middle Eastern newswires. Further newswires can be found at Yahoo.

Strategy

I can think of four ways to use this information for research:

1) As an alternative to your evening news or morning newspaper. Online news is available 24 hours a day, in more detail, from respected news organizations.
2) Search past news to locate information unlikely to emerge in journals or magazines.
News includes a great deal of local detail and personal information unlikely to be found elsewhere.
3) As a historical record of events, perhaps the basis of a timeline.
4) Current Awareness and Alerts so articles come to you as they are reported. News stories by email will become a large industry over the next two years.

Information Theory

Just how inexpensive can news become? US$25 gets you access to past translated news! VoaNews.com keeps a searchable directory back a month for free. Many newspapers still have extensive archives of news, though they hope to one day charge for them. In a way, no-one is making money from news. It is only worth the advertising revenue for distracting you from reading the news - and that is falling too.

With the freedom of moving information through the internet, several free services will send you email when a news article matches your interests (an Alert). The future will see much more "compile your own" newspaper - especially since it could conceivably be compiled at minimal to no expense depending on the technology (frames anyone?). An intriguing lawsuit recently stopped TotalNews (a news-only search engine) from displaying news articles in a frame.

If allowed to speculate for a moment, News-for-Pay may also become a viable business. Perhaps this is just being cynical of journalistic standards and the accepted standards of promotion. Perhaps it is also recognition that Business Wire and PR Newswire are just two of several newswires where you pay to have your news included. Obviously news today is biased towards advertisers (through advertorials) and promoters. Perhaps this will become automated some day - like Yahoo's "we will look at your site right away for $200".
Naturally, the links and many of the forms to news resources discussed here can be found at spireproject.com/newswire.htm and also our All-in-one page: spireproject.com/spir.htm

___________________________________________________

Theses and dissertations are professional papers completed for higher degrees. That is to say, they are long, dense and often very esoteric and convoluted.

Trouble is, most theses and dissertations have no more than 12 copies ever - one always to the University Library, one with the author, but others scatter to the wind.

All University Libraries hold a copy of past theses undertaken at their university. This gives rise to the unfortunate but necessary pastime of searching each local university library for relevant theses. The advantage here is that masters and occasionally honours theses are indexed. Most often, just undertake a keyword search then add "thes*" (truncation of theses or thesis).

Electronic Theses Databases:

Dissertation Abstracts Online, produced by UMI, delivers abstracts of almost every doctoral dissertation/thesis in North America, some master's theses and some international theses. This is the definitive site to search, though you will need the help of your library to see more than the abstract. Some libraries will have subscribed to Dissertation Abstracts OnDisc - the CD version of this database.

The [British] Index to Theses with Abstracts is a print directory by ASLIB. This publication is also available as a database, available for site licenses through Theses.com (www.theses.com). This source is quite comprehensive as can be seen with the University List.

Several other national databases do exist. Here in Australia, a list of theses was maintained from 1966 to 1991. The Gale Directory of Databases also lists THESA, a database of French theses, and Dissertations and Theses of the ROC (Taiwan).
The Australian Education Index (1978+), produced by ACER (Australian Council for Educational Research), is a directory listing citations and some abstracts for Australian work in education. Also available as a commercial database, AEI is bundled into Austrom, a common collection of Australian databases.

Digital Archives of Theses

In theory, some theses should be available on the internet, particularly theses lodged electronically. There is a push for universities to accept electronic thesis submission, and to build digital archives of theses. The embryonic National Digital Library of Theses and Dissertations (NDLTD - www.theses.org) is just one such project. There is a distributed and sequential keyword search of participating universities, though it is not particularly functional. In theory, this is an incremental improvement on searching library catalogues.

Conclusion

Getting a thesis can be very difficult. You will need the help of a document delivery service through a library, and many theses will not be available to borrow. You can also buy theses. Read Obtaining Copies of Dissertations (www.library.yale.edu/ref/err/disscops.htm) by Yale University Library for more. For an alternative look at theses, consider Locating Theses (www.lib.monash.edu.au/hss/guides/fstheses.htm) by the Monash University Library.

A note on developments in this field: some theses abstracts are emerging online already. Projects like the LA Theses Database (Landscape Architecture Theses Archive) have much promise but poor coverage. Full-text presentation of theses also has promise, with the US Department of Education funding a National Digital Library of Theses and Dissertations and Virginia Tech starting to request electronic submission of all theses. UMI (the producers of Dissertation Abstracts Online) has backed this move with a direct delivery service of electronic theses to US libraries for $26, but only theses held in their digital archives are available.
Eventually, large digital theses archives will be the norm, but until then, very little will happen in this field. A thesis is a tightly constrained information package, produced in the university environment with limited appeal. For economic reasons, we should not be surprised theses databases are incomplete. The emergence of theses archives sounds interesting - a good use of the internet - but does not represent a financial opportunity that could be explored without government assistance. Consequently, this small area of the information sphere is government grant-driven.
___________________________________________________

A patent discloses certain facts about a commercially important invention in exchange for certain rights to exploit the invention. This is a little simplistic, but explains why patents are factual, distinct from other research resources, and a little vague in certain specifics. If you have never seen a patent before, see a sample US patent, an Australian patent, and this brief description (www.ipaustralia.gov.au/patents/P_home.htm).

There are three primary resources involved in patent research. Firstly, we have the free internet resources. Secondly, we have the national patent agency resources. Thirdly, we have the commercial patent databases.

Free Patent Databases

The concept of free patent databases has surely arrived, and while many countries are only slowly moving in this direction, the movement is inevitable.

* The US Patent and Trademark Office (USPTO) provides a US Patent Bibliographic database at patents.uspto.gov with full use of fields, date and abstract text searching. Choose between their Boolean search, advanced (field) search or search by US patent number. They also maintain a fulltext [US] AIDS Patent Database and other resources.

* IBM's Patent Server is a public service providing a different patent database of US Patent abstracts. The IBM service is similar to but distinct from the USPTO service - certainly not less powerful.
* The Canadian Intellectual Property Office (CIPO) maintains the Canadian Patent Fulltext Database from 1989. This database is on par with the US Patent Database, with perhaps even better searching technology.

* The Japanese Patent Office (www.jpo-miti.go.jp) has a searchable database of Japanese patent abstracts, including patent number, title, inventor, company, and abstract of the patent.

Patent Authority Services

Patent libraries are an important and cost-effective patent resource.

* IP Australia (www.ipaustralia.gov.au) (formerly the Australian Industrial Property Organisation (AIPO)) has a patent library in each Australian state capital. Each library provides free access to the APAS database (Australian Patent Abstract Search) and includes a complete microfiche copy of all Australian patents and the Australian Official Journal of Patents, Trademarks & Designs (the official Australian patent gazette). Most offices also hold US Patents on microfiche! Staff will help you use the APAS database, arranged for free text searching by International Patent Classification. A particularly useful service by IP Australia is the delivery of copies of many foreign patents for AU$15. You will need the patent number, country and title for this.

* The US Patent and Trademark Office (USPTO) has the Patent and Trademark Depository Library Program (PTDLs), placing the CASSIS database (the USPTO patent abstract database on CD-ROM) and US patents around the US. The US patent libraries also hold the Official Gazette of the U.S. Patent and Trademark Office, the official US patent gazette. Importantly, the gazette is fully online and searchable from 1995.

* The [UK] Patent Office (www.patent.gov.uk) provides for the Patents Information Network (PIN), which hosts patent information in the UK. The British Library is just one listed source of UK patents (further information online) and delivers some patent services.
* The Canadian Intellectual Property Office (CIPO) (cipo.gc.ca) produces the Canadian Patent Index (CPI). They also publish The Patent Office Record, Canada's official patent gazette.

* There are many more national & international patent organizations like Institut National de la Propriete Industrielle [France], World Intellectual Property Organization (WIPO) and European Patent Office. Thankfully there are fine lists of patent libraries and patent websites.

Commercial Patent Services

One of the most valuable resources in serious patent research is access to several of the very large commercial patent databases.

* Lexis-Nexis (www.lexis-nexis.com) retails several patent databases. Thanks to Patscan (University of British Columbia), we also have a guide to searching patents on Lexis-Nexis.

* The Dialog Corporation (www.dialog.com) retails a collection of patent databases including: Derwent World Patents Index, Inpadoc, Claims/U.S. Patents and European Patents FullText.

* CASSIS is the USPTO database. For a little more information on this, consider the Patent Guide to Using CASSIS, at the University of Michigan.

* Derwent Scientific and Patent Information (www.derwent.co.uk) is a prominent publisher of patent and scientific information including commercial databases.

* Questel-Orbit (www.questel.orbit.com) also retails patent databases.

* CAS/STN (www.cas.org) retails a collection of patent databases including Chemical Patents Plus for U.S. chemical patents.

In addition to the database retailers and producers, there is a lively industry of patent services.

* The Patent Libraries will assist you with some services. IP Australia, for example, will retrieve most full patents from other countries for AU$15.

Conclusion

Until recently, the legal profession has had a complete monopoly on patent work. As you can see, this need no longer be the case.
Casual researchers will find the free patent databases easy to use, and more experienced researchers should not be dissuaded from searching the commercial databases or patent libraries themselves. The very large commercial databases, like Inpadoc, are particularly easy to use. Of course, there are occasions when patent searches are critical, and experts should be sought. Certainly legal assistance is required if you are preparing to lodge your own patent, but patent data as a source of information is another matter.

As an industry, patent research is still deeply entrenched in the high-price commercial database and database-centered services. I am mildly surprised the emergence of free databases like the USPTO's patent database has not led to a fall in the costs of the high-end databases (which remain some of the most expensive databases publicly accessible). It appears this industry, as indeed several others, has no intent to drop the price of retail database access to a more supportable level. I can only presume this rests on economic grounds. Patent information purchases are price insensitive.
___________________________________________________

Statistics allow us to lie with confidence. Dense and factual, carefully interpreted statistics are also far more reliable than personal experience.

The expense of collecting meaningful statistics limits the types of organizations involved in this work. This limit also offers a very elegant way to divide the field:
#1 National Statistical Agencies,
#2 Government Agency Statistics,
#3 Commercial Statistics,
#4 Association Statistics.

Statistical Directories

Statistical Abstracts (statistical bibliographies and statistical directories) describe sources of statistics. Instat publishes "International Statistics Sources: subject guide to Sources of International Comparative Statistics" but I found this less than brilliant.
A better link is Statistical Sources (by Gale Research), a basic and very large statistical abstracts directory. On the internet, US government statistics are well recorded in Statistical Abstract of the United States 1999 (www.census.gov/stat_abstract), a 1000+ page document made available online in pdf format by the US Census Bureau.

Statistical Venues

Many statistics appear regularly in journals, annual reports and newspapers. Specialty libraries, particularly specialty librarians, may be aware of additional statistics. If an expert goes through the effort to collect statistics, you are far more likely to locate them by undertaking an article search (looking particularly for journal articles) and a book search. In both cases, limit your search to only the last couple of years or you will locate very old, dated statistics.

A particularly sophisticated approach could be to ask BusLib-l (Business Librarians' Electronic Discussion List) since this is a mailing list of librarians. Use this resource sparingly, and only after having exhausted other avenues.

National Statistical Agencies

Almost every country in the world has a single government agency dedicated to collecting, collating and publishing national statistics. Statistics Canada, Australian Bureau of Statistics, The US Census Bureau, The (UK) Office for National Statistics; we have a fine page on national statistical agencies (spireproject.com/bureau.htm). These organizations manage the census, watch the movement of money and goods in and out of the country, and undertake a wide range of other surveys. Finding these statistics is relatively straightforward, with several directories on the internet.

Government Agency Statistics

Most government agencies collect reams of data on the industries they monitor. Sometimes these statistics are published, sometimes you have to ask for them, only rarely are they considered private or unavailable.
Here in Western Australia, the government departments for Tourism, Labour, Small Business and Big Business all publish top-rate statistics free to interested parties. Our Dept of Tourism keeps a directory of future tourism related projects.

When government statistics are bound and published, try the government book databases. Remember MOCAT, AGIP and part of UKOP are free online. Again, some US government statistics are well recorded in Statistical Abstract of the United States 1999 by the US Census Bureau, online in pdf format.

Association Statistics

Valuable statistics only come from motivated sources, and associations are certainly motivated. Start with a list of likely associations, then call up and either explain your needs or ask for their price list for publications and statistics. For AU$25, the Australian Booksellers Association publishes a brilliant analysis of the book industry. Association statistics are financially informative, as the intended audience is association members.

Commercial Statistics

Statistics created for sale are frequent in the financial sector but exist in a number of further situations. Banks use more professionally prepared market reports such as Guide to Growth, by the Australian economic consultancy firm Syntec Economic Services, which examines Australian industries financially with forecasts. IBIS (www.ibis.com), another economic consultancy, also publishes to this market.

Professionally prepared market reports are also emerging, with the full text immediately available from the commercial information market. Each database retailer has several such databases, but often these databases are focused globally or on a different country. Sheila Webber (www.dis.strath.ac.uk/people/sheila) has a very good list of firms which market research reports.

Conclusion

Central to the Internet Revolution is the liberation of just this kind of information.
Increasingly, we will see the publishing of such documents on the internet, but for the few statistics currently online, there is no effective search. You can only browse government websites. Away from the internet, you must either contact the agencies directly (in the hope they do collect statistics), look at the statistical directories or seek agency statistics in other documents: books, pamphlets, newsletters.

Once you have proceeded this far, it is wise to stop looking for statistics, and begin again at sophisticated commentary - which is likely to include supporting statistics or references to statistics anyway. Seek expert guidance from others who would know of hard-to-find statistics.

One approach to finding statistics is to reverse the process. Who would prepare the statistic? Statistics are created in a logical manner, in a very expected manner. Tourism statistics? - most likely undertaken by either the government tourism authority, a tourism association or the national statistical agency. There are few others who could even consider preparing tourism statistics. If you can think through the preparation process, you can usually identify who would have created the statistic. (Internet statistics are the exception - too many organizations are creating statistics of worth.)

Let's move on to specific fields of statistics.

National Statistical Bureau

The Spire Project has a fine html article on the National Statistical Agencies (spireproject.com/bureau.htm).

Australia (www.abs.gov.au), United Kingdom (www.ons.gov.uk), Canada (www.statcan.ca) and United States (www.census.gov) all have national statistical agencies. Each organization collects and publishes statistics on many facets of its respective country. This article should simplify your work in searching, selecting and appraising these sources.

Each statistical agency organizes its statistics in a distinct way.
The Australian Bureau of Statistics (ABS) has an annual Catalogue of Publications but also a search function, specialized statistical category guides and several periodicals on new resources. The UK Office for National Statistics (ONS) has a statistical overview, product catalog and a search. The US Census Bureau has a collection of very large publication catalogues, directories and periodicals. Statistics Canada has several searches, publications and a catalogue.

The two further elements to the statistical agencies are the statistical libraries and the unreported commercial statistics.

The ABS has a dedicated statistical library within each Australian state, and collections of ABS documents within most public and school libraries. While the ABS documents within libraries are limited, the ABS libraries are very detailed, with almost every publication they create available for review. This is standard throughout the world.

While publications are sold by each statistical agency, and the publication catalogues are available online, each agency has data it sells in other formats. CD-ROMs of geographical and statistical distributions have become very popular, as have small area population statistics. Some of these services are packaged and sold for specific purposes, like 4-site by the ABS, used in describing business locations.

Even further, statistics can be generated specific to your needs. This might include ABS import and export statistics for specific commodities, or specific results from any of their surveys.

Lastly, Usinfostore.com presents a collection of economic indicators as time-series data. The statistics originate from several government agencies, and the service is best considered as a value-added service: an intriguing beneficial trend?

National Statistical Agencies are certainly not the only source of statistics. They are, however, some of the easiest to access. These agencies also have several traits that di |