Research Commentary by David Novak of The Spire Project

Feeling Lost?
Learn to read the Internet's roadside signs. You won't feel lost again.
By David Novak

The Internet is well signposted. Far from being a vast ocean of unrelated webpages, many clear patterns can readily be found online. Once you recognize these patterns, you will feel much more in control.

Each page, each item on the Internet has a unique web address: a long string of words and symbols that identifies that item and no other. The technical term is Uniform Resource Locator (URL). It's that thing you type in the little box at the top of your web browser.

We all know the basic difference between commercial, educational and government sites, but it is worth sharpening our description a bit. Clues gleaned from urls can improve the way we find information.

[domain name / directory tree / file name]

Firstly, let's cover the basics. Urls are only case sensitive to the right of the domain name. A ? in the address usually means the page is generated by a program rather than read from a fixed file. A tilde ~ usually marks a personal subdirectory.

There is no significant difference between files ending in .htm and files ending in .html, though page.htm and page.html are not the same file. When there is no file name at the end of the directory, the url refers to the default page: usually index.htm or index.html, but not always. Sometimes you get a listing of all the files in that directory. Sometimes you get an error message.
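The three parts named above (domain name, directory tree, file name) and the clues just listed can be pulled out of a url mechanically. The following is a minimal sketch using Python's standard urllib.parse module; the function name url_parts and the dictionary keys are made up for illustration.

```python
import posixpath
from urllib.parse import urlsplit


def url_parts(url):
    """Split a url into domain name, directory tree and file name."""
    parts = urlsplit(url)
    # Everything after the last "/" is the file name. It is empty when the
    # url ends at a directory and the server chooses a default page.
    directory, filename = posixpath.split(parts.path)
    segments = parts.path.split("/")
    return {
        "domain": parts.hostname or "",   # hostname comes back in lower case
        "directory": directory,           # left as typed: case matters here
        "filename": filename,
        "has_query": bool(parts.query),   # text after a "?"
        "personal": any(s.startswith("~") for s in segments),
    }


info = url_parts("http://iinet.net.au/~datalog/webpage.htm")
print(info["domain"], info["directory"], info["filename"])
```

An empty file name does not tell us which default page the server will send. That still has to be found by visiting the address.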

With this knowledge in hand, we can move freely through the directory tree of a web address we find interesting. We can usually assume the files in a single directory are at least loosely related.

Say we land on http://iinet.net.au/~datalog/webpage.htm. Besides following the links within that page, we can also chop the file name off the right side of the url and browse other work in that directory.
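Chopping the right side off a url can be repeated until only the domain name is left. As a rough sketch (the helper name parent_urls is hypothetical), this lists every shorter address worth trying, nearest directory first:

```python
from urllib.parse import urlsplit, urlunsplit


def parent_urls(url):
    """Return the urls found by chopping one piece at a time off the right."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    parents = []
    while segments:
        segments.pop()  # drop the file name, then each directory in turn
        path = "/" + "/".join(segments) + ("/" if segments else "")
        parents.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return parents


for candidate in parent_urls("http://iinet.net.au/~datalog/webpage.htm"):
    print(candidate)
```

Not every address produced this way will answer. As noted earlier, a bare directory may return a default page, a file listing, or an error message.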

Historically, before the web, when ftp was the primary tool, we navigated partly with a search engine of file names (called Archie) and partly with the clues available in the directory names. Because the Internet allows long file names and long directory names, we could pick out the directories we were interested in. We can still do this at times, particularly in prominent shareware archives like Simtel.net.

Take, for instance, the file http://www.simtel.net/pub/simtelnet/msdos/astronmy/sky3dv11.zip

Just by looking at the url, we see it's shareware (I did say Simtel was a shareware archive) on astronomy (a directory is even called astronmy, a shortened form of astronomy). If we are perceptive, we may also notice it is for msdos and is version 11. For more astronomical programs we can try .../msdos/astronmy/ and there may be other directories of astronomical programs like .../win3/astronomy/.

This movement within a site can be very effective. When we read a webpage like http://www.hicom.net/~oedipus/blind.html then http://www.hicom.net/~oedipus/ is likely to have more information about the author (and why the directory is called oedipus).

There is another story told by urls: that of organisational support. With the emergence of the web, the directory structures of websites began mirroring the structure of the organisations they belong to. There has been a recent move to re-organise website directories to suit the needs of visitors, but this has not yet affected many websites.

What remains, then, are clear indications of the organisation and department behind a given webpage, and of how important that page is to the organisation.

Importance? Well, webpages in the top directory are intended for the most general audience.

Take a look at a second example, a webpage by the United Nations Statistics Division (UNSD) with international literacy statistics.

http://www.un.org/Depts/unsd/social/literacy.htm

Of course we could have guessed as much. un.org. literacy.htm. It is right there in the web address. Further, we know about the quality of these statistics. United Nations pretty much says it all.
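The organisational story in a url can be read off as the chain of directories a page sits under. This is only a sketch (the name directory_chain is made up here), and a deeper page is a hint of a narrower audience, not proof of one:

```python
from urllib.parse import urlsplit


def directory_chain(url):
    """Return the directories a page sits under, outermost first."""
    pieces = urlsplit(url).path.split("/")
    # The last piece is the file name, or empty when the url ends in "/".
    return [p for p in pieces[:-1] if p]


chain = directory_chain("http://www.un.org/Depts/unsd/social/literacy.htm")
print(" > ".join(chain))          # the departments behind the page
print("depth:", len(chain))       # 0 would mean the top directory
```

Here the chain runs from the departments directory, through unsd, to the social statistics area, which matches what the text reads from the address by eye.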

Let's bring this all together. Using a search engine we can quickly bring up a list of potential webpages for our consideration. If we first take the time to peruse the urls, we can find many an important clue as to the contents.

Say we have a list that includes:

http://iinet.net.au/~datalog/webpage.htm
http://www.un.org/Depts/unsd/social/
http://www.hicom.net/~oedipus/

With little more than common sense and a url, we can make simple judgements about the information's purpose, probable motivation, and likely quality.

Start by looking at each web address (url) and deciding if it matches the kind of information you are seeking.
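For a long list of search results, the same first pass can be roughed out in a few lines. The sketch below is an assumption-laden illustration, not a rule book: the function name url_clues, the labels, and the small table of domain endings are all invented for this example, and the clues it reports are only prompts for our own judgement.

```python
from urllib.parse import urlsplit

# Rough, assumed labels for a few common domain endings.
DOMAIN_HINTS = {
    ".com": "commercial site",
    ".edu": "educational site",
    ".gov": "government site",
    ".org": "organisation site",
}


def url_clues(url):
    """Collect the simple clues a url offers before we visit the page."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    segments = [s for s in parts.path.split("/") if s]
    clues = [label for ending, label in DOMAIN_HINTS.items()
             if host.endswith(ending)]
    if any(s.startswith("~") for s in segments):
        clues.append("personal directory")
    if parts.query:
        clues.append("generated by a program")
    if not parts.path or parts.path.endswith("/"):
        clues.append("directory rather than a single file")
    return clues


results = [
    "http://iinet.net.au/~datalog/webpage.htm",
    "http://www.un.org/Depts/unsd/social/",
    "http://www.hicom.net/~oedipus/",
]
for url in results:
    print(url, "->", ", ".join(url_clues(url)) or "no obvious clues")
```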

You will not begin to correlate types or styles of information with urls unless you start taking notes. This is a skill you will develop over time.
* * *
David Novak manages The Spire Project, an Internet research resource and thinktank.

The Spire Project - better ways to find information.
Copyright © David Novak 2002.