Internet and WWW for finding information
1
Internet and WWW for finding information
[email protected] Universiteit Brussel, Pleinlaan 2, B-1050 Brussel.
February 2004, VUB-IDLO
2
The slides are available from http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/
(note: BIBLIO and not biblio)
3
Plan for the day: morning
• About “information”
• Information market
• Information retrieval
• Thesauri (+ practising query formulation)
• Networks, and the Internet in particular
• World-Wide Web (+ practising “browsing” + “saving”)
• LUNCH
4
Plan for the day: afternoon (part 1)
• Online accessible information sources!
»Global Internet directories (+ practice)
»Internet indexes (+ practice)
»Book databases (+ practice)
»Fee-based databases
»Databases with titles of journal articles
»Finding illustrations/images/photos (+ practice)
5
Plan for the day: afternoon (part 2)
• Evaluation of information sources
• Free searching according to your own interests, with assistance
6
Interruptions, questions, remarks, and discussions are welcome
7
About “information”
Information concepts
8
The flow of documentary information with primary and secondary sources
Author / Creator / Sender
Primary sources / systems: mainly journal articles / books / electronic mail / online sources / ...
Secondary sources / systems: mainly reference works (printed, CD-ROM, online), library catalogues, including OPACs, ...
Reader / User / Receiver
9
The role of secondary information sources
• The secondary information flow is generated on the basis of the primary flow, mainly because the great amount of primary information reduces the chance of retrieving and using the appropriate information item.
• Secondary information tries to bring some order in the great chaos.
10
Various categorisations of documentary information sources
Information sources can be categorised in various ways. For instance:
• Primary / Secondary
• Hard copy (not digital) / Digital
• Offline / Online
• Text / Image / Sound / Animation or video / Software / Data / Interactive
• Books / Serials
11
Retrospective searching versus current awareness: scheme
[Scheme: time axis with Past, Now, Future; Retrospective searching; Current awareness]
12
Information retrieval: evolution of storage and distribution media
• 1450 printing with reusable characters/fonts
• 1975 + online access databases; from the 1970s: the growing Internet
• 1985 + CD-ROM
• 1990 + World-Wide Web (based on the Internet)
13
Information retrieval: end user or information intermediaries
End-user
Information intermediary (broker or library or ...)
Information
14
End user versus information intermediary
• People can retrieve information themselves, directly as so-called “end-users”.
• However,
»the information landscape is complex,
»it may take a lot of time to find the right information,
»it may be costly to search for information.
• Therefore it may be wise to obtain the assistance of an expert information intermediary, such as a reference librarian or an information broker.
15
About “information”
Computer- and network-based information
16
Information: from bits to meaningful information
Digital computer data = bits (0 or 1)
Program code, meaningful for and to be interpreted / executed by a suitable / compatible computer
Information = “documents”, meaningful for and to be interpreted by human beings
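The step from bits to human-readable information can be illustrated with a minimal Python sketch. It assumes UTF-8 as the character encoding; the function names `text_to_bits` and `bits_to_text` are hypothetical and only serve this illustration.

```python
def text_to_bits(text: str) -> str:
    """Encode text as UTF-8 and show every byte as a group of 8 bits."""
    return " ".join(f"{byte:08b}" for byte in text.encode("utf-8"))


def bits_to_text(bits: str) -> str:
    """Turn space-separated groups of 8 bits back into readable text."""
    return bytes(int(group, 2) for group in bits.split()).decode("utf-8")


bits = text_to_bits("WWW")
print(bits)                # the same information as raw bits
print(bits_to_text(bits))  # interpreted again as text for a human reader
```

The same bits only become information for a person once a program interprets them with the right convention, here the character encoding.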
17
Information: digitally stored and managed information
Categories of digital, computer-readable information / data (bits: 0 and 1), forming electronic “documents” that are understandable by human beings:
text + numbers + images + video + sounds = multimedia
18
Information: types of digital information
Digital information (bits: 0 and 1):
• Linear text / Hypertext
• Static images / Video
• Sound
• Programs for computers
• Multimedia / Hypermedia
19
Some publication media compared
[Chart: printed, CD-ROM, and online / networked media compared by update speed and volume]
20
Scientific publishing in Utopia: an ideal scheme
Many authors
Many readers / users
Many editors / publishers
Online remote access multimedia database server
Many database search clients and user interfaces
One global, international computer data communication network
author = reader in science
21
?? Question ??
Indicate the differences between reality and that simplified, ideal scheme of the information flow.
22
?? Question ??
Which basic problems/difficulties prevent people from finding / accessing / using information?
23
Information retrieval: basic difficulties (Part 1)
• In many cases it is not completely clear to the user of an information retrieval system which information is in fact needed.
• In many cases the need for information cannot be expressed completely in the form of a query. One of the reasons is that the complete context of the information need should ideally be expressed, including the knowledge and background of the searcher.
24
Information retrieval: basic difficulties (Part 2)
• Computer systems are artificial, but most nevertheless use human language in their interface with human users, for instance in database search systems. This may cause difficulties related to language, and to vocabulary in particular. Some examples:
»People use different languages and different terms (vocabularies) to describe a similar concept.
»Concepts, vocabularies and meanings of words and terms may change over time.
»Meanings of words / terms may depend on their context.
25
Information retrieval: basic difficulties (Part 3)
• Many different and imperfect retrieval systems should or must be used.
»To retrieve and access the information that is in principle available, many different retrieval systems must be available and be mastered.
»Furthermore, perfect information retrieval software does not (yet) exist; scientific and technological evolution in the domain of information retrieval software has been fast since about 1970.
26
Information retrieval: basic difficulties (Part 4)
• Information overload: users are often overwhelmed by the amount of available information and by the large influx of new information.
27
Information retrieval: basic difficulties (Part 5)
• The price (or inaccessibility) of particular information: a lot of information cannot be obtained, or at least not free of charge.
28
The information industry and the information market
The components of the information industry
29
The components of the information industry
• Authors
• Publishers
• Distributors
• Users
• Related organizations
30
The information industry and the information market
Overview and evolution
31
Increase in the number of scientific and technical serial publications
[Chart: number of scientific and technical serial publications on a logarithmic scale (1 to 1 000 000) versus year (1650 to 2000)]
32
The information market: growth in the database industry
[Chart: number of living databases, number of database producers, and number of vendors (scale 0 to 10 000) versus year (1975 to 1995)]
Source: Williams, in: Gale Directory of Databases, 1998.
33
The information industry / market: future trends (Part 1)
• Growth in the production of databases.
• Less analogue / hard-copy production = more digital production, storage, and distribution of information.
• More integration of information types into multimedia and hypermedia.
34
The information industry / market: future trends (Part 2)
• Growth in the number of
»producers and distributors,
»end-users searching databases,
due to easier use and lower costs of information technology
35
Databases and computerized information retrieval
Introduction
36
What is a database?
A database is a collection of similar data records stored in a common file (or collection of files).
37
Types of databases: examples
Examples: the databases that form the basis for
»catalogues of books or other types of documents
»computerized bibliographies
»address directories
»a full text newspaper, newsletter, magazine, journal + collections of these
»WWW and Internet search engines
»intranet search engines
»...
38
Information retrieval: the basic processes in search systems
[Scheme: Information problem → Representation → Query; Text documents → Representation → Indexed documents; Comparison of the query with the indexed documents → Retrieved, sorted documents → Evaluation and feedback]
39
Databases and computerized information retrieval
Text retrieval and language
40
Text retrieval and language: a word is not a concept (a)
Problem: A word or phrase or term is not the same as a concept or subject or topic.
[Scheme: Word + Word → Concept]
41
Text retrieval and language: a word is not a concept (a’)
So, to ‘cover’ a concept in a search, to increase the recall of a search, the user of a retrieval system should consider an expansion of the query; that is: the user should also include other words in the query to ‘cover’ the concept.
42
Text retrieval and language: a word is not a concept (a’’)
»synonyms! (such as: Latin names of species in biology besides the common names, scientific names besides common names of substances in chemistry, …)
43
Text retrieval and language: a word is not a concept (a’’’)
»narrower terms, more specific terms (such as particular brand names), including terms with prefixes (for instance: viruses, retroviruses, rotaviruses, ...)
»spelling variations (such as UK English versus US English); possible variations after transliteration
44
Text retrieval and language: a word is not a concept (a’’’’)
»singular or plural forms of a noun (when this is used as a search term)
»(relevant) related terms
»various forms of a verb (when this is used in the query)
»broader terms (perhaps)
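The query expansion described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `expand_query` is hypothetical, the terms must be supplied by the searcher, and the output uses a generic `OR` syntax that a concrete search system may write differently.

```python
def expand_query(term, synonyms=(), narrower=(), variants=()):
    """Combine a search term with synonyms, narrower terms and
    spelling or plural variants into one OR-query for a single concept."""
    terms = []
    seen = set()
    for candidate in (term, *synonyms, *narrower, *variants):
        key = candidate.lower()
        if key not in seen:  # skip duplicates, ignoring upper/lower case
            seen.add(key)
            terms.append(candidate)
    # Multi-word terms are quoted so that they are searched as a phrase.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


print(expand_query("virus",
                   narrower=["retrovirus", "rotavirus"],
                   variants=["viruses"]))
```

Every term added with OR can only add records to the result, which is why expansion tends to increase recall.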
45
?? Question ??
Which problems in text retrieval are illustrated by the following sentences?
46
Time flies like an arrow.
Fruit flies like a banana.
?
Examples
47
Time flies like an arrow.
Fruit flies like a banana.
Examples
48
Time flies like an arrow.
Fruit flies like a banana.
OK!
Examples
49
Text retrieval and language: ambiguity of meaning (a)
• Problem: A word or phrase can have more than 1 meaning. Ambiguity of the meaning of a word is a problem for retrieval. This decreases the precision of many searches. The meaning can depend on the context. The meaning may depend on the region where the term is used.
50
Text retrieval and language: ambiguity of meaning (a’)
• Example of a word:
»Pascal the philosopher
»Pascal the computer language
Example
51
Text retrieval and language: ambiguity of meaning (a’’)
• Example of sentences:
»The banks of New Zealand flooded our mailboxes with free account proposals.
»The banks of New Zealand flooded with heavy rains account for the economic loss.
Example
52
Text retrieval and language: ambiguity of meaning (a’’’)
Problem: Ambiguity of meaning may be the cause of low precision.
[Scheme: Word → Concept + Concept]
53
A word is not a concept
A concept is not a word
Word1
Word2
Word3
Concept1
Concept2
Concept3
A concept cannot be “covered” by only 1 word or term; this may be the cause of low recall of a search.
The meaning of many words is ambiguous; this may be the cause of low precision of a search.
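Recall and precision can be made concrete with their usual definitions: recall is the fraction of all relevant records that the search retrieved, and precision is the fraction of the retrieved records that are relevant. The Python sketch below assumes that the relevant records are known in advance, which is only the case in a test situation; the record numbers are invented for the example.

```python
def recall_and_precision(retrieved: set, relevant: set) -> tuple:
    """Return (recall, precision) for one search result."""
    hits = retrieved & relevant  # records that are both retrieved and relevant
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision


# Invented example: 4 records retrieved, 5 records relevant, 2 in common.
retrieved = {1, 2, 3, 4}
relevant = {2, 4, 6, 8, 10}
print(recall_and_precision(retrieved, relevant))
```

Missing synonyms leave relevant records unretrieved and lower the first number; ambiguous words pull in irrelevant records and lower the second.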
54
Databases and computerized information retrieval
Hints on how to use information sources
55
Hints on how to use information sources: overview (Part 1)
• Know the purpose and motivation for each search.
• Do not be lazy: search on your own, before bothering experts with requests for advice.
• Plan your search in advance.
• Choose the best source(s) for each search.
• Use the available tools for subject searching well.
• Try to cope with the language problems; avoid spelling errors in your search query; use spelling variations in your search query.
56
Hints on how to use information sources: overview (Part 2)
• Match your search strategy with the type of source.
• Work cost-effectively.
• Use special care when searching for names.
• Be specific. Avoid broad searches. Limit your search to a specific country or region if required.
• Work iteratively.
• Keep a record of your work.
57
Hints on how to use information sources: overview (Part 3)
• Do not focus on only a single source.
• Consider citation indexes besides subject-oriented databases, as useful secondary information sources.
• Stop searching when “enough is enough”.
• Give up if necessary... (Not all questions have an answer.)
• Be critical: not all information is correct or useful.
58
Hints on how to use information sources: overview (Part 4)
• In computer-based retrieval systems, consider applying
»truncation of search terms (using a symbol like * or ?)
»combination of search terms, using
- Boolean operators: OR; AND (or +); NOT (or AND NOT, or -)
- proximity operators (for instance “NEAR”)
- phrase searching (“word1 word2”)
»searching limited to a field (for instance URL, title, …)
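Truncation can be imitated with Python's standard `fnmatch` module, in which `*` stands for any number of characters and `?` for exactly one character. This is a sketch under assumptions: the word list is invented, the function name `match_truncated` is hypothetical, and real search systems may give these symbols a different meaning.

```python
from fnmatch import fnmatchcase


def match_truncated(pattern, vocabulary):
    """Return the words from the vocabulary that match a truncated term."""
    return sorted(word for word in vocabulary
                  if fnmatchcase(word.lower(), pattern.lower()))


# Invented word list standing in for the index of a database.
vocabulary = ["pollution", "pollutant", "pollutants", "polluted",
              "pollen", "woman", "women"]

print(match_truncated("pollut*", vocabulary))  # truncation at the end of a term
print(match_truncated("wom?n", vocabulary))    # one variable character
```

Truncating too early (for instance `poll*`) would also match “pollen”, which shows how truncation can raise recall at the cost of precision.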
59
Hints on how to use information sources: subject searching
• When you search for information on a particular topic/subject: investigate if the database producer offers
»a subject classification scheme, and/or
»a list of controlled/approved/accepted subject terms, and/or
»a subject thesaurus
• Exploit these, if they are available.
• In most cases you should find and use synonyms and narrower terms.
• Use broader and/or related terms, if appropriate.
60
Hints on how to use information sources: Boolean combinations
Most text search systems understand the basic Boolean operators:
OR = obtain records that contain one or both search terms
AND = obtain records that contain both search terms
NOT = exclude records that contain a search term
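The three operators correspond to set operations on the record numbers that contain each term. The Python sketch below builds a very small word index over three invented records; the helper `build_index` is hypothetical and much simpler than the index of a real search system.

```python
# Three invented records standing in for a database.
records = {
    1: "seawater pollution monitoring",
    2: "air pollution in cities",
    3: "seawater temperature records",
}


def build_index(records):
    """Map every word to the set of record numbers that contain it."""
    index = {}
    for record_id, text in records.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(record_id)
    return index


index = build_index(records)
seawater = index.get("seawater", set())
pollution = index.get("pollution", set())

print(seawater | pollution)  # seawater OR pollution: one or both terms
print(seawater & pollution)  # seawater AND pollution: both terms
print(seawater - pollution)  # seawater NOT pollution: exclude the second term
```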
61
Hints on how to use information sources: Boolean combinations
In the case of computer-based information sources, use Boolean combinations of search terms when appropriate and when possible.
(term x1 OR term x2 OR term x3)
AND
(term y1 OR term y2 OR term y3)
AND
(term z1 OR term z2 OR term z3)
AND ...
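The scheme above, with OR inside each concept and AND between the concepts, can be generated mechanically. The following Python sketch assumes a generic query syntax with parentheses and quoted phrases; the function names `quote` and `build_query` and the example terms are hypothetical.

```python
def quote(term):
    """Put multi-word terms between quotes so they are searched as a phrase."""
    return f'"{term}"' if " " in term else term


def build_query(facets):
    """Join the terms of each concept with OR and the concepts with AND."""
    groups = ["(" + " OR ".join(quote(term) for term in terms) + ")"
              for terms in facets if terms]
    return " AND ".join(groups)


facets = [
    ["review", "overview"],     # concept x
    ["monitoring"],             # concept y
    ["seawater", "sea water"],  # concept z
]
print(build_query(facets))
```

Preparing the list of terms per concept on paper first, and only then translating it to the syntax of a particular database, keeps the same preparation reusable across several sources.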
62
Hints on how to use information sources: Boolean queries
Most text search systems understand the basic Boolean operators typed in capital characters:
OR, AND
63
Hints on how to use information sources: default Boolean operator
• Find out if there is a default implicit Boolean operator working in the search system that you use.
• This works even when no operator is used explicitly among words.
• This can be OR, AND, NEAR...
64
?? Question ??
How many (and which) concepts/facets do you see in a search for “general reviews about monitoring seawater pollution that is due to effluents in Tanzania”?
65
!! Task - Assignment !!
Prepare off-line, on paper, a suitable search query in a generic format, to find “general reviews about monitoring seawater pollution that is due to effluents”, as the basis for later, concrete searches in databases.
(Limit yourself to 1 of the concepts.)
66
?? Question ??
What did you learn from the exercise
on the formulation of a query?
67
Hints on how to use information sources: work iteratively
Work iteratively = search, investigate your results, refine your search, search again, and so on; do not try to find everything in 1 step, with 1 search.
[Scheme: Query → Searching → Results → Feedback → Query]
68
Hints on how to use information sources: work iteratively: example
When you search a database with subject keywords from a controlled list, added to each record:
1. Search with search terms that you know
2. Investigate the results and select good, relevant items
3. Look for the keywords added to these items
4. Select the good, relevant keywords
5. Formulate a new search with these keywords added
6. Execute the new search
7. Repeat the procedure
69
“The ability to ask the right question is more than half the battle of finding the answer.”
Thomas J. Watson
?
70
Hints on how to use information sources: when to stop searching?
Develop a feel for the “curve of diminishing returns”: if you spend too much time, effort, and/or money with too few benefits, you should stop.
[Chart: payoff versus time / effort / money, with the question “Time to stop?”]
71
Knowledge organisation: classifications, and thesaurus systems
Introduction
72
Knowledge organisation: introduction
• To organise knowledge / documents / books / reports / information / data / records / things / items / materials for more efficient storage and retrieval, some related, similar tools / systems / methods / approaches are used.
• Often, but not yet always, this process is assisted by a computer system.
• Good systems are expanded and updated when the need arises.
• The organization system applied should ideally be clearly and immediately visible, or even searchable on computer, by the user of the materials.
73
Knowledge organisation: classifications, and thesaurus systems
Classifications
74
Classification systems: examples of universal systems
• Universal means here: covering all subjects
• Not just one but several competing systems exist. Examples:
»Universal Decimal Classification = UDC, used mainly outside U.S.A.
»Dewey Decimal Classification = DDC, used mainly in U.S.A.
»Library of Congress Classification, used mainly in U.S.A.
»...
Examples
75
Knowledge organisation: classifications, and thesaurus systems
Thesaurus systems
76
Thesaurus: description
• Thesaurus (contents) =
»system to control a vocabulary (= words and phrases + their relations)
»+ the contents of this vocabulary
• Thesaurus program = program to create, manage, modify and/or search a thesaurus using a computer
77
Thesaurus relations
[Scheme: relations of a term in a thesaurus]
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)
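The thesaurus relations can be represented as a small data structure. The Python sketch below is only an illustration: the class `ThesaurusEntry`, the function `search_terms`, and the example entry for “pollution” are invented here and are not taken from an existing thesaurus.

```python
from dataclasses import dataclass, field


@dataclass
class ThesaurusEntry:
    term: str
    broader: list = field(default_factory=list)   # BT = Broader Term
    narrower: list = field(default_factory=list)  # NT = Narrower Term
    related: list = field(default_factory=list)   # RT = Related Term
    used_for: list = field(default_factory=list)  # UF = Use(d) For (synonyms)


def search_terms(entry, include_related=False):
    """Collect the term, its synonyms and its narrower terms for a query."""
    terms = [entry.term, *entry.used_for, *entry.narrower]
    if include_related:
        terms.extend(entry.related)
    return terms


# Invented entry, for illustration only.
pollution = ThesaurusEntry(
    term="pollution",
    broader=["environmental impact"],
    narrower=["oil pollution", "thermal pollution"],
    related=["effluents"],
    used_for=["contamination"],
)
print(search_terms(pollution))
print(search_terms(pollution, include_related=True))
```

Broader terms are left out of the query by default, in line with the hint that they should be used only where appropriate.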
78
!! Task - Assignment - Exercise !!
Try to find suitable search terms to retrieve documents on “pollution” from a database on marine science, by using for instance the thesaurus included in the program for word processing that you use.
79
Knowledge organisation: classifications, and thesaurus systems
Classification systems versus
thesaurus systems
80
Knowledge organization: classifications versus thesauri
• Classification
»Good for placement of documents in a library (because documents on many related subjects can be kept together)
»Not well suited for computer searching (too complicated)
• Thesaurus
»Not suited for placement of documents in a library (because documents with related subjects would NOT be kept together)
»Well suited for computer searching (relatively simple alphabetic listing of keywords)
81
Computer networks, data communication and Internet
Introduction
82
Computer networks: summary
The following gives an overview of computer networks and data communication:
»The basic principles
»Local area networks
»National computer networks
»International computer networks
»The Internet
»Future impact of digital communication networks
83
Computer networks: prerequisites
Before using computer networks, you should ideally have some knowledge and skills related to
• computer hardware
• computer software
84
Data communication: a definition
• Interpersonal communication
»Telecommunication
- Broadcast
- Telephone
- Data communication: remote login, file transfer, hypertext transfer, electronic mail, ...
85
Data communication: which types of ‘data’?
Digital information (bits: 0 and 1):
• Linear text / Hypertext
• Static images / Video
• Sound
• Programs for computers
• Multimedia / Hypermedia
86
Data communication: which types of ‘data’?
• The same types of data (information) that can be stored and managed on a computer can be transferred over computer networks to one or several other computers.
• So the networks form an important extension of the stand-alone computers.
• “The network is the computer”
87
Data communication: applications (Part 1)
• Hard-copy transfer (Fax)
• Online use of the processing power of a remote computer
• Online access to information sources!
»library catalogues,
»bookshop catalogues,
»publishers' catalogues,
»campus-wide and community information systems,
»(text or multimedia) databases,
»network-based journals, ...
88
Data communication: applications (Part 2)
• Software downloading
• Electronic mail from a person to one or several persons
• Computer-network based interest groups
• Online talking / chatting (IRC, ...)
• Video conferencing (CU-SeeMe, ...)
• Selling, shopping, buying, ...
• ...
89
Data communication: modems
• description: MODulator-DEModulator: device to convert digital data signals into a suitable form for transmission along a telecommunications channel, and to convert them back upon receipt into machine-readable form.
• types
»(Acoustic coupler)
»Free-standing box
»Board/card to plug into a microcomputer
90
Computer network protocols: definition
• When 2 computer systems communicate via a network, they do that by exchanging messages.
• The structure of network messages varies from network to network.
• Thus the message structure in a particular network is agreed upon a priori and is described in a set of rules, each defined in a protocol.
91
Computer networks, data communication and Internet
Local Area Networks
92
Data communication with a server in a Local Area Network
• (Terminal)
• Microcomputer with serial line communications software /terminal emulation software
• Microcomputer with network card and network software
Network server
93
LAN software packages for heterogeneous networks: examples
Based on TCP/IP (protocol suite used in Internet)
• For DOS: NCSA (= National Center for Supercomputing Applications) CUTCP, PC/NFS, ...
• For Windows 3.x: PC/NFS, PC/TCP, Trumpet TCP Manager, ...
• For Windows 95, 98, ...: included!
• For Windows NT, 2000, ...: included!
Examples
94
Computer networks, data communication and Internet
National Wide Area Networks
95
National Wide Area Networks
• Public access national packet switching networks
• Research computer networks
• Public access made available by Internet Service Providers
• ...
96
National research computer networks: examples
• Belgium: BELNET
• Finland: FUNET
• Germany: DFN
• The Netherlands: SURFnet
• United Kingdom: JANET (Joint Academic Network)
• ...
Examples
97
Computer networks, data communication and Internet
International computer networks
98
International computer networks: examples
• National public data communication networks linked together
• FidoNet
• Bitnet / EARN
• Usenet
• Internet!
• ...
Examples
99
Computer networks, data communication and Internet
The Internet data communication network
100
?? Question ??
What is the Internet?
101
The Internet data communications network (Part 1)
• “Internet” is not well-defined.
• A network of smaller networks: the global collection of interconnected local area, regional and wide-area (national backbone) networks which use the TCP/IP suite of data communication protocols.
102
The Internet data communications network (Part 2)
• Links computers of various types.
• Is constantly growing.
• The analogy of a superhighway has been used to describe the emerging system of networked computers.
• The Internet has no owner, and is not managed by one organization.
103
The Internet: access from your Local Area Network
Your microcomputer
Local Area Network (LAN)
One of the national networks
The global Internet
104
Host computers in the Internet: definition
• A host (computer) is a domain name that has a unique IP address record associated with it.
• Could be any computer connected to the Internet by any means.
• For instance: www.vub.ac.be
105
Transmission Control Protocol / Internet Protocol (TCP/IP)
• the main suite of transport protocols used on the Internet for connectivity and transmission of data across heterogeneous systems
• “glue that holds the Internet together”
• an open standard
• available on most Unix systems, VMS and other minicomputer systems, many mainframe and supercomputing systems and some microcomputer and PC systems
106
Internet: addresses of computers with the Domain Name System
• Internet style = Domain Name System
• The Internet naming scheme consists of a hierarchical sequence of names from the most specific to the most general (left to right), separated by dots:
computer.subdomain.domain.(country if not USA)
OR a numeric address n1.n2.n3.n4, where each n is an 8-bit natural number (0 to 255)
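Both address forms can be taken apart with a few lines of Python. The sketch uses the standard `ipaddress` module to check the numeric form; the function names are hypothetical, and `192.0.2.10` is an address from a range reserved for documentation, not the address of a real host.

```python
import ipaddress


def split_domain_name(name):
    """Split a domain name into its parts, most specific part first."""
    return name.split(".")


def octets(address):
    """Return the four 8-bit numbers of a numeric address n1.n2.n3.n4.

    Raises ValueError if a part lies outside the range 0 to 255.
    """
    return list(ipaddress.IPv4Address(address).packed)


print(split_domain_name("www.vub.ac.be"))  # computer, subdomains, country
print(octets("192.0.2.10"))                # four numbers from 0 to 255
```

Because each of the four numbers takes 8 bits, a complete numeric address of this kind takes 32 bits.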
107
Internet: growth in number of hosts worldwide: linear plot
[Chart: number of Internet hosts worldwide (linear scale, 0 to 20 000 000) in January of each year, 1993 to 1998]
108
Internet Service Provider = ISP
Internet Service Providers provide their clients with access to the Internet + in many cases
»an email address / server
»space for a web site
»software tools to start
»training
»technical support
»an accessible location for a WWW site of the client
»assistance with WWW site design and promotion
109
Microcomputer -- external computer: some ways of data communication
[Scheme: a microcomputer (internal side) reaches an external computer (external side) through a modem and telephone over the voice telecommunication network, ISDN, a leased, fixed communication line, a LAN, a local PAD or TelePAD on a public data communication network, a gateway computer system, or a private/academic data communication network (e.g. Internet)]
110
Online communication: remote login and file transfer
Remote terminal log-in / access
111
Remote terminal log-in / access: definition
The ability to access a computer from outside a building in which it is housed.
This requires communications hardware, software, and actual physical links, although this can be as simple as common carrier (telephone) lines or as complex as telnet login to another computer across the Internet.
112
Online communication: remote login and file transfer
Telnet in the Internet
113
Telnet: description
• The Internet standard protocol for remote terminal connection service; on top of the TCP/IP protocol suite
• Allows a user at one site to interact with a remote timesharing system at another site as if the user's terminal was connected directly to the remote computer
• Includes VT100 terminal emulation
114
Online communication: remote login and file transfer
Downloading and file transfer
115
Data communication: downloading by copying a fragment
Capturing a small fragment of the information displayed: 1. select information on the display, 2. copy, and 3. paste in a document managed by another program.
116
Online communication: remote login and file transfer
File transfer: ftp in the Internet
117
Data communication: file transfer
• Copying + downloading / transfer of a whole file
• Requires a transfer protocol with error correction
118
World-Wide Web = WWW
Introduction
119
The World-Wide Web: prerequisites
Before using the WWW you should ideally already have learned to understand and to use
• computer hardware
• computer software
• the Internet
• older methods for online communication, such as telnet
120
The WWW: example of a welcome page
Example
121
URL = Uniform Resource Locator (also expanded as Universal Resource Locator)
• = draft standard for specifying an object on the Internet
• the structure is in most cases protocol://computer_address[/path_name/file_name]
• examples:
»telnet://biblio.vub.ac.be
»ftp://ftp.vub.ac.be/
»gopher://gopher.vub.ac.be/
»http://www.vub.ac.be/BIBLIO/index.html
»news://news.server.edu/comp.infosystems.www
122
URL: format / structure
1. The first part of a URL, before the colon “:”, specifies the access method = protocol
2. The second part of the URL, after the colon “:”, is interpreted specific to the access method. In general, two slashes after the colon indicate a machine / computer name.
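The two parts of a URL can be inspected with `urlsplit` from Python's standard `urllib.parse` module. The helper `describe_url` and its dictionary keys are hypothetical names chosen to match the wording of the slides; the URLs come from the examples listed earlier.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Split a URL into access method (protocol), computer name and path."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,    # the part before the colon
        "computer": parts.hostname,  # the name after the two slashes
        "path": parts.path,          # path name and file name, if any
    }


print(describe_url("http://www.vub.ac.be/BIBLIO/index.html"))
print(describe_url("telnet://biblio.vub.ac.be"))
```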
123
?? Question ??
What is the difference between Internet and the World-Wide Web?
124
The WWW is an application of Internet
• The World-Wide Web (WWW) is a service, an application of Internet.
• It is based on the Internet infrastructure. • So the WWW is newer than the Internet.
The concept of the WWW was created at the end of the 1980s when the Internet was already well established.
125
The WWW is an application of Internet: scheme
Data communication
Internet
WWW
126
The WWW: the essential elements
• Information delivery and access using hypertext/hypermedia documents/objects
»html documents
»http protocol: http clients and http servers
• Integration of protocols in the Internet:
»http servers offering html documents including links to other http servers, telnet servers, ftp servers, nntp servers, gopher servers...
127
The WWW: hyperlinks
Hyperlinks can link a part of a hypermedia document to
• another part of the same document file
• another document file on the same server computer
• another document file on a server computer located elsewhere in the world
[Scheme: links within and between Computer 1 and Computer 2]
128
The WWW: hypertext mark-up language = HTML
• Hypertext mark-up language = HTML = the system of codes used by authors to build the hypertext-pages/files in WWW, for instance to create a title or an anchor.
• The codes are invisible / transparent for the user / reader.
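The HTML codes for a title and an anchor can be made visible with Python's standard `html.parser` module. The page text in this sketch is invented for the illustration (only the linked URL comes from the earlier examples), and the class name `TitleAndLinks` is hypothetical.

```python
from html.parser import HTMLParser


class TitleAndLinks(HTMLParser):
    """Collect the page title and the targets of all anchors (links)."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


# Invented page; a browser would show only the text, not the codes.
page = """<html><head><title>Example page</title></head>
<body><p>See the <a href="http://www.vub.ac.be/BIBLIO/index.html">library</a>.</p>
</body></html>"""

parser = TitleAndLinks()
parser.feed(page)
print(parser.title)
print(parser.links)
```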
129
The WWW: hypertext transfer protocol = HTTP
• Hypertext transfer protocol = HTTP = the software conventions used by client and server programs for WWW to request and transfer hypermedia documents.
• The protocol does not need to be known by the user / reader = the protocol is invisible / transparent for the user.
• Analogous with the telnet, ftp and gopher protocol.
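What client and server programs exchange under HTTP can be sketched without any network connection. The Python example below composes a simple HTTP/1.0 GET request as text and reads the first line of a response; the response shown is invented for the illustration, and the function names are hypothetical.

```python
def build_get_request(host, path="/"):
    """Compose the text that a client sends to request one document."""
    lines = [f"GET {path} HTTP/1.0", f"Host: {host}", "", ""]
    return "\r\n".join(lines)  # lines end with CR LF; an empty line closes the request


def parse_status_line(response):
    """Return (version, status code, reason) from the first response line."""
    status_line = response.split("\r\n", 1)[0]
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason


request = build_get_request("www.vub.ac.be", "/BIBLIO/index.html")
print(request)

# Invented server answer, only to show the structure of a response.
response = "HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<html></html>"
print(parse_status_line(response))
```

A browser performs this exchange on behalf of the user, which is why the protocol stays invisible during normal use.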
130
?? Question ??
Briefly compare TCP/IP and HTTP.
131
The WWW: pages and forms
• Pages: many documents developed for WWW are kept small and are named “pages”. These often refer to several other “pages”.
• Forms = gateways to services and databases on server computers in WWW. Some pages contain electronic forms, to be filled in by the user.
132
The WWW: applications
Analogous to gopher applications:
• Access to online public access catalogues
• Campus-wide information systems
• Access to subject-oriented information
• Access to computer file archives
• Traveling / navigating through the Internet via linked html-pages
• Access to intranets within institutes / companies
133
World-Wide Web = WWW
WWW client programs
134
WWW: client / browse programs
• To access the WWW, you run a browser program.
• The browser reads documents, and can fetch documents from other sources. Information providers set up hypermedia servers from which browsers can get documents.
• The browser can display hypertext documents. Hypertext is text with pointers to other text. The browsers let you deal with the pointers in a transparent way: select the pointer, and you are presented with the text that is pointed to.
135
WWW: examples of browsers for your own computer
Browsers are available for many computer platforms; in particular: browsers for Windows + Winsock:
»Netscape
»Microsoft Internet Explorer
»...
136
?? Question ??
Which client program do YOU use or will YOU use
to access the WWW?
137
!! Task - Assignment - Exercise !!
Browse the WWW, using an available
browser client program.
138
!! Task - Assignment - Exercise !!
Visualise the HTML source code of a WWW page, using a WWW client program.
What do you learn from this exercise about the basic properties of HTML?
139
!! Task - Assignment - Exercise !!
Exploit the possibility to open more than one window, using a WWW client program
in Windows.
140
?? Question ??
Why would you want to open more than one window on WWW servers, using a WWW client program?
141
World-Wide Web = WWW
Saving information from a web
142
WWW: How to save information from a web?
Information displayed by your web browser/client program can be saved,
• by select, copy, paste in another document (and save)
• by saving a complete page to your disk
»in separate files (for instance 1 HTML file + some image files)
»in 1 file, using Microsoft Internet Explorer 5 or a later version
• by copying the information into an e-mail message that you send to your own e-mail account
143
!! Task - Assignment - Exercise !!
Copy some text fragment from WWW and paste it into another document on your computer.
144
!! Task - Assignment - Exercise !!
Save a text from WWW to disk, as HTML,
using a browser program.
145
!! Task - Assignment - Exercise !!
Display an HTML file that you have saved from the WWW to your disk, in a program for word processing.
Is the file displayed properly?
146
World-Wide Web = WWW
The success of WWW
147
WWW: growing number of WWW servers
[Chart: number of WWW servers (0 to 7 000 000) versus year (1993 to 2000)]
148
WWW as popular method to access information from computers
• The WWW has quickly become the most popular medium to access information that resides on various computers that are connected to a computer network.
149
Online access information sources and services
Introduction
150
Online information sources: summary
• The following gives a general overview of online accessible information sources.
• This overview is not limited to or focusing on a particular concrete subject domain/area.
151
Online access to information: avoid network traffic jams
To access online information sources in the US from Europe, work when the lines are not saturated (better in the morning than in the afternoon).
152
Internet based information sources: problems / difficulties (Part 1)
• Redundancy and overlap: On the one hand, there is too much information on some topics; in other words, the redundancy and overlap are high in many cases.
• Too few information sources: On the other hand, there are too few information sources on some topics.
153
Internet based information sources: problems / difficulties (Part 2)
• No order is imposed on most sources. Quality checks / quality controls are not performed. Related to this: it is not required to register new information that is offered. Is the information that you find real, honest, authentic?
154
Internet based information sources: problems / difficulties (Part 3)
• Change is the only constant: Information sources are constantly changing, growing, but sometimes disappearing.
155
Internet based information sources: problems / difficulties (Part 4)
• Scattering: There is no single simple but powerful system to find relevant information through the Internet. In other words: integration / aggregation is still far from perfect.
156
Internet based information sources: problems / difficulties (Part 5)
• Slow: The Internet is in many places and for many applications not yet fast enough.
157
Internet based information sources: problems / difficulties (Part 6)
• In conclusion: Surfing, using the Internet, the WWW, can be a time sink instead of a productive activity.
158
Internet based information sources: how many? how much information?
• More than 10 million WWW sites (in 2003)
• More than 2000 million (= 2 billion) unique URLs in the total Internet (in 2002)
• More than 10 terabytes (= 10 000 gigabytes) of text data (in 2001)
159
Online access information sources and services
Types of online access information systems
160
Types of online access information systems: “free” versus “fee”
Public access information sources free of charge
Fee-based online information services (NOT free of charge)
161
Online access information sources and services
Dictionaries and encyclopaedias accessible through the WWW
162
Dictionaries and encyclopedias through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among many types of information sources,
»when we do not need detailed information on a common topic
»when we want to prepare a more detailed search on an unfamiliar topic, by searching for the right spelling, synonyms, context,…
• Some dictionaries and encyclopedias are available through the WWW free of charge.
163
Dictionaries accessible through Internet and the WWW: example
• The American Heritage® Dictionary of the English Language
»Over 200,000 entries, 70,000 audio word pronunciations, 900 full-page color illustrations
»Available free of charge from http://education.yahoo.com/reference/dictionary/
Example
164
Dictionaries accessible through Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched simultaneously and free of charge: http://www.onelook.com/
Example
165
Encyclopedias accessible through Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages
Example
166
Encyclopedias accessible through Internet and the WWW: examples
• Encyclopædia Britannica: only a small part is available free of charge + links to selected WWW sites
»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/
Example
167
Encyclopedias accessible through Internet and the WWW: examples
• The Canadian Encyclopedia (in English and in French):
»http://thecanadianencyclopedia.com/
Example
168
Encyclopedias accessible through Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet: http://www.internetoracle.com/encyclop.htm
• Other lists of encyclopedias on the Internet can be found as a part of more general directories of Internet-based information sources.
Example
169
Online access information sources and services
Internet directories and indexes
170
Internet: meta-information about Internet information sources
• in printed manuals and guides:
- it is not always possible to get a copy fast
- it costs money to get a copy
- they are soon out of date
• offered on the WWW!:
+ directly available when we want to use the Internet
+ many systems are accessible free of charge
+ most systems are regularly updated
• (“intelligent agent” software on client PC)
171
Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»subject hypertext directories = subject guides
»key word indexes, generated automatically, for searching
»collections of links or forms to the above
»(multi-threaded search systems)
172
Internet global subject directories: introduction
• They are virtual libraries with open shelves, for browsing.
• They are generated manually, by many people.
• They can be browsed following a tree structure or a more complicated variation.
• The most famous of these systems are among the most popular and most visited sites on the WWW: e.g. Yahoo!
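The tree structure of such a directory can be pictured as nested categories. The following is a minimal sketch in Python; the category names are invented for illustration and do not reproduce the classification of Yahoo! or any other real directory.

```python
# Hypothetical miniature directory: categories are dicts, leaves are lists of site URLs.
DIRECTORY = {
    "Reference": {
        "Dictionaries": ["http://www.onelook.com/"],
        "Encyclopedias": ["http://www.britannica.com/"],
    },
    "Science": {
        "Journals": ["http://www.scirus.com"],
    },
}


def browse(tree, path):
    """Follow a list of category names down the tree and return what is found there."""
    node = tree
    for category in path:
        node = node[category]
    return node
```

Browsing then means choosing one category after another, for instance browse(DIRECTORY, ["Reference", "Dictionaries"]), without having to formulate a query in words.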
173
Internet global subject directories: structure
The structure corresponds to a classification that is in most cases specific for the particular overview. In other words: the well-known and classical universal classification systems are not used in most Internet directories.
174
Internet global subject directories: pros and cons
• They cover a small number of selected WWW sites, in comparison with the total number of sites that are accessible.
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches with a high coverage.
175
Internet global subject directories: why use one?
• They are suitable mainly for broad searches that can be difficult to formulate in words, but NOT for more specific searches that require combinations of several concepts.
176
Internet global subject directories: searching directories with a query
• Many of the Internet directories include an index to search their contents with a query.
• However, then the assisting classification structure is not well exploited and the user should be aware of the problems and difficulties of information retrieval with natural language queries.
• Furthermore, the possibility to use the system in this way may be confusing, as these directories are not real full-text Internet indexes, like those provided by other search tools.
177
Internet global subject directories: Yahoo!
• A hypertext global subject directory can be found at http://www.yahoo.com/ and at many other sites, including http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.
178
Internet global subject directories: Google directory
• A hypertext global subject directory can be found at http://directory.google.com/
• Accessible free of charge.
• Based on the Netscape DMOZ Open Directory Project.
• Do not confuse this with the famous Google WWW search engine.
179
Internet global subject directories: Open Directory Project
• A hypertext global subject directory can be found at http://www.dmoz.org/
• The contents are also used in other systems, such as Google Directory and Webbrain.
• Accessible free of charge.
180
!! Task - Assignment - Exercise !!
Try to find Internet sources which are relevant for you, by using an Internet-based global subject directory.
181
Internet local subject directories: examples in Belgium
• http://yellow.advalvas.be/weblist.html
• http://search.msn.be/
• The guide developed by the public libraries in Flanders: http://www.bib.vlaanderen.be/webwijzer
182
Internet indexes: automated search tools
• Several systems allow you to search for and locate many items (addressable resources) on the Internet in a more systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers through the real Internet in real time and completely when a user makes a query. Searching in that way would be much too slow due to limitations in the technology.
183
Internet indexes: scheme of the mechanism
[Diagram with the following components:]
• User searching for Internet based information
• Internet client hardware and software
• User interface to a search engine
• Internet information source
• Internet index search engine
• Internet crawler and indexing system
• Database of Internet files, including an index
184
Internet indexes: description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved by searching with queries through a big index that is built automatically on the basis of the contents, the texts, of these pages (to build this database and to keep it up to date, pages are continuously collected from the Internet by a “robot” computer software system)
• a search system with a user interface in a WWW form, to allow the user to search through that database
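The two parts described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the implementation of any real search engine: the pages and their addresses are invented, the "robot" is replaced by a ready-made collection of texts, and a query simply requires that all its words occur in a page.

```python
import re
from collections import defaultdict

# Hypothetical pages that a "robot" might have collected: URL -> text of the page.
PAGES = {
    "http://a.example/": "Thesaurus and dictionary of library terms",
    "http://b.example/": "Online dictionary with audio pronunciations",
    "http://c.example/": "Library catalogue for books",
}


def build_index(pages):
    """Map each lower-cased word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query (Boolean AND)."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

The point of the design is that a query is answered from the prepared index and never by visiting the pages themselves at search time.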
185
Internet indexes: AltaVista
• The primary search interface can be found in the US. The following addresses all lead to the same information:
»http://www.altavista.com/
»http://www.av.com/
»http://av.com/
• Mirror site in UK:
»http://uk.altavista.com/
»http://www.altavista.co.uk/
186
Internet indexes: AltaVista: features
• Allows full text searching of the WWW
• Offers relevance ranking of search results
• Also allows advanced Boolean searching (in “Advanced” mode)
• Offers a link to an Internet subject directory (Looksmart)
• Offers links to systems to find images, sounds… (multimedia) in the Internet
187
Internet indexes: All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/
• You can search the WWW and ftp servers.
• The database is one of the biggest.
• Not only HTML and plain text files, but also the full text of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.
188
Internet indexes: Google (Part 1)
• http://www.google.com/
• Full-text searching is possible of many files that are available through the WWW.
• Not only HTML and plain text pages are covered, but also the first part of many files in other file formats is indexed, such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…
189
Internet indexes: Google (Part 2)
• One of the most popular systems in 2001, 2002, 2003…
• For retrieval an algorithm is used that takes into account the links between WWW pages. A retrieved page is ranked higher when
»many sites/pages point to it
»“important” sites/pages point to it
• Some other famous search systems are based on Google, such as Netscape Search and the WWW searches of Yahoo! (at least in 2003).
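The idea of ranking by links can be illustrated with a simplified, PageRank-style iteration. This is only a sketch of the principle, not Google's actual algorithm; the damping value 0.85, the number of iterations and the four-page example graph are assumptions chosen for illustration.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages from their incoming links (simplified PageRank-style iteration).

    `links` maps each page to the list of pages it points to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on in equal shares to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly over all pages.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical mini-web: A, B and D all point to C; C points to A.
LINKS = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
RANKS = link_rank(LINKS)
```

In this example C scores highest because many pages point to it, and A comes second although only one page points to it, because that page is the "important" page C.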
190
Internet indexes: Google computer servers
• Google uses a system of more than 10 000 small computer servers to offer its information services.
191
Internet indexes: Google additional features
• Besides a system to search for WWW pages, Google also offers
»a subject directory
»searching for images/pictures on the WWW
»searching an archive of Usenet messages + posting to Usenet groups
»searching for news
• Thus Google has become a great integrator / aggregator.
192
Internet indexes: coverage
• Internet indexes do not cover all static documents on the WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one Internet index search system should be used.
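Using more than one index means that the result lists have to be combined and that duplicates have to be removed. A minimal sketch in Python, assuming that each index simply returns an ordered list of URLs (the function name and the sample addresses are hypothetical):

```python
def merge_results(*result_lists):
    """Combine URL lists from several indexes, keeping each URL once.

    URLs keep the order in which they are first seen.
    """
    seen = set()
    merged = []
    for results in result_lists:
        for url in results:
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


# Hypothetical result lists from two different indexes for the same query.
INDEX_ONE = ["http://a.example/", "http://b.example/"]
INDEX_TWO = ["http://b.example/", "http://c.example/"]
MERGED = merge_results(INDEX_ONE, INDEX_TWO)
```

The merged list contains three addresses, one more than either index delivers on its own, which is why a second index can improve coverage.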
193
Internet indexes: coverage and size of each index
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
» Google !
» AltaVista
» All the Web (serving also Lycos)
» Systems based on the INKTOMI database of WWW pages.
194
!! Task - Assignment - Exercise !!
Try to find Internet sources which are relevant for you, by using an Internet index.
195
Internet information sources
Coverage of Internet directories and Internet indexes
[Diagram: coverage of a global Internet index compared with the coverage of a global Internet directory]
196
Global Internet search tools: a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches
Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches
Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information
197
Internet: who owns the search tools?
In 2003:
• The company Yahoo! owns
»the most famous global Internet subject directory
»3 (!) Internet full-text search engines: All the Web, AltaVista, Inktomi
• The company Google owns
»the most famous Internet full-text search engine
»one of the best Internet image search engines
»a gateway to old and new Usenet news messages
198
Online access information sources and services
Public access book databases
199
Public access book databases: introduction
• Even in this age of Internet-based information sources, a lot of information is still distributed in the form of printed books.
• The contents of most books are (still) not available on the Internet.
• Most general Internet search tools do NOT allow you to find out about the existence of books that may be interesting for you.
• So, specific search tools to find books can be useful.
200
Public access book databases: an overview
• (Databases by publishers.)
• Fee-based databases by commercial providers
• Databases by book distributors / bookshops!
• Online public access catalogues of
»local libraries,
»national libraries (which normally produce and offer their national bibliography)!
»big, famous libraries!!
• (Databases of computer-based versions of books.)
201
Public access book databases: which one to use?
• For years, the market of bibliographic information on books was limited to the services and databases of subscription-based bibliographic providers.
• Nowadays, the WWW provides a key to unlock many possibilities to find bibliographic information.
• Which book database should be preferred for particular applications is not clear for most librarians or end-users.
202
Public access book databases by commercial producers
• To find currently available books, some databases assembled by commercial producers can be interesting.
• Example: Global Books in Print
• These databases offer formal descriptions of books, prices of the books, short descriptions of the contents with subject terms…
• However, access to such a database is not free of charge and can be expensive (in comparison with alternatives).
203
Public access book databases provided by bookshops
• To find currently available books, the bibliographic databases assembled by big bookshops are interesting.
• Several offer a good coverage and are accessible free of charge.
• The added price information can be useful for the acquisition and accounting department of a library or if an individual user wants to buy a book.
• Some provide a current awareness service, also free of charge.
204
Book databases accessible free of charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
• Barnes and Noble (US):
http://www.bn.com/
Examples
205
Free public access bibliographic book database + price comparisons
• Even comparisons of the catalogues of shops selling books (as well as music, movies and many other goods) are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/
206
!! Task - Assignment - Exercise !!
Search for titles of books which are relevant for you,
using an online database provided by a book publisher or bookshop.
207
Online Public Access Catalogues of libraries
• Mainly to find older books, the catalogues of libraries can be useful.
• Most are accessible online and free of charge.
208
Online access information sources and services
Fee-based online public access information services
209
Types of online access information systems: “free” versus “fee”
• A lot of the information on the Internet is available free of charge, but another part is only accessible when a fee is paid to the producer and / or the distributor.
• The first commercial computer systems that make information available online were born around 1975. Most of them are now also available through the Internet.
• Some organisations pay these fees for some sources and then organise access, so that the members of the organisation can retrieve and exploit the information as if it is free of charge.
210
Types of online access information systems: “free” versus “fee”
Public access information sources free of charge
Fee-based online information services (NOT free of charge)
211
Types of online access information systems: “free” for members only
Public access information sources free of charge
Fee-based online information services (NOT free of charge)
Fee-based online information services, made accessible “free of charge”
by an institute to its members
212
Online information services: total size of their databases
In 1999: The big host systems and the public access WWW pages offered a comparable quantity of information:
• WWW offered about 8 terabytes (= 8 000 gigabytes) of text data (according to Lawrence and Lee Giles, Nature, 1999, Vol. 400, pp. 107-109.)
• Dialog offered about 9 terabytes (= 9 000 gigabytes) (in 1998)
»6 billion pages of text
»3 million images
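The figures above can be checked with simple arithmetic. The sketch below uses the decimal convention of the slide (1 terabyte = 1 000 gigabytes) and, as a rough assumption, treats the whole Dialog volume as text pages, ignoring the images.

```python
TERABYTE = 10**12  # decimal convention used on the slide: 1 TB = 1 000 GB
GIGABYTE = 10**9

www_bytes = 8 * TERABYTE      # text data on the public WWW
dialog_bytes = 9 * TERABYTE   # data offered by Dialog
dialog_pages = 6 * 10**9      # 6 billion pages of text

www_gigabytes = www_bytes // GIGABYTE           # size of the WWW text data in gigabytes
bytes_per_page = dialog_bytes // dialog_pages   # rough average size of one Dialog page
```

Under these assumptions an average page corresponds to about 1 500 characters of plain text.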
213
Online access information sources and services
Online access databases about journal articles
214
Online access databases about journal articles: overview
• Thousands of fee-based online access databases offer bibliographies or full-texts of journal articles in particular subject domains and published by many publishers.
• Many publishers offer searchable bibliographies, but only of their own publications. (for instance Emerald, Elsevier)
• Only few large databases offer access to bibliographies of articles published in journals from many publishers, free of charge.
215
Online access databases about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic database, NOT full-text, (Journal articles, journal issues, books, reports, conferences, doctoral dissertations) at the Institut de l'Information Scientifique et Technique, France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.
216
Online access databases about journal articles: Ingenta (1)
• Ingenta Journals allows you to search a bibliographic database of millions of journal articles, including titles, authors, in many cases abstracts.
• Searching is free of charge.
217
Online access databases about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from
»http://www.ingenta.co.uk/
»http://www.ingenta.com/
• Ingenta acquired Uncover in 2000.
218
Online access databases about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a bibliographic database of the articles of more than 20 000 journal titles and conference proceedings, NOT full-text.
• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of charge: the tables of contents of new issues of the journals that you have selected are sent to you by email.
219
Online access databases about journal articles: Scirus
• This is a specialised Internet index that allows you to search for selected scientific information (only) on the WWW. This includes the peer-reviewed articles in the journals that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only when a fee has been paid to the publisher.
• The search interface: http://www.scirus.com
Example
220
Online access databases about journal articles: Scirus features
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system that is also used by Alltheweb.
• Offers access to information ordered according to some classification system / taxonomy.
Example
221
Online access information sources and services
Finding multimedia files on the Internet
222
Finding multimedia files on the Internet: introduction
Several public access search systems are available free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)
»sound / audio files (music, speeches...); video
223
Finding images on the Internet: introduction
• Several public access search systems are available free of charge to search for images / pictures (artwork, photos, or both) on the Internet.
• When searching for images, the search results from such a system offer not only links to the image files on the Internet, but also directly small versions of the images (so-called “thumbnails”).
224 Examples
Finding images on the Internet: screen shot of a Google image search
225 Examples
Finding images on the Internet: examples of search engines (1)
• http://alltheweb.com/ !!
• http://gallery.yahoo.com/ !
• http://images.google.com/ !!! or through http://www.google.com/
The largest database in this category (at least in 2002, 2003). For each result, not only a thumbnail is offered, but also directly the origin with the readable URL; this makes it easier to guess the relevance of the document.
226 Examples
Finding images on the Internet: examples of search engines (2)
• http://multimedia.lycos.com/
• http://www.altavista.com/ !! (also audio and video; choose not the normal text search, but IMAGES in the user interface.)
227
!! Task - Assignment - Exercise !!
Use a specialised search engine to find images
about a particular subject on the Internet.
228
Online access information sources and services
Evolution and future trends
229
Online access information: evolution and future trends
• An increasing amount of information becomes available online.
• A growing amount of this online information becomes available free of charge.
• The quality and ease of use of software on server as well as client is growing.
A consequence is:• An increasing number of end-users searching for
information online.
230
Online access information: easier and more complicated?!
• At the same time, information retrieval becomes both easier and also more complicated. This may seem strange and contradictory, but it is reality.
This is a paradox.
231
Online access information: easier information retrieval systems
• Individual information retrieval systems become easier:
»they react faster;
»they can provide access to more data/information in one action;
»their user interfaces are simple, but more sophisticated, intelligent retrieval algorithms can nevertheless deliver satisfactory results in most simple cases.
232
Online access information: more complicated information market
• The whole information landscape consists of more and more decentralised information sources, each one bringing an individual user interface that should be mastered. Making the right, ideal choice among the sources does not become easier, and is perhaps even more complicated every day.
233
Online access information: more complicated information market
• Furthermore, for many sources the accessibility / availability, the user interface, the interlinking, depend on the organisation in which the searcher is active.
234
Online access information: conclusion
• In the case of simple information needs, the WWW and the search tools can work like “magic”.
• However, in the case of more complicated information needs, there is still no “magic button” that brings you immediately to all the required information.
235
Evaluating the quality of information
Documentary information sources: evaluating their quality
236
Documentary information sources: evaluating their quality
• We should always be critical when using information sources, in view of
»the widely varying degrees of quality of information sources, and of
»the costs associated with searching, finding, using information.
237
Documentary information sources: evaluation criteria (1)
• Is the information valid, reliable, trustworthy, genuine, authentic? Is the author honest? Is the source objective, not subjective, without cultural or political or ideological or commercial bias? Is the origin an individual or a company or an organisation? Is the publication sponsored by some company or organisation?
238
Documentary information sources: evaluation criteria (2)
• Is the information accurate, correct? Who is the author or producer? Does the source have an author or a producer with high expertise, a good reputation, good qualifications? Can the author be contacted for clarification or discussion?
Was the information reviewed, edited, improved, corrected, censored, approved, verified, before publication? Do experts agree on the information provided?
239
Documentary information sources: evaluation criteria (3)
• Is the information source unique? Does it offer a great amount of primary information, which is not obtainable from other sources?
• Is the information complete? Is the work available in its entirety?
• Does the source offer a wide coverage? Is the source comprehensive, substantive?
• Is the information current enough, up to date? Is a publication date provided? Is an expiration date provided?
240
Documentary information sources: evaluation criteria (4)
• Does the document provide suitable references, so that you can verify statements and find older suitable information sources?
• Good clear format and layout of the information / User-friendly information system / Easy for users to orientate themselves within the resource and to find their way around it?
• Good user support / Good customer support?
• Is the type of distribution medium appropriate? (print, e-mail, online,...)
241
Documentary information sources: evaluation criteria (5)
• Is the information what you want? If not, then reassess your needs and consider other types of information as well.
242
Documentary information sources: evaluation criteria (6)
• Is the information suitable for your level of understanding of the subject? Is the document popular, suitable for the general public, for students, for professionals, for scholarly/academic use…? Does it report new, primary research (survey, experiment, observation, measurement, invention) or is it a review of sources published earlier?
• Does the information repeat or confirm what you already know, or is it complementary, contradictory, new?