Internet and WWW for finding information

Post on 22-Feb-2016


Internet and WWW for finding information. Paul.Nieuwenhuysen@vub.ac.be, Vrije Universiteit Brussel, Pleinlaan 2, B-1050 Brussel. February 2004, VUB-IDLO. The slides are available from http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/ (note: BIBLIO and not biblio).

Transcript of Internet and WWW for finding information

1

Internet and WWW for finding information

Paul.Nieuwenhuysen@vub.ac.be
Vrije Universiteit Brussel, Pleinlaan 2, B-1050 Brussel.

February 2004
VUB-IDLO

2

The slides are available from http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/

(note: BIBLIO and not biblio)

3

Schedule for the day: morning

• About “information”
• The information market
• Information retrieval
• Thesauri (+ practising query formulation)
• Networks, and the Internet in particular
• World-Wide Web (+ practising “browsing” + “saving”)

• LUNCH

4

Schedule for the day: afternoon (part 1)

• Online accessible information sources!
  » Global Internet directories (+ practice)
  » Internet indexes (+ practice)
  » Book databases (+ practice)
  » Fee-based databases
  » Databases with titles of journal articles
  » Finding illustrations / images / photos (+ practice)

5

Schedule for the day: afternoon (part 2)

• Evaluation of information sources

• Free searching according to your own interests, with assistance

6

Interruptions, questions, remarks and discussions are welcome.

7

About “information”

Information concepts

8

The flow of documentary information with primary and secondary sources

[Diagram with the following elements:]

• Author / Creator / Sender
• Primary sources / systems: mainly journal articles / books / electronic mail / online sources / ...
• Secondary sources / systems: mainly reference works (printed, CD-ROM, online), library catalogues, including OPACs, ...
• Reader / User / Receiver

9

The role of secondary information sources

• The secondary information flow is generated on the basis of the primary flow, mainly because the great amounts of primary information lower the chance to retrieve and use the appropriate information item.

• Secondary information tries to bring some order in the great chaos.

10

Various categorisations of documentary information sources

Information sources can be categorised in various ways. For instance:

•Primary

•Secondary

•Hard copy /not digital

•Digital

•Offline

•Online

•Text

•Image

•Sound

•Animation/video

•Software

•Data

•Interactive

•Books

•Serials

11

Retrospective searching versus current awareness: scheme

[Diagram: a time axis (Past, Now, Future) on which retrospective searching and current awareness are shown]

12

Information retrieval: evolution of storage and distribution media

• 1450: printing with reusable characters / fonts
• 1975: + online access databases
• From the 1970s: growing Internet
• 1985: + CD-ROM
• 1990: + World-Wide Web (based on the Internet)

13

Information retrieval: end user or information intermediaries

End-user

Information intermediary (broker or library or ...)

Information

14

End user versus information intermediary

• People can retrieve information themselves, directly as so-called “end-users”.

• However,
  » the information landscape is complex,
  » it may take a lot of time to find the right information,
  » it may be costly to search for information.

• Therefore it may be wise to obtain the assistance of an expert information intermediary, such as a reference librarian or an information broker.

15

About “information”

Computer- and network-based information

16

Information: from bits to meaningful information

Digital computer data = bits (0 1)

Program code, meaningful for and to be interpreted / executed by a suitable / compatible computer

or

Information = “documents”, meaningful for and to be interpreted by human beings

17

Information: digitally stored and managed information

Categories of digital, computer-readable information / data, forming electronic “documents”, understandable by human beings:

• text
• numbers
• images
• video
• sounds
• + multimedia

18

Information: types of digital information

Digital information (0 1):

• Linear text
• Hypertext
• Static images
• Video
• Sound
• Multimedia / Hypermedia
• Programs for computers

19

Some publication media compared

[Chart comparing printed, CD-ROM and online / networked publication media by update speed and volume]

20

Scientific publishing in Utopia: an ideal scheme

Many authors

Many readers / users

Many editors / publishers

Online remote access multimedia database server

Many database search clients and user interfaces

One global, international computer data communication network

author = reader in science

21

?? Question ??

Indicate the differences between reality and that simplified, ideal scheme of the information flow.

22

?? Question ??

Which basic problems / difficulties hinder people from finding / accessing / using information?

23

Information retrieval: basic difficulties (Part 1)

• In many cases it is not completely clear to the user of an information retrieval system which information is in fact needed.

• In many cases the need for information cannot be expressed completely in the form of a query. One of the reasons is that the complete context of the information need should ideally be expressed, including the knowledge and background of the searcher.

24

Information retrieval: basic difficulties (Part 2)

• Computer systems are artificial, but nevertheless most use human language in their interface with human users, for instance in database search systems. This may cause difficulties related to language, and to vocabulary in particular. Some examples:
  » People use different languages and different terms (vocabularies) to describe a similar concept.
  » Concepts, vocabularies and meanings of words and terms may change over time.
  » Meanings of words / terms may depend on their context.

25

Information retrieval: basic difficulties (Part 3)

• Many different and imperfect retrieval systems must be used.
  » To retrieve and access the information that is in principle available, many different retrieval systems must be available and mastered.
  » Furthermore, perfect information retrieval software does not (yet) exist; scientific and technological evolution in the domain of information retrieval software has been fast since about 1970.

26

Information retrieval: basic difficulties (Part 4)

• Information overload
  Users are often overwhelmed by the amount of available information and by the large influx of new information.

27

Information retrieval: basic difficulties (Part 5)

• The price (or inaccessibility) of particular information
  A lot of information cannot be obtained at all, or at least not free of charge.

28

The information industry and the information market

The components of the information industry

29

The components of the information industry

• Authors
• Publishers
• Distributors
• Users
• Related organizations

30

The information industry and the information market

Overview and evolution

31

Increase in the number of scientific and technical serial publications

[Chart: number of serial publications, on a logarithmic scale from 1 to 1,000,000, plotted against the years 1650 to 2000]

32

The information market: growth in the database industry

[Chart: number of living databases, number of database producers and number of vendors, from 1975 to 1995; vertical axis from 0 to 10,000]

Source: Williams, in: Gale Directory of Databases, 1998.

33

The information industry / market: future trends (Part 1)

• Growth in the production of databases.

• Less analogue / hard-copy production = more digital production, storage, and distribution of information.

• More integration of information types into multimedia and hypermedia.

34

The information industry / market: future trends (Part 2)

• Growth in the number of
  » producers and distributors,
  » end-users searching databases,
  due to easier use and lower costs of information technology.

35

Databases and computerized information retrieval

Introduction

36

What is a database?

A database is a collection of similar data records stored in a common file (or collection of files).

37

Types of databases: examples

Examples: the databases that form the basis for
» catalogues of books or other types of documents
» computerized bibliographies
» address directories
» a full-text newspaper, newsletter, magazine or journal, and collections of these
» WWW and Internet search engines
» intranet search engines
» ...

38

Information retrieval: the basic processes in search systems

[Diagram with the following elements:]

• Information problem, Representation, Query
• Text documents, Representation, Indexed documents
• Comparison
• Retrieved, sorted documents
• Evaluation and feedback

39

Databases and computerized information retrieval

Text retrieval and language

40

Text retrieval and language: a word is not a concept (a)

Problem: A word or phrase or term is not the same as a concept or subject or topic.

[Diagram: two words linked to one concept]

41

Text retrieval and language: a word is not a concept (a’)

So, to ‘cover’ a concept in a search, to increase the recall of a search, the user of a retrieval system should consider an expansion of the query; that is: the user should also include other words in the query to ‘cover’ the concept.

42

Text retrieval and language: a word is not a concept (a’’)

» synonyms! (such as Latin names of species besides the common names in biology, scientific names besides common names of substances in chemistry, ...)

43

Text retrieval and language: a word is not a concept (a’’’)

» narrower terms, more specific terms (such as particular brand names), including terms with prefixes (for instance: viruses, retroviruses, rotaviruses, ...)

» spelling variations (such as UK English versus US English); possible variations after transliteration

44

Text retrieval and language: a word is not a concept (a’’’’)

» singular or plural forms of a noun (when this is used as a search term)
» (relevant) related terms
» various forms of a verb (when this is used in the query)
» broader terms (perhaps)
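The query expansion described on these slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the variant lists are hand-made for the example (they do not come from a real thesaurus), and `VARIANTS` and `expand_term` are hypothetical names.

```python
# Hypothetical, hand-made variant lists; a real search would take these
# from a thesaurus or from the searcher's own knowledge of the subject.
VARIANTS = {
    "virus": ["viruses", "retrovirus", "retroviruses", "rotavirus", "rotaviruses"],
    "colour": ["color", "colours", "colors"],
}


def expand_term(term):
    """Return an OR-group covering a term and its known variants."""
    words = [term] + VARIANTS.get(term, [])
    if len(words) == 1:
        # No variants known: the term is used unchanged.
        return term
    return "(" + " OR ".join(words) + ")"


print(expand_term("colour"))
print(expand_term("arrow"))
```

A term without known variants is returned unchanged, so the helper can be applied to every word of a query without special cases.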

?? Question ??

Which problems in text retrieval are illustrated by the following sentences?

45

46

Examples

Time flies like an arrow.
Fruit flies like a banana.


49

Text retrieval and language: ambiguity of meaning (a)

• Problem: A word or phrase can have more than one meaning. Ambiguity of the meaning of a word is a problem for retrieval: it decreases the precision of many searches. The meaning can depend on the context. The meaning may also depend on the region where the term is used.

50

Text retrieval and language: ambiguity of meaning (a’)

• Example of a word:
  » Pascal the philosopher
  » Pascal the computer language

Example

51

Text retrieval and language: ambiguity of meaning (a’’)

• Example of sentences:
  » The banks of New Zealand flooded our mailboxes with free account proposals.
  » The banks of New Zealand flooded with heavy rains account for the economic loss.

Example

52

Text retrieval and language: ambiguity of meaning (a’’’)

Problem: Ambiguity of meaning may be the cause of low precision.

[Diagram: one word linked to two concepts]

53

A word is not a concept. A concept is not a word.

[Diagram: Word1, Word2 and Word3 linked to Concept1, Concept2 and Concept3]

• A concept cannot be “covered” by only one word or term; this may be the cause of low recall of a search.
• The meaning of many words is ambiguous; this may be the cause of low precision of a search.
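The slides use “recall” and “precision” without defining them. The sketch below assumes the usual definitions: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved. The document numbers are invented for the example.

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for one search.

    retrieved: set of document ids returned by the search
    relevant:  set of document ids that are actually relevant
    """
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Invented example: 4 documents retrieved, 2 of them relevant,
# while the collection holds 5 relevant documents in total.
retrieved = {1, 2, 3, 4}
relevant = {3, 4, 5, 6, 7}
p, r = precision_recall(retrieved, relevant)
print(p, r)  # 0.5 0.4
```

In this example an ambiguous search term would show up as a lower first number (irrelevant documents retrieved), and a concept covered by too few words as a lower second number (relevant documents missed).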

54

Databases and computerized information retrieval

Hints on how to use information sources

55

Hints on how to use information sources: overview (Part 1)

• Know the purpose and motivation for each search.
• Do not be lazy: search on your own, before bothering experts with requests for advice.
• Plan your search in advance.
• Choose the best source(s) for each search.
• Use the available tools for subject searching well.
• Try to cope with the language problems: avoid spelling errors in your search query, and use spelling variations in your search query.

56

Hints on how to use information sources: overview (Part 2)

• Match your search strategy with the type of source.
• Work cost-effectively.
• Use special care when searching for names.
• Be specific. Avoid broad searches. Limit your search to a specific country or region if required.
• Work iteratively.
• Keep a record of your work.

57

Hints on how to use information sources: overview (Part 3)

• Do not focus on only a single source.
• Consider citation indexes, besides subject-oriented databases, as useful secondary information sources.
• Stop searching when “enough is enough”.
• Give up if necessary... (Not all questions have an answer.)
• Be critical: not all information is correct or useful.

58

Hints on how to use information sources: overview (Part 4)

• In computer-based retrieval systems, consider
  » truncating search terms (using a symbol like * or ?)
  » combining search terms, using
    - Boolean operators: OR, AND (or +), NOT (or AND NOT, or -)
    - proximity operators (for instance “NEAR”)
    - phrase searching (“word1 word2”)
  » limiting the search to a field (for instance URL, title, ...)

59

Hints on how to use information sources: subject searching

• When you search for information on a particular topic / subject, investigate whether the database producer offers
  » a subject classification scheme, and/or
  » a list of controlled / approved / accepted subject terms, and/or
  » a subject thesaurus.
• Exploit these, if they are available.
• In most cases you should find and use synonyms and narrower terms.
• Use broader and/or related terms, if appropriate.

60

Hints on how to use information sources: Boolean combinations

Most text search systems understand the basic Boolean operators:

OR = obtain records that contain one or both search terms

AND = obtain records that contain both search terms

NOT = exclude records that contain a search term
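The three operators correspond to set operations on the records that match each term. A minimal sketch, assuming an invented three-record collection and a hypothetical helper `records_with`; real search systems work on an index rather than by scanning text:

```python
# Invented mini-collection: record number -> text.
RECORDS = {
    1: "seawater pollution monitoring",
    2: "river pollution",
    3: "seawater temperature",
}


def records_with(term):
    """Return the set of record numbers whose text contains the term as a word."""
    return {n for n, text in RECORDS.items() if term in text.split()}


a = records_with("seawater")   # {1, 3}
b = records_with("pollution")  # {1, 2}

print(a | b)  # seawater OR pollution  -> records 1, 2 and 3
print(a & b)  # seawater AND pollution -> record 1
print(a - b)  # seawater NOT pollution -> record 3
```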

61

Hints on how to use information sources: Boolean combinations

In the case of computer-based information sources, use Boolean combinations of search terms when appropriate and when possible.

(term x1 OR term x2 OR term x3)
AND
(term y1 OR term y2 OR term y3)
AND
(term z1 OR term z2 OR term z3)
AND ...
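This generic pattern (OR within each concept, AND between concepts) can be assembled mechanically. The sketch below assumes that the target system accepts parentheses and capitalised operators, which varies between systems; `build_query` is a hypothetical helper name and the terms are placeholders.

```python
def build_query(facets):
    """Combine facets: OR within each facet, AND between facets."""
    groups = ["(" + " OR ".join(terms) + ")" for terms in facets]
    return " AND ".join(groups)


# Placeholder terms, following the x / y / z naming on the slide.
facets = [
    ["x1", "x2", "x3"],
    ["y1", "y2", "y3"],
    ["z1", "z2", "z3"],
]
print(build_query(facets))
```

Keeping each concept as its own list makes it easy to add a synonym to one concept without touching the others.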

62

Hints on how to use information sources: Boolean queries

Most text search systems understand the basic Boolean operators typed in capital characters:

OR
AND

63

Hints on how to use information sources: default Boolean operator

• Find out if there is a default implicit Boolean operator working in the search system that you use.

• This default operator is applied whenever no operator is typed explicitly between words.

• This can be OR, AND, NEAR...

64

?? Question ??

How many (and which) concepts/facets do you see in a search for

“general reviews about

monitoring seawater pollution that is due to effluents in Tanzania”?

65

!! Task - Assignment !!

Prepare off-line, on paper, a suitable search query in a generic format, to find

“general reviews about

monitoring seawater pollution that is due to effluents” as the basis for later, concrete searches in databases.

(Limit yourself to 1 of the concepts.)

66

?? Question ??

What did you learn from the exercise

on the formulation of a query?

67

Hints on how to use information sources: work iteratively

Work iteratively = search, investigate your results, refine your search, search again, and so on; do not try to find everything in 1 step, with 1 search.

[Diagram: a cycle linking Query, Searching, Results and Feedback]

68

Hints on how to use information sources: work iteratively: example

When you search a database in which subject keywords from a controlled list have been added to each record:
1. Search with search terms that you know.
2. Investigate the results and select good, relevant items.
3. Look for the keywords added to these items.
4. Select the good, relevant keywords.
5. Formulate a new search with these keywords added.
6. Execute the new search.
7. Repeat the procedure.
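The seven steps can be sketched as a loop. Everything here is an assumption made for illustration: the toy database, its keywords, and the helpers `search` and `keywords_of` are invented, and the sketch treats every retrieved record and keyword as relevant, whereas in practice the searcher makes that judgement by hand in steps 2 and 4.

```python
# Invented toy database: each record has a title and controlled keywords.
DATABASE = [
    {"title": "Oil spills at sea", "keywords": {"marine pollution", "oil"}},
    {"title": "Sewage outfalls", "keywords": {"marine pollution", "effluents"}},
    {"title": "Effluent monitoring", "keywords": {"effluents", "monitoring"}},
    {"title": "Fish migration", "keywords": {"fish"}},
]


def search(terms):
    """Return titles of records that carry at least one of the terms as keyword."""
    return {rec["title"] for rec in DATABASE if rec["keywords"] & terms}


def keywords_of(titles):
    """Collect the keywords attached to the given records."""
    found = set()
    for rec in DATABASE:
        if rec["title"] in titles:
            found |= rec["keywords"]
    return found


terms = {"oil"}                      # step 1: terms you already know
results = search(terms)
while True:
    # Steps 2-4: every retrieved record and keyword is accepted here;
    # a human searcher would keep only the relevant ones.
    new_terms = terms | keywords_of(results)
    new_results = search(new_terms)  # steps 5-6
    if new_results == results:       # step 7: repeat until nothing new turns up
        break
    terms, results = new_terms, new_results

print(sorted(results))
```

Because nothing is filtered, the result set can only grow; the manual selection in steps 2 and 4 is what keeps a real search from drifting away from the topic.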

69

“The ability to ask the right question is more than half the battle of finding the answer.”

Thomas J. Watson

?

70

Hints on how to use information sources: when to stop searching?

Develop a feel for the “curve of diminishing returns”: if you spend too much time, effort, and/or money for too few benefits, you should stop.

[Graph: payoff plotted against time / effort / money, marked “Time to stop?”]

71

Knowledge organisation: classifications, and thesaurus systems

Introduction

72

Knowledge organisation: introduction

• To organise knowledge / documents / books / reports / information / data / records / things / items / materials for more efficient storage and retrieval, some related, similar tools / systems / methods / approaches are used.

• Often, but not yet always, this process is assisted by a computer system.

• Good systems are expanded and updated when the need arises.

• The organisation system applied should ideally be clearly and immediately visible, or even searchable on computer, to the user of the materials.

73

Knowledge organisation: classifications, and thesaurus systems

Classifications

74

Classification systems: examples of universal systems

• Universal means here: covering all subjects.
• Not just one but several competing systems exist. Examples:
  » Universal Decimal Classification = UDC, used mainly outside U.S.A.
  » Dewey Decimal Classification = DDC, used mainly in U.S.A.
  » Library of Congress Classification, used mainly in U.S.A.
  » ...

75

Knowledge organisation: classifications, and thesaurus systems

Thesaurus systems

76

Thesaurus: description

• Thesaurus (contents) =
  » a system to control a vocabulary (= words and phrases + their relations)
  » + the contents of this vocabulary

• Thesaurus program = a program to create, manage, modify and/or search a thesaurus using a computer

77

Thesaurus relations

[Diagram: a term and its thesaurus relations]

• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)
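These relations map naturally onto a small data structure. The sketch below is an assumption-laden illustration: the entry for “pollution” is invented (it is not copied from any published thesaurus), and `THESAURUS` and `search_terms` are hypothetical names.

```python
# Invented entry; the relation codes follow the slide (BT, NT, RT, UF).
THESAURUS = {
    "pollution": {
        "BT": ["environmental problems"],
        "NT": ["water pollution", "air pollution"],
        "RT": ["waste"],
        "UF": ["contamination"],
    },
}


def search_terms(term, relations=("UF", "NT")):
    """Return the term plus the terms linked to it by the chosen relations."""
    entry = THESAURUS.get(term, {})
    terms = [term]
    for code in relations:
        terms.extend(entry.get(code, []))
    return terms


print(search_terms("pollution"))
```

By default only synonyms (UF) and narrower terms (NT) are added; broader or related terms can be requested explicitly, for example `search_terms("pollution", relations=("BT", "RT"))`.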

78

!! Task - Assignment - Exercise !!

Try to find suitable search terms to retrieve documents on “pollution” from a database on marine science, by using for instance the thesaurus included in the word processing program that you use.

79

Knowledge organisation: classifications, and thesaurus systems

Classification systems versus

thesaurus systems

80

Knowledge organization: classifications versus thesauri

• Classification
  » Good for placement of documents in a library (because documents on many related subjects can be kept together)
  » Not well suited for computer searching (too complicated)

• Thesaurus
  » Not suited for placement of documents in a library (because documents with related subjects would NOT be kept together)
  » Well suited for computer searching (relatively simple alphabetic listing of keywords)

81

Computer networks, data communication and Internet

Introduction

82

Computer networks: summary

The following gives an overview of computer networks and data communication:
» The basic principles
» Local area networks
» National computer networks
» International computer networks
» The Internet
» Future impact of digital communication networks

83

Computer networks: prerequisites

Before using computer networks, you should ideally have some knowledge and skills related to

• computer hardware
• computer software

84

Data communication: a definition

• Interpersonal communication
  » Telecommunication
    - Broadcast
    - Telephone
    - Data communication
      - Remote login
      - File transfer
      - Hypertext transfer
      - Electronic mail
      - ...

85

Data communication: which types of ‘data’?

Digital information (0 1):

• Linear text
• Hypertext
• Static images
• Video
• Sound
• Multimedia / Hypermedia
• Programs for computers

86

Data communication: which types of ‘data’?

• The same types of data (information) that can be stored and managed on a computer can be transferred over computer networks to one or several other computers.

• So the networks form an important extension of the stand-alone computers.

• “The network is the computer”

87

Data communication: applications (Part 1)

• Hard-copy transfer (fax)
• Online use of the processing power of a remote computer
• Online access to information sources!
  » library catalogues,
  » bookshop catalogues,
  » publishers’ catalogues,
  » campus-wide and community information systems,
  » (text or multimedia) databases,
  » network-based journals, ...

88

Data communication: applications (Part 2)

• Software downloading
• Electronic mail from a person to one or several persons
• Computer-network based interest groups
• Online talking / chatting (IRC, ...)
• Video conferencing (CU-SeeMe, ...)
• Selling, shopping, buying, ...
• ...

89

Data communication: modems

• Description: MODulator-DEModulator: a device to convert digital data signals into a suitable form for transmission along a telecommunications channel, and to convert them back upon receipt into machine-readable form.

• Types
  » (Acoustic coupler)
  » Free-standing box
  » Board / card to plug into a microcomputer

90

Computer network protocols: definition

• When two computer systems communicate via a network, they do so by exchanging messages.

• The structure of network messages varies from network to network.

• Therefore the message structure in a particular network is agreed upon in advance and is described in a set of rules, each defined in a protocol.

91

Computer networks, data communication and Internet

Local Area Networks

92

Data communication with a server in a Local Area Network

• (Terminal)

• Microcomputer with serial line communications software / terminal emulation software

• Microcomputer with network card and network software

[Diagram labels: Network, Network server]

93

LAN software packages for heterogeneous networks: examples

Based on TCP/IP (the protocol suite used in the Internet):
• For DOS: NCSA (= National Center for Supercomputing Applications) CUTCP, PC/NFS, ...
• For Windows 3.x: PC/NFS, PC/TCP, Trumpet TCP Manager, ...
• For Windows 95, 98, ...: included!
• For Windows NT, 2000, ...: included!

Examples

94

Computer networks, data communication and Internet

National Wide Area Networks

95

National Wide Area Networks

• Public access national packet switching networks

• Research computer networks

• Public access made available by Internet Service Providers

• ...

96

National research computer networks: examples

• Belgium: BELNET
• Finland: FUNET
• Germany: DFN
• The Netherlands: SURFnet
• United Kingdom: JANET (Joint Academic Network)
• ...

Examples

97

Computer networks, data communication and Internet

International computer networks

98

International computer networks: examples

• National public data communication networks linked together
• FidoNet
• Bitnet / EARN
• Usenet
• Internet!
• ...

Examples

99

Computer networks, data communication and Internet

The Internet data communication network

100

?? Question ??

What is the Internet?

101


The Internet data communications network (Part 1)

• “Internet” is not well-defined.

• A network of smaller networks: the global collection of interconnected local area, regional and wide-area (national backbone) networks which use the TCP/IP suite of data communication protocols.

102

The Internet data communications network (Part 2)

• Links computers of various types.

• Is constantly growing.

• The analogy of a superhighway has been used to describe the emerging system of networked computers.

• The Internet has no owner, and is not managed by one organization.

103

The Internet: access from your Local Area Network

Your microcomputer

Local Area Network (LAN)

One of the national networks

The global Internet

104

Host computers in the Internet: definition

• A host (computer) is a domain name that has a unique IP address record associated with it.

• Could be any computer connected to the Internet by any means.

• For instance: www.vub.ac.be


105

Transmission Control Protocol / Internet Protocol (TCP/IP)

• the main suite of transport protocols used on the Internet for connectivity and transmission of data across heterogeneous systems

• “glue that holds the Internet together”
• an open standard
• available on most Unix systems, VMS and other minicomputer systems, many mainframe and supercomputing systems, and some microcomputer and PC systems

106

Internet: addresses of computers with the Domain Name System

• Internet style = Domain Name System
• The Internet naming scheme consists of a hierarchical sequence of names from the most specific to the most general (left to right), separated by dots:

computer.subdomain.domain.(country if not USA)

OR, as a numeric address,

n1.n2.n3.n4

where each n is an 8-bit natural number (0 to 255)
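Both address forms can be taken apart with the Python standard library. A minimal sketch: `name_levels` and `address_bytes` are hypothetical helper names, www.vub.ac.be is the host name used elsewhere in these slides, and 192.0.2.10 is an address from a range reserved for documentation, not the address of any real host.

```python
import ipaddress


def name_levels(host):
    """Split a domain name into its parts, most specific first."""
    return host.split(".")


def address_bytes(dotted):
    """Return the four 8-bit numbers of a dotted IPv4 address."""
    # ipaddress.IPv4Address raises ValueError if a part is not in 0..255.
    return list(ipaddress.IPv4Address(dotted).packed)


print(name_levels("www.vub.ac.be"))   # ['www', 'vub', 'ac', 'be']
print(address_bytes("192.0.2.10"))    # [192, 0, 2, 10]
```

Reading the name from left to right goes from the computer (www) up to the country (be), which matches the “most specific to most general” order described above.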

107

Internet: growth in number of hosts worldwide: linear plot

[Chart: number of hosts worldwide, on a linear scale from 0 to 20,000,000, in January of each year from 1993 to 1998]

108

Internet Service Provider = ISP

Internet Service Providers provide their clients with access to the Internet and, in many cases, also
» an email address / server
» space for a web site
» software tools to start
» training
» technical support
» an accessible location for a WWW site of the client
» assistance with WWW site design and promotion

109

Microcomputer -- external computer: some ways of data communication

[Diagram linking an internal side (“Intern”) to an external side (“Extern”), with these elements: microcomputer, modem, LAN, local PAD, TelePAD, telephone, ISDN, leased fixed communication line, voice telecommunication network, public data comm. network, private / academic data comm. network (e.g. Internet), gateway computer system, external computer]

110

Online communication: remote login and file transfer

Remote terminal log-in / access

111

Remote terminal log-in / access: definition

The ability to access a computer from outside the building in which it is housed.

This requires communications hardware, software, and actual physical links, although this can be as simple as common carrier (telephone) lines or as complex as telnet login to another computer across the Internet.

112

Online communication: remote login and file transfer

Telnet in the Internet

113

Telnet: description

• The Internet standard protocol for remote terminal connection service; on top of the TCP/IP protocol suite

• Allows a user at one site to interact with a remote timesharing system at another site as if the user's terminal was connected directly to the remote computer

• Includes VT100 terminal emulation

114

Online communication: remote login and file transfer

Downloading and file transfer

115

Data communication: downloading by copying a fragment

Capturing a small fragment of the information displayed: 1. select information on the display, 2. copy, and 3. paste in a document managed by another program.

116

Online communication: remote login and file transfer

File transfer: ftp in the Internet

117

Data communication: file transfer

• Copying + downloading / transfer of a whole file
• Requires a transfer protocol with error correction

118

World-Wide Web = WWW

Introduction

119

The World-Wide Web: prerequisites

Before using the WWW you should ideally already have learned to understand and to use

• computer hardware
• computer software
• the Internet
• older methods for online communication, such as telnet

120

The WWW: example of a welcome page

Example

121

URL = Uniform Resource Locator (also called Universal Resource Locator)

• = draft standard for specifying an object on the Internet
• the structure is in most cases
  protocol://computer_address[/path_name/file_name]
• examples:
  » telnet://biblio.vub.ac.be
  » ftp://ftp.vub.ac.be/
  » gopher://gopher.vub.ac.be/
  » http://www.vub.ac.be/BIBLIO/index.html
  » news://news.server.edu/comp.infosystems.www

122

URL format / structure

1. The first part of a URL, before the colon “:”, specifies the access method = protocol

2. The second part of the URL, after the colon “:”, is interpreted specific to the access method. In general, two slashes after the colon indicate a machine /computer name.
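The two parts of a URL can be inspected with `urlsplit` from Python's standard `urllib.parse` module. The sketch below is only an illustration; it takes apart one of the example URLs from the slides without contacting the server.

```python
from urllib.parse import urlsplit

# One of the example URLs from the slides.
parts = urlsplit("http://www.vub.ac.be/BIBLIO/index.html")

print(parts.scheme)    # access method / protocol: http
print(parts.hostname)  # machine / computer name after the two slashes: www.vub.ac.be
print(parts.path)      # path name and file name: /BIBLIO/index.html
```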

123

?? Question ??

What is the difference between Internet and the World-Wide Web?

124

The WWW is an application of Internet

• The World-Wide Web (WWW) is a service, an application of Internet.

• It is based on the Internet infrastructure.
• So the WWW is newer than the Internet: the concept of the WWW was created at the end of the 1980s, when the Internet was already well established.

125

The WWW is an application of Internet: scheme

Data communication

Internet

WWW

126

The WWW: the essential elements

• Information delivery and access using hypertext / hypermedia documents / objects
  » html documents
  » http protocol: http clients and http servers
• Integration of protocols in the Internet:
  » http servers offering html documents including links to other http servers, telnet servers, ftp servers, nntp servers, gopher servers...

127

The WWW: hyperlinks

Hyperlinks can link a part of a hypermedia document to
• another part of the same document file
• another document file on the same server computer
• another document file on a server computer located elsewhere in the world

[Diagram: hyperlinks between documents on Computer 1 and Computer 2]

128

The WWW: hypertext mark-up language = HTML

• Hypertext mark-up language = HTML = the system of codes used by authors to build the hypertext-pages/files in WWW, for instance to create a title or an anchor.

• The codes are invisible / transparent for the user / reader.
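To make these normally invisible codes visible, the sketch below reads a tiny HTML page with `HTMLParser` from Python's standard `html.parser` module and picks out the title and the anchor targets. The page text is invented for the example, and `TitleAndLinks` is a hypothetical class name.

```python
from html.parser import HTMLParser

# Invented page, just large enough to show a title and an anchor.
PAGE = """<html><head><title>Welcome</title></head>
<body><p>See the <a href="http://www.vub.ac.be/">university site</a>.</p></body></html>"""


class TitleAndLinks(HTMLParser):
    """Collect the page title and the targets of all anchors."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        elif tag == "a":
            # attrs is a list of (name, value) pairs; keep the href values.
            self.links.extend(v for k, v in attrs if k == "href")

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


parser = TitleAndLinks()
parser.feed(PAGE)
print(parser.title)  # Welcome
print(parser.links)  # ['http://www.vub.ac.be/']
```

A browser does the same kind of reading, but uses the codes to lay out the page instead of printing them.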

129

The WWW: hypertext transfer protocol = HTTP

• Hypertext transfer protocol = HTTP = the software conventions used by client and server programs for WWW to request and transfer hypermedia documents.

• The protocol need not be known by the user / reader = the protocol is invisible / transparent for the user.

• Analogous with the telnet, ftp and gopher protocol.

130

?? Question ??

Briefly compare TCP/IP and HTTP.

131

The WWW: pages and forms

• Pages
  Many documents developed for WWW are kept small and are named “pages”. These often refer to several other “pages”.

• Forms = gateways to services and databases on server computers in WWW
  Some pages contain electronic forms, to be filled in by the user.

132

The WWW: applications

Analogous to gopher applications:
• Access to online public access catalogues
• Campus-wide information systems
• Access to subject-oriented information
• Access to computer file archives
• Traveling / navigating through the Internet via linked html-pages
• Access to intranets within institutes / companies

133

World-Wide Web = WWW

WWW client programs

134

WWW: client / browse programs

• To access the WWW, you run a browser program.
• The browser reads documents, and can fetch documents from other sources. Information providers set up hypermedia servers from which browsers can get documents.
• The browser can display hypertext documents. Hypertext is text with pointers to other text. The browser lets you deal with the pointers in a transparent way: select the pointer, and you are presented with the text that is pointed to.

135

WWW: examples of browsers for your own computer

Browsers are available for many computer platforms; in particular, browsers for Windows + Winsock:
» Netscape
» Microsoft Internet Explorer
» ...

136

?? Question ??

Which client program do YOU use or will YOU use

to access the WWW?

137

!! Task - Assignment - Exercise !!

Browse the WWW, using an available

browser client program.

138

!! Task - Assignment - Exercise !!

Visualise the HTML source code of a WWW page, using a WWW client program.

What do you learn from this exercise about the basic properties of HTML?

139

!! Task - Assignment - Exercise !!

Exploit the possibility to open more than one window, using a WWW client program

in Windows.

140

?? Question ??

Why would you want to open more than one window on WWW servers, using a WWW client program?

141

World-Wide Web = WWW

Saving information from a web

142

WWW: How to save information from a web?

Information displayed by your web browser/client program can be saved,

• by select, copy, paste into another document (and save)
• by saving a complete page to your disk
  » in separate files (for instance 1 HTML file + some image files)
  » in 1 file, using Microsoft Internet Explorer 5 or a later version
• by copying the information into an e-mail message that you send to your own e-mail account

143

!! Task - Assignment - Exercise !!

Copy some text fragment from the WWW and paste it into another document on your computer.

144

!! Task - Assignment - Exercise !!

Save a text from WWW to disk, as HTML,

using a browser program.

145

!! Task - Assignment - Exercise !!

Display an HTML file that you have saved from the WWW to your disk, in a program for word processing.

Is the file displayed properly?

146

World-Wide Web = WWW

The success of WWW

147

WWW: growing number of WWW servers

[Chart: number of WWW servers per year, 1993 to 2000; vertical axis from 0 to 7 000 000]

148

WWW as popular method to access information from computers

• The WWW has quickly become the most popular medium to access information that resides on various computers that are connected to a computer network.

149

Online access information sources and services

Introduction

150

Online information sources: summary

• The following gives a general overview of online accessible information sources.

• This overview is not limited to or focusing on a particular concrete subject domain/area.

151

Online access to information: avoid network traffic jams

To access online information sources in the US from Europe, work when the lines are not saturated (better in the morning than in the afternoon).

152

Internet based information sources: problems / difficulties (Part 1)

• Redundancy and overlap: On the one hand, there is too much information on some topics; in other words, the redundancy and overlap are high in many cases.
• Too few information sources: On the other hand, there are too few information sources on some topics.

153

Internet based information sources: problems / difficulties (Part 2)

• No order is imposed on most sources. Quality checks / quality controls are not performed. Related to this: it is not required to register new information offered. Is the information that you find real, honest, authentic?

154

Internet based information sources: problems / difficulties (Part 3)

• Change is the only constant: Information sources are constantly changing, growing, but sometimes disappearing.

155

Internet based information sources: problems / difficulties (Part 4)

• Scattering: There is no single simple but powerful system to find relevant information through the Internet. In other words: integration / aggregation is still far from perfect.

156

Internet based information sources: problems / difficulties (Part 5)

• Slow: The Internet is in many places and for many applications not yet fast enough.

157

Internet based information sources: problems / difficulties (Part 6)

• In conclusion: Surfing, using the Internet, the WWW, can be a time sink instead of a productive activity.

158

Internet based information sources: how many? how much information?

• More than 10 million WWW sites (in 2003)

• More than 2000 million (= 2 billion) unique URLs in the total Internet (in 2002)

• More than 10 terabytes (= 10 000 gigabytes) of text data (in 2001)

159

Online access information sources and services

Types of online access information systems

160

Types of online access information systems: “free” versus “fee”

Public access information sources free of charge

Fee-based online information services (NOT free of charge)

161

Online access information sources and services

Dictionaries and encyclopaedias accessible through the WWW

162

Dictionaries and encyclopedias through the WWW: introduction

• Dictionaries and encyclopedias are the first choice among many types of information sources,
» when we do not need detailed information on a common topic
» when we want to prepare a more detailed search on an unfamiliar topic, by searching for the right spelling, synonyms, context,…

• Some dictionaries and encyclopedias are available through the WWW free of charge.

163

Dictionaries accessible through Internet and the WWW: example

• The American Heritage® Dictionary of the English Language
» Over 200,000 entries, 70,000 audio word pronunciations, 900 full-page color illustrations
» Available free of charge from http://education.yahoo.com/reference/dictionary/

Example

164

Dictionaries accessible through Internet and the WWW: compilation

• A compilation/collection of dictionaries can be searched simultaneously and free of charge: http://www.onelook.com/

Example

165

Encyclopedias accessible through Internet and the WWW: examples

• Encarta Concise Free Encyclopedia
» http://encarta.msn.com/
» Available in English and in some other languages

Example

166

Encyclopedias accessible through Internet and the WWW: examples

• Encyclopædia Britannica: only a small part is available free of charge + links to selected WWW sites
» http://www.britannica.com/
• Encyclopædia Britannica Concise
» http://education.yahoo.com/reference/encyclopedia/

Example

167

Encyclopedias accessible through Internet and the WWW: examples

• The Canadian Encyclopedia (in English and in French):
» http://thecanadianencyclopedia.com/

Example

168

Encyclopedias accessible through Internet and the WWW: overviews

• A list / overview of encyclopedias on the Internet: http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet can be found as a part of more general directories of Internet-based information sources.

Example

169

Online access information sources and services

Internet directories and indexes

170

Internet: meta-information about Internet information sources

• in printed manuals and guides:
- it is not always possible to get a copy fast
- it costs money to get a copy
- they are soon out of date

• offered on the WWW!:
+ directly available when we want to use the Internet
+ many systems are accessible free of charge
+ most systems are regularly updated

• (“intelligent agent” software on client PC)

171

Internet: subject-oriented meta-information offered via WWW

Information about information sources, in the form of
» subject guides = texts with references
» subject hypertext directories = subject guides
» key word indexes, generated automatically, for searching
» collections of links or forms to the above
» (multi-threaded search systems)

172

Internet global subject directories: introduction

• They are virtual libraries with open shelves, for browsing.
• They are generated manually, by many people.
• They can be browsed following a tree structure or a more complicated variation.
• The most famous of these systems belong to the most popular and most visited sites on the WWW: e.g. Yahoo!

173

Internet global subject directories: structure

The structure corresponds to a classification that is in most cases specific for the particular overview. In other words: the well-known and classical universal classification systems are not used in most Internet directories.

174

Internet global subject directories: pros and cons

• They cover a small number of selected WWW sites, in comparison with the total number of sites that are accessible.

+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches with a high coverage.

175

Internet global subject directories: why use one?

• They are suitable mainly for broad searches that can be difficult to formulate in words, but NOT for more specific searches that require combinations of several concepts.

176

Internet global subject directories: searching directories with a query

• Many of the Internet directories include an index to search their contents with a query.

• However, then the assisting classification structure is not well exploited and the user should be aware of the problems and difficulties of information retrieval with natural language queries.

• Furthermore, the possibility to use the system in this way may be confusing, as these directories are not real full-text Internet indexes, like those provided by other search tools.

177

Internet global subject directories: Yahoo!

• A hypertext global subject directory can be found at http://www.yahoo.com/ and at many other sites, including http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

178

Internet global subject directories: Google directory

• A hypertext global subject directory can be found at http://directory.google.com/
• Accessible free of charge.
• Based on the Netscape DMOZ Open Directory Project.
• Do not confuse this with the famous Google WWW search engine.

179

Internet global subject directories: Open Directory Project

• A hypertext global subject directory can be found at http://www.dmoz.org/

• The content is also used in other systems, such as Google Directory and Webbrain.

• Accessible free of charge.

180

!! Task - Assignment - Exercise !!

Try to find Internet sources which are relevant for you, by using an Internet-based global subject directory.

181

Internet local subject directories: examples in Belgium

• http://yellow.advalvas.be/weblist.html
• http://search.msn.be/

• The guide developed by the public libraries in Flanders: http://www.bib.vlaanderen.be/webwijzer

        

182

Internet indexes: automated search tools

• Several systems allow you to search for and locate many items (addressable resources) on the Internet in a more systematic, direct way than by only browsing/navigating.

• These systems do NOT search the contents of computers through the real Internet in real time and completely when a user makes a query. Searching in that way would be much too slow due to limitations in the technology.

183

Internet indexes: scheme of the mechanism

[Diagram with the following components:]
• User searching for Internet based information
• Internet client hardware and software
• User interface to a search engine
• Internet index search engine
• Database of Internet files, including an index
• Internet crawler and indexing system
• Internet information source

184

Internet indexes: description of the mechanism

Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved by searching with queries through a big index that is built automatically on the basis of the contents, the texts, of these pages (to build this database and to keep it up to date, pages are continuously collected from the Internet by a “robot” computer software system)

• a search system with a user interface in a WWW form, to allow the user to search through that database
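The two parts described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the pages in PAGES are invented and stand in for what a "robot" would have collected, and the function names (tokenize, build_index, search) are chosen for this illustration. Real Internet indexes are far larger and rank their results.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain ALL words of the query (Boolean AND)."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical pages, as if collected earlier by a "robot"
PAGES = {
    "http://example.org/1": "Thesaurus for information retrieval",
    "http://example.org/2": "Information about library catalogues",
    "http://example.org/3": "Retrieval of images",
}

if __name__ == "__main__":
    index = build_index(PAGES)
    print(sorted(search(index, "information retrieval")))
```

The query is answered from the index that was built beforehand, not by visiting the pages again, which is why such a system can respond quickly.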

185

Internet indexes: AltaVista

• The primary search interface can be found in the US. The following addresses all lead to the same information:
» http://www.altavista.com/
» http://www.av.com/
» http://av.com/
• Mirror site in UK:
» http://uk.altavista.com/
» http://www.altavista.co.uk/

186

Internet indexes: AltaVista: features

• Allows full text searching of the WWW
• Offers relevance ranking of search results
• Also allows advanced Boolean searching (in “Advanced” mode)
• Offers a link to an Internet subject directory (Looksmart)
• Offers links to systems to find images, sounds… (multimedia) in the Internet

187

Internet indexes: All the Web

• The search interface can be found at: http://www.alltheweb.com/ or http://alltheweb.com/
• You can search the WWW and ftp servers.
• The database is one of the biggest.
• Not only HTML and plain text files, but also the full text of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

188

Internet indexes: Google (Part 1)

• http://www.google.com/
• Full-text searching is possible of many files that are available through the WWW.
• Not only HTML and plain text pages are covered; the first part of many files in other file formats is also indexed, such as
» Adobe PDF
» Microsoft Word, Microsoft Excel, Microsoft PowerPoint
» Rich Text Format…

189

Internet indexes: Google (Part 2)

• One of the most popular systems in 2001, 2002, 2003…
• For retrieval, an algorithm is used that takes into account the links between WWW pages. A retrieved page is ranked higher when
» many sites/pages point to it
» “important” sites/pages point to it
• Some other famous search systems are based on Google, such as Netscape Search and the WWW searches of Yahoo! (at least in 2003).
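The idea of ranking by links can be illustrated with a small Python sketch. It is a simplified, PageRank-style iteration written for this illustration and is not Google's actual algorithm; the four pages in LINKS, the function name link_scores and the damping value of 0.85 are assumptions.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Score pages by their incoming links.

    links maps each page to the list of pages it points to.
    A page passes its own score on to the pages it points to, so a
    link from a high-scoring ("important") page counts for more.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A page without outgoing links spreads its score evenly
                share = damping * scores[page] / n
                for other in pages:
                    new_scores[other] += share
        scores = new_scores
    return scores


# Hypothetical link structure: A and B point to C, and C points to D
LINKS = {
    "A": ["C"],
    "B": ["C"],
    "C": ["D"],
    "D": [],
}

if __name__ == "__main__":
    scores = link_scores(LINKS)
    for page in sorted(scores, key=scores.get, reverse=True):
        print(page, round(scores[page], 3))
```

In this toy example, C scores higher than A and B because two pages point to it, and D scores well with only one incoming link because that link comes from the "important" page C.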

190

Internet indexes: Google computer servers

• Google uses a system of more than 10 000 small computer servers to offer its information services.

191

Internet indexes: Google additional features

• Besides a system to search for WWW pages, Google also offers
» a subject directory
» searching for images/pictures on the WWW
» searching an archive of Usenet messages + posting to Usenet groups
» searching for news

• Thus Google has become a great integrator / aggregator.

192

Internet indexes: coverage

• Internet indexes do not cover all static documents on the WWW.

• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one Internet index search system should be used.
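Combining the results of more than one index can be sketched as follows in Python. The two result lists are invented for this illustration, and merge_results is a hypothetical helper; a real multi-threaded search system would also have to query the search engines and re-rank the combined results.

```python
def merge_results(*result_lists):
    """Merge ranked result lists from several indexes into one list.

    Each URL is kept once, at the position where it was first seen.
    """
    merged = []
    seen = set()
    for results in result_lists:
        for url in results:
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


# Hypothetical results for the same query from two different indexes
INDEX_ONE = ["http://example.org/a", "http://example.org/b"]
INDEX_TWO = ["http://example.org/b", "http://example.org/c"]

if __name__ == "__main__":
    for url in merge_results(INDEX_ONE, INDEX_TWO):
        print(url)
```

Because the indexes overlap only partly, the merged list is longer than either list alone, which is the reason to consult more than one index.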

193

Internet indexes: coverage and size of each index

• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
» Google !
» AltaVista
» All the Web (serving also Lycos)
» Systems based on the INKTOMI database of WWW pages.

194

!! Task - Assignment - Exercise !!

Try to find Internet sources which are relevant for you, by using an Internet index.

195

Internet information sources

Coverage of Internet directories and Internet indexes

[Figure labels: A global Internet index; A global Internet directory]

196

Global Internet search tools: a comparison

Global Internet directories

• Only a limited selection of Internet sources

• Browsing information sources is easy

• Good for broad searches

Global Internet indexes

• About 1/3 of the Internet is covered by an index

• Searching requires some skills and knowledge

• Good for specific, narrow searches

Multi-threaded search systems

• These get information from directories and indexes

• Searching requires some skills and knowledge

• Good when even 1 index does not yield information

197

Internet: who owns the search tools?

In 2003:
• The company Yahoo! owns
» the most famous global Internet subject directory
» 3 (!) Internet full-text search engines: All the Web, AltaVista, Inktomi
• The company Google owns
» the most famous Internet full-text search engine
» one of the best Internet image search engines
» a gateway to old and new Usenet news messages

198

Online access information sources and services

Public access book databases

199

Public access book databases: introduction

• Even in this age of Internet-based information sources, a lot of information is still distributed in the form of printed books.

• The content of most books is (still) not available on the Internet.

• Most general Internet search tools do NOT allow you to find out about the existence of books that may be interesting for you.

• So, specific search tools to find books can be useful.

200

Public access book databases: an overview

• (Databases by publishers.)
• Fee-based databases by commercial providers
• Databases by book distributors / bookshops!
• Online public access catalogues of
» local libraries,
» national libraries (which normally produce and offer their national bibliography)!
» big, famous libraries!!

• (Databases of computer-based versions of books.)

201

Public access book databases: which one to use?

• For years, the market of bibliographic information on books was limited to the services and databases of subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many possibilities to find bibliographic information.

• Which book database should be preferred for particular applications is not clear for most librarians or end-users.

202

Public access book databases by commercial producers

• To find currently available books, some databases assembled by commercial producers can be interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books, prices of the books, short descriptions of the contents with subject terms…

• However, access to such a database is not free of charge and can be expensive (in comparison with alternatives).

203

Public access book databases provided by bookshops

• To find currently available books, the bibliographic databases assembled by big bookshops are interesting.

• Several offer a good coverage and are accessible free of charge.

• The added price information can be useful for the acquisition and accounting department of a library or if an individual user wants to buy a book.

• Some provide a current awareness service, also free of charge.

204

Book databases accessible free of charge: examples in U.S.A.

• Amazon.com (US): http://www.amazon.com/ and http://www.amazon.co.uk/ (note: amazon, NOT amazone). Subject description is poor.

• Barnes and Noble (US): http://www.bn.com/

Examples

205

Free public access bibliographic book database + price comparisons

• Even comparisons of the catalogues of shops that sell books (as well as music, movies and many other goods) are available free of charge.

• See for instance
» http://www.bookfinder.com/
» http://www.dealtime.com/

206

!! Task - Assignment - Exercise !!

Search for titles of books which are relevant for you, using an online database provided by a book publisher or bookshop.

207

Online Public Access Catalogues of libraries

• Mainly to find older books, the catalogues of libraries can be useful.

• Most are accessible online and free of charge.

208

Online access information sources and services

Fee-based online public access information services

209

Types of online access information systems: “free” versus “fee”

• A lot of the information on the Internet is available free of charge, but another part is only accessible when a fee is paid to the producer and / or the distributor.

• The first commercial computer systems that made information available online appeared around 1975. Most of them are now also available through the Internet.

• Some organisations pay these fees for some sources and then organise access, so that the members of the organisation can retrieve and exploit the information as if it were free of charge.

210

Types of online access information systems: “free” versus “fee”

Public access information sources free of charge

Fee-based online information services (NOT free of charge)

211

Types of online access information systems: “free” for members only

Public access information sources free of charge

Fee-based online information services (NOT free of charge)

Fee-based online information services, made accessible “free of charge” by an institute to its members

212

Online information services: total size of their databases

In 1999, the big host systems and the public access WWW pages offered a comparable quantity of information:
• WWW offered about 8 terabytes (= 8 000 gigabytes) of text data (according to Lawrence and Giles, Nature, 1999, Vol. 400, pp. 107-109.)
• Dialog offered about 9 terabytes (= 9 000 gigabytes) (in 1998)
» 6 billion pages of text
» 3 million images

213

Online access information sources and services

Online access databases about journal articles

214

Online access databases about journal articles: overview

• Thousands of fee-based online access databases offer bibliographies or full-texts of journal articles in particular subject domains and published by many publishers.

• Many publishers offer searchable bibliographies, but only of their own publications. (for instance Emerald, Elsevier)

• Only a few large databases offer access, free of charge, to bibliographies of articles published in journals from many publishers.

215

Online access databases about journal articles: Article@INIST

• Article@INIST allows you to search in a bibliographic database, NOT full-text, (Journal articles, journal issues, books, reports, conferences, doctoral dissertations) at the Institut de l'Information Scientifique et Technique, France.

• Does not offer the use of a classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

216

Online access databases about journal articles: Ingenta (1)

• Ingenta Journals allows you to search a bibliographic database of millions of journal articles, including titles, authors, in many cases abstracts.

• Searching is free of charge.

217

Online access databases about journal articles: Ingenta (2)

• Payment is required to receive the full text of an article.
• Available from
» http://www.ingenta.co.uk/
» http://www.ingenta.com/
• Ingenta acquired Uncover in 2000.

218

Online access databases about journal articles: Infotrieve

• Infotrieve allows you to search free of charge in a bibliographic database of the articles of more than 20 000 journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of charge: the tables of contents of new issues of the journals that you have selected are sent to you by email.

219

Online access databases about journal articles: Scirus

• This is a specialised Internet index that allows you to search for selected scientific information (only) on the WWW. This includes the peer-reviewed articles in the journals that are published in ScienceDirect by Elsevier.

• An article can be downloaded in full-text format only when a fee has been paid to the publisher.

• The search interface: http://www.scirus.com

Example

220

Online access databases about journal articles: Scirus features

• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system that is also used by Alltheweb.
• Offers access to information ordered according to some classification system / taxonomy.

Example

221

Online access information sources and services

Finding multimedia files on the Internet

222

Finding multimedia files on the Internet: introduction

Several public access search systems are available free of charge, to search the Internet for multimedia files:
» images / pictures (artwork, photos, or both)
» sound / audio files (music, speeches...); video

223

Finding images on the Internet: introduction

• Several public access search systems are available free of charge to search for images / pictures (artwork, photos, or both) on the Internet.

• When searching for images, the search results from such a system offer not only links to the image files on the Internet, but also directly small versions of the images (so-called “thumbnails”).

224 Examples

Finding images on the Internet: screen shot of a Google image search

225 Examples

Finding images on the Internet: examples of search engines (1)

• http://alltheweb.com/ !!
• http://gallery.yahoo.com/ !
• http://images.google.com/ !!! or through http://www.google.com/
The largest database in this category (at least in 2002, 2003). For each result, not only a thumbnail is offered, but also directly the origin with the readable URL; this makes it easier to guess the relevance of the document.

226 Examples

Finding images on the Internet: examples of search engines (2)

• http://multimedia.lycos.com/
• http://www.altavista.com/ !! (also audio and video; choose not the normal text search, but IMAGES in the user interface.)

227

!! Task - Assignment - Exercise !!

Use a specialised search engine to find images about a particular subject on the Internet.

228

Online access information sources and services

Evolution and future trends

229

Online access information: evolution and future trends

• An increasing amount of information becomes available online.

• A growing amount of this online information becomes available free of charge.

• The quality and ease of use of software on server as well as client is growing.

A consequence is:
• An increasing number of end-users searching for information online.

230

Online access information: easier and more complicated?!

• At the same time, information retrieval becomes both easier and more complicated. This may seem strange and contradictory, but it is reality. This is a paradox.

231

Online access information: easier information retrieval systems

• Individual information retrieval systems become easier:
» they react faster;
» they can provide access to more data/information in one action;
» their user interfaces are simple, but more sophisticated, intelligent retrieval algorithms can nevertheless deliver satisfactory results in most simple cases.

232

Online access information: more complicated information market

• The whole information landscape consists of more and more decentralised information sources, each one bringing an individual user interface that should be mastered. Making the right, ideal choice among the sources does not become easier, and perhaps even becomes more complicated every day.

233

Online access information: more complicated information market

• Furthermore, for many sources the accessibility / availability, the user interface, the interlinking, depend on the organisation in which the searcher is active.

234

Online access information: conclusion

• In the case of simple information needs, the WWW and the search tools can work like “magic”.

• However, in the case of more complicated information needs, there is still no “magic button” that brings you immediately to all the required information.

235

Evaluating the quality of information

Documentary information sources: evaluating their quality

236

Documentary information sources: evaluating their quality

• We should always be critical when using information sources, in view of
» the widely varying degrees of quality of information sources, and of
» the costs associated with searching, finding, using information.

237

Documentary information sources: evaluation criteria (1)

• Is the information valid, reliable, trustworthy, genuine, authentic? Is the author honest? Is the source objective, not subjective, without cultural or political or ideological or commercial bias? Is the origin an individual or a company or an organisation? Is the publication sponsored by some company or organisation?

238

Documentary information sources: evaluation criteria (2)

• Is the information accurate, correct? Who is the author or producer? Does the source have an author or a producer with a high expertise, a good reputation, good qualifications? Can the author be contacted for clarification or discussion?

Was the information reviewed, edited, improved, corrected, censored, approved, verified, before publication? Do experts agree on the information provided?

239

Documentary information sources: evaluation criteria (3)

• Is the information source unique? Does it offer a great amount of primary information, which is not obtainable from other sources?

• Is the information complete? Is the work available in its entirety?

• Does the source offer a wide coverage? Is the source comprehensive, substantive?

• Is the information current enough, up to date? Is a publication date provided? Is an expiration date provided?

240

Documentary information sources: evaluation criteria (4)

• Does the document provide suitable references, so that you can verify statements and find older suitable information sources?

• Good clear format and lay-out of the information / User-friendly information system / Easy for users to orientate themselves within the resource and to find their way around it?

• Good user support / Good customer support?
• Is the type of distribution medium appropriate? (print, e-mail, online,...)

241

Documentary information sources: evaluation criteria (5)

• Is the information what you want? If not, then reassess your needs and consider other types of information as well.

242

Documentary information sources: evaluation criteria (6)

• Is the information suitable for your level of understanding of the subject? Is the document popular, suitable for the general public, for students, for professionals, for scholarly/academic use…? Does it report new, primary research (survey, experiment, observation, measurement, invention) or is it a review of sources published earlier?

• Does the information repeat or confirm what you already know, or is it complementary, contradictory, new?