Internet en WWW voor het opsporen van informatie (Internet and WWW for finding information)

Internet en WWW voor het opsporen van informatie. [email protected], Vrije Universiteit Brussel, Pleinlaan 2, B-1050 Brussel. February 2004, VUB-IDLO.


Page 1: Internet en WWW  voor het opsporen van informatie

1

Internet en WWW voor het opsporen van informatie

[email protected], Vrije Universiteit Brussel, Pleinlaan 2, B-1050 Brussel.

February 2004, VUB-IDLO

Page 2: Internet en WWW  voor het opsporen van informatie

2

The slides are available from http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/

(note: BIBLIO and not biblio)

Page 3: Internet en WWW  voor het opsporen van informatie

3

Plan for the day: morning

• About “information”
• Information market
• Information retrieval
• Thesauri (+ practising query formulation)
• Networks, and the Internet in particular
• World-Wide Web (+ practising “browsing” + “saving”)

• LUNCH

Page 4: Internet en WWW  voor het opsporen van informatie

4

Plan for the day: afternoon (part 1)

• Online accessible information sources!
» Global Internet directories (+ practice)
» Internet indexes (+ practice)
» Book databases (+ practice)
» Fee-based databases
» Databases with titles of journal articles
» Finding illustrations / images / photos (+ practice)

Page 5: Internet en WWW  voor het opsporen van informatie

5

Plan for the day: afternoon (part 2)

• Evaluation of information sources

• Free searching according to your own interests, with assistance

Page 6: Internet en WWW  voor het opsporen van informatie

6

- Interruptions
- Questions
- Remarks
- Discussions

are welcome

Page 7: Internet en WWW  voor het opsporen van informatie

7

About “information”

Information concepts

Page 8: Internet en WWW  voor het opsporen van informatie

8

The flow of documentary information with primary and secondary sources

Author / Creator / Sender

Primary sources / systems: mainly journal articles / books / electronic mail / online sources / ...

Secondary sources / systems: mainly reference works (printed, CD-ROM, online), library catalogues, including OPACs, ...

Reader / User / Receiver

Page 9: Internet en WWW  voor het opsporen van informatie

9

The role of secondary information sources

• The secondary information flow is generated on the basis of the primary flow, mainly because the great amounts of primary information lower the chance to retrieve and use the appropriate information item.

• Secondary information tries to bring some order in the great chaos.

Page 10: Internet en WWW  voor het opsporen van informatie

10

Various categorisations of documentary information sources

Information sources can be categorised in various ways. For instance:

• Primary
• Secondary

• Hard copy / not digital
• Digital

• Offline
• Online

• Text
• Image
• Sound
• Animation / video
• Software
• Data
• Interactive

• Books
• Serials

Page 11: Internet en WWW  voor het opsporen van informatie

11

Retrospective searching versus current awareness: scheme

[Diagram: a time axis marked Past, Now, Future, with the labels “Retrospective searching” and “Current awareness”.]

Page 12: Internet en WWW  voor het opsporen van informatie

12

Information retrieval: evolution of storage and distribution media

• 1450: printing with reusable characters / fonts
• 1975: + online access databases; from the 1970s: the growing Internet
• 1985: + CD-ROM
• 1990: + World-Wide Web (based on the Internet)

Page 13: Internet en WWW  voor het opsporen van informatie

13

Information retrieval: end user or information intermediaries

End-user

Information intermediary (broker or library or ...)

Information

Page 14: Internet en WWW  voor het opsporen van informatie

14

End user versus information intermediary

• People can retrieve information themselves, directly, as so-called “end-users”.

• However:
» the information landscape is complex,
» it may take a lot of time to find the right information,
» it may be costly to search for information.

• Therefore it may be wise to obtain the assistance of an expert information intermediary, such as a reference librarian or an information broker.

Page 15: Internet en WWW  voor het opsporen van informatie

15

About “information”

Computer- and network-based information

Page 16: Internet en WWW  voor het opsporen van informatie

16

Information: from bits to meaningful information

Digital computer data = bits (0 or 1)

Program code, meaningful for and to be interpreted / executed by a suitable / compatible computer

Information = “documents”, meaningful for and to be interpreted by human beings
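The step from bits to something a human can read can be illustrated with a minimal sketch. The bit string and the choice of ASCII encoding are assumptions made for this illustration; they do not come from the slides.

```python
# Hypothetical example data: two 8-bit groups (an assumption for illustration).
BITS = "01001001 01010010"


def bits_to_text(bit_string: str) -> str:
    """Interpret space-separated 8-bit groups as ASCII characters."""
    byte_values = [int(group, 2) for group in bit_string.split()]
    return bytes(byte_values).decode("ascii")


# The same bits only become information once an agreed encoding is applied.
print(bits_to_text(BITS))
```

Without the agreed encoding (here ASCII), the same bits could just as well be read as numbers or as program code.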

Page 17: Internet en WWW  voor het opsporen van informatie

17

Information: digitally stored and managed information

Categories of digital, computer-readable information / data (0 and 1), forming electronic “documents”, understandable by human beings:

text, numbers, images, video, sounds, and their combination: multimedia

Page 18: Internet en WWW  voor het opsporen van informatie

18

Information: types of digital information

Digital information (0 and 1):
• Linear text / Hypertext
• Static images / Video
• Sound
• Multimedia / Hypermedia
• Programs for computers

Page 19: Internet en WWW  voor het opsporen van informatie

19

Some publication media compared

[Figure: printed, CD-ROM, and online / networked media compared along two axes: update speed and volume.]

Page 20: Internet en WWW  voor het opsporen van informatie

20

Scientific publishing in Utopia: an ideal scheme

• Many authors
• Many editors / publishers
• Online remote access multimedia database server
• One global, international computer data communication network
• Many database search clients and user interfaces
• Many readers / users

author = reader in science

Page 21: Internet en WWW  voor het opsporen van informatie

21

?? Question ??

Indicate the differences between reality and that simplified, ideal scheme of the information flow.

Page 22: Internet en WWW  voor het opsporen van informatie

22

?? Question ??

Which basic problems / difficulties hinder people from finding / accessing / using information?

Page 23: Internet en WWW  voor het opsporen van informatie

23

Information retrieval: basic difficulties (Part 1)

• In many cases it is not completely clear to the user of an information retrieval system which information is in fact needed.

• In many cases the need for information cannot be expressed completely in the form of a query. One of the reasons is that the complete context of the information need should ideally be expressed, including the knowledge and background of the searcher.

Page 24: Internet en WWW  voor het opsporen van informatie

24

Information retrieval: basic difficulties (Part 2)

• Computer systems are artificial, but nevertheless most use human language in their interface with the human users, for instance in database search systems. This may cause difficulties related to language and vocabulary in particular. Some examples:
» People use different languages and different terms (vocabularies) to describe a similar concept.
» Concepts, vocabularies and meanings of words and terms may change over time.
» Meanings of words / terms may depend on their context.

Page 25: Internet en WWW  voor het opsporen van informatie

25

Information retrieval: basic difficulties (Part 3)

• Many different and imperfect retrieval systems should or must be used.
» To retrieve and access the information that is in principle available, many different retrieval systems must be available and be mastered.
» Furthermore, perfect information retrieval software does not (yet) exist; scientific and technological evolution in the domain of information retrieval software has been fast since about 1970.

Page 26: Internet en WWW  voor het opsporen van informatie

26

Information retrieval: basic difficulties (Part 4)

• Information overload: users are often overwhelmed by the amount of available information and by the large influx of new information.

Page 27: Internet en WWW  voor het opsporen van informatie

27

Information retrieval: basic difficulties (Part 5)

• The price (or inaccessibility) of particular information: a lot of information cannot be obtained, or at least not free of charge.

Page 28: Internet en WWW  voor het opsporen van informatie

28

The information industry and the information market

The components of the information industry

Page 29: Internet en WWW  voor het opsporen van informatie

29

The components of the information industry

• Authors
• Publishers
• Distributors
• Users
• Related organizations

Page 30: Internet en WWW  voor het opsporen van informatie

30

The information industry and the information market

Overview and evolution

Page 31: Internet en WWW  voor het opsporen van informatie

31

Increase in the number of scientific and technical serial publications

[Figure: plot with a logarithmic vertical axis from 1 to 1,000,000 and a horizontal time axis from 1650 to 2000.]

Page 32: Internet en WWW  voor het opsporen van informatie

32

The information market: growth in the database industry

[Figure: plot for 1975 to 1995, vertical axis from 0 to 10,000, with three series: number of living databases, number of database producers, number of vendors.]

Source: Williams, in: Gale Directory of Databases, 1998.

Page 33: Internet en WWW  voor het opsporen van informatie

33

The information industry / market: future trends (Part 1)

• Growth in the production of databases.

• Less analogue / hard-copy production = more digital production, storage, and distribution of information.

• More integration of information types into multimedia and hypermedia.

Page 34: Internet en WWW  voor het opsporen van informatie

34

The information industry / market: future trends (Part 2)

• Growth in the number of
» producers and distributors,
» end-users searching databases
due to easier use and lower costs of information technology.

Page 35: Internet en WWW  voor het opsporen van informatie

35

Databases and computerized information retrieval

Introduction

Page 36: Internet en WWW  voor het opsporen van informatie

36

What is a database?

A database is a collection of similar data records stored in a common file (or collection of files).
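As a minimal sketch of this definition, the example below treats a small comma-separated text as the “common file” and reads each line as one record with the same fields. The file contents, field names, and the `load_records` helper are hypothetical and only serve as illustration.

```python
import csv
import io

# Hypothetical file contents: three similar records with the same fields.
CATALOGUE_FILE = """author,title,year
Smith,Marine pollution,1998
Jones,Information retrieval,2001
Brown,Computer networks,1996
"""


def load_records(text: str) -> list[dict]:
    """Read every data line of the file as one record with named fields."""
    return list(csv.DictReader(io.StringIO(text)))


records = load_records(CATALOGUE_FILE)

# Because all records share the same fields, one simple test can select among them.
recent_titles = [r["title"] for r in records if int(r["year"]) >= 1998]
print(recent_titles)
```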

Page 37: Internet en WWW  voor het opsporen van informatie

37

Types of databases: examples

Examples: The databases that form the basis for
» catalogues of books or other types of documents
» computerized bibliographies
» address directories
» a full text newspaper, newsletter, magazine, journal + collections of these
» WWW and Internet search engines
» intranet search engines
» ...

Page 38: Internet en WWW  voor het opsporen van informatie

38

Information retrieval: the basic processes in search systems

Information problem → Representation → Query
Text documents → Representation → Indexed documents
Query + Indexed documents → Comparison → Retrieved, sorted documents
Retrieved, sorted documents → Evaluation and feedback
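The processes named on this slide (representation of documents and of the query, comparison, sorted results) can be sketched in a few lines. This is a toy illustration under stated assumptions, not how any particular search system works: the documents are invented, words are matched exactly, and ranking simply counts matching query words.

```python
from collections import defaultdict

# Hypothetical text documents (assumed for illustration).
DOCUMENTS = {
    1: "seawater pollution monitoring in coastal areas",
    2: "monitoring air pollution in cities",
    3: "history of printing",
}


def represent(text: str) -> set[str]:
    """Representation step: reduce a text to a set of lowercase words."""
    return set(text.lower().split())


def build_index(documents: dict[int, str]) -> dict[str, set[int]]:
    """Index the documents: map each word to the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in represent(text):
            index[word].add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str) -> list[int]:
    """Comparison step: count matching query words per document, best first."""
    scores = defaultdict(int)
    for word in represent(query):
        for doc_id in index.get(word, ()):
            scores[doc_id] += 1
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


index = build_index(DOCUMENTS)
results = search(index, "seawater pollution")
print(results)
```

The evaluation and feedback step is left to the searcher: after inspecting the results, the query is reformulated and the search is run again.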

Page 39: Internet en WWW  voor het opsporen van informatie

39

Databases and computerized information retrieval

Text retrieval and language

Page 40: Internet en WWW  voor het opsporen van informatie

40

Text retrieval and language: a word is not a concept (a)

Problem: A word or phrase or term is not the same as a concept or subject or topic.

[Diagram: two words pointing to one concept.]

Page 41: Internet en WWW  voor het opsporen van informatie

41

Text retrieval and language: a word is not a concept (a’)

So, to ‘cover’ a concept in a search, to increase the recall of a search, the user of a retrieval system should consider an expansion of the query; that is: the user should also include other words in the query to ‘cover’ the concept.

Page 42: Internet en WWW  voor het opsporen van informatie

42

Text retrieval and language: a word is not a concept (a’’)

» synonyms! (such as: Latin names of species in biology besides the common names, scientific names besides common names of substances in chemistry, …)

Page 43: Internet en WWW  voor het opsporen van informatie

43

Text retrieval and language: a word is not a concept (a’’’)

» narrower terms, more specific terms (such as particular brand names), including terms with prefixes (for instance: viruses, retroviruses, rotaviruses, ...)

» spelling variations (such as UK English versus US English); possible variations after transliteration

Page 44: Internet en WWW  voor het opsporen van informatie

44

Text retrieval and language: a word is not a concept (a’’’’)

» singular or plural forms of a noun (when this is used as a search term)
» (relevant) related terms
» various forms of a verb (when this is used in the query)
» broader terms (perhaps)

Page 45: Internet en WWW  voor het opsporen van informatie

?? Question ??

Which problems in text retrieval are illustrated by the following sentences?

45

Page 46: Internet en WWW  voor het opsporen van informatie

46

Time flies like an arrow.

Fruit flies like a banana.

?

Examples

Page 47: Internet en WWW  voor het opsporen van informatie

47

Time flies like an arrow.

Fruit flies like a banana.

Examples

Page 48: Internet en WWW  voor het opsporen van informatie

48

Time flies like an arrow.

Fruit flies like a banana.

OK!

Examples

Page 49: Internet en WWW  voor het opsporen van informatie

49

Text retrieval and language: ambiguity of meaning (a)

• Problem: A word or phrase can have more than one meaning. Ambiguity of the meaning of a word is a problem for retrieval: it decreases the precision of many searches. The meaning can depend on the context, and on the region where the term is used.

Page 50: Internet en WWW  voor het opsporen van informatie

50

Text retrieval and language: ambiguity of meaning (a’)

• Example of a word:
» Pascal the philosopher
» Pascal the computer language

Example

Page 51: Internet en WWW  voor het opsporen van informatie

51

Text retrieval and language: ambiguity of meaning (a’’)

• Example of sentences:
» The banks of New Zealand flooded our mailboxes with free account proposals.
» The banks of New Zealand flooded with heavy rains account for the economic loss.

Example

Page 52: Internet en WWW  voor het opsporen van informatie

52

Text retrieval and language: ambiguity of meaning (a’’’)

Problem: Ambiguity of meaning may be the cause of low precision.

[Diagram: one word pointing to two concepts.]

Page 53: Internet en WWW  voor het opsporen van informatie

53

A word is not a concept. A concept is not a word.

[Diagram: Word1, Word2, Word3 linked with Concept1, Concept2, Concept3.]

• A concept cannot be “covered” by only one word or term; this may be the cause of low recall of a search.
• The meaning of many words is ambiguous; this may be the cause of low precision of a search.
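Recall and precision are used here without a formula. Under their usual definitions (recall = share of the relevant items that were retrieved; precision = share of the retrieved items that are relevant), they can be computed as in the sketch below. The record numbers are invented for illustration.

```python
def precision_recall(retrieved: set, relevant: set) -> tuple[float, float]:
    """Return (precision, recall) for one search result."""
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 5 relevant records exist, the search returned 4 records.
relevant = {1, 2, 3, 4, 5}
retrieved = {1, 2, 8, 9}

precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)
```

Adding synonyms to a query tends to raise recall, while an ambiguous search term tends to lower precision, which is the trade-off described on this slide.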

Page 54: Internet en WWW  voor het opsporen van informatie

54

Databases and computerized information retrieval

Hints on how to use information sources

Page 55: Internet en WWW  voor het opsporen van informatie

55

Hints on how to use information sources: overview (Part 1)

• Know the purpose and motivation for each search.
• Do not be lazy: search on your own, before bothering experts with requests for advice.
• Plan your search in advance.
• Choose the best source(s) for each search.
• Use the available tools for subject searching well.
• Try to cope with the language problems; avoid spelling errors in your search query; use spelling variations in your search query.

Page 56: Internet en WWW  voor het opsporen van informatie

56

Hints on how to use information sources: overview (Part 2)

• Match your search strategy with the type of source.
• Work cost-effectively.
• Use special care when searching for names.
• Be specific. Avoid broad searches. Limit your search to a specific country or region if required.
• Work iteratively.
• Keep a record of your work.

Page 57: Internet en WWW  voor het opsporen van informatie

57

Hints on how to use information sources: overview (Part 3)

• Do not only focus on a single source.
• Consider citation indexes besides subject-oriented databases, as useful secondary information sources.
• Stop searching when “enough is enough”.
• Give up if necessary... (Not all questions have an answer.)
• Be critical: not all information is correct or useful.

Page 58: Internet en WWW  voor het opsporen van informatie

58

Hints on how to use information sources: overview (Part 4)

• In computer-based retrieval systems, consider
» truncating search terms (using a symbol like * or ?)
» combining search terms, using
  - Boolean operators: OR, AND (or +), NOT (or AND NOT, or -)
  - proximity operators (for instance “NEAR”)
  - phrase searching (“word1 word2”)
» limiting the search to a field (for instance URL, title, …)
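Truncation can be pictured as wildcard matching against the vocabulary of a database. The sketch below uses Python's standard `fnmatch` wildcards as a stand-in, where `*` matches any run of characters and `?` exactly one character; real search systems may give these symbols a different meaning. The vocabulary and the `expand_truncation` helper are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical index vocabulary (assumed for illustration).
VOCABULARY = ["pollen", "pollutant", "pollutants", "polluted", "pollution",
              "woman", "women"]


def expand_truncation(pattern: str, vocabulary: list[str]) -> list[str]:
    """Return the vocabulary terms matched by a truncated search term."""
    return [term for term in vocabulary if fnmatchcase(term, pattern)]


print(expand_truncation("pollut*", VOCABULARY))
print(expand_truncation("wom?n", VOCABULARY))
```

A stem that is cut too short (for instance `poll*`) would also pull in unrelated terms such as “pollen”, which costs precision.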

Page 59: Internet en WWW  voor het opsporen van informatie

59

Hints on how to use information sources: subject searching

• When you search for information on a particular topic / subject, investigate whether the database producer offers
» a subject classification scheme, and/or
» a list of controlled / approved / accepted subject terms, and/or
» a subject thesaurus.
• Exploit these, if they are available.
• In most cases you should find and use synonyms and narrower terms.
• Use broader and/or related terms, if appropriate.

Page 60: Internet en WWW  voor het opsporen van informatie

60

Hints on how to use information sources: Boolean combinations

Most text search systems understand the basic Boolean operators:

OR = obtain records that contain one or both search terms

AND = obtain records that contain both search terms

NOT= exclude records that contain a search term
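The three operators correspond to set operations on the records that contain each term. A minimal sketch, with invented record numbers:

```python
# Hypothetical postings: the record numbers that contain each search term.
records_with = {
    "pollution": {1, 2, 3, 5},
    "seawater": {2, 3, 4},
    "freshwater": {3, 6},
}

# pollution OR seawater: records that contain one or both terms (set union).
either = records_with["pollution"] | records_with["seawater"]

# pollution AND seawater: records that contain both terms (set intersection).
both = records_with["pollution"] & records_with["seawater"]

# (pollution AND seawater) NOT freshwater: exclude records with the third term.
excluding = both - records_with["freshwater"]

print(sorted(either), sorted(both), sorted(excluding))
```

OR can only enlarge a result set, while AND and NOT can only shrink it.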

Page 61: Internet en WWW  voor het opsporen van informatie

61

Hints on how to use information sources: Boolean combinations

In the case of computer-based information sources, use Boolean combinations of search terms when appropriate and when possible.

(term x1 OR term x2 OR term x3)
AND
(term y1 OR term y2 OR term y3)
AND
(term z1 OR term z2 OR term z3)
AND ...
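The pattern on this slide (alternative terms for one concept joined with OR, the concepts joined with AND) can be generated mechanically. The sketch below is one possible way to write such a query in a generic format; the `build_query` helper, the quoting of multi-word terms, and the example terms are assumptions, and the exact syntax differs between search systems.

```python
def build_query(facets: list[list[str]]) -> str:
    """Join the terms of each facet with OR, and the facets with AND."""
    groups = []
    for terms in facets:
        # Multi-word terms are quoted so that they are searched as a phrase.
        quoted = [f'"{term}"' if " " in term else term for term in terms]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


# Hypothetical facets: each inner list holds alternative terms for one concept.
facets = [
    ["virus", "viruses", "retrovirus"],
    ["vaccine", "vaccination", "immunisation programme"],
]
print(build_query(facets))
```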

Page 62: Internet en WWW  voor het opsporen van informatie

62

Hints on how to use information sources: Boolean queries

Most text search systems understand the basic Boolean operators when they are typed in capital letters:

OR
AND

Page 63: Internet en WWW  voor het opsporen van informatie

63

Hints on how to use information sources: default Boolean operator

• Find out if the search system that you use applies a default, implicit Boolean operator.

• This operator is applied whenever no operator is typed explicitly between the words.

• It can be OR, AND, NEAR, ...

Page 64: Internet en WWW  voor het opsporen van informatie

64

?? Question ??

How many (and which) concepts / facets do you see in a search for “general reviews about monitoring seawater pollution that is due to effluents in Tanzania”?

Page 65: Internet en WWW  voor het opsporen van informatie

65

!! Task - Assignment !!

Prepare off-line, on paper, a suitable search query in a generic format, to find “general reviews about monitoring seawater pollution that is due to effluents”, as the basis for later, concrete searches in databases.

(Limit yourself to 1 of the concepts.)

Page 66: Internet en WWW  voor het opsporen van informatie

66

?? Question ??

What did you learn from the exercise on the formulation of a query?

Page 67: Internet en WWW  voor het opsporen van informatie

67

Hints on how to use information sources: work iteratively

Work iteratively = search, investigate your results, refine your search, search again, and so on; do not try to find everything in 1 step, with 1 search.

[Diagram: a cycle Query → Searching → Results → Feedback → Query.]

Page 68: Internet en WWW  voor het opsporen van informatie

68

Hints on how to use information sources: work iteratively: example

When you search a database with subject keywords from a controlled list, added to each record:
1. Search with search terms that you know.
2. Investigate the results and select good, relevant items.
3. Look for the keywords added to these items.
4. Select the good, relevant keywords.
5. Formulate a new search with these keywords added.
6. Execute the new search.
7. Repeat the procedure.
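The procedure above can be sketched as a loop over a toy database. Everything in this example is hypothetical: the records, the keywords, and the `search` and `refine` helpers. The human judgement of steps 2 and 4 is reduced to accepting everything, which a real searcher would not do.

```python
# Hypothetical database: each record carries keywords from a controlled list.
RECORDS = [
    {"id": 1, "keywords": {"marine pollution", "oil spills"}},
    {"id": 2, "keywords": {"marine pollution", "effluents"}},
    {"id": 3, "keywords": {"effluents", "estuaries"}},
    {"id": 4, "keywords": {"manuscripts"}},
]


def search(query: set[str]) -> list[dict]:
    """Steps 1 and 6: return records sharing at least one keyword with the query."""
    return [record for record in RECORDS if record["keywords"] & query]


def refine(query: set[str], relevant_items: list[dict], accept) -> set[str]:
    """Steps 3 to 5: harvest keywords from relevant items, keep the accepted ones."""
    harvested = set().union(*(item["keywords"] for item in relevant_items))
    return query | {keyword for keyword in harvested if accept(keyword)}


query = {"oil spills"}
found_per_round = []
for _ in range(3):                                    # step 7: repeat the procedure
    results = search(query)
    found_per_round.append([record["id"] for record in results])
    query = refine(query, results, accept=lambda keyword: True)

print(found_per_round)
```

Each round finds records that the first query missed, because the keywords of good items lead to further items indexed with the same vocabulary.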

Page 69: Internet en WWW  voor het opsporen van informatie

69

“The ability to ask the right question is more than half the battle of finding the answer.”

Thomas J. Watson

?

Page 70: Internet en WWW  voor het opsporen van informatie

70

Hints on how to use information sources: when to stop searching?

Develop a feel for the “curve of diminishing returns”: if you spend too much time, effort, and/or money for too few benefits, you should stop.

[Figure: payoff plotted against time / effort / money, with the question “Time to stop?”.]

Page 71: Internet en WWW  voor het opsporen van informatie

71

Knowledge organisation: classifications, and thesaurus systems

Introduction

Page 72: Internet en WWW  voor het opsporen van informatie

72

Knowledge organisation: introduction

• To organise knowledge / documents / books / reports / information / data / records / things / items / materials for more efficient storage and retrieval, some related, similar tools / systems / methods / approaches are used.

• Often, but not yet always, this process is assisted by a computer system.

• Good systems are expanded and updated when the need arises.

• The organisation system applied should ideally be clearly and immediately visible, or even searchable on computer, by the user of the materials.

Page 73: Internet en WWW  voor het opsporen van informatie

73

Knowledge organisation: classifications, and thesaurus systems

Classifications

Page 74: Internet en WWW  voor het opsporen van informatie

74

Classification systems: examples of universal systems

• Universal means here: covering all subjects.
• Not just one but several competing systems exist. Examples:
» Universal Decimal Classification = UDC, used mainly outside the U.S.A.
» Dewey Decimal Classification = DDC, used mainly in the U.S.A.
» Library of Congress Classification, used mainly in the U.S.A.
» ...

Examples

Page 75: Internet en WWW  voor het opsporen van informatie

75

Knowledge organisation: classifications, and thesaurus systems

Thesaurus systems

Page 76: Internet en WWW  voor het opsporen van informatie

76

Thesaurus: description

• Thesaurus (contents) =
» system to control a vocabulary (= words and phrases + their relations)
» + the contents of this vocabulary

• Thesaurus program = program to create, manage, modify and/or search a thesaurus using a computer

Page 77: Internet en WWW  voor het opsporen van informatie

77

Thesaurus relations

For a given term, a thesaurus records:
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other, related term(s)
• UF (= Use(d) For): synonym(s)
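One way to hold these relations in a program is a small record per term. The sketch below is an assumption about how such an entry could be modelled, not a description of any thesaurus program; the entry for “pollution” and its related terms are invented.

```python
from dataclasses import dataclass, field


@dataclass
class ThesaurusEntry:
    term: str
    broader: list[str] = field(default_factory=list)   # BT
    narrower: list[str] = field(default_factory=list)  # NT
    related: list[str] = field(default_factory=list)   # RT
    used_for: list[str] = field(default_factory=list)  # UF: synonyms


# Hypothetical entry (terms assumed for illustration).
entry = ThesaurusEntry(
    term="pollution",
    broader=["environmental problems"],
    narrower=["marine pollution", "air pollution"],
    related=["waste"],
    used_for=["contamination"],
)


def search_terms(e: ThesaurusEntry, include_related: bool = False) -> list[str]:
    """Collect the term, its synonyms and narrower terms for an OR query."""
    terms = [e.term] + e.used_for + e.narrower
    if include_related:
        terms += e.related
    return terms


print(" OR ".join(search_terms(entry)))
```

Broader terms are left out by default because they tend to widen a search more than intended.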

Page 78: Internet en WWW  voor het opsporen van informatie

78

!! Task - Assignment - Exercise !!

Try to find suitable search terms to retrieve documents on “pollution” from a database on marine science, by using for instance the thesaurus included in the program for word processing that you use.

Page 79: Internet en WWW  voor het opsporen van informatie

79

Knowledge organisation: classifications, and thesaurus systems

Classification systems versus thesaurus systems

Page 80: Internet en WWW  voor het opsporen van informatie

80

Knowledge organization: classifications versus thesauri

• Classification
» Good for placement of documents in a library (because documents on many related subjects can be kept together)
» Not well suited for computer searching (too complicated)

• Thesaurus
» Not suited for placement of documents in a library (because documents with related subjects would NOT be kept together)
» Well suited for computer searching (relatively simple alphabetic listing of keywords)

Page 81: Internet en WWW  voor het opsporen van informatie

81

Computer networks, data communication and Internet

Introduction

Page 82: Internet en WWW  voor het opsporen van informatie

82

Computer networks: summary

The following gives an overview of computer networks and data communication:
» The basic principles
» Local area networks
» National computer networks
» International computer networks
» The Internet
» Future impact of digital communication networks

Page 83: Internet en WWW  voor het opsporen van informatie

83

Computer networks: prerequisites

Before using computer networks, you should ideally have some knowledge and skills related to

• computer hardware
• computer software

Page 84: Internet en WWW  voor het opsporen van informatie

84

Data communication: a definition

• Interpersonal communication
» Telecommunication
  - Broadcast
  - Telephone
  - Data communication
    – Remote login
    – File transfer
    – Hypertext transfer
    – Electronic mail
    – ...

Page 85: Internet en WWW  voor het opsporen van informatie

85

Data communication: which types of ‘data’?

Digital information (0 and 1):
• Linear text / Hypertext
• Static images / Video
• Sound
• Multimedia / Hypermedia
• Programs for computers

Page 86: Internet en WWW  voor het opsporen van informatie

86

Data communication: which types of ‘data’?

• The same types of data (information) that can be stored and managed on a computer can be transferred over computer networks to one or several other computers.

• So the networks form an important extension of the stand-alone computers.

• “The network is the computer”

Page 87: Internet en WWW  voor het opsporen van informatie

87

Data communication: applications (Part 1)

• Hard-copy transfer (fax)
• Online use of the processing power of a remote computer
• Online access to information sources!
» library catalogues,
» bookshop catalogues,
» publishers’ catalogues,
» campus-wide and community information systems,
» (text or multimedia) databases,
» network-based journals, ...

Page 88: Internet en WWW  voor het opsporen van informatie

88

Data communication: applications (Part 2)

• Software downloading
• Electronic mail from a person to one or several persons
• Computer-network based interest groups
• Online talking / chatting (IRC, ...)
• Video conferencing (CU-SeeMe, ...)
• Selling, shopping, buying, ...
• ...

Page 89: Internet en WWW  voor het opsporen van informatie

89

Data communication: modems

• Description: MODulator-DEModulator: device to convert digital data signals into a suitable form for transmission along a telecommunications channel, and to convert them back upon receipt into machine-readable form.

• Types:
» (Acoustic coupler)
» Free-standing box
» Board / card to plug into a microcomputer

Page 90: Internet en WWW  voor het opsporen van informatie

90

Computer network protocols: definition

• When 2 computer systems communicate via a network, they do so by exchanging messages.

• The structure of network messages varies from network to network.

• Therefore the message structure in a particular network is agreed upon in advance and is described in a set of rules, each defined in a protocol.
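What “a message structure agreed upon in advance” means can be shown with a toy protocol. The layout below (1 byte for the message type, 2 bytes for the payload length, then the payload) is an invented example and not a real Internet protocol; both sides must apply the same layout for the exchange to work.

```python
import struct

# Hypothetical toy protocol: 1-byte message type, 2-byte payload length
# (network byte order), followed by the payload itself.
HEADER = struct.Struct("!BH")


def pack_message(msg_type: int, payload: bytes) -> bytes:
    """Sender side: build a message according to the agreed structure."""
    return HEADER.pack(msg_type, len(payload)) + payload


def unpack_message(data: bytes) -> tuple[int, bytes]:
    """Receiver side: interpret the bytes using the same agreed structure."""
    msg_type, length = HEADER.unpack_from(data)
    payload = data[HEADER.size:HEADER.size + length]
    return msg_type, payload


message = pack_message(1, b"hello")
print(message)
print(unpack_message(message))
```

A receiver that assumed a different header layout would read a wrong type or length from the very same bytes.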

Page 91: Internet en WWW  voor het opsporen van informatie

91

Computer networks, data communication and Internet

Local Area Networks

Page 92: Internet en WWW  voor het opsporen van informatie

92

Data communication with a server in a Local Area Network

• (Terminal)

• Microcomputer with serial line communications software /terminal emulation software

• Microcomputer with network card and network software

Network

Network server

Page 93: Internet en WWW  voor het opsporen van informatie

93

LAN software packages for heterogeneous networks: examples

Based on TCP/IP (protocol suite used in the Internet):
• For DOS: NCSA (= National Center for Supercomputing Applications) CUTCP, PC/NFS, ...
• For Windows 3.x: PC/NFS, PC/TCP, Trumpet TCP Manager, ...
• For Windows 95, 98, ...: included!
• For Windows NT, 2000, ...: included!

Examples

Page 94: Internet en WWW  voor het opsporen van informatie

94

Computer networks, data communication and Internet

National Wide Area Networks

Page 95: Internet en WWW  voor het opsporen van informatie

95

National Wide Area Networks

• Public access national packet switching networks

• Research computer networks

• Public access made available by Internet Service Providers

• ...

Page 96: Internet en WWW  voor het opsporen van informatie

96

National research computer networks: examples

• Belgium: BELNET
• Finland: FUNET
• Germany: DFN
• The Netherlands: SURFnet
• United Kingdom: JANET (Joint Academic Network)
• ...

Examples

Page 97: Internet en WWW  voor het opsporen van informatie

97

Computer networks, data communication and Internet

International computer networks

Page 98: Internet en WWW  voor het opsporen van informatie

98

International computer networks: examples

• National public data communication networks linked together
• FidoNet
• Bitnet / EARN
• Usenet
• Internet!
• ...

Examples

Page 99: Internet en WWW  voor het opsporen van informatie

99

Computer networks, data communication and Internet

The Internet data communication network

Page 100: Internet en WWW  voor het opsporen van informatie

100

?? Question ??

What is the Internet?

Page 101: Internet en WWW  voor het opsporen van informatie

101


The Internet data communications network (Part 1)

• “Internet” is not well-defined.

• A network of smaller networks: the global collection of interconnected local area, regional and wide-area (national backbone) networks which use the TCP/IP suite of data communication protocols.

Page 102: Internet en WWW  voor het opsporen van informatie

102

The Internet data communications network (Part 2)

• Links computers of various types.

• Is constantly growing.

• The analogy of a superhighway has been used to describe the emerging system of networked computers.

• The Internet has no owner, and is not managed by one organization.

Page 103: Internet en WWW  voor het opsporen van informatie

103

The Internet: access from your Local Area Network

Your microcomputer

Local Area Network (LAN)

One of the national networks

The global Internet

Page 104: Internet en WWW  voor het opsporen van informatie

104

Host computers in the Internet: definition

• A host (computer) is a domain name that has a unique IP address record associated with it.

• Could be any computer connected to the Internet by any means.

• For instance: www.vub.ac.be


Page 105: Internet en WWW  voor het opsporen van informatie

105

Transmission Control Protocol / Internet Protocol (TCP/IP)

• the main suite of transport protocols used on the Internet for connectivity and transmission of data across heterogeneous systems
• “glue that holds the Internet together”
• an open standard
• available on most Unix systems, VMS and other minicomputer systems, many mainframe and supercomputing systems and some microcomputer and PC systems

Page 106: Internet en WWW  voor het opsporen van informatie

106

Internet: addresses of computers with the Domain Name System

• Internet style = Domain Name System
• The Internet naming scheme consists of a hierarchical sequence of names from the most specific to the most general (left to right), separated by dots:

computer.subdomain.domain.(country if not USA)

OR, as a numeric address: n1.n2.n3.n4, where each n is a natural number that fits in 8 bits (0 to 255)
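Both address forms can be taken apart with a few lines of code. The sketch below splits a domain name into its dot-separated names and checks that a numeric address consists of four 8-bit numbers, using Python's standard `ipaddress` module. The helper names are hypothetical, and 192.0.2.1 is an address reserved for documentation, chosen here as an assumption; it is not the address of any host named in the slides.

```python
import ipaddress


def domain_labels(name: str) -> list[str]:
    """Split a domain name into its names, most specific first."""
    return name.split(".")


def dotted_quad_to_numbers(address: str) -> list[int]:
    """Validate an n1.n2.n3.n4 address and return its four 8-bit numbers.

    Raises ValueError if a number does not fit in 8 bits.
    """
    return list(ipaddress.IPv4Address(address).packed)


print(domain_labels("www.vub.ac.be"))
print(dotted_quad_to_numbers("192.0.2.1"))
```

An address such as 256.1.1.1 is rejected, because 256 does not fit in 8 bits.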

Page 107: Internet en WWW  voor het opsporen van informatie

107

Internet: growth in number of hosts worldwide: linear plot

[Figure: linear plot of the number of hosts in January of each year from 1993 to 1998; vertical axis from 0 to 20,000,000.]

Page 108: Internet en WWW  voor het opsporen van informatie

108

Internet Service Provider = ISP

Internet Service Providers provide their clients access to the Internet + in many cases
» an email address / server
» space for a web site
» software tools to start
» training
» technical support
» an accessible location for a WWW site of the client
» assistance with WWW site design and promotion

Page 109: Internet en WWW  voor het opsporen van informatie

109

Microcomputer -- external computer: some ways of data communication

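[Diagram: ways of connecting a microcomputer (internal side) to an external computer (external side). Components shown: microcomputer, modem, telephone, voice telecommunication network, ISDN, LAN, local PAD, TelePAD, leased fixed communication line, public data communication network, private / academic data communication network (e.g. Internet), gateway computer system, external computer.]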

Page 110: Internet en WWW  voor het opsporen van informatie

110

Online communication: remote login and file transfer

Remote terminal log-in / access

Page 111: Internet en WWW  voor het opsporen van informatie

111

Remote terminal log-in / access: definition

The ability to access a computer from outside a building in which it is housed.

This requires communications hardware, software, and actual physical links, although this can be as simple as common carrier (telephone) lines or as complex as a telnet login to another computer across the Internet.

Page 112: Internet en WWW  voor het opsporen van informatie

112

Online communication: remote login and file transfer

Telnet in the Internet

Page 113: Internet en WWW  voor het opsporen van informatie

113

Telnet: description

• The Internet standard protocol for remote terminal connection service; on top of the TCP/IP protocol suite

• Allows a user at one site to interact with a remote timesharing system at another site as if the user's terminal was connected directly to the remote computer

• Includes VT100 terminal emulation

Page 114: Internet en WWW  voor het opsporen van informatie

114

Online communication: remote login and file transfer

Downloading and file transfer

Page 115: Internet en WWW  voor het opsporen van informatie

115

Data communication: downloading by copying a fragment

Capturing a small fragment of the information displayed: 1. select information on the display, 2. copy, and 3. paste in a document managed by another program.

Page 116: Internet en WWW  voor het opsporen van informatie

116

Online communication: remote login and file transfer

File transfer: ftp in the Internet

Page 117: Internet en WWW  voor het opsporen van informatie

117

Data communication: file transfer

• Copying + downloading / transfer of a whole file
• Requires a transfer protocol with error correction
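One common building block of such error handling is a checksum that travels with the data, so that the receiver can detect damage and ask for the data again. The following minimal Python sketch only illustrates that idea with a CRC-32 checksum; it is an assumption for illustration, not a description of how ftp itself works, and the function names are hypothetical.

```python
import zlib


def make_packet(data: bytes):
    """Sender side: return the data together with its CRC-32 checksum."""
    return data, zlib.crc32(data)


def is_intact(data: bytes, checksum: int) -> bool:
    """Receiver side: recompute the checksum and compare."""
    return zlib.crc32(data) == checksum


data, checksum = make_packet(b"whole file contents")
corrupted = b"whole file c0ntents"  # one byte damaged in transit

print(is_intact(data, checksum))       # True: accept the data
print(is_intact(corrupted, checksum))  # False: request retransmission
```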

Page 118: Internet en WWW  voor het opsporen van informatie

118

World-Wide Web = WWW

Introduction

Page 119: Internet en WWW  voor het opsporen van informatie

119

The World-Wide Web: prerequisites

Before using the WWW you should ideally already have learned to understand and to use

• computer hardware
• computer software
• the Internet
• older methods for online communication, such as telnet

Page 120: Internet en WWW  voor het opsporen van informatie

120

The WWW: example of a welcome page

Example

Page 121: Internet en WWW  voor het opsporen van informatie

121

URL = Uniform Resource Locator

• = draft standard for specifying an object on the Internet
• the structure is in most cases

protocol://computer_address[/path_name/file_name]

• examples:
»telnet://biblio.vub.ac.be
»ftp://ftp.vub.ac.be/
»gopher://gopher.vub.ac.be/
»http://www.vub.ac.be/BIBLIO/index.html
»news://news.server.edu/comp.infosystems.www

Page 122: Internet en WWW  voor het opsporen van informatie

122

URL: format / structure

1. The first part of a URL, before the colon “:”, specifies the access method = protocol

2. The second part of the URL, after the colon “:”, is interpreted specific to the access method. In general, two slashes after the colon indicate a machine /computer name.
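The two parts of a URL can be separated with Python's standard `urllib.parse` module. The sketch below is a minimal illustration; the helper name `describe_url` and the dictionary keys are assumptions chosen to match the wording of the slides.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Split a URL into access method, computer name and path."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,    # the part before the colon
        "computer": parts.hostname,  # the name after the two slashes
        "path": parts.path,          # optional /path_name/file_name
    }


for url in ("telnet://biblio.vub.ac.be",
            "ftp://ftp.vub.ac.be/",
            "http://www.vub.ac.be/BIBLIO/index.html"):
    print(describe_url(url))
```

For the last example this gives the protocol `http`, the computer `www.vub.ac.be` and the path `/BIBLIO/index.html`.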

Page 123: Internet en WWW  voor het opsporen van informatie

123

?? Question ??

What is the difference between Internet and the World-Wide Web?

Page 124: Internet en WWW  voor het opsporen van informatie

124

The WWW is an application of Internet

• The World-Wide Web (WWW) is a service, an application of Internet.

• It is based on the Internet infrastructure.
• So the WWW is newer than the Internet.

The concept of the WWW was created at the end of the 1980s when the Internet was already well established.

Page 125: Internet en WWW  voor het opsporen van informatie

125

The WWW is an application of Internet: scheme

Data communication

Internet

WWW

Page 126: Internet en WWW  voor het opsporen van informatie

126

The WWW: the essential elements

• Information delivery and access using hypertext/hypermedia documents/objects
»html documents
»http protocol: http clients and http servers
• Integration of protocols in the Internet:
»http servers offering html documents including links to other http servers, telnet servers, ftp servers, nntp servers, gopher servers...

Page 127: Internet en WWW  voor het opsporen van informatie

127

The WWW: hyperlinks

Hyperlinks can link a part of a hypermedia document to
• another part of the same document file
• another document file on the same server computer
• another document file on a server computer located elsewhere in the world

[Diagram: hyperlinks between documents on Computer 1 and Computer 2]
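The three kinds of hyperlink targets can be told apart by resolving the link against the address of the page that contains it. The following Python sketch is a simplified illustration; the function name and the relative link `courses/list.html` are hypothetical.

```python
from urllib.parse import urljoin, urlsplit


def classify_link(page_url, href):
    """Classify a link found on page_url by where its target lives."""
    page = urlsplit(page_url)
    target = urlsplit(urljoin(page_url, href))  # resolve relative links
    if target.netloc != page.netloc:
        return "other server"
    if target.path == page.path and target.fragment:
        return "same document"
    return "same server"


page = "http://www.vub.ac.be/BIBLIO/index.html"
print(classify_link(page, "#top"))                    # same document
print(classify_link(page, "courses/list.html"))       # same server
print(classify_link(page, "http://www.yahoo.com/"))   # other server
```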

Page 128: Internet en WWW  voor het opsporen van informatie

128

The WWW: hypertext mark-up language = HTML

• Hypertext mark-up language = HTML = the system of codes used by authors to build the hypertext-pages/files in WWW, for instance to create a title or an anchor.

• The codes are invisible / transparent for the user / reader.
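To see the codes that the browser normally hides, a page can be read by a program. The Python sketch below uses the standard `html.parser` module to pick out the title and the anchors of a small page; the page itself and the class name are made up for illustration.

```python
from html.parser import HTMLParser


class TitleAndLinks(HTMLParser):
    """Collect the text of <title> and the href of every <a> anchor."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


# A made-up page: the codes in angle brackets are the HTML mark-up.
PAGE = """<html><head><title>Library</title></head>
<body><p>See the <a href="http://www.vub.ac.be/BIBLIO/index.html">catalogue</a>.</p>
</body></html>"""

parser = TitleAndLinks()
parser.feed(PAGE)
print(parser.title)  # Library
print(parser.links)  # ['http://www.vub.ac.be/BIBLIO/index.html']
```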

Page 129: Internet en WWW  voor het opsporen van informatie

129

The WWW: hypertext transfer protocol = HTTP

• Hypertext transfer protocol = HTTP = the software conventions used by client and server programs for WWW to request and transfer hypermedia documents.

• The user / reader does not need to know the protocol = the protocol is invisible / transparent for the user.

• Analogous to the telnet, ftp and gopher protocols.
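Although the user never sees it, the conversation between client and server consists of short text messages. The Python sketch below builds a simple HTTP/1.0 request and reads the status line of a response; the response text is canned for illustration (no network connection is made) and the function names are hypothetical.

```python
def build_get_request(host, path="/"):
    """Return the text a client sends to ask a server for a document."""
    return f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"


def parse_status_line(response_text):
    """Return (version, status code, reason) from the first response line."""
    status_line = response_text.split("\r\n", 1)[0]
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason


request = build_get_request("www.vub.ac.be", "/BIBLIO/index.html")
print(request)

# A canned example of what a server could answer.
canned_response = "HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<html>...</html>"
print(parse_status_line(canned_response))  # ('HTTP/1.0', 200, 'OK')
```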

Page 130: Internet en WWW  voor het opsporen van informatie

130

?? Question ??

Briefly compare TCP/IP and HTTP.

Page 131: Internet en WWW  voor het opsporen van informatie

131

The WWW: pages and forms

• Pages
Many documents developed for WWW are kept small and are named “pages”. These often refer to several other “pages”.
• Forms = gateways to services and databases on server computers in WWW
Some pages contain electronic forms, to be filled in by the user.

Page 132: Internet en WWW  voor het opsporen van informatie

132

The WWW: applications

Analogous to gopher applications:
• Access to online public access catalogues
• Campus-wide information systems
• Access to subject-oriented information
• Access to computer file archives
• Traveling / navigating through the Internet via linked html-pages
• Access to intranets within institutes / companies

Page 133: Internet en WWW  voor het opsporen van informatie

133

World-Wide Web = WWW

WWW client programs

Page 134: Internet en WWW  voor het opsporen van informatie

134

WWW: client / browse programs

• To access the WWW, you run a browser program.
• The browser reads documents, and can fetch documents from other sources. Information providers set up hypermedia servers which browsers can get documents from.
• The browser can display hypertext documents. Hypertext is text with pointers to other text. The browsers let you deal with the pointers in a transparent way: select the pointer, and you are presented with the text that is pointed to.

Page 135: Internet en WWW  voor het opsporen van informatie

135

WWW: examples of browsers for your own computer

Browsers are available for many computer platforms; in particular, browsers for Windows + Winsock:
»Netscape
»Microsoft Internet Explorer
»...

Page 136: Internet en WWW  voor het opsporen van informatie

136

?? Question ??

Which client program do YOU use or will YOU use

to access the WWW?

Page 137: Internet en WWW  voor het opsporen van informatie

137

!! Task - Assignment - Exercise !!

Browse the WWW, using an available

browser client program.

Page 138: Internet en WWW  voor het opsporen van informatie

138

!! Task - Assignment - Exercise !!

Visualise the HTML source code of a WWW page,

using a WWW client program.

What do you learn from this exercise about the basic properties of HTML?

Page 139: Internet en WWW  voor het opsporen van informatie

139

!! Task - Assignment - Exercise !!

Exploit the possibility to open more than one window, using a WWW client program

in Windows.

Page 140: Internet en WWW  voor het opsporen van informatie

140

?? Question ??

Why would you want to open more than one window

on WWW servers, using a WWW client program?

Page 141: Internet en WWW  voor het opsporen van informatie

141

World-Wide Web = WWW

Saving information from a web

Page 142: Internet en WWW  voor het opsporen van informatie

142

WWW: How to save information from a web?

Information displayed by your web browser/client program can be saved,

• by selecting, copying and pasting into another document (and saving it)
• by saving a complete page to your disk
»in separate files (for instance 1 HTML file + some image files)
»in 1 file, using Microsoft Internet Explorer 5 or a later version
• by copying the information into an e-mail message that you send to your own e-mail account

Page 143: Internet en WWW  voor het opsporen van informatie

143

!! Task - Assignment - Exercise !!

Copy a text fragment from the WWW and paste it into another document

on your computer.

Page 144: Internet en WWW  voor het opsporen van informatie

144

!! Task - Assignment - Exercise !!

Save a text from WWW to disk, as HTML,

using a browser program.

Page 145: Internet en WWW  voor het opsporen van informatie

145

!! Task - Assignment - Exercise !!

Display an HTML file that you have saved

from the WWW to your disk, in a program for word processing.

Is the file displayed properly?

Page 146: Internet en WWW  voor het opsporen van informatie

146

World-Wide Web = WWW

The success of WWW

Page 147: Internet en WWW  voor het opsporen van informatie

147

WWW: growing number of WWW servers

[Figure: plot of the number of WWW servers for the years 1993 to 2000; vertical axis from 0 to 7 000 000 servers]

Page 148: Internet en WWW  voor het opsporen van informatie

148

WWW as popular method to access information from computers

• The WWW has quickly become the most popular medium to access information that resides on various computers that are connected to a computer network.

Page 149: Internet en WWW  voor het opsporen van informatie

149

Online access information sources and services

Introduction

Page 150: Internet en WWW  voor het opsporen van informatie

150

Online information sources: summary

• The following gives a general overview of online accessible information sources.

• This overview is not limited to or focusing on a particular concrete subject domain/area.

Page 151: Internet en WWW  voor het opsporen van informatie

151

Online access to information: avoid network traffic jams

To access from Europe online information sources in the US, work when lines are not saturated.

(better in the morning than in the afternoon)

Page 152: Internet en WWW  voor het opsporen van informatie

152

Internet based information sources: problems / difficulties (Part 1)

• Redundancy and overlap: On the one hand, there is too much information on some topics; in other words, the redundancy and overlap are high in many cases.
• Too few information sources: On the other hand, there are too few information sources on some topics.

Page 153: Internet en WWW  voor het opsporen van informatie

153

Internet based information sources: problems / difficulties (Part 2)

• No order is imposed on most sources. Quality checks / quality controls are not performed. Related to this: it is not required to register new information offered. Is the information that you find real, honest, authentic?

Page 154: Internet en WWW  voor het opsporen van informatie

154

Internet based information sources: problems / difficulties (Part 3)

• Change is the only constant: Information sources are constantly changing, growing, but sometimes disappearing.

Page 155: Internet en WWW  voor het opsporen van informatie

155

Internet based information sources: problems / difficulties (Part 4)

• Scattering: There is no single simple but powerful system to find relevant information through the Internet. In other words: integration / aggregation is still far from perfect.

Page 156: Internet en WWW  voor het opsporen van informatie

156

Internet based information sources: problems / difficulties (Part 5)

• Slow: The Internet is in many places and for many applications not yet fast enough.

Page 157: Internet en WWW  voor het opsporen van informatie

157

Internet based information sources: problems / difficulties (Part 6)

• In conclusion: Surfing, using the Internet, the WWW, can be a time sink instead of a productive activity.

Page 158: Internet en WWW  voor het opsporen van informatie

158

Internet based information sources: how many? how much information?

• More than 10 million WWW sites (in 2003)

• More than 2000 million (= 2 billion) unique URLs in the total Internet (in 2002)

• More than 10 terabytes (= 10 000 gigabytes) of text data (in 2001)

Page 159: Internet en WWW  voor het opsporen van informatie

159

Online access information sources and services

Types of online access information systems

Page 160: Internet en WWW  voor het opsporen van informatie

160

Types of online access information systems: “free” versus “fee”

Public access information sources free of charge

Fee-based online information services (NOT free of charge)

Page 161: Internet en WWW  voor het opsporen van informatie

161

Online access information sources and services

Dictionaries and encyclopaedias accessible through the WWW

Page 162: Internet en WWW  voor het opsporen van informatie

162

Dictionaries and encyclopedias through the WWW: introduction

• Dictionaries and encyclopedias are the first choice among many types of information sources,
»when we do not need detailed information on a common topic
»when we want to prepare a more detailed search on an unfamiliar topic, by searching for the right spelling, synonyms, context,…

• Some dictionaries and encyclopedias are available through the WWW free of charge.

Page 163: Internet en WWW  voor het opsporen van informatie

163

Dictionaries accessible through Internet and the WWW: example

• The American Heritage® Dictionary of the English Language
»Over 200,000 entries, 70,000 audio word pronunciations, 900 full-page color illustrations
»Available free of charge from http://education.yahoo.com/reference/dictionary/

Example

Page 164: Internet en WWW  voor het opsporen van informatie

164

Dictionaries accessible through Internet and the WWW: compilation

• A compilation/collection of dictionaries can be searched simultaneously and free of charge: http://www.onelook.com/

Example

Page 165: Internet en WWW  voor het opsporen van informatie

165

Encyclopedias accessible through Internet and the WWW: examples

• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

Example

Page 166: Internet en WWW  voor het opsporen van informatie

166

Encyclopedias accessible through Internet and the WWW: examples

• Encyclopædia Britannica: only a small part is available free of charge + links to selected WWW sites
»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

Example

Page 167: Internet en WWW  voor het opsporen van informatie

167

Encyclopedias accessible through Internet and the WWW: examples

• The Canadian Encyclopedia (in English and in French):
»http://thecanadianencyclopedia.com/

Example

Page 168: Internet en WWW  voor het opsporen van informatie

168

Encyclopedias accessible through Internet and the WWW: overviews

• A list / overview of encyclopedias on the Internet: http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet can be found as a part of more general directories of Internet-based information sources.

Example

Page 169: Internet en WWW  voor het opsporen van informatie

169

Online access information sources and services

Internet directories and indexes

Page 170: Internet en WWW  voor het opsporen van informatie

170

Internet: meta-information about Internet information sources

• in printed manuals and guides:
- it is not always possible to get a copy fast
- it costs money to get a copy
- they are soon out of date
• offered on the WWW!:
+ directly available when we want to use the Internet
+ many systems are accessible free of charge
+ most systems are regularly updated
• (“intelligent agent” software on client PC)

Page 171: Internet en WWW  voor het opsporen van informatie

171

Internet: subject-oriented meta-information offered via WWW

Information about information sources: in the form of
»subject guides = texts with references
»subject hypertext directories = subject guides
»key word indexes, generated automatically, for searching
»collections of links or forms to the above
»(multi-threaded search systems)

Page 172: Internet en WWW  voor het opsporen van informatie

172

Internet global subject directories: introduction

• They are virtual libraries with open shelves, for browsing.
• They are generated manually, by many people.
• They can be browsed following a tree structure or a more complicated variation.
• The most famous of these systems are among the most popular and most visited sites on the WWW: e.g. Yahoo!

Page 173: Internet en WWW  voor het opsporen van informatie

173

Internet global subject directories: structure

The structure corresponds to a classification that is in most cases specific for the particular overview. In other words: the well-known and classical universal classification systems are not used in most Internet directories.

Page 174: Internet en WWW  voor het opsporen van informatie

174

Internet global subject directories: pros and cons

• They cover a small number of selected WWW sites, in comparison with the total number of sites that are accessible.

+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches with a high coverage.

Page 175: Internet en WWW  voor het opsporen van informatie

175

Internet global subject directories: why use one?

• They are suitable mainly for broad searches that can be difficult to formulate in words, but NOT for more specific searches that require combinations of several concepts.

Page 176: Internet en WWW  voor het opsporen van informatie

176

Internet global subject directories: searching directories with a query

• Many of the Internet directories include an index to search their contents with a query.

• However, then the assisting classification structure is not well exploited and the user should be aware of the problems and difficulties of information retrieval with natural language queries.

• Furthermore, the possibility to use the system in this way may be confusing, as these directories are not real full-text Internet indexes, like those provided by other search tools.

Page 177: Internet en WWW  voor het opsporen van informatie

177

Internet global subject directories: Yahoo!

• A hypertext global subject directory can be found at http://www.yahoo.com/ and at many other sites, including http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

Page 178: Internet en WWW  voor het opsporen van informatie

178

Internet global subject directories: Google directory

• A hypertext global subject directory can be found at http://directory.google.com/
• Accessible free of charge.
• Based on the Netscape DMOZ Open Directory Project.
• Do not confuse this with the famous Google WWW search engine.

Page 179: Internet en WWW  voor het opsporen van informatie

179

Internet global subject directories: Open Directory Project

• A hypertext global subject directory can be found at http://www.dmoz.org/

• The content is also used in other systems, such as Google Directory and Webbrain.

• Accessible free of charge.

Page 180: Internet en WWW  voor het opsporen van informatie

180

!! Task - Assignment - Exercise !!

Try to find Internet sources which are relevant for you, by using an Internet-based global subject directory.

Page 181: Internet en WWW  voor het opsporen van informatie

181

Internet local subject directories: examples in Belgium

• http://yellow.advalvas.be/weblist.html
• http://search.msn.be/
• The guide developed by the public libraries in Flanders: http://www.bib.vlaanderen.be/webwijzer

Page 182: Internet en WWW  voor het opsporen van informatie

182

Internet indexes: automated search tools

• Several systems allow users to search for and locate many items (addressable resources) in the Internet in a more systematic, direct way than by only browsing/navigating.

• These systems do NOT search the contents of computers through the real Internet in real time and completely when a user makes a query. Searching in that way would be much too slow due to limitations in the technology.

Page 183: Internet en WWW  voor het opsporen van informatie

183

Internet indexes: scheme of the mechanism

[Diagram with the following elements: user searching for Internet-based information; Internet client hardware and software; user interface to a search engine; Internet index search engine; database of Internet files, including an index; Internet crawler and indexing system; Internet information source]

Page 184: Internet en WWW  voor het opsporen van informatie

184

Internet indexes: description of the mechanism

Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved by searching with queries through a big index that is built automatically on the basis of the contents, the texts, of these pages (to build this database and to keep it up to date, pages are continuously collected from the Internet by a “robot” computer software system)
• a search system with a user interface in a WWW form, to allow the user to search through that database
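The index at the heart of such a system can be illustrated with a very small Python sketch. It assumes that the “robot” has already collected the page texts; the three pages and their example.org addresses are made up, and real systems are far more elaborate (ranking, stemming, phrase search and so on).

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map every word to the set of URLs of the pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs of the pages that contain ALL words of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Made-up pages, as if collected by the robot.
pages = {
    "http://example.org/a": "Library catalogue of the university",
    "http://example.org/b": "University courses on information retrieval",
    "http://example.org/c": "Public library opening hours",
}
index = build_index(pages)
print(search(index, "university library"))  # {'http://example.org/a'}
```

The query is answered from the index alone, which is why the search system does not need to visit the pages in real time.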

Page 185: Internet en WWW  voor het opsporen van informatie

185

Internet indexes: AltaVista

• The primary search interface can be found in the US. The following addresses all lead to the same information:
»http://www.altavista.com/
»http://www.av.com/
»http://av.com/
• Mirror site in UK:
»http://uk.altavista.com/
»http://www.altavista.co.uk/

Page 186: Internet en WWW  voor het opsporen van informatie

186

Internet indexes: AltaVista: features

• Allows full text searching of the WWW
• Offers relevance ranking of search results
• Also allows advanced Boolean searching (in “Advanced” mode)
• Offers a link to an Internet subject directory (Looksmart)
• Offers links to systems to find images, sounds… (multimedia) in the Internet

Page 187: Internet en WWW  voor het opsporen van informatie

187

Internet indexes: All the Web

• The search interface can be found at: http://www.alltheweb.com/ and http://alltheweb.com/
• You can search the WWW and ftp servers.
• The database is one of the biggest.
• Not only HTML and plain text files, but also the full text of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

Page 188: Internet en WWW  voor het opsporen van informatie

188

Internet indexes: Google (Part 1)

• http://www.google.com/
• Full-text searching is possible of many files that are available through the WWW.
• Not only HTML and plain text pages are covered; the first part of many files in other file formats is also indexed, such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

Page 189: Internet en WWW  voor het opsporen van informatie

189

Internet indexes: Google (Part 2)

• One of the most popular systems in 2001, 2002, 2003…
• For retrieval an algorithm is used that takes into account the links between WWW pages. A retrieved page is ranked higher when
»many sites/pages point to it
»“important” sites/pages point to it
• Some other famous search systems are based on Google, such as Netscape Search and the WWW searches of Yahoo! (at least in 2003).
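The idea that a page becomes more important when many pages, and especially important pages, point to it can be sketched with a simplified link-based ranking. The Python code below is only an illustration of that idea under stated assumptions (a damping value of 0.85, a fixed number of iterations, a made-up link graph); it is not the actual algorithm used by Google.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Simplified link-based ranking; links maps a page to the pages it points to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its importance on to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Made-up link graph: pages a, b and d all point to c; c points to a.
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
rank = link_rank(links)
print(max(rank, key=rank.get))  # c
```

In this example page c ranks highest because three pages point to it, and page a ranks above b and d because the important page c points to it.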

Page 190: Internet en WWW  voor het opsporen van informatie

190

Internet indexes: Google computer servers

• Google uses a system of more than 10 000 small computer servers to offer its information services.

Page 191: Internet en WWW  voor het opsporen van informatie

191

Internet indexes: Google additional features

• Besides a system to search for WWW pages, Google also offers
»a subject directory
»searching for images/pictures on the WWW
»searching an archive of Usenet messages + posting to Usenet groups
»searching for news

• Thus Google has become a great integrator / aggregator.

Page 192: Internet en WWW  voor het opsporen van informatie

192

Internet indexes: coverage

• Internet indexes do not cover all static documents on the WWW.

• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one Internet index search system should be used.

Page 193: Internet en WWW  voor het opsporen van informatie

193

Internet indexes: coverage and size of each index

• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
» Google !
» AltaVista
» All the Web (serving also Lycos)
» Systems based on the INKTOMI database of WWW pages.

Page 194: Internet en WWW  voor het opsporen van informatie

194

!! Task - Assignment - Exercise !!

Try to find Internet sources which are relevant for you, by using an Internet index.

Page 195: Internet en WWW  voor het opsporen van informatie

195

Internet information sources

Coverage of Internet directories and Internet indexes

[Diagram comparing the coverage of a global Internet index with that of a global Internet directory]

Page 196: Internet en WWW  voor het opsporen van informatie

196

Global Internet search tools: a comparison

Global Internet directories

• Only a limited selection of Internet sources

• Browsing information sources is easy

• Good for broad searches

Global Internet indexes

• About 1/3 of the Internet is covered by an index

• Searching requires some skills and knowledge

• Good for specific, narrow searches

Multi-threaded search systems

• These get information from directories and indexes

• Searching requires some skills and knowledge

• Good when even 1 index does not yield information
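A multi-threaded search system sends one query to several directories and indexes and combines the answers. The Python sketch below shows only the combining step, under the assumption that each system returns a list of URLs; the function name and the example.org result lists are made up.

```python
def merge_results(result_lists):
    """Combine several result lists into one, dropping duplicate URLs."""
    merged = []
    seen = set()
    for results in result_lists:
        for url in results:
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


# Made-up answers of two different indexes to the same query.
index_a = ["http://example.org/1", "http://example.org/2"]
index_b = ["http://example.org/2", "http://example.org/3"]
print(merge_results([index_a, index_b]))
# ['http://example.org/1', 'http://example.org/2', 'http://example.org/3']
```

Because each index covers only a part of the Internet, the merged list can contain pages that a single index would miss.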

Page 197: Internet en WWW  voor het opsporen van informatie

197

Internet: who owns the search tools?

In 2003:
• The company Yahoo! owns
»the most famous global Internet subject directory
»3 (!) Internet full-text search engines: All the Web, AltaVista, Inktomi
• The company Google owns
»the most famous Internet full-text search engine
»one of the best Internet image search engines
»a gateway to old and new Usenet news messages

Page 198: Internet en WWW  voor het opsporen van informatie

198

Online access information sources and services

Public access book databases

Page 199: Internet en WWW  voor het opsporen van informatie

199

Public access book databases: introduction

• Even in this age of Internet-based information sources, a lot of information is still distributed in the form of printed books.

• The content of most books is (still) not available on the Internet.

• Most general Internet search tools do NOT allow you to find out about the existence of books that may be interesting for you.

• So, specific search tools to find books can be useful.

Page 200: Internet en WWW  voor het opsporen van informatie

200

Public access book databases: an overview

• (Databases by publishers.)
• Fee-based databases by commercial providers
• Databases by book distributors / bookshops!
• Online public access catalogues of
»local libraries,
»national libraries (which normally produce and offer their national bibliography)!
»big, famous libraries!!
• (Databases of computer-based versions of books.)

Page 201: Internet en WWW  voor het opsporen van informatie

201

Public access book databases: which one to use?

• For years, the market of bibliographic information on books was limited to the services and databases of subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many possibilities to find bibliographic information.

• Which book database should be preferred for particular applications is not clear for most librarians or end-users.

Page 202: Internet en WWW  voor het opsporen van informatie

202

Public access book databases by commercial producers

• To find currently available books, some databases assembled by commercial producers can be interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books, prices of the books, short descriptions of the contents with subject terms…

• However, access to such a database is not free of charge and can be expensive (in comparison with alternatives).

Page 203: Internet en WWW  voor het opsporen van informatie

203

Public access book databases provided by bookshops

• To find currently available books, the bibliographic databases assembled by big bookshops are interesting.

• Several offer a good coverage and are accessible free of charge.

• The added price information can be useful for the acquisition and accounting department of a library or if an individual user wants to buy a book.

• Some provide a current awareness service, also free of charge.

Page 204: Internet en WWW  voor het opsporen van informatie

204

Book databases accessible free of charge: examples in U.S.A.

• Amazon.com (US): http://www.amazon.com/ and http://www.amazon.co.uk/ (note: amazon, NOT amazone). Subject description is poor.

• Barnes and Noble (US): http://www.bn.com/

Examples

Page 205: Internet en WWW  voor het opsporen van informatie

205

Free public access bibliographic book database + price comparisons

• Even comparisons of the catalogues of shops that sell books (as well as music, movies and many other goods) are available free of charge.

• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

Page 206: Internet en WWW  voor het opsporen van informatie

206

!! Task - Assignment - Exercise !!

Search for titles of books which are relevant for you,

using an online database provided by a book publisher or bookshop.

Page 207: Internet en WWW  voor het opsporen van informatie

207

Online Public Access Catalogues of libraries

• Mainly to find older books, the catalogues of libraries can be useful.

• Most are accessible online and free of charge.

Page 208: Internet en WWW  voor het opsporen van informatie

208

Online access information sources and services

Fee-based online public access information services

Page 209: Internet en WWW  voor het opsporen van informatie

209

Types of online access information systems: “free” versus “fee”

• A lot of the information on the Internet is available free of charge, but another part is only accessible when a fee is paid to the producer and / or the distributor.

• The first commercial computer systems that make information available online were born around 1975. Most of them are now also available through the Internet.

• Some organisations pay these fees for some sources and then organise access, so that the members of the organisation can retrieve and exploit the information as if it is free of charge.

Page 210: Internet en WWW  voor het opsporen van informatie

210

Types of online access information systems: “free” versus “fee”

Public access information sources free of charge

Fee-based online information services (NOT free of charge)

Page 211: Internet en WWW  voor het opsporen van informatie

211

Types of online access information systems: “free” for members only

Public access information sources free of charge

Fee-based online information services (NOT free of charge)

Fee-based online information services, made accessible “free of charge”

by an institute to its members

Page 212: Internet en WWW  voor het opsporen van informatie

212

Online information services:total size of their databases

In 1999: The big host systems and the public access WWW pages offer a comparable quantity of information:
• WWW offered about 8 terabytes (= 8 000 gigabytes) of text data (according to Lawrence and Lee Giles, Nature, 1999, Vol. 400, pp. 107-109.)
• Dialog offered about 9 terabytes (= 9 000 gigabytes) (in 1998)
»6 billion pages of text
»3 million images

Page 213: Internet en WWW  voor het opsporen van informatie

213

Online access information sources and services

Online access databases about journal articles

Page 214: Internet en WWW  voor het opsporen van informatie

214

Online access databases about journal articles: overview

• Thousands of fee-based online access databases offer bibliographies or full-texts of journal articles in particular subject domains and published by many publishers.

• Many publishers offer searchable bibliographies, but only of their own publications (for instance Emerald, Elsevier).

• Only a few large databases offer free access to bibliographies of articles published in journals from many publishers.

Page 215: Internet en WWW  voor het opsporen van informatie

215

Online access databases about journal articles: Article@INIST

• Article@INIST allows you to search a bibliographic database, NOT full-text, of journal articles, journal issues, books, reports, conferences and doctoral dissertations at the Institut de l'Information Scientifique et Technique, France.

• Does not offer the use of a classification or thesaurus.

• Searching is free of charge.

• Available from http://form.inist.fr/public/eng/conslt.htm

• Payment is required to receive the full text of an article.

Page 216: Internet en WWW  voor het opsporen van informatie

216

Online access databases about journal articles: Ingenta (1)

• Ingenta Journals allows you to search a bibliographic database of millions of journal articles, including titles, authors and, in many cases, abstracts.

• Searching is free of charge.

Page 217: Internet en WWW  voor het opsporen van informatie

217

Online access databases about journal articles: Ingenta (2)

• Payment is required to receive the full text of an article.

• Available from
»http://www.ingenta.co.uk/
»http://www.ingenta.com/

• Ingenta acquired Uncover in 2000.

Page 218: Internet en WWW  voor het opsporen van informatie

218

Online access databases about journal articles: Infotrieve

• Infotrieve allows you to search free of charge in a bibliographic database of the articles of more than 20 000 journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/

• Payment is required to receive the full text of a document.

• Current awareness services are also offered free of charge: the tables of contents of new issues of the journals that you have selected are sent to you by email.

Page 219: Internet en WWW  voor het opsporen van informatie

219

Online access databases about journal articles: Scirus

• This is a specialised Internet index that allows you to search for selected scientific information (only) on the WWW. This includes the peer-reviewed articles in the journals that are published in ScienceDirect by Elsevier.

• An article can be downloaded in full-text format only when a fee has been paid to the publisher.

• The search interface: http://www.scirus.com

Example

Page 220: Internet en WWW  voor het opsporen van informatie

220

Online access databases about journal articles: Scirus features

• Offered free of charge by Elsevier.

• Is partly based on the Fast WWW search system that is also used by Alltheweb.

• Offers access to information ordered according to some classification system / taxonomy.

Example

Page 221: Internet en WWW  voor het opsporen van informatie

221

Online access information sources and services

Finding multimedia files on the Internet

Page 222: Internet en WWW  voor het opsporen van informatie

222

Finding multimedia files on the Internet: introduction

Several public access search systems are available free of charge to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)
»sound / audio files (music, speeches...); video

Page 223: Internet en WWW  voor het opsporen van informatie

223

Finding images on the Internet: introduction

• Several public access search systems are available free of charge to search for images / pictures (artwork, photos, or both) on the Internet.

• When searching for images, the search results from such a system offer not only links to the image files on the Internet, but also small versions of the images (so-called “thumbnails”) directly in the result list.
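A thumbnail is a reduced copy of an image that keeps its proportions. The sketch below only computes the reduced dimensions; the function name `thumbnail_size` and the 150-pixel limit are assumptions chosen for illustration and are not taken from any particular search engine.

```python
def thumbnail_size(width, height, max_side=150):
    """Return (w, h) scaled down so that the longer side is at most max_side.

    The aspect ratio is preserved (up to integer rounding) and images that
    are already small enough are returned unchanged.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    # Integer arithmetic keeps the result deterministic; never go below 1 pixel.
    new_width = max(1, width * max_side // longest)
    new_height = max(1, height * max_side // longest)
    return new_width, new_height


print(thumbnail_size(1600, 1200))
print(thumbnail_size(300, 600))
print(thumbnail_size(100, 80))
```

Because the thumbnail is much smaller than the original file, a result page can show many candidate images at once and the searcher can judge relevance before following a link.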

Page 224: Internet en WWW  voor het opsporen van informatie

224 Examples

Finding images on the Internet: screen shot of a Google image search

Page 225: Internet en WWW  voor het opsporen van informatie

225 Examples

Finding images on the Internet: examples of search engines (1)

• http://alltheweb.com/ !!

• http://gallery.yahoo.com/ !

• http://images.google.com/ !!! or through http://www.google.com/
The largest database in this category (at least in 2002, 2003). For each result, not only a thumbnail is offered but also the origin with the readable URL; this makes it easier to guess the relevance of the document.

Page 226: Internet en WWW  voor het opsporen van informatie

226 Examples

Finding images on the Internet: examples of search engines (2)

• http://multimedia.lycos.com/

• http://www.altavista.com/ !! (also audio and video; in the user interface, choose IMAGES rather than the normal text search)

Page 227: Internet en WWW  voor het opsporen van informatie

227

!! Task - Assignment - Exercise !!

Use a specialised search engine to find images about a particular subject on the Internet.

Page 228: Internet en WWW  voor het opsporen van informatie

228

Online access information sources and services

Evolution and future trends

Page 229: Internet en WWW  voor het opsporen van informatie

229

Online access information: evolution and future trends

• An increasing amount of information is becoming available online.

• A growing amount of this online information is becoming available free of charge.

• The quality and ease of use of software on servers as well as clients is improving.

A consequence is:

• An increasing number of end-users searching for information online.

Page 230: Internet en WWW  voor het opsporen van informatie

230

Online access information: easier and more complicated?!

• At the same time, information retrieval becomes both easier and more complicated. This may seem strange and contradictory, but it is reality.

This is a paradox.

Page 231: Internet en WWW  voor het opsporen van informatie

231

Online access information: easier information retrieval systems

• Individual information retrieval systems become easier:
»they react faster;
»they can provide access to more data/information in one action;
»their user interfaces are simple, but more sophisticated, intelligent retrieval algorithms can nevertheless deliver satisfactory results in most simple cases.

Page 232: Internet en WWW  voor het opsporen van informatie

232

Online access information: more complicated information market

• The whole information landscape consists of more and more decentralised information sources, each one bringing an individual user interface that should be mastered. Making the right, ideal choice among the sources does not become easier, and perhaps becomes more complicated every day.

Page 233: Internet en WWW  voor het opsporen van informatie

233

Online access information: more complicated information market

• Furthermore, for many sources the accessibility / availability, the user interface and the interlinking depend on the organisation in which the searcher is active.

Page 234: Internet en WWW  voor het opsporen van informatie

234

Online access information: conclusion

• In the case of simple information needs, the WWW and the search tools can work like “magic”.

• However, in the case of more complicated information needs, there is still no “magic button” that brings you immediately to all the required information.

Page 235: Internet en WWW  voor het opsporen van informatie

235

Evaluating the quality of information

Documentary information sources: evaluating their quality

Page 236: Internet en WWW  voor het opsporen van informatie

236

Documentary information sources: evaluating their quality

• We should always be critical when using information sources, in view of
»the widely varying degrees of quality of information sources, and of
»the costs associated with searching, finding, using information.

Page 237: Internet en WWW  voor het opsporen van informatie

237

Documentary information sources: evaluation criteria (1)

• Is the information valid, reliable, trustworthy, genuine, authentic? Is the author honest? Is the source objective, not subjective, without cultural or political or ideological or commercial bias? Is the origin an individual or a company or an organisation? Is the publication sponsored by some company or organisation?

Page 238: Internet en WWW  voor het opsporen van informatie

238

Documentary information sources: evaluation criteria (2)

• Is the information accurate, correct? Who is the author or producer? Does the source have an author or a producer with great expertise, a good reputation, good qualifications? Can the author be contacted for clarification or discussion?

Was the information reviewed, edited, improved, corrected, censored, approved, verified, before publication? Do experts agree on the information provided?

Page 239: Internet en WWW  voor het opsporen van informatie

239

Documentary information sources: evaluation criteria (3)

• Is the information source unique? Does it offer a great amount of primary information, which is not obtainable from other sources?

• Is the information complete? Is the work available in its entirety?

• Does the source offer a wide coverage? Is the source comprehensive, substantive?

• Is the information current enough, up to date? Is a publication date provided? Is an expiration date provided?

Page 240: Internet en WWW  voor het opsporen van informatie

240

Documentary information sources: evaluation criteria (4)

• Does the document provide suitable references, so that you can verify statements and find older suitable information sources?

• Good clear format and lay-out of the information / User-friendly information system / Easy for users to orientate themselves within the resource and to find their way around it?

• Good user support / Good customer support?

• Is the type of distribution medium appropriate? (print, e-mail, online,...)

Page 241: Internet en WWW  voor het opsporen van informatie

241

Documentary information sources: evaluation criteria (5)

• Is the information what you want? If not, then reassess your needs and consider other types of information as well.

Page 242: Internet en WWW  voor het opsporen van informatie

242

Documentary information sources: evaluation criteria (6)

• Is the information suitable for your level of understanding of the subject? Is the document popular, suitable for the general public, for students, for professionals, for scholarly/academic use…? Does it report new, primary research (survey, experiment, observation, measurement, invention) or is it a review of sources published earlier?

• Does the information repeat or confirm what you already know, or is it complementary, contradictory, new?
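The evaluation criteria on slides 237 to 242 can be kept as a simple checklist while judging a source. The sketch below is one possible way to record the answers; the short criterion labels and the function `summarise` are hypothetical shorthand for the questions above, and the result is a reminder of open questions, not a validated quality score.

```python
# Short labels for the questions on the slides (wording is an assumption).
CRITERIA = [
    "valid and reliable",
    "accurate",
    "unique",
    "complete",
    "wide coverage",
    "current",
    "provides references",
    "clear format and lay-out",
    "good user support",
    "appropriate distribution medium",
    "matches your need",
    "suitable level",
]


def summarise(answers):
    """Group the criteria by answer.

    `answers` maps a criterion label to True (met) or False (not met);
    criteria that are missing or set to None count as unanswered.
    """
    met = [c for c in CRITERIA if answers.get(c) is True]
    not_met = [c for c in CRITERIA if answers.get(c) is False]
    unanswered = [c for c in CRITERIA if answers.get(c) is None]
    return met, not_met, unanswered


# Example: a source with a known author but no publication date.
answers = {"accurate": True, "current": False}
met, not_met, unanswered = summarise(answers)
print("met:", met)
print("not met:", not_met)
print("still to check:", len(unanswered))
```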