Zoveel informatie, zo weinig tijd


Paul Nieuwenhuysen, VUB
Informatie aan Zee, 10 September 2009
Kursaal Oostende, Zaal Delvaux


Page 1: Zoveel informatie, zo weinig tijd


Zo veel informatie, zo weinig tijd (So much information, so little time)

Created to support a presentation at the bi-annual 2-day conference series “Informatie”, organised by VVBAD in Oostende, Belgium, September 10-11, 2009: “Informatie aan zee”

Page 2

0. Introduction with problem statements

1. Methods to make information retrieval efficient in a world of scattered sources

2. Applications of those methods

3. Comparison of the methods

4. Conclusions

Contents = summary = structure = overview of this presentation

Page 3

These slides should be available from the WWW site http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/ (note: BIBLIO and not biblio) and also from the WWW site of the organisers of the conference, VVBAD.

Page 4

Information Retrieval in a World of Scattered Information Sources

0. Introduction and problem statements

Page 5

Introduction: scattering of sources

• Users want to exploit information sources fast and effectively.

• This is hindered by the fact that the digital, electronic information sources that may contain relevant information are created on, and scattered over, numerous computers all over the intranet of the user’s organization AND over the Internet and the WWW.

Page 6

Introduction: scattering of sources

• In other words: integration / aggregation is still far from perfect.

Page 7

Introduction: scattering of sources: difficulties

• Using many information retrieval systems costs time:
1. They must be used one after the other, which requires many decisions and actions.

Page 8

Introduction: scattering of sources: difficulties

• Using many information retrieval systems costs time:
2. They offer different user interfaces in the retrieval phase, which is confusing.

Page 9

Introduction: scattering of sources: difficulties

• Using many information retrieval systems costs time:
3. They deliver the items found in various data formats.

Page 10

Introduction: scattering of sources: difficulties

• Using many information retrieval systems costs time:
4. They display the items found in different ways on a computer screen.

Page 11

Introduction: scattering of sources: difficulties

Small = BEAUTIFUL

Page 12

Introduction: scattering of sources: difficulties

Page 13

Introduction: problem statements

1. Which methods have been developed and applied to cope with this reality?

Page 14

Introduction: problem statements

2. Which concrete applications are available and how can an end-user exploit systems created in this domain?

Page 15

Introduction: problem statements

3. How can information intermediaries evaluate and apply these methods to bring information more efficiently to end-users?

Page 16

Information Retrieval in a World of Scattered Information Sources

1. Methods to make information retrieval efficient in a world of scattered sources

Page 17

Method 1: Merging = aggregating into a searchable database

[Diagram: several databases or web sites feed one aggregated database; users search it through a single search engine.]

Page 18

Method 2: Federated searching through scattered databases

[Diagram: users send a query to one federated search engine, which passes it on to the search engines of several separate databases.]

Page 19

Both methods offer benefits to the users

+ Saves users the time of executing queries on various servers or browsing through various systems.

Page 20

Both methods offer benefits to the users

+ Offers a uniform / consistent display of results in the output phase.

Page 21

Both methods offer benefits to the users

+ Some systems offer tools to refine the display of the results; for instance
+ to deduplicate very similar items in the result set,
+ to sort the results,
+ to rank the results,
+ to visualize the results in a more graphical way,
+ to search within the result set,
+ … ☺

Page 22

Both methods bring difficulties / challenges / problems

- In many cases the merged sources differ in the way their database records are formatted/structured in fields. This hinders
- searching limited to a field
- displaying selected fields only (such as title)
- sorting the displayed records on the contents of a particular selected field (such as author or date)

Page 23

Both methods bring difficulties / challenges / problems

- In many cases the sources differ in the metadata schemes applied in their databases to improve retrieval, such as
» classifications
» taxonomies
» thesaurus systems
» ontologies

This hinders the exploitation of the added value of such metadata.

Page 24

Both methods bring difficulties / challenges / problems

- How to deduplicate (dedupe) or cluster very similar entries/results/items, that is, near-duplicates, from various target sources? When is similar similar enough? Which entry/result/item should be selected as the representative of a cluster of similar entries?
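One possible answer to these three questions, sketched here under stated assumptions, is to compare normalised titles with a word-set (Jaccard) similarity, to treat records above a chosen threshold as near-duplicates, and to keep the most complete record of each cluster as its representative. The record fields, the 0.8 threshold and the "most non-empty fields" rule are hypothetical choices for illustration; real systems may also compare authors, dates or identifiers.

```python
import re


def words(title: str) -> set[str]:
    """Lower-case a title and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of words two titles have in common (1.0 = identical word sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def completeness(record: dict) -> int:
    """Number of non-empty fields; used to pick the representative of a cluster."""
    return sum(1 for value in record.values() if value)


def dedupe(records: list[dict], threshold: float = 0.8) -> list[dict]:
    """Greedily cluster near-duplicate records; return one representative per cluster."""
    clusters: list[list[dict]] = []
    for record in records:
        record_words = words(record["title"])
        for cluster in clusters:
            # "Similar enough" = title similarity with the cluster's first member >= threshold.
            if jaccard(record_words, words(cluster[0]["title"])) >= threshold:
                cluster.append(record)
                break
        else:
            clusters.append([record])
    # The most complete record represents its cluster (first one wins on ties).
    return [max(cluster, key=completeness) for cluster in clusters]
```

Raising the threshold merges fewer items (fewer false merges, more visible duplicates); lowering it does the opposite, which is exactly the "when is similar similar enough?" trade-off.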

Page 25

Both methods bring difficulties / challenges / problems

- When a specific target source database makes special, non-standard, dedicated retrieval software available, with features that let the user exploit the database better than a more classical standard retrieval interface, these features may be lost in the new retrieval system. Searches are reduced to the lowest common denominator. Examples:
- clustering of results
- deduplication of results
- …

Page 26

Method 1: Merging = aggregating into a searchable database

[Diagram: several databases or web sites feed one aggregated database; users search it through a single search engine.]

Page 27

Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)

[Diagram: a service provider sends OAI-PMH requests over the HTTP protocol to several data providers, which return metadata about their digital objects; the service provider stores this metadata in a search and retrieval database server, which users query from a client computer with client software.]
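A harvester at the service provider typically sends `ListRecords` requests over HTTP and keeps following the `resumptionToken` until the data provider has returned all records. The sketch below assumes the widely supported `oai_dc` (simple Dublin Core) metadata format and a hypothetical base URL; it only builds the request URLs and parses one response, leaving out the actual download so that it runs offline.

```python
from typing import Optional
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces of OAI-PMH 2.0 responses and of Dublin Core elements.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, token: Optional[str] = None) -> str:
    """Build a ListRecords request; a resumptionToken replaces the other arguments."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return f"{base_url}?{urlencode(params)}"


def parse_response(xml_text: str) -> tuple[list[str], Optional[str]]:
    """Return the Dublin Core titles in one response and the token for the next request."""
    root = ET.fromstring(xml_text)
    titles = [el.text or "" for el in root.iterfind(".//dc:title", NS)]
    token_el = root.find(".//oai:resumptionToken", NS)
    # An absent or empty resumptionToken means the list is complete.
    token = token_el.text if token_el is not None and token_el.text else None
    return titles, token
```

A harvesting loop would fetch `list_records_url(base)`, parse it, and repeat with the returned token until it is `None`.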

Page 28

Merging into a searchable database offers benefits for the users

+ Applicable even in the absence of data communication with remote servers at search time (whereas federated searching needs good, fast data communication). This is why it is the relatively ‘old’ method.

Page 29

Merging into a searchable database brings difficulties / challenges

- The contents of the aggregated database are less up to date than the original information sources. The importance of this aspect depends, of course,
- on the particular application
- on the time delay

Page 30

Method 2: Federated searching through scattered databases

[Diagram: users send a query to one federated search engine, which passes it on to the search engines of several separate databases.]

Page 31

Federated searching: terminology / vocabulary / synonyms

federated searching = meta-searching = metasearching = cross-database searching = multi-database searching = multi-threaded searching = one-stop searching = poly-searching = polysearching = broadcast searching = searching through a portal / gateway

Page 32

Federated searching through scattered databases: why?

The perfect trip:
1. A cheap and nice flight
2. A cheap and nice hotel
3. A visit to a nice museum
4. Something nice to read (free via your library)

Page 33

Federated searching: application: finding a suitable flight

Example:
• http://CheapTickets.com/ for the USA

Example

Page 34

Federated searching: application: finding a hotel room in a city

Example

Page 35

Federated searching: searching in a museum

Example

Page 36

Federated searching: searching in a library

Example

Page 37

Federated searching: integrating access

[Diagram: a meta-searching system integrates access to local library catalog database(s), catalog database(s) of other libraries, databases (full-text or bibliographic), publishers, journals, articles, the intranet, and WWW search engines.]

Page 38

Federated searching: benefits for the users

+ The system can help the user to select appropriate sources.

Page 39

Federated searching: benefits for the users

+ The system can help with authentication and authorization when these involve not only simple recognition of the IP address of the user’s client computer, but also user IDs and passwords.

Page 40

Federated searching: benefits for the users

+ The need to know which particular database is suitable for a particular search is reduced, because several databases can be searched in one action.

Page 41

Federated searching: benefits for the users

+ The users have to learn only 1 user interface for searching and only 1 search syntax, instead of a user interface and a search syntax for each database!

Page 42

Federated searching: benefits for the users

+ Can make users search and exploit databases that they would otherwise never use, that is, without a federated search system!

Page 43

Federated searching: benefits for the users

+ Useful, relevant, interesting items/references can be found/uncovered in unexpected, unknown, unfamiliar databases! This is mainly beneficial for interdisciplinary subjects/topics.

Page 44

Federated searching: benefits for the users

+ Some systems offer tools to refine the display of the results; for instance
» to dedupe very similar items in the result set,
» to sort the results,
» to rank the results,
» to search within the result set,
» …

Page 45

Federated searching: benefits for the users

+ Some systems offer interesting links from a retrieval result to various related sources or services (such as the full text or a document delivery service), using a link generator based on the OpenURL standard.
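An OpenURL link generator essentially turns the bibliographic fields of a result into a query string addressed to the library's link resolver, which then decides where the full text or a delivery service can be found. The sketch below uses key/value names from the OpenURL 1.0 (Z39.88-2004) KEV format for journal articles; the resolver address and the input dictionary keys are hypothetical placeholders.

```python
from urllib.parse import urlencode

# Hypothetical link resolver of a library; every institution has its own address.
RESOLVER = "http://resolver.example.edu/openurl"


def openurl_for_article(article: dict) -> str:
    """Build an OpenURL 1.0 (KEV format) link for a journal article description."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
        "rft.atitle": article.get("title", ""),
        "rft.jtitle": article.get("journal", ""),
        "rft.issn": article.get("issn", ""),
        "rft.volume": article.get("volume", ""),
        "rft.spage": article.get("start_page", ""),
        "rft.date": article.get("year", ""),
    }
    # Leave out empty fields so that the resolver receives only known metadata.
    params = {key: value for key, value in params.items() if value}
    return f"{RESOLVER}?{urlencode(params)}"
```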

Page 46

Federated searching: benefits for the users

+ Some systems check, for each retrieved bibliographic description, whether the corresponding full text is immediately available online, and indicate this to the user on the fly.

Page 47

Federated searching: benefits for the users

+ Some systems further process the retrieved results and display them in an interesting way that is not offered by the searched original systems. For instance:

» Clustering of results according to subject or age or availability of full text

» Displaying the results in a graphical way

Page 48

Federated searching: benefits for the users

So far so good !

Page 49

Federated searching through scattered databases

[Diagram: users send a query to one federated search engine, which passes it on to the search engines of several separate databases.]

Page 50

Federated searching: difficulties / challenges / problems

- How to provide a useful relevance ranking of search results/entries, even when the target databases differ greatly in type and quality, and even though no index has been created in advance, just in case, well before the search action, as Google and other Internet search engines do?
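Because each target database ranks with its own, usually unknown, scoring, one simple option is to merge the separate result lists by position rather than by score. The sketch below uses reciprocal rank fusion, which is only one technique among several (round-robin interleaving and score normalisation are alternatives); the constant k = 60 is a conventional choice, not something prescribed by the sources discussed here, and results are assumed to be already deduplicated to shared identifiers.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists: dict[str, list[str]], k: int = 60) -> list[str]:
    """Merge per-source rankings of item identifiers into one ranking.

    Each item earns 1 / (k + rank) from every source that returned it,
    so items ranked high by several sources rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists.values():
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties keep the order in which items were first seen.
    return sorted(scores, key=lambda item: scores[item], reverse=True)
```

For example, merging `{"catalogue": ["a", "b", "c"], "journals": ["b", "d"]}` puts `b` first, because two sources returned it.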

Page 51

Federated searching: difficulties / challenges / problems

- Powerful / sophisticated / refined forms of searching may not be applicable in a federated search. Example: limiting to a particular type of document, such as a therapy (in medicine). This may cause a LOSS of time, instead of saving time.

Page 52

Federated searching through scattered databases

[Diagram: users send a query to one federated search engine, which passes it on to the search engines of several separate databases.]

Page 53

Federated searching: difficulties / challenges / problems

- Differences among target sources in the Internet application protocols that they apply normally, by default, for connection/communication and retrieval, such as
» (telnet) HTTP
» proprietary, non-standard protocols
» Z39.50 (ISO 23950), SRU and related protocols that were developed for federated searching!
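To illustrate how such a protocol standardises the conversation, SRU (Search/Retrieve via URL) carries a search over plain HTTP: the query is written in CQL and passed as URL parameters, and the records come back as XML. The sketch below only builds such a request; the server address is hypothetical and the parameter names follow SRU version 1.1.

```python
from urllib.parse import urlencode


def sru_search_url(base_url: str, cql_query: str, max_records: int = 10,
                   start_record: int = 1) -> str:
    """Build an SRU 1.1 searchRetrieve request URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,            # CQL, e.g. dc.title = "information retrieval"
        "startRecord": start_record,   # position of the first record to return
        "maximumRecords": max_records, # page size requested from the server
    }
    return f"{base_url}?{urlencode(params)}"
```

Because every SRU server accepts the same parameters, a federated search engine can send one request shape to many targets instead of one custom adapter per target.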

Page 54

Federated searching through scattered databases

[Diagram: users send a query to one federated search engine, which passes it on to the search engines of several separate databases.]

Page 55

Federated searching: difficulties / challenges / problems

- Various search engines may act in different ways! For instance:
Is truncation of a word in a search query possible?
Is limitation to a particular field possible?

How can a federated search engine take these differences into account?
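One way to take such differences into account is to keep a small capability profile for each target and to rewrite the user's query for each one. The sketch assumes a hypothetical profile with only two flags and a query syntax that uses `*` for truncation and `field:term` for field limitation; real products keep much richer profiles per target.

```python
from dataclasses import dataclass


@dataclass
class TargetProfile:
    """Hypothetical description of what one target search engine supports."""
    name: str
    supports_truncation: bool
    supports_fields: bool


def adapt_term(term: str, target: TargetProfile) -> str:
    """Rewrite one query term of the form [field:]word[*] for a given target."""
    if ":" in term and not target.supports_fields:
        term = term.split(":", 1)[1]  # drop the field limit: search all fields
    if term.endswith("*") and not target.supports_truncation:
        term = term.rstrip("*")       # fall back to the untruncated word
    return term


def adapt_query(terms: list[str], target: TargetProfile) -> list[str]:
    """Adapt every term of the user's query to the target's capabilities."""
    return [adapt_term(term, target) for term in terms]
```

Each fallback silently changes the search (dropping a field limit lowers precision, dropping truncation lowers recall), which is the "lowest common denominator" effect in practice.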

Page 56

Federated searching: difficulties / challenges / problems

- A query with several words and without explicit Boolean operators can be interpreted in various ways by the various database retrieval systems. For instance, the retrieval software may combine all the query words with the Boolean operator AND, but it may also use OR. If the federated search system does not handle this well, recall may drop (where AND is applied) or precision may drop (where OR is applied).
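A federated search engine can avoid this ambiguity by never forwarding a bare list of words, and instead writing the intended combination explicitly in each target's own syntax. The sketch assumes two hypothetical syntax styles: one with Boolean words, and one with a `+` prefix for every required term.

```python
def explicit_and(words: list[str], style: str) -> str:
    """Combine query words with an explicit AND in a target's (hypothetical) syntax style."""
    if style == "boolean":
        return " AND ".join(words)
    if style == "plus":
        # Every word marked as required is equivalent to AND-ing them.
        return " ".join("+" + word for word in words)
    raise ValueError(f"unknown query syntax style: {style}")
```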

Page 57

Federated searching: difficulties / challenges / problems

- When a specific target source database offers special, non-standard, dedicated retrieval software with features that let the user exploit the database better than a standard retrieval interface does, the federated search system can probably not exploit that source as well. Searches are reduced to the lowest common denominator.

Page 58

Federated searching: difficulties / challenges / problems

- Differences in response time among the target sources. A slow response from a target source can hinder the final analysis and presentation of the results to the user.
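A common remedy is to query all targets in parallel and to present whatever has arrived when a time limit expires, so that one slow source cannot hold back the whole result list. In this sketch the target sources are simulated with `time.sleep`; the source names and the 0.2-second limit are illustrative assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def make_source(name: str, delay: float):
    """Return a simulated target database that answers after `delay` seconds."""
    def search(query: str) -> list[str]:
        time.sleep(delay)
        return [f"{name}: result for {query!r}"]
    return search


def federated_search(sources: dict, query: str, timeout: float = 0.2):
    """Send the query to all sources at once; keep only answers that arrive in time."""
    pool = ThreadPoolExecutor(max_workers=len(sources))
    futures = {pool.submit(search, query): name for name, search in sources.items()}
    done, not_done = wait(futures, timeout=timeout)
    # Do not block on slow sources; report them as late instead.
    pool.shutdown(wait=False)
    results = sorted(item for future in done for item in future.result())
    late = sorted(futures[future] for future in not_done)
    return results, late
```

Reporting the late sources, rather than hiding them, lets the user decide whether to wait or to rerun the search on those targets only.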

Page 59

Federated searching through scattered databases

[Diagram: users send a query to one federated search engine, which passes it on to the search engines of several separate databases.]

Page 60

Federated searching: difficulties / challenges / problems

- Some databases can NOT be included as a target database in a federated searching engine, because their owners/producers do not allow this. This is an important difficulty, because interesting / valuable databases may then not be exploited by users who rely on federated searching.

Page 61

Federated searching through scattered databases

[Diagram: users send a query to one federated search engine, which passes it on to the search engines of several separate databases.]

Page 62

Federated searching: difficulties / challenges / problems

- Users may be less impressed by a federated searching system than by the simple, common, familiar, famous Internet / WWW search engines, because its response time is in most cases less impressive, due to differences in
- the computer hardware used by the systems
- slower distributed searching through several computer systems, versus faster searching through a more centralised computer database of a priori compiled records

Page 63

Federated searching: difficulties / challenges / problems

- The evaluation of the quality of each search result from a federated search action may be more difficult than when each database is searched separately, because the user may be less aware of the limitations, strengths, selection criteria and aims of the individual, separate databases that supply each result. For instance, peer-reviewed articles from reputable scientific journals may be mixed with more popular and more biased, unscientific texts from trade literature.

Page 64

Federated searching: conclusion

Federated searching
- is a continuous challenge for developers of the sophisticated software and for the implementers in libraries and information centers
- offers benefits for those end-users who are not enthusiastic about working with separate target source databases
- does not eliminate the need for access to individual databases

Page 65

Hybrid method: merging data + federated searching

[Diagram: users query one federated search engine, which passes the query both to the search engines of separate databases and to the search engine of an aggregated database built from several databases or web sites.]

Page 66

Information Retrieval in a World of Scattered Information Sources

2. Applications of methods for efficient information retrieval

Page 67

Method 1: Merging = aggregating into a searchable database

[Diagram: several databases or web sites feed one aggregated database; users search it through a single search engine.]

Page 68

Internet global subject directories: introduction

• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many people.
• They can be browsed following a tree structure or a more complicated variation.

Page 69

Internet global subject directories: Yahoo!: screenshot of home page

Example

Page 70

Internet global subject directories: BUBL LINK

• A hypertext global subject directory to more than 10 000 WWW sites for the higher education community can be found at http://bubl.ac.uk/link/ [accessed 2008]

• Accessible free of charge.
• The categories are based on the well-known general Dewey classification system.

Example

Page 71

Internet global subject directories: dmoz: screenshot of the starting page

Example

Page 72

Internet global subject directories: Librarians' Internet Index: screenshot

Example

Page 73

Internet global subject directories: IPL: screenshot

Example

Page 74

Internet global subject directories: Intute: screenshot

Example

Page 75

Internet indexes: scheme of the mechanism

[Diagram: a user searching for Internet-based information uses Internet client hardware and software and the user interface of a search engine; the Internet index search engine searches a database of Internet files, including an index, which an Internet crawler and indexing system compiles from Internet information sources.]
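The core of such a "database of Internet files, including an index" is usually an inverted index: for every word, the set of files that contain it. The crawler and indexing system builds it once, in advance, so that a search does not have to scan the files themselves. A minimal sketch, assuming the crawled pages are already available as a dictionary from (hypothetical) URL to text:

```python
import re
from collections import defaultdict


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map every lower-cased word to the set of page URLs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return the URLs containing all query words (implicit AND), sorted."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)
```

This pre-built index is what makes global search engines fast, and also why they only cover pages their crawler can reach.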

Page 76

Internet indexes: Google

• http://www.google.com/
• Available since 2001 with most of its features.
• The most popular search system since 2003.

Example

Page 77

Internet indexes: Google Scholar

• Google Scholar allows us to search for more scholarly information sources, including journal articles.

• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of Google as one of the additional services, or more directly from http://scholar.google.com/

Example

Page 78

Internet indexes: Google Scholar: screenshot

Example

Page 79

Internet indexes: Bing

• http://www.bing.com/
• Available in 2009 in beta = test version.
• Replaces Microsoft Live as well as Yahoo Web Search?

Example

Page 80

Internet indexes: Scirus

• The search interface: http://www.scirus.com/
• Since 2001.
• Offers access not only to files in HTML format, but also to files in PDF.
• Allows you to search for more or less “manually” selected
» scientific WWW pages, plus
» the contents of some scientific, bibliographic databases.

• In the sense that Scirus is dedicated to scientific information, it is similar to Google Scholar.

Example

Page 81

Internet indexes: Ask

• Available from: http://www.ask.com/

• Offers a feature that most other search systems do not offer: categorization = classification = refinement = clustering of search results, to help the user cope with the ambiguity of meaning of the search query.

Example

Page 82

Internet indexes cover only a part of the Internet: metaphor

The “visible” part of the Internet

The “deep, hidden, invisible” part of the Internet and the WWW (that is not searchable using a global index like Google Web Search)

Page 83

Databases accessible over the Internet: example: OAISTER

• http://oaister.umdl.umich.edu/
• “Our goal is to create a collection of freely available, previously difficult-to-access, academically-oriented digital resources that are easily searchable by anyone.”

Example

Page 84

Databases accessible over the Internet: example: OAISTER

• OAISTER makes searching possible in millions of digital documents that form part of institutional repositories all over the world.

• OAISTER covers this kind of documents better than Google Web Search (according to independent academic investigations in 2006 and 2008).

Example

Page 85

Databases accessible over the Internet: example: scientificcommons

Example

• http://www.scientificcommons.org/
• Since 2007
• Similar to OAISTER: allows you to search the full texts in scientific open access repositories all over the world.

Page 86

Databases accessible over the Internet: example: Medline

• Medline/PubMed offers bibliographic descriptions of publications on medicine, free of charge.

Example

Page 87

Current awareness services focusing on WWW pages: Google Alerts

• Available at http://www.google.com/ (see the page with additional services) or more directly from http://www.google.com/alerts/

• Since 2004.
• Can discover relevant changed or new WWW pages for you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on their server computer.
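The principle of such a current awareness service can be sketched as stored queries that are re-run against freshly crawled pages, reporting only matches the user has not been sent before. This is a minimal illustration with assumed data structures, not a description of how Google Alerts is actually implemented.

```python
from dataclasses import dataclass, field


@dataclass
class StoredQuery:
    """A user's query kept on the server, together with the pages already reported."""
    words: list[str]
    seen: set[str] = field(default_factory=set)

    def matches(self, text: str) -> bool:
        """True if every query word occurs (as a substring) in the page text, ignoring case."""
        lowered = text.lower()
        return all(word.lower() in lowered for word in self.words)


def run_alert(query: StoredQuery, crawled_pages: dict[str, str]) -> list[str]:
    """Return URLs among freshly crawled pages that match and were not reported before."""
    hits = [url for url, text in sorted(crawled_pages.items())
            if url not in query.seen and query.matches(text)]
    query.seen.update(hits)  # remember them so the next run reports only new matches
    return hits
```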

Page 88

Internet with WWW and printed books

• In recent years, the Internet with the WWW has become the primary information source for many people.

• However:
» A lot of information is still distributed only in the form of printed books.
» The content of old printed books can still be interesting.
» The content of most printed books is (still) not available on the Internet.

Page 89

Public access book databases: introduction

• Most general WWW search engines do NOT allow you to find out about the existence of books that may be interesting for you, at least not in a systematic and efficient way.

• So, specific search tools to find books can be useful.

Page 90

Public access book databases provided by bookshops

• To find currently available books, the bibliographic databases assembled by big bookshops are interesting.

• Several offer good coverage.
• Many are accessible free of charge.
• The added price information can be useful for the acquisition and accounting department of a library, or if an individual user wants to buy a book.

• Some provide a current awareness service, also free of charge.

• Take into account delivery costs: postage + import tax

Page 91

Book databases accessible free of charge: examples in U.S.A.

• Amazon.com (US): http://www.amazon.com/

• This company also offers different, more local versions that offer books in other languages, such as http://www.amazon.co.uk/ and http://www.amazon.fr/

• Note: amazon, NOT amazone
• Subject description is poor.
• Take into account delivery costs: postage + import tax

Examples

Page 92

Book databases accessible free of charge: examples in U.S.A.

• Barnes and Noble (US): http://www.barnesandnoble.com/ or http://www.bn.com/

Examples

Page 93

Book databases accessible free of charge: examples in U.S.A.

• http://www.completebook.com/cbmsi/bookaction.do

Examples

Page 94

Book databases accessible free of charge: examples in U.S.A.

• http://www.overstock.com/

Examples

Page 95

Book databases accessible free of charge: examples in U.S.A.

• http://www.powells.com/
• Specialised in books only.

Examples

Page 96

Book databases accessible free of charge: examples in Europe

• Blackwell’s on the Internet (international, academic books): http://www.blackwell.co.uk/

• VLB for books in German: http://www.buchhandel.de/

• For books in French: http://www.chapitre.com

• Boeknet - De Nederlandse Internet Boekhandel (Dutch): http://www.boeknet.nl/

Examples

Page 97

Search systems for books that are made available by dealers

[Diagram: a user obtains descriptions of books, and real books for sale, from one book dealer catalog database.]

Page 98

Search systems for books that are made available by dealers

[Diagram: a user obtains descriptions of books, and real books for sale, from several book dealer catalog databases.]

Page 99

Search systems for books that are made available by dealers

[Diagram: a user obtains descriptions of books, and real books for sale, from several book dealer catalog databases.]

Page 100

Search systems for books that are made available by dealers

[Diagram: a user searches one multi-dealer database of merged book dealer databases, built from several book dealer catalog databases, and obtains descriptions of books and real books for sale.]

Page 101

Search systems for books that are made available by dealers

[Diagram: a user searches several multi-dealer databases of merged book dealer databases, built from the book dealer catalog databases, and obtains descriptions of books and real books for sale.]

Page 102

Search systems for books that are made available by dealers

[Diagram: a user searches several multi-dealer databases of merged book dealer databases, built from the book dealer catalog databases, and obtains descriptions of books and real books for sale.]

Page 103

Free public access multi-dealer book databases: examples

• http://www.abebooks.com/ [accessed 2008]

• http://www.abebooks.fr/ offers a user interface in French.

• Covers > 10 000 bookshops.

• The company was acquired by Amazon in 2008.

Page 104

Free public access multi-dealer book databases: examples

• http://www.alibris.com/ [accessed 2008]

Page 105

Free public access multi-dealer book databases: examples

• Amazon Marketplace: http://www.amazon.com/ [accessed 2009]

• In synergy with the online bookshop Amazon on one WWW site: used books are displayed alongside Amazon’s new books.

• “the world’s biggest online book bazaar”
• Subject description is poor.
• Take into account delivery costs: postage + tax

Page 106

Free public access multi-dealer book databases: examples

Page 107

Free public access multi-dealer book databases: examples

• http://www.biblio.com/ or http://biblio.com/ [accessed 2008]

Page 108

Free public access multi-dealer book databases: examples

• http://www.boekenverkoper.nl [accessed in 2007]

Page 109

Free public access multi-dealer book databases: examples

• http://www.choosebooks.com/ [accessed 2008]

Page 110

Free public access multi-dealer book databases: examples

• http://www.tomfolio.com/ [accessed 2008]

Page 111

Full-text databases of books: introduction

• Some organisations have scanned the contents of thousands of books, to make them full-text searchable through the Internet.

Page 112

Full-text databases of books: Amazon

• http://www.amazon.com/ and choose BOOKS
• Since 2004
• Also incorporated in the search engine A9

Page 113

Full-text databases of books: Google Book Search

• http://www.books.google
• Since 2005

Page 114

Online Public Access Catalogues:union catalogues of libraries

• Some systems offer access to the merged catalogues of several libraries, so-called ‘union catalogues’.

• Example: Copac, http://www.copac.ac.uk/, is accessible free of charge.

Example

Page 115

Online Public Access Catalogues:union catalogues: examples

• European National Libraries, catalogues harvested: http://www.theeuropeanlibrary.org/portal/index.html

Examples

Page 116

Online Public Access Catalogues:union catalogues: examples

• Europeana: documents on European culture, http://www.europeana.eu/portal/
Metadata are harvested from co-operating organisations.

Examples

Page 117

Online access databases about journal articles: overview

• Thousands of fee-based online access databases offer bibliographies or full-texts of journal articles in particular subject domains and published by many publishers.

• Many publishers offer searchable bibliographies, but only of their own publications. (for instance Elsevier, Emerald, Sage)

• Only a few large databases offer free access to bibliographies of articles published in journals from many publishers.

Page 118

Online access databases about journal articles: Ingenta

• Available from: http://www.ingentaconnect.com/
• Ingenta allows you to search a bibliographic database of millions of journal articles, including titles, authors and, in many cases, abstracts.

• The organisation claims to be “The most comprehensive collection of academic and professional publications”

Example

Page 119

Online access databases about journal articles: Infotrieve ArticleFinder

• Available from: http://www.infotrieve.com/
• Infotrieve allows you to search, free of charge, a bibliographic database of the articles in more than 20 000 journal titles and conference proceedings; this is NOT a full-text database.

• Payment is required to receive the full text of a document.

Example

Page 120

Online access databases about journal articles: Scirus

• The search interface: http://www.scirus.com
• This is a specialised Internet index that allows you to search for selected scientific information (only) on the WWW.

• This includes the peer-reviewed articles in the journals that are published in ScienceDirect by Elsevier.

• Offered free of charge by Elsevier.
• An article can be downloaded in full-text format only when a fee has been paid to the publisher.

Example

Page 121

Online access databases about journal articles: Google Scholar

• Google Scholar allows us to search specifically for scholarly information sources, including journal articles.

• A beta (= test) version has been available since November 2004.

• The system is accessible from the Google home page, as one of the additional services besides the normal, classic WWW search.

Example

Page 122

Online access databases about journal articles: DOAJ screenshot

Example

Page 123

Online access databases about journal articles: ERIC

• http://ericir.syr.edu/Eric/
• ERIC allows searching a bibliographic database of articles and other documents in the fields of information science and education.

+ Available in open access, free of charge
- Payment is required to receive the full text of a document.

Example

Page 124

Online access databases about journal articles: LISTA

• http://www.libraryresearch.com/
• Bibliographic database; covers libraries and information management, with subjects such as librarianship, classification, cataloging, bibliometrics, online information retrieval, information management and more, from more than 600 periodicals plus books, research reports and proceedings.

• Offered since 2005
• Delivered via the EBSCOhost platform
+ Free of charge

Example

Page 125

Online access databases about journal articles: Teacher Reference Center

• http://www.TeacherReference.com/
• Teacher Reference Center (TRC): Journal Information for Teachers. It allows you to search popular teacher and administrator trade journals, periodicals and books.

• Via the EBSCOhost platform
• Since 2006
+ Offered free of charge

Example

Page 126

Online access databases: Web of Science

• One of the bibliographic databases in Web of Knowledge is the Web of Science.

• This is a bibliographic database that covers the articles published in the most important scientific journals.

[Screenshot: Web of Knowledge, Web of Science]

Example

Page 127

Finding images on the Internet: introduction

+ Several public access search systems are available free of charge to search for images / pictures (artwork, photos, or both) on the Internet.

+ When searching for images, the search results from such a system offer not only links to the image files on the Internet, but also small versions of the images themselves (so-called “thumbnails”).

Page 128

Examples

Finding images on the Internet: screenshot of a Google image search

Page 129

Example

Finding images on the Internet: examples of search engines

• http://images.google.com/, or through http://www.google.com/ [accessed in 2009]

• The largest database in this category (at least in 2002 to 2008).
• For each result, not only a thumbnail is offered but also its origin, with a readable URL; this makes it easier to judge the relevance of the document.

Page 130

Finding images on the Internet: examples of search engines

• http://www.bing.com/
• Available in 2009 as a beta (= test) version.
• Replacing Microsoft Live and Yahoo Search?

Example

Page 131

Method 2: Federated searching through scattered databases

[Diagram: users → federated search engine → the search engines of three separate databases]
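The flow in this diagram can be sketched in a few lines: the federated engine sends the same query to every target at the same time, waits for all answers and merges them into one list that records where each hit came from. This is a minimal Python sketch under stated assumptions: each target is modelled as a local function (a hypothetical stand-in for a remote database and its search engine), and a thread pool plays the role of the parallel network connections.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# A target takes a query string and returns a list of matching titles.
Target = Callable[[str], list[str]]


def make_target(records: list[str]) -> Target:
    """Hypothetical stand-in for one remote database: case-insensitive substring match."""
    def search(query: str) -> list[str]:
        return [r for r in records if query.lower() in r.lower()]
    return search


def federated_search(query: str, targets: dict[str, Target]) -> list[tuple[str, str]]:
    """Send the query to all targets in parallel and merge the hits, labelled by source."""
    results: list[tuple[str, str]] = []
    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        # Submit every query first, so that all targets work at the same time.
        futures = {name: pool.submit(t, query) for name, t in targets.items()}
        for name, future in futures.items():
            # result() blocks until this target answers (and re-raises its errors).
            results.extend((name, hit) for hit in future.result())
    return results


targets = {
    "db1": make_target(["Ocean data", "Sea birds"]),
    "db2": make_target(["Ocean currents"]),
    "db3": make_target(["Mountain flora"]),
}
print(federated_search("ocean", targets))
```

A production engine would also need a per-target time limit (`Future.result` accepts a `timeout` argument) and a translation of the user's query into each target's own query syntax.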

Page 132

Federated searching through scattered databases: why?

• Applications:
» Finding information in bibliographic databases
» Finding the availability of rooms in various hotels
» Finding flights to a particular destination offered by various airline companies
» Finding scientific data that are made available by various computers all over the world

Page 133

Federated searching: application: finding a hotel room in some city

Example

Page 134

Federated searching: application: finding scientific data

Example

• OBIS = Ocean Biogeographic Information System

• http://www.iobis.org/
• Gateway to scientific data on living systems in the oceans.

• The data reside on many computers all over the world.

Page 135

Hybrid method: merging data + federated searching

[Diagram: users → federated search engine → the search engines of two separate databases, and the search engine of an aggregated database that merges several databases or web sites]

Page 136

Databases accessible over the Internet: example

Example

• http://WorldWideScience.org/
• “A global science gateway connecting you to national and international scientific databases and portals. Accelerates scientific discovery and progress by providing one-stop searching of global science sources.”

Page 137

Meta WWW search systems on a server computer in the WWW

[Diagram: user on a client computer with a WWW client program ↔ Internet / WWW ↔ meta-search system on a WWW server computer ↔ WWW server computers with Internet search systems (In / Out)]
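A meta-search server has to combine the ranked lists it receives from several Internet search systems into one list. The slides do not say how the services listed here do this; the Python sketch below uses one simple, widely described technique, reciprocal rank fusion, and also removes duplicate URLs. The engine names, the URLs, the crude URL normalisation and the constant k = 60 are assumptions for illustration.

```python
from collections import defaultdict


def fuse(ranked_lists: dict[str, list[str]], k: int = 60, limit: int = 10) -> list[str]:
    """Reciprocal rank fusion of several ranked URL lists, with duplicate removal."""
    scores: dict[str, float] = defaultdict(float)
    for urls in ranked_lists.values():
        for rank, url in enumerate(urls, start=1):
            key = url.rstrip("/").lower()  # crude normalisation so duplicates coincide
            scores[key] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically. Only `limit` hits are kept.
    ordered = sorted(scores, key=lambda u: (-scores[u], u))
    return ordered[:limit]


results = {
    "engine_a": ["http://a.example/x", "http://b.example/y"],
    "engine_b": ["http://b.example/y/", "http://c.example/z"],
}
print(fuse(results))
```

A URL returned by several engines collects several contributions and rises to the top. The `limit` parameter mirrors a disadvantage listed on a later slide: only a limited number of the results obtainable from the underlying indexes are shown.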

Page 138

Meta-search systems: terminology / vocabulary / synonyms

“multi-threaded search systems” = “multiple search systems” = “multi-search systems” = “meta-search systems” = “intelligent search agents” = “federated search systems” = “portals”

Page 139

Examples

Meta-search systems on a server computer

• http://aftervote.com/
• http://draze.com/
• http://www.all4one.com
• http://www.bytesearch.com
• http://clusty.com/
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://jux2.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

Page 140

Meta-search systems: server-based: example: Vivisimo

Page 141

Meta-search systems: server-based: example: Vivisimo

• Vivisimo adds value by analysing the retrieved results / hits / links / WWW documents, in order to cluster / group / categorize / classify / map these under headings / classes / categories, to make further selections by the user / searcher easier and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT pre-processing the documents before the search.
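Vivisimo's own clustering algorithm is proprietary and is not described in these slides. To illustrate what clustering "on the fly" means, the Python sketch below works only on the snippets returned for one query, at query time, and groups them under the most frequent words that at least two snippets share. The stop-word list, the snippets and this shared-term heuristic are simplifying assumptions, far cruder than a real clustering engine.

```python
import re
from collections import Counter, defaultdict

# Tiny, hypothetical stop-word list.
STOP = {"the", "a", "of", "and", "in", "for", "on", "to", "with"}


def words(text: str) -> set[str]:
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP}


def cluster(snippets: list[str], query: str, max_labels: int = 3) -> dict[str, list[str]]:
    """Group retrieved snippets at query time under frequent shared words."""
    query_words = words(query)
    doc_words = [words(s) - query_words for s in snippets]
    freq = Counter(w for ws in doc_words for w in ws)
    # Candidate labels: words shared by at least two snippets, most frequent first.
    labels = sorted((w for w, n in freq.items() if n >= 2), key=lambda w: (-freq[w], w))
    labels = labels[:max_labels]
    groups: dict[str, list[str]] = defaultdict(list)
    for snippet, ws in zip(snippets, doc_words):
        # Each snippet goes under the first label it contains, else under "other".
        label = next((lab for lab in labels if lab in ws), "other")
        groups[label].append(snippet)
    return dict(groups)


snippets = [
    "Jaguar car prices",
    "Jaguar car dealers",
    "Jaguar habitat in the rainforest",
    "Jaguar cat habitat",
]
for label, members in cluster(snippets, "jaguar").items():
    print(label, members)
```

Nothing is computed before the search: the labels come only from the retrieved results, which is why such clustering can sit on top of any set of search engines.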

Page 142

Example

Meta-search systems: server-based: example: Clusty

• Adds value by analysing the retrieved results / hits / links / WWW documents, in order to cluster / group / categorize / classify / map these under headings / classes / categories, to make further selections by the user / searcher easier and faster.

• Can accomplish this on the fly, that is WITHOUT pre-processing the documents before the search.

Page 143

Example

Meta-search systems: server-based: example: Clusty screenshot in 2006

Page 144

Meta-search systems: disadvantages

- It is not always clear through which Internet indexes the meta-search system will search.

- Not all meta-search systems can search all the major primary search systems; for instance the famous Google Internet index is NOT included in most systems.

- Only a limited number of the results that can be obtained from the various Internet indexes are shown.

Page 145

Free public access book meta-search systems: types

We can make the following distinction between various types of meta-systems for searching:

1. Database resulting from merging several existing smaller databases = aggregator database; in the case of books: multi-dealer database = “listing service”

2. Federated search system = cross-database search system

Page 146

Free public access search systems: federated search systems

• Each of the searched target databases can be
» a catalogue database managed by the owner/dealer/shop/seller, as well as
» a multi-dealer database
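Because a federated system may query a dealer's own catalogue and also a multi-dealer database that already lists the same dealer's stock, the same copy can come back twice. A minimal Python sketch of the merge step, assuming a hypothetical record layout in which every hit carries an ISBN, a dealer name, a price and the target it came from: it keeps one offer per (book, dealer) couple, prefers the lower price when duplicates disagree, and sorts the result by price.

```python
from typing import NamedTuple


class Offer(NamedTuple):
    isbn: str
    dealer: str
    price: float
    source: str  # the target database that returned this hit


def merge_offers(hits: list[Offer]) -> list[Offer]:
    """Keep one offer per (isbn, dealer) couple; on duplicates keep the cheaper one."""
    best: dict[tuple[str, str], Offer] = {}
    for hit in hits:
        key = (hit.isbn, hit.dealer)
        if key not in best or hit.price < best[key].price:
            best[key] = hit
    # Cheapest offers first; ties broken by dealer name.
    return sorted(best.values(), key=lambda o: (o.price, o.dealer))


hits = [
    Offer("9780000000011", "Dealer X", 12.0, "Dealer X catalogue"),
    Offer("9780000000011", "Dealer X", 12.0, "multi-dealer database"),
    Offer("9780000000011", "Dealer Y", 9.5, "multi-dealer database"),
]
for offer in merge_offers(hits):
    print(offer.dealer, offer.price, offer.source)
```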

Page 147

Search systems for books that are made available by dealers

descriptions of books & real books for sale

[Diagram: user → one book dealer catalogue database]

Page 148

Search systems for books that are made available by dealers

descriptions of books & real books for sale

[Diagram: user → several book dealer catalogue databases]

Page 150

Search systems for books that are made available by dealers

descriptions of books & real books for sale

[Diagram: user → multi-dealer database = merged book dealer databases → book dealer catalogue databases]

Page 151

Search systems for books that are made available by dealers

descriptions of books & real books for sale

[Diagram: user → multi-dealer databases = merged book dealer databases → book dealer catalogue databases]

Page 152

Search systems for books that are made available by dealers

descriptions of books & real books for sale

[Diagram: user → federated book search systems → multi-dealer databases = merged book dealer databases, and book dealer catalogue databases]

Page 155

Free public access federated search systems for books: examples


Page 156

Free public access federated search systems for books: examples

• http://www.allbookstores.com/ [accessed 2006]

Page 157

Free public access federated search systems for books: examples

Page 158

Free public access federated search systems for books: examples

• http://www.BookFinder.com/ [accessed 2009]

Page 159

Free public access federated search systems for books: examples

• http://www.bookfinder4u.com/ [accessed 2007]

Page 160

Free public access federated search systems for books: examples

• http://www.bookpursuit.com/ [accessed 2006]

Page 161

Free public access federated search systems for books: examples

Page 162

Free public access federated search systems for books: examples

• http://www.dealtime.com/ [accessed 2006]

Page 163

Free public access federated search systems for books: examples

• http://www.epinions.com/Books [accessed 2006]

Page 164

Free public access federated search systems for books: examples

• http://www.fetchbook.info/ [accessed 2006]

Page 165

Free public access federated search systems for books: examples

• http://www.gallileus.info/search/ [accessed 2006]

Page 166

Free public access federated search systems for books: examples

• http://www.priceminister.com/livres-bd [accessed 2007]
• Can search not only books but also other products in various shops.

Page 167

Free public access federated search systems for books: examples

• http://www.usedbooksearch.co.uk/books.htm [accessed 2008]

• Specialised in used books, not in new books.

Page 168

Free public access federated search systems for books: examples

• http://www.vialibri.net/ [accessed 2008]

Page 169

Free public access federated search systems for books are interesting

• Knowledge about their quality is useful
» for end users as well as for librarians who buy books,
» for librarians who serve their users by performing searches for books,
» for librarians who propose databases to their users, for instance on their library WWW site, or who want to include one or several book search engines in their own local system for federated searching through several targets in one action.

Page 170

Online Public Access Catalogues: simultaneous searching

• Some meta-search services allow simultaneous, parallel searching in one search action over several databases of libraries.

Page 171

Online Public Access Catalogues: simultaneous searching: examples

Example

• Simultaneous access to catalogues of libraries related to water, organised by IAMSLIC, using Z39.50.

Page 172

Information Retrieval in a World of Scattered Information Sources

3. Comparison of methods for efficient information retrieval

Page 173

Method 1: Merging = aggregating into a searchable database

[Diagram: users → search engine → aggregated database, which merges several databases or web sites]

Page 174

Comparison of methods for efficient information retrieval

• Merged = aggregated databases react faster than federated search systems (in most cases).
» Explanation: they do not need several simultaneous Internet connections, and they do not have to merge raw intermediate results into the result that is finally shown to the user.
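The speed argument can be expressed as a simple timing model: an aggregated database answers after one local lookup, whereas a federated engine that queries its targets in parallel must wait for the slowest target (or for its own timeout) and then merge the raw results. The Python sketch below assumes exactly that model; all figures are invented for illustration and are not measurements.

```python
def aggregated_time(lookup: float) -> float:
    """One query against one local, pre-merged database."""
    return lookup


def federated_time(target_latencies: list[float], merge: float, timeout: float) -> float:
    """Targets are queried in parallel, so the engine waits for the slowest one
    (but no longer than `timeout`), then spends `merge` seconds combining raw results."""
    slowest = max(target_latencies, default=0.0)
    return min(slowest, timeout) + merge


# Invented figures, in seconds.
fast = aggregated_time(0.3)
slow = federated_time([0.8, 1.5, 4.0], merge=0.2, timeout=3.0)
print(f"aggregated: {fast:.1f} s, federated: {slow:.1f} s")
```

With the timeout, the federated answer arrives sooner but without the hits of the 4-second target: the engine trades coverage for speed.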

Page 175

Method 2: Federated searching through scattered databases

[Diagram: users → federated search engine → the search engines of three separate databases]

Page 176

Hybrid method: merging data + federated searching

[Diagram: users → federated search engine → the search engines of two separate databases, and the search engine of an aggregated database that merges several databases or web sites]

Page 177

Comparison of methods for efficient information retrieval

• Federated search systems offer higher coverage than direct searching of individual databases or merged databases (in most cases).
» Explanation: they can exploit many databases, and even merged = aggregated databases, in one search action. For example, one search can cover more than 100 million descriptions of physical books = couples of book and dealer (not book titles).
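The coverage claim, and the remark that the count refers to couples of book and dealer rather than to titles, can be illustrated with set arithmetic: a federated search covers the union of what its targets hold, and one title offered by several dealers yields several couples. The targets and their contents below are hypothetical.

```python
# Each target holds (title, dealer) couples: physical copies offered for sale.
target_a = {("Book One", "Dealer X"), ("Book Two", "Dealer X")}
target_b = {("Book One", "Dealer Y"), ("Book One", "Dealer X")}  # overlaps target_a
multi_dealer_db = {("Book Three", "Dealer Z"), ("Book Two", "Dealer Y")}

# A federated search covers the union; the set union drops the overlapping couple.
covered = target_a | target_b | multi_dealer_db
titles = {title for title, _dealer in covered}

print("couples in target_a alone:", len(target_a))
print("couples covered by one federated search:", len(covered))
print("distinct titles among them:", len(titles))
```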

Page 178

Comparison of methods for efficient information retrieval

• Federated search systems offer results that are more up to date than an aggregated database, whose contents are (only) a snapshot made in the past.
» This is important when data must be very fresh = up to date.
» Examples: booking = reservation systems for flights and hotel rooms.
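The freshness argument can be shown with a toy model in Python: the aggregated database answers from a snapshot copied at harvest time, while a federated search asks each live reservation system at query time. The hotels and room counts are hypothetical.

```python
# Live availability, as held by each hotel's own reservation system (hypothetical data).
live = {"Hotel A": 2, "Hotel B": 1}

# The aggregator copies the data at harvest time...
snapshot = dict(live)

# ...after which a guest books the last free room at Hotel B.
live["Hotel B"] -= 1


def available(rooms_by_hotel: dict[str, int]) -> list[str]:
    """Hotels that still report at least one free room."""
    return sorted(h for h, rooms in rooms_by_hotel.items() if rooms > 0)


print("aggregated (snapshot):", available(snapshot))  # still offers Hotel B
print("federated (live):     ", available(live))
```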

Page 179

Information Retrieval in a World of Scattered Information Sources

Conclusions

Page 180

Conclusions: 2 methods

• A single, simple, standard method = approach = solution does not (yet) exist.

• Two basic methods are common.
• They have their own
» advantages and
» disadvantages.

Page 181

Conclusions: 1 dimension

• Up to now we have primarily made the distinction between
» merging records in 1 database on 1 computer & searching this database, and
» federated searching, in one action, of databases on various computers.

Page 182

Conclusions: more dimensions

• However, the location of the databases is only 1 aspect / dimension of possible methodological approaches.

• Other dimensions / aspects are, for instance:
2. Unification / standardization of database record structures in fields according to a standard, for better interoperability.
3. Unification / standardization of subject descriptions, for better interoperability.

• This brings us to 3 aspects / dimensions, so we can visualize this as a cube.
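Dimensions 2 and 3 can be illustrated with a crosswalk: each source's field names are mapped onto one common record structure, and local subject terms onto one shared vocabulary, so that records from different sources become comparable. The Python sketch assumes Dublin Core element names (title, creator, subject) as the common structure; the two source schemas and the subject mapping are hypothetical.

```python
# Dimension 2: hypothetical field mappings from two source schemas to Dublin Core names.
FIELD_MAP = {
    "library_a": {"ti": "title", "au": "creator", "kw": "subject"},
    "library_b": {"Titel": "title", "Auteur": "creator", "Trefwoord": "subject"},
}

# Dimension 3: hypothetical mapping of local subject terms to one shared vocabulary.
SUBJECT_MAP = {"zeebiologie": "marine biology", "marine biology": "marine biology"}


def normalise(source: str, record: dict[str, str]) -> dict[str, str]:
    """Rename the fields of one record and unify its subject term."""
    fields = FIELD_MAP[source]
    out = {fields[k]: v for k, v in record.items() if k in fields}
    if "subject" in out:
        # Unknown subject terms are kept unchanged rather than dropped.
        out["subject"] = SUBJECT_MAP.get(out["subject"].lower(), out["subject"])
    return out


a = normalise("library_a", {"ti": "Life in the North Sea", "kw": "Marine Biology"})
b = normalise("library_b", {"Titel": "Leven in de Noordzee", "Trefwoord": "Zeebiologie"})
print(a["subject"] == b["subject"])
```

After normalisation, a search on the subject "marine biology" finds both records, even though the sources use different field names and different subject terms.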

Page 183

Conclusions: the cube of interoperability

[Diagram: a cube whose 3 axes are the 3 dimensions of inter-operability]

WORST CASE
1. Various computers
2. Various database field structures
3. Various subject description systems

BEST CASE
1. One computer
2. One database field structure
3. One subject description system

Page 184

Methods for efficient information retrieval: conclusions

• For end users, the underlying methods of most information systems are “not clear” (= negative formulation) or, in other words, “transparent” (= positive formulation).

Page 185

Methods for efficient information retrieval: conclusions

• The examples given show at least that progress in this field is impressive.

Page 186

Questions? Suggestions? Remarks?

Page 187

• You are free to copy, distribute and display this work under the following conditions:
» Attribution: you must mention the author.
» Noncommercial: you may not use this work for commercial purposes.
» No Derivative Works: you may not change, modify, alter, transform, or build upon this work.

• For any reuse or distribution, you must make clear to others the license terms of this work.