Zoveel informatie, zo weinig tijd
1
Zo veel informatie, zo weinig tijd (So much information, so little time)
[email protected]
To support a presentation at the bi-annual 2-day conference series “Informatie”, organised by VVBAD, in Oostende, Belgium, September 10-11, 2009: “Informatie aan zee”
2
0. Introduction with problem statements
1. Methods to make information retrieval efficient in a world of scattered sources
2. Applications of those methods
3. Comparison of the methods
4. Conclusions
Contents = summary = structure = overview of this presentation
3
These slides should be available from the WWW site http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/ (note: BIBLIO and not biblio) and also from the WWW site of the organisers of the conference, VVBAD.
4
Information Retrieval in a World of Scattered Information Sources
0. Introduction and problem statements
5
Introduction: scattering of sources
• Users want to exploit information sources fast and effectively.
• This is hindered by the fact that digital, electronic information sources that may contain relevant information are created and scattered, distributed on numerous computers all over the intranet of the user’s organization AND over the Internet and the WWW.
6
Introduction: scattering of sources
• In other words: integration / aggregation is still far from perfect.
7
Introduction: scattering of sources difficulties
• Using many information retrieval systems costs time:
1. They must be used one after the other, which requires many decisions and actions.
8
Introduction: scattering of sources difficulties
• Using many information retrieval systems costs time:
2. They offer different user interfaces in the retrieval phase, which is confusing.
9
Introduction: scattering of sources difficulties
• Using many information retrieval systems costs time:
3. They offer found information items in various data formats.
10
Introduction: scattering of sources difficulties
• Using many information retrieval systems costs time:
4. They display found items in different ways on a computer screen.
11
Introduction: scattering of sources difficulties
Small = BEAUTIFUL
12
Introduction: scattering of sources difficulties
13
Introduction: problem statements
1. Which methods have been developed and applied to cope with this reality?
14
Introduction: problem statements
2. Which concrete applications are available and how can an end-user exploit systems created in this domain?
15
Introduction: problem statements
3. How can information intermediaries evaluate and apply these methods to bring information more efficiently to end-users?
16
Information Retrieval in a World of Scattered Information Sources
1. Methods to make information retrieval efficient
in a world of scattered sources
17
Method 1: Merging = aggregating into a searchable database
[Diagram: several databases or web sites are merged into one aggregated database, which the users query through a single search engine.]
18
Method 2: Federated searching through scattered databases
[Diagram: the users query one federated search engine, which passes each query on to the search engines of several separate databases.]
19
Both methods offer benefits to the users
+ Saves the users time executing queries to various servers or browsing through various systems.
☺
20
Both methods offer benefits to the users
+ Offers a uniform / consistent display of results in the output phase.
☺
21
Both methods offer benefits to the users
+ Some systems offer tools to refine the display of the results; for instance
  + to deduplicate very similar items in the result set,
  + to sort the results,
  + to rank the results,
  + to visualize the results in a more graphical way,
  + to search within the result set,
  + …
☺
22
Both methods bring difficulties / challenges / problems
- In many cases there are differences among the merged sources in the formatting/structuring of their database records in fields. This hinders
  - searching limited to a field,
  - displaying selected fields only (such as title),
  - sorting of the displayed records on the contents of a particular selected field (such as author or date).
23
Both methods bring difficulties / challenges / problems
- In many cases there are differences among sources in the metadata schemes that are applied in the databases to improve retrieval, such as
  » classifications
  » taxonomies
  » thesaurus systems
  » ontologies
This hinders the exploitation of the added value of such metadata.
24
Both methods bring difficulties / challenges / problems
- How to deduplicate / dedupe / cluster very similar entries / results / items (= near-duplicates) from various target sources? When is similar similar enough? Which entry / result / item to choose as the representative of a cluster of similar entries?
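One possible way to approach these questions is sketched below. It is only an illustration, not a method taken from any particular system: records are assumed to be dictionaries with hypothetical keys such as "title" and "source", titles are compared with Python's standard `difflib`, and the threshold of 0.9 is an arbitrary choice for "similar enough". The representative of a cluster is simply the record with the most filled-in fields.

```python
import re
from difflib import SequenceMatcher


def normalize(title):
    """Lowercase a title and collapse punctuation and whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def cluster_near_duplicates(records, threshold=0.9):
    """Greedily group records whose normalized titles are similar enough.

    `threshold` is the (arbitrary) answer to "when is similar similar enough?".
    """
    clusters = []
    for record in records:
        key = normalize(record["title"])
        for cluster in clusters:
            first = normalize(cluster[0]["title"])
            if SequenceMatcher(None, key, first).ratio() >= threshold:
                cluster.append(record)
                break
        else:
            # No existing cluster was similar enough: start a new one.
            clusters.append([record])
    return clusters


def pick_representative(cluster):
    """Choose the record with the most non-empty fields."""
    return max(cluster, key=lambda r: sum(1 for v in r.values() if v))


# Hypothetical records as they might come back from three target sources.
records = [
    {"title": "Information Retrieval: an Introduction", "source": "A", "abstract": ""},
    {"title": "Information retrieval - an introduction", "source": "B", "abstract": "x"},
    {"title": "Ocean biogeography", "source": "C", "abstract": ""},
]
clusters = cluster_near_duplicates(records)
```

A real system would probably compare more fields than the title (authors, year, identifiers), and the greedy comparison against the first member of each cluster is a simplification.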
25
Both methods bring difficulties / challenges / problems
- When a specific target source database makes special, non-standard, dedicated retrieval software available, offering special features to exploit the database better than with a more classical standard retrieval interface, then these features may be lost in the new retrieval system. Searches are reduced to the lowest common denominator.
Examples:
  - clustering of results
  - deduplication of results
  - …
26
Method 1: Merging = aggregating into a searchable database
[Diagram: several databases or web sites are merged into one aggregated database, which the users query through a single search engine.]
27
Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
[Diagram: a user's client computer with client software queries the search & retrieval interface of a service provider. The service provider fills its metadata database by sending PMH requests over the http protocol to data providers, which hold the digital objects and return metadata.]
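As an illustration of the harvesting step, the sketch below builds an OAI-PMH `ListRecords` request and extracts Dublin Core titles from a response. The verb `ListRecords`, the `metadataPrefix` value `oai_dc` and the `from` argument belong to the protocol; the base URL and the sample response are invented for this example, and a real harvester would also have to handle resumption tokens and errors.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace of unqualified Dublin Core elements, as used in oai_dc records.
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build the URL of a ListRecords request sent by a service provider."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # harvest only records changed since then
    return base_url + "?" + urlencode(params)


def extract_titles(xml_text):
    """Return all dc:title values found in a harvested response."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iter(DC_NS + "title")]


# Hypothetical data provider and a hand-written sample response.
url = list_records_url("https://repository.example.org/oai", from_date="2009-01-01")

sample_response = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample title</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""
titles = extract_titles(sample_response)
```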
28
Merging into a searchable database offers benefits for the users
+ Applicable even in the absence of data communication to remote servers (whereas federated searching needs good, fast data communication). Therefore this is the relatively ‘old’ method.
☺
29
Merging into a searchable database brings difficulties / challenges
- The contents of the aggregated database are less up to date than the original information sources. The importance of this aspect depends of course
  - on the particular application,
  - on the time delay.
30
Method 2: Federated searching through scattered databases
[Diagram: the users query one federated search engine, which passes each query on to the search engines of several separate databases.]
31
Federated searching: terminology / vocabulary / synonyms
federated searching
= meta-searching = metasearching
= cross-database searching
= multi-database searching
= multi-threaded searching
= one-stop searching
= poly-searching = polysearching
= broadcast searching
= searching through a portal / gateway
32
Federated searching through scattered databases: why?
The perfect trip:
1. A cheap and nice flight
2. A cheap and nice hotel
3. A visit to a nice museum
4. Something nice to read (free via your library)
☺
33
Federated searching: application: finding a suitable flight
Example:
• http://CheapTickets.com/ for the USA
34
Federated searching: application:finding a hotel room in some city
Example
35
Federated searching: searching in a museum
Example
36
Federated searching: searching in a library
Example
37
Federated searching: integrating access
[Diagram: a meta-searching system gives integrated access to local library catalog database(s), catalog database(s) of other libraries, databases (full-text or bibliographic), publishers with their journals and articles, the intranet, and WWW search engines.]
38
Federated searching: benefits for the users
+ The system can help the user to select appropriate sources.
☺
39
Federated searching: benefits for the users
+ The system can help in the process of authentication and authorization when this involves not only a simple recognition of IP-address of the user’s client computer, but when it involves user-id’s and passwords.
☺
40
Federated searching: benefits for the users
+ The need to know which particular database is suitable for a particular search is reduced, because several ones can be searched in one action.
☺
41
Federated searching: benefits for the users
+ The users have to learn only 1 user interface for searching and only 1 search syntax, instead of a user interface and a search syntax for each database!
☺
42
Federated searching: benefits for the users
+ Can make users search and exploit databases that they would never use otherwise, that is, without a federated search system!
☺
43
Federated searching: benefits for the users
+ Useful, relevant, interesting items/references can be found/uncovered from unexpected, unknown, unfamiliar databases! This is mainly beneficial in the case of interdisciplinary subjects/topics.
☺
44
Federated searching: benefits for the users
+ Some systems offer tools to refine the display of the results; for instance
  » to dedupe very similar items in the result set,
  » to sort the results,
  » to rank the results,
  » to search within the result set,
  » …
☺
45
Federated searching: benefits for the users
+ Some systems offer interesting links from a retrieval result to various related sources or services (such as the full text or a document delivery service), using a link generator based on the OpenURL standard.
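How such a link generator might assemble a link is sketched below, as an assumption-laden illustration only. The key names (`genre`, `issn`, `volume`, `spage`, `date`) follow the early key/value style of OpenURL; the resolver address and the citation data are hypothetical, and a real link resolver would be configured by the library.

```python
from urllib.parse import urlencode


def make_openurl(resolver_base, citation):
    """Turn a citation (dict of OpenURL-style keys) into a resolver link.

    Empty fields are left out, so an incomplete description still
    yields a usable link.
    """
    params = {key: value for key, value in citation.items() if value}
    return resolver_base + "?" + urlencode(params)


# Hypothetical resolver and citation.
link = make_openurl(
    "https://resolver.example.org/openurl",
    {"genre": "article", "issn": "1234-5678", "volume": "12",
     "issue": "", "spage": "34", "date": "2008"},
)
```

The resolver, not the federated search system, then decides which services (full text, document delivery, catalogue lookup) to offer for the described item.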
☺
46
Federated searching: benefits for the users
+ Some systems check for each retrieved bibliographic description if the corresponding full text is immediately available online and indicate this immediately to the user, on the fly.
☺
47
Federated searching: benefits for the users
+ Some systems further process the retrieved results and display them in an interesting way that is not offered by the searched original systems. For instance:
» Clustering of results according to subject or age or availability of full text
» Displaying the results in a graphical way
☺
48
Federated searching: benefits for the users
So far so good !
☺
49
Federated searching through scattered databases
[Diagram: the users query one federated search engine, which passes each query on to the search engines of several separate databases.]
50
Federated searching: difficulties / challenges / problems
- How to provide some useful relevance ranking of search results/entries, even when the target databases can be quite different in type and quality, and even when no index is created in advance, just-in-case, well before the search action, like Google and other Internet search engines do.
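One very simple fallback, shown here only as a sketch and not as the approach of any specific product, is round-robin interleaving: each target database already returns its results in its own order, and the federated system takes the first result of every target, then the second, and so on. This trusts each target's own ranking and needs no index built in advance, but it ignores differences in quality between targets. Results are represented by plain strings (hypothetical identifiers).

```python
from itertools import zip_longest


def round_robin_merge(result_lists):
    """Interleave ranked result lists, skipping results already seen."""
    merged = []
    seen = set()
    # zip_longest pads shorter lists with None, which is skipped below.
    for tier in zip_longest(*result_lists):
        for item in tier:
            if item is not None and item not in seen:
                seen.add(item)
                merged.append(item)
    return merged


merged = round_robin_merge([
    ["a1", "a2", "a3"],   # ranked results from target A
    ["b1"],               # target B found only one item
    ["c1", "a2"],         # target C also returns a2, a duplicate
])
```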
51
Federated searching: difficulties / challenges / problems
- Powerful / sophisticated / refined forms of searching may not be applicable in a federated search.
Example: limiting to a particular type of document, such as a therapy (in medicine).
This may cause a LOSS of time, instead of saving time.
52
Federated searching through scattered databases
[Diagram: the users query one federated search engine, which passes each query on to the search engines of several separate databases.]
53
Federated searching: difficulties / challenges / problems
- Differences among target sources in the Internet application protocols that are applied normally, by default, for connection/communication and retrieval, such as
  » (telnet) HTTP
  » proprietary, non-standard protocols
  » Z39.50, ISO 23950, SRU, and related protocols that are developed for federated searching!
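To give an impression of what a request in one of these dedicated protocols looks like, the sketch below builds an SRU `searchRetrieve` URL. The parameter names `operation`, `version`, `query` and `maximumRecords` are part of SRU, and the query is written in CQL; the server address and the index name `dc.title` are assumptions, since each target decides which indexes it supports.

```python
from urllib.parse import urlencode


def sru_search_url(base_url, cql_query, maximum_records=10, version="1.1"):
    """Build an SRU searchRetrieve request for one target database."""
    params = {
        "operation": "searchRetrieve",
        "version": version,
        "query": cql_query,               # the search expressed in CQL
        "maximumRecords": maximum_records,
    }
    return base_url + "?" + urlencode(params)


# Hypothetical SRU server of a library catalogue.
url = sru_search_url("https://catalog.example.org/sru", 'dc.title = "ocean"', 5)
```

A federated search engine would send a request of this kind to every target that supports SRU and fall back to other connectors for targets that only offer proprietary interfaces.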
54
Federated searching through scattered databases
[Diagram: the users query one federated search engine, which passes each query on to the search engines of several separate databases.]
55
Federated searching: difficulties / challenges / problems
- Various search engines may act in different ways!
For instance:
Is truncation of a word in a search query possible?
Is limitation to a particular field possible?
How can a federated search engine take these differences into account?
56
Federated searching: difficulties / challenges / problems
- A query with several words and without explicit Boolean operators can be interpreted in various ways by the various database retrieval systems. For instance, the retrieval software may apply the Boolean operator AND to combine all the query words, but it may also use OR. If the federated search system does not take care of this well, this may lead to lower recall and precision.
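One way a federated search system could take care of this is to never send a bare multi-word query, but to make the Boolean operator explicit in the syntax of each target. The sketch below assumes a hypothetical `TargetProfile` that records how a target spells AND; it handles only plain word lists and does not parse queries that already contain operators or phrases.

```python
from dataclasses import dataclass


@dataclass
class TargetProfile:
    """Hypothetical description of one target's query syntax."""
    name: str
    and_operator: str  # how this target writes Boolean AND


def to_explicit_and(query, profile):
    """Join the words of a plain query with the target's AND operator."""
    words = query.split()
    return f" {profile.and_operator} ".join(words)


catalog = TargetProfile(name="library catalog", and_operator="AND")
index = TargetProfile(name="article index", and_operator="&")

for_catalog = to_explicit_and("ocean  biogeography data", catalog)
for_index = to_explicit_and("ocean  biogeography data", index)
```

With the operator made explicit, every target interprets the query in the same way, whatever its own default is.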
57
Federated searching: difficulties / challenges / problems
- When some special, non-standard, dedicated retrieval software is made available by a specific target source database to offer special features to the user to exploit the database better than with a standard retrieval interface, then the source can probably not be exploited as well by the federated search system. Searches are reduced to the lowest common denominator.
58
Federated searching: difficulties / challenges / problems
- Differences in response time among the target sources. A slow response of a target source can hinder the final analysis and presentation of the results to the user.
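A common way to limit this problem, sketched here under stated assumptions, is to query all targets in parallel and to stop waiting after a fixed time, presenting whatever has arrived and reporting which targets were skipped. The target functions `fast_target` and `slow_target` are invented stand-ins for real connectors, the timeout value is arbitrary, and `cancel_futures` requires Python 3.9 or later.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def federated_search(query, targets, timeout=2.0):
    """Query all targets in parallel; give up on targets slower than `timeout`.

    `targets` maps a target name to a function that takes the query and
    returns a list of results. Returns (results per target, skipped names).
    """
    pool = ThreadPoolExecutor(max_workers=len(targets))
    futures = {pool.submit(search, query): name for name, search in targets.items()}
    done, not_done = wait(futures, timeout=timeout)
    results, skipped = {}, []
    for future in done:
        name = futures[future]
        if future.exception() is None:
            results[name] = future.result()
        else:
            skipped.append(name)  # the target answered with an error
    skipped.extend(futures[future] for future in not_done)  # too slow
    # Do not block on slow targets; searches already running finish in the background.
    pool.shutdown(wait=False, cancel_futures=True)
    return results, sorted(skipped)


def fast_target(query):
    return [query + "-hit-1"]


def slow_target(query):
    time.sleep(1.0)  # simulates a slow remote database
    return [query + "-late-hit"]


results, skipped = federated_search(
    "ocean", {"fast": fast_target, "slow": slow_target}, timeout=0.2
)
```

The trade-off is visible in the return value: the user gets an answer quickly, but results from the skipped targets are missing unless the system adds them later.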
59
Federated searching through scattered databases
[Diagram: the users query one federated search engine, which passes each query on to the search engines of several separate databases.]
60
Federated searching: difficulties / challenges / problems
- Some databases can NOT be included as a target database in a federated searching engine, because their owners/producers do not allow this. This is an important difficulty, because in this way interesting / valuable databases are perhaps not exploited by users who rely on federated searching.
61
Federated searching through scattered databases
[Diagram: the users query one federated search engine, which passes each query on to the search engines of several separate databases.]
62
Federated searching: difficulties / challenges / problems
- Users may be less impressed by a federated searching system than by the simple, common, familiar, famous Internet / WWW search engines, as response time is in most cases less impressive, due to differences as follows:
  - The computer hardware used by the systems
  - Slower distributed searching through several computer systems, versus faster searching through a more centralised computer database of a priori compiled records
63
Federated searching: difficulties / challenges / problems
- The evaluation of the quality of each search result from a federated search action may be more difficult than when each database is searched separately, because the user may be less aware of the limitations, strengths, selection criteria and aims of the individual, separate databases that offer each result. For instance, peer-reviewed articles from reputable scientific journals may be mixed with more popular and more biased, unscientific texts from trade literature.
64
Federated searching: conclusion
Federated searching
- is a continuous challenge for developers of the sophisticated software and for the implementers in libraries and information centers
- offers benefits for those end-users who are not enthusiastic to work with separate target source databases
- does not eliminate the need for access to individual databases
65
Hybrid method: merging data + federated searching
[Diagram: the users query one federated search engine, which passes each query on to the search engines of separate databases and to the search engine of an aggregated database that is merged from several databases or web sites.]
66
Information Retrieval in a World of Scattered Information Sources
2. Applications of methods for efficient information retrieval
67
Method 1: Merging = aggregating into a searchable database
[Diagram: several databases or web sites are merged into one aggregated database, which the users query through a single search engine.]
68
Internet global subject directories: introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many people.
• They can be browsed following a tree structure or a more complicated variation.
69
Internet global subject directories: Yahoo!: screenshot of home page
Example
70
Internet global subject directories: BUBL LINK
• A hypertext global subject directory to more than 10 000 WWW sites for the higher education community can be found at http://bubl.ac.uk/link/ [accessed 2008]
• Accessible free of charge.
• The categories are based on the well-known general Dewey classification system.
Example
71
Internet global subject directories: dmoz: screenshot of the starting page
Example
72
Internet global subject directories: Librarians' Internet Index: screenshot
Example
73
Internet global subject directories: IPL: screenshot
Example
74
Internet global subject directories: Intute: screenshot
Example
75
Internet indexes: scheme of the mechanism
User searching for Internet based information
Internet client hardware and software
user interface to a search engine Internet information source
Internet index search engine Internet crawler and indexing system
database of Internet files, including an index
76
Internet indexes: Google
• http://www.google.com/
• Available since 2001 with most of its features.
• The most popular search system since 2003.
Example
77
Internet indexes: Google Scholar
• Google Scholar allows us to search for more scholarly information sources, including journal articles.
• A beta (test) version has been available since November 2004.
• The system is accessible starting from the home page of Google as one of the additional services, or more directly from http://scholar.google.com/
Example
78
Internet indexes: Google Scholar: screenshot
Example
79
Internet indexes: Bing
• http://www.bing.com/
• Available in 2009 in beta = test version.
• Replaces Microsoft Live as well as Yahoo Web Search?
Example
80
Internet indexes: Scirus
• The search interface: http://www.scirus.com/
• Since 2001.
• Offers not only access to files in html format, but also to files in PDF.
• Allows you to search for more or less “manually” selected
  » scientific WWW pages, plus
  » the contents of some scientific, bibliographic databases.
• In the sense that Scirus is dedicated to scientific information, it is similar to Google Scholar.
Example
81
Internet indexes: Ask
• Available from: http://www.ask.com/
• Offers a feature that is not offered by most other search systems: categorization = classification = refinement = clustering of search results, to help the user cope with the ambiguity of the search query that was made
Example
82
Internet indexes cover only a part of the Internet: metaphor
The “visible” part of Internet
The “deep, hidden, invisible” part of Internet and the WWW, (that is not searchable using a global index like Google Web Search)
83
Databases accessible over the Internet: example: OAISTER
• http://oaister.umdl.umich.edu/
• “Our goal is to create a collection of freely available, previously difficult-to-access, academically-oriented digital resources that are easily searchable by anyone.”
Example
84
Databases accessible over the Internet: example: OAISTER
• OAISTER makes searching possible in millions of digital documents that form part of institutional repositories all over the world.
• OAISTER covers this kind of documents better than Google Web Search (according to independent academic investigations in 2006 and 2008).
Example
85
Databases accessible over the Internet: example: scientificcommons
Example
• http://www.scientificcommons.org/
• Since 2007
• Similar to OAISTER: allows you to search the full texts in scientific open access repositories all over the world.
☺
86
Databases accessible over the Internet: example: Medline
• Medline/PubMed offers bibliographic descriptions of publications on medicine, free of charge.
Example
☺
87
Current awareness services focusing on WWW pages: Google Alerts
• Available at http://www.google.com/ (see the page with additional services) or more directly from http://www.google.com/alerts/
• Since 2004.
• Can discover relevant changed or new WWW pages for you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on their server computer.
88
Internet with WWW and printed books
• For a few years now, the Internet with the WWW has been the primary information source for many people.
• However:
  » A lot of information is still distributed only in the form of printed books.
  » The content of old printed books can still be interesting.
  » The content of most printed books is (still) not available on the Internet.
89
Public access book databases: introduction
• Most general WWW search engines do NOT allow you to find out about the existence of books that may be interesting for you, at least not in a systematic and efficient way.
• So, specific search tools to find books can be useful.
90
Public access book databases provided by bookshops
• To find currently available books, the bibliographic databases assembled by big bookshops are interesting.
• Several offer a good coverage.
• Many are accessible free of charge.
• The added price information can be useful for the acquisition and accounting department of a library or if an individual user wants to buy a book.
• Some provide a current awareness service, also free of charge.
• Take into account delivery costs: postage + import tax
91
Book databases accessible free of charge: examples in U.S.A.
• Amazon.com (US): http://www.amazon.com/
• This company also offers different, more local versions that offer books in other languages, such as http://www.amazon.co.uk/ and http://www.amazon.fr/
• Note: amazon, NOT amazone
• Subject description is poor.
• Take into account delivery costs: postage + import tax
Examples
92
Book databases accessible free of charge: examples in U.S.A.
• Barnes and Noble (US): http://www.barnesandnoble.com/ or http://www.bn.com/
Examples
93
Book databases accessible free of charge: examples in U.S.A.
• http://www.completebook.com/cbmsi/bookaction.do
Examples
94
Book databases accessible free of charge: examples in U.S.A.
• http://www.overstock.com/
Examples
95
Book databases accessible free of charge: examples in U.S.A.
• http://www.powells.com/
• Specialised in books only.
Examples
96
Book databases accessible free of charge: examples in Europe
• Blackwell’s on the Internet (international, academic books): http://www.blackwell.co.uk/
• VLB for books in German: http://www.buchhandel.de/
• For books in French: http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch): http://www.boeknet.nl/
Examples
97
Search systems for books that are made available by dealers
[Diagram: a user searches the catalog database of one book dealer, which offers descriptions of books and the real books for sale.]
98
Search systems for books that are made available by dealers
[Diagram: the same picture, now with the catalog databases of several book dealers.]
100
Search systems for books that are made available by dealers
[Diagram: a multi-dealer database, merged from the catalog databases of several book dealers, sits between the user and those dealers, which offer descriptions of books and the real books for sale.]
101
Search systems for books that are made available by dealers
[Diagram: the same picture, now with several multi-dealer databases.]
103
Free public access multi-dealer book databases: examples
• http://www.abebooks.com/ [accessed 2008]
• http://www.abebooks.fr/ offers a user interface in French
• Covers > 10 000 bookshops.
• The company was acquired by Amazon in 2008.
104
Free public access multi-dealer book databases: examples
• http://www.alibris.com/ [accessed 2008]
105
Free public access multi-dealer book databases: examples
• Amazon Marketplace: http://www.amazon.com/ [accessed 2009]
• In synergy with the online bookshop Amazon on 1 WWW site: used books are displayed alongside Amazon’s new books.
• “the world’s biggest online book bazaar”
• Subject description is poor.
• Take into account delivery costs: postage + tax
106
Free public access multi-dealer book databases: examples
107
Free public access multi-dealer book databases: examples
• http://www.biblio.com/ or http://biblio.com/ [accessed 2008]
108
Free public access multi-dealer book databases: examples
• http://www.boekenverkoper.nl [accessed in 2007]
109
Free public access multi-dealer book databases: examples
• http://www.choosebooks.com/ [accessed 2008]
110
Free public access multi-dealer book databases: examples
• http://www.tomfolio.com/ [accessed 2008]
111
Full-text databases of books: introduction
• Some organisations have scanned the contents of thousands of books, to make them full-text searchable through the Internet.
112
Full-text databases of books: Amazon
• http://www.amazon.com/ and choose BOOKS
• Since 2004
• Also incorporated in the search engine A9
113
Full-text databases of books: Google Book Search
• http://www.books.google
• Since 2005
114
Online Public Access Catalogues: union catalogues of libraries
• Some systems offer access to the merged catalogues of several libraries, so-called ‘union catalogues’.
• Example: Copac, http://www.copac.ac.uk/, is accessible free of charge.
Example
115
Online Public Access Catalogues:union catalogues: examples
• European National Libraries, catalogues harvested: http://www.theeuropeanlibrary.org/portal/index.html
Examples
116
Online Public Access Catalogues:union catalogues: examples
• Europeana: documents on European culture. http://www.europeana.eu/portal/
  Metadata are harvested from co-operating organisations.
Examples
117
Online access databases about journal articles: overview
• Thousands of fee-based online access databases offer bibliographies or full-texts of journal articles in particular subject domains and published by many publishers.
• Many publishers offer searchable bibliographies, but only of their own publications. (for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of articles published in journals from many publishers, free of charge.
118
Online access databases about journal articles: Ingenta
• Available from: http://www.ingentaconnect.com/
• Ingenta allows you to search a bibliographic database of millions of journal articles, including titles, authors, and in many cases abstracts.
• The organisation claims to be “The most comprehensive collection of academic and professional publications”
Example
119
Online access databases about journal articles: Infotrieve ArticleFinder
• Available from: http://www.infotrieve.com/
• Infotrieve allows you to search free of charge in a bibliographic database of the articles of more than 20 000 journal titles and conference proceedings, NOT full-text.
• Payment is required to receive the full text of a document.
Example
120
Online access databases about journal articles: Scirus
• The search interface: http://www.scirus.com
• This is a specialised Internet index that allows you to search for selected scientific information (only) on the WWW.
• This includes the peer-reviewed articles in the journals that are published in ScienceDirect by Elsevier.
• Offered free of charge by Elsevier.
• An article can be downloaded in full-text format only when a fee has been paid to the publisher.
Example
121
Online access databases about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly information sources, including journal articles.
• A beta (= test) version has been available since November 2004.
• The system is accessible starting from the home page of Google as one of the additional services besides the normal, classical WWW search.
Example
122
Online access databases about journal articles: DOAJ screenshot
Example
123
Online access databases about journal articles: Eric
• http://ericir.syr.edu/Eric/
• Eric allows searching a bibliographic database of articles and other documents in the fields of information science and education.
+ Available in open access, free of charge
- Payment is required to receive the full text of a document.
Example
124
Online access databases about journal articles: LISTA
• http://www.libraryresearch.com/
• Bibliographic database; covers libraries and information management, with subjects such as librarianship, classification, cataloging, bibliometrics, online information retrieval, information management and more, from more than 600 periodicals plus books, research reports, and proceedings
• Offered since 2005
• Delivered via the EBSCOhost platform
+ Free of charge
Example
125
Online access databases about journal articles: Teacher Reference Center
• http://www.TeacherReference.com/
• Teacher Reference Center (TRC), Journal Information for Teachers, allows you to search popular teacher and administrator trade journals, periodicals, and books
• via the EBSCOhost platform
• since 2006
+ offered free of charge
Example
126
Online access databases: Web of Science
• One of the bibliographic databases in Web of Knowledge is the Web of Science.
• This is a bibliographic database that covers the articles published in the most important scientific journals.
[Diagram: Web of Science shown as a part of Web of Knowledge.]
Example
127
Finding images on the Internet:introduction
+ Several public access search systems are available free of charge to search for images / pictures (artwork, photos, or both) on the Internet.
+ When searching for images, the search results from such a system offer not only links to the image files on the Internet, but also directly small versions of the images (so-called “thumbnails”).
128
Examples
Finding images on the Internet: screen shot of a Google image search
129
Example
Finding images on the Internet: examples of search engines
• http://images.google.com/ ! or through http://www.google.com/ [accessed in 2009]
• The largest database in this category (at least in 2002…2008). For each result, not only a thumbnail is offered, but also directly the origin with the readable URL; this makes it easier to guess the relevance of the document.
130
Finding images on the Internet:examples of search engines
• http://www.bing.com/
• Available in 2009 in beta = test version.
• Replacing Microsoft Live and Yahoo Search?
Example
131
Method 2: Federated searching through scattered databases
[Diagram: the users query one federated search engine, which passes each query on to the search engines of several separate databases.]
132
Federated searching through scattered databases: why?
• Applications:
  » Finding information in bibliographic databases
  » Finding the availability of rooms in various hotels
  » Finding flights to a particular destination offered by various airline companies
  » Finding scientific data that are made available by various computers all over the world
133
Federated searching: application:finding a hotel room in some city
Example
134
Federated searching: application:finding scientific data
Example
• OBIS = Ocean Biogeographic Information System
• http://www.iobis.org/
• Gateway to scientific data on living systems in the oceans.
• The data reside on many computers all over the world.
135
Hybrid method: merging data + federated searching
[Diagram: the users query one federated search engine, which passes each query on to the search engines of separate databases and to the search engine of an aggregated database that is merged from several databases or web sites.]
136
Databases accessible over the Internet: example
Example
• http://WorldWideScience.org/
• “A global science gateway connecting you to national and international scientific databases and portals. Accelerates scientific discovery and progress by providing one-stop searching of global science sources.”
137
Meta WWW search systems on a server computer in the WWW
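[Diagram: a user's client computer with a WWW client program sends a query over the Internet / WWW to the WWW server computer of the meta-search system, which passes it on to the WWW server computers with Internet search systems and sends the results back (in / out).]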
138
Meta-search systems: terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals”
139Examples
Meta-search systems on a server computer
• http://aftervote.com/
• http://draze.com/
• http://www.all4one.com
• http://www.bytesearch.com
• http://clusty.com/
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://jux2.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/
140
Meta-search systems: server-based: example: Vivisimo
141
Meta-search systems: server-based: example: Vivisimo
• Vivisimo adds value by analysing the retrieved results / hits / links / WWW documents, in order to cluster / group / categorize / classify / map these under headings / classes / categories, to make further selections by the user / searcher easier and faster.
• Vivisimo can accomplish this on the fly, that is WITHOUT pre-processing the documents before the search.
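Clustering of results on the fly can be sketched as follows. This is NOT the algorithm used by Vivisimo, which is not described in these slides; it is only a simple assumed heuristic: each retrieved title is grouped under the word that it shares with the largest number of other titles, ignoring the query words and a small, hypothetical stopword list. Nothing is pre-processed before the search; only the retrieved titles are analysed.

```python
from collections import defaultdict

# Small hypothetical stopword list (illustration only).
STOPWORDS = {"the", "of", "in", "a", "and", "for", "to"}


def cluster_results(titles, query):
    """Group result titles under the most widely shared non-query word."""
    ignored = STOPWORDS | set(query.lower().split())
    doc_freq = defaultdict(int)  # in how many titles each word occurs
    words_per_title = []
    for title in titles:
        words = {w for w in title.lower().split() if w not in ignored}
        words_per_title.append(words)
        for w in words:
            doc_freq[w] += 1
    clusters = defaultdict(list)
    for title, words in zip(titles, words_per_title):
        shared = [w for w in words if doc_freq[w] > 1]
        # Most shared word wins; ties are broken alphabetically.
        label = min(shared, key=lambda w: (-doc_freq[w], w)) if shared else "other"
        clusters[label].append(title)
    return dict(clusters)


hits = [
    "Jaguar car prices",
    "Jaguar car dealers",
    "Jaguar habitat in the Amazon",
    "Jaguar habitat loss",
    "Jaguar football team",
]
for heading, group in cluster_results(hits, "jaguar").items():
    print(heading, "->", group)
```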
142
Example
Meta-search systems: server-based: example: Clusty
• Adds value by analysing the retrieved results / hits / links / WWW documents, in order to cluster / group / categorize / classify / map these under headings / classes / categories, to make further selections by the user / searcher easier and faster.
• Can accomplish this on the fly, that is WITHOUT pre-processing the documents before the search.
143
Example
Meta-search systems: server-based: example: Clusty screenshot in 2006
144
Meta-search systems: disadvantages
- It is not always clear through which Internet indexes the meta-search system will search.
- Not all meta-search systems can search all the major primary search systems; for instance the famous Google Internet index is NOT included in most systems.
- Only a limited number of the results that can be obtained from the various Internet indexes are shown.
145
Free public access book meta-search systems: types
We can make the following distinction between various types of meta-systems for searching:
1. Database resulting from merging several existing smaller databases = aggregator database
In the case of books: multi-dealer database = “listing service”
2. Federated search system = cross-database search system
146
Free public access search systems: federated search systems
• Each of the searched target databases can be
» a catalogue database managed by the owner/dealer/shop/seller, as well as
» a multi-dealer database
147
Search systems for books that are made available by dealers
descriptions of books & real books for sale
User
Book dealer catalog
database
148
Search systems for books that are made available by dealers
descriptions of books & real books for sale
User
Book dealer catalog
databases
150
Search systems for books that are made available by dealers
descriptions of books & real books for sale
User
Multi-dealer database = merged
book dealer databases
Book dealer catalog
databases
151
Search systems for books that are made available by dealers
descriptions of books & real books for sale
User
Multi-dealer databases = merged
book dealer databases
Book dealer catalog
databases
152
Search systems for books that are made available by dealers
descriptions of books & real books for sale
User Federated book search systems
Multi-dealer databases = merged
book dealer databases
Book dealer catalog
databases
155
Free public access federated search systems for books: examples
156
Free public access federated search systems for books: examples
• http://www.allbookstores.com/ [accessed 2006]
157
Free public access federated search systems for books: examples
158
Free public access federated search systems for books: examples
• http://www.BookFinder.com/ [accessed 2009]
159
Free public access federated search systems for books: examples
• http://www.bookfinder4u.com/ [accessed 2007]
160
Free public access federated search systems for books: examples
• http://www.bookpursuit.com/ [accessed 2006]
161
Free public access federated search systems for books: examples
162
Free public access federated search systems for books: examples
• http://www.dealtime.com/ [accessed 2006]
163
Free public access federated search systems for books: examples
• http://www.epinions.com/Books [accessed 2006]
164
Free public access federated search systems for books: examples
• http://www.fetchbook.info/ [accessed 2006]
165
Free public access federated search systems for books: examples
• http://www.gallileus.info/search/ [accessed 2006]
166
Free public access federated search systems for books: examples
• http://www.priceminister.com/livres-bd [accessed 2007]
• Can search not only books but also other products in various shops.
167
Free public access federated search systems for books: examples
• http://www.usedbooksearch.co.uk/books.htm [accessed 2008]
• Specialised in used books, not in new books.
168
Free public access federated search systems for books: examples
• http://www.vialibri.net/ [accessed 2008]
169
Free public access federated search systems for books are interesting
• Knowledge about their quality is interesting
» for end users as well as for librarians who buy books,
» for librarians who serve their users by performing searches for books,
» for librarians who propose databases to their users, for instance on their library WWW site, or who want to include one or several book search engines in their own local system for federated searching through several targets in one action.
170
Online Public Access Catalogues: simultaneous searching
• Some meta-search services allow simultaneous, parallel searching in one search action over several databases of libraries.
171
Online Public Access Catalogues: simultaneous searching: examples
Example
• Simultaneous access to catalogues of libraries related to water, organised by IAMSLIC, using Z39.50
172
Information Retrieval in a World of Scattered Information Sources
3. Comparison of methods for efficient information retrieval
173
Method 1: Merging = aggregating into a searchable database
[Diagram: several users, one search engine, and one aggregated database built from several databases or web sites]
174
Comparison of methods for efficient information retrieval
• Merged = aggregated databases react faster than federated search systems (in most cases).
» Explanation: They do not need several simultaneous Internet connections & they do not have to merge raw intermediate results into the result that is finally shown to the user.
☺
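The speed advantage of the merged approach can be illustrated with a minimal sketch. It assumes two hypothetical dealer lists (dealer_a, dealer_b) that are merged ahead of time into one local word index; a search is then a single local lookup, with no remote connections and no merging of intermediate results at query time.

```python
from collections import defaultdict

# Hypothetical source catalogues (illustration only).
SOURCES = {
    "dealer_a": ["Ocean biology atlas"],
    "dealer_b": ["Ocean currents", "Ocean biology atlas"],
}


def build_aggregated_index(sources):
    """Done once, before any search: merge all records into one word index."""
    index = defaultdict(set)
    for source, titles in sources.items():
        for title in titles:
            for word in title.lower().split():
                index[word].add((title, source))
    return index


def search(index, query):
    """Return the records that contain ALL query words (one local lookup)."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


INDEX = build_aggregated_index(SOURCES)
print(search(INDEX, "ocean biology"))
```

The cost is paid when the index is built: the contents are only as recent as the last merge.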
175
Method 2: Federated searching through scattered databases
[Diagram: several users, one federated search engine, and three databases, each with its own search engine]
176
Hybrid method: merging data + federated searching
[Diagram: several users, one federated search engine, two databases with their own search engines, and an aggregated database with its own search engine, built from several databases or web sites]
177
Comparison of methods for efficient information retrieval
• Federated search systems offer a higher coverage than direct searching of databases or merged databases (in most cases).
» Explanation: They can exploit many databases and even merged = aggregated databases in one search action. For example, in 1 search, they can cover more than 100 million descriptions of physical books = pairs of book and dealer (not book titles).
☺
178
Comparison of methods for efficient information retrieval
• Federated search systems offer results that are more up to date than those from an aggregated database, whose content is (only) a snapshot made in the past.
This is important when data should be very fresh = up to date.
Examples: booking = reservation systems for flights, hotel rooms
☺
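The freshness difference can be shown with a small sketch. The hotel names and room counts are hypothetical. The aggregated database is modelled as a copy made at harvest time; the federated search is modelled as a query on the live data.

```python
import copy

# Hypothetical live data: free rooms per hotel (illustration only).
live_hotels = {"Hotel Zee": 2, "Hotel Duin": 0}

# Aggregated database: a snapshot copied from the sources at harvest time.
snapshot = copy.deepcopy(live_hotels)

# After the harvest, the live sources change.
live_hotels["Hotel Zee"] = 0   # the last rooms get booked
live_hotels["Hotel Duin"] = 1  # a cancellation frees a room


def available(db):
    """Names of the hotels that report at least one free room."""
    return sorted(name for name, rooms in db.items() if rooms > 0)


print("snapshot :", available(snapshot))     # answer from the aggregated copy
print("live     :", available(live_hotels))  # answer from a federated query
```

The snapshot still offers a hotel that is now full and misses the one that has a room, which is why reservation systems tend to query the sources directly.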
179
Information Retrieval in a World of Scattered Information Sources
Conclusions
180
Conclusions: 2 methods
• A single, simple, standard method = approach = solution does not (yet) exist.
• Two basic methods are common.
• They have their own
» advantages and
» disadvantages.
181
Conclusions: 1 dimension
• Up to now we have mainly made the distinction between
» Merging records in 1 database on 1 computer & searching this database
» Federated searching, in one action, of databases on various computers
182
Conclusions: more dimensions
• However, the location of the databases is only 1 aspect / dimension of possible methodological approaches.
• Other dimensions / aspects are for instance:
2. Unification / standardization of database record structures in fields according to a standard, for better interoperability.
3. Unification / standardization of subject descriptions, for better interoperability.
• This brings us to 3 aspects / dimensions, so we can visualize this as a cube.
183
Conclusions: the cube of interoperability
WORST CASE
1. Various computers
2. Various database field structures
3. Various subject description systems
BEST CASE
1. One computer
2. One database field structure
3. One subject description system
Interoperability
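The second dimension of the cube (database field structures) can be illustrated with a minimal sketch of a field mapping, often called a crosswalk. The database names (db_a, db_b), their field names and the common target fields are all hypothetical; the point is only that records with different field structures become comparable once they are mapped to one structure.

```python
# Hypothetical field names of two databases, mapped to one common structure.
CROSSWALKS = {
    "db_a": {"ti": "title", "au": "author", "py": "year"},
    "db_b": {"Title": "title", "Creator": "author", "Date": "year"},
}


def normalize(record, source):
    """Rename the fields of a record to the common field structure."""
    mapping = CROSSWALKS[source]
    return {
        mapping[field]: str(value).strip()
        for field, value in record.items()
        if field in mapping  # fields without a mapping are dropped
    }


rec_a = {"ti": "Ocean currents", "au": "Janssens, A.", "py": 2009}
rec_b = {"Title": "Ocean currents ", "Creator": "Janssens, A.", "Date": "2009", "Shelf": "B12"}

print(normalize(rec_a, "db_a"))
print(normalize(rec_b, "db_b"))
```

After normalization both records are identical, so a federated or merged system could recognise them as the same item.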
184
Methods for efficient information retrieval: conclusions
• For end users, the underlying methods of most information systems are either “not clear” (= negative formulation) or “transparent” (= positive formulation)
185
Methods for efficient information retrieval: conclusions
• The examples given show at least that progress in this field is impressive.
☺
186
Questions? Suggestions? Remarks?
187
• You are free to copy, distribute, display this work under the following conditions:
» Attribution: You must mention the author.
» Noncommercial: You may not use this work for commercial purposes.
» No Derivative Works: You may not change, modify, alter, transform, or build upon this work.
• For any reuse or distribution, you must make clear to others the license terms of this work.