Grootschalige digitalisering van archivalia

Grootschalige digitalisering van archivalia Marc Holtman

description

Course on the technical aspects of large-scale digitization of archival materials.

Transcript of Grootschalige digitalisering van archivalia

Page 1: Grootschalige digitalisering van archivalia

Grootschalige digitalisering van archivalia

Marc Holtman

Page 2: Grootschalige digitalisering van archivalia

Analog original documents in the repository

Scans

Metadata

Opening up (digitally)

Scanning

Searching and consulting

Usability

Findability

Page 4: Grootschalige digitalisering van archivalia

Analog original documents in the repository

Scans

Scanning

Concepts

Principles

Technical aspects

Economic principles

Work process

In large-scale digitization of archival materials

Page 5: Grootschalige digitalisering van archivalia

Round of introductions

Page 6: Grootschalige digitalisering van archivalia

http://195.242.171.17/hga/virtuelestudiezaal/WebsitePubliek/BeschrijvingOverlijdensAkte.aspx?nn=3&akteid=6305722&containerid=6305718

http://www.nationalarchives.gov.uk/dol/images/examples/pdfs/PROB-11-1-0029.pdf

http://beeldbank.nationaalarchief.nl/nl/afbeeldingen/thema/81-topstukken-van-het-nationaal-archief

http://www.hsp.org/default.aspx?id=1035

http://recordsearch.naa.gov.au/SearchNRetrieve/Interface/DetailsReports/ItemDetail.aspx?Barcode=638501

http://burgerlijkestand.zederik.nl/atlantis/?application=nadere%20toegangen&database=nt&entrypoint=personen&query=&vanaf=0&sessienummer=1.ef00&relatienr=1&extra1=&extra2=&aantal_per_pagina=9&service=object&templatename=item.htm&recordnumber=0&detailobject=%3cd%7cw2kzed02%3aD%3a%2fatlantisdatabase%2fatlantis.nt.db%7c86%7c0%7caca04%7c10000%3e#

A selection of examples from the homework

Books (Internet Archive)

Google Books (Google Books)

Munich Digitisation Centre (Digital collections)

Page 10: Grootschalige digitalisering van archivalia

Visitors

Year    Reading rooms    Website
1982    24.027
1988    29.788
1992    27.738
1998    26.598           40.048
2002    25.014           224.050
2006    17.958           512.592
2007    92.678           520.483
2008    118.312          538.483
2009    106.000          531.143

Page 11: Grootschalige digitalisering van archivalia

The time when digitization was optional is over. The number of website users is rising sharply, and so is the expectation that the documents can be consulted online.

“We want to use the internet to facilitate historical research; Everybody should be able to use all archival collections at home 24/7”

Page 12: Grootschalige digitalisering van archivalia

Q. How many scans does digitizing 32 kilometers of archives yield?

1 meter = 7.000 scans

A. 224.000.000 scans

Q. How long does it take to digitize everything?

Production = 10.000 scans per week

A. 431 years
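The two answers on this slide can be checked with a few lines of arithmetic. This is a minimal sketch; the constant names are illustrative and the 52-week production year is an assumption, not something the slide states.

```python
# Figures taken from the slide; WEEKS_PER_YEAR is an assumption.
SCANS_PER_METER = 7_000
ARCHIVE_LENGTH_M = 32_000      # 32 kilometers of archives
SCANS_PER_WEEK = 10_000
WEEKS_PER_YEAR = 52

# Total number of scans for the whole holdings.
total_scans = ARCHIVE_LENGTH_M * SCANS_PER_METER

# Time needed at the stated weekly production rate.
weeks_needed = total_scans / SCANS_PER_WEEK
years_needed = weeks_needed / WEEKS_PER_YEAR

print(f"{total_scans:,} scans")
print(f"{years_needed:.0f} years")
```

With a 52-week year the result rounds to the 431 years quoted on the slide.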

Page 13: Grootschalige digitalisering van archivalia

In a single project, the number of documents to be digitized in an archive quickly runs into the hundreds of thousands or millions

Incidental and structural costs must remain manageable even at these enormous volumes

Incidental and structural costs depend on:

A. Technical aspects: quality standard of the scans / file size

B. Work processes: organization of the reproduction process

Page 14: Grootschalige digitalisering van archivalia

99% of all documents in an archive are text documents

National and international digitization guidelines and best practices often focus on:

- visual material: photographs, prints, drawings, etc.
- high-end quality (preservation digitization / substitution)

But what if the work becomes large-scale and the main purpose is “consultation” (that is, “reading the text”)?

Page 15: Grootschalige digitalisering van archivalia

Large-scale digitization of archival materials

Principles, concepts, technical aspects and work process

Page 16: Grootschalige digitalisering van archivalia

We Scan

Principles, image quality and workflow principles

We Store

Compression and file size

We Do

Workflow, tools and practical issues

Page 17: Grootschalige digitalisering van archivalia

We scan

Digitization at the Amsterdam City Archives in general

We always work on a project basis

Goals of digitization projects vary from access to substitution of the originals

In every project the quality standard and method are set depending on purpose and type of material

For all projects we have one workflow

Page 18: Grootschalige digitalisering van archivalia

We scan

1. At large scale

The more scans are made, the lower the price per scan

Large scale production is a prerequisite in order to keep production costs as low as possible

Page 19: Grootschalige digitalisering van archivalia

We scan

3. A broad spectrum of document types

Documents that are digitized in this reproduction process can take the following forms:

- Small and large size
- Bound and loose-leafed entities
- Card indexes
- Old and modern material
- Low and high contrast documents
- Text alone, text and image together
- Hybrid forms

Page 20: Grootschalige digitalisering van archivalia

We scan

4. For archival research from screen or print

Purpose of the scans: archival research using the web, straight from screen or print

Textual information legible in the originals must be legible in the scans

Costs for producing and storing scans are determined to a large extent by the quality standard set for the scans

The higher the standard of quality, the higher the costs will be

In order to keep costs low it is prudent to let the standard of quality follow from the requirements the end user places on the scan

Page 21: Grootschalige digitalisering van archivalia

We scan

4. For archival research from screen or print

Specified (basic) quality standard:

Reproduction of all significant information

Reproduction of details which are not part of the textual information is not required

A quality higher than that will inevitably push up both incidental and structural costs

But has no added value for the customer at all

Page 22: Grootschalige digitalisering van archivalia

We scan

Scan quality and legibility

Example: very “light” original

High quality scan: optimal tonal range, excellent flexibility, poor legibility

Modified scan (contrast): poor tonal range, little flexibility, excellent legibility

Which one would you buy?

Practical experience shows that what counts as “good legibility” is very personal.

We decided to solve this problem with a smart filter in the document viewer.

Page 23: Grootschalige digitalisering van archivalia

We scan

4. For archival research from screen or print

Skimping on the quality of scans (it can be better) is purely an economic decision, not one taken on principle

Price comparison scanning costs

Price rates scanning, external partner:

High-end                 3 – 10 $
Legibility               0,30 – 0,75 $
Legibility, auto-feed    0,05 $

It does make sense to let the standard of quality follow from the purpose the end user has for the scans
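To make the price gap concrete, the rates can be multiplied out for one week of production. This is a rough sketch that assumes the quoted rates are prices per scan (the slide does not say so explicitly) and reuses the weekly production figure of 10.000 scans mentioned elsewhere in the deck; the function and variable names are illustrative.

```python
# Price ranges in US dollars from the slide; "per scan" is an assumption.
PRICE_PER_SCAN = {
    "high-end": (3.00, 10.00),
    "legibility": (0.30, 0.75),
    "legibility, auto-feed": (0.05, 0.05),
}

WEEKLY_PRODUCTION = 10_000  # scans per week, as quoted elsewhere in the deck


def cost_range(category, n_scans):
    """Return the (lowest, highest) total cost for n_scans in a category."""
    low, high = PRICE_PER_SCAN[category]
    return (low * n_scans, high * n_scans)


for category in PRICE_PER_SCAN:
    low, high = cost_range(category, WEEKLY_PRODUCTION)
    print(f"{category:<22} $ {low:>9,.0f} to $ {high:>9,.0f} per week")
```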

Page 24: Grootschalige digitalisering van archivalia

We scan

5. For conservation and security

The scans in the scanning on request service are made for the purpose of access / archival research

Not as a substitute for the originals

Nevertheless, digitization does have a real conservation function

After digitization the originals can no longer be requested in the reading room

This way damage or loss of the originals is ruled out

Conservation of the originals remains the major concern

Page 25: Grootschalige digitalisering van archivalia

We scan

6. Always complete files

A file can contain from one to hundreds of documents

By definition the entire file is scanned

Never just a selection of pages

There are a few reasons for this:

- The costs for scanning are not so much a factor of quantity, but rather of the manual processing involved
- Otherwise it has to be indicated in the originals or the metadata which documents have been digitized
- When shown in the Archiefbank, the user expects completeness
- When non-scanned pages have to be digitized later, the entire preparation process has to be gone through once again

Page 26: Grootschalige digitalisering van archivalia

We scan

7. Contracting out the scanning to external partners

The in-house scan facilities are not designed for large-scale digitizing

The complexity of the workflow and material to be scanned calls for:

- Specialized hardware and software
- Specialized set-ups
- Knowledge
- Very complex technical infrastructure

Investing only makes sense at very high production, organized on a large scale

Contracting out of scanning was a logical choice

Page 27: Grootschalige digitalisering van archivalia

We scan

7. Contracting out the scanning to external partners

There are many scanning companies

Most do have experience in bulk processing

But not in this degree of complexity and diversity

Contracting out scanning is more than awarding a contract to a supplier

This calls for intensive collaboration

Also, the workflows of archive and digitizer have to dovetail

Page 28: Grootschalige digitalisering van archivalia

Customers think a low price is important

This means that costs for producing and storing scans have to be as low as possible

Archival research easily runs into the use of dozens to hundreds of documents

We scan

The price of an ordinary copy in our reading room should be the benchmark

Low costs

100 scans should not cost $ 100

The costs when purchasing scans online should be competitive with travel costs when visiting our reading room

Page 29: Grootschalige digitalisering van archivalia

We store

Scans with a file size as small as possible

Storage costs are still considerable when producing large quantities of scans

In order to bring structural costs down, the file size of the scans has to be as small as possible

This can be achieved in three ways:

1. Skimping on resolution

2. Skimping on bit depth / number of colors (only possible in formats like TIFF and PNG)

3. Using (lossless or lossy) compression on the files

We use a combination of 1 and 3

Page 30: Grootschalige digitalisering van archivalia

Resolution

The finer the grid used in scanning, the more information is captured and the higher the level of detail

But in any case it remains a strong simplification of reality

At a certain level of detail the individual “grid cells” will always become visible

For text documents the grid must be fine enough to match the details of the textual information. The dot on an i must still be distinguishable as such

But details in the structure of the paper, for example, do not have to be visible in the scan

Page 31: Grootschalige digitalisering van archivalia

Resolution

Resolution is usually expressed in DPI (dots per inch)

Or, more accurately, PPI (pixels per inch)

DPI therefore says something about the information density per unit of length

And with that something about the theoretically attainable quality

But nothing at all about the objective quality of a scan

Both a € 50 scanner from Aldi and a high-end scanner of € 50.000 can scan at 300 dpi

But the quality of the scans they produce will clearly differ

The resolving power of a scanner can be measured with test targets on which so-called line pairs are measured

Page 32: Grootschalige digitalisering van archivalia

Resolution

The benchmark resolution is usually 300 dpi

This is based on the smallest letter e (1 mm) in printed matter

Not all documents contain details that small

The required resolution can be calculated with, among other methods, the so-called Quality Index:

http://www.library.cornell.edu/preservation/tutorial/conversion/conversion-04.html

Page 33: Grootschalige digitalisering van archivalia

Resolution

Resolution largely determines the file size:

Resolution (A4)    File size
300 dpi            24 MB
400 dpi            44 MB
800 dpi            177 MB
1600 dpi           708 MB
3200 dpi           2,8 GB
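The growth in this table follows from the pixel count: doubling the resolution quadruples the number of pixels. The sketch below estimates the uncompressed size of an A4 scan under three assumptions that the slide does not state: a page of 210 x 297 mm, 24-bit color, and sizes counted in binary megabytes (MiB). Under those assumptions the results land close to the figures in the table.

```python
A4_MM = (210.0, 297.0)   # assumed page size in millimeters
MM_PER_INCH = 25.4


def uncompressed_size_bytes(dpi, size_mm=A4_MM, bits_per_pixel=24):
    """Estimate the uncompressed size of a scan in bytes."""
    width_px = size_mm[0] / MM_PER_INCH * dpi
    height_px = size_mm[1] / MM_PER_INCH * dpi
    return width_px * height_px * bits_per_pixel / 8


for dpi in (300, 400, 800, 1600, 3200):
    mib = uncompressed_size_bytes(dpi) / 2**20
    print(f"{dpi:>5} dpi  {mib:8.1f} MiB")
```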

Page 34: Grootschalige digitalisering van archivalia

Resolution

Examples

300 dpi

200 dpi

150 dpi

Page 35: Grootschalige digitalisering van archivalia

Resolution

Conclusion: at 150 dpi the files are small and most text is still perfectly legible

But is it wise to take this as the starting point when digitizing?

A low resolution also means lower structural management costs. In a few years it may be possible to scan again with better technology.

But it is not sufficient if, in the future, we want to deliver a higher quality based on these images, apply OCR and/or convert to better compression and file formats.

The choice depends on objectives, resources and volumes

Page 36: Grootschalige digitalisering van archivalia

Color

Color depth: bits and bytes

A pixel is a cell with a single color

The smallest unit of a digital file is a bit: it has the value 0 or 1

When a pixel consists of 1 bit, it can have the value black (0) or white (1)

To define more colors per pixel, the number of bits per pixel has to be increased

With 8 bits (each of which can take the value 1 or 0), 256 combinations and therefore 256 colors are possible (for example 0 0 0 1 0 0 1 1)

Most cameras use 8 bits per color channel (24 bits in total)

This makes 16,7 million colors possible
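The relation between bit depth and the number of possible values is simply two to the power of the number of bits. A small sketch (the function name is illustrative):

```python
def color_count(bits_per_pixel):
    """Number of distinct values a pixel can take at a given bit depth."""
    return 2 ** bits_per_pixel


for label, bits in (
    ("black and white", 1),
    ("grayscale", 8),
    ("color, 8 bits per channel", 24),
):
    print(f"{bits:>2} bits  {color_count(bits):>10,}  {label}")

# The example pattern from the slide, 0 0 0 1 0 0 1 1, read as a binary number.
example_value = int("00010011", 2)
print(example_value)
```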

Page 37: Grootschalige digitalisering van archivalia

24 bits (8 bits per color channel)

8 bits, grayscale

1 bit, black and white

Page 38: Grootschalige digitalisering van archivalia

Compression

A method for describing information more efficiently

Peer Spel Spel Spel

Spel Peer Peer Spel

Spel Spel Peer Peer

Storing: 48 letters

Encoding the words:

P = Peer

S = Spel

Compression: the file size decreases

Page 39: Grootschalige digitalisering van archivalia

Compression

Result:

P S S S

S P P S

S S P P

Storing: 12 letters (plus the coding table)

P = Peer

S = Spel
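The word grid and its coding table can be written out as a small dictionary coder. This is only an illustration of the slide's toy example, not of how image formats compress data; the function names are made up for the sketch.

```python
GRID = [
    ["Peer", "Spel", "Spel", "Spel"],
    ["Spel", "Peer", "Peer", "Spel"],
    ["Spel", "Spel", "Peer", "Peer"],
]

# Coding table from the slide, plus its inverse for decoding.
CODES = {"Peer": "P", "Spel": "S"}
WORDS = {code: word for word, code in CODES.items()}


def encode(grid):
    """Replace every word by its one-letter code."""
    return [[CODES[word] for word in row] for row in grid]


def decode(coded):
    """Restore the original words from the codes."""
    return [[WORDS[code] for code in row] for row in coded]


def letter_count(grid):
    """Count the letters that have to be stored."""
    return sum(len(cell) for row in grid for cell in row)


coded = encode(GRID)
print(letter_count(GRID), "letters before,", letter_count(coded), "letters after")
```

Decoding the coded grid returns the original grid exactly, which is what makes this a lossless scheme.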

Page 40: Grootschalige digitalisering van archivalia

Compression

Two kinds of compression:

A. Lossless (exactly reversible)

No information is lost

Compare it to a pillow from which you press all the air before packing it. When you take the pillow out of its packaging, it becomes exactly the pillow it was before packing.

B. Lossy (not exactly reversible)

Certain information is thrown away

Again we press the air out of the pillow, but because we want an even smaller package we also remove a few feathers. This need not matter, because missing a few feathers may not make the pillow any less comfortable in use. However, feathers that have been thrown away are not put back when the pillow is unpacked.
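The difference can be demonstrated on a handful of made-up pixel values. In this sketch `zlib` plays the role of the lossless packaging; the "lossy" step is a deliberately crude stand-in (dropping the four least significant bits of every value) and is not how JPEG or JPEG 2000 work. The pixel data is generated, not taken from a real scan.

```python
import random
import zlib

# Made-up grayscale values: a flat gray area with a little noise.
rng = random.Random(0)
pixels = bytes(128 + rng.randrange(16) for _ in range(1000))

# Lossless: pressing the air out. Unpacking gives back exactly the original.
packed = zlib.compress(pixels)
restored = zlib.decompress(packed)

# Lossy (toy version): also remove a few feathers by discarding the
# 4 least significant bits of every value before packing.
quantized = bytes(value & 0xF0 for value in pixels)
packed_lossy = zlib.compress(quantized)
restored_lossy = zlib.decompress(packed_lossy)

print("lossless round trip identical:", restored == pixels)
print("lossy round trip identical:   ", restored_lossy == pixels)
print("packed sizes:", len(packed), "versus", len(packed_lossy), "bytes")
```

The lossy package is smaller here because the discarded noise was the hardest part to compress, but the discarded detail cannot be recovered afterwards.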

Page 41: Grootschalige digitalisering van archivalia

Compression and loss of information

An often-heard claim:

Do not use lossy compression for storing images, because lossy compression causes loss of information

Lossy compression does indeed cause loss of information, but that does not necessarily mean loss of meaningful information

In any case it is better to say: loss of information relative to the uncompressed file.

Scanning itself is, relative to the original, inextricably linked to loss of information, even when lossless compression is applied.

Page 42: Grootschalige digitalisering van archivalia

Lossy compression

Examples

JPEG kwaliteit 10 (300 dpi)

JPEG kwaliteit 12 (300 dpi)

JPEG kwaliteit 4 (300 dpi)

JPEG kwaliteit 4 (200 dpi)

JPEG 2000, part 6

Page 43: Grootschalige digitalisering van archivalia

Compression and durability

An often-heard claim:

Compressed files are more likely to become corrupt than uncompressed files. Therefore no data compression may be applied.

Research has shown that this claim is not correct.

A different approach to preservation: redundancy in storage

Compressed files in particular lend themselves well to this

Page 44: Grootschalige digitalisering van archivalia

We store

Resolution, compression and legibility: an example

300 dpi, high quality JPEG

200 dpi, low quality JPEG

Scans with a file size as small as possible

Page 45: Grootschalige digitalisering van archivalia

We store

Scans with a file size as small as possible

Comparison between file format, compression, resolution and file size

                                                            File size
Format     Compression      Type      Resolution  Color    Avg       500.000 scans  %
TIFF       No               ---       300 dpi     24 bits  22,1 MB   11 TB          100%
JPEG       Quality (ps) 12  Lossy     300 dpi     24 bits  7,5 MB    3,7 TB         34%
JPEG       Quality (ps) 10  Lossy     300 dpi     24 bits  2,1 MB    1,1 TB         10%
JPEG       Quality (ps) 4   Lossy     200 dpi     24 bits  255 KB    124 GB         1,1%
JPEG       Quality (ps) 10  Lossy     400 dpi     24 bits  3,3 MB    1,6 TB         15%
JPEG2000   Part 1           Lossless  300 dpi     24 bits  12 MB     6 TB           55%
JPEG2000   Part 6           Lossy     300 dpi     24 bits  120 KB    59 GB          0,5%

Example images shown with this comparison: TIFF uncompressed; JPEG (psd) 10; JPEG (psd) 4; JPEG2000 lossless

Page 50: Grootschalige digitalisering van archivalia

We store

Scans with a file size as small as possible

Comparison storage costs

Storage of 500.000 images; average size per scan uncompressed = 22,1 MB

Price rate: 1 TB, storage in a controlled e-repository environment at two separate locations, including IT costs: $ 7.000 per year (NLD, Nov 2009)

File format               Storage   Costs 1 year   Costs 10 years
TIFF uncompressed         11 TB     $ 77.000       $ 770.000
JPEG 10                   1,1 TB    $ 7.700        $ 77.000
JPEG 4 (200 dpi)          124 GB    $ 868          $ 8.680
JPEG 2000 (part 1, ll)    6 TB      $ 42.000       $ 420.000

(File) size still does matter!
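The cost figures follow directly from the storage volume and the price rate. The sketch below recomputes them from the rounded storage volumes given on the slide, assuming the $ 7.000 rate applies per terabyte per year and that 1 TB is counted as 1.000 GB. The names are illustrative, and "ll" is read as "lossless".

```python
PRICE_PER_TB_PER_YEAR = 7_000  # US dollars, rate quoted on the slide

# Storage needed for 500.000 images, in terabytes (rounded slide figures).
STORAGE_TB = {
    "TIFF uncompressed": 11,
    "JPEG 10": 1.1,
    "JPEG 4 (200 dpi)": 0.124,
    "JPEG 2000 (part 1, lossless)": 6,
}


def storage_cost(terabytes, years=1):
    """Cost in US dollars of keeping a volume in storage for a number of years."""
    return terabytes * PRICE_PER_TB_PER_YEAR * years


for file_format, terabytes in STORAGE_TB.items():
    one_year = storage_cost(terabytes)
    ten_years = storage_cost(terabytes, years=10)
    print(f"{file_format:<30} $ {one_year:>9,.0f}   $ {ten_years:>9,.0f}")
```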

Page 51: Grootschalige digitalisering van archivalia

We Do

Developing the reproduction process

A streamlined, standardized process is indispensable when digitizing on a large scale

Projects with different goals, document types and partners take place at the same time

Guidelines and best practices often take no account of these complex factors and the number of scans to be produced

We developed a process in which large scale and flexibility are starting points

All digitization projects follow this process

1. Requesting digitization
2. Providing order number(s)
3. Assessing the originals
4. Preparing the originals
5. Transport
6. Scanning
7. Assigning filenames
8. Transport
9. Checking originals
10. Checking scans
11. Checking metadata
12. Originals back to repository
13. Import in controlled storage system
14. Import in metadata system
15. Export scans
16. Export metadata
17. Import scans
18. Import metadata

Page 53: Grootschalige digitalisering van archivalia

We Do

Developing the reproduction process

For all projects, at any moment, it has to be clear:

- What the current status is of each unit to be digitized
- Where each unit can be located
- What current and succeeding tasks are to be performed on each unit

This asks for workflow management with a user-friendly application

We developed a simple, but effective workflow application in-house

In the following slides we focus on the weekly production of 10.000 scans in the digitizing on request service
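The three questions a workflow application has to answer (status, location, remaining tasks) can be sketched with a small record per unit. This is not the in-house application, whose design the slides do not show; it is a minimal sketch in which the class, field and method names are all hypothetical and only the step names come from the reproduction process in the deck.

```python
from dataclasses import dataclass
from typing import List, Optional

STEPS = [
    "Requesting digitization",
    "Providing order number(s)",
    "Assessing the originals",
    "Preparing the originals",
    "Transport",
    "Scanning",
    "Assigning filenames",
    "Transport",
    "Checking originals",
    "Checking scans",
    "Checking metadata",
    "Originals back to repository",
    "Import in controlled storage system",
    "Import in metadata system",
    "Export scans",
    "Export metadata",
    "Import scans",
    "Import metadata",
]


@dataclass
class Unit:
    """One unit to be digitized, tracked through the process."""
    order_number: str
    location: str
    step: int = 1  # 1-based position in STEPS

    @property
    def status(self) -> str:
        """Name of the step the unit is currently in."""
        return STEPS[self.step - 1]

    def remaining_tasks(self) -> List[str]:
        """Current and succeeding tasks for this unit."""
        return STEPS[self.step - 1:]

    def advance(self, new_location: Optional[str] = None) -> None:
        """Move the unit to the next step, optionally recording a new location."""
        if self.step >= len(STEPS):
            raise ValueError("unit has already completed the process")
        self.step += 1
        if new_location is not None:
            self.location = new_location


unit = Unit(order_number="A20758", location="repository")
unit.advance()
print(unit.status, "|", unit.location, "|", len(unit.remaining_tasks()), "tasks left")
```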

Page 54: Grootschalige digitalisering van archivalia

We Do

1. Requesting digitization

Production of 10.000 scans on a weekly basis

All public files can be requested for digitization via the finding aids in the Archiefbank

Just by clicking on the “digitize” button

Page 55: Grootschalige digitalisering van archivalia

We Do

2. Providing order numbers

A unit to be digitized must be identifiable at each step of the handling process

The units therefore get a unique, meaningless order number

An order number is provided by the metadata management system and is the basis for:

- Communication with the digitizer
- Scanning
- Assigning filenames
- Registration of filenames
- Billing by digitizer

In practice: all units to be digitized get an order ticket

Page 57: Grootschalige digitalisering van archivalia

We Do

3. Assessing the originals

The workflow system generates a list of all originals to be assessed from the repositories

The list is sorted by repository / shelf to make retrieval efficient
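Sorting the retrieval list by repository and shelf means staff can walk each repository once, in shelf order. A minimal sketch of such a list; the record fields, repository names and order numbers below are invented for illustration and do not come from the Archives' systems.

```python
from typing import NamedTuple


class Request(NamedTuple):
    """A hypothetical retrieval request for one unit."""
    order_number: str
    repository: str
    shelf: int


def retrieval_list(requests):
    """Sort requests by repository first, then by shelf within a repository."""
    return sorted(requests, key=lambda request: (request.repository, request.shelf))


requests = [
    Request("A00003", "Repository B", 12),
    Request("A00001", "Repository A", 40),
    Request("A00004", "Repository B", 3),
    Request("A00002", "Repository A", 7),
]

for request in retrieval_list(requests):
    print(request.repository, request.shelf, request.order_number)
```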

Page 58: Grootschalige digitalisering van archivalia

We Do

4. Checking the originals

All assessed originals are stored in a special room

In this room all checks are executed

Page 59: Grootschalige digitalisering van archivalia

We Do

4. Checking the originals

Information about the originals in our management systems is not always complete

A rough check of the originals takes place

A. Content

- Copyrights
- Publicity
- Privacy

If an item falls into one of these categories the request is rejected

B. Condition of the material

Items that are in such a condition that digitizing or transport could cause damage, or that are packaged in a way that makes scanning in conventional set-ups impossible, do not qualify for the standard way of digitization

Page 61: Grootschalige digitalisering van archivalia

We Do

4. Checking the originals

Material preparation is limited to the bare minimum

We do:

- Staples are removed as a rule
- Small repairs are carried out by our restoration employees

We don’t:

- The sequence of the originals as found in the repository is not checked or altered
- The originals are not numbered

Page 62: Grootschalige digitalisering van archivalia

We Do

Not number the originals

Numbering the originals has one advantage:

The completeness of the scans (compared to the originals) can be guaranteed

But this is only true when the numbering tallies exactly, because:

- A missing number in a sequence of scans leads to the conclusion that there is one original that has not been scanned
- Numbers that are assigned twice lead to illogical end numbers (100 scans: scan 100 has been numbered as 99)

Experiments with numbering showed that faultless numbering cannot be achieved in practice

Page 63: Grootschalige digitalisering van archivalia

We Do

Not number the originals

Securing completeness can be realized by other means:

- Comparing scans to originals 1:1 after digitization
- Scanning the originals twice

Low quality: # scans = 365

High quality master files: # scans = 365

Page 64: Grootschalige digitalisering van archivalia

We Do

5. Transport

For secure transport, special flight cases are used

Page 65: Grootschalige digitalisering van archivalia

We Do

6. / 7. Scanning and assigning filenames

Filenames are the key between scans and metadata

After scanning, the scan operator or data manager has to assign filenames to the scans

It has to be perfectly clear which filenames these should be

As a rule, filenames contain no meaningful information

Because when the meaning changes, the filenames would have to change too

Page 66: Grootschalige digitalisering van archivalia

Assigning filenames at City Archives Amsterdam

Customer request

Management systems

Order ticket: Archive 195, File 836, Order: A20758

Range: A20758000001 – A20758999999

Filename: 12 characters (first 6: order number, last 6: serial number)

Scanning the order

A20758000001
A20758000002
A20758000003
A20758000004
A20758000005

Scan report: A20758000001, A20758000002, A20758000003, A20758000004, A20758000005

Registration of filenames

Import
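The naming scheme on this slide (order number followed by a six-digit serial number) can be sketched as follows. The function names are hypothetical, and the assumption that every order number is one capital letter plus five digits is derived only from the example A20758.

```python
import re

# Assumption: order numbers look like "A20758" (one letter, five digits).
ORDER_RE = re.compile(r"[A-Z]\d{5}")
FILENAME_RE = re.compile(r"(?P<order>[A-Z]\d{5})(?P<serial>\d{6})")


def make_filename(order_nr, serial):
    """Build a 12-character filename from an order number and a serial number."""
    if not ORDER_RE.fullmatch(order_nr):
        raise ValueError(f"unexpected order number: {order_nr!r}")
    if not 1 <= serial <= 999_999:
        raise ValueError(f"serial number out of range: {serial}")
    return f"{order_nr}{serial:06d}"


def parse_filename(name):
    """Split a filename back into (order number, serial number)."""
    match = FILENAME_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not a valid filename: {name!r}")
    return match.group("order"), int(match.group("serial"))


# The five filenames listed in the scan report on the slide.
scan_report = [make_filename("A20758", n) for n in range(1, 6)]
```

Because the name carries only the order number and a counter, nothing in it needs to change when the description of the document changes.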

Page 67: Grootschalige digitalisering van archivalia

10. 11. Checking scans and metadata

Scans and metadata are checked efficiently

Where possible checks are automated

An application from which all checks can be executed is in development

Basic checks (check: method)

Viruses: virus checker
Data integrity: MD5 checksum comparison
File format validity: JHOVE
Quality of scans: visual check of reference scans, visual check of production scans
Completeness: depends on project
Filenames: script
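The data integrity check by MD5 checksum comparison could look roughly like the sketch below. It assumes that the scanning party delivers the expected checksum as a hexadecimal string. The function names and the throwaway demo file are illustrative and not taken from the archive's own tooling.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 checksum of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected_md5):
    """True if the file's checksum equals the delivered checksum."""
    return md5_of_file(path) == expected_md5.strip().lower()


# Demo on a throwaway file standing in for a delivered scan.
with tempfile.TemporaryDirectory() as folder:
    scan_path = os.path.join(folder, "demo_scan.bin")
    with open(scan_path, "wb") as handle:
        handle.write(b"hello")
    delivered_ok = is_intact(scan_path, "5d41402abc4b2a76b9719d911017c592")
    tampered_ok = is_intact(scan_path, "0" * 32)
```

Reading in chunks keeps memory use low for large master files.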

Page 68: Grootschalige digitalisering van archivalia

13. 14. Import metadata and scans into management systems

After all checks have been approved, scans and metadata are imported into the management systems

The imports are executed automatically, on the basis of scripts and standard protocols for file transfer

After import, the “order for digitization” of each unit is completed

Page 69: Grootschalige digitalisering van archivalia

18. Import metadata into the website

The website is hosted from an external location

Metadata are uploaded to the webserver by simple HTTP transfer

From any workstation at the archive, directly via the CMS of the website

For exchange of finding aids we use EAD

After import the metadata are optimized for the search system
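As an illustration only, an HTTP upload of an EAD finding aid could be prepared like this. The upload URL, the use of POST, and the content type are assumptions. The slide only says that the transfer goes over HTTP through the website's CMS. The request is built but not sent.

```python
from urllib.request import Request

# Hypothetical endpoint; the real CMS address is not given in the slides.
UPLOAD_URL = "https://example.org/cms/upload"


def build_upload_request(ead_xml, url=UPLOAD_URL):
    """Prepare an HTTP POST request carrying an EAD document as XML."""
    body = ead_xml.encode("utf-8")
    headers = {"Content-Type": "application/xml; charset=utf-8"}
    return Request(url, data=body, headers=headers, method="POST")


request = build_upload_request("<ead><eadheader/></ead>")
# urllib.request.urlopen(request) would perform the actual transfer.
```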

Page 70: Grootschalige digitalisering van archivalia

17. Import scans into the website

Transport medium

The bandwidth of the internet connections at the archive is still too small for direct SFTP (or similar) upload of large quantities of scans to the webserver

It seems likely that this will change in the near future

Until then, scans are transported on portable USB hard disks
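Why bandwidth matters can be shown with a back-of-the-envelope estimate. The batch size and line speeds below are assumed values for illustration. The slides do not state the archive's actual figures.

```python
def upload_hours(total_gigabytes, megabits_per_second):
    """Estimate upload time in hours, ignoring protocol overhead."""
    bits = total_gigabytes * 8 * 1000**3
    seconds = bits / (megabits_per_second * 1000**2)
    return seconds / 3600


# Assumed example: a 1000 GB batch of scans over a 10 Mbit/s uplink.
slow_line = upload_hours(1000, 10)
# Assumed example: a 450 GB batch over a 100 Mbit/s uplink.
fast_line = upload_hours(450, 100)
```

Under these assumptions the first batch would occupy the line for more than nine days, which makes a portable hard disk the faster option.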

Page 71: Grootschalige digitalisering van archivalia

17. Import scans into the website

Import

After connecting the hard disk to the server, the import process starts

Some basic checks are executed on the scans

Derivatives for thumbnails and zoom / contrast functionality are made

Page 72: Grootschalige digitalisering van archivalia

Request completed

When both scans and metadata have been imported, an email is automatically sent to the requester of the digitization

This email contains a link to the finding aid and thumbnails on the website

The requester can decide whether to buy scans or not
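The notification step can be sketched as a message that is only built once both imports are done. The addresses, the subject line, the URL, and the function names are assumptions for the example. Actually sending the message (for instance with `smtplib`) is left out.

```python
from email.message import EmailMessage


def build_notification(requester, order_nr, finding_aid_url):
    """Build the completion email with a link to the finding aid."""
    message = EmailMessage()
    message["From"] = "noreply@example.org"  # hypothetical sender
    message["To"] = requester
    message["Subject"] = f"Digitization request {order_nr} completed"
    message.set_content(
        "Your request has been digitized.\n"
        f"Finding aid and thumbnails: {finding_aid_url}\n"
    )
    return message


def notify_if_complete(scans_imported, metadata_imported, requester,
                       order_nr, finding_aid_url):
    """Return the email only when both scans and metadata are imported."""
    if scans_imported and metadata_imported:
        return build_notification(requester, order_nr, finding_aid_url)
    return None


pending = notify_if_complete(True, False, "requester@example.org",
                             "A20758", "https://example.org/finding-aid/195")
ready = notify_if_complete(True, True, "requester@example.org",
                           "A20758", "https://example.org/finding-aid/195")
```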

We Do

The happy customer:

Page 73: Grootschalige digitalisering van archivalia

MARAC Conference October 30 2009

Page 74: Grootschalige digitalisering van archivalia

Costs and income

Costs Archiefbank (2008)

Digitization on request: € 140,000
Webservices: € 52,000
Digitization projects: € 200,000

Income Archiefbank (2008)

Digitization on request: € 100,000
Project funding: € 330,350
Government: € 40,000
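The totals implied by these figures can be worked out as follows. The amounts are taken from the slide. The totals, the balance, and the coverage ratio are derived here and do not appear in the original.

```python
# Amounts in euros, as listed on the slide for 2008.
costs = {
    "Digitization on request": 140_000,
    "Webservices": 52_000,
    "Digitization projects": 200_000,
}
income = {
    "Digitization on request": 100_000,
    "Project funding": 330_350,
    "Government": 40_000,
}

total_costs = sum(costs.values())    # 392,000
total_income = sum(income.values())  # 470,350
balance = total_income - total_costs  # 78,350

# Share of the on-request costs covered by on-request income (about 71%).
on_request_coverage = (
    income["Digitization on request"] / costs["Digitization on request"]
)
```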