2. A Linked Data Approach to Disclose Handwritten ... · Illustrated Handwritten Archives.” In...



RCE (2013). RADAR, a Relational Archaeobotanical Database for Advanced Research. Rijksdienst voor het Cultureel Erfgoed, Ministerie van Onderwijs, Cultuur en Wetenschap. Available online at: https://archeologieinnederland.nl/bronnen-en-kaarten/radar

van Reenen, G. (2007). Snippendaalcatalogus database. Hortus Botanicus Amsterdam. Available online at: http://dehortus.nl/en/Snippendaal-Catalogue

Schooneveld-Oosterling, J., Knaap, G., Karskens, N., Smit-Maarschalkerweerd, D., Tetteroo, S., van den Tol, J., Nijhuis, H., van Wijk, K., Kunst, A., Buijs, J., Jongma, M., Boer, R. (2013). Boekhouder-Generaal Batavia. Huygens ING. Available online at: http://resources.huygens.knaw.nl/boekhoudergeneraalbatavia

van der Sijs, N. (2001). Chronologisch Woordenboek. Available online at: http://dbnl.org/tekst/sijs002chro01_01/

2. A Linked Data Approach to Disclose Handwritten Biodiversity Heritage Collections

Lise Stork, Leiden Institute of Advanced Computer Science (LIACS), Leiden University, Niels Bohrweg 1, 2333 CA Leiden, The Netherlands, [email protected]

Andreas Weber, Department of Science, Technology and Policy Studies (STePS), University of Twente, PO Box 217, 7500 AE Enschede, The Netherlands, [email protected]

Over the last decade, natural history museums in and beyond the Netherlands have invested heavily in digitizing and extracting biodiversity information from manuscript and specimen collections (Heerlien et al. 2015; Pethers and Huertas 2015; Svensson 2015). In particular, handwritten fieldnotes describing occurrences of species in nature (see illustration) form an important but often neglected starting point for researchers interested in long-term habitat developments of a specific area and in the history of scientific ordering, writing and collecting practices (Blair 2010; Bourguet 2010; Eddy 2016). In order to disclose handwritten descriptions of flora and fauna and the related specimen and drawing collections, natural history museums usually resort to manual enrichment methods such as full-text transcription or keyword tagging (Ridge 2014; Franzoni et al. 2014). Often these methods rely on crowdsourcing, where online volunteers annotate pages with unstructured textual labels (Field Book Project 2016). More recently, curators of archives, data scientists and historians have started to experiment with semi-automatic annotation systems for historical manuscript collections such as the MONK system (Schomaker et al. 2016). Since MONK is a supervised learning system, a large number of correctly recognized textual labels is necessary to safeguard the system’s recognition abilities.

Thus, although such practices have the potential to yield high-quality data, merely annotating pages with unstructured textual labels raises two problems. First, without suggestions driven by semantic knowledge, it will be hard for volunteers or a machine to start annotating handwritten pages. Not only in the context of our case study, which deals with fieldnotes written in early nineteenth-century insular Southeast Asia, but also in the context of other manuscript collections, one needs a thorough knowledge of paleography as well as historical and taxonomic background information (Causer and Terras 2014). Semantics can aid the annotation process when dealing with ambiguity, or provide suggestions in cases where words are hard to read and too few example instances are available. For instance, when a fieldnote describes an expedition in East Java, a frog species from West Celebes can be ruled out. Second, unstructured textual annotation will eventually result in an inefficient search process on the side of the user. Traditional keyword-based search leads to many irrelevant results or requires specific prior knowledge regarding the content. To answer more general and expressive queries, semantic relations between annotations need to be considered as well (Elbassuoni et al. 2010).
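The East Java example can be made concrete with a small sketch. It is only an illustration of the idea of pruning label suggestions with background knowledge, not the system described in this paper. The taxon names, the distribution data and the function name suggest are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taxon:
    name: str             # label a volunteer or a recognizer might propose
    regions: frozenset    # regions where the taxon is assumed to occur


# Hypothetical mini knowledge base; names and ranges are placeholders, not real data.
KB = [
    Taxon("frog species A", frozenset({"East Java", "West Java"})),
    Taxon("frog species B", frozenset({"West Celebes"})),
    Taxon("frog species C", frozenset({"East Java"})),
]


def suggest(candidates, expedition_region):
    """Keep only candidates whose assumed range includes the expedition region."""
    return [t.name for t in candidates if expedition_region in t.regions]


# A fieldnote known to describe an expedition in East Java:
suggestions = suggest(KB, "East Java")
print(suggestions)
```

In this toy setting the West Celebes species drops out of the suggestion list, so a volunteer facing a hard-to-read word has fewer candidate labels to choose from.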

In order to help solve such problems, this paper argues for the development and application of a semantic model for semi-automatic semantic annotation. The model aggregates existing metadata standards and ontologies, following the Linked Data principles, and prepares them for semantically annotating and interpreting the Named Entities (NEs) in the fieldnotes of digitized natural history collections.10

The case study of this paper is a collection of 8000 fieldnotes gathered by the Committee for Natural History of the Netherlands Indies (Natuurkundige Commissie voor Nederlandsch-Indië, further referred to by the acronym NC). In the first half of the nineteenth century, naturalists of the NC charted the natural and economic state of the Indonesian Archipelago and returned a wealth of scientific observations which are now stored in the archives and depot of Naturalis Biodiversity Center in Leiden (Mees 1994; Klaver 2007). An in-depth historical analysis reveals that Heinrich Kuhl (1797-1821), Johan Coenraad van Hasselt (1797-1823) and other travelers of the NC use the following NEs to structure their fieldnotes (see illustration displaying a bundle of NC fieldnotes) while traveling in insular Southeast Asia: collecting localities, dates, collectors’ names, taxonomic names, and references to other printed or handwritten sources. Kuhl and Van Hasselt, for instance, regularly use the illustrations of printed works such as the Voyage de découvertes aux terres australes (1807-1816) by M. F. Péron as a visual point of reference for their fieldnote descriptions. While links to published resources can be easily established by linking them to domain-specific repositories of digitized books such as the Biodiversity Heritage Library (BHL), collection localities, taxonomic names and collectors’ names are more difficult to process.
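The five NE categories listed above suggest a simple record structure for an annotation. The following is a minimal sketch under stated assumptions: the class names, field names, page identifiers and example.org URIs are invented for illustration and do not come from the NC project.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class EntityType(Enum):
    """The NE categories named in the text above."""
    LOCALITY = "collecting locality"
    DATE = "date"
    COLLECTOR = "collector's name"
    TAXON = "taxonomic name"
    REFERENCE = "reference to another source"


@dataclass
class Annotation:
    page_id: str                   # hypothetical identifier of the scanned page
    entity_type: EntityType
    transcription: str             # the text as read from the page
    kb_uri: Optional[str] = None   # filled in once the entity is linked


def unlinked(annotations):
    """Return the annotations that still lack a link to an external resource."""
    return [a for a in annotations if a.kb_uri is None]


notes = [
    Annotation("page-001", EntityType.COLLECTOR, "Kuhl",
               "http://example.org/kb/person/kuhl"),
    Annotation("page-001", EntityType.REFERENCE, "Péron, Voyage de découvertes",
               "http://example.org/kb/work/peron-voyage"),
    Annotation("page-001", EntityType.LOCALITY, "(hard-to-read place name)"),
]

pending = [a.entity_type.name for a in unlinked(notes)]
print(pending)
```

The sketch mirrors the observation in the text: the reference to a printed work is straightforward to link, whereas the locality remains unresolved until more background knowledge is available.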

10 The project Semantic Blumenbach thinks in a similar direction, but with a focus on published material (Wettlaufer et al. 2015).

In order to be able to identify, annotate and interlink such NEs in a semi-automatic way, this paper proposes the implementation of a Knowledge Base (KB). The KB has two goals. First, the underlying data structure of the KB enables cross-matching of resources within and across fieldnote collections. In order to realize this function, a lightweight application ontology written in RDF11 and OWL12 is suggested that serves as a schema to semantically structure the KB. It expresses species observations, ensures their provenance in relation to the digitized fieldnotes and builds on existing metadata and ontology standards. Entities in turn are described using uniform resource identifiers (URIs). This allows for an integration of the fieldnote annotations into the web of Linked Data (LD) and ensures interoperability with other digital collections (Hallo et al. 2016). Second, the logical characteristics of the properties in the ontology enable a reasoner system to suggest possible NEs. In order to provide possible labels regarding these NEs, the KB is prepopulated with lists extracted from thesauri, gazetteers, and taxonomies. As regards collection localities, we draw, for instance, upon the GEOnet Names Server (GNS), a large semantically structured database containing historical and present-day geographical locations in insular Southeast Asia. Biological species names can be drawn from the Linnaean taxonomy of species, which was already well established at the time of the NC (Farber 2000; Beckman 2012). As regards person names, we rely on the database Cyclopedia of Malaysian Collectors, which M. J. van Steenis-Kruseman compiled in the 1960s and 1970s.13 Taken together, by prompting users to annotate with terms from the KB, a semantic network of annotations is formed that is able to improve the quality of the annotations and bootstrap the annotation process. The ontology and an implementation of the KB based on our case study, together with possibilities regarding supported querying and reasoning techniques, will be discussed in more detail during the presentation.
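How URI-identified entities, provenance links and relation-aware queries fit together can be sketched with plain subject-predicate-object triples. This is an illustration only, written with the Python standard library rather than an RDF toolkit. The namespace http://example.org/nc/ and the property names recordedOn, hasTaxon, atLocality and locatedIn are assumptions, not the ontology proposed here.

```python
EX = "http://example.org/nc/"   # hypothetical namespace for the sketch

# Each observation points to its taxon, its locality, and the page it was
# recorded on (its provenance); localities point to the region they lie in.
triples = {
    (EX + "obs1", EX + "recordedOn", EX + "page12"),
    (EX + "obs1", EX + "hasTaxon", EX + "taxonA"),
    (EX + "obs1", EX + "atLocality", EX + "localityX"),
    (EX + "localityX", EX + "locatedIn", EX + "Java"),
    (EX + "obs2", EX + "recordedOn", EX + "page40"),
    (EX + "obs2", EX + "hasTaxon", EX + "taxonB"),
    (EX + "obs2", EX + "atLocality", EX + "localityY"),
    (EX + "localityY", EX + "locatedIn", EX + "Celebes"),
}


def match(s=None, p=None, o=None):
    """Return all triples that agree with the given pattern (None is a wildcard)."""
    return [
        (ts, tp, to)
        for (ts, tp, to) in triples
        if (s is None or ts == s) and (p is None or tp == p) and (o is None or to == o)
    ]


def pages_with_observations_in(region):
    """Follow locatedIn -> atLocality -> recordedOn to find the source pages."""
    localities = {s for s, _, _ in match(p=EX + "locatedIn", o=EX + region)}
    pages = set()
    for locality in localities:
        for observation, _, _ in match(p=EX + "atLocality", o=locality):
            pages |= {o for _, _, o in match(s=observation, p=EX + "recordedOn")}
    return sorted(pages)


result = pages_with_observations_in("Java")
print(result)
```

A keyword search for "Java" would miss page12 if the page itself only names the locality. Following the relations between annotations retrieves it, which is the kind of query that a store queried with SPARQL would answer in a full implementation.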

Bibliography

Beckman, J. “The Swedish Taxonomy Initiative: Managing the Boundaries of ‘Sweden’ and ‘Taxonomy.’” In Scientists and Scholars in the Field: Studies in the History of Fieldwork and Expeditions, edited by K. H. Nielsen, H. Harbsmeier, and Ch. J. Ries, 395–414. Aarhus: Aarhus University Press, 2012.

Bourguet, M.-N. “A Portable World: The Notebooks of European Travellers (Eighteenth to Nineteenth Centuries).” Intellectual History Review 20, no. 3 (2010): 377–400.

Causer, T. and M. Terras. “‘Many Hands Make Light Work. Many Hands Together Make Merry Work’: Transcribe Bentham and Crowdsourcing Manuscript Collections.” In Crowdsourcing Our Cultural Heritage, 57–88. Surrey: Ashgate, 2014.

Eddy, M. D. “The Interactive Notebook: How Students Learned to Keep Notes during the Scottish Enlightenment.” Book History 19, no. 1 (2016): 86–131.

Elbassuoni, S., Ramanath, M., Schenkel, R., and Weikum, G. “Searching RDF Graphs with SPARQL and Keywords.” IEEE Data Eng. Bull. 33, no. 1 (2010): 16–24.

Farber, P. L. Finding Order in Nature: The Naturalist Tradition from Linnaeus to E. O. Wilson. Baltimore, Md.: Johns Hopkins University Press, 2000.

Field Book Project, Smithsonian National Museum of Natural History: http://naturalhistory.si.edu/fieldbooks/ [accessed 15 February 2017].

Franzoni, Ch. and H. Sauermann. “Crowd Science: The Organization of Scientific Research in Open Collaborative Projects.” Research Policy 43, no. 1 (2014): 1–20.

11 https://www.w3.org/RDF/ [accessed February 15, 2017].

12 https://www.w3.org/OWL/ [accessed February 15, 2017].

13 The database is available online: http://www.nationaalherbarium.nl/FMCollectors/ [accessed February 15, 2017].


GEOnet Names Server, http://geonames.nga.mil/gns/html/ [accessed February 15, 2017].

Hallo, M., et al. “Current State of Linked Data in Digital Libraries.” Journal of Information Science 42, no. 2 (2016): 117–127.

Heerlien, M., J. Van Leusen, S. Schnörr, S. De Jong-Kole, N. Raes, and K. Van Hulsen. “The Natural History Production Line: An Industrial Approach to the Digitization of Scientific Collections.” J. Comput. Cult. Herit. 8, no. 1 (February 2015): 3:1–3:11.

Klaver, Ch. J. J. Inseparable Friends in Life and Death: The Life and Work of Heinrich Kuhl (1797-1821) and Johan Conrad van Hasselt (1797-1823), Students of Prof. Theodorus van Swinderen. Groningen: Barkhuis, 2007.

Mees, G. F. and C. van Achterberg. “Vogelkundig onderzoek op Nieuw Guinea in 1828: terugblik op de ornithologische resultaten van de reis van Zr. Ms. Korvet Triton naar de zuidwestkust van Nieuw-Guinea.” Zoologische Bijdragen 40 (1994): 3–64.

Péron, F., N. Baudin, L. C. Desaulses de Freycinet, Ch. Alexandre Lesueur, and N.-M. Petit. Voyage de Découvertes aux Terres Australes. Paris: De l’Imprimerie impériale, 1807.

Pethers, H. and B. Huertas. “The Dollmann Collection: A Case Study of Linking Library and Historical Specimen Collections at the Natural History Museum, London.” The Linnean 31, no. 2 (2015): 18–22.

Ridge, M., ed. Crowdsourcing Our Cultural Heritage. Farnham: Ashgate, 2014.

Schomaker, L., A. Weber, M. Thijssen, M. Heerlien, A. Plaat, S. Nijssen, et al. “Making Sense of Illustrated Handwritten Archives.” In Book of Abstracts, Digital Humanities Conference 2016 Krakow, 764–66, 2016.

Svensson, A. “Global Plants and Digital Letters: Epistemological Implications of Digitising the Directors’ Correspondence at the Royal Botanic Gardens, Kew.” Environmental Humanities 6 (2015): 73–102.

Wettlaufer, J., Ch. Johnson, M. Scholz, M. Fichtner, and S. Ganesh Thotempudi. “Semantic Blumenbach: Exploration of Text–Object Relationships with Semantic Web Technology in the History of Science.” Digital Scholarship in the Humanities 30, Suppl. 1 (December 1, 2015): 187–98.

3. Linked cultural events: Digitizing past events and its implications for analyzing and theorizing the ‘creative city’

Harm Nijboer (Huygens ING)

Claartje Rasterhoff (University of Amsterdam)

Introduction

This paper introduces ‘linked cultural events’ as a novel methodological framework that allows for the systematic analysis of cultural expressions in their urban context. The events-based approach is inspired by datasets developed in the research program CREATE: Creative Amsterdam: An E-Humanities Perspective (University of Amsterdam, 2014-present).14 In this program, the cultural sectors of performing arts take up a particularly prominent position, as data on, for instance, music, theatre and cinema programming is available in various formats. In terms of methodology, the data

14 www.create.humanities.uva.nl.