Stylebook for the Tübingen Treebank of Written German (TüBa-D/Z)

Heike Telljohann, Erhard W. Hinrichs, Sandra Kübler, Heike Zinsmeister, Kathrin Beck

Universität Tübingen
Seminar für Sprachwissenschaft
Wilhelmstr. 19
D-72074 Tübingen

July 2017


Abstract

This stylebook is an updated version of Telljohann et al. (2015). It describes the design principles and the annotation scheme for the German treebank TüBa-D/Z developed by the Division of Computational Linguistics (Lehrstuhl Prof. Hinrichs) at the Department of Linguistics (Seminar für Sprachwissenschaft – SfS) of the Eberhard Karls Universität Tübingen, Germany. The guidelines focus on the syntactic annotation of written language data taken from the German newspaper 'die tageszeitung' (taz).

The treebank comprises 3,816 articles (104,787 sentences) selected from the taz editions between 1989 and 1999. The average sentence length is 18.7 words and the total number of tokens is 1,959,474. Release 11 in July 2017 is the final release of the TüBa-D/Z treebank. Information on how to obtain the data can be found at:

http://www.sfs.uni-tuebingen.de/en/ascl/resources/corpora/tueba-dz.html

Please consult this website in order to ensure that you are using the most recent and most complete version of the treebank.

The annotation scheme for the TüBa-D/Z treebank is derived from the Verbmobil treebank for spoken German, developed earlier (1997–2000) by the Division of Computational Linguistics of the SfS (Hinrichs et al. 2000). The TüBa-D/Z annotation scheme has been extended along various dimensions to accommodate the characteristics of written texts. In order to ensure the reusability of the data, a surface-oriented annotation scheme has been adopted that is inspired by the notion of topological fields and is enriched by a level of predicate-argument structure. The linguistic inventory used in the treebank annotation is based on a minimal set of assumptions that are uncontroversial among major syntactic theories. In this sense it is an attempt at theory-neutrality.


Acknowledgements

Funding for the TuBa-D/Z has come from a variety of sources:

• the Competence Center for Text- and Information Technology (Kompetenzzentrum für Text- und Informationstechnologie – KIT) grant by the Ministry of Science, Research and the Arts Baden-Württemberg (funding since 2000);

• the collaborative research center (Sonderforschungsbereich) grant SFB 441 – Linguistic Data Structures, project A1 – Representation and Automatic Acquisition of Linguistic Data, funded by the German Research Council (Deutsche Forschungsgemeinschaft – DFG);

• the collaborative research center (Sonderforschungsbereich) grant SFB 833 – The construction of meaning - the dynamics and adaptivity of linguistic structures, project A3 – Disambiguating Discourse Connectives using Corpus-induced Semantic Relations, funded by the German Research Council (Deutsche Forschungsgemeinschaft – DFG);

• the ESFRI research infrastructure project grants D-SPIN and CLARIN-D, funded by the Federal Ministry of Education and Research (BMBF) (funding since 2008).

A project of this scale would not be possible without the generous support of many contributors:

Our special thanks go to 'die tageszeitung' (taz), who kindly granted permission to process the newspaper data and to release the treebank.

We would like to acknowledge Rosmary Stegmann for her many contributions to the treebank of spoken German in Verbmobil. Her research laid the foundations for the annotation scheme of that treebank, which has been summarized in the 'Stylebook for the German Treebank in Verbmobil' (Stegmann et al. 2000).

We would like to thank Manfred Sailer and Frank Richter for their helpful comments and for their support in the form of encouragement and critical discussions, from which we benefited greatly in the challenging task of developing a data-oriented syntactic annotation scheme for spoken as well as written German.

Furthermore, we are indebted to Tylman Ule for his assistance with part-of-speech tagging of the data and with data conversion.

We would also like to acknowledge the support of Martina Liepert and Jörn Veenstra, who initiated and developed the integration of named entities into the annotation scheme.

Moreover, we would like to thank Julia Trushkina (Trushkina 2004) and Yannick Versley (Versley et al. 2010), who provided the tools for morphological preprocessing.


Furthermore, Yannick Versley (Versley et al. 2010) supported the project by developing a tool for lemma disambiguation and for the automatic integration of semantic classes of named entities.

The quality of the treebank has been considerably improved by feature-oriented consistency checks developed by Ventsislav Zhechev. Further consistency tests were contributed by Tylman Ule and Frank H. Müller in the course of their research work in the SFB 441. They deserve special mention for their support.

We would like to thank Marie Hinrichs for managing the complete tool chain and carrying out the many steps of data pre-processing, integration, and post-processing required to support the life cycle of a TüBa-D/Z release.

We would like to thank Vera Möller and Karin Naumann (2007) for annotating anaphora and coreference relations and also for doing an excellent job in documenting the concepts.

Yannick Versley and Holger Wunsch supported the project in various respects. In the course of their Ph.D. projects in the SFB 441 they enhanced the conceptual aspects of the anaphora resolution as annotated in the treebank. They also wrote mapping and conversion tools for integrating the anaphora annotation into the Export-XML format.

For their diligence and dedication to the arduous task of linguistic annotation and of post-editing we thank our research assistants Janne Berlacher, Anne Brock, Armin Buch, Nadine Cetin, Heike da Silva Cardoso, Marisa Delz, Silke Dutz, Katrin Eichler, Emilia Ellsiepen, Steffen Froemel, Holger Gauza, Simone Hartung, Daniel Hüttl, Heike Johannsen, Miriam Käshammer, Laura Kassner, Sarah Klug, Julia Koch, Janina Kopp, Anuschka Kranz, Christian Kreß, Rebecca Kreß, Michael Kossack, Anne Lohse, Wolfgang Maier, Nicole Maruschka, Kai Metzger, Vera Möller, Simone Müller, Till Pachalli, Maja Pietsch, Brigitta Rist, Andreas Rudin, Maria Schmidt, Marie Schreier, Insa Starr, Melanie Störzer, Isabel Trott, and Dominikus Wetzel. They also improved the linguistic quality of the annotation through dedicated discussions of problematic and interesting examples.


The development of the TüBa-D/Z treebank was notably facilitated by a number of former Verbmobil partners whose contributions went well beyond the call of duty. Hans Uszkoreit and his colleagues at the Saarland University kindly provided us with the graphical annotation tool Annotate (Plaehn 1998), which was developed as part of the research project (Teilprojekt C3; principal investigators: Uszkoreit/Smolka) Nebenläufige grammatische Verarbeitung (NEGRA) in the collaborative research center (Sonderforschungsbereich) 378. The Annotate tool provides human annotators with a graphical, user-friendly interface for annotating and editing trees and also offers database support for maintaining large treebanks. We would like to express our special gratitude to Thorsten Brants, who has kindly and generously provided us with software support and user assistance for the Annotate tool from the very beginning of the Tübingen treebank project.


Contents

List of Tables

1 Introduction

2 Major Challenges and Design Decisions

3 The Theoretical Basis of the Annotation Scheme
    3.1 Topological Fields
        3.1.1 The Concept of Topological Fields
    3.2 Constituent Analysis and Topological Fields
    3.3 General Annotation Principles
        3.3.1 Flat Clustering Principle
        3.3.2 Longest Match Principle
        3.3.3 High Attachment Principle
    3.4 The Structure of an Annotated Tree
        3.4.1 The Levels of Annotation
        3.4.2 The Inventory of Labels
        3.4.3 What Is a Syntactic Unit?
        3.4.4 Printing and Spelling Errors
        3.4.5 Isolated Phrases
        3.4.6 Long-Distance Dependencies
        3.4.7 Empty Categories
    3.5 Lemma Information
        3.5.1 Lemmatization Rules for POS-Tags
        3.5.2 Lemmatization Rules for Specific Linguistic Phenomena

4 The Annotation of the Internal Structure of Phrases
    4.1 Premodification and Postmodification in Phrases
    4.2 Noun Phrases
        4.2.1 Noun Phrases without Modifiers
        4.2.2 Prenominal Modification
        4.2.3 Postnominal Modification
        4.2.4 Appositional Constructions
        4.2.5 Foreign Language Material
        4.2.6 Named Entity Annotation
        4.2.7 Ordinal Numbers


        4.2.8 Cardinal Numbers
        4.2.9 Letters and Non-Words
        4.2.10 Expletive and Other Uses of es
    4.3 Determiner Phrases
    4.4 Prepositional Phrases
        4.4.1 Prepositions
        4.4.2 Circumpositions and Postpositions
    4.5 Adjectival Phrases
    4.6 Adverbial Phrases
    4.7 Verb Phrases
        4.7.1 Head of a Sentence and Verb Complex
        4.7.2 Verb Complexes in Verb-second and Verb-final Clauses
        4.7.3 Ersatzinfinitiv Constructions
        4.7.4 Infinitives with zu
        4.7.5 Coherency and Incoherency of Verbal Constructions
        4.7.6 AcI Constructions
        4.7.7 Imperatives
        4.7.8 Particle Verbs
        4.7.9 Verbs with Predicate
        4.7.10 Modal Verbs

5 Attachment Principles for Phrases
    5.1 Attachment to Fields
    5.2 Attachment of Ambiguous Complements
    5.3 Modifier Attachment
        5.3.1 Modifier Attachment in the Initial Field
        5.3.2 Attachment across Punctuation Marks
        5.3.3 Ambiguous Modifiers in Isolated Phrases

6 The Annotation of Sentences
    6.1 Sentence Initial Fields
        6.1.1 The C-Field in Verb-Final Clauses
        6.1.2 The KOORD-Field in all Clause Types
        6.1.3 The PARORD-Field in Verb-Second Clauses
        6.1.4 Resumptive Constructions: The LV-Field
    6.2 Questions
        6.2.1 W-Questions
        6.2.2 Yes - No Questions
    6.3 Clauses of Comparison
    6.4 Relative Clauses
        6.4.1 Event-modifying Relative Clauses
        6.4.2 Independent Relative Clauses
    6.5 Coordination
        6.5.1 Coordination of Phrases
        6.5.2 Asymmetric Coordination
        6.5.3 Coordinations with Complex Conjunctions


        6.5.4 Coordinations with Truncated Words
        6.5.5 Attachment Principles of Coordination within Phrases
        6.5.6 Coordination of Topological Fields
        6.5.7 Attachment of Ambiguous Modifiers in Coordination
        6.5.8 Coordination of Sentences
        6.5.9 Paratactic Constructions
        6.5.10 Conjunctions Occurring with Isolated Phrases
        6.5.11 Split Coordinations
    6.6 Elliptical Constructions

7 The Annotation of Specific Syntactic Phenomena
    7.1 Superlative and Comparative Forms
        7.1.1 Superlative Forms
        7.1.2 The Comparative Particles wie and als
    7.2 Verbal and Adjectival Use of Participles
    7.3 Topicalization
    7.4 Headlines
    7.5 Discourse Markers
    7.6 Parentheses

8 Criteria for the Distinction of Grammatical Functions
    8.1 Subcategorization of Verbs
    8.2 Subcategorization of PREDs
    8.3 Distinction of FOPP, OPP, and V-MOD
    8.4 Distinction of MOD, MOD-MOD, and V-MOD
    8.5 Distinction of ON, PRED, ON-MOD, and PRED-MOD

9 The TüBa-D/Z Data Formats
    9.1 The NEGRA Export Format
    9.2 The Penn Treebank Format
        9.2.1 The Penn Treebank Format Version 1
        9.2.2 The Penn Treebank Format Version 2
    9.3 The Export-XML Format
    9.4 The CoNLL Format (CoNLL-X 2006, 2010, 2011/2012, CoNLL-U v2)
        9.4.1 The CoNLL-X 2006 Format
        9.4.2 The CoNLL 2010 Format
        9.4.3 The CoNLL 2011/2012 Format
        9.4.4 The CoNLL-U v2 Format

References

Index


List of Tables

3.1 Three clause types according to Höhle (1986)
3.2 Topological fields
3.3 Levels of annotation
3.4 The STTS tag set
3.5 Morphological feature combinations for lexical elements
3.6 Values of morphological features
3.7 Node labels
3.8 Edge labels
3.9 Syntactic-Semantic Node Labels for Named Entities
3.10 Lemmatization rules for POS-tags
3.11 Lemmatization rules for specific linguistic phenomena
4.1 Semantic Classes and Subclasses for Named Entities
4.2 Types of es


Chapter 1

Introduction

The purpose of this report is to describe the design principles and annotation scheme for the TüBa-D/Z treebank of German. It is intended as a guide for the treebank annotators in Tübingen and for theoretical and computational linguists who want to use annotated treebank data for their own research. In addition, we hope that this report may be of some use to researchers who want to construct their own treebank for German or for some other language. We would like to emphasize that the annotation scheme is language-specific, and we advise against adopting this scheme without modification for some other language. However, we do believe that the type of design decisions that are reported here for German will arise for other languages as well. It is in this sense that the current report could provide a useful point of reference.

The TüBa-D/Z treebank was developed by the Division of Computational Linguistics (Lehrstuhl Prof. Hinrichs) at the Department of Linguistics (Seminar für Sprachwissenschaft – SfS) of the Eberhard Karls Universität Tübingen, Germany. The guidelines focus on the syntactic annotation of written language data taken from the German newspaper 'die tageszeitung' (taz).

The treebank comprises 104,787 sentences. The newspaper material is taken from the following taz editions:

Year    Articles   Days   Months   Sentences
1989         661    261       12      34,057
1992         632      4        1      12,245
1995       1,107      6        1      21,391
1997         238    154       12       7,497
1999       1,039      6        2      22,195
1998         139     21        1       7,402
Total      3,816    452       29     104,787

The total covers material from 6 years.


The average sentence length is 18.7 words and the total number of tokens is 1,959,474. Release 11 in July 2017 is the final release of the TüBa-D/Z treebank. Information on how to obtain the data can be found at:

http://www.sfs.uni-tuebingen.de/en/ascl/resources/corpora/tueba-dz.html

Please consult this website in order to ensure that you are using the most recent and most complete version of the treebank.
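The corpus figures quoted in this chapter can be cross-checked with a few lines of Python. The sketch below is only an illustration: the variable and function names are assumptions, and the numbers are copied from the per-year breakdown and the token count given above.

```python
# Per-year figures as listed in this chapter:
# year -> (articles, days, months, sentences)
composition = {
    1989: (661, 261, 12, 34_057),
    1992: (632, 4, 1, 12_245),
    1995: (1_107, 6, 1, 21_391),
    1997: (238, 154, 12, 7_497),
    1999: (1_039, 6, 2, 22_195),
    1998: (139, 21, 1, 7_402),
}
TOTAL_TOKENS = 1_959_474  # total number of tokens stated in the text


def corpus_totals(rows):
    """Sum each column (articles, days, months, sentences) over all years."""
    articles, days, months, sentences = (sum(col) for col in zip(*rows.values()))
    return {
        "articles": articles,
        "days": days,
        "months": months,
        "sentences": sentences,
    }


totals = corpus_totals(composition)
average_length = TOTAL_TOKENS / totals["sentences"]

print(totals)
print(f"average sentence length: {average_length:.1f} tokens")
```

The column sums reproduce the totals stated in the text (3,816 articles, 452 days, 29 months, 104,787 sentences), and dividing the token count by the number of sentences gives the reported average of 18.7.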

The annotation scheme for the TüBa-D/Z treebank is derived from the Verbmobil treebank for spoken German, developed earlier (1997–2000) by the Division of Computational Linguistics of the SfS (Hinrichs et al. 2000). The annotation scheme for the Verbmobil treebank has been summarized in the 'Stylebook for the German Treebank in Verbmobil' (Stegmann et al. 2000). The TüBa-D/Z annotation scheme has been extended along various dimensions to accommodate the characteristics of written texts. In order to ensure the reusability of the data, the linguistic inventory used in the treebank annotation is based on a minimal set of assumptions that are uncontroversial among major syntactic theories. In this sense it is an attempt at theory-neutrality.

The TüBa-D/Z treebank is released in four different data formats: the Negra Export format, the Export-XML format, the Penn treebank format (version 1 and 2), and the CoNLL format (2006, 2010, 2011/2012). More information about each data format is given in chapter 9.

To the best of our knowledge, the Verbmobil treebank for spoken German is still the only treebank based on non-genre-specific German speech data. It is released as the TüBa-D/S treebank (http://www.sfs.uni-tuebingen.de/en/ascl/resources/corpora/tueba-ds.html). For written texts, TüBa-D/Z is not the only treebank available for German. Two other (semi-)manually annotated treebanks are currently available, each with its own annotation scheme: the Negra treebank (http://www.coli.uni-saarland.de/projects/sfb378/negra-corpus/) and the TIGER treebank (http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/tiger.html).

The Tübingen Partially Parsed Corpus of Written German (TüPP-D/Z; http://www.sfs.uni-tuebingen.de/en/ascl/resources/corpora/tuepp-dz.html) is a project closely related to the TüBa-D/Z treebank. It consists of 200 million word tokens from the Science CD (Wissenschafts-CD) of 'die tageszeitung' (taz), including the sentences which are annotated in the TüBa-D/Z treebank. The texts were automatically annotated with clause structure, topological fields, and chunks, in addition to lower-level annotation including parts of speech and morphological ambiguity classes. The first release of TüBa-D/Z (12/2003) served as the training corpus.


Chapter 2

Major Challenges and Design Decisions

Most syntactic theories consider individual sentences as the primary domain of linguistic theorizing and of syntactic annotation. For written language, the segmentation into sentences is largely unproblematic and coincides with the domain of syntactic analysis.

However, newspaper texts exhibit a number of phenomena that do not lend themselves easily to a purely sentence-based annotation. These phenomena include headlines, titles, parentheses, discourse markers, and sentence conjunction by a colon. These cases are described in more detail in sections 3.4.3 to 3.4.5 of this stylebook.

The second main question that needed to be addressed at the outset of the project was the inventory of syntactic categories and grammatical functions to be used for syntactic annotation and specification of predicate-argument structure. Here our choices were guided by two main considerations:

1. Linguistic adequacy and theory-neutrality: For the purposes of reusability of the treebank data, the annotation scheme should not reflect a commitment to a particular syntactic theory. Rather, the inventory of categories should be a reflection of common assumptions that syntacticians share across different frameworks concerning questions of constituenthood, phrase attachment, and grammatical functions. On this note, the annotations should be theory-neutral and minimal. This desideratum is of utmost importance so as to ensure the reusability of the annotated data.

At the same time, the annotation scheme should reflect as much as possible those empirical generalizations that syntacticians, especially from a descriptive perspective, have identified as characteristic of the language in question.

2. Balancing the needs of potential users: Since the construction of a treebank is a labor-intensive and costly enterprise, ideally the TüBa-D/Z treebank should appeal to as many potential users as possible. Moreover, the treebank should be of interest to researchers from a wide range of fields. Considering the renewed interest in the use of corpora for both theoretical and computational linguistics, choice points in the annotation scheme should be resolved in such a way that the needs of potential users are balanced as much as possible.


To support the use of the TüBa-D/Z treebank in computational linguistics, the annotation scheme should be sensitive to processing considerations, as long as linguistic adequacy of the choice of annotations is not compromised. Ceteris paribus, processing considerations favor annotation schemes that pay close attention to properties of syntactic surface structure, particularly to word order regularities and distributional properties of words and phrases. At the same time, the use of empty categories and of data structures with crossing dependencies among phrases is to be avoided if the annotations are to be used for parsers that rely on the context-freeness of the underlying grammar.

In order to satisfy the above aims, the annotation scheme is surface-oriented and context-free. The theoretical assumptions underlying the levels of annotation and the choice of labels themselves are based as much as possible on a rich tradition of theoretical and empirical research on German syntax.

For the treatment of word order regularities of German, which is a language with relatively free word order, an inventory of topological fields is incorporated into the annotation scheme. Topological fields in the sense of Herling (1821), Erdmann (1886), Drach (1937), and Höhle (1986) are widely used in descriptive studies of German syntax. Such fields constitute an intermediate layer of analysis above the level of individual phrases and below the clause level. The concept of topological fields favors tree-based annotations, i.e. bracketings that do not rely on crossing or discontinuous dependencies. Instead, such non-linear dependencies are to be expressed at the level of predicate-argument structure, which constitutes a second level of annotation with its own descriptive inventory of grammatical functions.
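The requirement that bracketings avoid crossing or discontinuous dependencies can be illustrated with a minimal Python sketch. It is not part of the treebank tooling: the tuple encoding of trees, the labels S, X and Y, and the function names are assumptions made purely for illustration. The check treats a tree as context-free in this sense if every node dominates an unbroken run of token positions.

```python
def yield_of(node):
    """Return the sorted token positions dominated by a node.

    A node is either an int (a token position) or a
    (label, children) tuple with a non-empty list of children.
    """
    if isinstance(node, int):
        return [node]
    _label, children = node
    positions = []
    for child in children:
        positions.extend(yield_of(child))
    return sorted(positions)


def is_context_free(node):
    """True if every node in the tree covers a gap-free run of positions."""
    if isinstance(node, int):
        return True
    span = yield_of(node)
    contiguous = span == list(range(span[0], span[-1] + 1))
    return contiguous and all(is_context_free(child) for child in node[1])


# Hypothetical bracketings over four token positions 0..3.
continuous = ("S", [("X", [0]), ("Y", [1, 2]), ("X", [3])])
crossing = ("S", [("X", [0, 2]), ("Y", [1, 3])])

print(is_context_free(continuous))
print(is_context_free(crossing))
```

In the second bracketing the node X covers positions 0 and 2 but not 1, so its yield is discontinuous. In the scheme described here, a relation of this kind would be recorded through grammatical function labels rather than through the bracketing itself.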

The framework of topological fields is widely used in empirical and theoretical accounts of German syntax. Thus, it is well documented in the linguistics literature. This greatly facilitates thorough training of human annotators, since they can rely on the pre-existing body of literature. One purpose of this stylebook is to add to these reference materials.

A total of 25 syntactic node labels are used for the encoding of constituent structures. These include labels for topological fields as well as labels for phrases and their constituent parts.

In order to capture grammatical functions of individual phrases and syntactic dependencies between phrases, constituent structure trees are enriched by a set of edge labels between constituent structure nodes. The inventory of edge labels comprises 42 distinct categories. In addition to these primary edge labels, four secondary edge labels are used. These labels indicate phrase-internal government of elements in the verb complex, express phrase-internal modification of noun phrases, resolve long-distance dependencies among modifiers, or relate the phrasal complements of so-called third-construction control verbs.

For certain computational applications, robust identification of named entities, e.g. person names, names of companies and institutions, and names of geographical locations, is a major concern. Therefore, such named entities are identified by a special node label, and their internal structure is sometimes identified by an additional secondary edge label that is used exclusively for named entities.

At the word level, part-of-speech labels are assigned according to the Stuttgart-Tübingen tag set (STTS), which is widely accepted for part-of-speech tagging for German and which provides an inventory of 54 distinct part-of-speech labels. In addition, information on inflectional morphology is given.
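As an illustration of how these layers fit together, the following Python sketch models a tree whose nodes carry a node label and an edge label and whose terminals carry a part-of-speech tag. It is a sketch under assumptions, not the treebank's data format: the class and function names are hypothetical, and the labels used in the example (SIMPX, VF, LK, MF, NX, VXFIN, ON, OA, HD) are assumed here for illustration; the authoritative inventory is the one given in section 3.4.2.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Union


@dataclass
class Token:
    form: str        # surface word form
    pos: str         # part-of-speech tag
    edge: str = "-"  # edge label linking the token to its parent
    morph: str = ""  # inflectional morphology, left empty in this sketch


@dataclass
class Node:
    label: str       # node label (topological field or phrase)
    edge: str = "-"  # edge label linking the node to its parent
    children: List[Union["Node", Token]] = field(default_factory=list)


def terminals(node: Union[Node, Token]) -> Iterator[Token]:
    """Yield the tokens below a node in surface order."""
    if isinstance(node, Token):
        yield node
    else:
        for child in node.children:
            yield from terminals(child)


def grammatical_functions(node: Union[Node, Token]) -> Dict[str, str]:
    """Map each non-empty edge label on a phrase node to the words it covers."""
    result: Dict[str, str] = {}
    if isinstance(node, Node):
        if node.edge != "-":
            result[node.edge] = " ".join(t.form for t in terminals(node))
        for child in node.children:
            result.update(grammatical_functions(child))
    return result


# Illustrative analysis of "Peter liest das Buch" (labels assumed, see above).
tree = Node("SIMPX", "-", [
    Node("VF", "-", [Node("NX", "ON", [Token("Peter", "NE", "HD")])]),
    Node("LK", "-", [Node("VXFIN", "HD", [Token("liest", "VVFIN", "HD")])]),
    Node("MF", "-", [Node("NX", "OA", [Token("das", "ART"),
                                       Token("Buch", "NN", "HD")])]),
])

print([t.form for t in terminals(tree)])
print(grammatical_functions(tree))
```

Because the tree is a plain bracketing, reading off the terminals from left to right reproduces the surface word order, while the grammatical functions are carried entirely by the edge labels.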

Detailed information about the complete inventory of node labels, edge labels, part-of-speech labels and inflectional feature clusters is given in section 3.4.2 of this stylebook.

The remainder of this stylebook is organized as follows: Chapter 3 offers an overview of the theoretical foundations of the annotation scheme, focusing on the concept of topological fields (3.1) and its relation to constituent structure (3.2), on general annotation principles (3.3), as well as an overview of the annotation levels and of the inventory of the annotation labels for each level (3.4). In addition, the applied lemmatization rules are described (3.5). Chapter 4 concerns the annotation of the internal structure of phrases, broken down into major word classes and their phrasal projections. Chapter 5 addresses the principles for relating individual phrases to each other, particularly for modifier and complement attachment. Chapter 6 discusses the annotation of entire sentences, focusing on the relationship between sentence types and topological fields, coordination (including phrasal conjunction) and elliptical constructions. Chapter 7 is devoted to the annotation of miscellaneous syntactic constructions such as comparatives, verbal and adjectival participles, topicalization, newspaper headlines, discourse markers, and parentheses, each of which poses special challenges for the annotation task. Chapter 8 describes the criteria used for distinguishing different grammatical functions. Chapter 9 describes the four different data formats in which the TüBa-D/Z treebank is distributed. The stylebook concludes with a bibliography and a subject index.

We do not consider the annotation level of anaphora and coreference relations in this stylebook. Please consult Naumann and Möller (2007) for a detailed description of these phenomena.


Chapter 3

The Theoretical Basis of the Annotation Scheme

3.1 Topological Fields

The annotation scheme for the TüBa-D/Z treebank has been developed with special regard to the characteristics of the German language: the interaction of configurational and non-configurational syntactic properties, which arise from the partially free word order. On the one hand, there exist three different clause types with respect to the fixed position of the finite verb (verb-second (V-2), verb-initial (V-1), and verb-final (V-end)). On the other hand, there is a high degree of variability in the position of complements and adjuncts. In order to handle the relatively high degree of word order freedom in German, the treebank adopts the notion of topological fields as the primary clustering principle of a sentence.

The basic characteristics of the model of topological sequences within a German sentence were originally formulated by Herling (1821) and Erdmann (1886). Herling (1821) developed an adequate topological theory for complex sentences in which clauses form a topological unit carrying a syntactic function, and he mentioned the special position of the finite verb in verb-second and verb-final clauses. Erdmann (1886) established the basics of a theory of topological fields and pointed out that the first position in a clause is not necessarily the subject position. The so-called Herling/Erdmann scheme already covers a set of word order regularities which apply to all three clause types of German. Later, Drach (1937) introduced the notion of field. Finally, Höhle (1986) developed topological schemes for the three clause types.

3.1.1 The Concept of Topological Fields

In a German clause, the finite verb can appear in three different positions: verb-second, verb-initial, and verb-final. Only in verb-final clauses does the verb complex, consisting of the finite verb and non-finite verbal elements, form a unit. The discontinuous positioning of the verbal elements in verb-first and verb-second clauses is the traditional reason for structuring German clauses into fields. The positions of the verbal elements form the Satzklammer (sentence bracket), which divides the sentence into a Vorfeld (initial field), a Mittelfeld (middle field), and a Nachfeld (final field). The Vorfeld and the Mittelfeld are divided by the linke Satzklammer (left sentence bracket), which is the finite verb; the rechte Satzklammer (right sentence bracket) is the verb complex between the Mittelfeld and the Nachfeld. Thus, the theory of topological fields states the fundamental regularities of German word order. It is an important basis for the topological analysis of any German sentence, since subclauses and embedded clauses are treated within the bounds of fields. Identical word order regularities within a specific field can be realized in all three clause types. But the fields themselves differ in their possible elements and grammatical rules. Therefore, the theory is a descriptive rather than explanatory theory for a specific language.

Höhle (1986) denotes the three clause types as E-Sätze (verb-final clauses), F1-Sätze (verb-initial clauses), and F2-Sätze (verb-second clauses). The topological schemes of these types are listed in Table 3.1.

Table 3.1: Three clause types according to Höhle (1986)

E-Sätze    (KOORD) - (C) - X - VK - Y
F1-Sätze   (KOORD) - (KL) - FINIT - X - VK - Y
F2-Sätze   (KOORD or PARORD) - (KL) - K - FINIT - X - VK - Y

Abbreviations and explanations used in Table 3.1:
VK:      verb complex
FINIT:   element denoting categories of finiteness
KOORD:   coordinating particles (e.g. und, oder)
PARORD:  non-coordinating particles (e.g. denn, weil)
X, Y:    sequence of any number of constituents
C:       complementizer
K:       one constituent
KL:      nominativus pendens, resumptive construction (Linksversetzung)
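The schemes in Table 3.1 can be read as ordered slot sequences. The following Python sketch is only an illustration and not part of the treebank tooling: the names SCHEMES and matches_scheme are hypothetical, and it assumes that the parenthesized slots may be omitted, that X and Y may be empty ("any number of constituents" is taken to include zero), and that all other slots, including VK, are obligatory as written in the table.

```python
# Each scheme is an ordered list of slots: (accepted labels, may_be_omitted).
# Assumption: slots in parentheses in Table 3.1 are omissible, and so are
# X and Y; every other slot (including VK) is treated as obligatory.
SCHEMES = {
    "E": [({"KOORD"}, True), ({"C"}, True), ({"X"}, True),
          ({"VK"}, False), ({"Y"}, True)],
    "F1": [({"KOORD"}, True), ({"KL"}, True), ({"FINIT"}, False),
           ({"X"}, True), ({"VK"}, False), ({"Y"}, True)],
    "F2": [({"KOORD", "PARORD"}, True), ({"KL"}, True), ({"K"}, False),
           ({"FINIT"}, False), ({"X"}, True), ({"VK"}, False),
           ({"Y"}, True)],
}


def matches_scheme(clause_type, observed):
    """Return True if the observed slot sequence fits the given scheme."""
    position = 0
    for accepted, may_be_omitted in SCHEMES[clause_type]:
        if position < len(observed) and observed[position] in accepted:
            position += 1      # the slot is filled by the next observed label
        elif not may_be_omitted:
            return False       # an obligatory slot is missing
    return position == len(observed)   # no unconsumed labels may remain
```

For instance, matches_scheme("F2", ["K", "FINIT", "X", "VK"]) holds, whereas a sequence starting with PARORD is rejected for "F1", because PARORD appears only in the F2 scheme.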

These schemes topologically analyse not only atomic sentences but also complex sentence constructions which contain embedded clauses. Such embedded clauses can occur in a Linksversetzung (resumptive construction), Vorfeld, Mittelfeld, or Nachfeld. Herling's theory of the coordination and embedding of sentences covers these phenomena in detail (Herling 1821).

Following Höhle (1986), we assume the existence of the following topological fields (cf. Table 3.2):

The following description of the topological fields does not claim completeness regarding all descriptive details but rather mentions their main characteristics.¹

VF: The Vorfeld consists of only one constituent. Usually it is the subject. But because of the high degree of non-configurationality in German, the subject can also occur in the Mittelfeld, thus allowing almost every other constituent to occupy the Vorfeld.

¹ In the following, the abbreviations for the fields listed in Table 3.2 are used.

Table 3.2: Topological fields

Field    Description
VF       Vorfeld (initial field)
LK       Linke (Satz-)Klammer (left sentence bracket)
MF       Mittelfeld (middle field)
VC       Verbkomplex (verb complex)
NF       Nachfeld (final field)
LV       Linksversetzungsfeld (field for resumptive constructions)
C        C-Feld (field for complementizers, left of MF)
KOORD    Koordinationsfeld (field for coordinating particles);
         left-most element, optional in all clause types (e.g. und, oder)
PARORD   Koordinationsfeld (field for non-coordinating particles);
         left-most element, optional only in verb-second clauses (e.g. denn, weil)

LK: The Linke Klammer is the position of the finite verb in verb-second and verb-first clauses or a conjunction in verb-final clauses. It consists of exactly one element.

MF: Apart from those units which are optionally located in other fields, any non-verbal constituent may occur in the Mittelfeld. It consists of a sequence of any number of constituents. The linear order of the constituents depends on the specific word order principles for German and their interaction.

VC: The Verbkomplex is a sequence of verb forms. In verb-second and verb-first clauses it consists of one or more non-finite elements or, depending on the verb, of a separable prefix. In verb-final clauses it also contains the finite verb. The general rule for the linear order is: right determines left. If there is a finite verb in the verb complex, it is usually the right-most element (exception: Ersatzinfinitiv constructions, e.g. daß er sich ein neues Konzept wird überlegen müssen; cf. 4.7.3).

NF: For some clause types (e.g. so daß-Sätze), the Nachfeld is the obligatory position. Embedded complement clauses, relative clauses, and single constituents can optionally occur in the Nachfeld. In contrast to the Vorfeld, it may be occupied by any number of constituents.

LV: The Linksversetzungsfeld is a field for the left-dislocated phrase of resumptive constructions. A Linksversetzung is a pendent constituent. It can be regarded as a syntactic anticipation of a part of a sentence (cf. 6.1.4). There are many restrictions which apply to this position.

C: The C-Feld only occurs in verb-final clauses. It is obligatorily occupied in finite verb-final clauses if there is no conjunction in the Linke Klammer. In non-finite verb-final clauses the C-position may be empty. This field can be occupied by conjunctions of sentential objects (e.g. daß, ob) or sentence-initial conjunctions like um, obwohl, wenn and also by complex interrogative or relative phrases, e.g. ..., 'um wieviel Geld' geht es dabei? / ..., 'an der' Max Daniel Professor für Klavier ist. (cf. 6.1.1).

KOORD: The KOORD-field is the field for coordinating particles. In contrast to the PARORD-field, it can optionally occur as the left-most element of all clause types (cf. 6.1.2).

PARORD: The PARORD-field is the field for non-coordinating particles which optionally occur as the left-most element of a verb-second clause (cf. 6.1.3).

Concerning the distribution of constituents to topological fields, see also the chapter Deskriptive Generalisierungen in Grewendorf (1991).

The combination of these fields in order to constitute verb-first, verb-second, or verb-final clauses is described in Höhle (1986).

The topological model, which is the basis of most traditional German grammars, only provides descriptive parameters concerning the sentence structure without making any statement about the regularities within the fields and the hierarchical constituent structure of the sentence. For more complicated phenomena, it offers only a catalogue of detailed descriptions.

3.2 Constituent Analysis and Topological Fields

The main weakness of the concept of topological fields is the above-mentioned fact that the hierarchical constituent structure of a sentence cannot be described. The aim is to find a form of representation which combines the topological model with a constituent analysis in order to describe the hierarchy of the linguistic units within the fields. In our annotation scheme, the integration of a constituent analysis was achieved by a second level of annotation strictly within the bounds of topological fields: a predicate-argument structure with its own descriptive inventory of syntactic categories and grammatical functions. The constituent structure is represented by phrase structure trees (phrase markers) whose node and edge labels carry this information.

In order to analyse syntactic constructions, it is necessary to define the number and types of constituents within the fields.

1. Number of constituents within the fields:
In general, C, LK, KOORD, PARORD, and VF contain only one constituent. More than one constituent is allowed within MF and NF.

2. Types of constituents within the fields:
Phrasal constituents occur in VF, MF, NF and C (interrogative or relative phrases). Embedded clauses belong to NF, VF, LV, or in some cases to MF. Usually, outside the spoken language context, verb-final clauses do not occur in isolation. They need to be attached if possible.
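The first rule above lends itself to a simple consistency check. The following Python sketch is an illustration only; the function name cardinality_warnings and the input format (a mapping from field label to number of constituents) are assumptions made for this example. Since the rule holds only "in general", the sketch reports deviations rather than rejecting a tree.

```python
# Fields that, in general, contain exactly one constituent (rule 1 above).
SINGLE_CONSTITUENT_FIELDS = {"C", "LK", "KOORD", "PARORD", "VF"}


def cardinality_warnings(field_sizes):
    """Return the field labels whose size deviates from the general rule.

    field_sizes maps a field label to its number of constituents
    (a hypothetical input format chosen for this sketch). MF and NF
    are never reported, since they may hold several constituents.
    """
    return sorted(
        label for label, size in field_sizes.items()
        if label in SINGLE_CONSTITUENT_FIELDS and size != 1
    )
```

A clause with one constituent each in VF and LK and three in MF yields no warnings; a clause with two constituents in VF is flagged for manual inspection.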


3.3 General Annotation Principles

Our annotation scheme tries to find a trade-off between pragmatic requirements on the one hand and linguistic reality on the other hand. The following three common annotation principles are adopted to group the constituents within a syntactic tree: the flat clustering principle, the longest match principle, and the high attachment principle.

3.3.1 Flat Clustering Principle

The flat clustering principle keeps the number of hierarchy levels in a syntactic structure as small as possible. As a consequence, any degree of branching is allowed. Constituents which cannot be assigned a grammatical function within a syntactic construction are structured as much as possible, but are not typically connected to surrounding constituents as a whole.

3.3.2 Longest Match Principle

The longest match principle demands that as many daughter nodes as possible are combined into a single mother node, provided that the resulting construction is syntactically as well as semantically well-formed.

3.3.3 High Attachment Principle

The high attachment principle prescribes that syntactically and semantically ambiguous modifiers are attached to the highest possible level in a tree structure. Premodifiers and postmodifiers are treated differently. First, both kinds of modifiers are projected to their phrase level. Since the modification scope of premodifiers is unambiguous, they are directly attached to the head of the phrase which they are modifying. By contrast, postmodifiers are always attached on a higher level to preserve ambiguity. This decision was taken to avoid the problematic distinction of whether a postmodifier is a free adjunct or a complement of the modified phrase.

3.4 The Structure of an Annotated Tree

3.4.1 The Levels of Annotation

A syntactic tree consists of nodes and edges. Nodes represent constituents on different levels of annotation. Edges always link daughter nodes to a mother node. The root node of a tree is taken to be the sentence node of a construction. One level below the sentence node, the nodes of the topological fields are located. This is the reason why topological fields can be regarded as the top-level ordering principle for sentences in the treebank. The sequence of the fields in the three clause types never violates the topological schemes given by Höhle (1986). Within each sentence structure, in general at least two topological fields are occupied (exception: infinitive constructions, cf. 4.7.4). Others may be left empty (elliptical constructions, cf. 6.6). Table 3.3 lists the four levels of annotation which we distinguish within the structure of an annotated syntactic tree²:

Table 3.3: Levels of annotation

Level           Inventory
clause level    root node labels for different types of clauses
field level     node labels for topological fields
                (including labels for conjuncts of fields)
phrase level    node labels for syntactic categories
                (including syntactic-semantic node labels for named entities)
                and edge labels for grammatical functions
lexical level   lexical entries tagged with the part-of-speech (POS) tags taken
                from the STTS tag set (Schiller et al. 1995) and with
                morphological features (Trushkina 2004, Versley et al. 2010)
                and lemmata (Versley et al. 2010)

Node labels denote the syntactic category of a phrase or sentence, a topological field, or a grammatical property. Edge labels denote the grammatical function of lexical entries, phrases, topological fields, and clauses.

3.4.2 The Inventory of Labels

The part-of-speech tags used for the annotation are taken from the Stuttgart-Tübingen tag set (STTS) (Schiller et al. 1995).³ The STTS is a guideline for the annotation of German text corpora on the lexical level. Every token of a text is assigned exactly one part-of-speech tag. The tag set consists of the tags listed in Table 3.4 (cf. Schiller et al. 1995). The tagging of the data was performed by the TnT tagger (Brants 1998) and manually corrected with the Annotate tool (Plaehn 1998).

The morphological tags give information about inflectional morphology and include features such as case, number, person, etc. A specific combination of feature-value pairs is defined for each relevant part-of-speech category; see Table 3.5 for the list of part-of-speech categories that are annotated with morphological features and the corresponding feature combinations. The values are represented in a cluster by single-character abbreviations; see Table 3.6 for the set of features and their values. Features can be uniquely identified by their position in the cluster.
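Because each feature is identified by its position in the cluster, decoding a morphological tag amounts to a positional lookup. The Python sketch below illustrates this for a handful of part-of-speech categories; the feature order is copied from Table 3.5 and the value abbreviations from Table 3.6, while the names FEATURE_ORDER, VALUES and decode are hypothetical and chosen only for this example.

```python
# Feature order per POS category, as listed in Table 3.5 (excerpt only).
FEATURE_ORDER = {
    "ART": ("case", "number", "gender"),
    "NN": ("case", "number", "gender"),
    "PPER": ("case", "number", "gender", "person"),
    "VAFIN": ("person", "number", "mood", "tense"),
    "VVFIN": ("person", "number", "mood", "tense"),
}

# Single-character value abbreviations, as listed in Table 3.6.
VALUES = {
    "case": {"n": "nominative", "g": "genitive", "d": "dative",
             "a": "accusative", "*": "underspecified"},
    "gender": {"m": "masculine", "f": "feminine", "n": "neuter",
               "*": "underspecified"},
    "number": {"s": "singular", "p": "plural", "*": "underspecified"},
    "mood": {"i": "indicative", "k": "subjunctive"},
    "person": {"1": "first", "2": "second", "3": "third",
               "*": "underspecified"},
    "tense": {"s": "present", "t": "past", "*": "underspecified"},
}


def decode(pos, cluster):
    """Map a morphological cluster such as 'nsm3' to feature-value pairs."""
    features = FEATURE_ORDER[pos]
    if len(cluster) != len(features):
        raise ValueError(f"{pos} expects {len(features)} characters, "
                         f"got {cluster!r}")
    # The i-th character of the cluster is the value of the i-th feature.
    return {name: VALUES[name][char] for name, char in zip(features, cluster)}
```

With these tables, decode("PPER", "nsm3") reads as nominative, singular, masculine, third person, and decode("VAFIN", "3sit") as third person, singular, indicative, past. Note that the same character can mean different things in different positions (s is singular for number but present for tense), which is why the position, not the character alone, determines the feature.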

Node labels indicate the syntactic category of a phrase or sentence, but they are also used to label topological fields and sequences of topological fields within coordinations or to indicate specific grammatical properties of constituents. Table 3.7 lists all node labels which are used in the treebank. (An additional node is introduced for named entities, see Table 3.9.)

² We do not consider the suprasentential annotation level of anaphora and coreference relations in this stylebook. Please consult Naumann and Möller (2007) for a detailed description of these phenomena.

³ PAV was changed into a new tag called PROP (pronominal form of a prepositional phrase) in order to justify PX as the syntactic category of its mother.

Edge labels indicate the grammatical function of lexical entries, phrases, topological fields, and clauses. Since case information is given and a distinction of different modifiers is made by these labels, the syntactic tree structures also contain semantic roles. The specific set of edge labels for the German treebank is listed in Table 3.8, including secondary edge labels. The latter are used to resolve ambiguities on a different level of description.

Two specific edge labels denote whether a constituent has the function of a head (HD), e.g. a phrase (NX, PX, ADJX, ADVX, VXFIN, VXINF), or a non-head (-), e.g. a determiner or a modifier attached to a phrase. On any annotation level, there is at most one head. Within phrases, these two labels indicate the internal dependency structure of the phrase. The head of a sentence structure (e.g. SIMPX) is always the finite verb. In coordinations, each conjunct depends on the head of the whole construction and is denoted with a specific edge label (KONJ) in order to distinguish it from conjunctions and modifying elements within a coordination (see 6.5.1 and 6.5.3). Edges directly below root nodes carry only non-head labels (cf. Kübler and Telljohann 2002).

In an enhanced version of the TüBa-D/Z treebank, each named entity is assigned one of the following semantic classes: person (PER), organisation (ORG), location (LOC), geopolitical entity (GPE), or other (OTH). The semantic class OTH comprises all remaining named entities not fitting into PER, ORG, LOC, or GPE (cf. 4.2.6).

In order to annotate these semantic classes, syntactic-semantic node labels of the pattern syntactic category=semantic class are defined as the mother node of named entities (see Table 3.9). These syntactic-semantic nodes indicate that the structure below represents a (complex) named entity of a certain syntactic category belonging to one of the five semantic classes, e.g. Ute Wedemeier (NX=PER), The Jim Wane Swingtett (NX=ORG), Sögestraße (NX=LOC), Auf die stürmische Art (PX=OTH) (cf. 4.2.6).

The former node label 'EN-ADD' and the secondary edge label 'EN' have been removed.

The internal syntactic structure of named entities is governed by the general annotation rules. All parts below a syntactic-semantic node that do not belong to the named entity itself are marked as '-NE', e.g. [[die (-NE)] AWO] (NX=ORG), [[Der (-NE)] zweite Weltkrieg] (NX=OTH).
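The pattern syntactic category=semantic class makes the two components of a syntactic-semantic node label easy to separate. The following Python sketch shows one way of doing so; the function name split_node_label is hypothetical, and treating any label without "=" as an ordinary node label is an assumption of this example.

```python
# The five semantic classes for named entities described above.
SEMANTIC_CLASSES = {"PER", "ORG", "LOC", "GPE", "OTH"}


def split_node_label(label):
    """Split 'NX=PER' into ('NX', 'PER'); plain labels yield (label, None)."""
    category, separator, semantic_class = label.partition("=")
    if not separator:
        return label, None          # ordinary node label, no named entity
    if semantic_class not in SEMANTIC_CLASSES:
        raise ValueError(f"unknown semantic class: {semantic_class!r}")
    return category, semantic_class
```

For example, split_node_label("NX=PER") separates the syntactic category NX from the semantic class PER, so that tools which only need the syntactic category can ignore the named-entity information.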


Table 3.4: The STTS tag set

POS       description                                  examples
ADJA      attributive adjective                        [das] große [Haus]
ADJD      adverbial or predicative adjective           [er fährt] schnell, [er ist] schnell
ADV       adverb                                       schon, bald, doch
APPR      preposition; left circumposition             in [der Stadt], ohne [mich]
APPRART   preposition + article                        im [Haus], zur [Sache]
APPO      postposition                                 [ihm] zufolge, [der Sache] wegen
APZR      right circumposition                         [von jetzt] an
ART       definite or indefinite article               der, die, das, ein, eine
CARD      cardinal number                              zwei [Männer], [im Jahre] 1994
FM        foreign language material                    [Er hat das mit “] A big fish [” übersetzt]
ITJ       interjection                                 mhm, ach, tja
KOUI      subordinating conjunction                    um [zu leben], anstatt [zu fragen]
          with zu + infinitive
KOUS      subordinating conjunction                    weil, daß, damit, wenn, ob
          with clause
KON       coordinative conjunction                     und, oder, aber
KOKOM     particle of comparison, no clause            als, wie
NN        noun                                         Tisch, Herr, [das] Reisen
NE        proper noun                                  Hans, Hamburg, HSV
PDS       substituting demonstrative pronoun           dieser, jener
PDAT      attributive demonstrative pronoun            jener [Mensch]
PIS       substituting indefinite pronoun              keiner, viele, man, niemand
PIAT      attributive indefinite pronoun               kein [Mensch], irgendein [Glas]
          without determiner
PIDAT     attributive indefinite pronoun               [ein] wenig [Wasser], [die] beiden [Brüder]
          with determiner
PPER      irreflexive personal pronoun                 ich, er, ihm, mich, dir
PPOSS     substituting possessive pronoun              meins, deiner
PPOSAT    attributive possessive pronoun               mein [Buch], deine [Mutter]
PRELS     substituting relative pronoun                [der Hund,] der
PRELAT    attributive relative pronoun                 [der Mann,] dessen [Hund]
PRF       reflexive personal pronoun                   sich, einander, dich, mir
PWS       substituting interrogative pronoun           wer, was
PWAT      attributive interrogative pronoun            welche [Farbe], wessen [Hut]
PWAV      adverbial interrogative or relative pronoun  warum, wo, wann, worüber, wobei
PROP      pronominal adverb                            dafür, dabei, deswegen, trotzdem
PTKZU     zu + infinitive                              zu [gehen]
PTKNEG    negation particle                            nicht
PTKVZ     separated verb particle                      [er kommt] an, [er fährt] rad
PTKANT    answer particle                              ja, nein, danke, bitte
PTKA      particle with adjective or adverb            am [schönsten], zu [schnell]
TRUNC     truncated word - first part                  An- [und Abreise]
VVFIN     finite main verb                             [du] gehst, [wir] kommen [an]
VVIMP     imperative, main verb                        komm [!]
VVINF     infinitive, main                             gehen, ankommen
VVIZU     infinitive + zu, main                        anzukommen, loszulassen
VVPP      past participle, main                        gegangen, angekommen
VAFIN     finite verb, aux                             [du] bist, [wir] werden
VAIMP     imperative, aux                              sei [ruhig !]
VAINF     infinitive, aux                              werden, sein
VAPP      past participle, aux                         gewesen
VMFIN     finite verb, modal                           dürfen
VMINF     infinitive, modal                            wollen
VMPP      past participle, modal                       [er hat] gekonnt
XY        non-word containing special characters       D2XW3, letters
$,        comma                                        ,
$.        sentence-final punctuation                   . ? ! ; :
$(        other sentence internal punctuation          - [ ] ( )

The following POS categories do not contain any morphological information and are assigned the morphological label "--": ADJD, ADV, APZR, CARD, FM, ITJ, KOUI, KOUS, KON, KOKOM, PWAV, PROP, PTKZU, PTKNEG, PTKVZ, PTKANT, PTKA, TRUNC, VVIZU, VVPP, VAPP, VMPP, XY, $,, $., $(.

3.4.3 What Is a Syntactic Unit?

The newspaper articles of the taz have been defined as the primary segmentation domain of the data. They are preprocessed into syntactic units delimited by punctuation marks (. ? ! ; - ... /) for which specific rules demand or forbid segmentation. Each syntactic unit is assigned a specific code which identifies its origin in the newspaper data, e.g. T990507.123 (T (taz) 99 (year) 05 (month) 07 (day) 123 (article)).
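The origin code can be decomposed mechanically. The Python sketch below follows the breakdown given for T990507.123; the function name parse_unit_code and the regular expression are hypothetical, and the sketch assumes that the two-digit year belongs to the 1900s, which fits the taz editions from 1989 to 1999 that the treebank draws on.

```python
import re
from datetime import date

# T + two-digit year, month and day, then a dot and the article number.
UNIT_CODE = re.compile(r"T(\d{2})(\d{2})(\d{2})\.(\d+)")


def parse_unit_code(code):
    """Return (publication date, article number) for a code like T990507.123."""
    match = UNIT_CODE.fullmatch(code)
    if match is None:
        raise ValueError(f"not a valid unit code: {code!r}")
    year, month, day, article = (int(group) for group in match.groups())
    # Assumption: all editions date from the 1900s (corpus span 1989-1999).
    return date(1900 + year, month, day), article
```

Applied to the example above, the sketch yields the date 7 May 1999 and the article number 123.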

A syntactic unit usually consists of one complete sentence structure with a root node (SIMPX, R-SIMPX, P-SIMPX). But it may also consist of one or more sentences and/or phrases, e.g. headlines, titles, sentences with parentheses, sentences with discourse markers, or sentence conjunction by a colon.

An annotated tree is a complete syntactically and semantically well-formed construction according to the longest match principle. The model of topological fields does not prescribe that all fields have to be occupied. The fact that fields can be left empty also helps us to cope with elliptical constructions (cf. 6.6).


Table 3.5: Morphological feature combinations for lexical elements

POS       feature combination          comments
ADJA      case number gender           underspecified for gender if the plural noun is
                                       underspecified, i.e. the plural noun does not
                                       morphologically represent its gender, e.g.
                                       deadjectival nouns: die/np* nordhessischen/np*
                                       Grünen/np*
                                       invariant local description, e.g. Berliner/***
                                       cardinal numbers as abbreviation: full
                                       morphology, e.g. im 4./dsn Jahrhundert/dsn
APPR      case                         without case if a preposition takes another PP
                                       as complement, e.g. bis/ zu/d einer/dsf
                                       Woche/dsf, and in the construction
                                       was für ein(er/e/...)
APPRART   case number gender
APPO      case
ART       case number gender
NN        case number gender           can be underspecified for gender, e.g.
                                       deadjectival nouns (Abgeordnete (in plural))
                                       or pluralia tantum (Leute)
NE        case number gender
PDS       case number gender
PDAT      case number gender
PIS       case number gender           underspecified: man/ns*
                                       nichts/*** (cf. nix, sowas)
                                       PIS or PIAT: allerhand/*** (cf. allerlei,
                                       allzuviel, dergleichen, derlei, etwas,
                                       genausoviel, genug, genügend, keinerlei, mehr,
                                       reichlich, soviel, viel, wenig, weniger,
                                       zuviel, zuwenig)
                                       PIDAT or PIS: sowas/*** (cf. paar, bißchen)
PIAT      case number gender           plural is underspecified for gender, e.g.
                                       lauter/***, see also 'PIS or PIAT' under PIS
PIDAT     case number gender           solch/*** (cf. manch, welch, all), see also
                                       'PIDAT or PIS' under PIS
PPER      case number gender person
PPOSS     case number gender
PPOSAT    case number gender
PRELS     case number gender           plural is underspecified for gender
PRELAT    case number gender
PRF       case number gender person    sich: underspecified for gender
PWS       case number gender           underspecified for gender: plural forms and
                                       wer, wem, wen
PWAT      case number gender           wessen/***
VAFIN     person number mood tense
VAIMP     number
VMFIN     person number mood tense
VVFIN     person number mood tense
VVIMP     number                       German has only second person imperative forms

Table 3.6: Values of morphological features

Feature   Values
case      n (nominative), g (genitive), d (dative), a (accusative), * (underspecified)
gender    m (masculine), f (feminine), n (neuter), * (underspecified)
number    s (singular), p (plural), * (underspecified)
mood      i (indicative), k (subjunctive; German 'Konjunktiv')
person    1 (first), 2 (second), 3 (third), * (underspecified)
tense     s (present), t (past), * (underspecified)


Table 3.7: Node labels

Node Labels   Description

Phrase Node Labels
ADJX          adjectival phrase
ADVX          adverbial phrase
DP            determiner phrase (e.g. gar keine)
FX            foreign language phrase
NX            noun phrase
PX            prepositional phrase
VXFIN         finite verb phrase
VXINF         non-finite verb phrase

Topological Field Node Labels
LV            resumptive construction (Linksversetzung)
C             complementizer field (C-Feld)
FKOORD        coordination consisting of conjuncts of fields
KOORD         field for coordinating particles
LK            left sentence bracket (Linke (Satz-)Klammer)
MF            middle field (Mittelfeld)
MFE           middle field between VCE and VC
NF            final field (Nachfeld)
PARORD        field for non-coordinating particles
VC            verb complex (Verbkomplex)
VCE           verb complex with the split finite verb of Ersatzinfinitiv constructions
VF            initial field (Vorfeld)
FKONJ         conjunct consisting of more than one field

Root Node Labels
DM            discourse marker
P-SIMPX       paratactic construction of simplex clauses
R-SIMPX       relative clause
SIMPX         simplex clause


Table 3.8: Edge labels

Edge Labels   Description

Edge Labels denoting Heads and Conjuncts
HD            head
-             non-head
KONJ          conjunct

Complement Edge Labels
ON            nominative object (i.e. subject; also clausal subjects)
OD            dative object
OA            accusative object
OG            genitive object
OS            sentential object
OPP           prepositional object
OADVP         adverbial object
OADJP         adjectival object
PRED          predicate
OV            verbal object
FOPP          facultative (i.e. optional) prepositional object,
              passivized subject (von-phrase)
VPT           separable verb prefix
APP           apposition

Modifier Edge Labels
MOD           ambiguous modifier
ON-MOD, OA-MOD, OD-MOD, OG-MOD, OS-MOD, OPP-MOD, FOPP-MOD, PRED-MOD,
OADJP-MO, OADVP-MO, V-MOD, MOD-MOD
              modifiers modifying complements or modifiers,
              e.g. V-MOD = modifier of the verb

Edge Labels in Split Coordinations
ONK, OAK, ODK, OGK, OPPK, FOPPK, PREDK, OSK, OADVPK, OA-MODK, MODK, V-MODK
              second conjunct (K) in split coordinations,
              e.g. ONK = second conjunct of a nominative object

Edge Label denoting Structural Expletive
ES            Vorfeld-es

Secondary Edge Labels (dependency relation between:)
refvc         two verbal objects in VC
refmod        two ambiguous modifiers
refint        a phrase internal part and its modifier
refcontr      control verb and its complement across clause boundaries


Table 3.9: Syntactic-Semantic Node Labels for Named Entities

Labels       Description

Syntactic-Semantic Node Labels
ADJX=ORG     adjectival phrase, named entity of the semantic class "organisation"
ADJX=OTH     adjectival phrase, named entity of the semantic class "other"
ADVX=ORG     adverbial phrase, named entity of the semantic class "organisation"
ADVX=OTH     adverbial phrase, named entity of the semantic class "other"
DM=OTH       discourse marker, named entity of the semantic class "other"
FX=LOC       foreign language phrase, named entity of the semantic class "location"
FX=ORG       foreign language phrase, named entity of the semantic class "organisation"
FX=OTH       foreign language phrase, named entity of the semantic class "other"
FX=PER       foreign language phrase, named entity of the semantic class "person"
NX=GPE       noun phrase, named entity of the semantic class "geopolitical entity"
NX=LOC       noun phrase, named entity of the semantic class "location"
NX=ORG       noun phrase, named entity of the semantic class "organisation"
NX=OTH       noun phrase, named entity of the semantic class "other"
NX=PER       noun phrase, named entity of the semantic class "person"
PX=GPE       prepositional phrase, named entity of the semantic class "geopolitical entity"
PX=LOC       prepositional phrase, named entity of the semantic class "location"
PX=ORG       prepositional phrase, named entity of the semantic class "organisation"
PX=OTH       prepositional phrase, named entity of the semantic class "other"
PX=PER       prepositional phrase, named entity of the semantic class "person"
SIMPX=ORG    simplex clause, named entity of the semantic class "organisation"
SIMPX=OTH    simplex clause, named entity of the semantic class "other"
VXINF=ORG    non-finite verb phrase, named entity of the semantic class "organisation"
VXINF=OTH    non-finite verb phrase, named entity of the semantic class "other"

Edge Label
-NE          non-head, the part below is not part of the named entity


Punctuation is not annotated, i.e., punctuation marks are not attached to the tree structure. Exceptions are punctuation marks which carry a semantic meaning within a sentence, e.g. - (bis, und) in expressions like 15.30 - 17.30 Uhr. They are tagged according to the part of speech that they represent in the text (cf. 4.4.1).

Constituents are not attached to a tree if they are not assigned a grammatical function within the specific syntactic construction. The following tree diagram shows two annotated trees in one syntactic unit:⁴

[Tree diagram: a SIMPX tree and an isolated NX within one syntactic unit. Terminals, given as word/POS/morphology:]
An/APPR/d der/ART/dsf Oder/NE/dsf wurde/VAFIN/3sit er/PPER/nsm3 dann/ADV/-- verwundet/VVPP/-- ,/$,/-- ein/ART/nsm Wadendurchschuß/NN/nsm ./$./--

The leaves of the trees consist of pairs of terminal symbols and part-of-speech tags. Non-terminal symbols are represented by spherical nodes, whereas edge labels are depicted by rectangular nodes. The tree diagram consists of two trees, a SIMPX and an isolated phrase. In accordance with the four annotation levels shown in Table 3.3, the sentence is annotated top-down by the root node (SIMPX), the field nodes (VF, LK, MF, and VC), the phrase nodes (PX, VXFIN, NX, ADVX, and VXINF), and finally the tagged lexical entries. The edge labels between the field level and the phrase level indicate that the syntactic structure contains one unambiguous modifier (V-MOD), a subject (ON), one ambiguous modifier (MOD), a verbal object (OV), and the finite verb, which itself is the head (HD) of the entire syntactic construction. The noun phrase (ein Wadendurchschuß) is not attached to the sentence structure because otherwise the well-formedness of the construction would be violated. Thus, it has to be annotated as an isolated phrase lacking a verbal constituent.
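The layering described in this paragraph can be made concrete with a small data structure. The Python sketch below is only an illustration of the description, not one of the distribution formats of the treebank: the class Node and the function functions_by_field are hypothetical, the lexical level and phrase-internal structure are omitted, and the assignment of phrases to fields is read off the example sentence (An der Oder | wurde | er dann | verwundet).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str                       # node label, e.g. SIMPX, VF, NX
    edge: str = "-"                  # edge label linking the node to its mother
    children: List["Node"] = field(default_factory=list)


FIELD_LABELS = {"VF", "LK", "MF", "VC", "NF", "LV", "C", "KOORD", "PARORD"}

# Clause level -> field level -> phrase level; field nodes keep the
# non-head edge label "-" that is used directly below root nodes.
simpx = Node("SIMPX", children=[
    Node("VF", children=[Node("PX", "V-MOD")]),       # An der Oder
    Node("LK", children=[Node("VXFIN", "HD")]),       # wurde
    Node("MF", children=[Node("NX", "ON"),            # er
                         Node("ADVX", "MOD")]),       # dann
    Node("VC", children=[Node("VXINF", "OV")]),       # verwundet
])
isolated_phrase = Node("NX")   # ein Wadendurchschuß, not attached to SIMPX


def functions_by_field(root):
    """List (field, phrase category, edge label) for every phrase node."""
    return [
        (field_node.label, phrase.label, phrase.edge)
        for field_node in root.children
        if field_node.label in FIELD_LABELS
        for phrase in field_node.children
    ]
```

Calling functions_by_field(simpx) lists the five grammatical functions named above in surface order, starting with the V-MOD prepositional phrase in the Vorfeld and ending with the OV verb phrase in the verb complex. Keeping the isolated noun phrase as a separate root mirrors the fact that it is not attached to the sentence structure.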

3.4.4 Printing and Spelling Errors

In contrast to spoken language data as in Verbmobil (cf. Stegmann et al. 2000), which exhibit fragmentary utterances, false starts, repetitions, interruptions, and hesitation noises as their characteristic properties, data taken from newspaper corpora does not include unintentionally formed syntactic constructions.

4 These tree diagrams and all following tree diagrams in this report were generated with the aid of the Negra Annotate tool.

Deviations from syntactic well-formedness are either intended by the author or are caused by printing errors. While incorrect writing of words is neglected in the syntactic analysis (the respective lexical entry is marked with the correct writing of the word in a comment line below), lexical elements which do not belong to the syntactic construction (intentional or unintentional) are structured as much as possible, but are not attached to the surrounding constituents:

[Tree diagram: Jetz wollen Sie wieder ein solches System aufbauen.
Tokens (word [POS morphology]): Jetz [ADV; comment line: Jetzt], wollen [VMFIN 3pis], Sie [PPER np*3], wieder [ADV], ein [ART asn], solches [PIDAT asn], System [NN asn], aufbauen [VVINF], . [$.]
Nodes: SIMPX with the fields VF, LK, MF, VC; phrase nodes ADVX (MOD), VXFIN (HD), NX (ON), ADVX (MOD), NX (OA), ADJX, VXINF (OV).]

[Tree diagram. Tokens (word [POS morphology]): Am [APPRART dsm], Abend [NN dsm], erklärten [VVFIN 3pit], , [$,], sie [PPER np*3], seien [VAFIN 3pks], dabei [PROP], geschlagen [VVPP], worden [VAPP], [token lost] [$(], von [APPR d], der [ART dsf], Polizei [NN dsf]
Node labels: SIMPX (twice), VF, LK, MF, VC, NF, NX, VXFIN, VXINF, PX; edge labels: HD, ON, OV, V-MOD, FOPP.]

3.4.5 Isolated Phrases

There are textual fragments in newspaper data which cannot be analysed as a SIMPX or as a constituent of a SIMPX because they lack a verbal constituent or are not assigned a specific grammatical function within a well-formed sentence. These fragments are annotated as isolated phrases. The isolated elements are structured as much as possible (mostly up to the level of phrasal categories), but they are typically not connected to surrounding constituents as a whole, so that a conflict with the topological field analysis is avoided. Their root node carries the phrasal category of their lexical head (NX, PX, ADVX, etc.):


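[Tree diagram: Warum auch nicht?
Tokens (word [POS morphology]): Warum [PWAV], auch [ADV], nicht [PTKNEG], ? [$.]
Node labels: ADVX, PX; edge label: HD.]

[Tree diagram: Hoffentlich ohne Nebenwirkungen.
Tokens (word [POS morphology]): Hoffentlich [ADV], ohne [APPR a], Nebenwirkungen [NN apf], . [$.]
Node labels: ADVX, NX, PX; edge label: HD.]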

In accordance with the longest match principle, as many parts of the fragment as possible are projected to the phrase level and are included in a tree structure. It has to be decided which part of the whole construction is the head and which parts depend on this head.

Phrases within a syntactic unit are not attached on a higher level if they do not show a dependency relation. This is often the case with syntactic elements which are separated by a colon or a dash (cf. 5.3.2):

[Tree diagram: ASB lädt ein: Tag der offenen Tür
Tokens (word [POS morphology]): ASB [NN nsm], lädt [VVFIN 3sis], ein [PTKVZ], : [$.], Tag [NN nsm], der [ART gsf], offenen [ADJA gsf], Tür [NN gsf]
Node labels: SIMPX with the fields VF, LK, VC; NX (ON), VXFIN (HD), ADJX, NX; edge labels: HD, ON, VPT.]

[Tree diagram: Arlington Road USA 1999, R: Mark Pellington, D: Jeff Bridges, Tim Robbins
Tokens (word [POS morphology]): Arlington [NE nsn], Road [NE nsf], USA [NE npm], 1999 [CARD], , [$,], R [NN nsf], : [$.], Mark [NE nsm], Pellington [NE nsm], , [$,], D [NN npm], : [$.], Jeff [NE nsm], Bridges [NE nsm], , [$,], Tim [NE nsm], Robbins [NE nsm]
Node labels: NX; edge labels: HD, KONJ.]


[Tree diagram. Tokens (word [POS morphology]): Berlin [NE nsn], ( [$(], taz [NE nsf], ) [$(], [token lost] [$(], So [ADV], also [ADV], wird [VAFIN 3sis], man [PIS ns*], zum [APPRART dsm], Problemfall [NN dsm], . [$.]
Node labels: SIMPX with the fields VF, LK, MF; NX, ADVX, VXFIN, PX; edge labels: HD, V-MOD, ON, PRED.]

3.4.6 Long-Distance Dependencies

Our annotation scheme facilitates a surface-oriented representation of long-distance dependencies without crossing branches and traces. If a modifying constituent is not adjacent to the modified constituent, their dependency relation, which can even go beyond the border of topological fields, is encoded by special naming conventions for edge labels. We use edge labels such as OA-MOD (referring to OA) or PRED-MOD (referring to PRED), which express that the modifier is unambiguous.

Beyond this, we make use of secondary edge labels for ambiguity resolution. These labels merely serve as additional information to the grammatical functions encoded in the edge labels. They indicate underspecified long-distance dependencies in the following cases:

1. If the above-mentioned edge labels need further disambiguation, e.g. if there are two OAs or V-MODs below one SIMPX node (refmod).

2. If the dependency relation exists between two nodes of which at least one is phrase-internal and therefore carries only head or non-head information (refint).

3. If there is a dependency relation outside of SIMPX in control verb constructions (refcontr).
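The naming convention for modifier edge labels can be read mechanically. The following minimal sketch assumes that every label of the form X-MOD refers to the grammatical function X and that a plain MOD carries no such reference; the function name and the dictionary are hypothetical and only paraphrase the three cases listed above.

```python
from typing import Optional

# Secondary edge labels and the cases in which the text says they are used.
SECONDARY_EDGE_LABELS = {
    "refmod": "further disambiguation of an X-MOD label, e.g. two OAs below one SIMPX",
    "refint": "dependency involving at least one phrase-internal node",
    "refcontr": "dependency outside of SIMPX in control verb constructions",
}


def modified_function(edge_label: str) -> Optional[str]:
    """Return the function an X-MOD label refers to, or None for other labels."""
    suffix = "-MOD"
    if edge_label.endswith(suffix):
        return edge_label[: -len(suffix)]
    return None


for label in ("OA-MOD", "PRED-MOD", "V-MOD", "MOD", "ON"):
    print(label, "->", modified_function(label))
```

Under these assumptions OA-MOD is resolved to OA, while the ambiguous modifier label MOD and ordinary functions such as ON are left unresolved.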

[Tree diagram (refmod): Die werden dort künftig seliger schlummern denn je.
Tokens (word [POS morphology]): Die [PDS np*], werden [VAFIN 3pis], dort [ADV], künftig [ADJD], seliger [ADJD], schlummern [VVINF], denn [KOKOM], je [ADV], . [$.]
Nodes: SIMPX with the fields VF, LK, MF, VC, NF; phrase nodes NX (ON), VXFIN (HD), ADVX (V-MOD), ADJX (MOD), ADJX (V-MOD), VXINF (OV), ADVX (MOD-MOD); secondary edge label: refmod.]


[Tree diagram (refint): Dieser hat Auswirkungen auf die Bereitschaft, Therapieangebote anzunehmen.
Tokens (word [POS morphology]): Dieser [PDS nsm], hat [VAFIN 3sis], Auswirkungen [NN apf], auf [APPR a], die [ART asf], Bereitschaft [NN asf], , [$,], Therapieangebote [NN apn], anzunehmen [VVIZU], . [$.]
Node labels: SIMPX with the fields VF, LK, MF, NF; embedded SIMPX (MOD) with the fields MF, VC; NX (ON), VXFIN (HD), NX (OA), PX, NX (OA), VXINF (HD); secondary edge label: refint.]

[Tree diagram (refcontr): All das versuche man den Angehörigen zu schicken.
Tokens (word [POS morphology]): All [PIDAT ***], das [PDS asn], versuche [VVFIN 3sks], man [PIS ns*], den [ART dp*], Angehörigen [NN dp*], zu [PTKZU], schicken [VVINF], . [$.]
Node labels: SIMPX with the fields VF, LK, MF, NF; embedded SIMPX (OS) with the fields MF, VC; NX (OA), VXFIN (HD), NX (ON), NX (OD), VXINF (HD); secondary edge label: refcontr.]

3.4.7 Empty Categories

In general, an empty category analysis, e.g. for phrases without heads, is avoided in the TüBa-D/Z treebank.

Empty Edge Labels

Specifiers, prepositions (see footnote 5), complementizers, discourse markers, KOORD and PARORD constituents, conjunctions, and unambiguous modifiers (that are attached to phrases immediately rather than to topological fields) are not labelled with grammatical functions. Furthermore, the edges below the SIMPX node are empty. They are not labelled in order to speed up annotation where the information is unnecessary or self-evident.

Furthermore, empty edge labels are used in elliptical phrases, e.g. noun phrases only consisting of an article and an attributive adjective (cf. 6.6).

5 In order to facilitate the identification of dependencies between verbs and their nominal complements and adjuncts and in keeping with basic assumptions in Dependency Grammar, the annotated head of a prepositional phrase is the NX (or complement) rather than the preposition itself. Therefore, prepositions carry no edge label.


3.5 Lemma Information

The trees in the TüBa-D/Z are enriched with lemma information for all tokens. Morphology and lemmatization are performed by an automatic pre-tagging step, which makes use of the existing syntactic annotation of the treebank. The output of this pre-tagging is manually disambiguated and corrected. For a detailed description of the pre-tagging system see Versley et al. (2010); for an overview of lemmatization problems see Schnorr (1991).

3.5.1 Lemmatization Rules for POS-Tags

Table 3.10 describes the lemmatization rules applied to open-class words (e.g. nouns, adjectives) and closed-class words (e.g. determiners, pronouns) in the TüBa-D/Z with respect to the STTS POS-tag of the token.

Table 3.10: Lemmatization rules for POS-tags

POS-tag
    lemmatization rule
        examples

ADJA, ADJD
    base form: mapping to the predicative form
        (der) hohe (Anteil) → hoch
        (das ist) gut → gut
    exceptions
        besondere (Sorgfalt) → besonder
        andere (Menschen) → ander
    comparative: mapping to the comparative form
        attributive adjective: bessere (Chancen) → besser
        adverbial adjective: (es dauert) länger → länger
        predicative adjective: (es sei) besser → besser
    superlative: stem without ending
        attributive adjective: (der) schnellste (Schwimmer) → schnellst
    deverbal adjective: mapping to the predicative form
        gespannt, zerstritten, brennend
ADV
    invariant form
        schon, bald, doch
APPR, APPO, APZR
    invariant form
        APPR: in
        APPO: zufolge
        APZR: an
APPRART
    reduced to preposition
        im → in
        zur → zu
ART
    base form: nom/sg
    definite article (sg/pl); lemmata: der, die, das
        masc.: der, des, dem, den, die → der
        fem.: die, der, den → die
        neut.: das, des, dem, die, der, den → das
    indefinite article (sg); lemmata: ein, eine; plural: zero article
        masc.: ein, eines, einem, einen → ein
        fem.: eine, einer → eine
        neut.: ein, eines, einem → ein
CARD
    invariant form
        zwei, 2, 10.000
ITJ
    invariant form
        hallo, aha, hey
KOUI
    invariant form
        um
KOUS
    invariant form
        weil
KON
    invariant form
        und
KOKOM
    invariant form
        als, wie
NE
    base form: nom/sg
        Hans → Hans
        Bremerhavens → Bremerhaven
NN
    base form: nom/sg
        Schränke → Schrank
        Ideen → Idee
    gender remains unchanged
        Lehrerin → Lehrerin
        Kaufmann → Kaufmann
    deadjectival nouns: lemmatized to the form of the strong declension of adjectives in German
        masc.: (der) Schöne → Schöner
        fem.: (die) Schöne → Schöne
        neutr.: (das) Schöne → Schönes
    deverbal nouns
        (das) Reisen → Reisen
    plural nouns: base form nom/sg if a singular form exists
        Daten → Datum
        Medien → Medium
    plural nouns: base form nom/pl if a singular form does not exist (pluralia tantum)
        Leuten → Leute
    homonyms and polysemes keep their base form
        Schlösser → Schloß
        Flügeln → Flügel
    compounds are not split
        EU-Kommissar → EU-Kommissar
        Senioren-Bahncard → Senioren-Bahncard
PDS, PDAT
    base form: nom/sg; one lemma each for masc., fem., neut.
        masc.: dieser/dieses/diesem/diesen → dieser
        fem.: jene/jener → jene
        neut.: das/dem/den → das
PIS, PIAT, PIDAT
    base form: nom/sg; one lemma each for masc., fem., neut. or one general lemma
        masc.: keiner/keinen/keinem → keiner
        fem.: letztere/letzteren → letztere
        neutr.: jedes/jedem → jedes
        beiden → beide
        allen → alle
        man → man
PPER
    base form: nom/sg
        ich/meiner/mir/mich → ich
        du/deiner/dir/dich → du
        er/seiner/ihm/ihn → er
        sie/ihrer/ihr/sie → sie
        es/seiner/ihm/es → es
        wir/unser/uns/uns → wir
        ihr/euer/euch/euch → ihr
        sie/ihrer/ihnen/sie → sie
    polite form
        Sie/Ihrer/Ihnen/Sie → Sie
PPOSS
    base form: nom/sg; lemma according to the gender of the possession
        masc.: meiner/meiner/meinem/meinen → meiner
        fem.: meine/meiner/meiner/meine → meine
        neutr.: mein(e)s/meine(e)s/meinem/mein(e)s → mein(e)s
PPOSAT
    base form: nom/sg; lemma according to the gender of the possession
        masc.: mein/meines/meinem/meinen → mein
        fem.: meine/meiner/meiner/meine → meine
        neutr.: mein/meines/meinem/mein → mein
PRELS
    base form: nom/sg
        der, dessen, den, dem → der
        die, derer, der, die → die
        das, dessen, dem, das → das
PRELAT
    base form: nom/sg
        masc./neut.: dessen → dessen
        fem.: deren → deren
PRF
    reflexive pronouns are labelled as #refl
        mir/mich → #refl
        dir/dich → #refl
        sich → #refl
        uns → #refl
        euch → #refl
PWS, PWAT
    base form: nom/sg
        wer, wessen, wem, wen → wer
        masc.: welcher, welchen, welchem, welchen → welcher
        fem.: welche, welcher, welcher, welche → welche
        neut.: welches, welchen, welchem, welches → welches
    invariant form
        was
PWAV
    invariant form
        wo, wie, warum, womit, worauf
PROP
    invariant form
        damit, davor, seitdem, stattdessen
PTKA, PTKANT, PTKNEG, PTKZU
    invariant form
        PTKA: am
        PTKANT: ja
        PTKNEG: nicht
        PTKZU: zu
PTKVZ
    no lemma
        ein → --
    for verb particles, the lemma of the verb is represented as particle#verb (see Table 3.11)
        warf → ein#werfen (er warf etwas ein)
TRUNC
    lemma is the complete word suffixed with %n, %v, %a, %c, %p for the respective part of speech
        In- und Ausland → Inland%n und Ausland
        hin- und herzieht → hinziehen%v und herziehen
VVFIN, VVIMP, VVINF, VVIZU, VVPP
    base form: infinitive
        VVFIN: ging → gehen
        VVIMP: sprich → sprechen
        VVINF: zahlen → zahlen
        VVIZU: aufzufallen → auf#fallen
        VVPP: getroffen → treffen
VAFIN, VAIMP, VAINF, VAPP, VMFIN, VMINF, VMPP
    See Table 3.11 for auxiliary and passive use (%aux, %passiv).
        VAFIN: ist → sein%aux, ist → sein
        VAIMP: seid → sein%aux, seid → sein
        VAINF: haben → haben%aux, haben → haben
        VAPP: gewesen → sein%aux, gewesen → sein
        VMFIN: will → wollen%aux, will → wollen
        VMINF: möge → mögen%aux, möge → mögen
        VMPP: gekonnt → können
FM
    foreign language material is invariant
        ad hoc, goes, areas
XY
    non-words are invariant, lemmata in lower-case letters
        18a → 18a
        H2O → h2o
$, $. $(
    invariant form
        , . ? ... (


3.5.2 Lemmatization Rules for Specific Linguistic Phenomena

Table 3.11 describes the lemmatization rules applied to specific linguistic phenomena in the TüBa-D/Z.

Table 3.11: Lemmatization rules for specific linguistic phenomena

phenomenon
    lemmatization rule
        examples

abbreviation or acronym
    abbreviations and acronyms are invariant
        z. B., usw., Dr.
        TSV, FDP
spelling errors
    mapping to the correct spelling of the lemma
        wolte → wollte
        Durchamtmen → Durchatmen
multiword term
    one lemma for each multiword token
        New York → New York
        Orang Utan → Orang Utan
dialect
    the base form of dialect words is the respective standard German word with an underscore appended
        es jutt → es geben_
        snakt → sprechen_
        Dag → Tag_
contraction of words
    mapping to a complex lemma with an underscore between the base forms of the contraction parts
        Glaubense → glauben_Sie
        isser → sein_er
    exception: APPRART reduced to the preposition
        zur → zu
non-standard use of lower-case and upper-case letters
    mapping to the correct writing of the lemma based on German orthography
        seele → Seele
        KOMMENTAR → Kommentar
    polite form with upper-case letters
        Sie → Sie
spelling variations
    are annotated as distinct lemmata
        fantastische → fantastisch
        phantastische → phantastisch
ambiguous plural forms
    for plurals unmarked for gender, all possible lemmata are listed separated by a diacritic '|', e.g. lemmata of deadjectival plural nouns or plural pronouns with underspecified gender
        Jugendliche → Jugendlicher|Jugendliche|Jugendliches
        die (PDS np*) → der|die|das
        denen (PRELS dp*) → der|die|das
auxiliaries: sein, haben, werden
    the lemma is suffixed with the tag %aux if used as auxiliary
        ist → sein%aux
modal verbs: müssen, sollen, können, wollen, dürfen, mögen
    the lemma is suffixed with the tag %aux if used as auxiliary
        darf → dürfen%aux
auxiliaries and modal verbs used as main verbs
    base form: infinitive without %aux suffix
        ist → sein (... es ist höchste Zeit ...)
        kann → können (... wer kann das überhaupt noch ...)
passive werden
    the lemma is suffixed with the tag %passiv
        wird (geehrt) → werden%passiv
verbs with a separable prefix
    the verb lemma is denoted as prefix#verb, whether the prefix is separated or not (see Table 3.10 for verb particles (PTKVZ))
        stellen ... ein → ein#stellen
        eingestellt → ein#stellen

The following tree diagram illustrates the TüBa-D/Z lemma annotation, which appears below the morphological feature combinations and is marked as "LM=lemma" for each token of the sentence:

Aber es gäbe intelligente Lösungen, die kein Geld kosten.


[Tree diagram with lemma annotation. Tokens (word, POS, morphology, lemma):
Aber, KON, --, LM=aber
es, PPER, nsn3, LM=es
gäbe, VVFIN, 3skt, LM=geben
intelligente, ADJA, apf, LM=intelligent
Lösungen, NN, apf, LM=Lösung
",", $,, --, LM=,
die, PRELS, np*, LM=der|die|das
kein, PIAT, asn, LM=kein
Geld, NN, asn, LM=Geld
kosten, VVFIN, 3pis, LM=kosten
".", $., --, LM=.
Node labels: SIMPX with KOORD and the fields VF, LK, MF, NF; R-SIMPX (OA-MOD) with the fields C, MF, VC; NX (ON), VXFIN (HD), NX (OA), ADJX.]

pronouns, nouns, determiner (base form nom/sg): LM=es, LM=Lösung, LM=kein, LM=Geld, LM=der|die|das

verbs (base form infinitive): LM=geben, LM=kosten

adjective (base form predicate): LM=intelligent

conjunction, punctuation marks (invariant): LM=aber, LM=, (comma), LM=. (full stop)
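The lemma conventions of Tables 3.10 and 3.11 can be unpacked programmatically. The sketch below is an illustration under stated assumptions, not the treebank's own tooling: the names Lemma and parse_lemma are hypothetical, only the %aux and %passiv suffixes, the prefix#verb notation, the #refl label and the '|' separator are handled, and the TRUNC suffixes (%n, %v, %a, %c, %p) are deliberately left untouched.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Lemma:
    alternatives: List[str]   # several entries for gender-ambiguous plural forms
    particle: Optional[str]   # separable prefix of a prefix#verb lemma
    marker: Optional[str]     # "aux" or "passiv", if the lemma carries that suffix


def parse_lemma(value: str) -> Lemma:
    """Split a lemma string (the part after 'LM=') into its components."""
    marker = None
    for suffix in ("%aux", "%passiv"):
        if value.endswith(suffix):
            value = value[: -len(suffix)]
            marker = suffix[1:]
            break
    particle = None
    # '#refl' is the reflexive label, not a prefix#verb lemma.
    if "#" in value and value != "#refl":
        particle, value = value.split("#", 1)
    return Lemma(alternatives=value.split("|"), particle=particle, marker=marker)


for example in ("sein%aux", "werden%passiv", "ein#stellen", "der|die|das", "#refl"):
    print(example, "->", parse_lemma(example))
```

Keeping the alternatives as a list means that an unambiguous lemma such as Geld and an underspecified one such as der|die|das can be handled by the same code path.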


Chapter 4

The Annotation of the Internal Structure of Phrases

4.1 Premodification and Postmodification in Phrases

The annotation of phrases also follows the flat clustering principle in order to keep the number of hierarchy levels in a syntactic structure as small as possible. As will be shown in the following sections, phrases may include adjectival or nominal premodifiers and/or postmodifiers of any syntactic category. Both kinds of modifiers are in principle projected to their phrase levels. Since the modification scope of premodifiers is unambiguous, they are directly attached to the head of the phrase which they modify. By contrast, postmodifiers are always attached on a higher level to preserve ambiguity. This decision, referred to in 3.3 as the high attachment principle, was made to avoid the problematic distinction whether a postmodifier is a free adjunct or a complement of the modified phrase. The attachment strategy for premodifiers and postmodifiers is applied to all categories of phrases.

4.2 Noun Phrases

A simple noun phrase (NX) consists of a head noun (noun, proper noun, or pronoun), (optionally) a determiner and (optionally) an adjectival or a nominal premodifier of any complexity preceding the head noun. A complex noun phrase is a simple noun phrase with a postmodifier of any syntactic category and complexity.

4.2.1 Noun Phrases without Modifiers

Simple noun phrases without modifiers are single nouns, proper nouns, pronouns or proper nouns consisting of more than one NE. All of them are directly projected to their phrase level. While single nouns, proper nouns and pronouns carry the edge label HD, the NE-tagged tokens of a complex proper noun are attached on the same level without head information:


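[Tree diagram: Spendengeld [NN asf], attached with edge label HD to NX.]

[Tree diagram: Hamburg [NE dsn], attached with edge label HD to NX.]

[Tree diagram: Ute [NE nsf], Wedemeier [NE nsf], both attached with empty edge labels to one NX.]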

If proper nouns include other parts of speech than NEs, these parts are tagged according to their distribution. Therefore, proper nouns with a preposition include a prepositional phrase.

[Tree diagram: Ole von Beust
Tokens (word [POS morphology]): Ole [NE nsm], von [APPR d], Beust [NE dsm]
Node labels: NX, PX, NX; edge labels: HD and empty edge labels.]

4.2.2 Prenominal Modification

In a simple noun phrase, both the determiner and the head noun are directly attached to NX on the same level, so that the head noun carries the edge label HD and the edge label of the determiner is empty.
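A small data structure makes the role of the edge labels concrete. The sketch below is only an illustration: the classes Token and Phrase and the function lexical_head are hypothetical, and an empty edge label is modelled as an empty string. It encodes the noun phrase die Auseinandersetzung and follows HD edges down to the lexical head.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Token:
    word: str
    pos: str
    edge: str = ""          # "" stands for an empty edge label


@dataclass
class Phrase:
    category: str
    children: List[Union["Phrase", Token]] = field(default_factory=list)
    edge: str = ""


def lexical_head(node: Union[Phrase, Token]) -> Optional[Token]:
    """Follow HD edges downwards; return None if a phrase has no HD child."""
    if isinstance(node, Token):
        return node
    for child in node.children:
        if child.edge == "HD":
            return lexical_head(child)
    return None


# Determiner with empty edge label, head noun with edge label HD.
noun_phrase = Phrase("NX", [Token("die", "ART"), Token("Auseinandersetzung", "NN", edge="HD")])
print(lexical_head(noun_phrase).word)
```

A complex proper noun such as Ute Wedemeier, whose NE tokens are attached without head information, has no HD child, so lexical_head returns None for it in this model.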


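[Tree diagram: die [ART nsf] with empty edge label and Auseinandersetzung [NN nsf] with edge label HD, both attached to NX.]

[Tree diagram: jede [PIDAT nsf] with empty edge label and Spur [NN nsf] with edge label HD, both attached to NX.]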

Since prenominal modifiers are directly attached to the head noun on the same level, their edge labels are empty (whereas the edge labels of modifiers that are attached to topological fields are non-empty (cf. 8.4)). Prenominal modifiers are either attributive adjectives or preceding genitive phrases:

[Tree diagram: ein externer Wirtschaftsprüfer
Tokens (word [POS morphology]): ein [ART nsm], externer [ADJA nsm], Wirtschaftsprüfer [NN nsm]
Node labels: ADJX, NX; edge labels: HD and empty edge labels.]

[Tree diagram: die zu verhandelnden Taten
Tokens (word [POS morphology]): die [ART npf], zu [PTKZU], verhandelnden [ADJA npf], Taten [NN npf]
Node labels: ADJX, NX; edge labels: HD and empty edge labels.]

[Tree diagram: Bremens Gesundheitssenatorin
Tokens (word [POS morphology]): Bremens [NE gsn], Gesundheitssenatorin [NN nsf]
Node labels: NX (for Bremens), NX; edge labels: HD and an empty edge label.]

If there is a PIDAT preceding the article it is directly attached to the noun phrase.

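[Tree diagram: all die historischen Fehler
Tokens (word [POS morphology]): all [PIDAT ***], die [ART apm], historischen [ADJA apm], Fehler [NN apm]
Node labels: ADJX, NX; edge labels: HD and empty edge labels.]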

If a PIDAT follows the article in adjective position, it is projected to its phrase level (ADJX) with possible premodifiers and then directly attached to the noun phrase like an attributive adjective.

[Tree diagram: Die meisten Benutzer
Tokens (word [POS morphology]): Die [ART npm], meisten [PIDAT npm], Benutzer [NN npm]
Node labels: ADJX, NX; edge labels: HD and empty edge labels.]

[Tree diagram: die in Deutschland ohnehin wenigen Gen-Food-Produzenten
Tokens (word [POS morphology]): die [ART npm], in [APPR d], Deutschland [NE dsn], ohnehin [ADV], wenigen [PIDAT npm], Gen-Food-Produzenten [NN npm]
Node labels: NX, PX, ADVX, ADJX, NX; edge labels: HD and empty edge labels.]

NX

If there is more than one prenominal modifier, the one immediately to the left of the noun modifies the following noun, the one to the left of that modifier modifies both the modifier and the noun, and so on. All of these modifiers are attached to the head noun on the same level, which yields a rather flat noun phrase structure. This strategy is justified by the fact that these modifiers have a scope of modification beyond the adjectival phrase: in coordinated noun phrases like insgesamt 12.000 Studienplätze und 15.000 Lehrstellen, for example, the adverb insgesamt modifies 12.000 Studienplätze as well as 15.000 Lehrstellen.

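[Tree diagram: lieber knieartiger Leser
Tokens (word [POS morphology]): lieber [ADJA nsm], knieartiger [ADJA nsm], Leser [NN nsm]
Node labels: ADJX (twice), NX; edge labels: HD and empty edge labels.]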

In case of complex head nouns, e.g. complex (proper) nouns consisting of two nominal parts or coordinated head nouns (cf. 6.5.5), the complex noun or the coordination (cf. 6.5) is first annotated with its own internal dependency structure. Afterwards, the determiner and possible premodifying adjectival phrases are attached on a higher level.

[Tree diagram: Die "debis Systemhaus GmbH"
Tokens (word [POS morphology]): Die [ART nsf], " [$(], debis [NE nsf], Systemhaus [NE nsn], GmbH [NN nsf], " [$(]
Node labels: NX (three levels); edge labels: HD and empty edge labels.]

[Tree diagram: der Heinrich Böll-Stiftung
Tokens (word [POS morphology]): der [ART gsf], Heinrich [NE gsm], Böll-Stiftung [NN gsf]
Node labels: NX (three levels); edge labels: HD and empty edge labels.]

[Tree diagram. Tokens (word [POS morphology]): " [$(], Solidarität [NN nsf], mit [APPR d], Miloevic [NE dsm; comment line: Milosevic], " [$(], [token lost] [$(], Parolen [NN dpf]
Node labels: NX, PX; edge labels: HD and empty edge labels.]

[Tree diagram: ihren Sänger und Gründer
Tokens (word [POS morphology]): ihren [PPOSAT asm], Sänger [NN asm], und [KON], Gründer [NN asm]
Node labels: NX; edge labels: HD, KONJ and empty edge labels.]

4.2.3 Postnominal Modification

Whereas prenominal modifiers are always directly attached to the head noun on the same level, postnominal modifiers are attached to the head noun on a higher level. Postnominal modifiers are also always first projected to the phrase level before they are attached to the head noun on a higher level. Phrase-internal postmodifiers can be of any phrasal category. The following tree structures show a prepositional phrase (PX) and a genitive phrase (NX) as postmodifiers. See section 6.4 for the analysis of relative clauses.

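[Tree diagram: Glück im Netz
Tokens (word [POS morphology]): Glück [NN nsn], im [APPRART dsn], Netz [NN dsn]
Node labels: NX, PX; edge label: HD.]

[Tree diagram: Die Mitteilung des Bremer Senats
Tokens (word [POS morphology]): Die [ART asf], Mitteilung [NN asf], des [ART gsm], Bremer [ADJA ***], Senats [NN gsm]
Node labels: NX, ADJX; edge labels: HD and empty edge labels.]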

In case a noun has more than one postmodifier, these modifiers usually show a hierarchical structure: the first modifier modifies the head noun, the second modifier modifies the complete preceding noun phrase structure, and so on.

[Tree diagram: die guten Beziehungen Bonns zu Moskau
Tokens (word [POS morphology]): die [ART apf], guten [ADJA apf], Beziehungen [NN apf], Bonns [NE gsn], zu [APPR d], Moskau [NE dsn]
Node labels: ADJX, NX, PX; edge labels: HD and empty edge labels.]

Attributes of degree and quantity nouns are also defined as postnominal modifiers:

[Tree diagram: eine Kiste Sprengstoff
Tokens (word [POS morphology]): eine [ART asf], Kiste [NN asf], Sprengstoff [NN asm]
Node labels: NX; edge labels: HD and an empty edge label.]

Cardinal numbers appear either as quantity nouns or as premodifying adjectival attributes, e.g. the cardinal number 1,000,000 can also be expressed by the quantity noun eine Million. Therefore, we have to distinguish the following two ways of annotation:

[Tree diagram: Der Etat von 3,5 Millionen Mark steht.
Tokens (word [POS morphology]): Der [ART nsm], Etat [NN nsm], von [APPR d], 3,5 [CARD], Millionen [NN dpf], Mark [NN dpf], steht [VVFIN 3sis], . [$.]
Node labels: SIMPX with the fields VF, LK; NX (ON), VXFIN (HD), PX, ADJX, NX; edge labels: HD, ON and empty edge labels.]

[Tree diagram: "Das Kilo Weißbrot kostete zuletzt 5 Mark", empört sie sich.
Tokens (word [POS morphology]): " [$(], Das [ART nsn], Kilo [NN nsn], Weißbrot [NN nsn], kostete [VVFIN 3sit], zuletzt [ADV], 5 [CARD], Mark [NN apf], " [$(], , [$,], empört [VVFIN 3sis], sie [PPER nsf3], sich [PRF as*3], . [$.]
Node labels: SIMPX (twice, the embedded one labelled OS), VF, LK, MF; NX, VXFIN, ADJX, ADVX; edge labels: HD, ON, OA, MOD, OS and empty edge labels.]

For nominal postmodifiers apart from genitive phrases the same attachment rule is applied. This kind of postmodifier, which may also appear in brackets, e.g. Heinz Schleußer (SPD), is semantically closely related to the preceding head noun phrase. die Arbeiterwohlfahrt Bremen, for instance, means die Arbeiterwohlfahrt which is located in Bremen, but does not mean die Arbeiterwohlfahrt which is called Bremen. Hence, these postmodifiers have to be distinguished from appositions (cf. 4.2.4) and complex named entities (cf. 4.2.6).

[Tree diagram: Heinz Schleußer (SPD)
Tokens (word [POS morphology]): Heinz [NE nsm], Schleußer [NE nsm], ( [$(], SPD [NE nsf], ) [$(]
Node labels: NX; edge labels: HD and empty edge labels.]

[Tree diagram: die Arbeiterwohlfahrt Bremen
Tokens (word [POS morphology]): die [ART nsf], Arbeiterwohlfahrt [NN nsf], Bremen [NE nsn]
Node labels: NX; edge labels: HD and an empty edge label.]

[Tree diagram: Zentralkrankenhaus Ost
Tokens (word [POS morphology]): Zentralkrankenhaus [NN dsn], Ost [NN dsn]
Node labels: NX; edge label: HD.]

[Tree diagram: Kapitel VII
Tokens (word [POS morphology]): Kapitel [NN dsn], VII [CARD]
Node labels: NX; edge label: HD.]

[Tree diagram: des ICE 884
Tokens (word [POS morphology]): des [ART gsm], ICE [NN gsm], 884 [CARD]
Node labels: NX; edge labels: HD and an empty edge label.]

4.2.4 Appositional Constructions

An apposition is a specific kind of attribute to a noun, which normally agrees in case with this noun and does not change its overall meaning. There is no consensus among grammarians on what exactly is meant by the notion apposition (cf. Eisenberg 1999, 2001). Eisenberg (1999, 2001), for instance, claims that, e.g., Ute Wedemeier die Landesvorsitzende and die Landesvorsitzende Ute Wedemeier are both appositions but it is not clear which part is the apposition and which part is the head noun. The Duden Grammar (1995) distinguishes between loosely constructed appositions (lockere Apposition) (e.g. Ute Wedemeier, die Landesvorsitzende,), which follow the head noun separated by a comma, and tightly constructed appositions (enge Apposition) (e.g. (die) Landesvorsitzende Ute Wedemeier), which precede the head noun (cf. Drosdowski 1995). According to Helbig/Buscha (1998) there is case agreement between loosely constructed appositions and head nouns which are separated by a punctuation mark. By contrast, Engel (1996) thinks that only loosely constructed appositions can be regarded as appositions. He treats tightly constructed appositions as nomen varians or nomen invarians.

Because of these different definitions of the notion of apposition, we do not decide on what is the head noun and what is the apposition. We assume referential identity between the two parts. Loosely constructed appositions as well as tightly constructed appositions are treated as appositional constructions, i.e., the head noun and its apposition form a complex structure which does not give any information about head assignment. Therefore, both parts are first projected to their phrase level and then coordinated on a higher level, each of them labelled as apposition (APP), i.e. as a part of an appositional structure. What is important is the referential identity in meaning. Thus, Nummer 1 is an appositional construction, whereas Seite 1 is a noun phrase with the postmodifier 1. Forms of address for persons and titles, e.g. Herr, Frau, Doktor (Dr.), Professor (Prof.), Bundeskanzler, are also treated as appositional constructions. Here are some examples:

[Tree: Donnerstag (NN, asm) morgen (ADV) , ($,) den (ART, asm) 13. (ADJA, asm) Mai (NN, asm). Two NX constituents, each with edge label APP, under a common NX.]

[Tree: Herr (NN, nsm) Taake (NE, nsm). Two NX constituents, each with edge label APP, under a common NX.]

[Tree: Landesvorsitzende (NN, nsf) Ute (NE, nsf) Wedemeier (NE, nsf). Two NX constituents, each with edge label APP, under a common NX.]

[Tree: Volker (NE, nsm) Tegeler (NE, nsm) , ($,) stellvertretender (ADJA, nsm) Geschäftsführer (NN, nsm) des (ART, gsm) Landesverbandes (NN, gsm). Two NX constituents, each with edge label APP, under a common NX.]

[Tree: die (ART, nsf) Stadt (NN, nsf) Frankfurt (NE, nsn). Two NX constituents, each with edge label APP, under a common NX.]

[Tree: Vorwurf (NN, nsm) Nummer (NN, nsf) 1 (CARD). Nested appositional construction: four NX constituents carry the edge label APP.]

[Tree: Telefon (NN, dsn) 472711 (CARD). Two NX constituents, each with edge label APP, under a common NX.]
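The appositional analysis described above can be sketched in a few lines of Python. This is only an illustrative model, not the treebank's own data format: the `Node` class and the `make_apposition` helper are hypothetical names introduced here, and leaves are represented as plain (word, POS tag, edge label) tuples.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A phrase node; `edge` is the label on the edge to its mother."""
    label: str
    edge: str = "-"
    children: List[object] = field(default_factory=list)


def make_apposition(first: Node, second: Node) -> Node:
    """Join two phrase-level nodes as an appositional construction.

    Both parts receive the edge label APP; neither is marked as head (HD),
    so the structure says nothing about head assignment.
    """
    first.edge = "APP"
    second.edge = "APP"
    return Node("NX", children=[first, second])


# "Herr Taake": each word is first projected to its own NX.
herr = Node("NX", children=[("Herr", "NN", "HD")])
taake = Node("NX", children=[("Taake", "NE", "HD")])
construction = make_apposition(herr, taake)
print(construction.label, [child.edge for child in construction.children])
```

Because the result is itself an NX node, it can be passed to `make_apposition` again, which mirrors the embedded constructions used for a form of address combined with titles.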

If a form of address is combined with one or more titles preceding a name, we annotate an embedded appositional construction:

[Tree: Die (ART, nsf) Dortmunder (ADJA, ***) Psychologin (NN, nsf) Prof. (NN, nsf) Alexa (NE, nsf) Franke (NE, nsf). Embedded appositional construction: four NX constituents carry the edge label APP.]

In the same way, we treat proper nouns which are referentially identical to a preceding proper noun, for example, an actor’s name and role:

[Tree: Andrea (NE, dsf) Spatzek (NE, dsf) / ($() Gabi (NE, dsf) Zenker (NE, dsf). Two NX constituents, each with edge label APP, under a common NX.]

Premodification of the whole appositional construction is attached to an additional NX level.

[Tree: Auch (ADV) Bundesumweltminister (NN, nsm) Jürgen (NE, nsm) Trittin (NE, nsm). The appositional construction (two NX with edge label APP) is the head (HD) of an additional NX level to which the ADVX Auch is attached.]

In some cases, the parts of an appositional construction do not agree in case. These are postnominal titles of books, movies, etc. and translations interspersed in the sentence. For the latter type, we also extend the appositional construction to non-nominal phrases.

[Tree: in (APPR, d) dem (ART, dsm) Film (NN, dsm) " ($() Das (ART, nsn) Verhör (NN, nsn) " ($(). Within the PX, the NX dem Film and the NX Das Verhör each carry the edge label APP.]

[Tree: um (KOUI) die (ART, apm) wildernden (ADJA, apm) Hunde (NN, apm) außer (APPR, g) Gesetzes (NN, gsn) ( ($() " ($() outlaw (FM) " ($() ) ($() zu (PTKZU) stellen (VVINF). SIMPX with the fields C, MF and VC; the PX außer Gesetzes and the FX outlaw each carry the edge label APP.]

4.2.5 Foreign Language Material

Words or parts of a text written in a foreign language, except foreign language proper nouns, are tagged as foreign language material (FM), e.g. hello (FM), no (FM) longer (FM) amused (FM). All parts of foreign language proper nouns are tagged as NE, e.g. Mary (NE), New (NE) York (NE), University (NE) of (NE) Illinois (NE). Single foreign words are projected to a syntactic level with the node label FX, which is a universal label for any syntactic category (phrasal and sentential) in the respective foreign language. More complex parts of a text tagged as FM are attached on the same level without any internal syntactic structure or head assignment. Their mother node is also assigned the label FX, e.g. no longer amused. For foreign language constructions containing a proper noun, the annotation strategy is the following: in a first step, all NEs are projected to the phrase level (NX); in a second step, these phrase nodes together with all FMs are projected to the next higher level with the node label FX. Again, there is no head assignment directly below the FX node, e.g. Mister Gere himself.

[Tree: hello (FM). Edge label DM.]

[Tree: ad (FM) acta (FM). Both tokens are attached directly to FX without head assignment.]

[Tree: das (ART, nsn) no (FM) longer (FM) amused (FM) Kollegium (NN, nsn). The three FM tokens form an FX inside the NX headed by Kollegium.]

[Tree: wie (KOKOM) Mr. (FM) Gere (NE, nsm) himself (FM). Gere is projected to NX; this NX and the FM tokens are attached to FX without head assignment.]
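The two-step projection for foreign language material can be illustrated with a short Python sketch. The function name `project_foreign` and the dictionary layout are assumptions made for this example only; they are not part of the treebank's export format.

```python
def project_foreign(tokens):
    """Build an FX node from (word, pos) pairs tagged FM or NE.

    Step 1: every NE token is projected to its own NX.
    Step 2: these NX nodes and all FM tokens become daughters of FX.
    No daughter of FX receives the head label HD.
    """
    daughters = []
    for word, pos in tokens:
        if pos == "NE":
            leaf = {"word": word, "pos": pos, "edge": "HD"}
            daughters.append({"label": "NX", "edge": "-", "children": [leaf]})
        elif pos == "FM":
            daughters.append({"word": word, "pos": pos, "edge": "-"})
        else:
            raise ValueError(f"unexpected tag {pos!r} for {word!r}")
    return {"label": "FX", "children": daughters}


fx = project_foreign([("Mr.", "FM"), ("Gere", "NE"), ("himself", "FM")])
# Show, for each daughter of FX, its node label (or POS tag for a leaf).
print([d.get("label", d.get("pos")) for d in fx["children"]])
```

A single foreign word such as hello is handled by the same function and simply yields an FX node with one daughter.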

Often, foreign language material is a part of a German syntactic construction and plays the role of a grammatical function. Therefore, the FX node is attached as a constituent to the tree structure. If it is directly attached to a field or a sentence bracket, the edge label above the FX node denotes its grammatical function within the clause, e.g. Kafka goes Kleinkunst (head of the clause).

[Tree: Kafka (NE, nsm) goes (FM) Kleinkunst (NN, asf). SIMPX with the fields VF, LK and MF; Kafka is an NX with edge label ON, goes is an FX with edge label HD, and Kleinkunst is an NX with edge label V-MOD.]

If an FX or a single FM is the head of a phrase which can be identified as a German phrase, e.g. a noun phrase identified by an article and/or an adjective, it is projected to the specific phrasal category, e.g. NX instead of FX in constructions like ein piece of art or ihrer nordamerikanischen Brothers.

[Tree: Oder (KON) aber (ADV) eine (ART, nsf) Skulptur (NN, nsf) , ($,) ein (ART, nsn) Raumelement (NN, nsn) , ($,) ein (ART, nsn) piece (FM) of (FM) art (FM) . ($.). Three NX conjuncts with edge label KONJ; in the last conjunct, the FX piece of art is the head (HD) of the NX.]

[Tree: ihrer (PPOSAT, gp*) nordamerikanischen (ADJA, gp*) Brothers (FM). NX with the ADJX nordamerikanischen and the FM token Brothers as head.]

If FX is modified by a postmodifier, the mother node of the complex phrase is also FX, which again may be preceded by another phrase, e.g. Unter der Überschrift ’user als looser’.

[Tree: Unter (APPR, d) der (ART, dsf) Überschrift (NN, dsf) ’ ($() user (FM) als (KOKOM) looser (FM) ’ ($(). Within the PX, the NX der Überschrift and the FX user als looser each carry the edge label APP.]

4.2.6 Named Entity Annotation

Proper nouns denote individual living beings, objects, titles, etc. which exist only once as entities with their own specific properties. The distinction between proper nouns and nouns is not always clear-cut, since proper nouns can also become nouns: Opel as the company is a proper noun, POS-tagged as Opel (NE), whereas Opel as the car is a noun, POS-tagged as Opel (NN).

In addition to the categories of proper nouns listed in the STTS guidelines (e.g. first and last names of persons, names of companies, and geographical names), we also define names of streets and places (e.g. Feldstraße), individual names of institutions (e.g. Max-Planck-Institut, Pergamonmuseum), events (e.g. Märzrevolution), and titles of books, movies, etc. as specific categories of proper nouns. In contrast to the categories defined in the STTS, our additional categories are not POS-tagged as ‘NE’, e.g. Alexanderplatz (NN), heute (ADV).

Complex proper nouns forming a syntagma as well as titles, names of historical events, institutions, and so on, are POS-tagged according to their distribution (e.g. der (ART) Potsdamer (ADJA) Platz (NN), Auf (APPR) die (ART) stürmische (ADJA) Art (NN), Schlaflos (ADJD) in (APPR) Seattle (NE)). On the syntactic level, we define all kinds of proper nouns as named entities.

In our enhanced version of the TüBa-D/Z treebank, named entity information is defined by semantic classes: each named entity is assigned one of the following five semantic classes (cf. 3.4.2):

1. person (PER)

2. organisation (ORG)

3. location (LOC)

4. geopolitical entity (GPE)

5. other (OTH)


Table 4.1 lists the commonly occurring semantic subclasses for named entities in the TüBa-D/Z with examples:

Table 4.1: Semantic Classes and Subclasses for Named Entities

Semantic Class | Common Semantic Subclasses | Examples
PER | persons | Hans Winkler
PER | surnames | (Familie) Feuerstein
PER | names of animals (personified) | (Schweinchen) Babe
ORG | organisations | Nato, EU
ORG | companies | Microsoft, Bertelsmann
ORG | institutes | Institut für chinesische Medizin
ORG | museums | Pergamonmuseum
ORG | newspapers, journals | Süddeutsche Zeitung, Der Spiegel
ORG | clubs | VfB Stuttgart
ORG | theaters, cinemas | Metropol-Theater, CinemaxX
ORG | universities | Freie Universität
ORG | TV and radio stations | Arte, Radio Bremen
ORG | restaurants, hotels | Sassella, Adlon
ORG | forces | Blauhelme
ORG | fashion labels | Chanel
ORG | sporting events | Olympische Spiele, Wimbledon
ORG | bands | Beatles, Die Fantastischen Vier
LOC | districts | Schöneberg
LOC | sights, churches | Brandenburger Tor, Johanniskirche
LOC | planets | Mars
LOC | geographical areas | Königsheide
LOC | streets, places | Sögestraße, Alexanderplatz
LOC | mountains, lakes | Alpen, Viktoriasee
LOC | continents | Europa, Asien
GPE | countries, states (incl. historical) | Frankreich, Hessen, Assyrien
GPE | cities (incl. historical) | Berlin, Babylon
OTH | operating systems | DOS
OTH | titles of books, movies, etc. | Faust, Schlaflos in Seattle
OTH | mottoes, slogans | Zwischen Himmel und Erde
OTH | wars | Zweiter Weltkrieg

In order to annotate the semantic classes, syntactic-semantic node labels of the pattern syntactic category=semantic class (e.g. NX=GPE) are defined for the mother node of named entities (see Table 3.9). The syntactic-semantic nodes indicate that the structure below represents a (complex) named entity of a certain syntactic category belonging to one of the five semantic classes (cf. 3.4.2).
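For readers who process the node labels programmatically, the label pattern can be taken apart as sketched below. The helper name `split_label` is hypothetical; the sketch only assumes that the syntactic category and the semantic class are separated by a single equals sign, as in NX=GPE or ADJX=OTH.

```python
SEMANTIC_CLASSES = {"PER", "ORG", "LOC", "GPE", "OTH"}


def split_label(label):
    """Split a node label into (syntactic category, semantic class).

    Plain labels such as "PX" yield None as the semantic class.
    """
    category, separator, semantic_class = label.partition("=")
    if not separator:
        return category, None
    if semantic_class not in SEMANTIC_CLASSES:
        raise ValueError(f"unknown semantic class in label {label!r}")
    return category, semantic_class


for example in ("NX=GPE", "ADJX=OTH", "PX"):
    print(example, "->", split_label(example))
```

Rejecting unknown classes is a design choice of this sketch; it makes truncated or mistyped labels visible instead of silently passing them on.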

The former node label ’EN-ADD’ and the secondary edge label ’EN’ are deleted.


Our annotation strategy for named entities is shown in the following tree examples:

Named entities may consist of one or more lexical elements tagged as NE. In the case of a single NE, this NE is projected to its phrase level, carrying the respective syntactic-semantic node label (Gütersloh (NX=GPE)). Named entities consisting of two or more NEs are attached on the same level. None of them carries a head label, in order to indicate that there is no obvious dependency relation between them (St. Pauli (NX=LOC)).

A named entity may contain postmodifiers which are named entities themselves, like St. Pauli (NX=LOC) in [[FC (NX)] [St. Pauli (NX=LOC)]] (NX=ORG).

Parts which do not belong to a named entity are marked with the edge label ‘-NE’, as in [[den (-NE) FC (NX)] [Gütersloh (NX=GPE)]] (NX=ORG).

Named entities which are not tagged as NE, e.g. Millerntor (NX=LOC), are also assigned a syntactic-semantic node label.

[Tree: FC (NN, nsm) St. (NE, nsm) Pauli (NE, nsm) siegt (VVFIN, 3sis) 1:0 (CARD) gegen (APPR, a) den (ART, asm) FC (NN, asm) Gütersloh (NE, asn) am (APPRART, dsn) Millerntor (NN, dsn) . ($.). SIMPX with the fields VF, LK and MF; FC St. Pauli is NX=ORG (edge label ON) containing St. Pauli as NX=LOC; den FC Gütersloh is NX=ORG with den labelled -NE and Gütersloh as NX=GPE; Millerntor is NX=LOC.]
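The effect of the edge label -NE can be made concrete with a small traversal that collects only those words that actually belong to a named entity. This is a sketch under assumptions: the nested-dictionary layout and the function name `entity_words` are invented for illustration, and the edge labels other than -NE in the sample tree are filled in for the sake of the example.

```python
def entity_words(node):
    """Collect the words of a named entity, skipping parts labelled -NE."""
    if "word" in node:
        return [node["word"]]
    words = []
    for child in node["children"]:
        if child.get("edge") == "-NE":
            continue  # e.g. an article that is not part of the name
        words.extend(entity_words(child))
    return words


# [[den (-NE) FC (NX)] [Gütersloh (NX=GPE)]] (NX=ORG)
fc_guetersloh = {
    "label": "NX=ORG",
    "children": [
        {"label": "NX", "edge": "HD", "children": [
            {"word": "den", "pos": "ART", "edge": "-NE"},
            {"word": "FC", "pos": "NN", "edge": "HD"},
        ]},
        {"label": "NX=GPE", "edge": "-", "children": [
            {"word": "Gütersloh", "pos": "NE", "edge": "HD"},
        ]},
    ],
}
print(entity_words(fc_guetersloh))
```

Applied to the embedded NX=GPE node alone, the same function returns just the city name, which reflects that a postmodifier inside a named entity can be a named entity itself.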

As mentioned above, all elements of German named entities which consist of a complex syntactic structure, e.g. a phrase or a sentence, are always tagged according to their distribution and annotated with their internal syntactic structure as noun phrases, prepositional phrases, adjectival phrases, clauses, etc., e.g. the movie title (Schlaflos (ADJD) in (APPR) Seattle (NE)) (ADJX=OTH) in the following tree example. If two named entity nodes are coordinated, like Tom Hanks (NX=PER) and Meg Ryan (NX=PER), their mother node is NX, which represents the nominal status of the named entity.

[Tree: " ($() Seit (APPR, d) " ($() Schlaflos (ADJD) in (APPR, d) Seattle (NE, dsn) " ($() gelten (VVFIN, 3pis) Tom (NE, nsm) Hanks (NE, nsm) und (KON) Meg (NE, nsf) Ryan (NE, nsf) als (KOKOM) Dream-Team (NN, nsn) des (ART, gsm) Biedersinns (NN, gsm) . ($.). SIMPX with the fields VF, LK and MF; node labels in the figure include ADJX=OTH (Schlaflos in Seattle), PX=GPE, and NX=PER for Tom Hanks and for Meg Ryan, which are coordinated (KONJ) under an NX with edge label ON.]

If the original form of a named entity (e.g. Zweiter Weltkrieg) is inflected and/or premodified by an article and/or an attributive adjective, as in the following example tree (dem Zweiten Weltkrieg (NX=OTH)), the mother node of the named entity carries the semantic class information, and all parts that do not belong to the named entity are assigned the edge label ‘-NE’.

[Tree: Stimmt (VVFIN, 3sis) , ($,) sie (PPER, nsf3) wurde (VAFIN, 3sit) erst (ADV) nach (APPR, d) dem (ART, dsm) Zweiten (ADJA, dsm) Weltkrieg (NN, dsm) gebaut (VVPP) . ($.). Within the PX, dem Zweiten Weltkrieg is NX=OTH; the article dem carries the edge label -NE.]

If a named entity of a syntactic category other than NX has a premodifier or a postmodifier (both can be named entities themselves), the mother node of the whole constituent is always NX, which represents the nominal status of the named entity:

[Tree: Oliver (NE, gsm) Bukowskis (NE, gsm) derbes (ADJA, nsn) " ($() Bis (APPR, a) Denver (NE, asn) " ($() feierte (VVFIN, 3sit) im (APPRART, dsn) Altonaer (ADJA, ***) Theater (NN, dsn) Premiere (NN, asf). SIMPX with the fields VF, LK and MF; the subject NX (edge label ON) contains Oliver Bukowskis (NX=PER), the ADJX derbes and the title Bis Denver (PX=OTH, with Denver as NX=GPE); Altonaer Theater is NX=ORG.]

[Tree: " ($() Sind (VAFIN, 3pis) Sie (PPER, np*3) Luigi (NE, nsm) ? ($.) " ($() von (APPR, d) Stephan (NE, dsm) Brüggenthies (NE, dsm). The quoted title is a clause carrying a syntactic-semantic label (truncated in the figure residue as SIMPX=OT); Luigi and Stephan Brüggenthies are each NX=PER; the title and the PX von Stephan Brüggenthies are combined under an NX.]

If a named entity is a syntagma with its own internal syntactic structure, i.e. it does not agree with the inflection of another constituent of the sentence (e.g. uninflected titles of books, movies, etc.), all premodifiers are attached on a higher level:

[Tree: Im (APPRART, dsn) Radio (NN, dsn) läuft (VVFIN, 3sis) Vivaldis (NE, gsm) " ($() Vier (CARD) Jahreszeiten (NN, npf) " ($() . ($.). Vivaldis is NX=PER and Vier Jahreszeiten is NX=OTH; the premodifier Vivaldis is attached on a higher NX level, which carries the edge label ON.]

[Tree: Den (ART, asm) Auftakt (NN, asm) macht (VVFIN, 3sis) eine (ART, nsf) Aufführung (NN, nsf) von (APPR, d) Ernst (NE, gsm) Jandls (NE, gsm) Aus (APPR, d) der (ART, dsf) Fremde (NN, dsf) . ($.). Ernst Jandls is NX=PER and Aus der Fremde is PX=OTH; the premodifier Ernst Jandls is attached on a higher NX level above the title.]

If the named entity is inflected, all premodifiers are directly attached to the head noun, labeled as ‘-NE’:

[Tree: in (APPR, d) Heinrich (NE, gsm) Bölls (NE, gsm) Irischem (ADJA, dsn) Tagebuch (NN, dsn). Heinrich Bölls is NX=PER with edge label -NE, attached directly inside the NX=OTH Irischem Tagebuch, which is the head (HD) of the PX.]

Premodifiers of single word named entities are also directly attached to the head noun:

[Tree: Goethes (NE, gsm) Faust (NE, nsm). Goethes is NX=PER with edge label -NE, attached directly inside the NX=OTH headed by Faust.]

If a postmodifier is part of a named entity which is not a title, all premodifiers are directly attached to the NX of the head noun. If the premodifiers are not part of the named entity itself, they are assigned the edge label ‘-NE’.

[Tree: die (ART, asf) neugeschaffene (ADJA, asf) Zweite (ADJA, asf) Liga (NN, asf) Nord (NN, asn). NX=ORG; the article die and the ADJX neugeschaffene carry the edge label -NE.]

[Tree: der (ART, gsf) IG (NN, gsf) Medien (NN, gpn). NX=ORG; the article der carries the edge label -NE.]

In the case of titles including a postmodifier, all premodifiers of the named entity are attached on a higher level:

[Tree: Koch-Raphaels (NE, gsm) " ($() Engel (NN, a*m) der (ART, gsf) Zeit (NN, gsf) " ($(). Koch-Raphaels is NX=PER and Engel der Zeit is NX=OTH; the premodifier is attached on a higher NX level above the title.]

Foreign Language Named Entities

The syntactic annotation of foreign language named entities differs from the annotation of German named entities in the following aspects. According to the STTS guidelines, foreign language proper nouns are tagged as NE, while all other lexical elements of a foreign language are tagged as foreign language material (FM). A foreign language named entity which consists of a proper noun, e.g. the title of a movie, is assigned a syntactic-semantic node label of the category NX (Forrest Gump (NX=OTH)).

[Tree: Forrest (NE, nsm) Gump (NE, nsm). Both tokens are attached to NX=OTH without head assignment.]

If a foreign language named entity consists only of FM-tagged tokens, these tokens are directly attached on the same level without internal syntactic structure. The mother node of the phrase is assigned a syntactic-semantic node label of the category FX, e.g. Knockin’ on Heaven’s Door (FX=OTH).

[Tree: Knockin’ (FM) on (FM) Heaven’s (FM) Door (FM). All four tokens are attached to FX=OTH without head assignment.]

If a foreign language named entity consists of NE as well as FM-tagged tokens, e.g. Shakespeare (NE) in (FM) Love (FM), NE is projected to NX=PER. The NX=PER node and all FM-tagged tokens are attached directly on the same level.

[Tree: Shakespeare (NE, nsm) in (FM) Love (FM). Shakespeare is projected to NX=PER; this node and the two FM tokens are attached to FX=OTH without head assignment.]
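The choice of the mother node for a foreign language named entity depends only on the POS tags of its tokens, which the following sketch makes explicit. The function name `foreign_entity_label` is hypothetical, and the rule for mixed NE/FM sequences is read off the Shakespeare in Love example, where the mother node is FX=OTH.

```python
def foreign_entity_label(pos_tags, semantic_class):
    """Return the syntactic-semantic label for a foreign named entity.

    Only NE tokens            -> NX=<class>  (e.g. Forrest Gump)
    Only FM, or NE mixed w/FM -> FX=<class>  (e.g. Shakespeare in Love)
    """
    if not pos_tags:
        raise ValueError("a named entity needs at least one token")
    if any(tag not in ("NE", "FM") for tag in pos_tags):
        raise ValueError("expected only NE and FM tags")
    category = "NX" if all(tag == "NE" for tag in pos_tags) else "FX"
    return f"{category}={semantic_class}"


print(foreign_entity_label(["NE", "NE"], "OTH"))
print(foreign_entity_label(["FM", "FM", "FM", "FM"], "OTH"))
print(foreign_entity_label(["NE", "FM", "FM"], "OTH"))
```

The semantic class of an embedded NE (PER for Shakespeare) is assigned separately and is not derived by this function.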

4.2.7 Ordinal Numbers

According to their distribution, ordinal numbers occur either as a premodifying attributive adjective (e.g. die dritte (ADJA) Partie) or as a head noun (e.g. er ist der sechste (NN)). In the first case, the premodifier is projected to an adjectival phrase; in the latter case, it is projected to a noun phrase.

[Tree: die (ART, nsf) dritte (ADJA, nsf) Partie (NN, nsf). dritte is projected to ADJX inside the NX headed by Partie.]

[Tree: Aber (KON) das (PDS, nsn) ist (VAFIN, 3sis) ja (ADV) auch (ADV) schon (ADV) die (ART, nsf) Vierte (NN, nsf) . ($.). SIMPX with the fields KOORD, VF, LK and MF; das is an NX with edge label ON, and die Vierte is an NX with edge label PRED.]

4.2.8 Cardinal Numbers

According to their syntactic function (nominal or adjectival), cardinal numbers (CARD) are projected either to NX or to ADJX. If their numerals are written separately or in groups, e.g. numbers of bank accounts, they are attached on the same level, like proper names, without internal head assignment.

[Tree: Jahr (NN, dsn) 2000 (CARD). Two NX constituents, each with edge label APP, under a common NX.]

[Tree: in (APPR, d) allen (PIDAT, dpm) 23 (CARD) Bezirken (NN, dpm). 23 is projected to ADJX inside the NX, which is the head (HD) of the PX.]

[Tree: BLZ (NN, nsf) 500 (CARD) 901 (CARD) 00 (CARD). BLZ and the number are two NX constituents with edge label APP; the three CARD tokens are attached on the same level without head assignment.]

A premodifying cardinal number is nominal if it expresses not a quantity (as in the example above) but a characteristic of the following noun, e.g. the number of a zip code:

[Tree: 13187 (CARD) Berlin (NE, nsn). 13187 is projected to NX inside the NX headed by Berlin.]

Complex time expressions or results of competitions are also treated as cardinal numbers:

[Tree: 20.15 (CARD) Uhr (NN, nsf). 20.15 is projected to ADJX inside the NX headed by Uhr.]

[Tree: mit (APPR, d) 3:0 (CARD). 3:0 is projected to NX, which is the head (HD) of the PX.]

4.2.9 Letters and Non-Words

Letters and non-words are tagged as XY. They are projected to their phrase level and assigned the syntactic category to which they belong in the construction. Signs which represent a lexical element, e.g. the sign for paragraph, are tagged with the respective part-of-speech tag:

[Tree: D-76351 (XY) Linkenheim (NE, dsn). D-76351 is projected to NX inside the NX headed by Linkenheim.]

[Tree: § (NN, nsm) 220 (CARD) a (XY) des (ART, gsn) Strafgesetzbuches (NN, gsn). Nested NX structure in which §, 220 and a are each projected to NX.]

4.2.10 Expletive and Other Uses of es

The pronominal form es functions as an expletive element in German. Three different expletive usages are traditionally distinguished: formal subject or object, correlate of an extraposed clausal argument, and Vorfeld-es (cf. Eisenberg 1999 2001; Pütz 1986). For the sake of completeness, the following list begins with an example of es as a referential personal pronoun.

Personal Pronoun

The pronoun functions as an argument of the verb and refers to some person, object, or event that is salient in the context. Whether es is used as a pronoun can be tested by replacing it with another noun or pronoun (such as das or er/ihn).

In the example tree, es refers to the neuter noun Gästehaus in the preceding sentence: Die italienische Regierung hat die Familie im staatlichen Gästehaus Casino dell’Algardi untergebracht.

[Tree: Es (PPER, nsn3) wird (VAFIN, 3sis) von (APPR, d) Scharfschützen (NN, dpm) bewacht (VVPP) . ($.). SIMPX with the fields VF, LK, MF and VC; Es is an NX with edge label ON, and von Scharfschützen is a PX with edge label FOPP.]

Formal Subject or Object

The formal subject obligatorily occurs with weather verbs, e.g. Es regnet, and with impersonal or agentless constructions such as Es gibt so eine Buchung or Es geht um populäre Unterhaltung. Some verbs optionally permit an expletive subject but also occur with referential subjects, such as Max/Es klopft an der Tür. A formal object is found in constructions like jmd. legt es an auf etw. or jmd. verdirbt es mit jmdm. In all examples mentioned, es functions as a grammatical argument without semantic contribution, i.e. it does not refer to a person, object, or event.

In TüBa-D/Z, formal subjects and objects are treated like referential pronouns and are labelled alike, e.g. with the edge labels ON or OA.

Formal arguments are obligatory and may occur in the Mittelfeld. In case of doubt, a good test is to paraphrase the sentence such that another element occupies the Vorfeld, e.g. Natürlich gibt es so eine Buchung versus *Natürlich gibt so eine Buchung.

[Tree: " ($() Es (PPER, nsn3) gibt (VVFIN, 3sis) so (ADV) eine (ART, asf) Buchung (NN, asf) . ($.) " ($(). SIMPX with the fields VF, LK and MF; Es is an NX with edge label ON, and so eine Buchung is an NX with edge label OA.]

Correlate of an Extraposed Clausal Argument

If a clausal argument is extraposed in the Nachfeld, it is optionally doubled by an expletive es in the Vorfeld or Mittelfeld. The expletive is labelled ON-MOD or OS-MOD depending on the function of the clausal argument.

[Tree: Aber (KON) es (PPER, nsn3) ist (VAFIN, 3sis) übertrieben (ADJD) zu (PTKZU) sagen (VVINF) , ($,) damit (PROP) bekäme (VVFIN, 3skt) die (ART, nsf) FU (NE, nsf) erst (ADV) eine (ART, asf) Identität (NN, asf) . ($.). SIMPX with the fields KOORD, VF, LK, MF and NF; es is an NX with edge label ON-MOD, übertrieben is an ADJX with edge label PRED, and the extraposed clause in the NF carries the edge label ON and itself contains a clause with edge label OS.]

Vorfeld-es

The last type is a purely structural dummy element. It occurs in Vorfeld position only and is not correlated with any argument of the clause. It does not agree with the verb, which becomes evident if there is a plural subject in the Mittelfeld, as illustrated in the example tree below. It is ungrammatical in the Mittelfeld, e.g. *. . . dass es ihn die Völker zahlen. Vorfeld-es is labelled ES to indicate its purely structural function. In the first release of TüBa-D/Z, 12/2003, Vorfeld-es was integrated by means of ON-MOD.

[Tree: es (PPER, ****) zahlen (VVFIN, 3pis) ihn (PPER, asm3) die (ART, npn) Völker (NN, npn) , ($,) deren (PRELAT, gp*) Menschenrechte (NN, npn) angeblich (ADJD) verteidigt (VVPP) werden (VAFIN, 3pis). SIMPX with the fields VF, LK, MF and NF; es is an NX with edge label ES, ihn is an NX with edge label OA, die Völker is an NX with edge label ON, and the relative clause (R-SIMPX) in the NF carries the edge label ON-MOD.]

Table 4.2 summarizes tests and labels for the different uses of es.

Table 4.2: Types of es

test | referential pronoun | formal argument | correlate | Vorfeld-es
substitutable by other pronouns | yes | no | no | no
optional | no | no | yes | no
correlates with clausal argument | no | no | yes | no
ungrammatical in Mittelfeld | no | no | no | yes
edge label | ON, OA, etc. | ON, OA | ON-MOD, OS-MOD | ES
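The tests in Table 4.2 can be read as a small decision procedure. The sketch below encodes them in Python; the function name `classify_es`, the parameter names, and the order in which the tests are applied are choices made for this illustration, not a procedure prescribed by the stylebook.

```python
EDGE_LABELS = {
    "referential pronoun": "ON, OA, etc.",
    "formal argument": "ON, OA",
    "correlate": "ON-MOD, OS-MOD",
    "Vorfeld-es": "ES",
}


def classify_es(substitutable, correlates_with_clause, bad_in_mittelfeld):
    """Classify a use of es from the outcomes of three tests.

    Each argument is a boolean test result as listed in Table 4.2.
    """
    if substitutable:
        return "referential pronoun"
    if correlates_with_clause:
        return "correlate"
    if bad_in_mittelfeld:
        return "Vorfeld-es"
    # Not substitutable, no clausal correlate, fine in the Mittelfeld.
    return "formal argument"


kind = classify_es(substitutable=False,
                   correlates_with_clause=False,
                   bad_in_mittelfeld=True)
print(kind, "->", EDGE_LABELS[kind])
```

The optionality test from the table is not needed to tell the four types apart in this sketch, since only the correlate is optional and it is already identified by its clausal argument.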

Es sei denn

The lexicalized phrase es sei denn, meaning außer, is analysed as a copula construction.

[Tree: " ($() Es (PPER, ****) geschieht (VVFIN, 3sis) hier (ADV) nichts (PIS, ***) , ($,) es (PPER, nsn3) sei (VAFIN, 3sks) denn (ADV) , ($,) ich (PPER, ns*1) tu (VVFIN, 1sis) es (PPER, asn3) . ($.) " ($(). Two coordinated SIMPX conjuncts (KONJ); in the first, Es is an NX with edge label ES; in the second, es is an NX with edge label ON, denn is an ADVX with edge label MOD, and the clause ich tu es in the NF carries the edge label PRED.]

4.3 Determiner Phrases

Certain pronouns serving as determiners in noun phrases may be premodified, for instance, by degree adverbs, as in so viele Ältere, gar kein Schutz, etc.

In the case of so viele Ältere, the premodifying adverb so is attached to the indefinite pronoun viele. Together, they form a determiner phrase (DP), which is attached to the head noun Ältere on the same level:

[Tree: so (ADV) viele (PIDAT, ap*) Ältere (NN, ap*). so is projected to ADVX and forms a DP together with viele (HD); the DP is attached to the NX headed by Ältere.]

[Tree: gar (ADV) kein (PIAT, nsm) Schutz (NN, nsm). gar is projected to ADVX and forms a DP together with kein (HD); the DP is attached to the NX headed by Schutz.]

4.4 Prepositional Phrases

4.4.1 Prepositions

In prepositional phrases, we do not annotate the preposition as the head of the phrase. Instead, the complement within the prepositional phrase is annotated as the head. This decision facilitates the identification of dependencies between verbs and their nominal complements and adjuncts. Moreover, it is in accordance with basic assumptions in Dependency Grammar.

[Tree: in (APPR, d) Südpolen (NN, dsn). Südpolen is projected to NX, which is the head (HD) of the PX.]
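The head assignment in prepositional phrases can be sketched as follows. The dictionary layout and the helper names `build_pp` and `head_of` are assumptions of this example; the only point taken from the guidelines is that the complement, not the preposition, carries the edge label HD.

```python
def build_pp(preposition, complement_words, complement_label="NX"):
    """Build a PX whose complement is the head and whose preposition is not."""
    complement = {
        "label": complement_label,
        "edge": "HD",
        "children": list(complement_words),
    }
    return {
        "label": "PX",
        "children": [{"word": preposition, "edge": "-"}, complement],
    }


def head_of(node):
    """Return the daughter carrying the edge label HD, or None."""
    for child in node["children"]:
        if child.get("edge") == "HD":
            return child
    return None


pp = build_pp("in", ["Südpolen"])
head = head_of(pp)
print(head["label"], head["children"])
```

With this layout, a verb's nominal dependents can be found by following HD edges through the PX, which is the practical benefit the guidelines mention.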

If the preposition is realized as a non-alphabetic sign, e.g. - (bis, gegen), this sign is tagged as APPR and annotated like a preposition:

[Tree: HSV (NE, nsm) BU (NE, nsn) - (APPR, a) Bramfelder (ADJA, ***) SV (NN, asm). The sign is tagged APPR; Bramfelder SV is an NX (with the ADJX Bramfelder) that is the head (HD) of the PX.]

Since pronominal adverbs (PROP) are pronominal forms of a prepositional phrase, they are directly projected to PX:

[Tree: Freudenthal (NE, nsf) wollte (VMFIN, 3sit) gestern (ADV) nichts (PIS, ***) dazu (PROP) sagen (VVINF). SIMPX with the fields VF, LK, MF and VC; dazu is projected directly to PX with edge label FOPP.]

In German, there are so-called Verschmelzungsformen, i.e. merged forms of a preposition and a determiner, e.g. in dem Januar amalgamates to im Januar. The merged form is assigned the part-of-speech tag APPRART (including richer morphological annotation). In terms of syntax, it is annotated like a preposition:

[Tree: Im (APPRART, dsm) Januar (NN, dsm). Januar is projected to NX, which is the head (HD) of the PX.]
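To make the notion of a merged form concrete, the sketch below maps a few common Verschmelzungsformen to the preposition and determiner they amalgamate. The table is a small illustrative selection, and the name `expand_merged` is hypothetical. In the treebank the merged token is kept as a single APPRART token, so this expansion is for explanation only.

```python
# A small, non-exhaustive selection of merged forms.
MERGED_FORMS = {
    "im": ("in", "dem"),
    "am": ("an", "dem"),
    "zum": ("zu", "dem"),
    "zur": ("zu", "der"),
    "vom": ("von", "dem"),
    "beim": ("bei", "dem"),
    "ins": ("in", "das"),
}


def expand_merged(form):
    """Return (preposition, determiner) for a merged form, or None."""
    return MERGED_FORMS.get(form.lower())


for token in ("Im", "zum", "vom", "auf"):
    print(token, "->", expand_merged(token))
```

Tokens that are not in the table, such as the plain preposition auf, yield None rather than raising an error.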

Prepositional phrases expressing intervals, e.g. with von/bis, von/bis zu or zwischen, are annotated in the same way as coordinate structures (cf. 6.5.1), i.e. without head assignment on the level of coordination, since the two phrases are assumed to be conjuncts. If two prepositions follow each other (e.g. bis zum), the result is an embedded structure of a prepositional phrase taking another preposition. In this case, the first preposition does not receive a morphological case feature.

[Tree: vom (APPRART, dsm) 23. (ADJA, dsm) bis (APPR, a) 25. (ADJA, asm) Juli (NN, asm). Two PX conjuncts, each with edge label KONJ, under a common PX.]

[Tree: zwischen (APPR, d) 1993 (CARD) und (KON) 1994 (CARD). The two numbers are NX conjuncts with edge label KONJ under an NX, which is the head (HD) of the PX.]

[Tree: bis (APPR) zum (APPRART, dsn) Jahr (NN, dsn) 2000 (CARD). Jahr and 2000 are two NX with edge label APP; the PX zum Jahr 2000 is the head (HD) of the outer PX introduced by bis.]

As opposed to the case with two prepositions, intervals like dritter bis fünfter November are annotated as a coordinate attributive adjective phrase within a simple noun phrase (cf. 6.5.1).

Premodification of non-isolated prepositional phrases follows the general principle of low attachment.

[PX [ADVX:- Irgendwo/ADV/--] in/APPR/d [NX:HD [NX:HD den/ART/dpm Wäldern/NN/dpm] [NX:- Schaumburgs/NE/gsn]]]

There is one exception to the low attachment principle: isolated phrases in which a preceding adverb does not semantically modify the prepositional phrase. In this case the adverbial phrase is high attached to an additional level of PX.

[PX [ADVX Nun/ADV/--] [PX:HD zum/APPRART/dsn [NX:HD Wetter/NN/dsn]]]

4.4.2 Circumpositions and Postpositions

Circumpositions are treated as ternary branching prepositional phrases. The circumposition on the left hand side is tagged as APPR and the circumposition on the right hand side as APZR:

[PX von/APPR/d [NX:HD sich/PRF/ds*3] aus/APZR/--]

Postpositions are tagged as APPO. The complement of the postposition occurs on the left side and constitutes the head of the prepositional phrase:

[PX [NX:HD Dem/ART/dsn Vernehmen/NN/dsn] nach/APPO/d]
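The tag patterns named in this section can be summarised in a short sketch. The function name and the returned category strings are hypothetical; only the tags APPR, APPRART, APZR and APPO come from the text.

```python
ADPOSITION_TAGS = {"APPR", "APPRART", "APPO", "APZR"}


def classify_px(tags):
    """Classify a PX by the adposition tags among its tokens (tags in surface order)."""
    adpositions = [t for t in tags if t in ADPOSITION_TAGS]
    if adpositions == ["APPR", "APZR"]:
        return "circumposition"  # e.g. von sich aus
    if adpositions == ["APPO"]:
        return "postposition"  # e.g. dem Vernehmen nach
    if adpositions in (["APPR"], ["APPRART"]):
        return "preposition"  # e.g. im Januar
    return "other"  # e.g. embedded PX such as bis zum


print(classify_px(["APPR", "PRF", "APZR"]))
print(classify_px(["ART", "NN", "APPO"]))
```

Sequences with two prepositions, such as bis zum, are deliberately left to the "other" branch, since they form an embedded PX rather than a single flat phrase.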

4.5 Adjectival Phrases

We distinguish between attributive adjectives on the one hand and adverbial or predicative adjectives on the other. Attributive adjectives are tagged as ADJA (die traditionellen Elemente) or CARD (20.15 Uhr), whereas adverbial or predicative adjectives are tagged as ADJD (das Gewicht ist gut; den betriebswirtschaftlich günstigeren Standort) or PWAV (wie wirke ich).

The annotation of superlative and comparative forms is explained in section 7.1 on page 130.

In general, German adjectives are inflected when they are an attribute of a noun. They are not inflected when they function as a predicative adjective or as a premodifier of an adjective or an adverb, or when they belong to a small class of noninflected adjectives, e.g. some archaic forms such as gut Wetter or lieb Mütterlein, or some adjectives denoting a colour (mit einer rosa Karte). All adjectives have to be projected to their phrase level before they are attached to another phrase or to a field.

[NX Die/ART/npn [ADJX:- traditionellen/ADJA/npn] Elemente/NN/npn]

[PX mit/APPR/d [NX:HD einer/ART/dsf [ADJX:- rosa/ADJA/dsf] Karte/NN/dsf]]

[SIMPX [VF [NX:ON Das/ART/nsn Gewicht/NN/nsn]] [LK [VXFIN:HD ist/VAFIN/3sis]] [MF [ADJX:PRED gut/ADJD/--]]]

[NX den/ART/asm [ADJX:- [ADJX:- betriebswirtschaftlich/ADJD/--] günstigeren/ADJA/asm] Standort/NN/asm]

[SIMPX [VF [NX:ON Der/ART/nsm [ADJX:- männliche/ADJA/nsm] Trinker/NN/nsm]] [LK [VXFIN:HD sei/VAFIN/3sks]] [MF [ADJX:V-MOD gut/ADJD/--]] [VC [VXINF:OV erforscht/VVPP/--]]] ,/$,/--

A nominalized adjective like Fassbares may be premodified by an adverbial adjective (ADJD) instead of an attributive adjective (ADJA). Adverbial adjectives never inflect.

[NX [ADJX:- physisch/ADJD/--] Faßbares/NN/asn]

Whenever an adjective is modified by another modifier, the same annotation strategy as for noun phrases is applied, i.e., the modifier is directly attached to the adjectival phrase. The adjectival phrase as a whole is the premodifier of the noun phrase. For instance:

[NX eine/ART/nsf [ADJX:- [ADVX:- sehr/ADV/--] gute/ADJA/nsf] Quelle/NN/nsf]

[SIMPX [KOORD aber/KON/--] [VF [NX:ON der/ART/nsm Text/NN/nsm]] [LK [VXFIN:HD ist/VAFIN/3sis]] [MF [ADJX:PRED [ADVX:- sehr/ADV/--] abstrakt/ADJD/--]]]

The same holds if an adjective selects an argument. Für die Weltgesellschaft is the optional argument of wesentlich. It is directly attached to the adjectival phrase.

[NX Die/ART/nsf [ADJX:- [PX:- für/APPR/a [NX:HD die/ART/asf Weltgesellschaft/NN/asf]] wesentliche/ADJA/nsf] Unterscheidung/NN/nsf]

Premodifying adjectives may occur in a linear order and/or as a coordination (cf. 6.5.1) of attributive adjectives:

[NX [ADJX:- [ADJX:KONJ 28./ADJA/nsm] und/KON/-- [ADJX:KONJ 29./ADJA/nsm]] Mai/NN/nsm]

[NX Die/ART/npf [ADJX:- [ADJX:KONJ großen/ADJA/npf] ,/$,/-- [ADJX:KONJ bekannten/ADJA/npf]] [ADJX:- serbischen/ADJA/npf] Oppositionsparteien/NN/npf]

[NX ihre/PPOSAT/asf [ADJX:- eigene/ADJA/asf] [ADJX:- [ADJX:KONJ demokratische/ADJA/asf] und/KON/-- [ADJX:KONJ freiheitliche/ADJA/asf]] Tradition/NN/asf]

If the premodifying adjective is deverbal, the adjectival phrase can be of any complexity. In this case, the adjectival phrase has its own internal dependency structure. All elements which depend on the adjective are annotated as its premodifiers. Deverbal adjectives are either attributive or adverbial/predicative, and occur as the present participle or past participle form of a verb.

[NX das/ART/asn [ADJX:- [ADJX:- aktuell/ADJD/--] diskutierte/ADJA/asn] Thema/NN/asn]

[NX Die/ART/nsf [ADJX:- [ADVX teilweise/ADV/--] [PX in/APPR/a [NX:HD die/ART/asf Erde/NN/asf]] gebaute/ADJA/nsf] Sporthalle/NN/nsf]

In the following example, postmodification of an adjectival phrase is shown:

[ADJX [ADJX:HD besser/ADJD/--] [ADJX als/KOKOM/-- gut/ADJD/--]]

4.6 Adverbial Phrases

Besides adverbials, negation particles (PTKNEG) also project to an adverbial phrase. They occur either as premodifiers [1] or postmodifiers, or they are directly attached to a field.

[1] bis zu and über are considered to be ADV rather than APPR because of their semantic meaning.

[SIMPX [KOORD Doch/KON/--] [VF [ADVX:V-MOD heute/ADV/--]] [LK [VXFIN:HD will/VMFIN/3sis]] [MF [NX:OD [NX:APP Ina/NE/dsf Terre/NE/dsf] (/$(/-- [NX:APP Hannelore/NE/dsf Droege/NE/dsf] )/$(/--] [NX:ON es/PPER/nsn3] [ADVX:MOD nicht/PTKNEG/--] [ADJX:V-MOD [ADVX:- so/ADV/--] recht/ADJD/--]] [VC [VXINF:OV munden/VVINF/--]]]

[NX [ADVX bis/ADV/--] [ADVX zu/ADV/--] [ADJX 300.000/CARD/--] Leute/NN/np*]

[NX [ADVX über/ADV/--] [ADJX 350.000/CARD/--] Auskünfte/NN/apf]

[ADVX [ADVX heute/ADV/--] [ADVX abend/ADV/--]]

[SIMPX [VF [NX:ON Der/ART/nsm Fahrer/NN/nsm]] [LK [VXFIN:HD konnte/VMFIN/3sit]] [MF [ADVX:V-MOD [ADVX nicht/PTKNEG/--] [ADVX mehr/ADV/--]]] [VC [VXINF:OV bremsen/VVINF/--]]] ./$./--

4.7 Verb Phrases

Whereas finite verb phrases are labelled VXFIN, non-finite verb phrases are labelled VXINF.

Since infinitives and past participles share certain properties (e.g. exchangeability in Man hat nur noch das eigene Herz schlagen hören/gehört.), they are assumed to carry the same phrase label (VXINF). The finite verb in LK as well as the non-finite verbs in VC are always projected to their phrase level. All verb phrases of the verb complex are attached on the same level to form the verb complex. In order to follow the flat clustering principle, no internal hierarchy of the verb complex is annotated.

4.7.1 Head of a Sentence and Verb Complex

The finite verb, which can appear either in LK (verb-first and verb-second clauses) or in VC (verb-final clauses), is always the head of the entire sentence. Non-finite verbal elements belong to VC. If the finite verb is located in LK and if there is more than one non-finite element in VC, the non-finite element which is selected by the finite verb is denoted as the head of VC. All other elements of VC are verbal objects. The head of VC selects the verbal object OV. This verbal object may select another verbal object OV, and so on. In order to denote the dependency relations between verbal objects within the verb complex, we attach a secondary edge label refvc between their phrase nodes.
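The labelling rule for the verb complex can be sketched in a few lines. This is a minimal sketch under two assumptions that are not stated in this form in the guidelines: the verbs of VC appear in the canonical order in which each verb is selected by its right neighbour, and no Ersatzinfinitiv construction is involved. The function name and the tuple representation are hypothetical.

```python
def label_verb_complex(vc_tokens, finite_in_lk):
    """Assign HD/OV edge labels and refvc edges to the verbs of VC (surface order).

    Assumes canonical order: every verb is selected by its right neighbour,
    and the rightmost verb is either the finite verb or selected by the finite verb in LK.
    """
    n = len(vc_tokens)
    if n == 1 and finite_in_lk:
        # A single non-finite verb selected by the finite verb in LK is a verbal object.
        return [(vc_tokens[0], "OV")], []
    labels = [(tok, "HD" if i == n - 1 else "OV") for i, tok in enumerate(vc_tokens)]
    # refvc points from the selecting OV to the OV that depends on it.
    refvc = [(vc_tokens[i], vc_tokens[i - 1]) for i in range(n - 2, 0, -1)]
    return labels, refvc


print(label_verb_complex(["gebucht", "worden", "ist"], finite_in_lk=False))
print(label_verb_complex(["gewesen", "sein"], finite_in_lk=True))
```

Note that refvc edges only connect verbal objects with each other; the relation between the head of VC and its first OV is already expressed by the primary edge labels.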

4.7.2 Verb Complexes in Verb-second and Verb-final Clauses

The following example shows a verb-second clause with the head of the sentence in LK and a verb complex consisting of a single non-finite element.

[SIMPX [VF [NX:ON Der/ART/nsm [ADJX:- überehrgeizige/ADJA/nsm] Bürgermeister/NN/nsm]] [LK [VXFIN:HD will/VMFIN/3sis]] [MF [NX:OA die/ART/asf Bergidylle/NN/asf] [PX:FOPP in/APPR/a [NX:HD [NX:HD ein/ART/asn Mekka/NE/asn] [NX:- des/ART/gsm Massentourismus/NN/gsm]]]] [VC [VXINF:OV verwandeln/VVINF/--]]]

If the verb complex comprises more than one immediate daughter, the one that is selected by the finite verb is the head of VC.

[SIMPX [VF [NX:ON Es/PPER/nsn3]] [LK [VXFIN:HD müsse/VMFIN/3sks]] [MF [NX:PRED ein/ART/nsm Buchungsfehler/NN/nsm]] [VC [VXINF:OV gewesen/VAPP/--] [VXINF:HD sein/VAINF/--]]] ./$./--

The following trees demonstrate verb complexes with two or more verbal objects. The secondary edge label refvc is pointing from the selecting OV to the depending OV.

[SIMPX [C Wenn/KOUS/--] [MF [ADVX:MOD da/ADV/--] [NX:ON was/PIS/***]] [VC [VXINF:OV gebucht/VVPP/--] [VXINF:OV worden/VAPP/--] [VXFIN:HD ist/VAFIN/3sis]]]
refvc: from worden to gebucht

[SIMPX [C daß/KOUS/--] [MF [ADVX:MOD auch/ADV/--] [NX:ON sein/PPOSAT/nsm Vater/NN/nsm] [PX:V-MOD auf/APPR/a [NX:HD [NX:HD Anregung/NN/asf] [NX:- Andreottis/NE/gsm]]]] [VC [VXINF:OV umgebracht/VVPP/--] [VXINF:OV worden/VAPP/--] [VXINF:OV sein/VAINF/--] [VXFIN:HD könnte/VMFIN/3skt]]]
refvc: from sein to worden, from worden to umgebracht

If there is no finite verb at all, the rightmost element of the verb complex (if there is more than one element) is annotated as the head of the sentence. This often occurs in headlines (cf. 5.2 and 7.4).

[SIMPX [MF [NX:OA Prachtwicken/NN/apf]] [VC [VXINF:HD gucken/VVINF/--]]] ./$./--

4.7.3 Ersatzinfinitiv Constructions

In order to indicate Ersatzinfinitiv constructions, two specific field node labels are introduced. VCE is the node label for the part of the verb complex consisting of the finite verb which subcategorizes for the Ersatzinfinitiv. MFE is the node label for the second part of MF between VCE and the second part of the verb complex VC (e.g. [C die] [MF uns] [VCE hätten] [MFE mißtrauisch] [VC machen müssen]).

[SIMPX [C daß/KOUS/--] [MF [NX:ON [NX:KONJ Fischer/NE/nsm] und/KON/-- [NX:KONJ ich/PPER/ns*1]] [PX:OPP dazu/PROP/--]] [VCE [VXFIN:HD haben/VAFIN/1pis]] [VC [VXINF:OV beitragen/VVINF/--] [VXINF:HD können/VMINF/--]]]

[R-SIMPX [C [NX:ON die/PRELS/np*]] [MF [NX:OA uns/PPER/ap*1]] [VCE [VXFIN:HD hätten/VAFIN/3pkt]] [MFE [ADJX:PRED mißtrauisch/ADJD/--]] [VC [VXINF:OV machen/VVINF/--] [VXINF:HD müssen/VMINF/--]]]

In the example below, the finite verb precedes the non-finite verbs although müssen is not an Ersatzinfinitiv. Since its position corresponds to the position of the finite verb in real Ersatzinfinitiv constructions, and since a second middle field is also possible here, we follow the same annotation strategy.

[SIMPX [C daß/KOUS/--] [MF [NX:ON die/ART/nsf Nato/NE/nsf] [NX:OD sich/PRF/ds*3] [ADVX:MOD doch/ADV/--] [ADVX:MOD noch/ADV/--] [NX:OA ein/ART/asn [ADJX:- [ADVX:- ganz/ADV/--] neues/ADJA/asn] Konzept/NN/asn]] [VCE [VXFIN:HD wird/VAFIN/3sis]] [VC [VXINF:OV überlegen/VVINF/--] [VXINF:HD müssen/VMINF/--]]]
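The field sequence used for Ersatzinfinitiv constructions can be expressed as a small consistency check. This is only a sketch: the function name is hypothetical, and the rule it encodes (VCE is followed by an optional MFE and then VC; MFE never occurs without VCE) is read off the description and examples in this section rather than taken from an official validation tool.

```python
def ersatzinfinitiv_fields_ok(labels):
    """Check the order of field labels in a clause with respect to VCE and MFE."""
    if "VCE" not in labels:
        # MFE is only defined relative to VCE.
        return "MFE" not in labels
    rest = labels[labels.index("VCE") + 1:]
    # VCE must be followed by VC, optionally with MFE in between.
    return rest[:1] == ["VC"] or rest[:2] == ["MFE", "VC"]


print(ersatzinfinitiv_fields_ok(["C", "MF", "VCE", "MFE", "VC"]))
print(ersatzinfinitiv_fields_ok(["C", "MF", "MFE", "VC"]))
```

The first call corresponds to die uns hätten mißtrauisch machen müssen; the second is rejected because MFE appears without a preceding VCE.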


4.7.4 Infinitives with zu

Regarding infinitives with zu, zu determines the non-finiteness of the verb on its right hand side. This is the reason why zu is considered the head of the VXINF whereas the infinitive is assumed to be the complement. Like other infinitives, they occur in the verb complex:

[SIMPX [VF [NX:ON [NX:HD Erkenntnisse/NN/npf] [NX:- der/ART/gsf [NX:HD [NX:KONJ Friedens-/TRUNC/--] und/KON/-- [NX:KONJ Konfliktforschung/NN/gsf]]]]] [LK [VXFIN:HD scheinen/VVFIN/3pis]] [MF [NX:OD ihm/PPER/dsm3] [ADJX:PRED fremd/ADJD/--]] [VC [VXINF:OV zu/PTKZU/-- sein/VAINF/--]]] ./$./--

[SIMPX [VF [PX:OPP Über/APPR/a [NX:HD Details/NN/apn]]] [LK [VXFIN:HD werde/VAFIN/3sks]] [MF [ADVX:MOD noch/ADV/--]] [VC [VXINF:OV zu/PTKZU/-- verhandeln/VVINF/--] [VXINF:HD sein/VAINF/--]]] ./$./--

The infinitive with zu can also be realized as an infix of the verb. In this case, the verb is tagged as VVIZU. Moreover, it is projected to VXINF with the grammatical function HD:

[SIMPX [C um/KOUI/--] [MF [PX:MOD neben/APPR/d [NX:HD [ADJX:- neuen/ADJA/dpn] Baugesetzen/NN/dpn]] [ADVX:MOD auch/ADV/--] [NX:OA mehr/PIAT/*** Mitspracherechte/NN/apn]] [VC [VXINF:HD einzufordern/VVIZU/--]]] ./$./--

Besides the examples above, the infinitive with zu occurs in optional (in most cases with um zu) and obligatory infinitive clauses.

[SIMPX [C Wenn/KOUS/--] [MF [NX:ON Angehörige/NN/np*] [NX:OS-MOD es/PPER/asn3] [NX:OD sich/PRF/dp*3] [PX:OPP zur/APPRART/dsf [NX:HD Lebensaufgabe/NN/dsf]]] [VC [VXFIN:HD machen/VVFIN/3pis]] ,/$,/-- [NF [SIMPX:OS [MF [NX:OA den/ART/asm Kranken/NN/asm]] [VC [VXINF:HD zu/PTKZU/-- kontrollieren/VVINF/--]]]]]

[SIMPX [C um/KOUI/--] [MF [NX:OD Freunden/NN/dpm]] [VC [VXINF:HD zu/PTKZU/-- sagen/VVINF/--]] ,/$,/-- [NF [SIMPX:OS [C daß/KOUS/--] [MF [NX:ON ihr/PPOSAT/nsm Zug/NN/nsm] [NX:OA Verspätung/NN/asf]] [VC [VXFIN:HD hat/VAFIN/3sis]]]]]

Infinitive clauses can consist of only one verb complex:

[SIMPX [VF [NX:ON Er/PPER/nsm3]] [FKOORD [FKONJ:KONJ [LK [VXFIN:HD wendet/VVFIN/3sis]] [MF [NX:OA den/ART/asm Blick/NN/asm] [PX:V-MOD von/APPR/d [NX:HD der/ART/dsf Wand/NN/dsf]]]] und/KON/-- [FKONJ:KONJ [LK [VXFIN:HD fängt/VVFIN/3sis]] [VC an/PTKVZ/--:VPT] [NF [SIMPX:OS [VC [VXINF:HD zu/PTKZU/-- erzählen/VVINF/--]]]]]]] ./$./--

4.7.5 Coherency and Incoherency of Verbal Constructions

The notion of coherency, attributed to Bech (1955/57), covers the relation of dependency between adjacent verbal elements, i.e. the relation of subcategorization between a verb and a non-finite verbal complement. Kiss (1995) calls this relation infinitive Komplementation (non-finite complementation). Bech (1955/57) distinguishes between three different modes of obligatory and optional coherency:

1. verbs constructing coherently and incoherently, e.g. versprechen, versuchen

   coherent, extraposition possible:
   a. [wie er mit kritischen politischen Gegenpositionen umzugehen versteht]
   incoherent, extraposition:
   b. [wie er versteht,] [mit kritischen politischen Gegenpositionen umzugehen]

2. verbs constructing only coherently, e.g. wollen, möchten

   coherent, no extraposition possible:
   a. [wie er mit kritischen politischen Gegenpositionen umgehen will]
   b. *[wie er will mit kritischen politischen Gegenpositionen umgehen]

3. verbs constructing only incoherently, e.g. überreden, überzeugen

   incoherent, extraposition obligatory:
   a. [wie er sie überredet,] [mit kritischen politischen Gegenpositionen umzugehen]
   b. *[wie er sie [mit kritischen politischen Gegenpositionen umzugehen] überredet]

Coherent and incoherent constructions of verbs are annotated differently. In case of coherency, the verbal complement is part of the verb complex. In the clause wie er mit kritischen politischen Gegenpositionen umzugehen versteht, for instance, the infinitive with zu is the verbal object of the finite verb. In case of incoherency, by contrast, the verbal complement is annotated as a sentential complement, i.e., mit kritischen politischen Gegenpositionen umzugehen in the clause wie er sie überredet, mit kritischen politischen Gegenpositionen umzugehen is a sentential object in NF.

We define a construction as incoherent if extraposition into NF is possible. That is, whenever it is possible to shift the infinitival complement, together with a constituent of MF which it subcategorizes for, into NF, these elements are annotated as sentential objects. Therefore, the coherent example above (wie er mit kritischen politischen Gegenpositionen umzugehen versteht) is annotated with a sentential object in MF since extraposition is possible (cf. the incoherent example 1.b.).

[SIMPX [C wie/KOUS/--] [MF [NX:ON er/PPER/nsm3] [SIMPX:OS [MF [PX:OPP mit/APPR/d [NX:HD [ADJX:- kritischen/ADJA/dpf] [ADJX:- politischen/ADJA/dpf] Gegenpositionen/NN/dpf]]] [VC [VXINF:HD umzugehen/VVIZU/--]]]] [VC [VXFIN:HD versteht/VVFIN/3sis]]]
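The decision described above can be written down as a small sketch. The verb lists contain only the example verbs named for the three classes; the dictionary, the function name and the returned (edge label, field) pairs are hypothetical illustrations, not part of the annotation tooling.

```python
# Example verbs from the three classes (not an exhaustive lexicon).
COHERENCE_CLASS = {
    "versprechen": "both",
    "versuchen": "both",
    "wollen": "coherent",
    "möchten": "coherent",
    "überreden": "incoherent",
    "überzeugen": "incoherent",
}


def annotate_infinitival_complement(verb, extraposed):
    """Return (edge label, field) for the infinitival complement of `verb`."""
    cls = COHERENCE_CLASS[verb]
    if cls == "coherent":
        if extraposed:
            raise ValueError(f"{verb} does not allow extraposition")
        return ("OV", "VC")  # verbal object inside the verb complex
    if cls == "incoherent" and not extraposed:
        raise ValueError(f"{verb} requires extraposition")
    # Extraposition is possible, so the complement is a sentential object,
    # located in NF if it is actually extraposed and in MF otherwise.
    return ("OS", "NF" if extraposed else "MF")


print(annotate_infinitival_complement("versuchen", extraposed=False))
print(annotate_infinitival_complement("wollen", extraposed=False))
```

The important design point is that the label depends on whether extraposition is possible for the verb, not on whether the complement happens to be extraposed in the sentence at hand.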

If a complement of the verb within the sentential object is located outside the sentence boundaries, e.g. in the C-field, the secondary edge label refcontr gives additional information about the dependency relation (cf. 3.4.6).

4.7.6 AcI Constructions

AcI (accusativus cum infinitivo) verbs are a small group of verba sentiendi (e.g. sehen, hören, fühlen, spüren) which subcategorize for an accusative and an infinitive. The verbs lassen, machen, heißen have a modal-verb-like reading in which they also select an accusative and an infinitive.

The infinitive itself subcategorizes for complements with respect to its valency, but its subject is realized by an accusative which is the direct object of the AcI verb.

Since AcI constructions are coherent infinitive constructions in which extraposition is not possible (cf. (Eisenberg 1999 2001), p. 355), the AcI is not annotated as a sentential object (* wenn man nur noch hört das eigene Herz schlagen). The infinitive as the verbal object of the AcI verb is located in the verb complex and the accusative is realized as OA in MF.

[SIMPX [C Wenn/KOUS/--] [MF [NX:ON man/PIS/ns*] [ADVX:MOD nur/ADV/--] [ADVX:MOD noch/ADV/--] [NX:OA das/ART/asn [ADJX:- eigene/ADJA/asn] Herz/NN/asn]] [VC [VXINF:OV schlagen/VVINF/--] [VXFIN:HD hört/VVFIN/3sis]]]

As a consequence of this analysis, we annotate two accusative objects (OA) if the AcI construction comprises a transitive infinitive verb such as beenden in the following example. Uns functions as its subject and die Diskussion as its direct object. Both are in accusative case and both are labelled OA.

[SIMPX [LK [VXFIN:HD Lassen/VVFIN/3pis]] [MF [NX:ON Sie/PPER/np*3] [NX:OA uns/PPER/ap*1] [NX:OA die/ART/asf Diskussion/NN/asf] [ADVX:MOD jetzt/ADV/--]] [VC [VXINF:OV beenden/VVINF/--]]]

4.7.7 Imperatives

Imperative verbs have only one singular and one plural form and are not inflected for the grammatical category person. Their form corresponds to second person singular and plural verbs; they are tagged as VVIMP or VAIMP.

Warte mal! (warte/VVIMP:s)
instead of
Wartest du mal? (wartest/VVFIN:2sis)

It is important to keep imperative sentences apart from imperative verbs. An imperative sentence does not need to comprise an imperative verb form, as is shown in the following examples:

Warten Sie mal bitte! (warten/VVFIN:3pis)
Bitte warten! (warten/VVINF:--)
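The difference between imperative sentences and imperative verb forms can be made concrete with the four examples given above. The data structure and the function name are hypothetical; the tags and morphology strings are copied from the examples.

```python
IMPERATIVE_TAGS = {"VVIMP", "VAIMP"}

# (sentence, verb form, POS tag, morphology), taken from the examples in this section
EXAMPLES = [
    ("Warte mal!", "warte", "VVIMP", "s"),
    ("Wartest du mal?", "wartest", "VVFIN", "2sis"),
    ("Warten Sie mal bitte!", "warten", "VVFIN", "3pis"),
    ("Bitte warten!", "warten", "VVINF", "--"),
]


def is_imperative_verb_form(pos_tag):
    """True only for the dedicated imperative tags, regardless of sentence mood."""
    return pos_tag in IMPERATIVE_TAGS


imperative_forms = [s for s, _, tag, _ in EXAMPLES if is_imperative_verb_form(tag)]
print(imperative_forms)
```

Only the first sentence contains an imperative verb form, although the last two are imperative sentences as well.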

(/$(/-- [SIMPX [LK [VXFIN:HD vgl./VVIMP/s]] [MF [NX:OA [NX:HD Seite/NN/asf] [NX 32/CARD/--]]]] )/$(/--

Normally, imperative verbs lack a subject, but the addressed person can also be mentioned to stress the utterance:

[SIMPX [DM [NX Maikäfer/NN/nsm]] [LK [VXFIN:HD flieg/VVIMP/s]]] .../$(/--

4.7.8 Particle Verbs

Separable verb particles are tagged as PTKVZ and annotated with the edge label VPT:

[SIMPX [VF [NX:ON [ADVX:- Auch/ADV/--] [NX:HD die/ART/npm Vertreter/NN/npm] [NX:- der/ART/gsf AfB/NE/gsf]]] [LK [VXFIN:HD stimmten/VVFIN/3pit]] [MF [NX:OD den/ART/dpf [ADJX:- 86/CARD/--] Millionen/NN/dpf]] [VC zu/PTKVZ/--:VPT]] ./$./--

In verb-final clauses, the particle verb occurs unseparated within the verb complex:

[SIMPX [VF [NX:ON Rußland/NE/nsn]] [LK [VXFIN:HD wollte/VMFIN/3sit]] [MF [ADVX:MOD bislang/ADV/--] [NX:OD einer/ART/dsf UN-Resolution/NN/dsf] [ADVX:MOD nur/ADV/--]] [VC [VXINF:OV zustimmen/VVINF/--]]]
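The relation between a separated particle and its verb can be sketched as follows. The token triples (form, POS tag, lemma) and the function name are assumptions made for this illustration; in particular, the lemma values are supplied by hand here and the sketch does not describe how lemmas are stored in the treebank.

```python
def particle_verb_lemma(tokens):
    """Combine a separated particle (PTKVZ) with the lemma of the finite verb.

    tokens: list of (form, pos, lemma) triples in surface order.
    Returns the lemma of the particle verb, or the plain verb lemma if no particle occurs.
    """
    particle = next((form.lower() for form, pos, _ in tokens if pos == "PTKVZ"), "")
    verb = next((lemma for _, pos, lemma in tokens if pos.endswith("FIN")), None)
    if verb is None:
        return None
    return particle + verb


clause = [
    ("stimmten", "VVFIN", "stimmen"),
    ("den", "ART", "der"),
    ("Millionen", "NN", "Million"),
    ("zu", "PTKVZ", "zu"),
]
print(particle_verb_lemma(clause))
```

In the unseparated case, such as zustimmen in VC, no PTKVZ token exists and nothing needs to be recombined.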

4.7.9 Verbs with Predicate

Typically, the complement type PRED (predicate) occurs with verbs like sein, haben, scheinen, aussehen, sich anhören, klingen, etc. PRED is annotated if the following conditions apply:

• if it is not possible to determine the case of the constituent in question properly (e.g. gut in Das ist gut.)

• if the constituent in question actually predicates the subject, i.e. the subject is characterized as having the property expressed by PRED (e.g. in Die Ursache war unklar. Die Ursache is characterized by the property of being unclear)

• many PRED verbs are raising verbs (subject without theta-role)

• if als-phrases are selected by the verb they are labelled as PRED (e.g. Unter dem Motto Kino-Extrem agiert der Regisseur als Filmjockey.)

[SIMPX [VF [PX:V-MOD Unter/APPR/d [NX:HD [NX:APP dem/ART/dsn Motto/NN/dsn] "/$(/-- [NX:APP Kino-Extrem/NN/dsn] "/$(/--]]] [LK [VXFIN:HD agiert/VVFIN/3sis]] [MF [NX:ON der/ART/nsm Regisseur/NN/nsm] [NX:PRED als/KOKOM/-- Filmjockey/NN/nsm]]]

Some examples for verbs that take predicates: recht sein, recht haben, leid tun, frei sein, fertig sein, sich gut/schlecht treffen, gut/schlecht finden, etc.

PRED verbs have to be distinguished carefully from verbs occurring with ordinary modifiers (e.g. ON-MOD, V-MOD) such as gut passen.

With respect to topological fields, note that PRED usually marks the border between MF and NF, i.e., whatever constituent occurs on the right hand side of PRED belongs to NF. In general, this constituent is an adjunct which PRED does not subcategorize for:

[SIMPX [VF [NX:ON Das/PDS/nsn]] [LK [VXFIN:HD ist/VAFIN/3sis]] [MF [NX:PRED Politik/NN/nsf]] [NF [ADVX:ON-MOD hier/ADV/--]]]

[SIMPX [VF [NX:ON es/PPER/nsn3]] [LK [VXFIN:HD ist/VAFIN/3sis]] [MF [ADJX:PRED kalt/ADJD/--]] [NF [PX:V-MOD an/APPR/d [NX:HD diesem/PDAT/dsm Tag/NN/dsm]]]]

But there are exceptions in which PRED does not necessarily constitute the border between MF and NF:

• Another constituent may occur between PRED and VC, for instance, if an ambiguous modifier follows PRED.

[R-SIMPX [C [NX:ON das/PRELS/nsn]] [MF [PX:V-MOD an/APPR/d [NX:HD diesem/PDAT/dsm Abend/NN/dsm]] [NX:PRED der/ART/nsm [ADJX:- schönste/ADJA/nsm] Platz/NN/nsm] [PX:MOD im/APPRART/dsn [NX:HD All/NN/dsn]]] [VC [VXFIN:HD war/VAFIN/3sit]]] ./$./--

• PRED subcategorizes for the constituent that follows it. Complements of PREDs are always attached to a field since they are assigned a grammatical function within the sentence structure (cf. 8.1):

[SIMPX [VF [NX:ON [NX:HD Einer/PIS/nsm] [NX:- meiner/PPOSAT/gpm Freunde/NN/gpm]]] [LK [VXFIN:HD wurde/VAFIN/3sit]] [MF [ADJX:PRED süchtig/ADJD/--] [PX:FOPP nach/APPR/d [NX:HD Nachrichten/NN/dpf]]]] ./$./--

[SIMPX [VF [NX:ON ich/PPER/ns*1]] [LK [VXFIN:HD bin/VAFIN/1sis]] [MF [ADJX:PRED froh/ADJD/--] [PX:FOPP darum/PROP/--]]]

• Because of the word order rule that pronouns in MF have to precede other constituents, PRED might not be the last element in MF if it is a pronoun:

[SIMPX [VF [NX:ON Bravo/NN/nsf]] [LK [VXFIN:HD kann/VMFIN/3sis]] [MF [NX:PRED es/PPER/nsn3] [ADVX:MOD nicht/PTKNEG/--]] [VC [VXINF:OV gewesen/VAPP/--] [VXINF:HD sein/VAINF/--]]] ./$./--

[SIMPX [VF [NX:ON er/PPER/nsm3]] [LK [VXFIN:HD war/VAFIN/3sit]] [MF [NX:PRED es/PPER/nsn3] [ADVX:MOD nicht/PTKNEG/--]]]

4.7.10 Modal Verbs

Modal verbs are always tagged as VMFIN or VMINF regardless of their use as an auxiliary or a main verb. If a modal verb functions as an auxiliary verb, it is projected like any other auxiliary verb. If a modal verb is the main verb of a sentence, verbal modifiers refer to the modal verb in the same way as they refer to other main verbs:

"/$(/-- [SIMPX [VF [NX:ON Die/PDS/np*]] [LK [VXFIN:HD wollten/VMFIN/3pit]] [MF [NX:OA die/ART/asf BLG/NE/asf]] [VC [VXINF:OV schonen/VVINF/--]]] ./$./-- "/$(/--

[SIMPX [LK [VXFIN:HD Hätte/VAFIN/3skt]] [MF [NX:ON sie/PPER/nsf3] [NX:OD sich/PRF/ds*3] [NX:OA das/PDS/asn] [ADVX:MOD nicht/PTKNEG/--] [NX:OA-MOD alles/PIS/asn] [ADVX:V-MOD vorher/ADV/--]] [VC [VXINF:OV überlegen/VVINF/--] [VXINF:HD können/VMINF/--]]] ?/$./--

[SIMPX [C [PX:V-MOD Warum/PWAV/--]] [MF [NX:ON Daewoo/NE/ns*] [PX:OPP nach/APPR/d [NX:HD Bremen/NE/dsn]]] [VC [VXFIN:HD mußte/VMFIN/3sit]]]


Chapter 5

Attachment Principles for Phrases

5.1 Attachment to Fields

Phrases are attached to the topological field in which they occur. Their edge labels denote their grammatical function within the sentence structure. Only verb forms, separable verbal prefixes, and infinitive particles can occur in LK and VC. LK and VC mark the beginning and the end of MF (cf. 3.2).
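The restriction on LK and VC can be turned into a simple consistency check. The following is a minimal sketch only: the function names are hypothetical and not part of any treebank tooling, and it assumes that verb forms are exactly the STTS tags beginning with V, that separable prefixes are tagged PTKVZ, and that the infinitive particle is tagged PTKZU.

```python
# Hypothetical consistency check for the LK/VC restriction described above.
# Assumption: STTS verb tags all start with "V"; PTKVZ = separable verb prefix,
# PTKZU = infinitive particle "zu".
VERBAL_FIELDS = {"LK", "VC"}
NON_VERB_TAGS_ALLOWED = {"PTKVZ", "PTKZU"}


def allowed_in_verbal_field(pos_tag: str) -> bool:
    """Return True if a token with this STTS tag may occur in LK or VC."""
    return pos_tag.startswith("V") or pos_tag in NON_VERB_TAGS_ALLOWED


def check_field(field: str, pos_tags: list) -> list:
    """Return the tags that violate the restriction (empty list if none).

    Fields other than LK and VC are not restricted by this check.
    """
    if field not in VERBAL_FIELDS:
        return []
    return [tag for tag in pos_tags if not allowed_in_verbal_field(tag)]


if __name__ == "__main__":
    print(check_field("VC", ["VVPP", "VAINF"]))  # verb complex "gewesen sein"
    print(check_field("LK", ["ADV"]))            # an adverb would be flagged
```

Such a check only flags candidates for manual inspection; it does not decide how a flagged token should be annotated.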

5.2 Attachment of Ambiguous Complements

The partially free word order and the morphological properties of German can cause ambiguity concerning the grammatical function of a constituent. In the following example, the syntactic structure does not give any information about case assignment. Both noun phrases can be identified as ON or OA:

[Tree diagram] Tokens (word/POS/morphology): Ein/ART/asn Bad/NN/asn in/APPR/d der/ART/dsf Menge/NN/dsf verhindert/VVFIN/3sis das/ART/nsn Sicherheitsgitter/NN/nsn ./$.
Structure: [SIMPX [VF [NX:OA [NX:HD Ein Bad] [PX in [NX:HD der Menge]]]] [LK [VXFIN:HD verhindert]] [MF [NX:ON das Sicherheitsgitter]]]

Headlines like the following lack a finite verb. Therefore, in the first example it cannot be decided whether it is an active or a passive construction, i.e., whether the noun phrase is ON or OA. The second example is an active construction, but again the noun phrase can be either ON or OA:


[Tree diagram] Tokens (word/POS/morphology): Kriegsverbrecher/NN/npm verurteilt/VVPP
Structure: [SIMPX [MF [NX:ON Kriegsverbrecher]] [VC [VXINF:HD verurteilt]]]

[Tree diagram] Tokens (word/POS/morphology): Prachtwicken/NN/apf gucken/VVINF ./$.
Structure: [SIMPX [MF [NX:OA Prachtwicken]] [VC [VXINF:HD gucken]]]

Since we do not assign specific edge labels for ambiguous complements, we formulate the following preference principle for case assignment:

Preference principle for case assignment: If case assignment is ambiguous, we decide on the more plausible grammatical function and on the more plausible sequence of grammatical functions, respectively. The main criteria for the decision are the unmarked word order and the semantic content.

Therefore, in the first example above, OA appears in VF whereas ON has its position in MF. For elliptical headlines, we assume a passive construction if the verb in VC is a past participle and an active construction if the verb in VC is an infinitive (cf. 4.7.2 and 7.4).
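The default for elliptical headlines can be written down as a small lookup. This is an illustrative sketch, not the annotators' tooling: the function name is hypothetical, and the mapping from "passive" to ON and from "active" to OA is read off the two headline examples above (Kriegsverbrecher verurteilt, Prachtwicken gucken).

```python
# Hypothetical helper: default function of the noun phrase in a headline
# that lacks a finite verb, keyed on the STTS tag of the verb in VC.
PAST_PARTICIPLES = {"VVPP", "VAPP", "VMPP"}
INFINITIVES = {"VVINF", "VAINF", "VMINF"}


def headline_np_function(vc_verb_tag: str) -> str:
    """Return "ON" for a passive reading and "OA" for an active reading."""
    if vc_verb_tag in PAST_PARTICIPLES:
        return "ON"  # passive assumed: the noun phrase is the subject
    if vc_verb_tag in INFINITIVES:
        return "OA"  # active assumed: the noun phrase is the accusative object
    raise ValueError(f"no headline default for tag {vc_verb_tag!r}")


if __name__ == "__main__":
    print(headline_np_function("VVPP"))   # Kriegsverbrecher verurteilt
    print(headline_np_function("VVINF"))  # Prachtwicken gucken
```

The lookup covers only the elliptical-headline default; the general preference principle still requires a judgement about word order and semantic plausibility.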

5.3 Modifier Attachment

Modifiers either modify one specific constituent or more than one constituent. The scope of modification can even range over the whole sentence structure. Therefore, they are either unambiguous or ambiguous. An unambiguous constituent that modifies just one other constituent within a tree structure is either adjacent or discontinuous. In the first case, it is immediately attached to the constituent which it modifies, according to the attachment rules for phrases. In the second case, the dependency, which can even go beyond the border of topological fields, is indicated by X-MOD edge labels, which express the non-ambiguity of the modifier (e.g. OA-MOD is the modifier of OA). Thus, edge labels like OA-MOD, V-MOD, OPP-MOD, MOD-MOD, etc. express that the respective constituent modifies only one other constituent in the sentence (OA, V, OPP, a modifier, etc.) which is not adjacent:

[Tree diagram] Tokens (word/POS/morphology): Für/APPR/a diese/PDAT/asf Behauptung/NN/asf hat/VAFIN/3sis Beckmeyer/NE/nsm bisher/ADV keinen/PIAT/asm Nachweis/NN/asm geliefert/VVPP ./$.
Structure: [SIMPX [VF [PX:OA-MOD Für [NX:HD diese Behauptung]]] [LK [VXFIN:HD hat]] [MF [NX:ON Beckmeyer] [ADVX:MOD bisher] [NX:OA keinen Nachweis]] [VC [VXINF:OV geliefert]]]

If a modifying constituent is ambiguous (i.e. it modifies more than one constituent, the entire sentence, or a constituent that occurred in previous sentences), it is attached to its topological field and given the ambiguous edge label MOD to preserve ambiguity. In the following example an der Uni either modifies the accusative object den Entwicklungsprozeß or the verb fortsetzen:

[Tree diagram] Tokens (word/POS/morphology): Phantasievoll/ADJD und/KON energisch/ADJD will/VMFIN/3sis er/PPER/nsm3 den/ART/asm Entwicklungsprozeß/NN/asm an/APPR/d der/ART/dsf Uni/NN/dsf fortsetzen/VVINF
Structure: [SIMPX [VF [ADJX:V-MOD [ADJX:KONJ Phantasievoll] und [ADJX:KONJ energisch]]] [LK [VXFIN:HD will]] [MF [NX:ON er] [NX:OA den Entwicklungsprozeß] [PX:MOD an [NX:HD der Uni]]] [VC [VXINF:OV fortsetzen]]]

We formulate the following definitions for MOD and X-MOD:

Definition of MOD: A constituent is called MOD if it cannot be assigned a more specific label, either because it is ambiguous or because there is no more specific label (e.g. for sentence modifiers or for constituents that refer to some sentence-external expression). Sometimes it is difficult to determine whether a modifier is definite (i.e. unambiguous) or not. In cases of doubt, modifiers are marked as ambiguous (MOD) rather than as definite modifiers.

Definition of X-MOD: X is a variable that can be replaced by labels for syntactic categories like OA, OPP, MOD, V. X-MOD marks long-distance modification which is unambiguous, e.g. relative clauses (Aber es gäbe (intelligente Lösungen OA), (die kein Geld kosten OA-MOD)).
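The two definitions amount to a small decision procedure. The sketch below is a simplification under stated assumptions: the function name and its arguments are hypothetical, the candidate targets are assumed to be given by the annotator as function labels, and adjacent unambiguous modifiers are assumed to be attached directly inside the modified phrase, so that no modifier edge label at field level is needed.

```python
from typing import Optional, Sequence


def modifier_edge_label(targets: Sequence[str], adjacent: bool = False) -> Optional[str]:
    """Sketch of the MOD / X-MOD choice.

    targets  -- labels of the constituents the modifier could modify, e.g.
                ["OA"] or ["OA", "V"]; an empty sequence stands for
                sentence-level or sentence-external reference.
    adjacent -- True if the single target is adjacent to the modifier.
    Returns "MOD", an "X-MOD" label, or None for direct attachment.
    """
    if len(targets) != 1:
        return "MOD"              # ambiguous, or no more specific label exists
    if adjacent:
        return None               # attached directly to the modified constituent
    return f"{targets[0]}-MOD"    # unambiguous long-distance modification


if __name__ == "__main__":
    print(modifier_edge_label(["OA"]))       # Für diese Behauptung ... keinen Nachweis
    print(modifier_edge_label(["OA", "V"]))  # an der Uni: object or verb
```

In cases of doubt the definition of MOD applies, which corresponds to passing more than one candidate target.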

Typical MODs and V-MODs: Generally, modifying subclauses (e.g. Katastrophenstimmung herrscht erst, [wenn nichts mehr zu verheimlichen ist] (MOD).) are MOD because they modify the complete main clause. Modifying particles and adverbs like da, dann, auch, eigentlich, ja, vielleicht, natürlich usually show attachment ambiguity and are therefore annotated as MOD. Only if they unambiguously express the modification of the verb (e.g. Das Buch liegt da. or Er geht auch.) do they carry the edge label V-MOD. Pronominal adverbs (PROP) like dabei, dafür, trotzdem, deswegen, hierauf, etc. are either ambiguous (e.g. Dabei (MOD) erscheinen Sie in anderen Verlagen.) or unambiguous (e.g. Er achtet dabei (V-MOD) auf alles.). Non-pronominal adverbs such as vorher, später, etc. in most cases give temporal or local information. Thus, they are V-MOD rather than MOD.

5.3.1 Modifier Attachment in the Initial Field

Since only one constituent is allowed in the initial field, all elements preceding and following the head are attached as premodifiers (low attachment) or postmodifiers (high attachment) according to the attachment rules explained in 4.1.

[Tree diagram] Tokens (word/POS/morphology): Auch/ADV für/APPR/a Rumänien/NE/asn selbst/ADV ist/VAFIN/3sis der/ART/nsm Papst-Besuch/NN/nsm von/APPR/d großer/ADJA/dsf Bedeutung/NN/dsf ./$.
Structure: [SIMPX [VF [PX:MOD [ADVX Auch] für [NX:HD [NX:HD Rumänien] [ADVX selbst]]]] [LK [VXFIN:HD ist]] [MF [NX:ON der Papst-Besuch] [PX:PRED von [NX:HD [ADJX großer] Bedeutung]]]]
The original figure additionally shows a secondary edge labelled refint (node 509).

5.3.2 Attachment across Punctuation Marks

The punctuation marks :, -, and ... split a syntactic construction into separate units if there is no syntactic dependency relation between the two parts (cf. 3.4.5), as in the following:

[Tree diagram] Tokens (word/POS/morphology): ASB/NN/nsm lädt/VVFIN/3sis ein/PTKVZ :/$. Tag/NN/nsm der/ART/gsf offenen/ADJA/gsf Tür/NN/gsf
Structure (two unattached units): [SIMPX [VF [NX:ON ASB]] [LK [VXFIN:HD lädt]] [VC [PTKVZ:VPT ein]]] and [NX [NX:HD Tag] [NX der [ADJX offenen] Tür]]


[Tree diagram] Tokens (word/POS/morphology): Sein/PPOSAT/nsn Zuhause/NN/nsn :/$. stilvolles/ADJA/nsn Entertainment/NN/nsn ./$.
Structure (two unattached units): [NX Sein Zuhause] and [NX [ADJX stilvolles] Entertainment]

Attachment is necessary if the part following the punctuation mark has a grammatical function within the sentence structure:

[Tree diagram] Tokens (word/POS/morphology): Er/PPER/nsm3 meinte/VVFIN/3sit :/$. [token lost]/$( das/PDS/nsn ist/VAFIN/3sis meine/PPOSAT/nsf Geschichte/NN/nsf [token lost]/$( ./$. "/$(
Structure: [SIMPX [VF [NX:ON Er]] [LK [VXFIN:HD meinte]] [NF [SIMPX:OS [VF [NX:ON das]] [LK [VXFIN:HD ist]] [MF [NX:PRED meine Geschichte]]]]]

[Tree diagram] Tokens (word/POS/morphology): Doch/KON Zweifel/NN/npm blieben/VVFIN/3pit [token lost]/$( sowohl/KON bei/APPR/d Joergensen/NE/dsm selbst/ADV als/KON auch/ADV in/APPR/d der/ART/dsf Redaktion/NN/dsf der/ART/gsf taz/NE/gsf ./$.
Structure: [SIMPX [KOORD Doch] [VF [NX:ON Zweifel]] [LK [VXFIN:HD blieben]] [NF [PX:V-MOD sowohl [PX:KONJ bei Joergensen selbst] als auch [PX:KONJ in der Redaktion der taz]]]]

5.3.3 Ambiguous Modifiers in Isolated Phrases

Since isolated phrases (cf. 3.4.5) do not consist of topological fields, ambiguous modifiers (MOD) have to be attached to the phrase itself. The isolated phrase is projected one level higher and the modifier is attached on this higher level. Thus, the information about ambiguity can be preserved even without topological fields or explicit MOD labelling, just by the existence of yet another projection level of the phrase.

The overall attachment strategy has been chosen in order to keep syntactic structure flat and to be able to preserve attachment ambiguity where necessary.

In the following examples, so may refer to something that is implicit or has been mentioned before:

[Tree diagram] Tokens (word/POS/morphology): so/ADV Winkler/NE/nsf
Structure: [NX [ADVX so] [NX:HD Winkler]]

If there is more than one ambiguous modifier in an isolated phrase, all of them are attached on the next higher level. The mother node of this isolated phrase is marked with the node label of the modified phrase.

[Tree diagram] Tokens (word/POS/morphology): Zunächst/ADV natürlich/ADV Durcheinander/NN/nsn ./$.
Structure: [NX [ADVX Zunächst] [ADVX natürlich] [NX:HD Durcheinander]]

[Tree diagram] Tokens (word/POS/morphology): vielleicht/ADV mal/ADV ein/ART/nsm Mini-Hit/NN/nsm da/ADV
Structure: [NX [ADVX vielleicht] [ADVX mal] [NX:HD ein Mini-Hit] [ADVX da]]


Chapter 6

The Annotation of Sentences

The approach of topological fields supports the flat clustering principle inasmuch as MF and NF allow for more than one constituent being attached to the same field node. The field nodes form a level of annotation between the phrase level and the sentence level. The last step to complete a sentence structure is to attach the field nodes to the highest annotation level of the whole structure: the root node.

In the following sections, the annotation of sentence structures will be demonstrated.

6.1 Sentence Initial Fields

6.1.1 The C-Field in Verb-Final Clauses

The C-field (complementizer field) is the field for subordinating conjunctions KOUS (e.g. daß, wenn, da, weil, ob), KOUI (e.g. um (+zu)), relative pronouns (PRELS), interrogative (PWAV) pronouns, and (complex) interrogative or relative phrases. Thus, it only occurs in verb-final clauses.

In case of a conjunction, we directly project to the C-field:

[Tree diagram] Tokens (word/POS/morphology): Wenn/KOUS da/ADV was/PIS/*** gebucht/VVPP worden/VAPP ist/VAFIN/3sis
Structure: [SIMPX [C Wenn] [MF [ADVX:MOD da] [NX:ON was]] [VC [VXINF:OV gebucht] [VXINF:OV worden] [VXFIN:HD ist]]]
The original figure additionally shows a secondary edge labelled refvc (node 503).

There are conjunctions in German which consist of two elements (e.g. so daß and als ob). Both elements are directly attached to the C-field, and neither of them carries a head label.

[Tree diagram] Tokens (word/POS/morphology): so/KOUS daß/KOUS der/ART/nsm Maschinenpark/NN/nsm heute/ADV für/APPR/a lukrative/ADJA/apf Sonderanfertigungen/NN/apf unbrauchbar/ADJD ist/VAFIN/3sis ./$.
Structure: [SIMPX [C so daß] [MF [NX:ON der Maschinenpark] [ADVX:V-MOD heute] [PX:PRED-MOD für [NX:HD [ADJX lukrative] Sonderanfertigungen]] [ADJX:PRED unbrauchbar]] [VC [VXFIN:HD ist]]]

Since C generally does not contain more than one constituent, the adverb auch in the following example is not supposed to occur in the C-field together with the conjunction wenn. The wenn-clause is annotated as the modifier of the adverbial phrase auch, i.e., the adverbial phrase subcategorizes for the verb-final clause.

[Tree diagram] Tokens (word/POS/morphology): auch/ADV ,/$, wenn/KOUS man/PIS/ns* wie/KOKOM ARD/NE/nsf und/KON ZDF/NE/nsn relativ/ADJD viele/PIDAT/apn Teams/NN/apn überall/ADV postieren/VVINF kann/VMFIN/3sis
Structure: [ADVX [ADVX:HD auch] [SIMPX [C wenn] [MF [NX:ON man] [NX:MOD wie [NX:HD [NX:KONJ ARD] und [NX:KONJ ZDF]]] [NX:OA [DP [ADJX relativ] viele] Teams] [ADVX:V-MOD überall]] [VC [VXINF:OV postieren] [VXFIN:HD kann]]]]

If the constituent in the C-field is a pronoun or a complex phrase, it is first projected to the phrase level and then projected to the C-field. The edge label below the C-field denotes the grammatical function of this constituent.

[Tree diagram] Tokens (word/POS/morphology): Wieviel/PWS/nsn da/ADV monatlich/ADJD fällig/ADJD wird/VAFIN/3sis
Structure: [SIMPX [C [NX:ON Wieviel]] [MF [ADVX:MOD da] [ADJX:V-MOD monatlich] [ADJX:PRED fällig]] [VC [VXFIN:HD wird]]]


[Tree diagram] Tokens (word/POS/morphology): zu/APPR/d deren/PRELAT/gp* Ablaß/NN/dsm die/ART/nsf tonale/ADJA/nsf Ebene/NN/nsf natürlich/ADV nicht/PTKNEG ausreicht/VVFIN/3sis
Structure: [R-SIMPX [C [PX:FOPP zu [NX:HD deren Ablaß]]] [MF [NX:ON die [ADJX tonale] Ebene] [ADVX:MOD natürlich] [ADVX:MOD nicht]] [VC [VXFIN:HD ausreicht]]]

6.1.2 The KOORD-Field in all Clause Types

The KOORD-field is optionally the left-most field of all clause types (V-1, V-2, V-end). Therefore, it can only occur at the beginning of a syntactic unit (cf. 3.4.3).

For verb-second clauses, it can be regarded as an alternative field to the PARORD-field. The KOORD-field contains coordinative particles like und, oder, aber, etc. (cf. Höhle (1986)). Here are two examples of different clause types:

[Tree diagram] Tokens (word/POS/morphology): Und/KON Koring/NE/nsm war/VAFIN/3sit früher/ADV einmal/ADV in/APPR/a schiefes/ADJA/asn Licht/NN/asn geraten/VVPP
Structure: [SIMPX [KOORD Und] [VF [NX:ON Koring]] [LK [VXFIN:HD war]] [MF [ADVX:V-MOD früher] [ADVX:MOD einmal] [PX:OPP in [NX:HD [ADJX schiefes] Licht]]] [VC [VXINF:OV geraten]]]

[Tree diagram] Tokens (word/POS/morphology): Oder/KON ist/VAFIN/3sis Bremerhaven/NE/nsn nicht/PTKNEG günstiger/ADJD ?/$.
Structure: [SIMPX [KOORD Oder] [LK [VXFIN:HD ist]] [MF [NX:ON Bremerhaven] [ADVX:MOD nicht] [ADJX:PRED günstiger]]]


6.1.3 The PARORD-Field in Verb-Second Clauses

PARORD is an alternative field to KOORD for verb-second clauses only. Typical PARORD expressions are denn and weil (see footnote 1):

[Tree diagram] Tokens (word/POS/morphology): Denn/KON auch/ADV die/PDS/np* gehen/VVFIN/3pis davon/PROP aus/PTKVZ
Structure: [SIMPX [PARORD Denn] [VF [NX:ON [ADVX auch] die]] [LK [VXFIN:HD gehen]] [MF [PX:OPP davon]] [VC [PTKVZ:VPT aus]]]

6.1.4 Resumptive Constructions: The LV-Field

Resumptive constructions are analysed as suggested by Höhle (1986) and Kathol (1995), by using the field LV (Linksversetzung) which is located to the left of VF. In general, the LV-field is not restricted to one constituent. The typical feature of a resumptive construction is that there is a (pronominal) constituent somewhere in the sentence, on the right-hand side of the LV-field, which refers back to the expression within the LV-field. Therefore, we use the X-MOD label to indicate this kind of long-distance dependency.

[Tree diagram] Tokens (word/POS/morphology): Vom/APPRART/dsm introvertierten/ADJA/dsm Einzelgänger/NN/dsm zum/APPRART/dsm stilbildenden/ADJA/dsm Popstar/NN/dsm ,/$, das/PDS/nsn muß/VMFIN/3sis ja/ADV auch/ADV erst/ADV einmal/ADV verkraftet/VVPP werden/VAINF
Structure: [SIMPX [LV [PX:ON-MOD [PX:KONJ Vom [NX:HD [ADJX introvertierten] Einzelgänger]] [PX:KONJ zum [NX:HD [ADJX stilbildenden] Popstar]]]] [VF [NX:ON das]] [LK [VXFIN:HD muß]] [MF [ADVX:MOD ja] [ADVX:MOD auch] [ADVX:MOD erst] [ADVX:MOD einmal]] [VC [VXINF:OV verkraftet] [VXINF:HD werden]]]

Grammatical functions within an LV-construction are assigned according to the following principle:

• The LV-constituent is licensed by some (pronominal) constituent within the core sentence. The core sentence extends from VF to NF. Therefore, the licensing constituent is considered to be modified by the constituent within the LV-field.

Footnote 1: weil can occur in verb-second and in verb-final clauses. In the first case, it is in the PARORD-field; in the latter case, it belongs to the C-field.

For instance, ON-MOD is licensed by ON as in the example above, which is also in strong accordance with the assumption that the original position of the subject in verb-second clauses is VF.

In constructions with wenn ... dann ..., the wenn-clause, which is semantically a precondition to the dann-clause, is in the LV-field in correlation with dann. Therefore, dann (MOD) refers back to the wenn-clause (MOD-MOD):

[Tree diagram] Tokens (word/POS/morphology): Wenn/KOUS da/ADV was/PIS/*** gebucht/VVPP worden/VAPP ist/VAFIN/3sis ,/$, dann/ADV ist/VAFIN/3sis das/PDS/nsn nicht/PTKNEG in/APPR/d Ordnung/NN/dsf
Structure: [SIMPX [LV [SIMPX:MOD-MOD [C Wenn] [MF [ADVX:MOD da] [NX:ON was]] [VC [VXINF:OV gebucht] [VXINF:OV worden] [VXFIN:HD ist]]]] [VF [ADVX:MOD dann]] [LK [VXFIN:HD ist]] [MF [NX:ON das] [ADVX:MOD nicht] [PX:PRED in [NX:HD Ordnung]]]]
The original figure additionally shows secondary edges labelled refvc and refmod (nodes 503 and 516).

If dann is not present in the matrix clause, the wenn-clause occurs in VF. In this case, the wenn-clause is labelled as MOD because there is no explicit correlating constituent. Rather, it refers to the whole matrix clause, e.g. (Wenn da was gebucht worden ist (MOD), ist das nicht in Ordnung.)
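The treatment of sentence-initial wenn-clauses can be summarised in a few lines. This is a sketch for illustration only; the function name is hypothetical, and the presence of a correlating dann is assumed to be determined beforehand.

```python
from typing import Tuple


def wenn_clause_annotation(matrix_has_dann: bool) -> Tuple[str, str]:
    """Return (topological field, edge label) for a sentence-initial wenn-clause.

    With a correlating "dann" (itself labelled MOD) in the matrix clause, the
    wenn-clause is placed in LV and labelled MOD-MOD; without it, the clause
    occupies VF and is labelled MOD.
    """
    if matrix_has_dann:
        return ("LV", "MOD-MOD")
    return ("VF", "MOD")


if __name__ == "__main__":
    # Wenn da was gebucht worden ist, dann ist das nicht in Ordnung.
    print(wenn_clause_annotation(True))
    # Wenn da was gebucht worden ist, ist das nicht in Ordnung.
    print(wenn_clause_annotation(False))
```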

6.2 Questions

6.2.1 W-Questions

In general, w-questions are verb-second clauses with interrogative pronouns in VF. The problem here is to decide on the syntactic category of the interrogative phrase.

We follow the strategy of assigning PX to all PWAVs which compositionally comprise a preposition, such as wobei, wofür, wogegen, woher, womit, woran, worauf, wovon, wozu, and also to causal PWAVs such as warum, wieso, weshalb. The (non-compositional) PWAVs wann and wo are analysed as ADVX.

[Tree diagram] Tokens (word/POS/morphology): Warum/PWAV machen/VVFIN/1pis wir/PPER/np*1 den/ART/asm Computer/NN/asm nicht/PTKNEG einfach/ADV aus/PTKVZ ?/$.
Structure: [SIMPX [VF [PX:V-MOD Warum]] [LK [VXFIN:HD machen]] [MF [NX:ON wir] [NX:OA den Computer] [ADVX:MOD nicht] [ADVX:MOD einfach]] [VC [PTKVZ:VPT aus]]]
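The category assignment for PWAVs is essentially a lookup. The sketch below lists only the words named in this section; the function name is hypothetical, and any PWAV not listed here is deliberately left undecided rather than guessed.

```python
# Word lists taken from section 6.2.1; anything else is left undecided.
PX_PWAVS = {
    # compositional: contain a preposition
    "wobei", "wofür", "wogegen", "woher", "womit",
    "woran", "worauf", "wovon", "wozu",
    # causal
    "warum", "wieso", "weshalb",
}
ADVX_PWAVS = {"wann", "wo"}


def pwav_category(word: str) -> str:
    """Return the phrase category (PX or ADVX) projected by a PWAV token."""
    key = word.lower()
    if key in PX_PWAVS:
        return "PX"
    if key in ADVX_PWAVS:
        return "ADVX"
    raise KeyError(f"PWAV {word!r} is not covered by the lists in this sketch")


if __name__ == "__main__":
    print(pwav_category("Warum"))  # Warum machen wir den Computer nicht einfach aus?
    print(pwav_category("wo"))
```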


6.2.2 Yes - No Questions

Yes - no questions may occur in various forms, but the most typical form is the verb-first clause:

[Tree diagram] Tokens (word/POS/morphology): Veruntreute/VVFIN/3sit die/ART/nsf AWO/NN/nsf Spendengeld/NN/asn ?/$.
Structure: [SIMPX [LK [VXFIN:HD Veruntreute]] [MF [NX:ON die AWO] [NX:OA Spendengeld]]]

Otherwise, a question mark at the end of a verb-second or verb-final clause indicates that it is actually meant as a question:

[Tree diagram] Tokens (word/POS/morphology): Das/PDS/nsn ist/VAFIN/3sis doch/ADV ganz/ADV klar/ADJD ?/$.
Structure: [SIMPX [VF [NX:ON Das]] [LK [VXFIN:HD ist]] [MF [ADVX:MOD doch] [ADJX:PRED [ADVX ganz] klar]]]

[Tree diagram] Tokens (word/POS/morphology): Ob/KOUS Ampler/NE/nsm auf/APPR/a Sieg/NN/asm fahre/VVFIN/3sks ?/$.
Structure: [SIMPX [C Ob] [MF [NX:ON Ampler] [PX:V-MOD auf [NX:HD Sieg]]] [VC [VXFIN:HD fahre]]]


6.3 Clauses of Comparison

Clauses of comparison with als and wie which are semantically equated with a constituent (e.g. an adverb or an adjective) in the main clause, i.e. where the comparison expresses an identity, are annotated with als and wie as particles of comparison (KOKOM). The subclause can either be a verb-initial clause (e.g. ..., als wäre ...; ..., als hätte ...) or a verb-final clause (e.g. ..., als ob ...; ..., wie wenn ...; ..., als daß ...).

[Tree diagram] Tokens (word/POS/morphology): Das/PDS/nsn klingt/VVFIN/3sis ja/ADV so/ADV ,/$, als/KOKOM hätten/VAFIN/3pkt Sie/PPER/np*3 die/ART/apn Segel/NN/apn vorerst/ADV gestrichen/VVPP ./$.
Structure: [SIMPX [VF [NX:ON Das]] [LK [VXFIN:HD klingt]] [MF [ADVX:MOD ja] [ADVX:PRED so]] [NF [SIMPX:PRED-MOD als [SIMPX:HD [LK [VXFIN:HD hätten]] [MF [NX:ON Sie] [NX:OA die Segel] [ADVX:MOD vorerst]] [VC [VXINF:OV gestrichen]]]]]]

[Tree diagram] Tokens (word/POS/morphology): Tun/VVFIN/3pis heute/ADV aber/ADV so/ADV ,/$, als/KOKOM ob/KOUS sie/PPER/np*3 die/PDS/ap* schon/ADV immer/ADV durchschaut/VVPP haben/VAFIN/3pis ./$.
Structure: [SIMPX [LK [VXFIN:HD Tun]] [MF [ADVX:V-MOD heute] [ADVX:MOD aber] [ADVX:OADVP so]] [NF [SIMPX:OADVP-MO als [SIMPX:HD [C ob] [MF [NX:ON sie] [NX:OA die] [ADVX:MOD schon] [ADVX:MOD immer]] [VC [VXINF:OV durchschaut] [VXFIN:HD haben]]]]]]

In contrast, clauses of comparison expressing a difference are annotated with the subordinating conjunction als (KOUS):

[Tree diagram] Tokens (word/POS/morphology): Zum/APPRART/dsn Glück/NN/dsn ging/VVFIN/3sit es/PPER/nsn3 dann/ADV schneller/ADJD ,/$, als/KOUS ich/PPER/ns*1 eigentlich/ADV erwartet/VVPP hatte/VAFIN/1sit ./$.
Structure: [SIMPX [VF [PX:MOD Zum [NX:HD Glück]]] [LK [VXFIN:HD ging]] [MF [NX:ON es] [ADVX:MOD dann] [ADJX:OADJP schneller]] [NF [SIMPX:OADJP-MO [C als] [MF [NX:ON ich] [ADVX:MOD eigentlich]] [VC [VXINF:OV erwartet] [VXFIN:HD hatte]]]]]
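The tagging contrast in this section depends on a single semantic decision. The following sketch makes that explicit; the function name and the boolean argument are hypothetical, and the decision whether a comparison expresses identity or difference is assumed to be made by the annotator.

```python
def comparison_introducer_tag(expresses_identity: bool) -> str:
    """Return the STTS tag for als/wie introducing a clause of comparison.

    Identity (e.g. "so, als hätten Sie ...") -> particle of comparison KOKOM.
    Difference (e.g. "schneller, als ich ... erwartet hatte") -> conjunction KOUS.
    """
    return "KOKOM" if expresses_identity else "KOUS"


if __name__ == "__main__":
    print(comparison_introducer_tag(True))   # Das klingt ja so, als hätten Sie ...
    print(comparison_introducer_tag(False))  # ... schneller, als ich erwartet hatte.
```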

6.4 Relative Clauses

In relative clauses (R-SIMPX), the relative pronoun occurs in the C-field. It is first projected to the phrase level before it is attached to the C node. The relative clause itself is located in NF, as in the following example, if no other constituent follows. Its edge label shows to which constituent of the matrix clause it is related. OA-MOD, for example, indicates that the relative clause refers to OA:

[Tree diagram] Tokens (word/POS/morphology): Aber/KON es/PPER/nsn3 gäbe/VVFIN/3skt intelligente/ADJA/apf Lösungen/NN/apf ,/$, die/PRELS/np* kein/PIAT/asn Geld/NN/asn kosten/VVFIN/3pis ./$.
Structure: [SIMPX [KOORD Aber] [VF [NX:ON es]] [LK [VXFIN:HD gäbe]] [MF [NX:OA [ADJX intelligente] Lösungen]] [NF [R-SIMPX:OA-MOD [C [NX:ON die]] [MF [NX:OA kein Geld]] [VC [VXFIN:HD kosten]]]]]

If the head noun phrase of the relative clause is the noun phrase of a prepositional phrase or a postmodifier within a complex phrase, the relative clause is labelled as MOD. Additionally, there is a secondary edge labelled refint (cf. 3.4.6) from the head noun NX to the relative clause:

[Tree diagram] Tokens (word/POS/morphology): Ein/ART/nsm Bettenrost/NN/nsm mutiert/VVFIN/3sis zu/APPR/d einem/ART/dsn Gefängnisgitter/NN/dsn ,/$, hinter/APPR/d dem/PRELS/dsn freier/ADJA/nsm Himmel/NN/nsm lockt/VVFIN/3sis ./$.
Structure: [SIMPX [VF [NX:ON Ein Bettenrost]] [LK [VXFIN:HD mutiert]] [MF [PX:OPP zu [NX:HD einem Gefängnisgitter]]] [NF [R-SIMPX:MOD [C [PX:V-MOD hinter [NX:HD dem]]] [MF [NX:ON [ADJX freier] Himmel]] [VC [VXFIN:HD lockt]]]]]
The original figure additionally shows a secondary edge labelled refint (node 515).

The position of the relative clause in NF is justified by the fact that it does not necessarily occur as an immediate constituent located to the right of the noun phrase to which it refers. For example, a verb complex can occur between the noun phrase and the relative clause (Der Bettenrost ist zu einem Gefängnisgitter mutiert, hinter dem freier Himmel lockt.). In sentences like this, the complexity of the noun phrase (NP + relative clause) is important. This so-called heaviness follows Behaghel's first physical law (Behaghel 1932): complex noun phrases tend to find a position at the end of the sentence even if they deviate from their basic order. If the relative clause does not follow the noun phrase immediately, its unmarked position is in NF. Unless there is strong evidence for a position in MF, the relative clause is located in NF.

If the relative clause and its head noun phrase are adjacent constituents in VF or MF, the relative clause modifies the noun phrase directly as a postmodifier.

[Tree diagram] Tokens (word/POS/morphology): die/PRELS/nsf die/ART/asf AWO/NN/asf ,/$, wo/PWAV er/PPER/nsm3 Kreisvorsitzender/NN/nsm ist/VAFIN/3sis ,/$, prüfte/VVFIN/3sit
Structure: [R-SIMPX [C [NX:ON die]] [MF [NX:OA [NX:HD die AWO] [R-SIMPX [C [ADVX:V-MOD wo]] [MF [NX:ON er] [NX:PRED Kreisvorsitzender]] [VC [VXFIN:HD ist]]]]] [VC [VXFIN:HD prüfte]]]


6.4.1 Event-modifying Relative Clauses

Relative clauses that modify an event which is not expressed by a nominal expression are also annotated as R-SIMPX.

[Tree diagram] Tokens (word/POS/morphology): ... haben/VAFIN/3pis den/ART/asm Text/NN/asm ins/APPRART/asn Niederdeutsche/NN/asn übertragen/VVPP ,/$, was/PRELS/nsn mit/APPR/d jeder/PIDAT/dsf Szene/NN/dsf mehr/PIAT/*** Sinn/NN/asm macht/VVFIN/3sis ./$.
Structure: [SIMPX [LK [VXFIN:HD haben]] [MF [NX:OA den Text] [PX:FOPP ins [NX:HD Niederdeutsche]]] [VC [VXINF:OV übertragen]] [NF [R-SIMPX:MOD [C [NX:ON was]] [MF [PX:V-MOD mit [NX:HD jeder Szene]] [NX:OA mehr Sinn]] [VC [VXFIN:HD macht]]]]]

6.4.2 Independent Relative Clauses

Independent relative clauses (also 'nominal relative clauses', in German 'Freie Relativsätze') do not modify a head word but substitute an argument or adjunct in the clause. Consequently, they are labelled SIMPX on the sentential level (instead of R-SIMPX) and they function as (sentential) subject (ON) or sentential object (OS). The latter is not uncontroversial since they are distributed like non-sentential, nominal arguments with respect to subcategorization restrictions.

The relative pronoun used in independent relative clauses normally belongs to the w-class of relative pronouns such as wer or was and is tagged with the STTS tag PWS.

[Tree diagram] Tokens (word/POS/morphology): Wer/PWS/ns* einfach/ADV gut/ADJD unterhalten/VVPP werden/VAINF will/VMFIN/3sis ,/$, kommt/VVFIN/3sis auf/APPR/a seine/PPOSAT/ap* Kosten/NN/ap* ./$.
Structure: [SIMPX [VF [SIMPX:ON [C [NX:ON Wer]] [MF [ADVX:MOD einfach] [ADJX:V-MOD gut]] [VC [VXINF:OV unterhalten] [VXINF:OV werden] [VXFIN:HD will]]]] [LK [VXFIN:HD kommt]] [MF [PX:OPP auf [NX:HD seine Kosten]]]]
The original figure additionally shows a secondary edge labelled refvc (node 503).


[Tree diagram] Tokens (word/POS/morphology): Manchmal/ADV muß/VMFIN/3sis man/PIS/ns* klar/ADJD sagen/VVINF ,/$, was/PWS/asn man/PIS/ns* will/VMFIN/3sis ./$.
Structure: [SIMPX [VF [ADVX:MOD Manchmal]] [LK [VXFIN:HD muß]] [MF [NX:ON man] [ADJX:V-MOD klar]] [VC [VXINF:OV sagen]] [NF [SIMPX:OS [C [NX:OA was]] [MF [NX:ON man]] [VC [VXFIN:HD will]]]]]

Independent relative clauses introduced by wie are currently annotated in a different manner. Wie is analysed as a subordinating conjunction (KOUS).

6.5 Coordination

Coordination is a syntactic phenomenon that occurs on the following annotation levels: phrase level, field level, and sentence level. Within coordinations, the conjuncts are first projected to their phrase, field, or clause level. In a second step, they are attached to their mother node, which is n-ary branching (with the conjunctions between the conjuncts). This scheme is the same for all syntactic categories.

The edges between the mother node and the conjuncts of the coordination are labelled KONJ. This edge label supports the distinction between conjuncts, modifiers, and conjunctions within complex conjunctions (cf. 6.5.3), as well as the distinction between coordinations and elliptical constructions (cf. 6.6).

In contrast to coordinating conjunctions in the KOORD-field, coordinating conjunctions in coordinations (und, oder, etc.) are directly attached to the mother node of the conjuncts. The class of coordinating conjunctions consists of single conjunctions, e.g. und, oder, aber, als, as well as of complex conjunctions, e.g. entweder oder, weder noch, sowohl als. Generally, coordinating conjunctions may coordinate constituents of any category. Moreover, they can form asymmetric coordinations in which the conjuncts belong to different syntactic categories (cf. 6.5.2; see footnote 2). In order to distinguish conjunctions from conjuncts within a coordination, their edge labels are empty.

In the following, coordination on all annotation levels as well as specific cases of coordination, e.g. split coordinations, will be demonstrated.

Footnote 2: If bis is used as a conjunction, as in 10.000 bis (KON) 20.000 koreanischen Daewoo PKW, it is tagged as KON. But remember that von ... bis ... phrases are treated differently (cf. 4.4.1).

6.5.1 Coordination of Phrases

Noun Phrases

[Tree diagram: Ende/NN der/ART Kämpfe/NN und/KON Verurteilung/NN der/ART Selbstmandatierung/NN]

Prepositional Phrases

[Tree diagram: am/APPRART Arbeitsplatz/NN oder/KON in/APPR der/ART Familie/NN]

Adjectival Phrases

[Tree diagram: Heimliche/ADJA und/KON illegale/ADJA Pioniertat/NN]


[Tree diagram: Das/PDS klingt/VVFIN anmaßend/ADJD ,/$, pathosschwer/ADJD ,/$, elektrisch/ADJD ,/$, laut/ADJD ./$.]

Adverbial Phrases

[Tree diagram: solo/ADV oder/KON zusammen/ADV]

6.5.2 Asymmetric Coordination

Since constituents of different syntactic categories can be coordinated, a label has to be chosen for the mother node of the coordination. In this case, the default strategy is to choose the syntactic category of the left-most conjunct as the category of the entire coordination:

[Tree diagram: heute/ADV ,/$, So./NN ,/$, Mo./NN ,/$, u./KON Di./NN]


[Tree diagram: Die/ART Farbpalette/NN ist/VAFIN zart/ADJD und/KON geschmackvoll/ADJD ,/$, von/APPR Bordeaux/NN bis/APPR Flieder/NN ,/$, auch/ADV Orange/NN oder/KON Rosa/NN ./$.]

[Tree diagram: Ich/PPER bin/VAFIN weder/KON ausgesprochene/ADJA Meat-Loaf-FanIn/NN noch/KON in/APPR dem/ART Konzert/NN gewesen/VAPP ./$.]
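The default strategy for asymmetric coordination can be stated as a one-line rule. The following is a minimal sketch; the function names are hypothetical and chosen only for this illustration.

```python
from typing import Sequence


def coordination_category(conjunct_categories: Sequence[str]) -> str:
    """Return the category of the mother node: that of the left-most conjunct."""
    if not conjunct_categories:
        raise ValueError("a coordination needs at least one conjunct")
    return conjunct_categories[0]


def is_asymmetric(conjunct_categories: Sequence[str]) -> bool:
    """True if the conjuncts do not all share one syntactic category."""
    return len(set(conjunct_categories)) > 1


# "heute, So., Mo., u. Di.": one ADVX conjunct followed by three NX conjuncts.
example = ["ADVX", "NX", "NX", "NX"]
print(coordination_category(example), is_asymmetric(example))
```

For the weder ... noch example above, the conjunct categories NX and PX would accordingly yield NX for the whole coordination.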

6.5.3 Coordinations with Complex Conjunctions

The conjuncts and conjunctions of a coordination with complex conjunctions are also attached on the same level, following the above-mentioned rules for coordination. Both parts of complex conjunctions like entweder oder and sowohl als are tagged as KON. The latter usually occurs together with the adverb auch, which is tagged as ADV, projected to the phrase level, and then attached to the mother node of the coordination. The same applies to nicht in coordinations with sondern. Sondern is tagged as KON, whereas nicht is always tagged as PTKNEG:


[Tree diagram: Immerhin/ADV wird/VAFIN es/PPER nicht/PTKNEG noch/ADV obendrein/ADV kalt/ADJD ,/$, sondern/KON bei/APPR 20/CARD Grad/NN erträglich/ADJD ./$.]

[Tree diagram: Der/ART Papst-Besuch/NN in/APPR Bukarest/NE spielt/VVFIN sowohl/KON außenpolitisch/ADJD als/KON auch/ADV für/APPR Rumänien/NE selbst/ADV eine/ART große/ADJA Rolle/NN ./$.]

[Tree diagram: Entweder/KON haben/VAFIN die/PDS überhaupt/ADV nicht/PTKNEG begriffen/VVPP ,/$, worum/PWAV es/PPER geht/VVFIN ,/$, oder/KON es/PPER ist/VAFIN ihnen/PPER egal/ADJD ./$.]

6.5.4 Coordinations with Truncated Words

Truncated words are projected to the phrase level. Their edge labels are empty. The phrases of both conjuncts are coordinated. The truncated words do not receive morphological annotation.


[Tree diagram: Bau-/TRUNC und/KON Verkehrsplanungen/NN]

[Tree diagram: bei/APPR einer/ART SPD-/TRUNC oder/KON einer/ART CDU-Veranstaltung/NN]

[Tree diagram: Noch-Frauen-/TRUNC und/KON bald/ADV Fußballsender/NN]

In the case of complex conjunctions, the conjuncts are annotated in the same way.

[Tree diagram: sowohl/KON kultur-/TRUNC als/KON auch/ADV stadtentwicklungspolitisch/ADJD]

[Tree diagram: nicht/PTKNEG die/ART Sozial-/TRUNC ,/$, sondern/KON die/ART Bildungsbehörde/NN]


Word-initial TRUNCs are different from truncated words which include the second part of a word. The latter are treated like complete lexical heads, because they comprise the head morpheme of the complex word.

[Tree diagram: Originaltitel/NN und/KON -fassung/NN von/APPR "/$( 8/CARD MM/NN "/$( ./$.]

6.5.5 Attachment Principles of Coordination within Phrases

If two or more nominal conjuncts occur together with a common determiner and/or adjectival phrase, the conjuncts are first projected to their phrase level, and then the determiner or the adjectival phrase is attached to the coordination on a higher level according to the high attachment principle. Thus, the modification scope comprises the entire coordination. The coordinated part is assigned the head function.

[Tree diagram: den/ART Angestellten/NN und/KON Beamten/NN]

[Tree diagram: Die/ART türkischen/ADJA Instrumente/NN und/KON Harmonien/NN]


6.5.6 Coordination of Topological Fields

The conjuncts of a coordination of topological fields are either single fields (cf. 6.5.4) or a combination of fields. Possible combinations are, for instance, (MF + VC), (LK + MF), (LK + MF + VC). The node label for these conjuncts is FKONJ (conjunct consisting of fields) and the mother node of a coordination of conjuncts of fields is FKOORD.

In a coordination of conjuncts of fields, the following annotation steps are involved:

1. The constituents are attached to the fields in which they occur (MF, VC, NF, etc.).

2. Each conjunct (concatenation of fields or single field) is labelled as FKONJ.

3. The conjuncts are attached to the general coordination field FKOORD.
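The three steps above can be sketched as follows. The dict-based node representation, the helper names, and "-" as a stand-in for an empty edge label are assumptions made for this illustration; the sketch simply follows the steps as worded and is not the annotation tool's actual data model.

```python
def node(label, children=(), edge="-"):
    """Build a tree node as a plain dict; "-" stands for an empty edge label."""
    return {"label": label, "edge": edge, "children": list(children)}


def coordinate_fields(parts):
    """Build an FKOORD node.

    Each item of `parts` is either a conjunction node (a dict) or a list of
    field nodes that together form one conjunct.
    """
    children = []
    for part in parts:
        if isinstance(part, dict):
            children.append(part)  # conjunction: attached directly, empty edge label
        else:
            children.append(node("FKONJ", part, edge="KONJ"))  # step 2
    return node("FKOORD", children)  # step 3


# Step 1: constituents are attached to the fields in which they occur.
first = [node("LK", [node("VXFIN", edge="HD")]), node("MF", [node("NX", edge="ON")])]
second = [
    node("LK", [node("VXFIN", edge="HD")]),
    node("MF", [node("NX", edge="OD"), node("NX", edge="OA")]),
]
fkoord = coordinate_fields([first, node("KON"), second])
print([(child["label"], child["edge"]) for child in fkoord["children"]])
```

The example mirrors a coordination of two (LK + MF) conjuncts in which the subject occurs only in the first conjunct.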

[Tree diagram: Wir/PPER glauben/VVFIN an/APPR die/ART totale/ADJA Gegenwart/NN und/KON tun/VVFIN hier/ADV und/KON jetzt/ADV alles/PIS ,/$, was/PRELS wir/PPER können/VMFIN ./$.]

Often, the subject of the sentence occurs only in the left field conjunct:

[Tree diagram: "/$( Immer/ADV kommt/VVFIN einer/PIS und/KON stiehlt/VVFIN mir/PPER meine/PPOSAT Krise/NN "/$( ./$.]

A coordination of fields may also be an embedded structure. In this case, FKOORD also functions as a conjunct label:


[Tree diagram: Die/ART Älteren/NN sind/VAFIN teurer/ADJD ,/$, haben/VAFIN familiäre/ADJA Verpflichtungen/NN und/KON oft/ADV ein/ART Haus/NN abzuzahlen/VVIZU ./$.]

6.5.7 Attachment of Ambiguous Modifiers in Coordination

Within phrases, the modification scope of a premodifier can be ambiguous. Therefore, high attachment is applied to preserve ambiguity. In the following example, the adverb modifies the coordination of adjectives rather than only the first adjective:

[Tree diagram: Viel/ADV größer/ADJD und/KON brutaler/ADJD]

Modifying constituents are attached to a conjunct rather than to a field if their modification scope is limited to the conjunct.


[Tree diagram: Wir/PPER glauben/VVFIN nicht/PTKNEG an/APPR die/ART Vergangenheit/NN und/KON nicht/PTKNEG an/APPR die/ART Zukunft/NN ./$.]

In coordinations with complex conjunctions, too, attachment on the phrase level is applied if possible.

[Tree diagram: Radunskis/NE Sprecher/NN ,/$, Axel/NE Wallrabenstein/NE ,/$, wollte/VMFIN die/ART Entscheidung/NN weder/KON bestätigen/VVINF noch/KON dementieren/VVINF ./$.]

[Tree diagram: Die/ART Lufthansa/NE hatte/VAFIN nicht/PTKNEG etwa/ADV unsere/PPOSAT Lektüre/NN ,/$, sondern/KON gleich/ADV das/ART ganze/ADJA Flugzeug/NN rationiert/VVPP ./$.]

If there is more than one constituent within a conjunct, each with its own grammatical function, these constituents are first attached to the respective field node. Then, the fields are coordinated:


[Tree diagram: Immerhin/ADV wird/VAFIN es/PPER nicht/PTKNEG noch/ADV obendrein/ADV kalt/ADJD ,/$, sondern/KON bei/APPR 20/CARD Grad/NN erträglich/ADJD ./$.]

6.5.8 Coordination of Sentences

In accordance with the longest match principle, complete sentences are coordinated as paratactic constructions when they belong to the same syntactic unit (cf. 3.4.3), i.e., they are coordinated by a conjunction, a comma, or a dash:

[Tree diagram: Nashorn-Bürgermeister/NN Henning/NE Storchbein/NE setzt/VVFIN auf/APPR die/ART Sammlung/NN der/ART positiven/ADJA Kräfte/NN und/KON prompt/ADJD wird/VAFIN da/ADV gepöbelt/VVPP ./$.]


[Tree diagram: So/ADV aber/ADV blieb/VVFIN alles/PIS beim/APPRART alten/NN (punctuation token not recovered)/$( nur/ADV der/ART Innensenator/NN ist/VAFIN ein/ART neuer/ADJA ./$.]

A coordination may also consist of two sentences with the subject of the whole construction occurring only in the left conjunct of the coordination.

[Tree diagram: Ohne/KOUS sie/PPER gesehen/VVPP zu/PTKZU haben/VAINF ?/$. ,/$, kontert/VVFIN der/ART Popstar/NN und/KON wirft/VVFIN einen/ART Blick/NN nach/APPR rechts/ADV ./$.]

Subclauses (either in VF or in NF) with or even without a conjunction can also be coordinated.

[Tree diagram: Unklar/ADJD ist/VAFIN aber/ADV noch/ADV ,/$, wie/KOUS diese/PDAT Leistung/NN beurteilt/VVPP werden/VAINF kann/VMFIN (punctuation token not recovered)/$( und/KON wer/PWS dafür/PROP zuständig/ADJD sein/VAINF soll/VMFIN ./$.]


6.5.9 Paratactic Constructions

Paratactic constructions consisting of verb-second clauses conjoined by the conjunctions denn and weil, which also occur in the PARORD-field at the beginning of a sentence, are treated as syntactically equivalent conjuncts (verb-second instead of verb-final in the weil-clause). In order to distinguish coordination of sentences with a conjunct of the PARORD field from the above-mentioned coordinations of sentences, these paratactic constructions are labelled as P-SIMPX instead of SIMPX.

[Tree diagram: Kilos/NN und/KON Fitneß/NN sollen/VMFIN stimmen/VVINF ,/$, denn/KON Wesemann/NE will/VMFIN die/ART Tour/NE de/NE France/NE gewinnen/VVINF ./$.]

Syntactically coordinated but semantically subordinated main clauses with adversative conjunctive adverbs (e.g. doch) are also annotated as paratactic constructions:

[Tree diagram: Für/APPR sie/PPER ist/VAFIN Berlin/NE bereits/ADV ein/ART Gründermekka/NN ,/$, liegt/VVFIN es/PPER doch/ADV bei/APPR den/ART Gewerbeanmeldungen/NN im/APPRART Bundesdurchschnitt/NN vorne/ADV ./$.]

6.5.10 Conjunctions Occurring with Isolated Phrases

If a conjunct occurs in isolation with a conjunction, high attachment is applied as in complete coordinations. For isolated conjuncts, however, the conjunct is annotated as the head of the construction (HD instead of KONJ).


[Tree diagram: und/KON jetzt/ADV]

[Tree diagram: Oder/KON eben/ADV nicht/PTKNEG ./$.]
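The difference between an isolated conjunct and the conjuncts of a complete coordination comes down to the edge label. A minimal sketch, with a hypothetical function name and the number of conjuncts as the only input:

```python
def conjunct_edge_label(conjunct_count: int) -> str:
    """Edge label for a conjunct: HD if it is isolated, KONJ in a complete coordination."""
    if conjunct_count < 1:
        raise ValueError("at least one conjunct is required")
    return "HD" if conjunct_count == 1 else "KONJ"


# "und jetzt" has a single conjunct; "solo oder zusammen" has two.
print(conjunct_edge_label(1), conjunct_edge_label(2))
```

In both cases the conjunction itself is attached on the same level and keeps an empty edge label.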

If there are modifiers which do not modify the conjunct itself, because they are ambiguous or might modify something other than the conjunct, they are attached on the same (high) level as the conjunction:

[Tree diagram: Und/KON das/PDS auch/ADV noch/ADV ohne/APPR Mehrvergütung/NN ./$.]

[Tree diagram: und/KON damit/PROP auch/ADV die/ART Nervosität/NN im/APPRART Nato-Hauptquartier/NN ./$.]


6.5.11 Split Coordinations

Closely related to isolated conjuncts are split coordinations. Generally, the left conjunct of a split coordination is located in MF, in rare cases in VF, and the right conjunct occurs in NF. In order to express the relation between them, the left conjunct carries the label of its grammatical function (ON, OA, OD, etc.) whereas the right conjunct carries a label that denotes that it is the conjunct of this grammatical function (e.g. ONK, OAK, ODK). In asymmetric coordination, the syntactic category of the second split conjunct determines the syntactic category one level higher up:

[Tree diagram: Jedes/PIDAT Ja-Wort/NN zieht/VVFIN Applaus/NN nach/APPR sich/PRF ,/$, Unterschriften/NN ,/$, Küsse/NN ,/$, Händeschütteln/NN ./$.]

[Tree diagram: Selbstverständlich/ADJD hat/VAFIN nicht/PTKNEG Karin/NE Jöns/NE die/ART Flächen/NN gebucht/VVPP ,/$, sondern/KON die/ART SPD/NE ./$.]

[Tree diagram: Lausbuben/NN sind/VAFIN und/KON bleiben/VVFIN sie/PPER und/KON unwiderstehlich/ADJD ./$.]
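The labelling of the two parts of a split coordination can be sketched as a small helper. The function name is hypothetical. The text lists ONK, OAK, and ODK; treating the suffix K as a general rule for other function labels is an assumption of this sketch.

```python
from typing import Tuple


def split_coordination_labels(function_label: str) -> Tuple[str, str]:
    """Return the edge labels of the left conjunct and of the right conjunct in NF."""
    # Assumption: the right conjunct's label is the function label plus "K".
    return function_label, function_label + "K"


# "Jedes Ja-Wort zieht Applaus nach sich, Unterschriften, Küsse, Händeschütteln."
left, right = split_coordination_labels("OA")
print(left, right)
```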


6.6 Elliptical Constructions

In elliptical constructions, syntactically necessary linguistic elements are missing which can be reconstructed from the context or the speech situation. Elliptical constructions appear on the phrase level as well as on the sentence level.

The model of topological fields does not make any assumptions about dependency relations, but it allows topological fields to be left empty. For the description of elliptical sentence constructions, the scheme of topological fields is an appropriate model because neither crossing branches nor traces have to be used to annotate the surface structure of a sentence.

In elliptical phrases, the head word is missing. They are annotated like phrases without a head. Therefore, the edge labels of an elliptical phrase are empty:

[Tree diagram: die/ART irischen/ADJA Hinweisschilder/NN seien/VAFIN den/ART walisischen/ADJA ziemlich/ADV ähnlich/ADJD]

[Tree diagram: in/APPR und/KON um/APPR Berlin/NE]

[Tree diagram: vom/APPRART 15./ADJA 4./ADJA 99/CARD]


[Tree diagram: Die/ART FDP/NE hat/VAFIN in/APPR Thüringen/NE knapp/ADJD 4.000/CARD ,/$, in/APPR Sachsen/NE 3.300/CARD und/KON in/APPR Brandenburg/NE 2.000/CARD Mitglieder/NN]

[Tree diagram: Ob/KOUS der/ART Senat/NN nun/ADV weniger/PIAT Straßen/NN reinigen/VVINF lassen/VVINF oder/KON flächendeckend/ADJD Parkgebühren/NN kassieren/VVINF will/VMFIN]

In elliptical sentence constructions, specific topological fields are not occupied. All constituents are attached to the appropriate field. In the first example, LK in the second conjunct is missing. In the second example, the subject is in NF and the main clause is lacking a verbal constituent:

[Tree diagram: Der/ART Fall/NN ist/VAFIN brisant/ADJD ,/$, die/ART Mischung/NN explosiv/ADJD ./$.]


[Tree diagram: Fein/ADJD ,/$, daß/KOUS sich/PRF die/ART Achse/NN Hamburg-Köln/NN langsam/ADJD festigt/VVFIN ./$.]


Chapter 7

The Annotation of Specific Syntactic Phenomena

7.1 Superlative and Comparative Forms

7.1.1 Superlative Forms

The particle am, which occurs with an adjective or an adverb in superlative constructions, is tagged as PTKA. Both the particle and the adjective/adverb are attached on the same level, forming an adverbial/adjectival phrase:

[Tree diagram: Den/ART vorigen/ADJA Sonntag/NN hätte/VAFIN Frank/NE Michael/NE Nehr/NE am/PTKA liebsten/ADJD aus/APPR dem/ART Kalender/NN gestrichen/VVPP ./$.]

7.1.2 The Comparative Particles wie and als

Comparative particles in German are als and wie, in rare cases also denn (e.g. Die werden dort seliger schlummern denn je.). These particles are tagged as KOKOM and occur with all types of syntactic phrases (NX, ADVX, PX, etc.). They are directly attached to an adjacent comparative phrase. In the case of a comparative phrase with a postmodifier, they are directly attached to the highest node of the complex phrase.

A comparative phrase can occur as an adjacent postmodifier of the head phrase:


[Tree diagram: Rein/ADV musikalisch/ADJD gesehen/VVPP ist/VAFIN das/ART Album/NN wesentlich/ADJD schlanker/ADJD als/KOKOM das/ART erste/ADJA ./$.]

[Tree diagram: daß/KOUS sie/PPER als/KOKOM ehrenamtliche/ADJA Vorsitzende/NN ein/ART dienstliches/ADJA Handy/NN hat/VAFIN]

If there is a long-distance dependency between the comparative phrase and the head phrase, the dependency relation is denoted with the respective X-MOD label.

[Tree diagram: der/PRELS fünfmal/ADV mehr/PIS nach/APPR Bremerhaven/NE liefert/VVFIN als/KOKOM Daewoo/NE]

In case of a long-distance dependency between the comparative phrase and the main verb (cf. 4.7.9), the comparative phrase is either a complement (e.g. PRED) or an ambiguous or unambiguous modifier of the main verb (MOD or V-MOD).


[Tree diagram: Unter/APPR dem/ART Motto/NN "/$( Kino-Extrem/NN "/$( agiert/VVFIN der/ART Regisseur/NN als/KOKOM Filmjockey/NN]

[Tree diagram: Wie/KOKOM in/APPR den/ART meisten/PIDAT Musicals/NN ist/VAFIN die/ART Handlung/NN simpel/ADJD]

[Tree diagram: In/APPR Arsten/NE wird/VAFIN die/ART neue/ADJA Strecke/NN ,/$, wie/KOUS geplant/VVPP ,/$, nachgearbeitet/VVPP und/KON begrünt/VVPP ./$.]

The high attachment principle applies when the comparative particle has scope over a coordination of phrases (cf. 6.5.5). In this case, the two conjuncts are coordinated first. Then the particle is attached on a higher level.


[Tree diagram: wie/KOKOM Pete/NE Sampras/NE oder/KON Yewgeni/NE Kafelnikov/NE. Node labels shown: two NX conjuncts with edge label KONJ, an NX with edge label HD, and the topmost NX.]

7.2 Verbal and Adjectival Use of Participles

In German, verbal participles which are passive verb forms (Der Mensch wird angesehen) can be used as adjectives: such a participle can function either as an attributive adjective (der angesehene Mensch) or, depending on the context, as a predicative adjective (der Mensch ist angesehen). In contrast to the auxiliary werden in verbal passives, the auxiliary sein is used in constructions with adjectival passives. Concerning the problematic distinction between verbal and adjectival passives, we adapted the criteria in the Stuttgart-Tübingen tag set (STTS) (Schiller et al. 1995).[1]

1. Can the sentence be transformed into active form keeping the same semantics? If yes → VVPP

2. Is there a von-PP or an equivalent PP that gives evidence for verb semantics? If yes → VVPP

3. Is it possible to substitute the word in question by a semantically similar adjective? If yes → ADJD
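The three tests can be read as a small decision procedure. The Python sketch below is only an illustration of that reading: the function name, its boolean parameters, and the order in which the tests are consulted are assumptions made here, and the answers to the tests still have to come from an annotator's judgement.

```python
from typing import Optional


def tag_participle(active_paraphrase_ok: bool,
                   has_agentive_pp: bool,
                   adjective_substitutable: bool) -> Optional[str]:
    """Return the STTS tag suggested by tests 1-3, or None if no test applies.

    Hypothetical helper: the three flags stand for an annotator's answers;
    consulting the verbal tests first is an assumption of this sketch.
    """
    if active_paraphrase_ok:       # test 1: active paraphrase keeps the meaning
        return "VVPP"
    if has_agentive_pp:            # test 2: von-PP or equivalent PP is present
        return "VVPP"
    if adjective_substitutable:    # test 3: a similar adjective can replace it
        return "ADJD"
    return None                    # undecided, left to the annotator


# "Der Mensch wird angesehen": an active paraphrase is available
print(tag_participle(True, False, False))
# "der Mensch ist angesehen": only the adjective substitution succeeds
print(tag_participle(False, False, True))
```

Returning None when no test fires keeps the sketch from suggesting a tag that the criteria do not license.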

The following two tree structures show the annotation of the verbal and adjectival passives of the verbal participle angesehen. In the first example, the verbal participle is analysed as a VVPP in VC. In the second example, the verbal participle has an adjectival reading and is annotated as an ADJD in MF.

[1] Concerning the differences between verbal and adjectival passives in English cf. Bresnan (1995).


[Tree diagram: daß/KOUS auch/ADV ein/ART anderes/ADJA Argument/NN als/KOKOM gültig/ADJD angesehen/VVPP wurde/VAFIN. Fields: C, MF, VC; root: SIMPX. Grammatical functions shown: MOD, ON, PRED, OV. The participle angesehen is tagged VVPP.]

[Tree diagram: Es/PPER ist/VAFIN schade/ADJD ,/$, daß/KOUS akademische/ADJA Leistungen/NN so/ADV wenig/ADV angesehen/ADJD sind/VAFIN. Fields of the matrix clause: VF, LK, MF, NF; root: SIMPX. The embedded SIMPX (fields C, MF, VC) bears the function ON. Other grammatical functions shown: ON-MOD, PRED, ON. The participle angesehen is tagged ADJD.]

7.3 Topicalization

Topicalization is almost exclusively found in verb-second clauses. Consequently, the subject is not in the first position of the clause. Topicalized constructions bring about word order phenomena which differ from those occurring in MF, e.g., non-finite parts of VC are not allowed in MF.

Our annotation principles require that the topicalized verb complex and its non-finite parts be analysed as VC in the first position of the clause. VC is then attached to VF. If a part of MF is topicalized along with VC, MF and VC are first combined to form FKONJ before they are attached to VF:


[Tree diagram: Geplant/VVPP war/VAFIN der/ART Papst-Besuch/NN seit/APPR langem/NN ./$. Fields: VF (containing VC), LK, MF; root: SIMPX. Grammatical functions shown: OV, ON, V-MOD.]

[Tree diagram: Auf/APPR Pioneer/NE aufmerksam/ADJD geworden/VAPP war/VAFIN der/ART BUND/NE durch/APPR Informationen/NN französischer/ADJA Bauern/NN ./$. Fields: VF (containing FKONJ, which combines MF and VC), LK, MF; root: SIMPX. Grammatical functions shown: FOPP, PRED, OV, ON, V-MOD.]

7.4 Headlines

The syntax of headlines differs from other syntactic constructions in so far as headlines[2] often lack the finite verb or any verb at all. If a headline has only an infinitive, the case assignment follows the preference principle formulated in 5.2. Therefore, we assume in general the more plausible grammatical function in each case: a passive construction with ON in MF if the verb in VC is a past participle and an active construction with OA in MF if the verb in VC is an infinitive.

[2] The identifier “HEADLINE” is inserted into the comment line above the sentence for each syntactic unit which is marked as a headline in the original data.


[Tree diagram: 20/CARD Dissidenten/NN in/APPR China/NE festgenommen/VVPP. Fields: MF, VC; root: SIMPX. Grammatical functions shown: ON, V-MOD.]

[Tree diagram: WBM-Chefs/NN ablösen/VVINF. Fields: MF, VC; root: SIMPX. Grammatical function shown: OA.]

A headline can also consist of an elliptical sentence (cf. 6.6):

[Tree diagram: Welthandelsorganisationen/NN vollkommen/ADJD kopflos/ADJD. Fields: VF, MF; root: SIMPX. Grammatical functions shown: ON, PRED.]

Headlines can also consist of more than one syntactic structure, for instance, separated by a colon or a dash (cf. 4.7.2 and 5.2):


[Tree diagram: Rechtschreibreform/NN :/$. Gegner/NN klagen/VVFIN. Two structures: an NX for Rechtschreibreform and a SIMPX (fields VF, LK) for Gegner klagen. Grammatical function shown: ON.]

7.5 Discourse Markers

Generally, discourse markers are expressions or phrases of greeting, apologizing, thanking, short emotional utterances, and interjections. Their node label is DM. The edge label of a discourse marker is empty, i.e., it does not have a head. Typical discourse markers are: ja, nein, hallo, oh, aha, pst, nunja, gewiß, toll, nun ja, etc.

In most cases, discourse markers occur as isolated expressions. Interjections, tagged as ITJ, are directly projected to DM without internal structure. The same applies to answer particles (PTKANT):

[Tree diagram: Oh/ITJ, projected directly to DM.]

[Tree diagram: ja/PTKANT, projected directly to DM.]

Phrases which function as discourse markers are first projected to their phrase level before they are assigned the node label DM.

[Tree diagram: gewiß/ADV, projected to ADVX (edge label HD) and then to DM.]


[Tree diagram: Keine/PIAT Ahnung/NN, projected to NX and then to DM.]

[Tree diagram: Liebe/ADJA tazzen/NN, with an ADJX inside the NX, which is then projected to DM.]

Isolated conjunctions and foreign language discourse markers are tagged according to their part of speech (KON and FM) and are projected to DM:

[Tree diagram: Und/KON, projected to DM.]

[Tree diagram: pardon/FM, projected to DM.]

Discourse markers may also consist of an interjection or an answer particle and a phrase:

[Tree diagram: Nun/ADV ja/PTKANT ./$. Node labels shown: ADVX (for Nun) and DM.]

In some cases, discourse markers have a grammatical function within a phrase or a clause. Therefore, they are attached to the syntactic structure:


[Tree diagram: mit/APPR den/ART Worten/NN "/$( aha/ITJ ,/$, aha/ITJ ,/$, aha/ITJ. Node labels shown: NX (APP), DM (APP), NX (HD), PX.]

[Tree diagram: Welt/NN (/$( Oh/ITJ ,/$, no/FM !/$. )/$(. Node labels shown: NX (HD), DM, NX.]

7.6 Parentheses

Parentheses occur as interjective utterances within a sentence. Since there is no dependency relation between the parenthesis and the rest of the construction, the parenthesis is not attached to the surrounding constituents. Often parentheses occur as SIMPX clauses. Insertions like sagte Mehmet Scholl into direct speech are also annotated as parentheses.[3]

[Tree diagram: Eine/ART Vielzahl/NN der/ART betroffenen/ADJA Mieter/NN sei/VAFIN daher/PROP ,/$, so/ADV Bertermann/NE ,/$, bereits/ADV ausgezogen/VVPP ./$. Fields: VF, LK, MF, VC; root: SIMPX. Grammatical functions shown: ON, V-MOD, MOD, OV. The parenthesis so Bertermann is not attached to the SIMPX.]

[3] On the TüBa-D/Z web page (http://www.sfs.uni-tuebingen.de/en/ascl/resources/corpora/tueba-dz.html), the treebank is also available in the Penn Treebank formats version 1 and 2. In version 1, parentheses are attached to the tree structure with the edge label PAR. For further details about the Penn Treebank formats cf. 9.


[Tree diagram: Ein/ART Kuratorium/NN ,/$, das/PDS ist/VAFIN wohl/ADV der/ART Gedanke/NN ,/$, macht/VVFIN sich/PRF immer/ADV gut/ADJD ./$. Two separate SIMPX trees, each with the fields VF, LK, MF. Grammatical functions shown: ON, MOD, PRED, OA.]

[Tree diagram: "/$( Schön/ADJD "/$( ,/$, sagte/VVFIN Mehmet/NE Scholl/NE ,/$, "/$( ist/VAFIN das/PDS nicht/PTKNEG "/$( ./$. Two separate SIMPX trees: one with the fields LK, MF and one with the fields VF, LK, MF. Grammatical functions shown: PRED, ON, MOD.]


Chapter 8

Criteria for the Distinction of Grammatical Functions

8.1 Subcategorization of Verbs

The TüBa-D/Z-Verblist document[1] lists all verbs occurring in the treebank with their specific subcategorization frames. This reference list guarantees the consistent annotation of grammatical functions. For a detailed description of constructing the verb list see Hinrichs and Telljohann (2009).

8.2 Subcategorization of PREDs

Since constituents which predicates subcategorize for have a grammatical function within a sentence, they are neither marked as PRED-MOD nor attached to the predicate itself. These constituents are attached to a field and assigned the respective grammatical function, like the constituents marked as FOPP and OPP in the following examples:

[1] In case of interest, please refer to the web page http://www.sfs.uni-tuebingen.de/en/ascl/resources/corpora/tueba-dz.html for contact information.


[Tree diagram: Für/APPR den/ART Erfolg/NN des/ART Volksbegehren/NN sind/VAFIN etwa/ADV 243.000/CARD Unterschriften/NN erforderlich/ADJD ./$. Fields: VF, LK, MF; root: SIMPX. Grammatical functions shown: FOPP, ON, PRED.]

[Tree diagram: Die/ART älteren/ADJA Brüder/NN fühlen/VVFIN sich/PRF für/APPR ihre/PPOSAT Schwestern/NN sehr/ADV verantwortlich/ADJD ./$. Fields: VF, LK, MF; root: SIMPX. Grammatical functions shown: ON, OA, OPP, PRED.]

8.3 Distinction of FOPP, OPP, and V-MOD

One of the major problems is to distinguish whether a given PP is an obligatory (OPP) or an optional (FOPP) complement of a specific verb in a specific reading, or whether it is a free adjunct (V-MOD) of that verb.

The TüBa-D/Z-Verblist is intended as a reference for these problematic cases. In the following, we will briefly describe what criteria have been used in order to decide about the subcategorization with respect to PP complements/modifiers:

1. A PP is called OPP within a sentence if the sentence would be ungrammatical without the OPP (or if there would be at least a very noticeable change of meaning). For instance, Sie gehen [OPP gegen die Faschisten] vor. / Das Gesetz ist [OPP in Kraft] getreten.

2. A PP is called FOPP if it can be left out of this specific sentence without causing ungrammaticality (or a very noticeable change of meaning) and if its preposition is selected by this specific verb. For instance, Insgesamt berichtet die Polizei [FOPP von 19 Festnahmen und 98 Ingewahrsamnahmen]. / Später würden wir [FOPP über Auswandern] nachdenken. Here, the verbs select these specific prepositions, and the PPs cannot be added to any arbitrary verb (which is possible for free adjuncts). In addition, in passive clauses, the subject of the original active clause, which has the form of a prepositional phrase, is marked as FOPP (Sie wurden [FOPP von Autonomen] umringt.).

3. A PP is called V-MOD if its preposition is not selected by this specific verb, i.e., it can be exchanged for any other modifying PP, and similarly, this PP can occur with arbitrary verbs (Nur [V-MOD im griechischen Lager] gab es Probleme). Typical V-MODs are temporal or local adjuncts specifying time and location of the action, event, or state expressed by the verb.
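Read together, the three criteria amount to two questions about a PP: can it be left out, and is its preposition selected by the verb? The following Python sketch spells this out. It is an illustration only: the function name and parameters are hypothetical, both answers are linguistic judgements supplied by the annotator (or looked up in the verb list), and the case of a non-omissible PP whose preposition is not selected is not covered by the criteria, so the sketch simply treats every non-omissible PP as OPP.

```python
def classify_pp(omissible: bool, preposition_selected: bool) -> str:
    """Suggest OPP, FOPP or V-MOD for a PP from two annotator judgements.

    omissible: the PP can be dropped without ungrammaticality or a very
        noticeable change of meaning.
    preposition_selected: the preposition is selected by this verb in this
        reading (also assumed true for the von-PP of a passive clause).
    """
    if not omissible:
        return "OPP"       # criterion 1: obligatory complement
    if preposition_selected:
        return "FOPP"      # criterion 2: optional complement
    return "V-MOD"         # criterion 3: free adjunct


# Sie gehen [gegen die Faschisten] vor.
print(classify_pp(omissible=False, preposition_selected=True))
# Insgesamt berichtet die Polizei [von 19 Festnahmen ...].
print(classify_pp(omissible=True, preposition_selected=True))
# Nur [im griechischen Lager] gab es Probleme.
print(classify_pp(omissible=True, preposition_selected=False))
```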

8.4 Distinction of MOD, MOD-MOD, and V-MOD

A typical case of modification of modifiers is a temporal expression (V-MOD) that further specifies another temporal expression (MOD-MOD) in the same clause:

1. [V-MOD am Samstag] finden [MOD-MOD ab 16 Uhr] Führungen statt.
   [MOD-MOD Wann] finden [V-MOD am Samstag] Führungen statt?

2. [MOD da] finden [V-MOD am Samstag] Führungen statt.
   [V-MOD wann] finden [MOD da] Führungen statt?
   [MOD dann] finden [V-MOD am Samstag] Führungen statt.

   da, dann, etc. can be either temporal, causal, consequential, or local expressions. Thus, one cannot be sure whether the following time expression am Samstag really refers to them. The only obvious observation is that the time expression is a V-MOD in any case.

For resumptive constructions (LV), there is also a clear criterion concerning the modification relations. Within a verb-second clause, a modifier occurring in VF is MOD/X-MOD, whereas the modifier in LV is MOD-MOD, not vice versa, because the modifier in VF occurs within the core of the sentence, whereas the modifier in LV has to be licensed by some other constituent in the core sentence, e.g. Wenn da was gebucht worden ist, dann ist das nicht in Ordnung. (cf. 6.1.4).

8.5 Distinction of ON, PRED, ON-MOD, and PRED-MOD

It is not always trivial to distinguish which constituent is ON, PRED, or ON-MOD for predicative verbs. For this reason, a few criteria and examples are listed here that can be of help. Here are some properties of ON and PRED:

1. Typically, PRED occurs in MF, whereas ON occurs in VF of verb-second clauses. This should be considered for annotation if no other criterion (as described below) applies.


2. Subject-verb agreement always has to be taken into account. For instance, if the verb is in plural form, the subject has to be plural as well.

3. If there is a suitable NP that could serve as subject, then this NP is annotated as subject rather than any other constituent with a different syntactic category (PP, ADVP, etc.).

For verb-second clauses, it is important to follow these two steps in exactly this order to stick to the distributional criterion that has been chosen for the PRED/ON distinction:

1. Have a look at the constituent in VF. If it is an NP which might serve as subject and if it agrees with the verb, annotate it as ON.

2. If it does not agree with the verb, annotate it as PRED (ADJP, ADVP, PP, etc.).

Examples:

1. [ON neue Wortschöpfungen] sind [PRED es] nur.
   [PRED es] sind nur [ON neue Wortschöpfungen].
   oder sind [PRED es] nur [ON neue Wortschöpfungen].
   [PRED das] sind ohnehin [ON die schwächsten Partner].
   [ON die schwächsten Partner] sind [PRED das] ohnehin.
   oder sind [PRED das] ohnehin [ON die schwächsten Partner].

   Subject-verb agreement suggests that neue Wortschöpfungen and die schwächsten Partner are the subjects because of their plural form, regardless of the field in which they occur.

2. [ON die Ursache] war [PRED unklar].
   [PRED unklar] war [ON die Ursache]
   [ON Candan Ercettin] ist [PRED überall].
   [PRED überall] ist [ON Candan Ercettin].

   ADJPs and ADVPs typically have PRED function when occurring together with predicative verbs and NP subjects.

3. [PRED aus den Trauernden] wird [ON ein wütender Mob].

   ein wütender Mob is considered the subject because it is a noun phrase. Therefore, the prepositional phrase is PRED.

4. [ON das] ist [PRED eine einmalige Chance].
   [ON eine einmalige Chance] ist [PRED das].
   [ON es] ist [PRED der erste Besuch eines Papstes].
   [ON der erste Besuch eines Papstes] ist [PRED es].
   [ON Hauptauftraggeber] ist [PRED die Bremer Verwaltung].
   [ON die Bremer Verwaltung] ist [PRED Hauptauftraggeber].

   The NP in VF position agrees with the verb and therefore has subject priority. As a consequence, the constituent in MF is PRED.

5. [PRED wer] bin [ON ich].
   [PRED was] ist [ON das].

   In w-questions, the interrogative pronoun is always PRED because here also the agreement rule applies.

6. [ON-MOD es] sei [PRED wichtig], [ON daß man ... ].
   [ON Aufgabe des Festspielhauses] sei [PRED-MOD es], [PRED das Haus spielfertig zu halten].

   If a sentential subject or a sentential predicate occurs with an expletive es, the expletive es is either ON-MOD or PRED-MOD (cf. 4.2.10).


Chapter 9

The TuBa-D/Z Data Formats

The TüBa-D/Z treebank is released in four different data formats:

1. the NEGRA Export format

2. the Penn Treebank format version 1 and version 2

3. the Export-XML format (incl. anaphora and coreference relations)

4. the CoNLL formats: 2006 (CoNLL-X), 2010, 2011/12, and CoNLL-U (v2)

9.1 The NEGRA Export Format

This format is provided by the annotation tool Annotate (Brants and Skut 1998); it is created automatically from the database underlying the annotation process in Annotate. The NEGRA Export format is a line-oriented, pointer-based representation of the syntactic annotation. It is also the most complete data format since it preserves all the information available during the manual annotation. A more complete description of the NEGRA Export format can be found in Brants (1997).

There are two versions of this format: NEGRA Export format 3 contains three layers of token information (token, POS tags, morphology), and lemmata are displayed in the ‘comment’ column; NEGRA Export format 4 contains a fourth layer of lemma annotation.

An example of the NEGRA Export format 4 is given below, combined with the graphical representation of the syntactic annotation for the sentence “Vikare müssen sich nach dem Kandidatengetz so verhalten, wie es von einem künftigen Pfarrer erwartet werden kann”.


Graphical representation (printout of the Annotate tool):

[Tree diagram for sentence 24539: Vikare/NN müssen/VMFIN sich/PRF nach/APPR dem/ART Kandidatengetz/NN so/ADV verhalten/VVINF ,/$, wie/KOUS es/PPER von/APPR einem/ART künftigen/ADJA Pfarrer/NN erwartet/VVPP werden/VAINF kann/VMFIN ./$. Each token is shown with its lemma (LM=...), and the misspelled token Kandidatengetz carries the correction Kandidatengesetz. Fields of the matrix SIMPX: VF, LK, MF, VC, NF; the embedded SIMPX (fields C, MF, VC) bears the function PRED-MOD. Grammatical functions shown: ON, OA, V-MOD, PRED, OV, FOPP; secondary edge: refvc.]

The first line of the sentence representation, marked as ‘begin of sentence’ (BOS), includes the sentence id (here: 24539), the identity of the last annotator (here the one with id 2), the time of the last modification (in UNIX format, i.e. seconds since 1/1/1970) and the id of the origin of the file (1146 points to article 155 of the edition of 11/7/1992).

In the right column, secondary edges (here: ‘refvc’ pointing from node #518 to node #517, a dependency within the verbal complex) as well as corrections of misspellings (here: ‘Kandidatengesetz’) are also represented. Optionally, there is a version of NEGRA Export format 4 that contains anaphoric relations (here: ‘es’ is marked as expletive).
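As a small illustration of the BOS line described above, the following Python sketch splits such a line into its four numeric fields and converts the UNIX timestamp into a calendar date. The names BosHeader and parse_bos are hypothetical and not part of any released tool; the sketch also assumes that the four fields always appear in this order and are separated by whitespace.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class BosHeader(NamedTuple):
    sentence_id: int
    annotator_id: int
    modified: datetime   # time of the last modification, in UTC
    origin_id: int


def parse_bos(line: str) -> BosHeader:
    """Parse a '#BOS <sentence> <annotator> <timestamp> <origin>' line."""
    fields = line.split()
    if not fields or fields[0] != "#BOS":
        raise ValueError("not a #BOS line")
    sent_id, annotator, timestamp, origin = (int(f) for f in fields[1:5])
    modified = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return BosHeader(sent_id, annotator, modified, origin)


header = parse_bos("#BOS 24539 2 1134150923 1146")
print(header.sentence_id, header.annotator_id, header.origin_id)
print(header.modified.isoformat())   # 2005-12-09T17:55:23+00:00
```

Converting with an explicit UTC time zone avoids results that depend on the local settings of the machine running the script.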


Export format 4:

#BOS 24539 2 1134150923 1146

Vikare Vikar NN npm HD 500

müssen müssen%aux VMFIN 3pis HD 502

sich #refl PRF ap*3 HD 504

nach nach APPR d - 506

dem das ART dsn - 505

Kandidatengetz Kandidatengesetz NN dsn HD 505 %% Kandidatengesetz

so so ADV -- HD 507

verhalten verhalten VVINF -- HD 509

, , $, -- -- 0

wie wie KOUS -- - 511

es es PPER nsn3 HD 512

von von APPR d - 515

einem ein ART dsm - 514

künftigen künftig ADJA dsm HD 513

Pfarrer Pfarrer NN dsm HD 514

erwartet erwarten VVPP -- HD 517

werden werden%passiv VAINF -- HD 518

kann können%aux VMFIN 3sis HD 519

. . $. -- -- 0

#500 -- NX -- ON 501

#501 -- VF -- - 523

#502 -- VXFIN -- HD 503

#503 -- LK -- - 523

#504 -- NX -- OA 508

#505 -- NX -- HD 506

#506 -- PX -- V-MOD 508

#507 -- ADVX -- PRED 508

#508 -- MF -- - 523

#509 -- VXINF -- OV 510

#510 -- VC -- - 523

#511 -- C -- - 521

#512 -- NX -- ON 516 %% R=expletive

#513 -- ADJX -- - 514

#514 -- NX -- HD 515

#515 -- PX -- FOPP 516

#516 -- MF -- - 521

#517 -- VXINF -- OV 520

#518 -- VXINF -- OV 520 refvc 517

#519 -- VXFIN -- HD 520

#520 -- VC -- - 521

#521 -- SIMPX -- PRED-MOD 522

#522 -- NF -- - 523

#523 -- SIMPX -- -- 0

#EOS 24539
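To make the column layout of the listing above concrete, here is a minimal Python sketch that reads a single token or node row of Export format 4. The class ExportRow and the function parse_export4_row are hypothetical names introduced for this illustration. The sketch assumes that columns are separated by whitespace, that no column value contains a space, and that everything after "%%" is a comment; these assumptions hold for the rows shown here but have not been checked against the full data.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ExportRow:
    form: str      # word form, or "#500"-style id for a phrase node
    lemma: str     # "--" for phrase nodes
    tag: str       # POS tag or node label
    morph: str
    edge: str      # edge label (grammatical function)
    parent: int    # id of the mother node, 0 if unattached
    secondary: List[Tuple[str, int]] = field(default_factory=list)
    comment: str = ""


def parse_export4_row(line: str) -> ExportRow:
    """Split one row into its six fixed columns plus optional extras."""
    body, _, comment = line.partition("%%")
    cols = body.split()
    extras = cols[6:]   # secondary edges come as (label, target) pairs
    secondary = [(extras[i], int(extras[i + 1]))
                 for i in range(0, len(extras) - 1, 2)]
    return ExportRow(cols[0], cols[1], cols[2], cols[3], cols[4],
                     int(cols[5]), secondary, comment.strip())


token = parse_export4_row(
    "Kandidatengetz Kandidatengesetz NN dsn HD 505 %% Kandidatengesetz")
node = parse_export4_row("#518 -- VXINF -- OV 520 refvc 517")
print(token.lemma, token.parent, token.comment)
print(node.tag, node.edge, node.secondary)
```

Keeping token rows and node rows in the same record type mirrors the listing, where both kinds of row share the same six leading columns.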

The only deviation from context-freeness which the annotation scheme allows concerns the annotation of parentheses. Parentheses are annotated as separate trees with no attachment to surrounding trees. The following tree gives an example of such a phenomenon (for a more complete description of the annotation cf. 7.6).


Graphical representation:

[Tree diagram for sentence 7307: So/ADV etwas/PIS ,/$, sagen/VVFIN die/ART Abgeordneten/NN ,/$, hätten/VAFIN sie/PPER auch/ADV noch/ADV nicht/PTKNEG erlebt/VVPP ./$. Each token is shown with its lemma (LM=...). Two separate SIMPX trees: the main clause with the fields VF, LK, MF, VC and the parenthesis sagen die Abgeordneten with the fields LK, MF. Grammatical functions shown: OA, ON, MOD, OV.]

The pointer-based representation of the NEGRA Export format separates information about the linear precedence of words from attachment information so that parentheses can be represented naturally without having to resort to explicitly marking non-attached nodes. Here, the SIMPX node dominating the parenthesis (node #517) is marked as not having a mother node.

Export format 3:

#BOS 7307 5 1121695339 364

So ADV -- HD 500 %% LM=so

etwas PIS *** HD 501 %% LM=etwas

, $, -- -- 0 %% LM=,

sagen VVFIN 3pis HD 513 %% LM=sagen

die ART np* - 515 %% LM=der|die|das

Abgeordneten NN np* HD 515 %% LM=Abgeordneter|Abgeordnete|Abgeordnetes

, $, -- -- 0 %% LM=,

hätten VAFIN 3pkt HD 503 %% LM=haben%aux

sie PPER np*3 HD 505 %% LM=sie

auch ADV -- HD 506 %% LM=auch

noch ADV -- HD 507 %% LM=noch

nicht PTKNEG -- HD 508 %% LM=nicht

erlebt VVPP -- HD 510 %% LM=erleben

. $. -- -- 0 %% LM=.

#500 ADVX -- - 501

#501 NX -- OA 502

#502 VF -- - 512

#503 VXFIN -- HD 504

#504 LK -- - 512

#505 NX -- ON 509

#506 ADVX -- MOD 509

#507 ADVX -- - 508

#508 ADVX -- MOD 509

#509 MF -- - 512

#510 VXINF -- OV 511

#511 VC -- - 512

#512 SIMPX -- -- 0

#513 VXFIN -- HD 514

#514 LK -- - 517

#515 NX -- ON 516

#516 MF -- - 517

#517 SIMPX -- -- 0

#EOS 7307
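Because unattached trees are simply nodes whose mother is 0, a parenthesis like the one above can be spotted by listing the root nodes of a sentence. The Python sketch below does this for Export format 3 node rows. The function name root_nodes is hypothetical, and the sketch assumes whitespace-separated columns in the order id, label, morphology, edge label, mother, as in the listing above; a token that itself begins with "#" would need extra handling.

```python
from typing import List, Tuple


def root_nodes(export3_lines: List[str]) -> List[Tuple[str, str]]:
    """Return (node id, label) for every phrase node without a mother."""
    roots = []
    for line in export3_lines:
        # Phrase nodes start with "#"; skip tokens and the #BOS/#EOS lines.
        if not line.startswith("#") or line.startswith(("#BOS", "#EOS")):
            continue
        cols = line.partition("%%")[0].split()
        node_id, label, mother = cols[0], cols[1], int(cols[4])
        if mother == 0:
            roots.append((node_id, label))
    return roots


sample = [
    "#BOS 7307 5 1121695339 364",
    "#511 VC -- - 512",
    "#512 SIMPX -- -- 0",
    "#516 MF -- - 517",
    "#517 SIMPX -- -- 0",
    "#EOS 7307",
]
print(root_nodes(sample))
```

For sentence 7307 this yields two SIMPX roots, the main clause (#512) and the parenthesis (#517).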


9.2 The Penn Treebank Format

There exist two versions of the Penn Treebank format which will be introduced in the following sections.

9.2.1 The Penn Treebank Format Version 1

Version 1 is based on the format of the Penn Treebank (Mitchell et al. 1993). The attachment of constituents is shown via bracketing and indentation. Thus, all constituents which show the same level of indentation are attached on the same level. In the Penn Treebank format, grammatical functions, which are shown in the NEGRA Export format in the column “edge label”, are attached to the syntactic label via a colon. Thus, the label “NX:OA” means that the constituent is a noun phrase with the grammatical function accusative object.

The Penn Treebank format is a representation that combines the linear representation of words with their attachment to higher constituents. For this reason, this format is restricted to completely context-free tree structures, i.e. it cannot adequately represent the annotation of parentheses in TüBa-D/Z. In order to capture the original syntactic annotation as well as the original word order in the sentence, it was decided to introduce a new edge label to mark such cases: PAR. Thus, the sentence “So etwas , sagen die Abgeordneten , hätten sie auch noch nicht erlebt .”, as shown above, is represented in the Penn Treebank format by the following bracketed structure:

Comments are preceded by a double ‘%’ sign. The comment behind the structure is intended to help the reader locate the beginning of the parenthesis and it is not part of the actual data.


%% sent. no. 7307
(
  (SIMPX
    (VF
      (NX:OA
        (ADVX
          (ADV:HD So)
        )
        (PIS:HD etwas)
      )
    )
  ($, ,)
    (SIMPX:PAR %% here starts the parenthesis!
      (LK
        (VXFIN:HD
          (VVFIN:HD sagen)
        )
      )
      (MF
        (NX:ON
          (ART die)
          (NN:HD Abgeordneten)
        )
      )
    )
  ($, ,)
    (LK
      (VXFIN:HD
        (VAFIN:HD hätten)
      )
    )
    (MF
      (NX:ON
        (PPER:HD sie)
      )
      (ADVX:MOD
        (ADV:HD auch)
      )
      (ADVX:MOD
        (ADVX
          (ADV:HD noch)
        )
        (PTKNEG:HD nicht)
      )
    )
    (VC
      (VXINF:OV
        (VVPP:HD erlebt)
      )
    )
  )
  ($. .)
)


Commas, which are not attached to the tree, are indented on the highest level although they are included in the bracketing of the constituent surrounding them. In the sentence below, e.g., the first comma is grouped into the noun phrase NX via word order. The indentation, however, signals that the comma cannot necessarily be attached to this node. It is also conceivable that it may be attached to one of the lower nodes, NX or R-SIMPX. In the case of the second comma, there are even more possible attachment sites.

%% fragment of sent. no. 33
(
  (R-SIMPX
    (C
      (NX:ON
        (PRELS:HD die)
      )
    )
    (MF
      (NX:OA
        (NX=ORG:HD
          (ART:-NE die)
          (NN:HD AWO)
        )
  ($, ,)
        (R-SIMPX
          (C
            (PX:V-MOD
              (PWAV:HD wo)
            )
          )
          (MF
            (NX:ON
              (PPER:HD er)
            )
            (NX:PRED
              (NN:HD Kreisvorsitzender)
            )
          )
          (VC
            (VXFIN:HD
              (VAFIN:HD ist)
            )
          )
        )
      )
    )
  ($, ,)
    (VC
      (VXFIN:HD
        (VVFIN:HD prüfte)
      )
    )
  )
  ($. .)
)


9.2.2 The Penn Treebank Format Version 2

Version 2 of the Penn Treebank format has no unattached phrases. For comparison, the same sentences used to demonstrate the Penn Treebank version 1 format are repeated here in version 2 format. Please note that in the data each sentence is on one line, and that the multi-line formatting here is for the human reader only:

%% sent. no. 7307
(VROOT:--
  (SIMPX:--
    (VF:-
      (NX:OA
        (ADVX:-
          (ADV:HD So)
        )
        (PIS:HD etwas)
      )
    )
    ($,:-- ,)
    (SIMPX:--
      (LK:-
        (VXFIN:HD
          (VVFIN:HD sagen)
        )
      )
      (MF:-
        (NX:ON
          (ART:- die)
          (NN:HD Abgeordneten)
        )
      )
    )
    ($,:-- ,)
    (LK:-
      (VXFIN:HD
        (VAFIN:HD hätten)
      )
    )
    (MF:-
      (NX:ON
        (PPER:HD sie)
      )
      (ADVX:MOD
        (ADV:HD auch)
      )
      (ADVX:MOD
        (ADVX:-
          (ADV:HD noch)
        )
        (PTKNEG:HD nicht)
      )
    )
    (VC:-
      (VXINF:OV
        (VVPP:HD erlebt)
      )
    )
  )
  ($.:-- .)
)


Sentence #33 fragment

(R-SIMPX:MOD

(C:-

(NX:ON

(PRELS:HD die)

)

)

(MF:-

(NX:OA

(NX=ORG:HD

(ART:-NE die)

(NN:HD AWO)

)

($,:-- ,)

(R-SIMPX:-

(C:-

(ADVX:V-MOD

(PWAV:HD wo)

)

)

(MF:-

(NX:ON

(PPER:HD er)

)

(NX:PRED

(NN:HD Kreisvorsitzender)

)

)

(VC:-

(VXFIN:HD

(VAFIN:HD ist)

)

)

)

)

)

($,:-- ,)

(VC:-

(VXFIN:HD

(VVFIN:HD prüfte)

)

)

)
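The version 2 bracketing can be read back into a nested structure with a few lines of code. The following is a minimal sketch, not part of the treebank distribution: the function names `parse_brackets`, `split_label`, and `leaves` are hypothetical, and the sketch assumes that every bracket carries a `category:function` label and that word forms contain no parentheses.

```python
import re


def parse_brackets(text):
    """Parse one bracketed sentence into nested (label, body) tuples.

    For a preterminal the body is the word form (a string);
    for any other node it is a list of child tuples.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        label = tokens[pos + 1]
        pos += 2
        if tokens[pos] != "(":
            # Preterminal such as (VVFIN:HD prüfte): read the word,
            # then skip the word and its closing bracket.
            word = tokens[pos]
            pos += 2
            return (label, word)
        children = []
        while tokens[pos] == "(":
            children.append(node())
        pos += 1  # closing bracket of this node
        return (label, children)

    return node()


def split_label(label):
    """Split 'NX=ORG:HD' into ('NX=ORG', 'HD'); labels without ':' keep no function."""
    category, separator, function = label.rpartition(":")
    return (category, function) if separator else (label, None)


def leaves(tree):
    """Return the word forms of a parsed tree in surface order."""
    label, body = tree
    if isinstance(body, str):
        return [body]
    return [word for child in body for word in leaves(child)]


if __name__ == "__main__":
    fragment = "(VC:- (VXFIN:HD (VVFIN:HD prüfte)))"
    tree = parse_brackets(fragment)
    print(tree)
    print(leaves(tree))
```

Because whitespace is ignored, the same function works on the one-line form used in the data and on the multi-line form printed above.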


9.3 The Export-XML Format

The Export-XML format is a custom-made XML format that follows the NEGRA Export file format. It is designed to accommodate all original information provided in the Export format, including e.g. comments. Dominance relations between nodes are represented directly within the XML tree structure. Nodes without a parent node (e.g. the sentence node SIMPX and punctuation marks) do not have a “parent” attribute. They have the edge label “--”, which can be linked to an implicit root node. Thus, it is possible to represent parentheses without the use of additional labels.

Anaphora is expressed by a link between two related nodes. Coreference sets are therefore represented implicitly by chains of nodes that are part of a referential relation.

The following example shows the XML structure for the sentence “Schillen erklärte, sie werde als Kriegsgegnerin kandidieren”. The personal pronoun “sie” is anaphoric to the antecedent noun phrase “Schillen”. In the XML document, a <relation> tag is added below each node that is part of a referential relation. It encodes the type of referential relation and the node ID of the antecedent node. In our example, the antecedent is the node with ID s1723_500, that is, the NX dominating the named entity “Schillen”. This NX in turn is in a coreferential relationship with node s1721_11 (word number 11 in sentence 1721) and is thus part of a coreference chain.

For extensive documentation of the ExportXML format as well as a Java API for reading the format, please visit the webpage at: http://www.sfs.uni-tuebingen.de/en/ascl/resources/corpora/export-format.html

Graphical representation of the tree without annotation of the referential relation:

[Figure: tree diagram for sentence 1723. Terminal nodes 0 to 8 show word form, part-of-speech tag, morphology, and lemma (LM); nonterminal nodes 500 to 514 show node category and edge label.]


XML format including the referential relation:

<sentence xml:id="s1723">

<node xml:id="s1723_514" cat="SIMPX" func="--">

<node xml:id="s1723_501" cat="VF" func="-" parent="s1723_514">

<node xml:id="s1723_500" cat="NX" func="ON" parent="s1723_501">

<relation type="coreferential" target="s1721_11"/>

<ne xml:id="ne_29507" type="PER">

<word xml:id="s1723_1" form="Schillen" pos="NE" morph="nsf" lemma="Schillen" func="HD" parent="s1723_500"/>

</ne>

</node>

</node>

<node xml:id="s1723_503" cat="LK" func="-" parent="s1723_514">

<node xml:id="s1723_502" cat="VXFIN" func="HD" parent="s1723_503">

<word xml:id="s1723_2" form="erklärte" pos="VVFIN" morph="3sit" lemma="erklären" func="HD" parent="s1723_502"/>

</node>

</node>

<word xml:id="s1723_3" form="," pos="$," lemma="," func="--"/>

<node xml:id="s1723_513" cat="NF" func="-" parent="s1723_514">

<node xml:id="s1723_512" cat="SIMPX" func="OS" parent="s1723_513">

<node xml:id="s1723_505" cat="VF" func="-" parent="s1723_512">

<node xml:id="s1723_504" cat="NX" func="ON" parent="s1723_505">

<relation type="anaphoric" target="s1723_500"/>

<word xml:id="s1723_4" form="sie" pos="PPER" morph="nsf3" lemma="sie" func="HD" parent="s1723_504"/>

</node>

</node>

<node xml:id="s1723_507" cat="LK" func="-" parent="s1723_512">

<node xml:id="s1723_506" cat="VXFIN" func="HD" parent="s1723_507">

<word xml:id="s1723_5" form="werde" pos="VAFIN" morph="3sks" lemma="werden%aux" func="HD" parent="s1723_506"/>

</node>

</node>

<node xml:id="s1723_509" cat="MF" func="-" parent="s1723_512">

<node xml:id="s1723_508" cat="NX" func="PRED" parent="s1723_509">

<word xml:id="s1723_6" form="als" pos="KOKOM" lemma="als" func="-" parent="s1723_508"/>

<word xml:id="s1723_7" form="Kriegsgegnerin" pos="NN" morph="nsf" lemma="Kriegsgegnerin" func="HD"

parent="s1723_508"/>

</node>

</node>

<node xml:id="s1723_511" cat="VC" func="-" parent="s1723_512">

<node xml:id="s1723_510" cat="VXINF" func="OV" parent="s1723_511">

<word xml:id="s1723_8" form="kandidieren" pos="VVINF" lemma="kandidieren" func="HD" parent="s1723_510"/>

</node>

</node>

</node>

</node>

</node>

<word xml:id="s1723_9" form="." pos="$." lemma="." func="--"/>

</sentence>
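The referential links in such a sentence can be collected with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree` and is only an illustration: the helper names `referential_links` and `follow_chain` are hypothetical, the sample string is an abbreviated copy of the example above, and a full treebank file would additionally need its enclosing document element handled.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:id attribute under the XML namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Abbreviated copy of sentence 1723 (only the two NX nodes with relations).
SAMPLE = """<sentence xml:id="s1723">
  <node xml:id="s1723_500" cat="NX" func="ON">
    <relation type="coreferential" target="s1721_11"/>
  </node>
  <node xml:id="s1723_504" cat="NX" func="ON">
    <relation type="anaphoric" target="s1723_500"/>
  </node>
</sentence>"""


def referential_links(sentence_xml):
    """Return (source_id, relation_type, target_id) triples for one sentence."""
    root = ET.fromstring(sentence_xml)
    links = []
    for element in root.iter():
        # findall() only looks at direct children, so each <relation>
        # is attributed to the element it is placed below.
        for relation in element.findall("relation"):
            links.append(
                (element.get(XML_ID), relation.get("type"), relation.get("target"))
            )
    return links


def follow_chain(links, start_id):
    """Follow antecedent links from start_id as far as the given links allow."""
    targets = {source: target for source, _, target in links}
    chain = [start_id]
    while chain[-1] in targets and targets[chain[-1]] not in chain:
        chain.append(targets[chain[-1]])
    return chain


if __name__ == "__main__":
    links = referential_links(SAMPLE)
    for link in links:
        print(link)
    print(follow_chain(links, "s1723_504"))
```

Following the chain from the pronoun node s1723_504 leads to s1723_500 and then to s1721_11, which lies in an earlier sentence; resolving it further would require the links of sentence 1721 as well.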


9.4 The CoNLL Format (CoNLL-X 2006, 2010, 2011/2012, CoNLL-U v2)

This section describes the CoNLL formats in which the data is available. All CoNLL data is automatically generated from the Export-XML format.

9.4.1 The CoNLL-X 2006 Format

The CoNLL format contains a dependency version of TüBa-D/Z in the format of the CoNLL-X shared task. The conversion was done automatically, but is based on the annotation guidelines of Foth (2006).

The CoNLL format is a table format containing a series of tab-separated lines. Each line that represents a word contains the following information:

1. ID – a sequential ID for each token

2. FORM – the word form (token)

3. LEMMA – the gold standard lemma of the token

4. CPOSTAG – simplified part-of-speech tag

5. POSTAG – part-of-speech tag according to STTS tag set

6. FEATS – tag with morphological information

7. HEAD – regent of the token in the dependency analysis, or “0” for tokens without regent

8. DEPREL – dependency relation between the token and its regent, or “ROOT” for tokens without regent

9. PHEAD – Projective head of current token (always “_”)

10. PDEPREL – Dependency relation to PHEAD (always “_”)

Sentence 1723 in CoNLL 2006 format:

1 Schillen Schillen N NE nsf 2 SUBJ _ _

2 erklärte erklären V VVFIN 3sit 0 ROOT _ _

3 , , $, $, -- 2 -PUNCT- _ _

4 sie sie PRO PPER nsf3 5 SUBJ _ _

5 werde werden%aux V VAFIN 3sks 2 S _ _

6 als als KOKOM KOKOM -- 8 KOM _ _

7 Kriegsgegnerin Kriegsgegnerin N NN nsf 6 CJ _ _

8 kandidieren kandidieren V VVINF -- 5 AUX _ _

9 . . $. $. -- 8 -PUNCT- _ _

See http://anthology.aclweb.org/W/W06/W06-2920.pdf for more details about the CoNLL-X 2006 format.
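Since each token line consists of the ten tab-separated fields listed above, reading a sentence amounts to splitting on tabs and naming the columns. The following sketch is an illustration under that assumption; the names `CONLLX_FIELDS`, `read_conllx_sentence`, and `dependency_triples` are hypothetical and not part of any released tool.

```python
CONLLX_FIELDS = [
    "ID", "FORM", "LEMMA", "CPOSTAG", "POSTAG",
    "FEATS", "HEAD", "DEPREL", "PHEAD", "PDEPREL",
]


def read_conllx_sentence(lines):
    """Turn the token lines of one sentence into a list of dictionaries."""
    rows = []
    for line in lines:
        if not line.strip():
            continue  # blank lines separate sentences
        values = line.rstrip("\n").split("\t")
        if len(values) != len(CONLLX_FIELDS):
            raise ValueError("expected 10 columns, got %d" % len(values))
        row = dict(zip(CONLLX_FIELDS, values))
        row["ID"] = int(row["ID"])
        row["HEAD"] = int(row["HEAD"])
        rows.append(row)
    return rows


def dependency_triples(rows):
    """Return (dependent, relation, regent) triples; HEAD 0 maps to '<root>'."""
    forms = {row["ID"]: row["FORM"] for row in rows}
    return [
        (row["FORM"], row["DEPREL"], forms.get(row["HEAD"], "<root>"))
        for row in rows
    ]


if __name__ == "__main__":
    sentence = [
        "1\tSchillen\tSchillen\tN\tNE\tnsf\t2\tSUBJ\t_\t_",
        "2\terklärte\terklären\tV\tVVFIN\t3sit\t0\tROOT\t_\t_",
    ]
    for triple in dependency_triples(read_conllx_sentence(sentence)):
        print(triple)
```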


9.4.2 The CoNLL 2010 Format

The CoNLL 2010 format was used in the SemEval-2010 shared task, Coreference Resolution In Multiple Languages, and contains the following columns:

1. ID - word identifiers in the sentence

2. TOKEN - word forms

3. LEMMA - word lemmas (gold standard manual annotation)

4. *PLEMMA - word lemmas predicted by an automatic analyzer

5. POS - coarse part of speech

6. *PPOS - same as 5 but predicted by an automatic analyzer

7. FEAT - morphological features (part of speech type, number, gender, case, tense, aspect, degree of comparison, etc., separated by the character “|”)

8. *PFEAT - same as 7 but predicted by an automatic analyzer

9. HEAD - ID of the syntactic head (’0’ if the word is the root of the tree)

10. *PHEAD - same as 9 but predicted by an automatic analyzer

11. DEPREL - dependency relation labels corresponding to the dependencies described in 9

12. *PDEPREL - same as 11 but predicted by an automatic analyzer

13. NE - named entities

14. *PNE - same as 13 but predicted by a named entity recognizer

15. *PRED - predicates are marked and annotated with a semantic class label

16. *PPRED - same as 15 but predicted by an automatic analyzer

17. COREF - coreference annotation in open-close notation

Columns marked with “*” are always filled with “_”, since they are either predicted values or the information is not available in the TüBa-D/Z. These columns are included to conform to the format.

1 Schillen Schillen _ NE _ nsf _ 2 _ SUBJ _ (PER) _ _ _ (0)

2 erklärte erklären _ VVFIN _ 3sit _ 0 _ ROOT _ * _ _ _ _

3 , , _ $, _ _ _ 0 _ ROOT _ * _ _ _ _

4 sie sie _ PPER _ nsf3 _ 5 _ SUBJ _ * _ _ _ (0)

5 werde werden%aux _ VAFIN _ 3sks _ 2 _ S _ * _ _ _ _

6 als als _ KOKOM _ _ _ 8 _ KOM _ * _ _ _ _

7 Kriegsgegnerin Kriegsgegnerin _ NN _ nsf _ 6 _ CJ _ * _ _ _ _

8 kandidieren kandidieren _ VVINF _ _ _ 5 _ AUX _ * _ _ _ _

9 . . _ $. _ _ _ 0 _ ROOT _ * _ _ _ _

See http://stel.ub.edu/semeval2010-coref/datasets for more details about the CoNLL 2010 format.
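In the open-close notation of the COREF column, “(0)” marks a one-token mention of chain 0, “(0” opens a longer mention, and “0)” closes it. The sketch below turns a column of such cells into mention spans. It is a minimal illustration: the function name `mention_spans` is hypothetical, and it assumes that several annotations on one token are separated by “|”, as in the shared-task data.

```python
import re

CELL_PATTERN = re.compile(r"(\()?(\d+)(\))?")


def mention_spans(coref_cells):
    """Return (chain_id, first_token, last_token) spans, tokens numbered from 1.

    coref_cells holds the COREF value of each token of one sentence;
    '_' and '-' are treated as empty cells.
    """
    open_mentions = {}  # chain id -> stack of start positions
    spans = []
    for position, cell in enumerate(coref_cells, start=1):
        if cell in ("_", "-"):
            continue
        for part in cell.split("|"):
            match = CELL_PATTERN.fullmatch(part)
            if match is None or not (match.group(1) or match.group(3)):
                raise ValueError("unexpected coreference cell: %r" % part)
            chain = int(match.group(2))
            if match.group(1) and match.group(3):
                spans.append((chain, position, position))
            elif match.group(1):
                open_mentions.setdefault(chain, []).append(position)
            else:
                start = open_mentions[chain].pop()
                spans.append((chain, start, position))
    return spans


if __name__ == "__main__":
    # COREF column of sentence 1723 as shown above.
    column = ["(0)", "_", "_", "(0)", "_", "_", "_", "_", "_"]
    print(mention_spans(column))
```

For sentence 1723 this yields two one-token mentions of chain 0, at tokens 1 (“Schillen”) and 4 (“sie”).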


9.4.3 The CoNLL 2011/2012 Format

The CoNLL 2011/2012 format contains the following columns:

1. Document ID - the newspaper article id in the form TYYMMDD.articleNumber

2. Part Number - the GLOBAL sentence ID. Numbering does not restart within each document; the part number therefore corresponds to the sentence ID in the treebank. Thus a document should be identified solely by the document ID.

3. Word number

4. Word itself

5. Part-of-Speech

6. Parse bit - represents parse in a bracketed structure

7. Predicate lemma - the lemma of every token is represented

8. *Predicate Frameset ID - PropBank frameset ID of the predicate in column 7

9. Word sense - GermaNet ID of the word sense

10. *Speaker

11. Named Entities

12. *Predicate Arguments

13. Coreference - coreference chain information

Columns marked with “*” are always filled with “-”, since they are either predicted values or the information is not available in the TüBa-D/Z. These columns are included to conform to the format.

T990507.136 1723 1 Schillen NE (VROOT:--(SIMPX:--(VF:-(NX=PER:ON*)) Schillen - - - (PER) - (0)

T990507.136 1723 2 erklärte VVFIN (LK:-(VXFIN:HD*)) erklären - - - * - -

T990507.136 1723 3 , $, * , - - - * - -

T990507.136 1723 4 sie PPER (NF:-(SIMPX:OS(VF:-(NX:ON*)) sie - - - * - (0)

T990507.136 1723 5 werde VAFIN (LK:-(VXFIN:HD*)) werden%aux - - - * - -

T990507.136 1723 6 als KOKOM (MF:-(NX:PRED* als - - - * - -

T990507.136 1723 7 Kriegsgegnerin NN *)) Kriegsgegnerin - - - * - -

T990507.136 1723 8 kandidieren VVINF (VC:-(VXINF:OV*))))) kandidieren - - - * - -

T990507.136 1723 9 . $. *) . - - - * - -

See http://conll.cemantix.org/2012/data.html for more details about the CoNLL 2011/2012 format.
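The parse bits in column 6 can be joined back into one bracketed tree by substituting each “*” with the token's own part-of-speech tag and word form. The following is a hedged sketch of that reconstruction; `rebuild_parse` is a hypothetical helper, and the input triples are copied from the example above.

```python
def rebuild_parse(tokens):
    """Rebuild the bracketed parse of one sentence from its parse bits.

    tokens is a list of (word, pos_tag, parse_bit) triples, i.e. columns
    4, 5, and 6 of the CoNLL 2011/2012 format. Each parse bit contains
    one '*' standing for the token itself.
    """
    pieces = []
    for word, pos_tag, parse_bit in tokens:
        leaf = "(%s %s)" % (pos_tag, word)
        pieces.append(parse_bit.replace("*", leaf, 1))
    return " ".join(pieces)


if __name__ == "__main__":
    sentence_1723 = [
        ("Schillen", "NE", "(VROOT:--(SIMPX:--(VF:-(NX=PER:ON*))"),
        ("erklärte", "VVFIN", "(LK:-(VXFIN:HD*))"),
        (",", "$,", "*"),
        ("sie", "PPER", "(NF:-(SIMPX:OS(VF:-(NX:ON*))"),
        ("werde", "VAFIN", "(LK:-(VXFIN:HD*))"),
        ("als", "KOKOM", "(MF:-(NX:PRED*"),
        ("Kriegsgegnerin", "NN", "*))"),
        ("kandidieren", "VVINF", "(VC:-(VXINF:OV*)))))"),
        (".", "$.", "*)"),
    ]
    print(rebuild_parse(sentence_1723))
```

Note that the leaves produced this way carry only the part-of-speech tag, because the edge label of a terminal is not part of the parse bit.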


9.4.4 The CoNLL-U v2 Format

The CoNLL-U format is designed to represent dependency treebanks in a consistent manner across languages. CoNLL-U is a tabular format, where each line representing a word contains exactly 10 columns:

1. ID: Word number

2. FORM: Word itself

3. LEMMA: Word lemma

4. UPOSTAG: Universal part-of-speech tag

5. XPOSTAG: POS tag from the source TüBa-D/Z data

6. FEATS: List of morphological features from the universal feature inventory

7. HEAD: Head of the current word, which is either an ID or zero (0).

8. DEPREL: Universal dependency relation to the HEAD

9. *DEPS: Enhanced dependency graph in the form of a list of head-deprel pairs.

10. MISC: Zero or more of the following annotations appear here:

(a) TopoField: topological field

(b) Typo: typo correction

(c) Morph: morphological features from the source TüBa-D/Z data

(d) NETYPE: named entity type

(e) LexUnit: GermaNet ID of the word sense (only partially annotated, see http://www.sfs.uni-tuebingen.de/GermaNet/corpora.shtml)

Columns marked with “*” are always filled with “_”, since they are not available in the TüBa-D/Z. The Document IDs are specified according to the format guidelines, where each newspaper article represents a document. Further information on the conversion process can be found in Çöltekin et al. (2017). The conversion software is published at https://github.com/SfS-ASCL/TuebaUdConverter and may also be useful for converting earlier versions of the treebank.

See http://universaldependencies.org/ for more details about the CoNLL-U format. Please note that the version 2 guidelines are used.
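The FEATS and MISC columns hold several annotations in a single cell. The sketch below splits them into dictionaries, assuming the usual CoNLL-U encoding of “Key=Value” pairs separated by “|”, with “_” for an empty cell. The helper names and the sample line are hypothetical: the line is made up for illustration, is not copied from the release, and its MISC values are only an assumption about how TopoField, Morph, and NETYPE might be written.

```python
CONLLU_FIELDS = [
    "ID", "FORM", "LEMMA", "UPOSTAG", "XPOSTAG",
    "FEATS", "HEAD", "DEPREL", "DEPS", "MISC",
]


def parse_key_values(cell):
    """Split a 'Key=Value|Key=Value' cell into a dictionary ('_' means empty)."""
    if cell == "_":
        return {}
    pairs = {}
    for item in cell.split("|"):
        key, _, value = item.partition("=")
        pairs[key] = value
    return pairs


def parse_conllu_line(line):
    """Parse one token line into a dictionary with FEATS and MISC expanded."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(CONLLU_FIELDS):
        raise ValueError("expected 10 columns, got %d" % len(values))
    row = dict(zip(CONLLU_FIELDS, values))
    row["FEATS"] = parse_key_values(row["FEATS"])
    row["MISC"] = parse_key_values(row["MISC"])
    return row


if __name__ == "__main__":
    # Hypothetical line for illustration only, not taken from the treebank.
    line = "\t".join([
        "1", "Schillen", "Schillen", "PROPN", "NE",
        "Case=Nom|Gender=Fem|Number=Sing", "2", "nsubj", "_",
        "TopoField=VF|Morph=nsf|NETYPE=PER",
    ])
    row = parse_conllu_line(line)
    print(row["MISC"].get("TopoField"), row["MISC"].get("NETYPE"))
```

ID and HEAD are left as strings here, since CoNLL-U also allows ID ranges for multiword tokens.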


References

Bech, G. 1955–57. Studien über das deutsche Verbum infinitum. Kopenhagen. 2 Bände. 2. unveränderte Auflage 1983 mit einem Vorwort von Catharine Fabricius-Hansen. Tübingen: Max Niemeyer.

Behaghel, O. 1932. Deutsche Syntax (Eine geschichtliche Darstellung), Band 4. Heidelberg: Carl Winter.

Brants, T., and W. Skut. 1998. Automation of treebank annotation. In Proceedings of the Conference on New Methods in Language Processing (NeMLaP-3/CoNLL98), January 14-17, 1998, Sydney, Australia, 49–57. Sydney.

Brants, T. 1997. The NeGra Export Format for Annotated Corpora. Universität des Saarlandes, Computational Linguistics, Saarbrücken, Germany.

Brants, T. 1998. TnT – A Statistical Part-of-Speech Tagger. Universität des Saarlandes, Computational Linguistics, Saarbrücken, Germany.

Bresnan, J. 1995. Lexicality and Argument Structure. In Invited Paper given at the Paris Syntax and Semantics Conference. Paris. October 12-14, 1995.

Çöltekin, Ç., B. Campbell, E. Hinrichs, and H. Telljohann. 2017. Converting the TüBa-D/Z treebank of German to Universal Dependencies. In Proceedings of the NoDaLiDa 2017 Workshop on Universal Dependencies (UDW 2017), 27–37, Gothenburg, Sweden.

Drach, E. 1937. Grundgedanken der Deutschen Satzlehre. Frankfurt/Main.

Drosdowski, G. (Ed.). 1995. Duden “Die Grammatik der deutschen Gegenwartssprache”. Mannheim, Leipzig, Wien, Zürich: Dudenverlag.

Eisenberg, P. 1999–2001. Grundriß der deutschen Grammatik, Band 2: Der Satz. Stuttgart, Weimar: J.B. Metzler.

Engel, U. 1996. Deutsche Grammatik. Heidelberg: Julius Groos Verlag.

Erdmann, O. 1886. Grundzüge der deutschen Syntax nach ihrer geschichtlichen Entwicklung dargestellt. Stuttgart: Cotta. Erste Abteilung.

Foth, K. A. 2006. Eine umfassende Constraint-Dependenz-Grammatik des Deutschen. Technical report, Fachbereich Informatik der Universität Hamburg.

Grewendorf, G. 1991. Aspekte der deutschen Syntax. Vol. 33 of Studien zur deutschen Grammatik. Tübingen: Gunter Narr Verlag.

Helbig, G., and J. Buscha. 1998. Deutsche Grammatik. Ein Handbuch für den Ausländerunterricht. Leipzig: Verl. Enzyklopädie. 18 edition.

Herling, S. H. A. 1821. Über die Topik der deutschen Sprache. In Abhandlungen des frankfurterischen Gelehrtenvereins für deutsche Sprache, 296–362, 394. Frankfurt/Main. Drittes Stück.


Hinrichs, E. W., and H. Telljohann. 2009. Constructing a Valence Lexicon for a Treebank of German. In Proceedings of the Seventh International Workshop on Treebanks and Linguistic Theories (TLT 7): January 23-24, 2009, Groningen, The Netherlands. URL: http://www.let.rug.nl/tlt/.

Hinrichs, E. W., J. Bartels, Y. Kawata, V. Kordoni, and H. Telljohann. 2000. The Tübingen Treebanks for Spoken German, English, and Japanese. In W. Wahlster (Ed.), Verbmobil: Foundations of Speech-to-Speech Translation. Berlin: Springer.

Höhle, T. N. 1986. Der Begriff ‘Mittelfeld’. Anmerkungen über die Theorie der topologischen Felder. In A. Schöne (Ed.), Kontroversen alte und neue. Akten des 7. Internationalen Germanistenkongresses Göttingen, 329–340. Tübingen: Niemeyer.

Kathol, A. 1995. Linearization-Based German Syntax. PhD thesis, Ohio State University.

Kiss, T. 1995. Infinitive Komplementation. Neue Studien zum deutschen Verbum infinitum. Tübingen: Max Niemeyer.

Kübler, S., and H. Telljohann. 2002. Towards a dependency-based evaluation for partial parsing. In Beyond PARSEVAL – Towards Improved Evaluation Measures for Parsing Systems – (LREC 2002 Workshop), Las Palmas, Gran Canaria, June 2002.

Mitchell, M., B. Santorini, and M. A. Marcinkiewicz. 1993. Building a Large Annotated Corpus of English: The Penn Treebank. Computational Linguistics 19(2):313–330.

Naumann, K., and V. Möller. 2007. Manual for the Annotation of in-document Referential Relations. University of Tübingen, May 2007.

Plaehn, O. 1998. Annotate – Bedienungsanleitung. FR 8.7 Computerlinguistik, Projekt C3 Nebenläufige Grammatische Verarbeitung, Sonderforschungsbereich 378, Ressourcenadaptive Kognitive Prozesse, 13. April 1998. URL: http://www.coli.uni-saarland.de/projects/sfb378/negra-corpus/annotate.html.

Pütz, H. 1986. Über die Syntax der Pronominalform ’es’ im modernen Deutsch. Tübingen: Stauffenburg. 2 edition.

Schiller, A., S. Teufel, and C. Thielen. 1995. Guidelines für das Tagging deutscher Textcorpora mit STTS. Technical report, Universities of Stuttgart and Tübingen.

Schnorr, V. 1991. Problems of Lemmatization in the Bilingual Dictionary. In F. J. Hausmann, O. Reichmann, H. E. Wiegand, and L. Zgusta (Eds.), Wörterbücher. Ein internationales Handbuch zur Lexikographie, Dritter Teilband, 2813–2817. Walter de Gruyter, Berlin, New York.

Stegmann, R., H. Telljohann, and E. W. Hinrichs. 2000. Stylebook for the German Treebank in verbmobil. Verbmobil-report 239, University of Tübingen.

Telljohann, H., E. W. Hinrichs, S. Kübler, H. Zinsmeister, and K. Beck. 2015. Stylebook for the Tübingen Treebank of Written German (TüBa-D/Z). University of Tübingen, August 2015.


Trushkina, J. 2004. Morpho-syntactic annotation and dependency parsing of German. PhD thesis, University of Tübingen.

Versley, Y., K. Beck, E. W. Hinrichs, and H. Telljohann. 2010. A Syntax-first Approach to High-quality Morphological Analysis and Lemma Disambiguation for the TüBa-D/Z Treebank. In Proceedings of the Ninth International Workshop on Treebanks and Linguistic Theories (TLT 9): December 3-4, 2010, Tartu, Estonia. URL: http://www.math.ut.ee/tlt9/.

