
Fashionpedia: Ontology, Segmentation, and an Attribute Localization Dataset

Menglin Jia*1, Mengyun Shi*1,4, Mikhail Sirotenko*3, Yin Cui*3, Claire Cardie1, Bharath Hariharan1, Hartwig Adam3, Serge Belongie1,2

1 Cornell University, 2 Cornell Tech, 3 Google Research, 4 Hearst Magazines

Abstract. In this work we explore the task of instance segmentation with attribute localization, which unifies instance segmentation (detect and segment each object instance) and fine-grained visual attribute categorization (recognize one or multiple attributes). The proposed task requires both localizing an object and describing its properties. To illustrate the various aspects of this task, we focus on the domain of fashion and introduce Fashionpedia as a step toward mapping out the visual aspects of the fashion world. Fashionpedia consists of two parts: (1) an ontology built by fashion experts containing 27 main apparel categories, 19 apparel parts, 294 fine-grained attributes, and their relationships; (2) a dataset with everyday and celebrity event fashion images annotated with segmentation masks and their associated per-mask fine-grained attributes, built upon the Fashionpedia ontology. In order to solve this challenging task, we propose a novel Attribute-Mask R-CNN model to jointly perform instance segmentation and localized attribute recognition, and provide a novel evaluation metric for the task. Fashionpedia is available at: https://fashionpedia.github.io/home/.

Keywords: Dataset, Ontology, Instance Segmentation, Fine-Grained, Attribute, Fashion

1 Introduction

Recent progress in the field of computer vision has advanced machines' ability to recognize and understand our visual world, showing significant impacts in fields including autonomous driving [52], product recognition [32,14], etc. These real-world applications are fueled by various visual understanding tasks with the goals of naming, describing (attribute recognition), or localizing objects within an image.

Naming and localizing objects is formulated as an object detection task (Figure 1(a-c)). As a hallmark of computer recognition, this task is to identify and indicate the boundaries of objects in the form of bounding boxes or segmentation masks [12,37,17]. Attribute recognition [6,29,9,36] (Figure 1(d)) instead focuses

* equal contribution.



!"#$ !%# !&#$$$$$$$$$$$$$$$$$$$$$$$$$$$$$!'#

!"#$%&'()*&+),-- !"#$%&'()*$+,)%-+.+/0+.1 ()*$+,)%!"$$)#. 2+,0&3)$$) 45).+.1%(65)7).1$0 8+9:.";) <"+/$,+.)

!"#$%&

'()*

=.:,)%,).1$0

26;;)$#+9",

=>&?)@$0)@0+5%,).1$0

2+.1,)@>#)"/$)A

B#&55)A%/0&3,A)#

C)13,"#%D'+$E

!,"+.

8)9:,+.)

+,(%

2,))?)2,))?) !&9:)$!&9:)$

=>&?)@$0)@0+5%,).1$0

-,6%D45).+.1E

2,+;%D'+$E%

C)13,"#%D'+$E

26;;)$#+9",

<"/0)A

B+/$#)//)A%

!,"+.

!,"+.

8&#;",%<"+/$

F&,,"#

-".&*

/0"**%*

1"2

!&9:)$

+,(%

3.*%450%

G./$".9)%2)1;).$"$+&.

7&9",+H)A%=$$#+>3$)

.(('%$%&'(), I

26;;)$#+9",

=>&?)@$0)@0+5%,).1$0

2+.1,)@>#)"/$)A

B#&55)A%/0&3,A)#

C)13,"#%D'+$E

!,"+.

<"/0)A

=>&?)@$0)@0+5%,).1$0

C)13,"#%D'+$E

!,"+.

=.:,)%,).1$0

-,6%D45).+.1E

2,+;%D'+$E%

26;;)$#+9",

B+/$#)//)A%

!,"+.

8&#;",%<"+/$

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTEHAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTEHAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATT

RIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_ATT

RIBUTE

HAS_ATTRIBUTE

HAS_ATT

RIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTR

IBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_A

TTRIB

UTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATT

RIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_A

TTRIB

UTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATT

RIBUTE

HAS_ATT

RIBUTEHAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTEHAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTEHAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTEHAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTEHAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATT

RIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATT

RIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_A

TTRIB

UTE

HAS_ATT

RIBUTE

HAS_ATTRIBUTE

HAS_ATT

RIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATT

RIBUTE

HAS_A

TTRIB

UTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_A

TTRIB

UTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_ATT

RIBUTE

HAS_

ATTR

IBUT

E

HAS_ATT

RIBUTE

HAS_

ATTR

IBUT

E

HAS_A

TTRIB

UTE

HAS_A

TTRIB

UTE

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_A

TTRIB

UTE

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATT

RIBUTE

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATT

RIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTEHAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTEHAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTEHAS_ATTRIBUTE

HAS_ATTRIBUTEHAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTEHAS_ATTRIBUTE

HAS_ATTRIBUTEHAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTEHAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTEHAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTEHAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTEHAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTEHAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTEHAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTEHAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATT

RIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTEHAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATT

RIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_A

TTRIB

UTE

HAS_ATTRIBUTE

HAS_A

TTRIB

UTE

HAS_ATTRIBUTE

HAS_ATT

RIBUTE

HAS_ATT

RIBUTEHAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_AT

TRIB

UTE

HAS_

ATTR

IBUT

EHA

S_AT

TRIB

UTE

HAS_

ATTR

IBUT

EHA

S_AT

TRIB

UTE

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRI…

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTEHAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_ATT

RIBUTE

HAS_ATTRIBUTEHAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTEHAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_A

TTRIB

UTE

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTEHAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_A

TTRIB

UTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTEHAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTEHAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

EHA

S_AT

TRIB

UTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTEHAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTEHAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTEHAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

I…

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTEHAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_A

TTRIB

UTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATT

RIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTEHAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATT

RIBUTE

HAS_ATTRIBUTE

HAS_ATT

RIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTEHAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATT

RIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_A

TTRIB

UTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATT

RIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATT

RIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_A

TTRIB

UTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_A

TTRIB

UTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATT

RIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATT

RIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

EHA

S_AT

TRIB

UTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATT

RIBUTE

HAS_ATT

RIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTEHAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTEHAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_A

TTRIB

UTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_A

TTRIB

UTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_A

TTRIB

UTE

HAS_ATT

RIBUTE

HAS_ATTRIBUTE

HAS_ATT

RIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTEHAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_A

TTRIB

UTE

HAS_ATTRIBUTE

HAS_ATTRIBUTEHAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATT

RIBUTE

HAS_A

TTRIB

UTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATT

RIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_A

TTRIB

UTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_A

TTRIB

UTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTEHAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATT

RIBUTE

HAS_A

TTRIB

UTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATT

RIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTEHAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTEHAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTEHAS_ATTRIBUTE

HAS_ATTRIBUTEHAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_A

TTRIB

UTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTEHAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTEHAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTEHAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTEHAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTEHAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_A

TTRIB

UTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTEHAS_ATTRIBUTE

HAS_ATTRIBUTEHAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_

ATTR

IBUT

E

HAS_ATTRIBUTE

HAS_ATTRIBUTE

HAS_ATTRIBUTE

Asymm-etrical

Symm-etrical

Peplum

Circle

Flare

Fit and Flare

Trumpet

Merm-aid

Balloon /Bubble

Bell

Bell Bottom

Bootcut /Boot cut

Peg

Pencil

Straight

A-Line / ALine

Tent /Trapeze

Baggy

WideLeg

HighLow

Curved (fit)

Tight (fit) /Slim (fit) /Skinny (fit)

Regul-ar (fit)

Loose (fit)

Oversized /Oversize

EmpireWaistline

DroppedWaistline /

Highwaist /

NormalWaist /

Normal-w…

Lowwaist /Low Rise

Basque(wasitline)

No waistline

Above-the-…

Hip (length)

Micro (length)

Mini(length) /Mid-thigh(length)

Above-the-k…

Knee(Length) /Knee High

(length)

Below TheKnee

(length) /Below-th…

Midi /Mid-calf /mid calf /

Tea-length /tea len…

Maxi(length) /

Ankle(length)

Floor(length) / Full

(length)

Sleeveless

Short(length)

Elbow-length

Threequarter

(length) /

Wristlength

Singlebreasted

Doublebreasted

Lace Up /Lace-up

Wrapping /Wrap

Zip-up

Fly (Opening)

Chained(Opening)

Buckled(Opening)

Toggled(Opening)

noopening

Plastic

Rubber

Metal

Straw Feather

Gem /Gemstone

Bone

Ivory

Fur / Faux-fur

Leather /Faux-leather

Suede

Shearling

Crocodile

Snakeskin

Wood

nonon-textilematerial

Burnout

Distressed /Ripped

Washed

Embossed

FrayedPrinted

Ruched /Ruching

Quilted

Pleat /pleated

Gathering /gathered

Smocking /Shirring

Tiered /layered

Cutout

Slit

Perfor-ated

Lining / lined

Applique /Embroidery /

Patch

Bead

Rivet / Stud /Spike

Sequin

no specialmanufactur…

Plain(pattern)

Abstract

Cartoon

Letters &Numbers

Camou-flage

Check / Plaid/ Tartan

Dot

Fair Isle

Floral

Geometric

Paisley

Stripe

Houndstooth(pattern)

Herringbone(pattern)

Chevron

Argyle

Animal

Leopard

Snakeskin(pattern)

Cheetah

Peacock

Zebra

Giraffe

Toile de Jouy

Plant

Classic(T-shirt)

Polo (Shirt)

Under-shirt

Henley(Shirt)

Ringer(T-shirt)

Raglan(T-shirt)

Rugby (Shirt)

Sailor (Shirt)

Crop (Top) /Midriff (Top)

Halter (Top)

Camisole

Tank (Top)

Peasant(Top)

Tube (Top) /bandeau

(Top)

Tunic (Top)

Smock (Top)

Hoodie

Blazer

Pea (Jacket)

Puffer(Jacket) /

Down(Jacket)

Biker(Jacket) /

Moto(Jacket)

Trucker(Jacket) /

Denim(Jacket)

Bomber(Jacket)

Anorak

Safari(Jacket) /

Utility(Jacket) /

Cargo

Mao (Jacket)

Nehru(Jacket)

Norfolk(Jacket)

Classicmilitary(Jacket)

Track(Jacket)

Windbreaker Chanel(Jacket)

Bolero

Tuxedo(Jacket)

Varsity(Jacket)

Crop (Jacket)

Jeans

Sweatpants/ Jogger(Pants)

Leggings

Hip-huggers(Pants) / Hiphuggers (…

Cargo(Pants) /Military(Pants)

Culottes

Capri (Pants)

Harem(Pants)

Sailor (Pants)

Jodhpur /Breeches(Pants)

Peg (Pants)

Camo(Pants)

Track (Pants)

Crop (Pants)

Short(Shorts) / Hot

(Pants)

Booty(Shorts)

Berm-uda(Shorts)

Cargo(Shorts)

Trunks

Boardshorts/ Board(Shorts)

Skort

Roll-Up(Shorts) /Boyfriend(Shorts)

Tie-up(Shorts)

Culotte(Shorts)

Lounge(Shorts)

Bloom-ers

Tutu (Skirt) /Ballerina

(Skirt)

Kilt

Wrap (Skirt)

Skater (Skirt)

Cargo (Skirt)

Hobble(Skirt) /Wiggle(Skirt)

Sheath (Skirt)

Ball Gown(Skirt)

Gypsy(Skirt) /

Broomstick(Skirt)

Rah-rah(Skirt)

Prairie(Skirt)

Flame-nco

(Skirt)

Accordion(Skirt)

Sarong(Skirt)

Tulip (Skirt)

Dirndl(Skirt)

Godet (Skirt)

Blanket(Coat)

Parka

Trench (Coat)

Pea (Coat) /Reefer (Co…

Shearling(Coat)

TeddyBear (Coat)

/ Teddy(Coat) / Fur

(Coat)

Puffer (Coat)

Duster(Coat)

Raincoat

Kimono

Robe

Dress(Coat

Duffle(Coat) /

Duffel (Co…

Wrap (Coat)

Military(Coat)Swing

(Coat)

Halter(Dress)

Wrap (Dress)

Chemise(Dress)

Slip(Dress)

Cheongsams/ Qipao

Jumper(Dress)

Shift(Dress)

Sheath(Dress)

Shirt(Dress)

Sundress /Sun (Dress)

Kaftan /Caftan

Bodycon(Dress)

Nightgown

Gown

Sweater(Dress)

Tea (Dress)

Blouson(Dress)

Tunic (Dress)

Skater(Dress)

Asymmetric(Collar)

Regular(Collar)

Shirt (Collar)

Polo(Collar) / FlatKnit (Collar)

Chelsea(Collar)

Banded(Collar)

Mandarin(Collar) /Nehru(Collar)

PeterPan

(Collar)

Bow(Collar) / Tie

(Collar) /Ascot

(Collar)

Stand-away(Collar)

Jabot (Collar)

Sailor (Collar)

Oversized(Collar)

Notched(Lapel)

Peak (Lapel)

Shawl (Lapel)

Napoleon(Lapel)

Oversized(Lapel)

Set-in sleeve

Dropped-sh…

Raglan(Sleeve)

Cap (Sleeve)

Tulip(Sleeve) /

Petal(Sleeve)

Puff (Sleeve)

Bell(Sleeve) /

Circularflounce

(Sleeve) /Ruffle

(Sleeve)

Poet (Sleeve)

Dolman(Sleeve) &

Batwing(Sleeve)

Bishop(Sleeve)

Leg ofmutton

(Sleeve)

Kimono(Sleeve)

Cargo(Pocket) /

Gusset(Pocket) /

Bellow

Patch(Pocket)

Welt (Pocket)

Kang-aroo(Pocket)

Seam(Pocket)

Slash(Pocket) /Slashed

(Pocket) /slant (…

Curved(Pocket)

Flap (Pocket)

Collarless

Asymmetric(Neckline)

Crew(Neck)

Round(Neck) /

Roundneck

V-neck / V(Neck)

Surplice(Neck)

Oval(Neck)

U-neck / U(Neck)

Sweetheart(Neckline)

Queen anne(Neck)

Boat(Neck)

Scoop(Neck)

Square(Neckline)

Plunging(Neckline) /

Plunge(Neckline)

Keyhole(Neck)

Halter(Neck)

Crossover(Neck)

Choker(Neck)

High(Neck) /

Bottle (Neck)

Turtle(Neck) /

Mock (Neck)/ Polo(Neck)

Cowl (Neck)

Straightacross(Neck)

Illusion(Neck)

Off-the-sho…

Oneshoulder

Jacket

Shirt / Blouse

Tops

Sweater(pullover, without

opening)

Cardigan (withopening)

Vest / Gilet Pants

ShortsSkirt

Coat Dress

Jumpsuit

CapeCollar

Sleeve

Neckline

Lapel

Pocket

Scarf

Umbrella

Epaulette

Buckle

Zipper

Applique

Bead

Bow

Ribbon

Rivet

Ruffle

Sequin

Tassel

Shoe

Bag, wallet

Flower

Fringe

Headband, Hair Accessory

Tie

Glove

Watch

Belt

Leg warmer

Tights, stockings

Sock

Glasses

Hat

ModaNetDeepFashion2

Apparel CategoriesFine-grained Attributes

!(#

JacketTops

Sweater

Cardigan (withopening)

Vest / Gilet Pants

ShortsSkirt

Coat Dress

Jumpsuit

CapeCollar

Sleeve

Neckline

Lapel

Pocket

Scarf

Umbrella

Epaulette

Buckle

Zipper

Applique

Bead

Bow

Ribbon

Rivet

Ruffle

Sequin

Tassel

Shoe

Bag, wallet

Flower

Fringe

Tie

Glove

Watch

Belt

Leg warmer

Sock

Glasses

Hat

ModaNetDeepFashion2

Apparel CategoriesFine-grained Attributes

!)#$

Fig. 1: An illustration of the Fashionpedia dataset and ontology: (a) main garment masks; (b) garment part masks; (c) both main garment and garment part masks; (d) fine-grained apparel attributes; (e) an exploded view of the annotation diagram: the image is annotated with both instance segmentation masks (white boxes) and per-mask fine-grained attributes (black boxes); (f) visualization of the Fashionpedia ontology: we created the Fashionpedia ontology and separated the concepts of categories (yellow nodes) and attributes [38] (blue nodes) in fashion. It covers the pre-defined garment categories used by both DeepFashion2 [11] and ModaNet [54]. The mapping with DeepFashion2 also shows the versatility of using attributes and categories: we are able to represent all 13 garment classes in DeepFashion2 with 11 main garment categories, 1 garment part, and 7 attributes.

on describing and comparing objects, since an object has many other properties or attributes in addition to its category. Attributes not only provide a compact and scalable way to represent objects in the world; as pointed out by Ferrari and Zisserman [9], attribute learning also enables the transfer of existing knowledge to novel classes. This is particularly useful for fine-grained visual recognition, whose goal is distinguishing subordinate visual categories such as birds [46] or natural species [43].

In the spirit of mapping the visual world, we propose a new task, instance segmentation with attribute localization, which unifies object detection and fine-grained attribute recognition. As illustrated in Figure 1(e), this task offers a structured representation of an image. Automatic recognition of a rich set of attributes for each segmented object instance complements category-level object detection and therefore advances the degree of complexity of the images and scenes we can make understandable to machines. In this work, we focus on the fashion domain as an example to illustrate this task. Fashion comes with rich and complex apparel with attributes, influences many aspects of modern societies, and has a strong financial and cultural impact. We anticipate that the proposed task is also suitable for other man-made product domains such as automobiles and home interiors.

Structured representations of images often rely on structured vocabularies [28]. With this in mind, we construct the Fashionpedia ontology (Figure 1(f)) and image dataset (Figure 1(a-e)), annotating fashion images with detailed segmentation masks for apparel categories, parts, and their attributes. Our proposed ontology provides a rich schema for the interpretation and organization of individuals' garments, styles, or fashion collections [24]. For example, we can create a knowledge graph (see supplementary material for more details) by aggregating structured information within each image and exploiting the relationships between garments and garment parts, categories, and attributes in the Fashionpedia ontology. Our insight is that a large-scale fashion segmentation and attribute localization dataset built with a fashion ontology can help computer vision models achieve better performance on fine-grained image understanding and reasoning tasks.

The contributions of this work are as follows:

- A novel task of fine-grained instance segmentation with attribute localization. The proposed task unifies instance segmentation and visual attribute recognition, which is an important step toward structural understanding of visual content in real-world applications.

- A unified fashion ontology informed by product descriptions from the internet and built by fashion experts. Our ontology captures the complex structure of fashion objects and the ambiguity in descriptions obtained from the web, containing 46 apparel objects (27 main apparel categories and 19 apparel parts) and 294 fine-grained attributes (spanning 9 super-categories) in total. To facilitate the development of related efforts, we also provide a mapping to the categories of existing fashion segmentation datasets; see Figure 1(f).

- A dataset with a total of 48,825 clothing images covering daily life, street style, celebrity events, runway, and online shopping, annotated by crowd workers for segmentation masks and by fashion experts for localized attributes, with the goal of developing and benchmarking computer vision models for comprehensive understanding of fashion.

- A new model, Attribute-Mask R-CNN, proposed with the aim of jointly performing instance segmentation and localized attribute recognition; a novel evaluation metric for this task is also provided.

2 Related Work

The combined task of fine-grained instance segmentation and attribute localization has not received a great deal of attention in the literature. On one hand, COCO [31] and LVIS [15] represent the benchmarks of object detection for common objects. Panoptic segmentation has been proposed to unify both semantic and instance segmentation, addressing both stuff and thing classes [27].


Table 1: Comparison of fashion-related datasets (Cls. = Classification, Segm. = Segmentation, MG = Main Garment, GP = Garment Part, A = Accessory, S = Style, FGC = Fine-Grained Categorization). The Cls., BBox, and Segm. columns describe category annotation types; the Unlocalized, Localized, and FGC columns describe attribute annotation types. To the best of our knowledge, we include all fashion-related datasets focusing on visual recognition.

| Name | Cls. | BBox | Segm. | Unlocalized | Localized | FGC |
| --- | --- | --- | --- | --- | --- | --- |
| Clothing Parsing [50] | MG, A | - | - | - | - | - |
| Chic or Social [49] | MG, A | - | - | - | - | - |
| Hipster [26] | MG, A, S | - | - | - | - | - |
| Ups and Downs [19] | MG | - | - | - | - | - |
| Fashion550k [23] | MG, A | - | - | - | - | - |
| Fashion-MNIST [48] | MG | - | - | - | - | - |
| Runway2Realway [44] | - | - | MG, A | - | - | - |
| ModaNet [54] | - | MG, A | MG, A | - | - | - |
| Deepfashion2 [11] | - | MG | MG | - | - | - |
| Fashion144k [40] | MG, A | - | - | X | - | - |
| Fashion Style-128 Floats [41] | S | - | - | X | - | - |
| UT Zappos50K [51] | A | - | - | X | - | - |
| Fashion200K [16] | MG | - | - | X | - | - |
| FashionStyle14 [42] | S | - | - | X | - | - |
| Main Product Detection [39] | - | MG | - | X | - | - |
| StreetStyle-27K [35] | - | - | - | X | - | X |
| UT-latent look [21] | MG, S | - | - | X | - | X |
| FashionAI [7] | MG, GP, A | - | - | X | - | X |
| iMat-Fashion Attribute [14] | MG, GP, A, S | - | - | X | - | X |
| Apparel classification-Style [4] | - | MG | - | X | - | X |
| DARN [22] | - | MG | - | X | - | X |
| WTBI [25] | - | MG, A | - | X | - | X |
| Deepfashion [32] | S | MG | - | X | - | X |
| Fashionpedia | - | MG, GP, A | MG, GP, A | - | X | X |

In spite of the domain differences, Fashionpedia has comparable mask quality to LVIS and a similar total number of segmentation masks to COCO. On the other hand, we have also observed an increasing effort to curate datasets for fine-grained visual recognition, evolving from CUB-200 Birds [46] to the recent iNaturalist dataset [43]. The goal of this line of work is to advance the state of the art in automatic image classification for large numbers of real-world, fine-grained categories. A rather unexplored aspect of these datasets, however, is providing a structured representation of an image. Visual Genome [28] provides dense annotations of object bounding boxes, attributes, and relationships in the general domain, enabling a structured representation of the image. In our work, we instead focus on fine-grained attributes and provide segmentation masks in the fashion domain to advance the clothing recognition task.

Clothing recognition has received increasing attention in the computer vision community recently. A number of works provide valuable apparel-related datasets [50,4,49,26,44,40,22,25,19,41,32,23,48,51,16,42,39,35,21,54,7,11]. These pioneering works enabled several recent advances in clothing-related recognition and knowledge discovery [10,34]. Table 1 summarizes the comparison among different fashion datasets regarding annotation types of clothing categories and attributes. Our dataset distinguishes itself in the following three aspects.

Exhaustive annotation of segmentation masks: Existing fashion datasets [44,54,11] offer segmentation masks for the main garment categories (e.g., jacket, coat, dress) and the accessory categories (e.g., bag, shoe). Smaller garment objects such as collars and pockets are not annotated. However, these small objects can be valuable for real-world applications (e.g., searching for a specific collar shape during online shopping). Our dataset is annotated not only with segmentation masks for a total of 27 main garment and accessory categories but also for 19 garment parts (e.g., collar, sleeve, pocket, zipper, embroidery).

Localized attributes: The fine-grained attributes in existing datasets [22,32,39,14] tend to be noisy, mainly because most of the annotations are collected by crawling fashion product attribute-level descriptions directly from large online shopping websites. Unlike these datasets, the fine-grained attributes in our dataset are annotated manually by fashion experts. To the best of our knowledge, ours is the only dataset to annotate localized attributes: fashion experts are asked to annotate attributes associated with the segmentation masks labeled by crowd workers. Localized attributes could potentially help computational models detect and understand attributes more accurately.

Fine-grained categorization: Previous studies on fine-grained attribute categorization suffer from several issues, including: (1) repeated attributes belonging to the same category (e.g., zip, zipped, and zipper) [32,21]; (2) basic-level categorization only (object recognition) and a lack of fine-grained categorization [50,4,49,26,25,44,41,42,23,16,48,54,11]; (3) a lack of fashion taxonomies meeting the needs of real-world applications in the fashion industry, possibly due to the research gap between fashion design and computer vision; (4) diverse taxonomy structures from different sources in the fashion domain. To facilitate research in the areas of fashion and computer vision, our proposed ontology is built and verified by fashion experts based on their own design experience and informed by the following four sources: (1) world-leading e-commerce fashion websites (e.g., ZARA, H&M, Gap, Uniqlo, Forever21); (2) luxury fashion brands (e.g., Prada, Chanel, Gucci); (3) trend forecasting companies (e.g., WGSN); (4) academic resources [8,3].

3 Dataset Specification and Collection

3.1 Ontology specification

We propose a unified fashion ontology (Figure 1(f)), a structured vocabulary that utilizes basic level categories and fine-grained attributes [38]. The Fashionpedia ontology relies on similar definitions of objects and attributes as previous well-known image datasets. For example, a Fashionpedia object is similar to an "item" in Wikidata [45], or an "object" in COCO [31] and Visual Genome [28]. In the context of Fashionpedia, objects represent common items in apparel (e.g., jacket, shirt, dress).


!"#$%!&'()*%!+,##-./,'%"012!+,3-'!1*"1.$3*&0!,*%*%"!4*5!)5!+3*-.*&,'!-)61'7!8'37!5,3*%./5300'-%2!3&9::'0-*63,!%#.%#%!0';0*,'.:30'-*3,

!&*%",'.8-'3&0'7!+,#-3,!&9::'0-*63,

!&1*-0./6#,,3-2

!$-*&0!,'%"01!5,3*%./5300'-%2.!38#<'!01'!1*5.!%#.%#%!0';0*,'.:30'-*3,.!0*"10./+*02.!&9::'0-*63,

!5,3*%./5300'-%2.!38#<'!01'!1*5./,'%"012.!%#.%#%!0';0*,'.:30'-*3,.!0-)6='-./>36='02.!$3&1'7.!-'"),3-./+*02.!&*%",'.8-'3&0'7!&9::'0-*63,.!%#.$3*&0,*%'

!-'"),3-./6#,,3-2

!5,3*%./5300'-%2.!%#.%#%!0';0*,'.:30'-*3,.!-'"),3-./+*02.!+,9./#5'%*%"2.!5'"./53%0&2.!:3;*./,'%"012.!&9::'0-*63,.!5'"

!$',0./5#6='02

!$',0./5#6='02

!$-*&0!,'%"01!&'0!*%.&,''<'

!5,3*%./5300'-%2.!%#.%#%!0';0*,'.:30'-*3,.!7#)8,'.8-'3&0'7.!-'"),3-./+*02.!0-'%61./6#302.!:*%*./,'%"012.!&9::'0-*63,.!%#.$3*&0,*%'

!38#<'!01'!1*5./,'%"012!5,3*%./5300'-%2!%#-:3,.$3*&0!%#.%#%!0';0*,'.:30'-*3,!0*"10./+*02!6,3&&*6./0!&1*-02!&9::'0-*63,

!38#<'!01'!1*5./,'%"012!%#-:3,.$3*&0!%#.%#%!0';0*,'.:30'-*3,!-'"),3-./+*02!&*%",'.8-'3&0'7!&9::'0-*63,!7#0

!5,3*%./5300'-%2!%#.%#%!0';0*,'.:30'-*3,!-'"),3-./+*02!:*%*./,'%"012!8,3%='0./6#302!&*%",'.8-'3&0'7!,*%*%"!&9::'0-*63,!%#.$3*&0,*%'

!<!%'6=

!$-*&0!,'%"01!&'0!*%.&,''<'

!$-*&0!,'%"01

!5,3*%./5300'-%2!>'3%&!%#.%#%!0';0*,'.:30'-*3,!-'"),3-./+*02!$3&1'7!,#$.$3*&0!:3;*./,'%"012!&9::'0-*63,!+,9./#5'%*%"2!&0-3*"10

!53061./5#6='02

!6)-<'7./5#6='02

0#03,.:3&=&?.@A 0#03,.:3&=&?.@@

0#03,.630'"#-*'&?.BC 0#03,.630'"#-*'&?.BC 0#03,.630'"#-*'&?.B@ 0#03,.630'"#-*'&.?.B@

/32 /82

/62 /72 /'2 /+2 /"2 /12 /*2

Fig. 2: Image examples with annotated segmentation masks (a-f) and fine-grained attributes (g-i).

In this section, we break down each component of the Fashionpedia ontology and illustrate the construction process. With this ontology and our image dataset, a large-scale fashion knowledge graph can be built as an extended application of our dataset (more details can be found in the supplementary material).

Apparel categories. In the Fashionpedia dataset, all images are annotated with one or multiple main garments, and each main garment is also annotated with its garment parts. For example, general garment types such as jacket, dress, and pants are considered main garments; these garments also consist of several garment parts such as collars, sleeves, pockets, buttons, and embroideries. Main garments are divided into three main categories: outerwear, intimates, and accessories. Garment parts also have different types: garment main parts (e.g., collars, sleeves), bra parts, closures (e.g., button, zipper), and decorations (e.g., embroidery, ruffle). On average, each image contains 1 person, 3 main garments, 3 accessories, and 12 garment parts, each delineated by a tight segmentation mask (Figure 1(a-c)). Furthermore, each object is assigned a synset ID in our Fashionpedia ontology.

Fine-grained attributes. Main garments and garment parts can be associated with apparel attributes (Figure 1(e)). For example, "button" is part of the main garment "jacket"; "jacket" can be linked with the silhouette attribute "symmetrical"; and the garment part "button" could carry the attribute "metal" with a relationship of material. The Fashionpedia ontology provides attributes for 13 main outerwear garment categories and for 5 out of 19 garment parts ("sleeve", "neckline", "pocket", "lapel", and "collar"). Each image has 16.7 attributes on average (max 57 attributes). As with the main garments and garment parts, we canonicalize all attributes to our Fashionpedia ontology.

Relationships. Relationships can be formed between categories and attributes. There are three main types of relationships (Figure 1(e)): (1) outfits to main garments, and main garments to garment parts: a meronymy (part-of) relationship; (2) main garments to attributes, or garment parts to attributes: these relationship types can be garment silhouette (e.g., peplum), collar nickname (e.g., Peter Pan collar), garment length (e.g., knee-length), textile finishing (e.g., distressed), textile-fabric patterns (e.g., paisley), etc.; (3) within garments, garment parts, or attributes: there is a maximum of four levels of hyponymy (is-an-instance-of) relationships. For example, weft knit is an instance of knit fabric, and fleece is an instance of weft knit.
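To make these three relationship types concrete, the following is a minimal sketch of how a fragment of the ontology could be represented in code; the class, field, and synset names here are illustrative assumptions, not the released schema.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Optional

# Sketch of the three relationship types in the Fashionpedia ontology:
# part-of (meronymy), is-an-instance-of (hyponymy), and HAS_ATTRIBUTE.
@dataclass
class Node:
    name: str
    synset_id: Optional[str] = None                      # each object maps to a synset ID
    parts: list[Node] = field(default_factory=list)      # meronymy (part-of)
    hypernym: Optional[Node] = None                      # hyponymy (is-an-instance-of)
    attributes: list[str] = field(default_factory=list)  # HAS_ATTRIBUTE edges

# Fragment of the ontology: a jacket main garment with a button part.
jacket = Node("jacket", synset_id="jacket.n.01")   # synset ID is hypothetical
button = Node("button", synset_id="button.n.01")
jacket.parts.append(button)                  # main garment -> garment part
jacket.attributes.append("symmetrical")      # silhouette attribute
button.attributes.append("metal")            # material attribute

# Hyponymy chain (at most four levels): fleece -> weft knit -> knit fabric.
knit = Node("knit fabric")
weft_knit = Node("weft knit", hypernym=knit)
fleece = Node("fleece", hypernym=weft_knit)
```

Aggregating such nodes over all masks in an image yields exactly the kind of per-image knowledge graph described above.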

3.2 Image Collection and Annotation Pipeline

Image collection. A total of 50,527 images were harvested from Flickr and free-license photo websites, including Unsplash, Burst by Shopify, Freestocks, Kaboompics, and Pexels. Two fashion experts verified the quality of the collected images manually; specifically, they checked scene diversity and made sure clothing items were visible in the images. Fdupes [33] was used to remove duplicate images. After filtering, 48,825 images were left and used to build our Fashionpedia dataset.

Annotation Pipeline. Expert annotation is often a time-consuming process. In order to accelerate the annotation process, we decoupled the work between crowd workers and experts. We divided the annotation process into the following two phases.

First, segmentation masks of apparel objects were annotated by 28 crowd workers, who were trained for 10 days before the annotation process (with prepared annotation tutorials for each apparel object; see the supplementary material for details). We collected high-quality annotations by having the annotators follow the contours of garments in the image as closely as possible (see Section 4.2 for annotation analysis). This polygon annotation process was monitored daily and verified weekly by a supervisor and by the authors.

Second, 15 fashion experts (graduate students in the apparel domain) were recruited to annotate the fine-grained attributes for the annotated segmentation masks. Annotators were given one mask and one attribute super-category (e.g., "textile pattern", "garment silhouette") at a time. Two additional options, "not sure" and "not on the list", were available during annotation. The option "not on the list" indicates that the expert found an attribute that is not in the proposed ontology; "not sure" means the expert could not identify the attribute of a mask. Common reasons for the latter include occlusion of the masks and the viewing angle of the image (for example, a top underneath a closed jacket). More details can be found in Figure 2. Each attribute super-category was assigned to one or two fashion experts, depending on the number of masks. The annotations were also checked by another expert annotator before delivery.

We split the data into training, validation, and test sets, with 45,623, 1,158, and 2,044 images, respectively. More details of the dataset creation can be found in the supplementary material. An illustrative sketch of a single annotation record follows.
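The sketch below shows what one COCO-style annotation record extended with per-mask attributes could look like; every field name here (in particular `attribute_ids`) is an assumption for illustration, not a guarantee about the released JSON files.

```python
# Hedged sketch of one annotation record in a COCO-style layout extended
# with localized attributes; field names are illustrative assumptions.
annotation = {
    "id": 1,
    "image_id": 23,
    "category_id": 4,        # one of the 46 apparel categories
    "segmentation": [        # list of polygons; one mask may have several
        [120.0, 80.0, 260.0, 80.0, 260.0, 420.0, 120.0, 420.0],
    ],
    "bbox": [120.0, 80.0, 140.0, 340.0],   # [x, y, width, height]
    "area": 47600.0,
    "attribute_ids": [10, 57, 201],        # per-mask fine-grained attributes
}
```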


4 Dataset Analysis

This section presents a detailed analysis of our dataset using the training images. We begin with general image statistics, followed by an analysis of segmentation masks, categories, and attributes. We compare Fashionpedia with four other segmentation datasets: two recent fashion datasets, DeepFashion2 [11] and ModaNet [54], and two general-domain datasets, COCO [31] and LVIS [15].

4.1 Image Analysis

We chose to use high-resolution images during the curation process, since the Fashionpedia ontology includes diverse fine-grained attributes for both garments and garment parts. The Fashionpedia training images have an average dimension of 1710 (width) × 2151 (height). High-resolution images are able to show apparel objects in detail, leading to more accurate and faster annotations for both segmentation masks and attributes. These high-resolution images can also benefit downstream tasks such as detection and image generation. Examples of detailed annotations can be found in Figure 2.

4.2 Mask Analysis

We define a "mask" as one apparel instance, which may consist of more than one separate component (e.g., the jacket in Figure 1), and a "polygon" as a single disjoint area.

Mask quantity. On average, there are 7.3 masks per image (median 7, max 74) in the Fashionpedia training set. Figure 3(a) shows that Fashionpedia has the largest median value among the five datasets used for comparison. Compared to ModaNet and DeepFashion2, Fashionpedia has the widest mask-count distribution, and its range is comparable to that of COCO, a general-domain dataset. COCO and LVIS nevertheless maintain wider distributions than Fashionpedia, owing to the more common objects in their datasets. Figure 3(d) illustrates the distribution within the Fashionpedia dataset: one image usually contains more garment parts and accessories than outerwear items.

Mask sizes. Figures 3(b) and 3(e) compare relative mask sizes within Fashionpedia and against other datasets. Ours has a similar distribution to COCO and LVIS, except for a lack of larger masks (area > 0.95). DeepFashion2 has a heavier tail, meaning it contains a larger portion of garments with a zoomed-in view; unlike DeepFashion2, our images mainly focus on the whole ensemble of clothing. Since ModaNet focuses on outerwear and accessories, it has more masks with relative area between 0.2 and 0.4, whereas ours covers an additional 19 apparel part categories. As illustrated in Figure 3(e), garment parts and accessories are relatively small compared to outerwear (e.g., "dress", "coat"). A sketch of one plausible relative-size computation follows.
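For reference, the sketch below shows one plausible definition of relative mask size, assumed here to follow the spirit of [15]; the exact normalization used there may differ.

```python
def relative_mask_size(mask_area: float, image_width: int, image_height: int) -> float:
    # Assumed definition: square root of the mask-to-image area ratio,
    # which lies in [0, 1] (1.0 for a mask covering the whole image).
    return (mask_area / (image_width * image_height)) ** 0.5

print(relative_mask_size(47600.0, 1710, 2151))  # ~0.11 for a medium-sized mask
```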

Mask quality. Apparel categories also tend to have complex silhouettes.


[Figure 3 plots: (a) Mask count per image across datasets (median | max: DeepFashion2 [2|8], ModaNet [5|16], COCO2017 [4|93], LVIS [6|556], Fashionpedia [7|74]); (b) Relative mask size across datasets; (c) Category diversity per image across datasets; (d) Mask count per image in Fashionpedia (Outerwears [1|6], Garment Parts [3|69], Accessories [2|18]); (e) Relative mask size in Fashionpedia; (f) Categories and attributes distribution in Fashionpedia.]

Fig. 3: Dataset statistics: the first row presents comparisons among datasets; the second row presents comparisons within Fashionpedia. Y-axes are in log scale. Relative segmentation mask size was calculated in the same way as [15] and rounded to a precision of 2. For the mask count per image comparisons (Figures 3(a) and 3(d)), legends follow the [median | max] format. Values on the x-axis of Figure 3(a) were discretized for better visual effect.

Table 2 shows that the Fashionpedia masks have the most complex boundaries amongst the five datasets (according to the measurement used in [15]). This suggests that our masks are able to represent the complex silhouettes of apparel categories more accurately than ModaNet and DeepFashion2. We also report the number of vertices per polygon, a measurement of the granularity of the produced masks. Table 2 shows that we have the second-highest average number of vertices among the five datasets, next to LVIS.
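As a rough illustration of these two measurements, the sketch below computes an isoperimetric boundary-complexity ratio and per-polygon statistics for a flat COCO-style polygon. We assume this ratio corresponds to the complexity measure of [15,2]; the exact definition there may differ.

```python
import math

def boundary_complexity(perimeter: float, area: float) -> float:
    """Isoperimetric ratio: 1.0 for a circle, larger for more complex
    boundaries. Assumed to correspond to the measure used in [15,2]."""
    return perimeter / math.sqrt(4.0 * math.pi * area)

def polygon_stats(polygon: list) -> tuple:
    """Vertex count, perimeter, and shoelace area of one flat
    [x1, y1, x2, y2, ...] polygon, as stored in COCO-style annotations."""
    xs, ys = polygon[0::2], polygon[1::2]
    n = len(xs)
    perimeter = sum(math.hypot(xs[(i + 1) % n] - xs[i],
                               ys[(i + 1) % n] - ys[i]) for i in range(n))
    area = 0.5 * abs(sum(xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i]
                         for i in range(n)))
    return n, perimeter, area

# A unit square has 4 vertices and complexity 4 / sqrt(4*pi) ≈ 1.13.
n, p, a = polygon_stats([0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0])
print(n, round(boundary_complexity(p, a), 2))
```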

4.3 Category and Attributes Analysis

There are 46 apparel categories and 294 attributes in the Fashionpedia dataset. On average, each image is annotated with 7.3 instances, 5.4 categories, and 16.7 attributes. Of all the masks with categories and attributes, each mask has 3.7 attributes on average (max 14 attributes). Among the three fashion datasets, Fashionpedia has the most diverse set of categories within one image, comparable to COCO (Figure 3(c)), since we provide a comprehensive ontology for the annotation. In addition, Figure 3(f) shows the distributions of categories and attributes in the training set and highlights the long-tailed nature of our data.
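As a minimal sketch of how such per-image statistics can be derived, assume a list of COCO-style annotation dicts in which per-mask attributes are stored in a (hypothetical) attribute_ids field:

from collections import defaultdict

def per_image_stats(annotations):
    """Average instances, distinct categories, and attribute labels per
    image; `attribute_ids` is a hypothetical per-mask attribute field."""
    masks = defaultdict(int)   # image_id -> instance count
    cats = defaultdict(set)    # image_id -> distinct category ids
    attrs = defaultdict(int)   # image_id -> total attribute labels
    for ann in annotations:
        img = ann["image_id"]
        masks[img] += 1
        cats[img].add(ann["category_id"])
        attrs[img] += len(ann.get("attribute_ids", []))
    n = len(masks)
    return (sum(masks.values()) / n,                 # ~7.3 instances
            sum(len(c) for c in cats.values()) / n,  # ~5.4 categories
            sum(attrs.values()) / n)                 # ~16.7 attributes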

During the fine-grained attribute annotation process, we also ask the experts to choose "not sure" if they are uncertain about a decision, and "not on the list" if they find an attribute that is not provided. The majority of "not sure" answers come from three attribute superclasses, namely "Opening Type", "Waistline", and "Length".


Table 2: Comparison of segmentation mask complexity amongst segmentation datasets in both the fashion and general domains (COCO 2017 instance training data was used). Each statistic (mean and median) represents a bootstrapped 95% confidence interval following [15]. Boundary complexity was calculated according to [15,2]. The reported mask boundary complexity for COCO and LVIS differs from [15] due to different image resolutions and image sets. The number of vertices per polygon is the number of vertices in one polygon, where a polygon is defined as one disjoint area. Masks (and polygons) with zero area were ignored

Dataset            Boundary complexity        No. of vertices per polygon     Image count
                   mean          median       mean            median
COCO [31]          6.65 - 6.66   6.07 - 6.08  21.14 - 21.21   15.96 - 16.04   118,287
LVIS [15]          6.78 - 6.80   5.89 - 5.91  35.77 - 35.95   22.91 - 23.09    57,263
ModaNet [54]       5.87 - 5.89   5.26 - 5.27  22.50 - 22.60   18.95 - 19.05    52,377
DeepFashion2 [11]  4.63 - 4.64   4.45 - 4.46  14.68 - 14.75    8.96 - 9.04    191,960
Fashionpedia       8.36 - 8.39   7.35 - 7.37  31.82 - 32.01   20.90 - 21.10    45,623

Since some masks show only a limited portion of an apparel item (a top worn inside a jacket, for example), annotators are unsure how to identify those attributes due to occlusion or viewpoint discrepancies. For each attribute superclass, fewer than 15% of masks account for "not on the list", which illustrates the comprehensiveness of our proposed ontology (see supplementary material for more details of the extra dataset analysis).

5 Evaluation Protocol and Baselines

5.1 Evaluation metric

In object detection, a true positive (TP) for each category c is defined as a single detected object that matches a ground-truth object with an Intersection over Union (IoU) over a threshold τ_IoU. COCO's main evaluation metric uses average precision averaged across all 10 IoU thresholds τ_IoU ∈ [0.5 : 0.05 : 0.95] and all 80 categories. We denote this metric as AP_IoU.

In the case of instance segmentation with attribute localization, we extend the standard COCO metric by adding one more constraint: the macro F1 score for the predicted attributes of a single detected object with category c (see supplementary material for the choice of F1 averaging). We denote the F1 threshold as τ_F1; it has the same range as τ_IoU (τ_F1 ∈ [0.5 : 0.05 : 0.95]). The main metric, AP_IoU+F1, reports the average precision score across all 10 IoU thresholds, all 10 macro F1 thresholds, and all categories. Our evaluation API, code, and trained models are available at: https://fashionpedia.github.io/home/Model_and_API.html.
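To make the extra constraint concrete, the following is a minimal sketch (not the released evaluation API) of the modified true-positive test for a single detection matched to a ground-truth instance; attribute vectors are 294-dimensional multi-hot arrays, and the F1 used here is the "binary-macro" variant discussed in the supplementary material.

from sklearn.metrics import f1_score

def is_true_positive(iou, pred_attrs, gt_attrs, tau_iou, tau_f1):
    """A matched detection is a true positive only if both its mask IoU
    and its attribute F1 clear the thresholds; pred_attrs and gt_attrs
    are 0/1 vectors of length 294."""
    if iou < tau_iou:
        return False
    # "binary-macro" F1: unweighted mean of the F1 scores of the
    # positive (1) and negative (0) classes of the multi-hot encoding
    f1 = f1_score(gt_attrs, pred_attrs, average="macro", zero_division=0)
    return f1 >= tau_f1

AP_IoU+F1 then averages precision over τ_IoU, τ_F1 ∈ {0.50, 0.55, ..., 0.95} and over all categories.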


[Figure 4: architecture diagram. Backbone features feed the RoI heads: a 1024-d fully connected branch for class and box, a 28×28×46 mask branch, and the added attribute branch predicting localized attributes (e.g., for a jacket: symmetrical, above-the-hip length, single-breasted, dropped shoulder, regular fit, plain, washed).]

Fig. 4: Attribute-Mask R-CNN adds a multi-label attribute prediction head uponMask R-CNN for instance segmentation with attribute localization

5.2 Attribute-Mask R-CNN

We perform two tasks on Fashionpedia: (1) apparel instance segmentation (ignoring attributes); (2) instance segmentation with attribute localization. To facilitate research on Fashionpedia, we design a strong baseline model named Attribute-Mask R-CNN, built upon Mask R-CNN [17]. As illustrated in Figure 4, we extend the Mask R-CNN heads with an additional multi-label attribute prediction head, trained with a sigmoid cross-entropy loss. Attribute-Mask R-CNN can be trained end-to-end to jointly perform instance segmentation and localized attribute recognition.
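A minimal sketch of such a head in Keras is shown below; the 1024-d fully connected layer mirrors the class/box branch in Figure 4, but the exact layer shapes are our assumption rather than the released implementation.

import tensorflow as tf

NUM_ATTRIBUTES = 294

def attribute_head(roi_features):
    """Multi-label attribute branch on pooled RoI features (HxWx256):
    flatten, one fully connected layer, then 294 independent logits."""
    x = tf.keras.layers.Flatten()(roi_features)
    x = tf.keras.layers.Dense(1024, activation="relu")(x)
    return tf.keras.layers.Dense(NUM_ATTRIBUTES)(x)  # sigmoid logits

def attribute_loss(labels, logits):
    """Sigmoid (element-wise binary) cross-entropy over the 294
    attributes, added to the usual Mask R-CNN losses."""
    per_attribute = tf.nn.sigmoid_cross_entropy_with_logits(
        labels=labels, logits=logits)
    return tf.reduce_mean(per_attribute)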

We leverage different backbones, including ResNet-50/101 (R50/101) [18] with a feature pyramid network (FPN) [30] and SpineNet-49/96/143 [5]. The input image is resized so that its longer edge is 1024 pixels for all networks except SpineNet-143, for which we use an input size of 1280. During training, in addition to standard random horizontal flipping and cropping, we use large scale jittering, which resizes an image to a random ratio between (0.5, 2.0) of the target input image size, following Zoph et al. [55]. We use an open-sourced TensorFlow [1] code base¹ for implementation, and all models are trained with a batch size of 256. We follow the standard training schedules of 1× (5625 iterations), 2× (11250 iterations), 3× (16875 iterations), and 6× (33750 iterations) in Detectron2 [47], with the linear learning rate scaling suggested by Goyal et al. [13].
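The following is a rough sketch of the large scale jittering step, assuming a square target size; the reference implementation in the code base referenced in the footnote differs in details (aspect-ratio handling, random rather than central cropping).

import tensorflow as tf

def large_scale_jitter(image, target_size=1024, min_ratio=0.5, max_ratio=2.0):
    """Resize the image by a random ratio in [0.5, 2.0] of the target
    input size, then crop or pad back to target_size x target_size."""
    ratio = tf.random.uniform([], min_ratio, max_ratio)
    scaled = tf.cast(ratio * target_size, tf.int32)
    image = tf.image.resize(image, [scaled, scaled])
    # center-crop (if enlarged) or zero-pad (if shrunk) to the target size
    return tf.image.resize_with_crop_or_pad(image, target_size, target_size)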

5.3 Results Discussion

Attribute-Mask R-CNN. From the results in Table 3, we make the following observations: (1) our baseline models achieve promising performance on the challenging Fashionpedia dataset; (2) there is a significant drop in box AP (e.g., from 48.7 to 35.7 for SpineNet-143) when we add τ_F1 as another constraint for true positives. This is further verified by the per-super-category mask results in Table 4, and suggests that joint instance segmentation and attribute localization is a significantly more difficult task than instance segmentation alone, leaving much room for future improvement.

Main apparel detection analysis. We also provide an in-depth detector analysis following the COCO detection challenge evaluation [31], inspired by Hoiem et al. [20]. Figure 5 illustrates a detailed breakdown of the bounding-box false positives produced by the detectors.

1 https://github.com/tensorflow/tpu/tree/master/models/official/detection


Table 3: Baseline results of Mask R-CNN and Attribute-Mask R-CNN on Fashionpedia. The big performance gap between AP_IoU and AP_IoU+F1 suggests the challenging nature of our proposed task

Backbone      Schedule  FLOPs(B)  params(M)  AP_box (IoU / IoU+F1)  AP_mask (IoU / IoU+F1)
R50-FPN       1×        296.7     46.4       38.7 / 26.6            34.3 / 25.5
              2×                             41.6 / 29.3            38.1 / 28.5
              3×                             43.4 / 30.7            39.2 / 29.5
              6×                             42.9 / 31.2            38.9 / 30.2
R101-FPN      1×        374.3     65.4       41.0 / 28.6            36.7 / 27.6
              2×                             43.5 / 31.0            39.2 / 29.8
              3×                             44.9 / 32.8            40.7 / 31.4
              6×                             44.3 / 32.9            39.7 / 31.3
SpineNet-49   6×        267.2     40.8       43.7 / 32.4            39.6 / 31.4
SpineNet-96   6×        314.0     55.2       46.4 / 34.0            41.2 / 31.8
SpineNet-143  6×        498.0     79.2       48.7 / 35.7            43.1 / 33.3

Table 4: Per super-category results (for masks) using Attribute-Mask R-CNN with the SpineNet-143 backbone. We follow the same COCO sub-metrics for the overall results and the three super-categories of apparel objects. Result format follows [AP_IoU / AP_IoU+F1] or [AR_IoU / AR_IoU+F1] (see supplementary material for per-class results)

Category    AP           AP50         AP75         APl          APm          APs
overall     43.1 / 33.3  60.3 / 42.3  47.6 / 37.6  50.0 / 35.4  40.2 / 27.0  17.3 / 9.4
outerwear   64.1 / 40.7  77.4 / 49.0  72.9 / 46.2  67.1 / 43.0  44.4 / 29.3  19.0 / 4.4
parts       19.3 / 13.4  35.5 / 20.8  18.4 / 14.4  28.3 / 14.5  23.9 / 16.4  12.5 / 9.8
accessory   56.1 / -     77.9 / -     63.9 / -     57.5 / -     60.5 / -     25.0 / -

Figures 5(a) and 5(b) compare two detectors trained on Fashionpedia and COCO. Errors of the COCO detector are dominated by imperfect localization (AP increases by 28.3 over the overall AP at τ_IoU = 0.75) and background confusion (+15.7) (Figure 5(b)). Unlike the COCO detector, no single type of mistake dominates the errors produced by the Fashionpedia detector: Figure 5(a) shows errors from localization (+13.4), classification (+6.9), and background confusion (+6.6). Due to space constraints, we leave the super-category analysis to the supplementary material.

Prediction visualization. Baseline outputs (with both segmentation masks and localized attributes) are visualized in Figure 6. Our Attribute-Mask R-CNN achieves good results even for small objects like shoes and glasses. On one hand, the model can correctly predict fine-grained attributes for some masks (e.g., Figure 6, second image in the top row); on the other hand, it can also predict the wrong nickname (welt) for a pocket (e.g., Figure 6, third image in the top row). These results show that there is headroom remaining for the future development of more advanced computer vision models on this task (see supplementary material for more details of the baseline analysis).


[Figure 5: precision-recall curves (overall-all-all). (a) Fashionpedia detector, AUC per setting: C75 .537, C50 .630, Loc .671, Sim .720, Oth .740, BG .806, FN 1.00. (b) COCO detector (2015 challenge winner): C75 .399, C50 .589, Loc .682, Sim .695, Oth .713, BG .870, FN 1.00.]

Fig. 5: Main apparel detector analysis. Each plot shows 7 precision-recall curves, where each evaluation setting is more permissive than the previous one. Specifically, C75: strict IoU (τ_IoU = 0.75); C50: PASCAL IoU (τ_IoU = 0.5); Loc: localization errors ignored (τ_IoU = 0.1); Sim: supercategory false positives (FPs) removed; Oth: category FPs removed; BG: background (and class confusion) FPs removed; FN: false negatives removed. The two plots compare detectors trained on Fashionpedia and COCO, respectively; results are averaged over all categories. Legends present the area under each curve (corresponding to the AP metric) in brackets

6 Conclusion

In this work, we focused on a new task that unifies instance segmentation and attribute recognition. To solve the challenging problems entailed by this task, we introduced the Fashionpedia ontology and dataset. To the best of our knowledge, Fashionpedia is the first dataset that combines part-level segmentation masks with fine-grained attributes. We presented Attribute-Mask R-CNN, a novel model for this task, along with a novel evaluation metric. We expect models trained on Fashionpedia to be applicable to many tasks, including better product recommendation in online shopping, enhanced visual search results, and resolving ambiguous fashion-related words in text queries. We hope Fashionpedia will contribute to advances in fine-grained image understanding in the fashion domain.

7 Acknowledgements

This research was partially supported by a Google Faculty Research Award. We thank Kavita Bala, Carla Gomes, Dustin Hwang, Rohun Tripathi, Omid Poursaeed, Hector Liu, Nayanathara Palanivel, and Konstantin Lopuhin for their helpful feedback and discussion in the development of the Fashionpedia dataset. We also thank Zeqi Gu, Fisher Yu, Wenqi Xian, Chao Suo, Junwen Bai, Paul Upchurch, Anmol Kabra, and Brendan Rappazzo for their help in developing the fine-grained attribute annotation tool.


!"#$% &'''()#*$+&',"-.'/()#*$+0$)-$.()'1"$$)2#&'1(".#'/1"$$)2#0

3())4)'5 &'''()#*$+&'62.%$7()#*$+#.89#",)&'%)$7.#'%())4)

:"89)$ &'''%.(+;<)$$)&'%=,,)$2.8"($)-$.()'1"$$)2#&'1(".#'/1"$$)2#0;1)#.#*'$=1)&'%.#*()'>2)"%$)?

!"#$% &'''$)-$.()'1"$$)2#&'1(".#'/1"$$)2#0()#*$+&',"-.'/()#*$+0'

3())4)'5 &'''#.89#",)&'%)$7.#'%())4)()#*$+&'62.%$7()#*$+()#*$+&'%+;2$'/()#*$+0'

@;(("2 &'''#.89#",)&'%+.2$'/8;(("20#.89#",)&'2)*<("2'/8;(("20'

A;1 &'''%.(+;<)$$)&'%=,,)$2.8"(6".%$(.#)&'#;'6".%$(.#)()#*$+&'">;4)7$+)7+.1'/()#*$+0

39.2$ &'''%.(+;<)$$)&'%=,,)$2.8"($)-$.()'1"$$)2#&'1(".#'/1"$$)2#0()#*$+&',.#.'/()#*$+0

3())4)'5 &''' #.89#",)&'%)$7.#'%())4)()#*$+&'62.%$7()#*$+

3())4)'B &''''#.89#",)&'%)$7.#'%())4)()#*$+&'62.%$7()#*$+

:"89)$ &'''%.(+;<)$$)&'%=,,)$2.8"($)-$.()'1"$$)2#&'1(".#'/1"$$)2#0;1)#.#*'$=1)&'%.#*()'>2)"%$)?

A;1 &'''%.(+;<)$$)&'%=,,)$2.8"($)-$.()'1"$$)2#&'1(".#'/1"$$)2#0''''()#*$+&'">;4)7$+)7+.1'/()#*$+0

3())4)'B &'''#.89#",)&'%)$7.#'%())4)()#*$+&'62.%$7()#*$+()#*$+&'%+;2$'/()#*$+0

!"#$%

3())4)'5

3())4)'B

:"89)$

3())4)'B

3())4)'5

A;1

!"#$%

:"89)$

A;1

39.2$

!;89)$'5 &'''#.89#",)&'1"$8+'/1;89)$0''#.89#",)&'C("1'/1;89)$0'#.89#",)&'6)($'/1;89)$0

D)89(.#) &'''#)89(.#)'$=1)&'2;<#?'/#)890 !;89)$'5 &'''#.89#",)&'1"$8+'/1;89)$0#.89#",)&'6)($'/1;89)$0#.89#",)&'C("1'/1;89)$0%.(+;<)$$)&'%=,,)$2.8"(

3())4)'5 &'''#.89#",)&'%)$7.#'%())4)()#*$+&'%+;2$'/()#*$+0'

3())4)'B &'''#.89#",)&'%)$7.#'%())4)()#*$+&'%+;2$'/()#*$+0'

!;89)$'5

D)89(.#)

!;89)$

@;(("2

E"1)( !"#$%

3())4)'B

3())4)'5 @;(("2

!;89)$'B!;89)$'5

!;89)$'F

3())4)'B3())4)'5

3+;)'5

3+;)'B

G("%%)%

G("%%)%

G("%%)%

3+;)'5 3+;)'B

D)89(.#)

3+;)'B

3+;)'5

D)89(.#) &'''#)89(.#)'$=1)&'47#)89#)89(.#)'$=1)&'2;<#?'/#)890#)89(.#)'$=1)&';4"('/#)890

@;(("2

A;1

/"0 />0 /80 /?0

!;89)$ &'''#.89#",)&'1"$8+'/1;89)$0''#.89#",)&'C("1'/1;89)$0'#.89#",)&'6)($'/1;89)$0

!"#$% & '()#*#+,$-()&,./-,,0'()#*#+1/)#+$2&,3"4*,0/)#+$21$)4$*/),("$$)5#&,(/"*#,0("$$)5#1

6'( &,%*/2'7)$$)&,%-33)$5*8"/,,/)#+$2&,"9':);$2);2*(,0/)#+$21,$)4$*/),("$$)5#&,(/"*#,0("$$)5#1,

&,#)8</*#),$-()&,5'7#=,,,0#)8<1,#)8</*#),$-()&,':"/,0#)8<1,#)8</*#),$-()&,:;#)8<

&,,#*8<#"3)&,875:)=,0('8<)$1,#*8<#"3)&,%/"%2,0('8<)$1,,,,,,,,#*8<#"3)&,("$82,0('8<)$1

>"+ &,%*/2'7)$$)&,%-33)$5*8"/,,$)4$*/),("$$)5#&,(/"*#,0("$$)5#1,

&,,#*8<#"3)&,("$82,0('8<)$1#*8<#"3)&,?)/$,0('8<)$1,,,#*8<#"3)&,%/"%2,0('8<)$1,#*8<#"3)&,875:)=,0('8<)$1,#*8<#"3)&,./"(,0('8<)$1

&,#*8<#"3)&,%)$;*#,%/)):),/)#+$2&,?5*%$;/)#+$2

@'//"5 &,#*8<#"3)&,%2*5$,08'//"51,,#*8<#"3)&,5)+7/"5,08'//"51

&,#)8</*#),$-()&,5'7#=,0#)8<1#)8</*#),$-()&,':"/,0#)8<1

>)/$ &,%*/2'7)$$)&,%-33)$5*8"/,,%*/2'7)$$)&,5)+7/"5,0.*$1$)4$*/),("$$)5#&,(/"*#,,,0("$$)5#1

&,%*/2'7)$$)&,%-33)$5*8"/,$)4$*/),("$$)5#&,(/"*#,,0("$$)5#1,'()#*#+,$-()&,%*#+/),95)"%$)=

&,,#*8<#"3)&,%)$;*#,%/)):),/)#+$2&,%2'5$,0/)#+$21,

&,#)8</*#),$-()&,5'7#=,0#)8<1#)8</*#),$-()&,:;#)8<#)8</*#),$-()&,':"/,0#)8<1,#)8</*#),$-()&,%8''(,,0#)8<1

&,/)#+$2&,%2'5$,0/)#+$21,#*8<#"3)&,%)$;*#,%/)):),

A5)%% &,,%*/2'7)$$)&,%-33)$5*8"/,'()#*#+,$-()&,B*(;7(,,,$)4$*/),("$$)5#&,(/"*#,0("$$)5#1

&,,#*8<#"3)&,%)$;*#,%/)):),/)#+$2&,?5*%$;/)#+$2,

A5)%% &,%*/2'7)$$)&,%-33)$5*8"/,,,$)4$*/),("$$)5#&,(/"*#,,0("$$)5#1,

&,#*8<#"3)&,%)$;*#,%/)):),/)#+$2&,?5*%$;/)#+$2,

@'"$ &,,%*/2'7)$$)&,%-33)$5*8"/,$)4$*/),("$$)5#&,(/"*#,0("$$)5#1,

>)/$ &,,%*/2'7)$$)&,%-33)$5*8"/,,,$)4$*/),("$$)5#&,(/"*#,0("$$)5#1

@'//"5 &,#*8<#"3)&,%2*5$,08'//"51,#*8<#"3)&,5)+7/"5,08'//"51,

&,#*8<#"3)&,("$82,0('8<)$1,#*8<#"3)&,./"(,0('8<)$1

>"+ &,%*/2'7)$$)&,%-33)$5*8"/,?"*%$/*#)&,2*+2,?"*%$,?"*%$/*#)&,#'53"/,?"*%$,/)#+$2&,3*#*,0/)#+$21,'()#*#+,$-()&,B*(;7(,$)4$*/),("$$)5#&,(/"*#,0("$$)5#1

A5)%% &,%*/2'7)$$)&,%-33)$5*8"/,$)4$*/),("$$)5#&,(/"*#,0("$$)5#1,'()#*#+,$-()&,B*(;7(,

>"+ &,,%*/2'7)$$)&,%-33)$5*8"/,$)4$*/),("$$)5#&,(/"*#,0("$$)5#1

&,,/)#+$2&,?5*%$;/)#+$2,#*8<#"3)&,%)$;*#,%/)):),

&,#*8<#"3)&,%)$;*#,%/)):),/)#+$2&,?5*%$;/)#+$2,

&,%*/2'7)$$)&,%-33)$5*8"/,$)4$*/),("$$)5#&,(/"*#,0("$$)5#1,!"#!$%"&'$($)*$(+,&-.(/'.0!/1$(+&!"0*($2/")3&%$($(+

A5)%% &,%*/2'7)$$)&,%-33)$5*8"/,,$)4$*/),("$$)5#&,(/"*#,0("$$)5#1,'()#*#+,$-()&,B*(;7(

6'( &,%*/2'7)$$)&,%-33)$5*8"/,$)4$*/),("$$)5#&,(/"*#,0("$$)5#1,/)#+$2&,"9':);$2);2*(,,,0/)#+$21

!"#$% &,%*/2'7)$$)&,%-33)$5*8"/,,,$)4$*/),("$$)5#&,(/"*#,0("$$)5#1,'()#*#+,$-()&,./-,0'()#*#+1,%*/2'7)$$)&,5)+7/"5,0.*$1,

&,#)8</*#),$-()&,5'7#=,0#)8<1,%*/2'7)$$)&,%-33)$5*8"/,#)8</*#),$-()&,':"/,0#)8<1

&,%*/2'7)$$)&,%-33)$5*8"/,%*/2'7)$$)&,$*+2$,0.*$1,/)#+$2&,3"4*,0/)#+$21,,,$)4$*/),("$$)5#&,(/"*#,0("$$)5#1,#*8<#"3)&,/)++*#+%

0"1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,091,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,081,,,,,,,,,,,,,,,,,,,,,,,, 0=1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0)1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0.1,,,,,,

!"#$%

6'(C)8</*#)

!'8<)$>"+

D2'),E

D2'),F

D2'),ED2'),E

D2'),E D2'),ED2'),FD2'),F

D2'),F

D2'),F

G/"%%)%

!'8<)$,E

D/)):),

@'//"5

C)8</*#)

>)/$

H"8<)$

6'(

!"#$% >78</)

C)8</*#) D/)):),,FD/)):),,E

A5)%%

>"+>"+

D/)):),,E

D/)):),,F

H"8<)$

@'"$

A5)%%

D/)):),,E D/)):),,FA5)%%

!"#$%

@'"$

D/)):),,I

C)8</*#)

6*+2$%,E 6*+2$%,F

>)/$

@'//"5

!'8<)$

>"+A5)%%

@'"$

!'8<)$,F

6*+2$%,E6*+2$%,E

6*+2$%,F

6*+2$%

C)8</*#)

C)8</*#)

C)8</*#)

C)8</*#)

!'8<)$

!'8<)$,E

!'8<)$

D/)):),

D/)):),E,

D/)):),F,

D/)):),E,

D/)):),F,

D/)):),E,

D/)):),F,

H"8<)$

H"8<)$

6*+2$%

&,#*8<#"3)&,/)++*#+%%*/2'7)$$)&,$*+2$,0.*$1/)#+$2&,3"4*,0/)#+$21$)4$*/),("$$)5#&,(/"*#,,,0("$$)5#1

6*+2$%,E

&,#*8<#"3)&,/)++*#+%%*/2'7)$$)&,$*+2$,0.*$1/)#+$2&,3"4*,0/)#+$21$)4$*/),("$$)5#&,(/"*#,,,0("$$)5#1

6*+2$%,F

&,,#*8<#"3)&,("$82,0('8<)$1#*8<#"3)&,?)/$,0('8<)$1,,,#*8<#"3)&,%/"%2,0('8<)$1,#*8<#"3)&,875:)=,0('8<)$1,#*8<#"3)&,./"(,0('8<)$1

!'8<)$,F

&,%*/2'7)$$)&,$*+2$,0.*$1,/)#+$2&,3"4*,0/)#+$21,,,$)4$*/),("$$)5#&,(/"*#,0("$$)5#1,#*8<#"3)&,/)++*#+%'()#*#+,$-()&,./-,,,0'()#*#+1

6*+2$%

Fig. 6: Attribute-Mask R-CNN results on the Fashionpedia validation set. Masks,bounding boxes, and apparel categories (category score > 0.6) are shown. Thelocalized attributes from the top 5 masks (that contain attributes) on each imageare also shown. Correctly predicted categories and localized attributes are bolded


8 Supplementary Material

In our work we presented the new task of instance segmentation with attribute localization. We introduced a new ontology and dataset, Fashionpedia, to further describe the various aspects of this task. We also proposed a novel evaluation metric and the Attribute-Mask R-CNN model for this task. In this supplementary material, we provide the following items, which shed further insight on these contributions:

– More comprehensive experimental results (§ 8.1)
– An extended discussion of the Fashionpedia ontology and potential knowledge graph applications (§ 8.2)
– More details of the dataset analysis (§ 8.3)
– Additional information on the annotation process (§ 8.4)
– Discussion (§ 8.5)

8.1 Attribute-Mask R-CNN

Per-class evaluation. Fig. 7 presents detailed evaluation results per supercategory and per category. In Fig. 7, we follow the same metrics as the COCO leaderboard (AP, AP50, AP75, APl, APm, APs, AR@1, AR@10, AR@100, ARs@100, ARm@100, ARl@100), with τ_IoU and τ_F1 where applicable. Fig. 7 shows that metrics considering both constraints, τ_IoU and τ_F1, are always lower than those using τ_IoU alone, across all supercategories and categories. This further demonstrates the challenging nature of our proposed task.

In general, categories belonging to "garment parts" have lower AP and AR compared with "outerwear" and "accessories".

A detailed breakdown of detection errors is presented in Fig. 8 for the supercategories and three main categories. Among the Fashionpedia supercategories, "outerwear" errors are dominated by within-supercategory class confusions (Fig. 8(a)); within this supercategory, ignoring localization errors would only raise AP slightly, from 77.5 to 79.1 (+1.6). A similar trend can be observed for the class "skirt", which belongs to "outerwear" (Fig. 8(d)). Detection errors for "part" (Fig. 8(b), 8(e)) and "accessory" (Fig. 8(c), 8(f)), on the other hand, are dominated by both background confusion and localization. "Part" also has a lower AP in general compared with the other two supercategories; a possible reason is that objects belonging to "part" usually have smaller sizes and lower counts.

F1 score calculation. Since we measure the F1 score between the predicted and ground-truth attributes per mask, we consider both a multi-label multi-class classification with 294 classes for one instance, and a binary classification over 294 instances. Multi-label multi-class classification is straightforward, as it is the common setting for most fine-grained classification tasks. In the binary classification scenario, we consider the 1s and 0s of the multi-hot encodings of both predictions and ground-truth labels as the positive and negative classes, respectively. There are also two averaging choices: "micro" and "macro". "Micro" averaging calculates the score globally by counting the total true positives, false negatives, and false positives. "Macro" averaging calculates the metric for each attribute class and reports the unweighted mean. In sum, there are four F1-score averaging options: 1) "micro", 2) "macro", 3) "binary-micro", 4) "binary-macro".
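As a sketch, assuming scikit-learn's f1_score (this is not the released evaluation code), the four options can be computed per mask as follows:

import numpy as np
from sklearn.metrics import f1_score

def f1_variants(y_true, y_pred):
    """The four averaging options for one mask; y_true and y_pred are
    0/1 multi-hot vectors of length 294."""
    return {
        # multi-label view: one sample with 294 label columns
        "micro": f1_score(y_true[None, :], y_pred[None, :],
                          average="micro", zero_division=0),
        "macro": f1_score(y_true[None, :], y_pred[None, :],
                          average="macro", zero_division=0),
        # binary view: 294 instances of a two-class (0/1) problem
        "binary-micro": f1_score(y_true, y_pred, average="micro",
                                 zero_division=0),
        "binary-macro": f1_score(y_true, y_pred, average="macro",
                                 zero_division=0),
    }

# example: 294 attributes, a few positives per mask
y_true = np.zeros(294, dtype=int); y_true[[3, 17, 42, 101]] = 1
y_pred = np.zeros(294, dtype=int); y_pred[[3, 17, 42, 200]] = 1
print(f1_variants(y_true, y_pred))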

As shown in Fig. 9, we present AP_IoU+F1 as a function of the F1 threshold: for each fixed τ_F1, the score is averaged over τ_IoU ∈ [0.5 : 0.05 : 0.95], while τ_F1 is increased from 0.0 to 1.0 in increments of 0.01.


[Figure 7: heatmap table (Max/Min color scale) of per-category mask results over AP, AP50, AP75, APl, APm, APs, AR@1, AR@10, AR@100, ARs@100, ARm@100, ARl@100, each reported with IoU and IoU+F1 constraints]

Fig. 7: Detailed results (for masks) using Mask R-CNN with the SpineNet-143 backbone. We present the same metrics as the COCO leaderboard for the overall categories, the three super-categories of apparel objects, and the 46 fine-grained apparel categories. We use both constraints (for example, AP_IoU and AP_IoU+F1) where possible. For categories without attributes, the value represents AP_IoU or AR_IoU. "top" is short for "top, t-shirt, sweatshirt". "head acc" is short for "headband, head covering, hair accessory"


[Figure 8: precision-recall curves; AUC per setting (C75/C50/Loc/Sim/Oth/BG/FN): (a) super category: outerwear .733/.775/.791/.912/.915/.921/1.00; (b) super category: parts .293/.420/.478/.499/.524/.649/1.00; (c) super category: accessory .687/.781/.822/.841/.871/.912/1.00; (d) outerwear: skirt .741/.788/.795/.894/.894/.911/1.00; (e) parts: pocket .397/.633/.792/.802/.809/.921/1.00; (f) accessory: tights, stockings .766/.824/.878/.903/.916/.941/1.00]

Fig. 8: Main apparel detector analysis. Each plot shows 7 precision-recall curves, where each evaluation setting is more permissive than the previous one. Specifically, C75: strict IoU (τ_IoU = 0.75); C50: PASCAL IoU (τ_IoU = 0.5); Loc: localization errors ignored (τ_IoU = 0.1); Sim: supercategory false positives (FPs) removed; Oth: category FPs removed; BG: background (and class confusion) FPs removed; FN: false negatives removed. The first row (overall-[supercategory]-[size]) contains results for the three supercategories in Fashionpedia; the second row ([supercategory]-[category]-[size]) shows results for three fine-grained categories (one per supercategory). Legends present the area under each curve (corresponding to the AP metric) in brackets

[Figure 9: AP_IoU+F1 as a function of the F1 score threshold for the four averaging options: micro, macro, binary-micro, binary-macro]

Fig. 9: AP_IoU+F1 score at different τ_F1 thresholds. The values presented are averaged over τ_IoU ∈ [0.5 : 0.05 : 0.95]. We use "binary-macro" as our main metric


Fig. 9 illustrates that as the value of τ_F1 increases, AP_IoU+F1 decreases at different rates depending on the choice of F1 calculation. There are 294 attributes in total, and an average of 3.7 attributes per mask in the Fashionpedia training data. It is therefore not surprising that "binary-micro" produces high F1 scores in general (higher than 0.97): the AP_IoU+F1 score only starts to decrease once τ_F1 ≥ 0.97. On the other hand, "macro" averaging in the multi-label multi-class scenario gives extremely low F1 scores (0.01-0.03). This further demonstrates the room for improvement on the localized attribute classification task. We use "binary-macro" as our main metric.

Result visualization. Figure 10 shows that our simple baseline model can detect most apparel categories correctly. However, it sometimes produces false positives: for example, it segments legs as tights and stockings (Figure 10(f)). A possible reason is that both objects have the same shape and stockings are worn on the legs.

Predicting fine-grained attributes, on the other hand, is a more challenging problem for the baseline model. We summarize several issues: (1) the model predicts more attributes than needed (Figure 10(a), (b), (c)); (2) it fails to distinguish among fine-grained attributes, for example dropped-shoulder sleeve (ground truth) vs. set-in sleeve (predicted) (Figure 10(e)); (3) other false positives: Figure 10(e) has a double-breasted opening, yet the model predicted a zip opening.

These results further show that there is room for improvement and for the future development of more advanced computer vision models on this instance segmentation with attribute localization task.

Result visualization on other datasets. Other fashion datasets such as ModaNet and DeepFashion2 also contain instance segmentation masks. Aside from the results on Fashionpedia presented in the main paper (see Table 3 of the main paper), we present a qualitative analysis of the segmentation masks generated on Fashionpedia (Fig. 10), ModaNet (Fig. 11(a-f)), and DeepFashion2 (Fig. 11(g-l)). The photos in the first row of Fig. 11 are from ModaNet; they show that the quality of the generated masks on ModaNet is fairly good and comparable to Fashionpedia in general (Fig. 11(a)). We also make a couple of observations about the failure cases: (1) failure to detect apparel objects: for example, the shoe in Fig. 11(c) is not detected, and parts of the pants (Fig. 11(c)) and coat (Fig. 11(d)) are missed; (2) failure to detect some categories: Fig. 11(e) shows that the shoes on the shoe rack and on the right foot are not detected, possibly due to a lack of such instances in the ModaNet training set. Similar to Fashionpedia, ModaNet mostly consists of street style images (see Fig. 12(b) for example predictions from the model trained on Fashionpedia); (3) close-up images: ModaNet contains mostly full-body images, which may explain the decreased quality of predicted masks on close-up shots like Fig. 11(f).

For DeepFashion2 (Fig. 11(g,h,k)), the generated segmentation masks tend not to tightly follow the contours of garments in the images. The main reason is possibly that the average number of vertices per polygon is 14.7 for DeepFashion2, which is lower than for Fashionpedia and ModaNet (see Table 2 in the main text). Our qualitative analysis also shows that: (1) the model generates segmentation masks for pants (Fig. 11(i)) and tops (Fig. 11(j)) that are not visible in the images, both being covered by a jacket; indeed, in DeepFashion2, some garment parts that are covered by other objects are annotated with segmentation masks; (2) the model performs better on objects that are not on a human body (Fig. 11(l)): DeepFashion2 contains many commercial-customer image pairs (images both with and without a human body) in its training set, whereas both Fashionpedia and ModaNet contain more images with a human body than without.


!"#$% & '()#*#+,$-()&,./-,,0'()#*#+1/)#+$2&,3"4*,0/)#+$21$)4$*/),("$$)5#&,(/"*#,0("$$)5#1

6'( &,%*/2'7)$$)&,%-33)$5*8"/,,/)#+$2&,"9':);$2);2*(,0/)#+$21,$)4$*/),("$$)5#&,(/"*#,0("$$)5#1,

&,#)8</*#),$-()&,5'7#=,,,0#)8<1,#)8</*#),$-()&,':"/,0#)8<1,#)8</*#),$-()&,:;#)8<

&,,#*8<#"3)&,875:)=,0('8<)$1,#*8<#"3)&,%/"%2,0('8<)$1,,,,,,,,#*8<#"3)&,("$82,0('8<)$1

>"+ &,%*/2'7)$$)&,%-33)$5*8"/,,$)4$*/),("$$)5#&,(/"*#,0("$$)5#1,

&,,#*8<#"3)&,("$82,0('8<)$1#*8<#"3)&,?)/$,0('8<)$1,,,#*8<#"3)&,%/"%2,0('8<)$1,#*8<#"3)&,875:)=,0('8<)$1,#*8<#"3)&,./"(,0('8<)$1

&,#*8<#"3)&,%)$;*#,%/)):),/)#+$2&,?5*%$;/)#+$2

@'//"5 &,#*8<#"3)&,%2*5$,08'//"51,,#*8<#"3)&,5)+7/"5,08'//"51

&,#)8</*#),$-()&,5'7#=,0#)8<1#)8</*#),$-()&,':"/,0#)8<1

>)/$ &,%*/2'7)$$)&,%-33)$5*8"/,,%*/2'7)$$)&,5)+7/"5,0.*$1$)4$*/),("$$)5#&,(/"*#,,,0("$$)5#1

&,%*/2'7)$$)&,%-33)$5*8"/,$)4$*/),("$$)5#&,(/"*#,,0("$$)5#1,'()#*#+,$-()&,%*#+/),95)"%$)=

&,,#*8<#"3)&,%)$;*#,%/)):),/)#+$2&,%2'5$,0/)#+$21,

&,#)8</*#),$-()&,5'7#=,0#)8<1#)8</*#),$-()&,:;#)8<#)8</*#),$-()&,':"/,0#)8<1,#)8</*#),$-()&,%8''(,,0#)8<1

&,/)#+$2&,%2'5$,0/)#+$21,#*8<#"3)&,%)$;*#,%/)):),

A5)%% &,,%*/2'7)$$)&,%-33)$5*8"/,'()#*#+,$-()&,B*(;7(,,,$)4$*/),("$$)5#&,(/"*#,0("$$)5#1

&,,#*8<#"3)&,%)$;*#,%/)):),/)#+$2&,?5*%$;/)#+$2,

A5)%% &,%*/2'7)$$)&,%-33)$5*8"/,,,$)4$*/),("$$)5#&,(/"*#,,0("$$)5#1,

&,#*8<#"3)&,%)$;*#,%/)):),/)#+$2&,?5*%$;/)#+$2,

@'"$ &,,%*/2'7)$$)&,%-33)$5*8"/,$)4$*/),("$$)5#&,(/"*#,0("$$)5#1,

>)/$ &,,%*/2'7)$$)&,%-33)$5*8"/,,,$)4$*/),("$$)5#&,(/"*#,0("$$)5#1

@'//"5 &,#*8<#"3)&,%2*5$,08'//"51,#*8<#"3)&,5)+7/"5,08'//"51,

&,#*8<#"3)&,("$82,0('8<)$1,#*8<#"3)&,./"(,0('8<)$1

>"+ &,%*/2'7)$$)&,%-33)$5*8"/,?"*%$/*#)&,2*+2,?"*%$,?"*%$/*#)&,#'53"/,?"*%$,/)#+$2&,3*#*,0/)#+$21,'()#*#+,$-()&,B*(;7(,$)4$*/),("$$)5#&,(/"*#,0("$$)5#1

A5)%% &,%*/2'7)$$)&,%-33)$5*8"/,$)4$*/),("$$)5#&,(/"*#,0("$$)5#1,'()#*#+,$-()&,B*(;7(,

>"+ &,,%*/2'7)$$)&,%-33)$5*8"/,$)4$*/),("$$)5#&,(/"*#,0("$$)5#1

&,,/)#+$2&,?5*%$;/)#+$2,#*8<#"3)&,%)$;*#,%/)):),

&,#*8<#"3)&,%)$;*#,%/)):),/)#+$2&,?5*%$;/)#+$2,

&,%*/2'7)$$)&,%-33)$5*8"/,$)4$*/),("$$)5#&,(/"*#,0("$$)5#1,!"#!$%"&'$($)*$(+,&-.(/'.0!/1$(+&!"0*($2/")3&%$($(+

A5)%% &,%*/2'7)$$)&,%-33)$5*8"/,,$)4$*/),("$$)5#&,(/"*#,0("$$)5#1,'()#*#+,$-()&,B*(;7(

6'( &,%*/2'7)$$)&,%-33)$5*8"/,$)4$*/),("$$)5#&,(/"*#,0("$$)5#1,/)#+$2&,"9':);$2);2*(,,,0/)#+$21

!"#$% &,%*/2'7)$$)&,%-33)$5*8"/,,,$)4$*/),("$$)5#&,(/"*#,0("$$)5#1,'()#*#+,$-()&,./-,0'()#*#+1,%*/2'7)$$)&,5)+7/"5,0.*$1,

&,#)8</*#),$-()&,5'7#=,0#)8<1,%*/2'7)$$)&,%-33)$5*8"/,#)8</*#),$-()&,':"/,0#)8<1

&,%*/2'7)$$)&,%-33)$5*8"/,%*/2'7)$$)&,$*+2$,0.*$1,/)#+$2&,3"4*,0/)#+$21,,,$)4$*/),("$$)5#&,(/"*#,0("$$)5#1,#*8<#"3)&,/)++*#+%

0"1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,091,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,081,,,,,,,,,,,,,,,,,,,,,,,, 0=1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0)1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0.1,,,,,,

!"#$%

6'(C)8</*#)

!'8<)$>"+

D2'),E

D2'),F

D2'),ED2'),E

D2'),E D2'),ED2'),FD2'),F

D2'),F

D2'),F

G/"%%)%

!'8<)$,E

D/)):),

@'//"5

C)8</*#)

>)/$

H"8<)$

6'(

!"#$% >78</)

C)8</*#) D/)):),,FD/)):),,E

A5)%%

>"+>"+

D/)):),,E

D/)):),,F

H"8<)$

@'"$

A5)%%

D/)):),,E D/)):),,FA5)%%

!"#$%

@'"$

D/)):),,I

C)8</*#)

6*+2$%,E 6*+2$%,F

>)/$

@'//"5

!'8<)$

>"+A5)%%

@'"$

!'8<)$,F

6*+2$%,E6*+2$%,E

6*+2$%,F

6*+2$%

C)8</*#)

C)8</*#)

C)8</*#)

C)8</*#)

!'8<)$

!'8<)$,E

!'8<)$

D/)):),

D/)):),E,

D/)):),F,

D/)):),E,

D/)):),F,

D/)):),E,

D/)):),F,

H"8<)$

H"8<)$

6*+2$%

&,#*8<#"3)&,/)++*#+%%*/2'7)$$)&,$*+2$,0.*$1/)#+$2&,3"4*,0/)#+$21$)4$*/),("$$)5#&,(/"*#,,,0("$$)5#1

6*+2$%,E

&,#*8<#"3)&,/)++*#+%%*/2'7)$$)&,$*+2$,0.*$1/)#+$2&,3"4*,0/)#+$21$)4$*/),("$$)5#&,(/"*#,,,0("$$)5#1

6*+2$%,F

&,,#*8<#"3)&,("$82,0('8<)$1#*8<#"3)&,?)/$,0('8<)$1,,,#*8<#"3)&,%/"%2,0('8<)$1,#*8<#"3)&,875:)=,0('8<)$1,#*8<#"3)&,./"(,0('8<)$1

!'8<)$,F

&,%*/2'7)$$)&,$*+2$,0.*$1,/)#+$2&,3"4*,0/)#+$21,,,$)4$*/),("$$)5#&,(/"*#,0("$$)5#1,#*8<#"3)&,/)++*#+%'()#*#+,$-()&,./-,,,0'()#*#+1

6*+2$%

Fig. 10: Baseline results on the Fashionpedia validation set. Masks, bounding boxes, and apparel categories (category score > 0.6) are shown. Attributes from the top 10 masks (that contain attributes) in each image are also shown. Correct predictions of objects and attributes are bolded

Generalization to other image domains. For Fashionpedia, we also run inference on images found on online shopping websites, which usually display a single apparel category, with or without a fashion model. We found that the learned model works reasonably well when the apparel item is worn by a model (Fig. 12).

8.2 Fashionpedia Ontology and Knowledge Graph

Fig. 13 presents our Fashionpedia ontology in detail. Figs. 14 and 15 display the training-data mask counts per category and per attribute. Utilizing the proposed ontology and the image dataset, a large-scale fashion knowledge graph can be constructed to represent the fashion world at the product level. Fig. 16 illustrates a subset of the Fashionpedia knowledge graph.

Apparel graphs. Integrating the main garments, garment parts, attributes, and relationships presented in one outfit ensemble, we can create an apparel graph representation for each outfit in an image. Each apparel graph is a structured representation of an outfit ensemble, containing certain types of garments.


!"#$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$!%#$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$!&#$$$$$$$$$$$$$$$$$$$$$$$$$$$$$ !'#$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$!(#$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$!)#$$$$$$

!*#$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$!+#$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$!,#$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$!-#$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$!.#$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$ !/#$$$$$$

!"#

$%&'

(&&)

*$+,-"./

Fig. 11: Baseline results on the ModaNet and DeepFashion2 validation sets

Fig. 12: Generated masks on online-shopping images [53]. (a) and (b) show the same types of shoes in different settings. Our model correctly detects and categorizes the pair of shoes worn by a fashion model, yet mistakenly detects the shoes as a jacket and a bag in (b)


Fig. 13: Apparel category (a) and fine-grained attribute (b) hierarchies in Fashionpedia (the attribute hierarchy is shown partially)

[Figure 14: log-scale bar chart of mask counts per apparel category; the most frequent categories include sleeve, shoe, neckline, pocket, and dress]

Fig. 14: Mask counts per apparel category in the training data. "head acc" is short for "headband, head covering, hair accessory"


Nodes in an apparel graph represent the main garments, garment parts, and attributes. Main garments and garment parts are linked to their respective attributes through different types of relationships. Figure 17 shows more image examples with apparel graphs.

Fashionpedia knowledge graph. While apparel graphs are localized representations of individual outfit ensembles in fashion images, we can also create a single Fashionpedia knowledge graph (Figure 16). The Fashionpedia knowledge graph is the union of all apparel graphs and includes all main garments, garment parts, attributes, and relationships in the dataset. In this way, we are able to represent and understand fashion images in a more structured way.
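A minimal sketch of this union, assuming each apparel graph is represented as (head, relation, tail) triples (a hypothetical representation, not our annotation format) and using networkx:

import networkx as nx

def build_knowledge_graph(apparel_graphs):
    """Union per-outfit apparel graphs into one knowledge graph; each
    apparel graph is a list of triples such as
    ("jacket", "CONTAIN", "pocket") or ("pocket", "HAS_ATTRIBUTE", "flap")."""
    kg = nx.DiGraph()
    for triples in apparel_graphs:
        for head, relation, tail in triples:
            kg.add_edge(head, tail, relation=relation)  # duplicate edges merge
    return kg

outfit = [("jacket", "CONTAIN", "pocket"),
          ("pocket", "HAS_ATTRIBUTE", "flap (pocket)"),
          ("jacket", "HAS_ATTRIBUTE", "single breasted")]
kg = build_knowledge_graph([outfit])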

We expect our Fashionpedia knowledge graph and database to be applicable to extending existing knowledge graphs (such as WikiData [45]) with novel domain-specific knowledge, improving underlying fashion product recommendation systems, enhancing search engine results for fashion visual search, resolving ambiguous fashion-related words in text search, and more.

8.3 Dataset Analysis

Fig. 17 shows more annotation examples, represented as exploded views of annotation diagrams. Table 5 displays the details of the "not sure" and "not on the list" answers during the attribute annotation process, broken down by attribute super-category. The label "not sure" means the expert annotator is uncertain about the choice given the segmentation mask; "not on the list" means the annotator is certain that the given mask presents another attribute that is not included in the Fashionpedia ontology. Other than "nicknames" (the specific name for a certain apparel category), less than 6% of the total masks account for the "not on the list" category.

Figs. 18 and 19 also compare Fashionpedia and the other image datasets in terms of image size and vertices per polygon.

[Figure 15(a): log-scale bar chart of mask counts for the "nickname" attribute super-category; the most frequent values include set-in sleeve, classic (t-shirt), welt (pocket), flap (pocket), and gown]

Fig. 15: Mask counts per attribute in the training data, grouped by super-category. Best viewed digitally


[Figure 15 (cont.), panels of log-scale bar charts of mask counts per attribute: (a) length; (b) neckline type; (c) opening type; (d) waistline; (e) textile finishing, manufacturing techniques; (f) textile pattern; (g) silhouette; (h) non-textile material type]

Fig. 15: Mask counts per attribute in the training data, grouped by super-category (cont.). Best viewed digitally


[Figure 16: graph visualization of 20 product nodes linked to apparel categories (e.g., Jacket / Blazer, Coat, Shorts, Skirt, Dress) via CONTAIN edges, and categories linked to fine-grained attributes (e.g., set-in sleeve, regular (fit), plain (pattern)) via HAS_ATTRIBUTE edges]

Fig. 16: Fashionpedia Knowledge Graph: we present a subset of the Fashionpediaknowledge graph by aggregating 20 annotated products. The knowledge graphcan be used as a tool for generating structural information

We compare image resolutions between Fashionpedia and four other segmentation datasets (COCO and LVIS share the same images). Fig. 18 shows that images in Fashionpedia have the most diverse widths and heights, while ModaNet has the most consistent image resolutions. Note that high-resolution images burden the data loading process during training; with that in mind, we will release our dataset in both resized and original versions.

We also report the distribution of the number of vertices per polygon in Fig. 19. This measures the annotation effort in mask annotation. Fashionpedia has the second-widest range, next to LVIS.

Table 5: Percentage of attributes in Fashionpedia broken down by super-class. "Tex finish, manu-tech." is short for "Textile finishing, Manufacturing techniques". Summaries of the "not sure" and "not on the list" answers during attribute annotation are also presented; each was calculated as the count divided by the total number of masks with attributes. "Not sure" answers are mainly due to occlusion inside the images, which makes some super-classes (such as waistline, opening type, and length) unidentifiable. The percentage of "not on the list" is less than 15%, which demonstrates the comprehensiveness of our Fashionpedia ontology

Super-category class    count  not sure  not on the list
Length                  15     12.79%     0.01%
Nickname                153     9.15%    12.76%
Opening Type            10     32.69%     3.90%
Silhouettes             25      2.90%     0.27%
Tex finish, manu-tech   21      4.47%     1.34%
Textile Pattern         24      2.18%     5.30%
Non-Textile Type        14      4.90%     4.07%
Neckline                25      9.57%     3.38%
Waistline                7     30.46%     0.17%


!"#$%

&#%'()*'

+,-% .*/01,-'#2#34

5#6*'0*'#3$7

8,9("*0:"2%$

!*"2#

;,**"9

<*"%%'%

=*''>'=*''>' !,?6'$!,?6'$

@"?6'$

&#%'()*'

A9'%%

B'*,:C$7'C6#''

A,D)*'0)9'"%$'E

F,,%'01G2$4;7'?60H0!*"2E0

-*"2#

=7,'%

=*''>'=*''>' F"-'*

<*"%%'%

!*"2#

=2#3*'C)9'"%$'E

=*2(01G2$40

=/(('$92?"*

=/(('$92?"*

5%/(('$92?"*

=/(('$92?"*I'3D*"901G2$4

!*"2#

=7,'%;"-'

=?"9G!"#$/7,%'

;,"$

&#%'()*'

+,-%

J2#20*'#3$7

ID?7'E

=/(('$92?"*

K9"--2#3

.*,,90*'#3$7F,,%'01G2$4

5)%$9"?$01-"$$'9#4

8'?6*2#'

=7,9$% <*"%%'%

IDGG*'IDGG*' IDGG*'IDGG*'

=*2(01G2$4

.*/01,-'#2#34

=*2(01G2$4

!92#$'E

K"%7'E

.9"/'E

!*"2#

!,?6'$

!"#$%&'()*&+),-- !"9$0,G

+'L$2*'0.2#2%72#3 +'L$2*'0!"$$'9# =2*7,D'$$' M-'#2#30+/-'

F'#3$7 82?6#"(' K"2%$*2#'

Fig. 17: Example images and annotations from our dataset: the images are anno-tated with both instance segmentation masks and fine-grained attributes (blackboxes)


Fig. 18: Image size comparison among Fashionpedia, ModaNet, DeepFashion2, COCO2017, and LVIS. Only training images are shown. The Fashionpedia images have the most diverse resolutions. Note that COCO2017 and LVIS have higher-resolution images for annotation; the distributions presented here are for the publicly available photos


Fig. 19: The number of vertices per polygon. This represents the quality of the masks and the effort of the annotators. Values on the x-axis were discretized for better visual effect; the y-axis is on a log scale. Fashionpedia has the second-widest range, next to LVIS


8.4 Fashionpedia dataset creation details

Image collection. To avoid photo bias, all the images were randomly collected from Flickr and free-license stock photo websites (Unsplash, Burst by Shopify, Free stocks, Kaboompics, and Pexels). The collected images were further verified manually by two fashion experts; specifically, they checked scene diversity and made sure the clothing items were visible and annotatable. The estimated image type breakdown is as follows: street style images (30% of the full dataset), celebrity event images (30%), runway show images (30%), and online shopping product images (10%). As for gender distribution, 80% of the images are of women and 20% are of men.

Fig. 20: Examples of different types (people position/gesture, full/half shot, occlusion, scenes, garment types, etc.) of images in the Fashionpedia dataset

We aimed to address the issue of photographic bias in the image collection process: our dataset includes images that are not centered, are not full shots, and contain occlusion (see examples in Fig. 20). Furthermore, our focus during image collection was to identify clothing items, not to identify people.

Crowd workers and 10-day training for mask annotation. So that all annotators share the same apparel vocabulary, we prepared a detailed tutorial (with text descriptions and image examples) for each category and attribute in the Fashionpedia ontology (see Fig. 21 for an example). Before the official annotation process, we spent 10 days training the 28 crowd workers, for the following three main reasons.

First, some apparel categories are commonly referred to by other names: for example, "top" is a general term for "shirt", "sweater", "t-shirt", and "sweatshirt", so some annotators could mistakenly annotate a "shirt" as a "top". We needed to train the workers so that they shared the same understanding of the proposed Fashionpedia ontology.


Fig. 21: Annotation tutorial example for shirt and top

Utilizing the prepared tutorials (see Fig. 21 for an example), we trained annotators on how to distinguish among different apparel categories.

Second, there are fine-grained differences among apparel categories. For example, we observed that some workers initially had difficulty understanding the difference between certain garment parts, such as tassel and fringe. To help them understand these objects, we asked them to practice on and identify more sample images before the annotation process. Fig. 22 shows our tutorials for these two categories, with correct and wrong examples of annotations.

Third, we demanded high-quality annotations. In particular, we asked the annotators to follow the contours of garments in the images as closely as possible. The polygon annotation process was monitored and verified for a few days before the workers started the actual annotation.

Quality control of debatable apparel categories. During the annotation process, we allowed annotators to ask questions about categories they were uncertain of.


!"#$%&'%()&*+,-%%%%%%% ./)),0$%1%!)/*+%2#3%$/%$)#0,%()&*+,

!"#$%&'%4#'',5-%%%%%%% ./)),0$%1%!)/*+%2#3%$/%$)#0,%4#'',5

Fig. 22: Annotation tutorial for fringe and tassel

Two fashion experts monitored the annotation process by answering these questions, checking annotation quality, and providing weekly feedback to the annotators.

Instead of asking annotators to rate their confidence in each segmentation mask, we asked them to send all uncertain masks back to us during annotation. The same two fashion experts made the final judgment and gave feedback to the workers on these debatable or unclear fashion categories. Some examples of debatable or fuzzy fashion items that we have documented can be found in Figure 23.

8.5 Discussion

Does this dataset include the images or labels of previous datasets? No; we include previous datasets only for comparison. Our dataset does not intentionally use any images or labels from previous datasets: all images and labels in Fashionpedia are newly collected and annotated.

Who were the fashion experts annotating localized attributes in the Fashionpedia dataset? The fashion experts are 15 fashion graduate students whom we recruited from one of the top fashion design institutes. Due to the double-blind policy, we cannot name the university here, but we will release the name of the university and of our collaborators there in the final version of this paper.

Instance segmentation vs. semantic segmentation. We did not conduct semantic segmentation experiments on our dataset for two reasons: 1) Although semantic segmentation is a useful task, we believe instance segmentation is more meaningful for fashion images. For example, to distinguish the shoe styles in a fashion image containing three different pairs of shoes, instance segmentation (Figure 24, left) can separate each shoe, whereas semantic segmentation (Figure 24, right) mixes all the shoe instances together. 2) Semantic segmentation is a sub-problem of instance segmentation.


Fig. 23: Examples of debatable fashion items in the Fashionpedia dataset. The questions were asked by the crowd workers; the answers were provided by two fashion experts

Fig. 24: Instance segmentation (left) and semantic segmentation (right)

If we merge the detected instances of the same object class in our instance segmentation results, we obtain the corresponding semantic segmentation result.
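This merging step can be written down concretely. The following is a minimal sketch (not our experimental code), assuming per-instance boolean masks and integer class ids held as NumPy arrays; all names are illustrative.

    import numpy as np

    def instances_to_semantic(masks, class_ids, height, width, background=0):
        # Collapse per-instance binary masks into one semantic label map.
        # Instances of the same class become indistinguishable, which is
        # exactly the information loss described above.
        semantic = np.full((height, width), background, dtype=np.int32)
        for mask, cls in zip(masks, class_ids):
            semantic[mask] = cls  # later instances overwrite overlapping pixels
        return semantic

    # Toy usage: two "shoe" instances (class 3) merge into one shoe region.
    h, w = 4, 6
    left = np.zeros((h, w), dtype=bool); left[2:, :2] = True
    right = np.zeros((h, w), dtype=bool); right[2:, 4:] = True
    print(instances_to_semantic([left, right], [3, 3], h, w))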

Which image has the most annotated masks? In the Fashionpedia dataset, the maximum number of segmentation masks in an image is 74 (Fig. 25). We find that most of these masks belong to "rivets" (a garment part).
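For readers who wish to reproduce such a statistic, a minimal sketch over a COCO-style annotation file follows; the file name is illustrative.

    import json
    from collections import Counter

    # Load a COCO-style annotation file (illustrative path).
    with open("instances_train.json") as f:
        coco = json.load(f)

    # Count masks per image and pick the image with the maximum.
    masks_per_image = Counter(ann["image_id"] for ann in coco["annotations"])
    image_id, n_masks = masks_per_image.most_common(1)[0]
    print(f"image {image_id} has the most masks: {n_masks}")

    # Break that image's masks down by category id to see what dominates.
    by_category = Counter(
        ann["category_id"] for ann in coco["annotations"] if ann["image_id"] == image_id
    )
    print(by_category.most_common(5))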

What’s the difference between Fashionpedia and other fine-grained datasetslike CUB-200? We propose to localize fine-grained attributes within segmentationmasks of images. This is a novel task with real-world application to the best of ourknowledge. The differences between Fashionpedia and CUB are as follows: 1) CUB useskeypoints as annotation to indicate different locations on birds, while Fashionpedia hassegmentation masks of garments, garment parts, and accessories; 2) Fashionpedia at-tributes are associated with garment or garment part instances in images, whereasCUB provides global attributes, not associated with any specific keypoints.


iMat-Fashion Kaggle challenges. To advance the state of the art in visual analysis of clothing, we hosted two iMaterialist-Fashion challenges on Kaggle, in 2019 (https://www.kaggle.com/c/imaterialist-fashion-2019-FGVC6) and in 2020 (https://www.kaggle.com/c/imaterialist-fashion-2020-fgvc7).

!"#$%"&'#(%$$)!*+$'#($)%!",-./01#2"34!5%6'4,748%#479 5%#:#

!"#$#%&'(#)&$* !"#$#%&'(#)&$*(+#,-()&./.

0*,&#'*1(#%23(32(*&4-()&./(+#,-(&..34#&,*1('34&'#5*1(&,,"#67,*.8

!"#$%"&'#(%$$)!*+$'#($)%!",-./01#2"34!5%6'4,748%#479 5%#:#

!"#$#%&'(#)&$* !"#$#%&'(#)&$*(+#,-()&./.

0*,&#'*1(#%23(32(*&4-()&./(+#,-(&..34#&,*1('34&'#5*1(&,,"#67,*.8

!"#$%"&'#(%$$)!*+$'#($)%!",-./01#2"34!5%6'4,748%#479 5%#:#

!"#$#%&'(#)&$* !"#$#%&'(#)&$*(+#,-()&./.

0*,&#'*1(#%23(32(*&4-()&./(+#,-(&..34#&,*1('34&'#5*1(&,,"#67,*.8!"#$%#&'()#%$(*+,$,+$#'-"$.'/0$1(&"$'//,-('&#%$),-')(2#%$'&&3(45&#/6

!"#$%"&'#(%$$)!*+$'#($)%!",-./01#2"34!5%6'4,748%#479 5%#:#

!"#$#%&'(#)&$* !"#$#%&'(#)&$*(+#,-()&./.

0*,&#'*1(#%23(32(*&4-()&./(+#,-(&..34#&,*1('34&'#5*1(&,,"#67,*.8

!"#$%"&'#(%$$)!*+$'#($)%!",-./01#2"34!5%6'4,748%#479 5%#:#

!"#$#%&'(#)&$* !"#$#%&'(#)&$*(+#,-()&./.

0*,&#'*1(#%23(32(*&4-()&./(+#,-(&..34#&,*1('34&'#5*1(&,,"#67,*.8

73(8(*')$(.'8# 73(8(*')$(.'8#$1(&"$.'/0/

Fig. 25: The image with 74 masks in the Fashionpedia dataset


References

1. Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., et al.: TensorFlow: A system for large-scale machine learning. In: OSDI (2016)
2. Attneave, F., Arnoult, M.D.: The quantitative study of shape and pattern perception. Psychological Bulletin (1956)
3. Bloomsbury.com: Fashion photography archive. Retrieved May 9, 2019 from https://www.bloomsbury.com/dr/digital-resources/products/fashion-photography-archive/
4. Bossard, L., Dantone, M., Leistner, C., Wengert, C., Quack, T., Van Gool, L.: Apparel classification with style. In: ACCV (2012)
5. Du, X., Lin, T.Y., Jin, P., Ghiasi, G., Tan, M., Cui, Y., Le, Q.V., Song, X.: SpineNet: Learning scale-permuted backbone for recognition and localization. In: CVPR (2020)
6. Farhadi, A., Endres, I., Hoiem, D., Forsyth, D.: Describing objects by their attributes. In: CVPR (2009)
7. FashionAI: Retrieved May 9, 2019 from http://fashionai.alibaba.com/
8. Fashionary.org: Fashionpedia - the visual dictionary of fashion design. Retrieved May 9, 2019 from https://fashionary.org/products/fashionpedia
9. Ferrari, V., Zisserman, A.: Learning visual attributes. In: Advances in Neural Information Processing Systems (2008)
10. Fu, C.Y., Berg, T.L., Berg, A.C.: IMP: Instance mask projection for high accuracy semantic segmentation of things. arXiv preprint arXiv:1906.06597 (2019)
11. Ge, Y., Zhang, R., Wang, X., Tang, X., Luo, P.: DeepFashion2: A versatile benchmark for detection, pose estimation, segmentation and re-identification of clothing images. In: CVPR (2019)
12. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: CVPR (2014)
13. Goyal, P., Dollár, P., Girshick, R., Noordhuis, P., Wesolowski, L., Kyrola, A., Tulloch, A., Jia, Y., He, K.: Accurate, large minibatch SGD: Training ImageNet in 1 hour. arXiv preprint arXiv:1706.02677 (2017)
14. Guo, S., Huang, W., Zhang, X., Srikhanta, P., Cui, Y., Li, Y., Adam, H., Scott, M.R., Belongie, S.: The iMaterialist fashion attribute dataset. In: ICCV Workshops (2019)
15. Gupta, A., Dollár, P., Girshick, R.: LVIS: A dataset for large vocabulary instance segmentation. In: CVPR (2019)
16. Han, X., Wu, Z., Huang, P.X., Zhang, X., Zhu, M., Li, Y., Zhao, Y., Davis, L.S.: Automatic spatially-aware fashion concept discovery. In: ICCV (2017)
17. He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: ICCV (2017)
18. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)
19. He, R., McAuley, J.: Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering. In: WWW (2016)
20. Hoiem, D., Chodpathumwan, Y., Dai, Q.: Diagnosing error in object detectors. In: ECCV (2012)
21. Hsiao, W.L., Grauman, K.: Learning the latent look: Unsupervised discovery of a style-coherent embedding from fashion images. In: ICCV (2017)
22. Huang, J., Feris, R., Chen, Q., Yan, S.: Cross-domain image retrieval with a dual attribute-aware ranking network. In: ICCV (2015)
23. Inoue, N., Simo-Serra, E., Yamasaki, T., Ishikawa, H.: Multi-label fashion image classification with minimal human supervision. In: ICCV (2017)
24. Kendall, E.F., McGuinness, D.L.: Ontology engineering. Synthesis Lectures on The Semantic Web: Theory and Technology (2019)
25. Kiapour, M.H., Han, X., Lazebnik, S., Berg, A.C., Berg, T.L.: Where to buy it: Matching street clothing photos in online shops. In: ICCV (2015)
26. Kiapour, M.H., Yamaguchi, K., Berg, A.C., Berg, T.L.: Hipster wars: Discovering elements of fashion styles. In: ECCV (2014)
27. Kirillov, A., He, K., Girshick, R., Rother, C., Dollár, P.: Panoptic segmentation. In: CVPR (2019)
28. Krishna, R., Zhu, Y., Groth, O., Johnson, J., Hata, K., Kravitz, J., Chen, S., Kalantidis, Y., Li, L.J., Shamma, D.A., et al.: Visual Genome: Connecting language and vision using crowdsourced dense image annotations. IJCV (2017)
29. Kumar, N., Berg, A.C., Belhumeur, P.N., Nayar, S.K.: Attribute and simile classifiers for face verification. In: ICCV (2009)
30. Lin, T.Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: CVPR (2017)
31. Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft COCO: Common objects in context. In: ECCV (2014)
32. Liu, Z., Luo, P., Qiu, S., Wang, X., Tang, X.: DeepFashion: Powering robust clothes recognition and retrieval with rich annotations. In: CVPR (2016)
33. Lopez, A.: fdupes is a program for identifying or deleting duplicate files residing within specified directories. Retrieved May 9, 2019 from https://github.com/adrianlopezroche/fdupes
34. Mall, U., Matzen, K., Hariharan, B., Snavely, N., Bala, K.: GeoStyle: Discovering fashion trends and events. In: ICCV (2019)
35. Matzen, K., Bala, K., Snavely, N.: StreetStyle: Exploring world-wide clothing styles from millions of photos. arXiv preprint arXiv:1706.01869 (2017)
36. Parikh, D., Grauman, K.: Relative attributes. In: ICCV (2011)
37. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: Towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems (2015)
38. Rosch, E.: Cognitive representations of semantic categories. Journal of Experimental Psychology: General (1975)
39. Rubio, A., Yu, L., Simo-Serra, E., Moreno-Noguer, F.: Multi-modal embedding for main product detection in fashion. In: ICCV (2017)
40. Simo-Serra, E., Fidler, S., Moreno-Noguer, F., Urtasun, R.: Neuroaesthetics in fashion: Modeling the perception of fashionability. In: CVPR (2015)
41. Simo-Serra, E., Ishikawa, H.: Fashion style in 128 floats: Joint ranking and classification using weak data for feature extraction. In: CVPR (2016)
42. Takagi, M., Simo-Serra, E., Iizuka, S., Ishikawa, H.: What makes a style: Experimental analysis of fashion prediction. In: ICCV (2017)
43. Van Horn, G., Mac Aodha, O., Song, Y., Cui, Y., Sun, C., Shepard, A., Adam, H., Perona, P., Belongie, S.: The iNaturalist species classification and detection dataset. In: CVPR (2018)
44. Vittayakorn, S., Yamaguchi, K., Berg, A.C., Berg, T.L.: Runway to realway: Visual analysis of fashion. In: WACV (2015)
45. Vrandečić, D., Krötzsch, M.: Wikidata: A free collaborative knowledge base (2014)
46. Wah, C., Branson, S., Welinder, P., Perona, P., Belongie, S.: The Caltech-UCSD Birds-200-2011 dataset. Tech. Rep. CNS-TR-2011-001, California Institute of Technology (2011)
47. Wu, Y., Kirillov, A., Massa, F., Lo, W.Y., Girshick, R.: Detectron2. https://github.com/facebookresearch/detectron2 (2019)
48. Xiao, H., Rasul, K., Vollgraf, R.: Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747 (2017)
49. Yamaguchi, K., Berg, T.L., Ortiz, L.E.: Chic or social: Visual popularity analysis in online fashion networks. In: ACM MM (2014)
50. Yamaguchi, K., Kiapour, M.H., Ortiz, L.E., Berg, T.L.: Parsing clothing in fashion photographs. In: CVPR (2012)
51. Yu, A., Grauman, K.: Semantic jitter: Dense supervision for visual comparisons via synthetic images. In: ICCV (2017)
52. Yu, F., Chen, H., Wang, X., Xian, W., Chen, Y., Liu, F., Madhavan, V., Darrell, T.: BDD100K: A diverse driving dataset for heterogeneous multitask learning. In: CVPR (2020)
53. ZARA.com: Zara white leather flat ankle boots with top stitching size 5 BNWT. Retrieved May 9, 2019 from https://www.zara.com
54. Zheng, S., Yang, F., Kiapour, M.H., Piramuthu, R.: ModaNet: A large-scale street fashion dataset with polygon annotations. In: ACM MM (2018)
55. Zoph, B., Ghiasi, G., Lin, T.Y., Cui, Y., Liu, H., Cubuk, E.D., Le, Q.V.: Rethinking pre-training and self-training. arXiv preprint arXiv:2006.06882 (2020)