MySQL Schema Design in Practice

Jaime Crespo

Percona Live Amsterdam 2016 (Amsterdam, 3 Oct 2016)

https://wikitech.wikimedia.org/wiki/User:Jcrespo/plam16

© 2016 Wikimedia Foundation & Jaime Crespo. http://wikimedia.org. License: CC-BY-SA-3.0

Agenda

0. Introduction & setup
1. Case #1: Random pages
2. Case #2: Supporting 290 Languages
3. Case #3: An abnormal denormalization
4. Case #4: Key-value system
5. Case #5: Revisions and deletions
6. Case #6: A large table
7. Case #7: What links here
8. Case #8: Anecdotes: The ghost tables and Timestamps
9. Case #9: Slots

INTRODUCTION & SETUP

● Sr. Database Administrator at Wikimedia Foundation

● Previously worked as a trainer for Oracle (MySQL), as a consultant (Percona) and as a freelance administrator (DBAHire.com)

This is me fighting bad query performance

Schema design is key for query performance

● Check my past presentations at: http://www.slideshare.net/jynus/

Mediawiki as the example application

● Mediawiki code is distributed under GPL 2 or later

● All Wikimedia projects' data is licensed under CC-BY-SA-2.5

Accessing Wikimedia Production Database (I)

● Log in or register a Wikimedia SUL account (for example, on https://en.wikipedia.org)

● Use that account to authenticate on Quarry: http://quarry.wmflabs.org/

● Send your queries to the right database (for example, enwiki_p)!

Accessing Wikimedia Production Database (II)

Session Dynamic

● A real database design problem is presented

● A brief discussion starts (5-10 minutes at most)
– If you know the answer, let other people make proposals first; crazy ideas are encouraged
– Assume anything you need
– This is the place to be wrong, not to show off

● We analyze the proposals, weigh their strengths and weaknesses, and compare them with the one in use

Case #0: Designing a schema for Wikipedia

● Which entities would we need?

● Which relationships?

● What kind of queries are probably the most common ones?

● What do you think are the main scalability and pain points?

Potential entities related to content

● Page: do we need one?

● Edit: different from page?

● Diff: should it be a first-class entity?

● Revision: Large table?

What about page types and properties?

● Talk pages: Same entity or separate table?

● Categories, images (files), redirects: are they regular pages, or should each be stored as its own entity?

● Similar question for image description pages

● Categories and tags for pages/revisions: how to implement them?

● Protection: Some pages can have restrictions on who can edit them

● Other properties (tags?)

Identifiers

● How to solve the problem of 2 pages that should have the same title: Ben Hur (1959 film) and Ben Hur (2016 film)?

● Should we use the page name or an arbitrary id to identify a page?

● What about revision ids?

● If we need one, should it be a UUID or a numerical id?

Brainstorming time

Mediawiki Schema

Disclaimers

● The best solution on paper is not necessarily the best in production
– It may be too difficult to migrate existing logic (15-year-old application), or not worth it
– Performance is not the only metric: security, scalability, reliability, simplicity, etc.

● You will see many examples of compromises like this in our application

CASE #1: RANDOM PAGES

Problem #1 Description

● At the left of each page there is a link to a “Random article”

● It should allow filtering by namespace (e.g. not all pages are articles)

● It is a relatively important page compared to, say, Google’s “I am feeling lucky”, as it gives an overview of what kind of articles you will find in a project

Problem #1 Restrictions

● It has to work as fast as a regular page, but for obvious reasons cannot be cached

● It has to be realistically pseudorandom

● It has to always return a result

● It has to work on a continuously increasing number of pages, and scale from 1 to millions

Potential solutions

● ORDER BY rand() LIMIT 1

● Use a well-distributed integer id and use it to pick one at random

● Questions?
– What indexes would be beneficial in each case?
– How to count the total number of ids?
– How to handle deletions?
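To see why the second approach needs care with deletions, here is a minimal Python sketch (a hypothetical illustration, not Mediawiki code): draw a uniform random integer between the lowest and highest existing id, then take the first existing id at or above it. Enumerating every possible draw shows that a page sitting right after a run of deleted ids inherits all of their probability.

```python
from bisect import bisect_left
from collections import Counter


def pick_by_random_id(existing_ids, r):
    """Return the first existing id >= r (existing_ids must be sorted)."""
    return existing_ids[bisect_left(existing_ids, r)]


def selection_counts(existing_ids):
    """Try every possible random integer in [min_id, max_id] and count
    how often each surviving page would be chosen."""
    ids = sorted(existing_ids)
    counts = Counter()
    for r in range(ids[0], ids[-1] + 1):
        counts[pick_by_random_id(ids, r)] += 1
    return dict(counts)


# Pages 3..9 were deleted, so page 10 absorbs all of their draws.
print(selection_counts([1, 2, 10]))  # {1: 1, 2: 1, 10: 8}
```

Renumbering ids after each deletion would remove the gaps, but it would also change the identifiers of existing pages.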

Brainstorming time

Actual solution: Table design (I)

CREATE TABLE /*_*/page (
  […]
  -- A page name is broken into a namespace and a title.
  -- The namespace keys are UI-language-independent constants,
  -- defined in includes/Defines.php
  page_namespace int NOT NULL,
  -- The rest of the title, as text.
  -- Spaces are transformed into underscores in title storage.
  page_title varchar(255) binary NOT NULL,
  -- 1 indicates the article is a redirect.
  page_is_redirect tinyint unsigned NOT NULL default 0,
  […]
  -- Random value between 0 and 1, used for Special:Randompage
  page_random real unsigned NOT NULL,

Actual solution: Table design (II)

[…]
CREATE INDEX /*i*/page_random ON /*_*/page (page_random);

Actual solution: Relevant code

protected function getQueryInfo( $randstr ) {
    $redirect = $this->isRedirect() ? 1 : 0;
    $tables = [ 'page' ];
    $conds = array_merge( [
        'page_namespace' => $this->namespaces,
        'page_is_redirect' => $redirect,
        'page_random >= ' . $randstr
    ], $this->extra );
    $joinConds = [];

    // Allow extensions to modify the query
    Hooks::run( 'RandomPageQuery', [ &$tables, &$conds, &$joinConds ] );

    return [
        'tables' => $tables,
        'fields' => [ 'page_title', 'page_namespace' ],
        'conds' => $conds,
        'options' => [
            'ORDER BY' => 'page_random',
            'LIMIT' => 1,
        ],
        'join_conds' => $joinConds
    ];
}

From: mediawiki/core/includes/specials/SpecialRandompage.php

Actual solution: Query generated

SELECT page_title, page_namespace
FROM page
LEFT JOIN page_props
  ON page_id = pp_page AND pp_propname = ?
WHERE page_namespace IN (…)
  AND page_is_redirect = 0
  AND page_random >= $rand
ORDER BY page_random
LIMIT 1;
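The technique behind this query can be tried out with a minimal Python sketch that uses SQLite as a stand-in for MySQL (an assumption made for portability; the column names follow the slides, everything else is simplified and the page_props join is left out). Each page stores an indexed random value in [0, 1); picking a page means drawing a new random number and reading the index from that point until the first matching row, so the cost does not grow with the total number of pages. The retry from 0 when nothing is found is this sketch's way of meeting the “always return a result” requirement.

```python
import random
import sqlite3

QUERY = (
    "SELECT page_title FROM page "
    "WHERE page_namespace = ? AND page_is_redirect = 0 AND page_random >= ? "
    "ORDER BY page_random LIMIT 1"
)


def create_wiki(n_pages, seed=42):
    """Build an in-memory page table; every row gets a random value in [0, 1)."""
    rng = random.Random(seed)
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE page ("
        " page_id INTEGER PRIMARY KEY,"
        " page_namespace INTEGER NOT NULL,"
        " page_title TEXT NOT NULL,"
        " page_is_redirect INTEGER NOT NULL DEFAULT 0,"
        " page_random REAL NOT NULL)"
    )
    db.execute("CREATE INDEX page_random ON page (page_random)")
    db.executemany(
        "INSERT INTO page (page_namespace, page_title, page_random) VALUES (0, ?, ?)",
        ((f"Article_{i}", rng.random()) for i in range(n_pages)),
    )
    return db


def random_page(db, rand, namespace=0):
    """Return the first page at or above `rand`, wrapping around to 0 if needed."""
    row = db.execute(QUERY, (namespace, rand)).fetchone()
    if row is None:
        # rand was above every stored value: start again from the lowest one
        row = db.execute(QUERY, (namespace, 0.0)).fetchone()
    return row[0] if row else None


wiki = create_wiki(1000)
print(random_page(wiki, random.random()))
```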

Actual solution: Performance (I)

mysql> EXPLAIN SELECT … \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: page
         type: range
possible_keys: name_title,page_random,page_redirect_namespace_len
          key: page_random
      key_len: 8
          ref: NULL
         rows: 20473233
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: page_props
         type: eq_ref
possible_keys: PRIMARY,pp_propname_page,pp_propname_sortkey_page
          key: PRIMARY
      key_len: 66
          ref: enwiki.page.page_id,const
         rows: 1
        Extra: Using where; Using index; Not exists

Actual solution: Performance (II)

mysql> SELECT … FROM sys.x$statement_analysis …
*************************** 1. row ***************************
       exec_count: 27126203
      max_latency: 450802755000
      avg_latency: 755698000
     lock_latency: 2224515688000000
        rows_sent: 27125869
    rows_sent_avg: 1
    rows_examined: 0
rows_examined_avg: 0
    rows_affected: 0
rows_affected_avg: 0
       tmp_tables: 0
  tmp_disk_tables: 0
      rows_sorted: 777598
sort_merge_passes: 0

+------------------------------+
| sys.format_time(avg_latency) |
+------------------------------+
| 755.70 us                    |
+------------------------------+
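The raw latencies in the x$ views of the sys schema are Performance Schema timer values in picoseconds; sys.format_time() only scales them into readable units. A rough Python equivalent makes the conversion explicit. It is an approximation for reading the numbers above, not the sys schema's own code, and it deliberately stops at seconds, whereas sys.format_time() also switches to minutes and larger units for big values.

```python
# Picoseconds per unit, from largest to smallest.
UNITS = [
    ("s", 10**12),
    ("ms", 10**9),
    ("us", 10**6),
    ("ns", 10**3),
]


def format_ps(picoseconds):
    """Approximate sys.format_time(): scale a picosecond value to s/ms/us/ns."""
    for unit, size in UNITS:
        if picoseconds >= size:
            return f"{picoseconds / size:.2f} {unit}"
    return f"{picoseconds} ps"


print(format_ps(755698000))         # avg_latency  -> 755.70 us
print(format_ps(450802755000))      # max_latency  -> 450.80 ms
print(format_ps(2224515688000000))  # lock_latency (summed over all executions) -> 2224.52 s
```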

CASE #2: SUPPORTING 290 LANGUAGES

Wikipedia launch and early growth

● Wikipedia was launched on January 15, 2001, as a single English-language edition

● By August 8, 2001, Wikipedia had over 8,000 articles.

● On September 25, 2001, Wikipedia had over 13,000 articles.

● By the end of 2001, it had grown to approximately 20,000 articles and 18 language editions.

References: https://en.wikipedia.org/wiki/Wikipedia#History

Single-tenancy vs Multi-tenancy

● 1 database per wiki:
– Easier to code
– Easier to scale (?): you can move wikis to a different server

● Several wikis on a single database:
– More efficiency, especially for small wikis
– They can share an existing user database

Should we Shard?

● From early on, people were telling us “you need to shard to scale”

● Is it really such a bad idea? When is it needed? When can it be avoided?

● If we shard, based on which key?

Brainstorming time

Wikimedia solution

● At the time of writing, the Wikimedia Foundation hosts 897 wiki-like projects (different types and languages)

● They are divided into 7 “shards” (functional partitions)
– 1 master per shard and datacenter
– Multiple slaves sharing the read load

● English Wikipedia has its own separate shard (s1)

● s3 hosts most of the wikis (892)
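With functional partitioning, the application decides which servers hold a given wiki. Below is a minimal Python sketch of that routing; only the s1/s3 roles come from the slide, while the section map, host names and the fallback-to-s3 rule are hypothetical placeholders. Writes go to the single master of the wiki's section, and reads are spread over its replicas.

```python
import random

# Hypothetical topology: one master and several replicas per section.
SECTIONS = {
    "s1": {"master": "db-s1-master", "replicas": ["db-s1-r1", "db-s1-r2", "db-s1-r3"]},
    "s3": {"master": "db-s3-master", "replicas": ["db-s3-r1", "db-s3-r2"]},
}

# Wikis with a dedicated section; every other wiki falls back to the default.
SECTION_BY_WIKI = {"enwiki": "s1"}
DEFAULT_SECTION = "s3"


def section_for(wiki):
    """Map a wiki database name to its functional partition."""
    return SECTION_BY_WIKI.get(wiki, DEFAULT_SECTION)


def pick_host(wiki, for_write, rng=random):
    """Writes go to the section master; reads are spread over its replicas."""
    section = SECTIONS[section_for(wiki)]
    if for_write:
        return section["master"]
    return rng.choice(section["replicas"])


print(pick_host("enwiki", for_write=True))        # db-s1-master
print(pick_host("examplewiki", for_write=False))  # one of the s3 replicas
```

In this sketch, moving a wiki to another section means copying its database and changing one map entry; no query has to know about the partitioning.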

Functional partitioning

Source: https://dbtree.wikimedia.org

No sharding

● Our number of edits is “low” compared to our reads
– Only 500-3000 logical page edits per minute (https://grafana.wikimedia.org/dashboard/db/edit-count)
– That means 2000-8000 unique rows written per second (https://grafana.wikimedia.org/dashboard/db/mysql-aggregated)
– Compared to ~300K total QPS (10-40M rows read/s)
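A quick back-of-the-envelope check using only the figures above shows how lopsided the workload is:

```latex
\begin{aligned}
\frac{500\ \text{edits/min}}{60\ \text{s/min}} &\approx 8.3\ \text{edits/s}, &
\frac{3000\ \text{edits/min}}{60\ \text{s/min}} &= 50\ \text{edits/s} \\
\frac{10 \times 10^{6}\ \text{rows read/s}}{8000\ \text{rows written/s}} &= 1250, &
\frac{40 \times 10^{6}\ \text{rows read/s}}{2000\ \text{rows written/s}} &= 20000
\end{aligned}
```

Rows are therefore read roughly 10^3 to 10^4 times more often than they are written (the ranges are not necessarily simultaneous, so these are only bounds). Read load of that kind can be absorbed by adding replicas, while the write volume still fits on one master per shard.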

Lessons learned: users were separate for years

● Users were required to register on each project and language independently

● Different users had registered the same name on different wikis

● From discussion to universal deployment, it took almost 10 years: https://meta.wikimedia.org/wiki/Help:Unified_login

Unicode

Escaping Latin1

● The mission of the Wikimedia Foundation is to provide free content in every language

● That was not possible with Latin1
– It only supports Western European languages

Multi-language support was limited back in the day

● How many of you have ever created a database in latin1_swedish_ci?

● Real UTF-8 support beyond the BMP (Basic Multilingual Plane) was added in MySQL 5.5 (utf8mb4)

● Even today, support for the latest collations is relatively new
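The BMP limit is easy to see from UTF-8 byte lengths: code points up to U+FFFF need at most 3 bytes, which is the most MySQL's original utf8 character set (also called utf8mb3) can store per character, while anything beyond the BMP needs 4 bytes. A small Python check; the helper names and sample strings are illustrative.

```python
def max_utf8_bytes(text):
    """Largest UTF-8 encoding length among the characters of `text`."""
    return max(len(ch.encode("utf-8")) for ch in text)


def needs_utf8mb4(text):
    """True if any character lies outside the Basic Multilingual Plane (above U+FFFF)."""
    return any(ord(ch) > 0xFFFF for ch in text)


samples = {
    "Latin": "Wikipedia",
    "Greek": "Βικιπαίδεια",
    "Japanese": "ウィキペディア",
    "Gothic": "\U00010330\U00010331\U00010332",  # Gothic letters, outside the BMP
}
for script, text in samples.items():
    print(f"{script:9} max {max_utf8_bytes(text)} bytes/char, "
          f"needs utf8mb4: {needs_utf8mb4(text)}")
```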

Requirements

● Full support for all the character sets in the world

● Support for fully customizable ordering (e.g. of entries within categories), which can differ depending on the language

● It had to work with the technology available 15 years ago

Brainstorming time

Wikimedia solution: character set

● Text-like fields are stored in binary fields
– Technically, they are strings with the binary character set

● Recent versions of Mediawiki allow utf8mb4, too
– It wouldn’t work for Wikimedia sites: collation support has traditionally been very limiting

tables.sql

CREATE TABLE /*_*/user (
  user_id int unsigned NOT NULL PRIMARY KEY AUTO_INCREMENT,
  user_name varchar(255) binary NOT NULL default '',

Wikimedia solution: collation

● Lists of ordered articles are avoided

● Whenever custom ordering is needed, an additional, indexed field is used to allow per-table configurable ordering

● Whenever the ordering has to be changed, only row-level changes have to be done, instead of ALTER TABLEs
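A minimal Python sketch of this idea, with SQLite standing in for MySQL. The column names mirror categorylinks (cl_from, cl_to, cl_sortkey, cl_collation), but both collation functions are hypothetical simplifications, not Mediawiki's collation classes. Titles are kept as raw bytes, the application computes the sort key, and switching collation is a loop of row-level UPDATEs with no ALTER TABLE.

```python
import sqlite3
import unicodedata


def binary_collation(title):
    """Sort key = raw UTF-8 bytes, i.e. plain byte order."""
    return title.encode("utf-8")


def folded_collation(title):
    """Hypothetical collation: ignore accents and letter case."""
    decomposed = unicodedata.normalize("NFKD", title)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold().encode("utf-8")


def create_db():
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE categorylinks ("
        " cl_from INTEGER NOT NULL,"    # page id of the member
        " cl_to BLOB NOT NULL,"         # category name, as bytes
        " cl_sortkey BLOB NOT NULL,"    # precomputed by the application
        " cl_collation TEXT NOT NULL,"  # which collation produced the key
        " PRIMARY KEY (cl_from, cl_to))"
    )
    db.execute("CREATE INDEX cl_sortkey ON categorylinks (cl_to, cl_sortkey, cl_from)")
    return db


def add_link(db, page_id, title, category, collation, name):
    db.execute(
        "INSERT INTO categorylinks VALUES (?, ?, ?, ?)",
        (page_id, category, collation(title), name),
    )


def recollate(db, category, titles, collation, name):
    """Switch collation with row-level UPDATEs; the schema stays untouched."""
    for page_id, title in titles.items():
        db.execute(
            "UPDATE categorylinks SET cl_sortkey = ?, cl_collation = ?"
            " WHERE cl_from = ? AND cl_to = ?",
            (collation(title), name, page_id, category),
        )


def members(db, category, titles):
    rows = db.execute(
        "SELECT cl_from FROM categorylinks WHERE cl_to = ? ORDER BY cl_sortkey, cl_from",
        (category,),
    )
    return [titles[page_id] for (page_id,) in rows]


titles = {1: "zebra cake", 2: "Éclair", 3: "apple pie", 4: "Baklava"}
db = create_db()
for page_id, title in titles.items():
    add_link(db, page_id, title, b"Desserts", binary_collation, "binary")
print(members(db, b"Desserts", titles))  # byte order: uppercase first, "É" last
recollate(db, b"Desserts", titles, folded_collation, "folded")
print(members(db, b"Desserts", titles))  # accent- and case-insensitive order
```

The binary ordering in the first listing is also what plain binary string columns give you, which is why a separate, application-computed sort key is needed at all.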

$wgCategoryCollation

● https://www.mediawiki.org/wiki/Manual:$wgCategoryCollation

if ( !$dryRun ) {
    $dbw->update(
        'categorylinks',
        [
            'cl_sortkey' => $newSortKey,
            'cl_sortkey_prefix' => $prefix,
            'cl_collation' => $collationName,
            'cl_type' => $type,
            'cl_timestamp = cl_timestamp',
        ],
        [ 'cl_from' => $row->cl_from, 'cl_to' => $row->cl_to ],
        __METHOD__
    );
}

Unicode is not the only challenge

CASE #3: AN ABNORMAL DENORMALIZATION

Users on Wikimedia projects

● An account is not needed to view the content

● An account is not needed to edit the content (anonymous edits)

● Registered users get some advantages:
– Better tools for editing
– Persistent configurable preferences

Content review

● There must be a way to see all edits made by the same user (all edits must be publicly reviewable)

● There must be a way to block vandalism (misbehaving users, both registered and unregistered)

● In extreme cases, there must be a way to protect content from certain user groups (page protections)

● Only trusted users should be able to destroy content

How to represent users?

● Should we use strings or arbitrary numerical ids?
– If we use strings, how do we rename registered users?
– If we use arbitrary ids for registered users, how do we reference non-registered ones?

● How can we block returning users, including anonymous ones?

● How can we allow anonymous edits from countries with doubtful privacy laws?

Brainstorming time

Wikimedia solution

● Registered users have a local user id and a string identifier

● Local wiki accounts are linked to a global unified account (SUL) in the “centralauth” database

● Anonymous users are identified by their IPv4 or IPv6 address, stored as a string

● In general, edits store both the user id and the user text (name or IP)
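Because anonymous editors are identified by an address string, the same address must always be written the same way, or history lookups and blocks will miss it; IPv6 in particular has many textual forms for one address. A minimal Python sketch using the standard ipaddress module follows. The canonical form chosen here is Python's compressed notation, which is an assumption of this sketch and not necessarily Mediawiki's own normalization, and the block list is hypothetical. Checking ranges rather than single addresses is one way to deal with returning anonymous users.

```python
import ipaddress


def canonical_user_text(ip_string):
    """Normalize an IPv4/IPv6 string so equal addresses compare equal as text."""
    return str(ipaddress.ip_address(ip_string.strip()))


def is_blocked(ip_string, blocked_ranges):
    """Check an address against a (hypothetical) list of blocked CIDR ranges."""
    address = ipaddress.ip_address(ip_string.strip())
    # A v4 address is never "in" a v6 network (and vice versa): that test is False.
    return any(address in ipaddress.ip_network(cidr) for cidr in blocked_ranges)


print(canonical_user_text("2001:0DB8:0000:0000:0000:0000:0000:0001"))  # 2001:db8::1
print(is_blocked("80.113.15.100", ["80.113.15.0/24"]))                 # True
```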

Table revision

CREATE TABLE `revision` (
  `rev_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `rev_page` int(10) unsigned NOT NULL,
  `rev_text_id` int(10) unsigned NOT NULL,
  `rev_comment` tinyblob NOT NULL,
  `rev_user` int(10) unsigned NOT NULL DEFAULT '0',
  `rev_user_text` varbinary(255) NOT NULL DEFAULT '',
  […]
) ENGINE=InnoDB AUTO_INCREMENT=N DEFAULT CHARSET=binary

Actual user content (registered user)

mysql> SELECT * FROM revision WHERE rev_id = 742218408\G
************************ 1. row ************************
       rev_id: 742218408
     rev_page: 46812822
  rev_text_id: 750242058
  rev_comment: Testing revision comment
     rev_user: 25118340
rev_user_text: JCrespo (WMF)
rev_timestamp: 20161002111252
...

Actual user content (anonymous user)

mysql> SELECT * FROM revision WHERE rev_id = 742219056\G
************************ 1. row ************************
       rev_id: 742219056
     rev_page: 46812822
  rev_text_id: 750242734
  rev_comment: As an anonymous user, my public IP will get saved, instead of a username
     rev_user: 0
rev_user_text: 80.113.15.100
rev_timestamp: 20161002111851
...

Pros and cons of the current implementation

● By denormalizing the table, most of the time only this table has to be checked
– The user table is only accessed when a user is clicked
– No need to store information for the large number of anonymous users with very few edits

● User renames are painful database-wise, and almost impossible for users with a huge number of edits (like bots)
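Both sides of this trade-off can be reproduced with a minimal Python sketch, using SQLite as a stand-in for MySQL. Only the rev_* and user_* columns shown on the slides are modelled; the ids, user names and the 192.0.2.10 address are made-up example values. A page history listing reads revision alone, while a rename has to rewrite rev_user_text on every revision the user ever made.

```python
import sqlite3


def create_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE user (user_id INTEGER PRIMARY KEY, user_name BLOB NOT NULL UNIQUE)")
    db.execute(
        "CREATE TABLE revision ("
        " rev_id INTEGER PRIMARY KEY,"
        " rev_page INTEGER NOT NULL,"
        " rev_user INTEGER NOT NULL DEFAULT 0,"  # 0 means anonymous
        " rev_user_text BLOB NOT NULL)"          # user name, or IP for anonymous edits
    )
    db.execute("CREATE INDEX rev_user ON revision (rev_user)")
    return db


def page_history(db, page_id):
    """Who edited a page: served from revision alone, no JOIN with user."""
    rows = db.execute(
        "SELECT rev_user_text FROM revision WHERE rev_page = ? ORDER BY rev_id", (page_id,)
    )
    return [text.decode("utf-8") for (text,) in rows]


def rename_user(db, user_id, new_name):
    """Rename a registered user; returns how many revision rows had to be rewritten."""
    name = new_name.encode("utf-8")
    db.execute("UPDATE user SET user_name = ? WHERE user_id = ?", (name, user_id))
    cursor = db.execute(
        "UPDATE revision SET rev_user_text = ? WHERE rev_user = ?", (name, user_id)
    )
    return cursor.rowcount


# Hypothetical data: three edits by a registered user, one anonymous edit.
db = create_db()
db.execute("INSERT INTO user VALUES (?, ?)", (7, b"ExampleUser"))
edits = [(1, 7, b"ExampleUser")] * 3 + [(1, 0, b"192.0.2.10")]
db.executemany(
    "INSERT INTO revision (rev_page, rev_user, rev_user_text) VALUES (?, ?, ?)", edits
)
print(page_history(db, 1))
print(rename_user(db, 7, "RenamedUser"))  # 3 revision rows rewritten
print(page_history(db, 1))
```

With millions of edits instead of three, that single UPDATE is what makes renaming prolific users and bots so expensive.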

CASE #4: KEY-VALUE SYSTEM

Content storage

● In a typical MediaWiki installation, content is referenced this way:
– Pages can have several revisions; by default the latest one is parsed and displayed
– Revisions point to a text row
– Text contains wikitext that has to be parsed, “rendered” and sent to the user
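The chain above can be sketched as a single join. This assumes reduced, hypothetical page, revision and text tables in SQLite (real MediaWiki looks pages up by namespace plus title and has many more columns); latest_wikitext is an illustrative helper.

```python
import sqlite3

# Hypothetical, heavily reduced versions of the three tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT UNIQUE,
                   page_latest INTEGER NOT NULL);
CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER NOT NULL,
                       rev_text_id INTEGER NOT NULL);
CREATE TABLE "text" (old_id INTEGER PRIMARY KEY, old_text TEXT, old_flags TEXT);
INSERT INTO "text" VALUES (1, 'First draft', 'utf-8'), (2, 'Second draft', 'utf-8');
INSERT INTO revision VALUES (10, 100, 1), (11, 100, 2);
INSERT INTO page VALUES (100, 'Example', 11);  -- page_latest: newest revision
""")

def latest_wikitext(conn, title):
    """page.page_latest -> revision.rev_text_id -> text.old_text."""
    row = conn.execute("""
        SELECT t.old_text
        FROM page AS p
        JOIN revision AS r ON r.rev_id = p.page_latest
        JOIN "text" AS t ON t.old_id = r.rev_text_id
        WHERE p.page_title = ?""", (title,)).fetchone()
    return row[0] if row else None
```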

Wikimedia sites need to return hundreds of thousands of pages per second

● The size of the content (wikitext) is several times that of the metadata

● The database growth is also different from that of the metadata, and very different for each wiki

● Should we set up a separate key-value system to store those edits? What should we look for?
– Compression
– Automatic sharding
– Automatic failover
– JSON support for flexible datatypes

Brainstorming time

Wikimedia solution

● Each page can have a different content model/storage

● For example:
– Regular wikitext pages
– User-editable JS/CSS/application messages
– Forum threads (Flow feature, etc.)
– Any other model created by new extensions
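One way to picture per-page content models is a dispatch table keyed by model name. This is a hypothetical sketch, not MediaWiki's ContentHandler API: the registry, the toy wikitext renderer (it only understands '''bold''') and the render helper are all assumptions made for illustration.

```python
import html
from typing import Callable, Dict

# Hypothetical registry: content model name -> renderer function.
Renderer = Callable[[str], str]
RENDERERS: Dict[str, Renderer] = {}

def content_model(name: str) -> Callable[[Renderer], Renderer]:
    def register(fn: Renderer) -> Renderer:
        RENDERERS[name] = fn
        return fn
    return register

@content_model("wikitext")
def render_wikitext(src: str) -> str:
    # Toy parser: escapes <, >, & and only understands '''bold'''.
    parts = html.escape(src, quote=False).split("'''")
    return "".join(f"<b>{p}</b>" if i % 2 else p for i, p in enumerate(parts))

@content_model("css")
def render_css(src: str) -> str:
    return f"<pre>{html.escape(src)}</pre>"  # shown as source, never parsed

def render(model: str, src: str) -> str:
    """Dispatch on the page's content model instead of assuming wikitext."""
    try:
        renderer = RENDERERS[model]
    except KeyError:
        raise ValueError(f"no renderer for content model {model!r}") from None
    return renderer(src)
```

New extensions would only have to register another model name, without touching the code paths of existing ones.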

Wikitext storage

● The “text” table contains (mostly) pointers to content, not the real content

mysql> SELECT * FROM text ...;
+---------+------------------------------------------+---------------------+
| old_id  | old_text                                 | old_flags           |
+---------+------------------------------------------+---------------------+
|       1 | #REDIRECT [[Town of 1770]]               | utf-8               |
|       2 | #REDIRECT [[Project:One-liner listings]] | utf-8               |
...
| 3027206 | DB://cluster24/545108                    | utf-8,gzip,external |
| 3027205 | DB://cluster25/544444                    | utf-8,gzip,external |
+---------+------------------------------------------+---------------------+
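Reading a row therefore means checking old_flags before using old_text. The sketch below assumes the DB://<cluster>/<id> pointer layout shown in the table, models each cluster as an in-memory dict, and uses Python's gzip module as a stand-in for whatever compression sits behind the gzip flag; fetch_external and load_text are hypothetical names.

```python
import gzip

# Hypothetical stand-in for the External Storage shards:
# cluster name -> {blob id -> compressed bytes}.
CLUSTERS = {
    "cluster24": {545108: gzip.compress("Some wikitext".encode("utf-8"))},
}

def fetch_external(url: str) -> bytes:
    """Resolve a 'DB://<cluster>/<blob id>' pointer like the ones in old_text."""
    scheme, sep, rest = url.partition("://")
    if scheme != "DB" or not sep:
        raise ValueError(f"unsupported pointer: {url!r}")
    cluster, _, blob_id = rest.partition("/")
    return CLUSTERS[cluster][int(blob_id)]

def load_text(old_text: str, old_flags: str) -> str:
    """Turn a text row into wikitext, honouring the comma-separated flags.
    Inline rows are treated as plain text in this sketch."""
    flags = set(old_flags.split(","))
    data = fetch_external(old_text) if "external" in flags else old_text.encode("utf-8")
    if "gzip" in flags:
        data = gzip.decompress(data)  # stand-in for the real compression format
    return data.decode("utf-8")
```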

External Storage

● Several shards of MySQL servers can serve content for every wiki

● The text is compressed and decompressed in gzip format at the application level

● Smart “compressing” can be done:
– Revisions with the same content for the same page are deduplicated
– Older revisions are stored only as diffs
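A hypothetical sketch of the first two ideas: compression done by the application before the blob reaches MySQL, and deduplication of identical content within the same page, keyed here by a SHA-1 of the text (an assumption). Diff-based storage of older revisions is not shown.

```python
import gzip
import hashlib
from typing import Dict, Tuple

class BlobStore:
    """Toy external store: compresses at application level and keeps
    identical content of the same page only once."""

    def __init__(self) -> None:
        self.blobs: Dict[int, bytes] = {}              # blob id -> compressed bytes
        self.by_hash: Dict[Tuple[int, str], int] = {}  # (page id, sha1) -> blob id
        self.next_id = 1

    def put(self, page_id: int, text: str) -> int:
        raw = text.encode("utf-8")
        key = (page_id, hashlib.sha1(raw).hexdigest())
        if key in self.by_hash:  # e.g. a revert to an earlier version
            return self.by_hash[key]
        blob_id = self.next_id
        self.next_id += 1
        self.blobs[blob_id] = gzip.compress(raw)
        self.by_hash[key] = blob_id
        return blob_id

    def get(self, blob_id: int) -> str:
        return gzip.decompress(self.blobs[blob_id]).decode("utf-8")

store = BlobStore()
a = store.put(1, "v1")
b = store.put(1, "v2")
c = store.put(1, "v1")  # reverted: reuses blob a
d = store.put(2, "v1")  # same text on another page: stored separately
```

Restricting deduplication to the same page keeps each blob owned by a single page, which is simpler to reason about when pages are deleted.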

External Storage Cluster

Solid performance over HDs

mysql> SELECT digest, digest_text, sum(COUNT_STAR),
         sum(SUM_TIMER_WAIT)/SUM(COUNT_STAR)/1000000 as microseconds,
         min(first_seen), max(last_seen)
       FROM events_statements_summary_by_digest
       GROUP BY DIGEST ORDER BY sum(COUNT_STAR) DESC LIMIT 1\G
*************************** 1. row ***************************
         digest: 1bf861e8cd3ea6bcac323bdf9caf4876
    digest_text: SELECT `blob_text` FROM `blobs_cluster25` WHERE `blob_id` = ? LIMIT ?
sum(COUNT_STAR): 2356927126
   microseconds: 3384.60227946
min(first_seen): 2015-12-09 15:00:45
 max(last_seen): 2016-10-02 11:41:01

Speed optimizations

● External Storage is only the canonical place where data is stored
– Several layers of caching make it a low-traffic, disk-based storage

● The parser cache, in memory (memcached) and on disk (parsercache MySQL servers), avoids frequent access to it
– Local memcached is fast
– Parsercache is shared even between datacenters, and survives restarts
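The lookup order can be sketched as a read-through cache. Both layers are plain dicts standing in for memcached and the parsercache MySQL servers, and load_and_parse stands for fetching from External Storage and parsing; all names are hypothetical.

```python
from typing import Callable, Dict, Optional

class LayeredCache:
    """Read-through lookup: local memory first, then a shared persistent
    layer, and only on a double miss the canonical store plus the parser."""

    def __init__(self, load_and_parse: Callable[[str], str]) -> None:
        self.local: Dict[str, str] = {}    # stand-in for local memcached
        self.shared: Dict[str, str] = {}   # stand-in for parsercache MySQL
        self.load_and_parse = load_and_parse
        self.misses = 0                    # times External Storage was hit

    def get(self, key: str) -> str:
        html: Optional[str] = self.local.get(key)
        if html is None:
            html = self.shared.get(key)
            if html is None:
                self.misses += 1
                html = self.load_and_parse(key)
                self.shared[key] = html
            self.local[key] = html
        return html

cache = LayeredCache(lambda title: f"<p>{title}</p>")
cache.get("Amsterdam")
cache.get("Amsterdam")
cache.local.clear()     # e.g. local memcached restarted
cache.get("Amsterdam")  # served from the shared layer, no new miss
```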

CASE #5: REVISIONS AND DELETIONS

Deletions

● Pages can be deleted, with several degrees:
– A new revision could just override a previous version of the page
– A deletion that could be restored afterwards
– Personal information or copyrighted material that must be hidden from everyone

● In some cases, only some revisions should be deleted, not the whole page
– e.g. someone editing an otherwise legitimate page to add someone’s private data

How to implement deletions and restores?

● Should we delete content rows when doing hard-deletes, or just overwrite them with garbage?

● Should we move the rows to an archive table or should we mark them with deleted=1?

Brainstorming time

Wikimedia implementation

● Individual revisions can be hidden (“suppressed”), no matter the page status

● The normal procedure is that full pages are deleted:
– In that case, revisions are moved to the archive table
– The page entry is deleted

● On restore, revisions are moved back to the revision table
– A new page, with a new page_id, is created
– Not all revisions necessarily have to be restored
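The delete/restore cycle can be sketched in SQLite with reduced, hypothetical page, revision and archive tables (the ar_* columns here are illustrative, not the real archive schema). Deletion reads the rows first and then inserts them, mirroring the select-then-insert shape of the PHP excerpt rather than an INSERT … SELECT.

```python
import sqlite3

# Reduced, hypothetical schema; AUTOINCREMENT keeps SQLite from reusing page ids.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (page_id INTEGER PRIMARY KEY AUTOINCREMENT, page_title TEXT UNIQUE);
CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER, rev_comment TEXT);
CREATE TABLE archive (ar_rev_id INTEGER PRIMARY KEY, ar_title TEXT, ar_comment TEXT);
INSERT INTO page (page_title) VALUES ('Example');
INSERT INTO revision VALUES (10, 1, 'first version'), (11, 1, 'added private data');
""")

def delete_page(conn, title):
    """Move every revision to archive, then drop the page row."""
    with conn:
        (page_id,) = conn.execute(
            "SELECT page_id FROM page WHERE page_title = ?", (title,)).fetchone()
        # Read the rows first, then insert them: no INSERT ... SELECT.
        rows = conn.execute(
            "SELECT rev_id, rev_comment FROM revision WHERE rev_page = ?",
            (page_id,)).fetchall()
        conn.executemany("INSERT INTO archive VALUES (?, ?, ?)",
                         [(rev_id, title, comment) for rev_id, comment in rows])
        conn.execute("DELETE FROM revision WHERE rev_page = ?", (page_id,))
        conn.execute("DELETE FROM page WHERE page_id = ?", (page_id,))

def restore_page(conn, title, rev_ids):
    """Recreate the page (new page_id) and move back only the chosen revisions."""
    with conn:
        new_id = conn.execute(
            "INSERT INTO page (page_title) VALUES (?)", (title,)).lastrowid
        for rev_id in rev_ids:
            (comment,) = conn.execute(
                "SELECT ar_comment FROM archive WHERE ar_rev_id = ? AND ar_title = ?",
                (rev_id, title)).fetchone()
            conn.execute("INSERT INTO revision VALUES (?, ?, ?)",
                         (rev_id, new_id, comment))
            conn.execute("DELETE FROM archive WHERE ar_rev_id = ?", (rev_id,))
    return new_id

delete_page(conn, "Example")
new_id = restore_page(conn, "Example", [10])  # revision 11 stays archived
```

The restored page comes back as page_id 2 while revision 11 stays in archive; after a few such cycles, the path a given text followed is hard to reconstruct.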

Deleting a page

Code:

/**
 * Back-end article deletion
 * Deletes the article with database consistency, writes logs, purges caches
 …
 */
public function doDeleteArticleReal(
    $reason, $suppress = false, $u1 = null, $u2 = null, &$error = '', User $user = null,
    $tags = []
) {
    $dbw = wfGetDB( DB_MASTER );
    $res = $dbw->select(
        'revision',
        array_merge( $fields, $deletionFields ),
        [ 'rev_page' => $id ],
        __METHOD__,
        'FOR UPDATE'
    );
    $dbw->insert( 'archive', $rowsInsert, __METHOD__ );
    // Now that it's safely backed up, delete it
    $dbw->delete( 'page', [ 'page_id' => $id ], __METHOD__ );
    $dbw->delete( 'revision', [ 'rev_page' => $id ], __METHOD__ );

    // Log the deletion, if the page was suppressed, put it in the suppression log instead
    $logEntry = new ManualLogEntry( $logtype, 'delete' );

INSERT … SELECTs are avoided on HEAD

Lessons learned

● INSERT … SELECTs are painful for performance and/or consistency reasons
– They caused actual issues when combined with filtering, or on pages with many revisions
– The new implementation avoids them

● Moving rows between tables is a terrible idea
– Specifically, our implementation makes it almost impossible to track the history of a text
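For contrast, a hypothetical sketch of the other option raised in the brainstorming question: marking rows with a deleted flag instead of moving them. This is not what MediaWiki implements; the rev_page_deleted column and both helpers are assumptions.

```python
import sqlite3

# Hypothetical schema: rows stay in revision, a flag hides them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE revision (
    rev_id INTEGER PRIMARY KEY,
    rev_page INTEGER NOT NULL,
    rev_page_deleted INTEGER NOT NULL DEFAULT 0
);
CREATE INDEX rev_page_visible ON revision (rev_page, rev_page_deleted);
""")
conn.executemany("INSERT INTO revision (rev_id, rev_page) VALUES (?, ?)",
                 [(10, 1), (11, 1), (20, 2)])

def set_page_deleted(conn, page_id, deleted):
    """Delete or restore a whole page with one UPDATE; rows never move,
    so rev_id and rev_page (the history) stay stable."""
    with conn:
        conn.execute("UPDATE revision SET rev_page_deleted = ? WHERE rev_page = ?",
                     (1 if deleted else 0, page_id))

def visible_revisions(conn, page_id):
    return [rev_id for (rev_id,) in conn.execute(
        "SELECT rev_id FROM revision WHERE rev_page = ? AND rev_page_deleted = 0 "
        "ORDER BY rev_id", (page_id,))]
```

The price is that every read has to filter on the flag (hence the composite index), and deleted rows keep occupying space in the hot table.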

CASE #6: A LARGE TABLE

Revision table on enwiki

MariaDB MARIADB s1-master enwiki > SHOW CREATE TABLE revision;
CREATE TABLE `revision` (
  `rev_id` int(8) unsigned NOT NULL AUTO_INCREMENT,
  …
) ENGINE=InnoDB AUTO_INCREMENT=741078657 DEFAULT CHARSET=binary

MariaDB MARIADB s1-master enwiki > SHOW TABLE STATUS like 'revision'\G
*************************** 1. row ***************************
           Name: revision
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 614837124
 Avg_row_length: 153
    Data_length: 94510252032
Max_data_length: 0
   Index_length: 90206896128
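To put these numbers in perspective, a quick calculation. The figures are copied from the SHOW TABLE STATUS output above; for InnoDB, Rows and Avg_row_length are estimates, so the results are approximate.

```python
# Figures copied from the SHOW TABLE STATUS output above.
GIB = 1024 ** 3
rows = 614_837_124
avg_row_length = 153
data_length = 94_510_252_032
index_length = 90_206_896_128

data_gib = data_length / GIB
index_gib = index_length / GIB
total_gib = data_gib + index_gib
index_to_data = index_length / data_length
estimated_data = rows * avg_row_length  # close to Data_length by construction

print(f"data {data_gib:.1f} GiB + indexes {index_gib:.1f} GiB = {total_gib:.1f} GiB")
print(f"indexes are {index_to_data:.0%} the size of the data")
```

That is roughly 172 GiB for a single table, with indexes almost as large as the data itself.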

Point SELECTs are fast

MariaDB PRODUCTION s1 localhost performance_schema > SELECT * FROM events_statements_summary_by_digest ORDER BY count_star DESC LIMIT 1\G
*************************** 1. row ***************************
   SCHEMA_NAME: enwiki
        DIGEST: ed3c3539910af27e0f7e4ea442db9124
   DIGEST_TEXT: SELECT `page_id` , `page_len` , `page_is_redirect` , `page_latest` , `page_content_model` FROM `page` WHERE `page_namespace` = ? AND `page_title` = ? LIMIT ?
    COUNT_STAR: 37247176536
SUM_TIMER_WAIT: 6633094681562378000
MIN_TIMER_WAIT: 43209000
AVG_TIMER_WAIT: 178083000
MAX_TIMER_WAIT: 258818851000
 SUM_LOCK_TIME: 1867944737116000000
...
1 row in set (0.06 sec)

+---------------------------------+
| sys.format_time(AVG_TIMER_WAIT) |
+---------------------------------+
| 178.08 us                       |
+---------------------------------+
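performance_schema reports timer columns in picoseconds, so the average can be checked by hand. The values are copied from the row above; the same division by 1000000 is what turns picoseconds into microseconds in the External Storage query earlier.

```python
# performance_schema timer columns are in picoseconds: 1 us = 10**6 ps.
PS_PER_US = 10 ** 6
count_star = 37_247_176_536
sum_timer_wait = 6_633_094_681_562_378_000
avg_timer_wait = 178_083_000

avg_us = sum_timer_wait / count_star / PS_PER_US  # recomputed from the sum
print(f"{avg_us:.2f} us")                          # matches sys.format_time
```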

Ranges are slow

Host    User      Schema  Client  Source  Thread       Transaction   Runtime  Stamp
db1066  wikiuser  enwiki  mw1193  -       42910008229  133321827778  318s     2016-09-25 15:31:15

SELECT /* ApiQueryContributors::execute */ rev_page AS `page`, rev_user AS `user`,
  MAX(rev_user_text) AS `username`
FROM `revision`
WHERE rev_page = '6768170' AND (rev_user != 0) AND ((rev_deleted & 4) = 0)
GROUP BY rev_page, rev_user
ORDER BY rev_user
LIMIT 501
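To make the query's semantics concrete, here it is on invented toy data in SQLite. DELETED_USER = 4 is MediaWiki's rev_deleted bit that hides the username, which is what (rev_deleted & 4) = 0 filters out. Because grouping by user happens before LIMIT, all matching revisions of the page may have to be read, so the cost follows the length of the page history rather than the 501 rows returned.

```python
import sqlite3

DELETED_USER = 4  # rev_deleted bit that hides the username

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE revision (
    rev_id INTEGER PRIMARY KEY, rev_page INTEGER, rev_user INTEGER,
    rev_user_text TEXT, rev_deleted INTEGER NOT NULL DEFAULT 0)""")
conn.execute("CREATE INDEX page_user ON revision (rev_page, rev_user)")
conn.executemany("INSERT INTO revision VALUES (?, ?, ?, ?, ?)", [
    (1, 6768170, 7, "Alice", 0),
    (2, 6768170, 7, "Alice", 0),           # second edit, same group
    (3, 6768170, 0, "192.0.2.1", 0),       # anonymous: rev_user = 0
    (4, 6768170, 9, "Bob", DELETED_USER),  # username suppressed
    (5, 6768170, 8, "Carol", 0),
])

contributors = conn.execute("""
    SELECT rev_page AS page, rev_user AS user, MAX(rev_user_text) AS username
    FROM revision
    WHERE rev_page = ? AND rev_user != 0 AND (rev_deleted & ?) = 0
    GROUP BY rev_page, rev_user
    ORDER BY rev_user
    LIMIT 501""", (6768170, DELETED_USER)).fetchall()
```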

revision is not the only tall table

MariaDB PRODUCTION s1 localhost enwiki > SHOW CREATE TABLE logging\G
*************************** 1. row ***************************
       Table: logging
Create Table: CREATE TABLE `logging` (
  `log_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  ...
) ENGINE=InnoDB AUTO_INCREMENT=77577756 DEFAULT CHARSET=binary
1 row in set (0.00 sec)

MariaDB PRODUCTION s1 localhost enwiki > SHOW TABLE STATUS like 'logging'\G
*************************** 1. row ***************************
           Name: logging
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 72871150
 Avg_row_length: 164
    Data_length: 11963203584
Max_data_length: 0
   Index_length: 36273225728

Brainstorming time

recentchanges table

-- Primarily a summary table for Special:Recentchanges,
-- this table contains some additional info on edits from
-- the last few days, see Article::editUpdates()
--
CREATE TABLE /*_*/recentchanges (
  rc_id int NOT NULL PRIMARY KEY AUTO_INCREMENT,
  rc_timestamp varbinary(14) NOT NULL default '',

  -- As in revision
  rc_user int unsigned NOT NULL default 0,
  rc_user_text varchar(255) binary NOT NULL,

More on: https://phabricator.wikimedia.org/diffusion/MW/browse/master/maintenance/tables.sql;bc05426ae2708f8ac23b9106911fe35b5c51fd30$1057

recentchanges usage

● Recentchanges is mainly a revision summary table
– Most reviews are only of recent edits; entries are purged after 30 days:

MariaDB MARIADB s1-master enwiki > SELECT now(), min(rc_timestamp) from recentchanges;
+---------------------+-------------------+
| now()               | min(rc_timestamp) |
+---------------------+-------------------+
| 2016-09-25 09:04:35 | 20160826090422    |
+---------------------+-------------------+
1 row in set (0.00 sec)

● It is updated synchronously with the edits

● It also contains additional fields / related tables like tags
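The 30-day purge can be sketched with a plain string comparison, because rc_timestamp holds fixed-width YYYYMMDDHHMMSS UTC values. SQLite stands in for MySQL, the table is reduced to two columns, and purge_recentchanges is a hypothetical helper, not MediaWiki's actual purge code.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

def mw_ts(dt: datetime) -> str:
    """Aware datetime -> 14-character UTC timestamp (YYYYMMDDHHMMSS)."""
    return dt.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE recentchanges (rc_id INTEGER PRIMARY KEY, rc_timestamp TEXT)")
conn.execute("CREATE INDEX rc_timestamp ON recentchanges (rc_timestamp)")

def purge_recentchanges(conn, now: datetime, max_age_days: int = 30) -> int:
    """Delete rows older than max_age_days. Comparing strings is enough
    because the fixed-width format sorts in chronological order."""
    cutoff = mw_ts(now - timedelta(days=max_age_days))
    with conn:
        cur = conn.execute(
            "DELETE FROM recentchanges WHERE rc_timestamp < ?", (cutoff,))
    return cur.rowcount

now = datetime(2016, 9, 25, 9, 4, 35, tzinfo=timezone.utc)
conn.executemany("INSERT INTO recentchanges VALUES (?, ?)", [
    (1, "20160801000000"), (2, "20160827000000"), (3, "20160925090000")])
removed = purge_recentchanges(conn, now)  # cutoff: 20160826090435
```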

Special slaves && partitioning

● MediaWiki allows defining instance groups for certain queries: https://noc.wikimedia.org/conf/highlight.php?file=db-eqiad.php

'db1051' => 50, # 2.8TB 96GB, watchlist, recentchanges, contributions, logpager
'db1055' => 50, # 2.8TB 96GB, watchlist, recentchanges, contributions, logpager

● Querying contributions uses a separate group, contributions: https://phabricator.wikimedia.org/diffusion/MW/browse/master/includes/specials/pagers/ContribsPager.php;bc05426ae2708f8ac23b9106911fe35b5c51fd30$73

// Most of this code will use the 'contributions' group DB, which can map to replica DBs
// with extra user based indexes or partitioning by user. The additional metadata
// queries should use a regular replica DB since the lookup pattern is not all by user.
$this->mDbSecondary = wfGetDB( DB_REPLICA ); // any random replica DB
$this->mDb = wfGetDB( DB_REPLICA, 'contributions' );
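How a query group might pick a server can be sketched as a weighted random choice. The contributions hosts and weights are copied from the db-eqiad.php excerpt; the general pool (db1066 is only a placeholder), the fallback rule and pick_replica are assumptions of this sketch, not MediaWiki's LoadBalancer code.

```python
import random
from typing import Dict, Optional

# Hypothetical query-group map: group name -> {host: weight}.
GROUPS: Dict[str, Dict[str, int]] = {
    "contributions": {"db1051": 50, "db1055": 50},
    "general": {"db1066": 100},  # placeholder general replica pool
}

def pick_replica(group: Optional[str] = None,
                 rng: Optional[random.Random] = None) -> str:
    """Weighted random choice inside a query group; unknown or missing
    groups fall back to the general pool (an assumption of this sketch)."""
    pool = GROUPS.get(group or "general") or GROUPS["general"]
    hosts = list(pool)
    chooser = rng if rng is not None else random
    return chooser.choices(hosts, weights=[pool[h] for h in hosts], k=1)[0]
```

Concentrating one query pattern on dedicated hosts is what allows them to carry extra indexes or partitioning, and also why losing them hurts availability.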

enwiki tables have special partitioning

ALTER TABLE enwiki.revision
  DROP PRIMARY KEY,
  DROP INDEX rev_id,
  ADD PRIMARY KEY (rev_id, rev_user)
PARTITION BY RANGE (rev_user) (
  PARTITION p1 VALUES LESS THAN (1),
  PARTITION p50000 VALUES LESS THAN (50000),
  PARTITION p100000 VALUES LESS THAN (100000),
  PARTITION p200000 VALUES LESS THAN (200000),
  PARTITION p300000 VALUES LESS THAN (300000),
  PARTITION p400000 VALUES LESS THAN (400000),
  PARTITION p500000 VALUES LESS THAN (500000),
  PARTITION p750000 VALUES LESS THAN (750000),
  …

More on: https://phabricator.wikimedia.org/diffusion/OSOF/browse/master/dbtools/s1-pager.sql
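Two details of that ALTER are easy to miss. MySQL requires every unique key of a partitioned table, including the primary key, to contain all columns used in the partitioning expression, which is why rev_user joins rev_id in the primary key. And a query with WHERE rev_user = N only needs to read one partition. The sketch maps a rev_user value to its partition using only the bounds visible above; the real list continues past the elision, and partition_for is a hypothetical helper.

```python
from bisect import bisect_right

# Exclusive upper bounds (VALUES LESS THAN) of the partitions shown above.
# The real definition continues after p750000; this list stops there.
BOUNDS = [1, 50000, 100000, 200000, 300000, 400000, 500000, 750000]

def partition_for(rev_user: int) -> str:
    """Name of the RANGE partition holding rows with this rev_user:
    the first partition whose bound is strictly greater than the value."""
    i = bisect_right(BOUNDS, rev_user)
    if i == len(BOUNDS):
        raise ValueError("rev_user is beyond the partitions in this excerpt")
    return f"p{BOUNDS[i]}"
```

All anonymous edits (rev_user = 0) land in p1, so a contributions query for a registered user never scans them.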

The solution has problems

● It is a server-side hack, with poor or no awareness on the application side
– MediaWiki has to support MySQL 5.0, which has no partitioning support

● It is an enwiki-only patch, which is a bad idea in general

● Special slaves are a threat to High Availability

CASE #7: WHAT LINKS HERE

Categories, templates, images and special pages

● A lot of information needs to be updated when a new edit is made:
– The category must include the new page, in its right position
– If it is a template or an image, all pages that include it have to change
– “What links here” has to reflect the new links
– If a new page is created, links to it have to turn from red to blue

Latency on edit has to be low

● Users understand that an “edit” will be slower than loading a page, but it still cannot take more than 0.5-1 seconds at most: https://grafana.wikimedia.org/dashboard/db/save-timing

● But we just said that an edit may require updating millions of other pages!

● How to implement all the previous changes?

Brainstorming time

Wikimedia solution

● Only immediate tasks are done synchronously, transactionally:
– Wikitext content is saved on External Storage
– The recentchanges table adds a new entry for reviewing
– The user edit count is increased in some wikis
– The page is parsed for display to the user

Background jobs

● Many tasks are enqueued on a Redis job queue

● The results of most of those tasks are cached/denormalized in database tables for easy joining:
– Add the page to the proper category (categorylinks)
– Record the page's links (pagelinks)
– Add the page to the list of templates used (templatelinks)
– Many others, such as refreshing the list of titles and its index on Elasticsearch
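The split between synchronous work and deferred jobs can be sketched with an in-process queue. A deque stands in for the Redis job queue, and save_edit, run_jobs and the job names are hypothetical; the point is only that the edit request returns after the cheap steps, while link-table updates run later.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

# Toy stand-in for the Redis job queue: (job type, work to do later).
Job = Tuple[str, Callable[[], None]]
queue: Deque[Job] = deque()
log: List[str] = []

def save_edit(title: str, links: List[str]) -> None:
    log.append(f"stored text of {title}")         # synchronous part
    log.append(f"recentchanges row for {title}")
    # Deferred part: secondary tables are refreshed by background runners.
    queue.append(("refreshLinks", lambda: log.append(f"pagelinks {title} -> {links}")))
    queue.append(("categories", lambda: log.append(f"categorylinks for {title}")))

def run_jobs(limit: int = 100) -> int:
    """Drain up to `limit` jobs, as a separate job runner would."""
    done = 0
    while queue and done < limit:
        _job_type, work = queue.popleft()
        work()
        done += 1
    return done
```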

doRefreshLinks() Job

function run() {
    // Job to update all (or a range of) backlink pages for a page
    $this->runForTitle( $this->title );
}

protected function runForTitle( Title $title ) {
    $revision = Revision::newFromTitle( $title, false, Revision::READ_LATEST );
    $parserOutput = $content->getParserOutput(
        $title, $revision->getId(), $parserOptions, false );
    $updates = $content->getSecondaryDataUpdates(
        $title, null, !empty( $this->params['useRecursiveLinksUpdate'] ), $parserOutput );
    InfoAction::invalidateCache( $title );

*links tables

● They store the page_id of the page where the resource is used, but the namespace and title of the resource it includes, not its id

● This is because pages reference a title, not an entity (e.g. you can include the template {{stub}}, or link to a page that may or may not exist)

● Some of those tables can grow a lot and store very redundant data (there could be millions of rows with the template {{cc-by-sa-2.5}} on Wikimedia Commons)
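Because links point at (namespace, title) rather than at a page_id, the red/blue status falls out of a LEFT JOIN against page. A sketch with reduced, hypothetical tables in SQLite; link_colours is an illustrative helper.

```python
import sqlite3

# Reduced, hypothetical tables: links store the target's (namespace, title).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER,
                   page_title TEXT, UNIQUE (page_namespace, page_title));
CREATE TABLE pagelinks (pl_from INTEGER, pl_namespace INTEGER, pl_title TEXT);
INSERT INTO page VALUES (1, 0, 'Amsterdam'), (2, 0, 'Netherlands');
INSERT INTO pagelinks VALUES (1, 0, 'Netherlands'), (1, 0, 'Percona_Live_2016');
""")

def link_colours(conn, page_id):
    """Blue if the target title exists as a page, red otherwise."""
    rows = conn.execute("""
        SELECT pl.pl_title, p.page_id IS NOT NULL
        FROM pagelinks AS pl
        LEFT JOIN page AS p
          ON p.page_namespace = pl.pl_namespace AND p.page_title = pl.pl_title
        WHERE pl.pl_from = ?""", (page_id,)).fetchall()
    return {title: "blue" if exists else "red" for title, exists in rows}
```

Creating the missing page turns the link blue without touching pagelinks; the price is that every link row repeats the full title text.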

Example: PageLinks Table

SHOW CREATE TABLE pagelinks\G
*************************** 1. row ***************************
       Table: pagelinks
Create Table: CREATE TABLE `pagelinks` (
  `pl_from` int(8) unsigned NOT NULL DEFAULT '0',
  `pl_namespace` int(11) NOT NULL DEFAULT '0',
  `pl_title` varbinary(255) NOT NULL DEFAULT '',
  `pl_from_namespace` int(11) NOT NULL DEFAULT '0',
  UNIQUE KEY `pl_from` (`pl_from`,`pl_namespace`,`pl_title`),
  KEY `pl_namespace` (`pl_namespace`,`pl_title`,`pl_from`),
  KEY `pl_backlinks_namespace` (`pl_from_namespace`,`pl_namespace`,`pl_title`,`pl_from`)
) ENGINE=InnoDB DEFAULT CHARSET=binary
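The two secondary indexes match the two shapes of the “What links here” query: all backlinks of a title, or only those coming from pages in a given namespace. Below is a SQLite transcription of the table (varbinary becomes TEXT) with invented rows; what_links_here is a hypothetical helper.

```python
import sqlite3

# SQLite transcription of the pagelinks definition above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pagelinks (
  pl_from INTEGER NOT NULL DEFAULT 0,
  pl_namespace INTEGER NOT NULL DEFAULT 0,
  pl_title TEXT NOT NULL DEFAULT '',
  pl_from_namespace INTEGER NOT NULL DEFAULT 0,
  UNIQUE (pl_from, pl_namespace, pl_title)
);
CREATE INDEX pl_namespace ON pagelinks (pl_namespace, pl_title, pl_from);
CREATE INDEX pl_backlinks_namespace
  ON pagelinks (pl_from_namespace, pl_namespace, pl_title, pl_from);
""")
conn.executemany("INSERT INTO pagelinks VALUES (?, ?, ?, ?)", [
    (1, 0, "Amsterdam", 0), (2, 0, "Amsterdam", 2),  # page 2 is a User page
    (3, 0, "Amsterdam", 0), (1, 0, "Netherlands", 0),
])

def what_links_here(conn, namespace, title, from_namespace=None):
    """Backlinks of (namespace, title). Without a source-namespace filter the
    pl_namespace index fits; with one, pl_backlinks_namespace does."""
    if from_namespace is None:
        sql = ("SELECT pl_from FROM pagelinks "
               "WHERE pl_namespace = ? AND pl_title = ? ORDER BY pl_from")
        args = (namespace, title)
    else:
        sql = ("SELECT pl_from FROM pagelinks WHERE pl_from_namespace = ? "
               "AND pl_namespace = ? AND pl_title = ? ORDER BY pl_from")
        args = (from_namespace, namespace, title)
    return [row[0] for row in conn.execute(sql, args)]
```

In both indexes pl_from comes last, so the backlinks are already sorted by source page and the lookup can be answered from the index alone.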

Future

● A lot of space could be saved by normalizing/deduplicating the title text:

mysql> SHOW TABLE STATUS like '%links';
+--------------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+
| Name               | Engine | Version | Row_format | Rows       | Avg_row_length | Data_length | Max_data_length | Index_length |
+--------------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+
| categorylinks      | InnoDB |      10 | Compact    |  106308892 |            210 | 22379020288 |               0 |  30838489088 |
| externallinks      | InnoDB |      10 | Compact    |   91491185 |            279 | 25571622912 |               0 |  40650997760 |
| imagelinks         | InnoDB |      10 | Compact    |   81270504 |             83 |  6765756416 |               0 |   9644818432 |
| iwlinks            | InnoDB |      10 | Compact    |   16909955 |             95 |  1622573056 |               0 |   2367488000 |
| langlinks          | InnoDB |      10 | Compact    |   26155404 |             74 |  1941733376 |               0 |   1434042368 |
| msg_resource_links | InnoDB |      10 | Compact    |       3524 |            130 |      458752 |               0 |            0 |
| pagelinks          | InnoDB |      10 | Compact    | 1192043172 |             74 | 89018269696 |               0 | 122491224064 |
| templatelinks      | InnoDB |      10 | Compact    |  616611334 |             80 | 49860050944 |               0 |  66098577408 |
+--------------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+

● That would make the title a first-class entity, different from the page entity
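A hypothetical sketch of that normalization in SQLite: each distinct (namespace, title) is stored once in a title table, and a links table keeps only an integer id. The table and column names are assumptions; namespace 10 is MediaWiki's Template namespace.

```python
import sqlite3

# Hypothetical normalized layout: titles stored once, links keep only ids.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE title (
  title_id INTEGER PRIMARY KEY,
  title_namespace INTEGER NOT NULL,
  title_text TEXT NOT NULL,
  UNIQUE (title_namespace, title_text)
);
CREATE TABLE templatelinks_normalized (
  tl_from INTEGER NOT NULL,
  tl_target_id INTEGER NOT NULL REFERENCES title (title_id),
  PRIMARY KEY (tl_from, tl_target_id)
);
""")

def title_id(conn, namespace, text):
    """Return the id for (namespace, text), creating the row on first use."""
    conn.execute("INSERT OR IGNORE INTO title (title_namespace, title_text) VALUES (?, ?)",
                 (namespace, text))
    (tid,) = conn.execute(
        "SELECT title_id FROM title WHERE title_namespace = ? AND title_text = ?",
        (namespace, text)).fetchone()
    return tid

def add_template_link(conn, from_page, namespace, text):
    conn.execute("INSERT OR IGNORE INTO templatelinks_normalized VALUES (?, ?)",
                 (from_page, title_id(conn, namespace, text)))

for page_id in range(1, 1001):  # 1,000 files using the same license template
    add_template_link(conn, page_id, 10, "Cc-by-sa-2.5")
```

A thousand pages using the same template produce one title row and a thousand narrow link rows, instead of a thousand copies of the title text.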

CASE #8: ANECDOTES: THE GHOST TABLES AND TIMESTAMPS

Story time

● A recently pooled slave broke its replication with:
Error 'Table 'idwiki.hitcounter' doesn't exist' on query. Default database: 'idwiki'. Query: 'DELETE FROM `idwiki`.`hitcounter`'

● The table indeed did not exist there, but it was deprecated and unused by MediaWiki

● Hackers? Replication bug? Someone doing maintenance out-of-band?

Brainstorming time

Architecture

● At the moment, Wikimedia servers mostly use STATEMENT-based replication due to several application dependencies

● Master-master active/passive replication was being used between datacenters

● Replication was temporarily routed through a slave to deploy new TLS certificates

● The middle slave was rebooted

hitcounter table was using the MEMORY engine

● When the server restarts, MEMORY tables are emptied; to avoid replication issues, the server writes a DELETE for such a table to its binary log, so that its replicas empty it too

● The DELETE got replicated to the remote master, back to the primary master, and finally to the new slave
– This broke replication because the new slave didn’t have the obsolete table, even though nothing wrote to it anymore

● Remember to clean up your tables on all servers to avoid issues like this!
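A simple way to act on that advice is to compare the table lists of every server in a replication topology. The sketch assumes the lists were already collected (for example with SHOW TABLES on each host); the server names are invented and table_drift is a hypothetical helper.

```python
from typing import Dict, Set

def table_drift(servers: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Return {table: servers that have it} for every table missing somewhere."""
    all_tables: Set[str] = set().union(*servers.values())
    drift: Dict[str, Set[str]] = {}
    for table in sorted(all_tables):
        holders = {name for name, tables in servers.items() if table in tables}
        if len(holders) < len(servers):
            drift[table] = holders
    return drift

# Invented topology loosely following the story: both masters still have the
# obsolete MEMORY table, the newly pooled slave does not.
servers = {
    "primary-master": {"page", "revision", "hitcounter"},
    "remote-master": {"page", "revision", "hitcounter"},
    "new-slave": {"page", "revision"},
}
drift = table_drift(servers)
```

Any table reported here is a candidate either for dropping everywhere or for creating on the servers that lack it, before a statement touching it gets replicated.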

Timestamps and MySQL < 4.1

● TIMESTAMP columns were automatically updated on INSERT and UPDATE, with no way of disabling that

● No strict mode disallowing zero dates

● Support for different databases and standards was needed: https://www.mediawiki.org/wiki/Manual:WfTimestamp#Formats

How to implement timestamps?

● They must store UTC times, controlled at application side

● They must be strictly sortable to be used on listings

● They must work on MySQL 4.0

● They must work similarly for other database backends

Brainstorming time

MediaWiki solution

● Timestamps are stored as binary(14) https://www.mediawiki.org/wiki/Manual:Timestamp

● For backwards and join compatibility, that is still the preferred format

● TIMESTAMP (4 bytes) now has all the required features and would be much more compact, but converting millions of records is not worth the effort right now
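The useful properties of this format can be checked quickly: fixed width with the most significant field first means byte order equals time order, and conversion is a single strftime/strptime. The helpers are hypothetical, and the 4-byte packing only illustrates the size of an epoch value; it is not MySQL's TIMESTAMP encoding (whose range ends in January 2038).

```python
import struct
from datetime import datetime, timezone

MW_FORMAT = "%Y%m%d%H%M%S"  # 14 characters, always UTC

def to_mw(dt: datetime) -> bytes:
    """Aware datetime -> binary(14) value such as b'20161002111252'."""
    return dt.astimezone(timezone.utc).strftime(MW_FORMAT).encode("ascii")

def from_mw(ts: bytes) -> datetime:
    return datetime.strptime(ts.decode("ascii"), MW_FORMAT).replace(tzinfo=timezone.utc)

def as_epoch_4_bytes(ts: bytes) -> bytes:
    """Same instant as a 4-byte unsigned epoch value (illustration only)."""
    return struct.pack(">I", int(from_mw(ts).timestamp()))

ts = to_mw(datetime(2016, 10, 2, 11, 12, 52, tzinfo=timezone.utc))
later = b"20161002111851"
```

Sorting the raw bytes of ts and later gives chronological order, which is what lets listings use plain ORDER BY on these columns in any database backend.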

tables.sql

```sql
-- The MySQL table backend for MediaWiki currently uses
-- 14-character BINARY or VARBINARY fields to store timestamps.
-- The format is YYYYMMDDHHMMSS, which is derived from the
-- text format of MySQL's TIMESTAMP fields.
--
-- Historically TIMESTAMP fields were used, but abandoned
-- in early 2002 after a lot of trouble with the fields
-- auto-updating.
--
-- The Postgres backend uses TIMESTAMPTZ fields for timestamps,
-- and we will migrate the MySQL definitions at some point as
-- well.

CREATE TABLE /*_*/logging (
  -- Log ID, for referring to this specific log entry, probably for deletion and such.
  log_id int unsigned NOT NULL PRIMARY KEY AUTO_INCREMENT,

  -- Symbolic keys for the general log type and the action type
  -- within the log. The output format will be controlled by the
  -- action field, but only the type controls categorization.
  log_type varbinary(32) NOT NULL default '',
  log_action varbinary(32) NOT NULL default '',

  -- Timestamp. Duh.
  log_timestamp binary(14) NOT NULL default '19700101000000'

  -- (remaining columns omitted on the slide)
);
```
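
The same property lets range queries on `log_timestamp` work as plain string comparisons. The sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL, with a cut-down version of the `logging` table above. The column types are mapped to SQLite `TEXT`, an assumption of this sketch: SQLite would give a column declared BINARY(14) numeric affinity and convert the digit strings to integers. The rows are made up.

```python
import sqlite3

# SQLite stand-in for the MySQL logging table above (cut down).
# MySQL BINARY(14)/VARBINARY(32) are mapped to TEXT here, which SQLite
# compares byte by byte by default, like MySQL compares binary strings.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE logging (
        log_id        INTEGER PRIMARY KEY AUTOINCREMENT,
        log_type      TEXT NOT NULL DEFAULT '',                -- MySQL: varbinary(32)
        log_action    TEXT NOT NULL DEFAULT '',                -- MySQL: varbinary(32)
        log_timestamp TEXT NOT NULL DEFAULT '19700101000000'   -- MySQL: binary(14)
    )
""")
# Index so that range scans on the timestamp can avoid a full table scan.
conn.execute("CREATE INDEX log_times ON logging (log_timestamp)")

# Made-up sample rows.
conn.executemany(
    "INSERT INTO logging (log_type, log_action, log_timestamp) VALUES (?, ?, ?)",
    [
        ("delete", "delete", "20161003093000"),
        ("block", "block", "20161002120000"),
        ("delete", "restore", "20161003235959"),
        ("move", "move", "20161004000001"),
    ],
)

# Every entry logged on 2016-10-03 (UTC): a half-open string range works
# because the format is fixed-width and most significant field first.
rows = conn.execute(
    """
    SELECT log_id, log_action, log_timestamp
    FROM logging
    WHERE log_timestamp >= ? AND log_timestamp < ?
    ORDER BY log_timestamp
    """,
    ("20161003000000", "20161004000000"),
).fetchall()
print(rows)
```

Using a half-open range (`>=` start, `<` next day) avoids having to spell out the last second of the day, and it relies only on ordinary string ordering, which every backend supports.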

CASE #9: SLOTS

Structured data for Wikipedia

● Wikitext is powerful but complex
  – It is not easy to make changes to elements such as categories or infoboxes (templates)
  – It is also not easy to offer them in a computer-readable way

● Tools like Wikidata or image metadata require an easier way to integrate structured data

How to implement Multi-Content Revisions?

● A page may combine multiple sources (Image & Image metadata, wikitext and discussions, etc.)

● A new revision could be created by editing regular text or some of the structured data

● The types of structured data may vary, and new ones could be added later with new extensions

Brainstorming time

Current Multi-Content Revision proposal

● page and revision will work as usual (except that a few of their fields will become redundant)

● 2 new tables will be added: content and content_revision

● content will initially hold the metadata for wikitext

● Other types of content can be referenced by a revision through content_revision
  – A revision can now handle multiple contents

The idea is to multiplex revision

● https://phabricator.wikimedia.org/T107595

● We go from the straightforward model:
  – page -> revision -> text ( -> external store )

● To a more indirect model:
  – page -> ( revision -> ) slots -> content -> ( text | external store )
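
A minimal sketch of the indirect model, again using Python's `sqlite3` as a stand-in for MySQL. The table and column names (`slots`, `slot_role`, `content_address`, and so on), the role names and the storage addresses are all hypothetical simplifications chosen for illustration; they are not the final Multi-Content Revision schema. The query goes from page straight to slots through the page's latest revision id, which is why revision appears in parentheses in the model above.

```python
import sqlite3

# Hypothetical, simplified tables for page -> (revision ->) slots -> content.
# Not the actual MediaWiki MCR schema; names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (
        page_id     INTEGER PRIMARY KEY,
        page_title  TEXT NOT NULL,
        page_latest INTEGER NOT NULL        -- rev_id of the current revision
    );
    CREATE TABLE revision (
        rev_id        INTEGER PRIMARY KEY,
        rev_page      INTEGER NOT NULL REFERENCES page (page_id),
        rev_timestamp TEXT NOT NULL         -- 14-character UTC timestamp
    );
    CREATE TABLE content (
        content_id      INTEGER PRIMARY KEY,
        content_model   TEXT NOT NULL,      -- e.g. 'wikitext', 'json'
        content_address TEXT NOT NULL       -- where the blob lives (text table or external store)
    );
    -- One row per (revision, role): a revision can now hold several contents.
    CREATE TABLE slots (
        slot_revision INTEGER NOT NULL REFERENCES revision (rev_id),
        slot_role     TEXT NOT NULL,        -- e.g. 'main', 'metadata'
        slot_content  INTEGER NOT NULL REFERENCES content (content_id),
        PRIMARY KEY (slot_revision, slot_role)
    );
""")

# Made-up data: two revisions of one page.
conn.executescript("""
    INSERT INTO content VALUES (1, 'wikitext', 'text/101'),
                               (2, 'json', 'external/cluster1/7'),
                               (3, 'json', 'external/cluster1/9');
    INSERT INTO revision VALUES (10, 1, '20161003090000'),
                                (11, 1, '20161003093000');
    -- Revision 10: wikitext plus structured metadata.
    INSERT INTO slots VALUES (10, 'main', 1), (10, 'metadata', 2);
    -- Revision 11 edits only the metadata; the unchanged wikitext row is reused.
    INSERT INTO slots VALUES (11, 'main', 1), (11, 'metadata', 3);
    INSERT INTO page VALUES (1, 'Example', 11);
""")


def current_slots(conn, title):
    """Return (role, model, address) for every slot of a page's latest revision."""
    return conn.execute("""
        SELECT s.slot_role, c.content_model, c.content_address
        FROM page p
        JOIN slots s   ON s.slot_revision = p.page_latest
        JOIN content c ON c.content_id = s.slot_content
        WHERE p.page_title = ?
        ORDER BY s.slot_role
    """, (title,)).fetchall()


print(current_slots(conn, "Example"))
```

In this sketch, revision 11 only changes the metadata, so its main slot points at the existing wikitext content row instead of copying it. It also makes the table-height concern concrete: slots gets one row per revision per role (4 rows here for 2 revisions), so range scans over a page's history touch more rows than they would on revision alone.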

The proposal is not issue-free

● The revision table is already very large for non-point SELECTs
  – How will ranges be handled when content becomes an even taller table (several content types per revision)?
  – Also, previously independent tables will now be integrated there (image_revisions, etc.)

● Maybe a different, multi-table implementation of the polymorphic association would be preferable?
  – Normalization and ease of coding vs. implementation performance for large wikis

FINAL REMARKS

“Do not let reality spoil your perfect on-paper design”

● The most elegant design is not always the most appropriate
  – Reliability is the enemy of performance: simple, fast, safe; choose two
  – Currently available technology does not normally cover 100% of the use cases; choose the one that covers 99% and implement the other 1% yourself

Where to know more about design

● Learn how others are doing it:
  – Uber:
    ● https://eng.uber.com/schemaless-part-one/
    ● https://www.percona.com/live/plam16/sessions/relational-databases-uber-mysql-postgres
  – Facebook:
    ● https://www.percona.com/live/plam16/sessions/massive-schema-changes-facebook
    ● https://www.facebook.com/MySQLatFacebook/
  – YouTube: https://www.percona.com/live/plam16/sessions/launching-vitess-how-run-youtubes-mysql-sharding-engine
  – Pinterest: https://engineering.pinterest.com/blog/sharding-pinterest-how-we-scaled-our-mysql-fleet

● Choose the right literature:
  – https://pragprog.com/book/bksqla/sql-antipatterns

Where to know more about Mediawiki / MySQL@Wikipedia

● MySQL at Wikipedia introduction: https://www.mediawiki.org/wiki/File:MySQL_at_Wikipedia.pdf

● MediaWiki source code and documentation: https://www.mediawiki.org/wiki/MediaWiki

● Wikitech (technical documentation): https://wikitech.wikimedia.org/

● Operations/puppet (infrastructure) git repository: https://phabricator.wikimedia.org/diffusion/OPUP/

Was this session helpful?

● Consider supporting the Wikimedia Foundation!
  – The Wikimedia Foundation, Inc. is a nonprofit charitable organization dedicated to encouraging the growth, development and distribution of free, multilingual, educational content, and to providing the full content of these wiki-based projects to the public free of charge.
  – https://wikimediafoundation.org

● You can contribute with:
  – Your code (including infrastructure!): https://phabricator.wikimedia.org/diffusion/
  – Your time: https://en.wikipedia.org/wiki/Help:Editing
  – Your money: https://donate.wikimedia.org/

Thank You! Remember to rate my session