Oracle Character sets
Aino Andriessen


Demo1

nls_length_semantics

Initialization parameter
CHAR or BYTE (default)
Applies to multibyte character sets
Defines the unit (characters or bytes) in which the length of character columns and variables is expressed

alter session set nls_length_semantics=CHAR;

Not retroactive: existing columns keep their length semantics
PL/SQL may need to be recompiled
Can also be set with alter system

nls_length_semantics 2

Specify the length of character columns and variables explicitly:

create table demo (naam varchar2(4 char));
create table demo (naam varchar2(4 byte));

t_naam varchar2(4 char);
t_naam demo2.naam%TYPE;
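
The difference between CHAR and BYTE length semantics can be illustrated outside the database. The Python sketch below is only an illustration, not Oracle code: `fits` is a hypothetical helper, and it assumes a UTF-8 database character set. It checks whether a value would fit in a column of length 4 under each semantics.

```python
def fits(value: str, limit: int, semantics: str = "BYTE", encoding: str = "utf-8") -> bool:
    """Hypothetical helper: would `value` fit in varchar2(limit <semantics>)?"""
    if semantics.upper() == "CHAR":
        return len(value) <= limit                   # count characters
    return len(value.encode(encoding)) <= limit      # count encoded bytes


name = "Ren\u00e9"  # "Rene" with e-acute: 4 characters, 5 bytes in UTF-8
print(len(name), len(name.encode("utf-8")))
print(fits(name, 4, "CHAR"))   # fits: 4 characters
print(fits(name, 4, "BYTE"))   # does not fit: e-acute takes 2 bytes in UTF-8
```

With a single-byte character set such as cp1252, both checks would give the same answer, which is why the parameter only matters for multibyte character sets.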

Demo2

Character encoding

Character set

A character set defines the mapping between the binary/hexadecimal code and the character
UTF8 WE8MSWIN1252 WE8ISO8859P1 JA16EUC US7ASCII WE8DEC ...

Code pages
IBM / Windows terminology
Roughly analogous to a character set
One code page per language

Character sets 2

ASCII: 1 byte, 128 characters, the standard English letters without accents

ISO 8859 and Latin-1: 1 byte (8 bits), 256 characters

CP-1252: Windows variant of Latin-1

UTF-8: variable width, multibyte, at most 4 bytes per character

~100,000 characters defined

• ~1 million available

Multilingual; ASCII codes are identical

Examples

Character set     Hexadecimal code (euro sign, €)
AL32UTF8          E282AC
WE8MSWIN1252      80
ASCII             -
WE8ISO8859P1      -
WE8ISO8859P15     A4 (164)

Character set     Hexadecimal code (é)
AL32UTF8          C3A9 (50089)
WE8MSWIN1252      E9 (233)
ASCII             -
WE8ISO8859P1      E9
WE8ISO8859P15     E9
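
The byte values for the euro sign and é can be recomputed with a few lines of Python. This is a sketch outside the database: the mapping from Oracle character set names to Python codec names is an assumption made for illustration, and `hex_code` is a hypothetical helper.

```python
# Assumed correspondence between Oracle character set names and Python codecs.
CODECS = {
    "AL32UTF8": "utf-8",
    "WE8MSWIN1252": "cp1252",
    "ASCII": "ascii",
    "WE8ISO8859P1": "latin-1",
    "WE8ISO8859P15": "iso-8859-15",
}


def hex_code(char: str, codec: str) -> str:
    """Return the encoded bytes as upper-case hex, or '-' if not representable."""
    try:
        return char.encode(codec).hex().upper()
    except UnicodeEncodeError:
        return "-"


for char in ("\u20ac", "\u00e9"):   # euro sign, e with acute accent
    for oracle_name, codec in CODECS.items():
        print(f"U+{ord(char):04X}  {oracle_name:<14} {hex_code(char, codec)}")
```

The euro sign is the character that separates Latin-1 from its successors: it is missing from ISO 8859-1 but present in both CP-1252 and ISO 8859-15, at different byte values.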

Unicode / UTF 8 example

The image shows the number of bytes needed to store different kinds of characters in the UTF-8 character set. The ASCII characters (C, t, and d) require one byte. The Latin and Greek characters (á, ö, and Ø) require 2 bytes. The Asian character requires 3 bytes. The supplementary character (treble clef sign) requires 4 bytes of storage.
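
The byte counts described for the image can be checked with a short Python sketch. The Asian character in the original image is not known, so U+8A9E is used here as an assumed stand-in; `utf8_length` is a hypothetical helper.

```python
def utf8_length(char: str) -> int:
    """Number of bytes needed to store one character in UTF-8."""
    return len(char.encode("utf-8"))


samples = {
    "ASCII letter C": "C",
    "Latin letter a-acute": "\u00e1",
    "Asian character (assumed example, U+8A9E)": "\u8a9e",
    "treble clef (supplementary, U+1D11E)": "\U0001d11e",
}

for label, char in samples.items():
    print(f"{label}: {utf8_length(char)} byte(s)")
```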

Diacritics and special characters

Diacritics are marks placed on (above, below, or even through) a letter to change its pronunciation, giving the modified letter a sound specific to the language. àÿęňĜş etc.

Special characters: ß æ ¿

Diacritics and special characters

Single-byte character sets: 1 byte for the composed character; not all combinations are possible; code pages

UTF-8: the diacritic has its own encoding; the composed character has its own encoding

• usually (always) a composition of the original character + diacritic
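
The two Unicode representations of an accented letter can be made visible with Python's standard `unicodedata` module. This is a sketch outside the database; it shows the composed form of é (one code point) next to the decomposed form (base letter plus combining accent).

```python
import unicodedata

composed = unicodedata.normalize("NFC", "e\u0301")    # single code point U+00E9
decomposed = unicodedata.normalize("NFD", "\u00e9")   # 'e' + combining acute U+0301

print(len(composed), composed.encode("utf-8").hex())       # 1 code point, bytes c3a9
print(len(decomposed), decomposed.encode("utf-8").hex())   # 2 code points, bytes 65cc81
print(composed == decomposed)   # the two forms compare unequal unless normalized
```

Both forms render identically on screen, which is one reason comparisons and length checks on accented text can give surprising results.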

Database functions

Character functions
substr - substrb - substrc - substr2
instr - ...
length - lengthb

chr(n)
Returns the character corresponding to the number passed in as the argument, in the database character set.
select chr(50089) from dual;

dump
Returns a VARCHAR2 value containing the datatype code, length in bytes, and internal representation of expr. The returned result is always in the database character set.
select dump(naam, 1017) from demo2;

convert
Converts a character string from one character set to another.

utl_raw
select utl_raw.cast_to_raw(naam) from demo2;

unistr()
Converts a text literal (with \xxxx Unicode escapes) to the national character set.
select (unistr('Ren\00e9')) from dual;
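
To see why chr(50089) yields é in a UTF-8 database, the same arithmetic can be done in Python. This is a rough analogy rather than Oracle behaviour in full: `chr_utf8` and `dump_hex` are hypothetical helpers, and a UTF-8 database character set is assumed.

```python
def chr_utf8(n: int) -> str:
    """Rough analogue of chr(n) in a UTF-8 database: n is the raw byte value."""
    size = max(1, (n.bit_length() + 7) // 8)
    return n.to_bytes(size, "big").decode("utf-8")


def dump_hex(value: str, encoding: str = "utf-8") -> str:
    """Show the stored bytes in hex, similar in spirit to dump or utl_raw.cast_to_raw."""
    return " ".join(f"{byte:02x}" for byte in value.encode(encoding))


name = "Ren\u00e9"                      # same text as unistr('Ren\00e9')
print(hex(50089))                       # 0xc3a9: the two UTF-8 bytes of e-acute
print(chr_utf8(50089) == "\u00e9")      # True
print(dump_hex(name))                   # 52 65 6e c3 a9
print(dump_hex(name, "cp1252"))         # 52 65 6e e9
```

The last two lines show the same name in two encodings, which is the kind of difference that dump makes visible when data looks wrong on screen.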

Demo3

nls_lang

Client character set

When the client NLS_LANG character set is set to the same value as the database character set, Oracle assumes that the data being sent or received is already in the same (correct) encoding, so for performance reasons no conversion or validation takes place. The data is simply stored as delivered by the client, bit by bit.
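
The consequence of this pass-through behaviour can be simulated in Python. The sketch below is an analogy, not Oracle code: it assumes a UTF-8 database and a Windows client that really produces cp1252 bytes while its NLS_LANG claims UTF-8.

```python
client_bytes = "Ren\u00e9".encode("cp1252")   # what the Windows client actually sends

# NLS_LANG claims UTF-8 and the database is UTF-8: no conversion, bytes stored as delivered.
stored = client_bytes

# A correctly configured UTF-8 client later reads the same bytes.
read_back = stored.decode("utf-8", errors="replace")
print(stored.hex(), ascii(read_back))         # the e-acute is lost on reading

# With the client character set declared correctly, conversion takes place.
converted = client_bytes.decode("cp1252").encode("utf-8")
print(converted.hex())                        # valid UTF-8: 52656ec3a9
```

The stored bytes in the first case are not valid UTF-8, yet the original client still displays them correctly, so the problem often goes unnoticed until another client reads the data.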

nls_lang 2

Format: language_country.character set
american_america.UTF8
dutch_the netherlands.WE8MSWIN1252
american_THE NETHERLANDS.WE8MSWIN1252

Set through an environment variable, NLS_LANG

The Windows GUI (WE8MSWIN1252) and the command line (WE8PC850) use different character sets

Not used by Java clients
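
The GUI versus command-line difference on Windows comes down to two code pages assigning different bytes to the same character. A small Python sketch, assuming that the codecs cp1252 and cp850 correspond to WE8MSWIN1252 and WE8PC850:

```python
text = "Ren\u00e9"

gui_bytes = text.encode("cp1252")       # Windows GUI code page
console_bytes = text.encode("cp850")    # Windows command-line code page
print(gui_bytes.hex(), console_bytes.hex())   # 52656ee9 versus 52656e82

# Bytes typed in the console but interpreted with the GUI code page:
misread = console_bytes.decode("cp1252")
print(ascii(misread), misread == text)
```

An NLS_LANG value that is right for a GUI tool is therefore wrong for a command-line tool on the same machine, and the other way around.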

Demo4

National character set

Support for another character set next to the database character set

e.g. to allow Japanese in a database with an MSWIN1252 or ISO8859 character set

Less necessary in a UTF8 database

Multibyte

nvarchar2, nclob etc.

Case

The TELETEX character set no longer exists in Oracle

select convert(naam,'TELETEX','UTF8') from tabel;

Locale builder

sql> select name from emp

sql> select utl_raw.cast_to_varchar2(utl_raw.cast_to_raw(name)) from emp@db

sql> select utl_raw.cast_to_varchar2(utl_raw.cast_to_raw@db(name)) from emp@db

sql> select name from emp@db

Question

Diacritic-insensitive searching

Case-insensitive searching
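
One common approach to both questions, sketched here in Python rather than in the database, is to compare normalized search keys: decompose each string, drop the combining marks, and fold case. `search_key` is a hypothetical helper. Within Oracle itself, the NLS_SORT and NLS_COMP session parameters (for example NLS_SORT=BINARY_AI with NLS_COMP=LINGUISTIC) are the usual starting point; that is offered as a pointer and is not taken from the slides.

```python
import unicodedata


def search_key(text: str) -> str:
    """Hypothetical helper: strip diacritics and fold case for comparison."""
    decomposed = unicodedata.normalize("NFD", text)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_marks.casefold()


names = ["Ren\u00e9", "RENE", "Renee", "Andr\u00e9"]
matches = [name for name in names if search_key(name) == search_key("rene")]
print(ascii(matches))   # the accented and the upper-case spelling both match
```

This only removes marks that Unicode treats as combining diacritics; special characters without a decomposition, such as ø or æ, are left unchanged and need separate handling.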

Summary

nls_length_semantics
Always explicitly define a character column with its type (CHAR or BYTE)

Oracle performs automatic character set conversion
WYSINAWYG (what you see is not always what you get)
Use a Java client

Working with character sets can be confusing

UTF8 is often the preferred character set
