1 NAAM Oracle Character sets Aino Andriessen. 2 Demo1.

Post on 31-Mar-2015


1

NAAM

Oracle Character Sets

Aino Andriessen

2

Demo1

4

nls_length_semantics

Initialization parameter

CHAR or BYTE (default)

Applies to multibyte character sets

Defines the unit in which the length of character columns and variables is expressed

alter session set nls_length_semantics=CHAR;

Not retroactive; PL/SQL may need to be recompiled

alter system

5

nls_length_semantics 2

Specify the length of character columns and variables explicitly

create table demo (naam varchar2(4 char));

create table demo (naam varchar2(4 byte));

t_naam varchar2(4 char);

t_naam demo2.naam%TYPE;
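The difference between the two declarations can be illustrated outside the database. The sketch below is a rough Python analogue, not Oracle behaviour itself: `fits` is a hypothetical helper, and it assumes a UTF-8 database character set and that one Python code point corresponds to one character under CHAR semantics.

```python
def fits(value: str, limit: int, semantics: str, encoding: str = "utf-8") -> bool:
    """Hypothetical check: would `value` fit in a column declared with `limit`
    CHAR or BYTE? Assumes one code point counts as one character."""
    semantics = semantics.upper()
    if semantics == "CHAR":
        return len(value) <= limit                   # count characters
    if semantics == "BYTE":
        return len(value.encode(encoding)) <= limit  # count encoded bytes
    raise ValueError("semantics must be CHAR or BYTE")


name = "Ren\u00e9"  # 4 characters, but 5 bytes in UTF-8

print(fits(name, 4, "CHAR"))  # the 4 characters fit
print(fits(name, 4, "BYTE"))  # the 5 bytes do not fit
```

With a single-byte encoding such as cp1252 both checks give the same answer, which is why the parameter only matters for multibyte character sets.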

6

Demo2

8

Character encoding

9

Character set

A character set defines the mapping between the binary/hexadecimal code and the character

UTF8, WE8MSWIN1252, WE8ISO8859P1, JA16EUC, US7ASCII, WE8DEC, ...

Code pages

IBM / Windows terminology, roughly analogous to a character set

One code page per language

10

Character sets 2

ASCII: 1 byte, 128 characters, the standard English letters without accents

ISO 8859 and Latin-1: 1 byte (8 bit), 256 characters

CP-1252: Windows variant of Latin-1

UTF8: variable length, multibyte, max 4 bytes, ~100,000 characters

• ~1 million available; multilingual; the ASCII codes are identical

11

Voorbeelden

Character set    Hexadecimal code - Euro
AL32UTF8         E282AC
WE8MSWIN1252     80
ASCII            -
WE8ISO8859P1     -
WE8ISO8859P15    A4 (164)

Character set    Hexadecimal code - é
AL32UTF8         C3A9 (50089)
WE8MSWIN1252     E9 (233)
ASCII            -
WE8ISO8859P1     E9
WE8ISO8859P15    E9
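The codes above can be checked with a few lines of Python. This is a sketch only: the mapping from Oracle character set names to Python codec names is an assumption made for this illustration, and `hex_code` is a hypothetical helper.

```python
# Assumed mapping of Oracle character set names to Python codec names.
CHARSETS = {
    "AL32UTF8": "utf-8",
    "WE8MSWIN1252": "cp1252",
    "ASCII": "ascii",
    "WE8ISO8859P1": "iso-8859-1",
    "WE8ISO8859P15": "iso-8859-15",
}


def hex_code(char: str, charset: str) -> str:
    """Return the hexadecimal code of `char`, or '-' if the set lacks it."""
    try:
        return char.encode(CHARSETS[charset]).hex().upper()
    except UnicodeEncodeError:
        return "-"


for char in ("\u20ac", "\u00e9"):  # Euro sign, e with acute accent
    for charset in CHARSETS:
        print(char, charset, hex_code(char, charset))
```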

12

Unicode / UTF 8 example

The image shows the number of bytes needed to store different kinds of characters in the UTF-8 character set. The ASCII characters (C, t, and d) require one byte. The Latin and Greek characters (á, ö, and Ø) require 2 bytes. The Asian character requires 3 bytes. The supplementary character (treble clef sign) requires 4 bytes of storage.
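The byte counts described for the image can be reproduced with a small Python sketch. The Asian character used here (U+8A9E) is an arbitrary stand-in chosen for this example, since the one in the original image is not known; `utf8_length` is a hypothetical helper.

```python
def utf8_length(char: str) -> int:
    """Number of bytes needed to store `char` in UTF-8."""
    return len(char.encode("utf-8"))


samples = {
    "ASCII letter C": "C",
    "Latin a with acute": "\u00e1",
    "Latin O with stroke": "\u00d8",
    "Asian character (stand-in)": "\u8a9e",
    "Treble clef (supplementary)": "\U0001d11e",
}

for label, char in samples.items():
    print(f"{label}: {utf8_length(char)} byte(s)")
```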

13

Diacritics and special characters

Diacritics are accent marks placed on (above, below or even through) a letter to change its pronunciation, giving the (modified) letter the sounds specific to a language. àÿęňĜş etc.

Special characters: ßæ¿

14

Diacritics and special characters

Single-byte character sets: 1 byte for a composed character; not all combinations are possible; code pages

UTF-8: a diacritic has its own encoding; a composed character has its own encoding

• usually (always) a composition of the original character + diacritic
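The two representations of an accented letter can be made visible with Python's standard `unicodedata` module. This is an illustrative sketch of Unicode normalization, not of anything Oracle does internally.

```python
import unicodedata

precomposed = "\u00e9"  # e with acute accent as a single code point
decomposed = unicodedata.normalize("NFD", precomposed)  # base letter + diacritic

print([hex(ord(ch)) for ch in precomposed])  # one code point
print([hex(ord(ch)) for ch in decomposed])   # 'e' followed by U+0301

# Both forms look the same on screen but are stored as different bytes.
print(precomposed.encode("utf-8").hex())
print(decomposed.encode("utf-8").hex())

# NFC recombines the pair into the single precomposed code point.
recomposed = unicodedata.normalize("NFC", decomposed)
```

A single-byte character set has room only for the precomposed form, which is why it cannot offer every letter and diacritic combination.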

15

Database functies

Character functions: substr - substrb - substrc - substr2, instr - ..., length - lengthb

chr(n): Returns the character corresponding to the number passed in as the argument, in the database character set.

select chr(50089) from dual;

dump: Returns a VARCHAR2 value containing the datatype code, length in bytes, and internal representation of expr. The returned result is always in the database character set.

select dump(naam, 1017) from demo2;

convert: Converts a character string from one character set to another.

utl_raw

select utl_raw.cast_to_raw(naam) from demo2;

unistr(): Converts the characters in its argument to the national language character set.

select unistr('Ren\00e9') from dual;
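To get a feel for what these calls return, the sketch below gives rough Python analogues. All three helper names are hypothetical, a UTF-8 database character set is assumed, and the helpers only imitate the idea behind each function rather than its exact output format.

```python
import re


def chr_utf8(n: int) -> str:
    """Rough analogue of chr(n) in a UTF-8 database: n is read as a byte sequence."""
    size = max(1, (n.bit_length() + 7) // 8)
    return n.to_bytes(size, "big").decode("utf-8")


def hex_bytes(value: str, encoding: str = "utf-8") -> list:
    """Rough analogue of inspecting a value with dump or utl_raw.cast_to_raw."""
    return [f"{b:02x}" for b in value.encode(encoding)]


def unistr_like(text: str) -> str:
    """Rough analogue of unistr: replace each \\XXXX escape by that code point."""
    return re.sub(r"\\([0-9A-Fa-f]{4})", lambda m: chr(int(m.group(1), 16)), text)


print(chr_utf8(50089))              # 50089 is hex C3A9, the UTF-8 bytes of the accented e
print(hex_bytes("Ren\u00e9"))       # five bytes for four characters
print(unistr_like(r"Ren\00e9"))
```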

16

Demo3

18

nls_lang

Client character set

When the client NLS_LANG character set is set to the same value as the database character set, Oracle assumes that the data being sent or received is already in that (correct) encoding and, for performance reasons, performs no conversion or validation. The data is simply stored as delivered by the client, bit by bit.
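The consequence of this pass-through can be simulated in Python. The sketch assumes a hypothetical setup: a WE8MSWIN1252 database, a client whose NLS_LANG also says WE8MSWIN1252, but whose terminal actually produces code page 850 bytes. The function names are invented for this illustration.

```python
def store_as_delivered(text: str, actual_client_encoding: str) -> bytes:
    """The client sends its bytes; with matching settings nothing is converted."""
    return text.encode(actual_client_encoding)


def read_back(stored: bytes, declared_encoding: str) -> str:
    """A correctly configured client interprets the stored bytes as declared."""
    return stored.decode(declared_encoding)


stored = store_as_delivered("\u00e9", "cp850")  # terminal really uses code page 850
shown = read_back(stored, "cp1252")             # database content is read as cp1252

print(stored.hex())  # the byte that ended up in the table
print(shown)         # no longer the accented e that was typed
```

The original client still sees its own data correctly, because it decodes the byte with the same wrong assumption; the damage only shows up for other clients.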

19

nls_lang 2

Format: language_country.character set

american_america.UTF8

dutch_the netherlands.WE8MSWIN1252

american_THE NETHERLANDS.WE8MSWIN1252

Environment variable: NLS_LANG

The Windows GUI (WE8MSWIN1252) and the command line (WE8PC850) differ

Not used by Java clients
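The three-part format can be taken apart as follows. This is a minimal sketch: `parse_nls_lang` and `NlsLang` are hypothetical names, and the helper assumes that all three parts are present in the value.

```python
from typing import NamedTuple


class NlsLang(NamedTuple):
    language: str
    country: str
    charset: str


def parse_nls_lang(value: str) -> NlsLang:
    """Split 'language_country.charset'; the country part may contain spaces."""
    if "." not in value or "_" not in value:
        raise ValueError("expected language_country.charset")
    locale, _, charset = value.rpartition(".")
    language, _, country = locale.partition("_")
    return NlsLang(language, country, charset)


for setting in (
    "american_america.UTF8",
    "dutch_the netherlands.WE8MSWIN1252",
    "american_THE NETHERLANDS.WE8MSWIN1252",
):
    print(parse_nls_lang(setting))
```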

20

Demo4

22

National character set

Support for another character set next to the database character set

e.g. to allow Japanese in a database with a MSWIN1252 or ISO8859 character set

Less necessary in a UTF8 database

Multibyte

nvarchar, nclob etc.

23

Case

The TELETEX character set no longer exists in Oracle

select convert(naam,'TELETEX','UTF8') from tabel;

Locale builder

25

sql> select name from emp

sql> select utl_raw.cast_to_varchar2 (utl_raw.cast_to_raw (name)) from emp@db

sql> select utl_raw.cast_to_varchar2 (utl_raw.cast_to_raw@db (name)) from emp@db

sql> select name from emp@db

26

Question

Diacritic-insensitive search

Case-insensitive search
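One way to think about both kinds of search is to compare normalized keys instead of the raw strings. The sketch below shows the idea in Python, outside the database; `search_key` is a hypothetical helper, and the approach (decompose, drop combining marks, case-fold) is an assumption of this example rather than what the slides propose.

```python
import unicodedata


def search_key(text: str) -> str:
    """Build a comparison key that ignores diacritics and letter case."""
    decomposed = unicodedata.normalize("NFD", text)  # split letter and diacritic
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_marks.casefold()                  # ignore upper/lower case


names = ["Ren\u00e9", "RENE", "rene", "Renate"]
matches = [n for n in names if search_key(n) == search_key("ren\u00e9")]
print(matches)
```

Special characters that are not letter plus diacritic combinations, such as æ, are not simplified by this approach.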

27

Summary

nls_length_semantics

Always explicitly define a character column with its type (CHAR or BYTE)

Oracle performs automatic character set conversion

wysinawyg

Use a Java client

Working with character sets can be confusing

UTF8 is often the preferred character set

28

References

Unicode and UltraEdit: http://www.ultraedit.com/support/tutorials_power_tips/ultraedit/unicode.html

nls_lang: http://www.oracle.com/technology/tech/globalization/htdocs/nls_lang%20faq.htm

Oracle globalization support: http://download.oracle.com/docs/cd/B28359_01/server.111/b28298/toc.htm

Wikipedia