Diacritical marks

Global Sourcebook for International Data Management

Tips for practical management of data in different languages

Whether you hold your address data in DOS, Windows™ or on another platform, and whether your data covers a single country or many, you are certain to face the challenge of storing diacritical marks (accents).

Most of the World’s languages (including English) use one or more diacritical marks. These marks indicate that a letter has a different phonetic value, or is stressed differently, from the same letter without the mark. Though the marks may appear strange (and possibly irrelevant) to non-speakers of the language concerned, they are required to keep the text accurate and unambiguous.

Early personal computer operating systems such as DOS used a set of character codes based on ASCII (American Standard Code for Information Interchange). A character can be typed by holding down the ALT key and entering its 1-, 2- or 3-digit code on the numeric keypad. Strictly, ASCII itself defines only 128 characters (codes 0 to 127); the extended sets used on PCs, referred to as ASCII codes throughout this guide, add a further 128 codes, giving 256 in all. Limited by the hardware and software of the time to these 256 characters, the system could in no way represent every character used in the World’s more than 6 000 languages. To overcome this limitation, a number of code pages were created, each intended to contain enough letters with diacritical marks to represent the alphabet of at least one language.

The most common code page used by English-speakers is 437. This contains the following letters with diacritical marks:

   á, â, Ä, ä, à, Å, å, Æ, æ, Ç, ç, ê, ë, è, É, é, ï, î, ì, í, ñ, Ñ, ô, Ö, ö, ò, ó, ß, Ü, ü, û, ù, ú, ÿ

This is sufficient for a number of Western European and Scandinavian languages, but there are serious omissions. Some languages in the same group, such as Icelandic, are not fully represented. Certain letters in Scandinavian languages, such as the “Ø” of Danish and Norwegian, are missing. Even for French, the letter “Œ” cannot be reproduced. Equally, the upper-case equivalents of a number of lower-case accented letters (for example Á, Ê and Ó) cannot be reproduced using this code page.

Code page 850 keeps these letters and adds the following letters with diacritical marks:

   Á, Â, À, ã, Ã, ð, Ð, Ê, Ë, È, Í, Î, Ï, Ì, Ó, Ô, Ò, õ, Õ, ø, Ø, Þ, þ, Ú, Û, Ù, ý, Ý

Further code pages are 852 (Slavic languages/Hungarian); 860 (Portuguese); and 865 (Scandinavian).

Whilst this system mostly allows the data of a single country or language area to be stored, problems arise when data from more than one country or language area must be stored together. Code page 850, for example, will allow the storage of (most) Western European addresses, but it cannot correctly store addresses from Eastern Europe, even for languages written in the Latin script. For Hungarian, code page 850 lacks ő, Ő, ű and Ű. For Czech, the following letters with diacritical marks are missing: Č, č, Ď, ď, Ě, ě, Ň, ň, Ř, ř, Š, š, Ť, ť, Ů, ů, Ž and ž. Clearly, no accurate representation of these languages can succeed without these characters.

On the DOS/Windows™ platform, a stored character keeps the same code value when the code page changes, but the symbol displayed for that value may differ. Of the Icelandic street type HLÍÐ, for example, only the HL will appear correctly under code page 437, even though the codes of its letters are unchanged. Changing the code page to 850 allows the word to be displayed correctly.
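A quick way to see this effect, assuming Python 3 is available, is to use its built-in cp850 and cp437 codecs. The sketch encodes HLÍÐ as a code page 850 system would store it, then displays the same bytes under each code page.

```python
# Same stored bytes, displayed under two DOS code pages (Python's built-in codecs).
word = "HLÍÐ"                    # Icelandic street type
stored = word.encode("cp850")    # byte values as written by a code page 850 system

print(list(stored))              # [72, 76, 214, 209]
print(stored.decode("cp850"))    # HLÍÐ : displayed correctly
print(stored.decode("cp437"))    # HL╓╤ : same bytes, box-drawing symbols instead
```

The byte values never change; only their interpretation does, which is why the same file can look correct on one machine and garbled on another.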

Changing the code page is done differently on different operating platforms; refer to your documentation. On DOS/Windows™ systems, the code page for screen output is set in the autoexec.bat file by adding the lines:

   MODE CON CODEPAGE PREPARE=((nnn) [drive:][path]filename)
   MODE CON CODEPAGE SELECT=nnn

where nnn is the code page number and [drive:][path]filename is the file containing the code page information (with the extension .CPI). The PREPARE step also requires the DISPLAY.SYS driver to be loaded in config.sys. For example, for code page 850, the lines might read:

   MODE CON CODEPAGE PREPARE=((850) C:\WINDOWS\COMMAND\EGA.CPI)
   MODE CON CODEPAGE SELECT=850

To check which code page is active, type

   MODE

at the DOS prompt. The CHCP command also displays the active code page.

Microsoft Windows™ uses so-called ANSI code pages (named after the American National Standards Institute), such as 1252 for Western European and 1250 for Central European languages. Each ANSI code page, used with a matching font, can reproduce 224 different characters. In word-processing programs, most of the characters required for the World’s languages can therefore be reproduced by changing the font within the document. Few database programs, however, allow fonts to be changed within fields or tables; such programs use by default the ANSI code page defined by the underlying Windows™ operating system.

Whilst ANSI allows letters with diacritical marks to be output (assuming a compliant printer and font), the major problem of transferring data between platforms and software remains. Transferring data between one ASCII code page and another, between one ANSI code page and another, or between ASCII and ANSI code pages is likely to change the appearance (and, in the last case, the value) of the letter you are trying to reproduce. This can cause immense damage to the data and make correct storage and output impossible.
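The sketch below, again using Python’s built-in codecs (cp850 for DOS, cp1252 for Western European Windows), illustrates both the damage and the remedy: bytes must be decoded with the code page they were written in before being re-encoded for the target system. It is an illustration under those assumptions, not a migration tool.

```python
# Moving text between code pages, with and without conversion.
text = "café"
dos_bytes = text.encode("cp850")      # é stored as 130 (0x82) by a DOS system

# Wrong: DOS bytes read by a Windows program as if they were code page 1252.
garbled = dos_bytes.decode("cp1252")  # 'caf‚' : 130 is a low quotation mark there

# Right: decode with the code page the data was written in, then re-encode.
ansi_bytes = dos_bytes.decode("cp850").encode("cp1252")  # é becomes 233 (0xE9)
utf8_bytes = dos_bytes.decode("cp850").encode("utf-8")   # é becomes 0xC3 0xA9

# A letter missing from the target code page cannot survive the conversion.
lossy = "Győr".encode("cp850", errors="replace")         # b'Gy?r'
```

The last line shows the limit of any conversion between code pages: a letter that does not exist in the target code page can only be replaced or dropped.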

An alternative method of storing character values is The Unicode™ Standard. Instead of fitting characters into 256 positions, Unicode gives every character of every script it covers a single unique number, its code point, which is the same regardless of hardware, platform, software or code page. Code points are conventionally written as at least four hexadecimal digits, for example U+00C1 for Á. Where data must pass through systems that cannot store Unicode directly, the code is sometimes written out in plain alpha-numeric characters and enclosed between other symbols (e.g. <00C1>) to distinguish it from the rest of the text. Within files and databases, code points are stored using an encoding such as UTF-8 or UTF-16. Unicode support is increasingly being included in new operating systems and software. Here are a number of Unicode™ values:

Á	00C1
á	00E1
Ã	00C3
É	00C9
Ñ	00D1
Ò	00D2
Ð	00D0
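As a sketch (Python 3.8 or later assumed, for bytes.hex with a separator), the following prints the U+ notation, Unicode name and UTF-8 bytes of several of the letters listed above. The helper decode_escapes is hypothetical: it assumes that escapes in the <00C1> style consist of exactly four hexadecimal digits between angle brackets, and that such text never occurs in the data for any other reason.

```python
import re
import unicodedata
from typing import Tuple

def describe(letter: str) -> Tuple[str, str, str]:
    """U+ notation, Unicode name and UTF-8 bytes of a single character."""
    code_point = f"U+{ord(letter):04X}"
    utf8 = letter.encode("utf-8").hex(" ")  # sep argument needs Python 3.8+
    return code_point, unicodedata.name(letter), utf8

def decode_escapes(text: str) -> str:
    """Replace escapes such as <00C1> with the characters they stand for."""
    return re.sub(r"<([0-9A-Fa-f]{4})>", lambda m: chr(int(m.group(1), 16)), text)

for letter in "ÁáÃÉÑÒ":
    print(letter, *describe(letter))

print(decode_escapes("<00C1>lvaro"))  # Álvaro
```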

Unfortunately, I can only draw your attention to the issue of diacritical marks without providing a solution for reproducing these marks for all languages within a single database system. To the best of my knowledge, an all-encompassing solution does not yet exist. Aim, however, to use operating systems and software with Unicode™ support, so that in the near future you will be able to handle diacritical marks better than is currently possible.

Accents

The tables below cover a large number of European languages and some other languages written in Latin script.

The tables show the letters with diacritical marks used in the language concerned, a plain-ASCII alternative without diacritical marks where one is available, the ASCII code of the accented letter (if available) in code pages 437 and 850, and the Unicode™ value.
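Where Python is available, the code page and Unicode columns can be checked with a short sketch like the one below, which relies on the standard cp437 and cp850 codecs; the function names code_in and table_row are illustrative. It handles letters that are a single Unicode code point only; sequences such as M with a combining circumflex need separate treatment.

```python
from typing import Tuple, Union

Code = Union[int, str]

def code_in(letter: str, codepage: str) -> Code:
    """Byte value of a letter in a DOS code page, or 'na' if the page lacks it."""
    try:
        return letter.encode(codepage)[0]
    except UnicodeEncodeError:
        return "na"

def table_row(letter: str) -> Tuple[Code, Code, str]:
    """Page 437, page 850 and Unicode columns for a single-code-point letter."""
    return code_in(letter, "cp437"), code_in(letter, "cp850"), f"{ord(letter):04X}"

for letter in "éÉØőÿ":
    print(letter, *table_row(letter))
# é 130 130 00E9
# É 144 144 00C9
# Ø na 157 00D8
# ő na na 0151
# ÿ 152 152 00FF
```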

Whilst it is not always possible to type the correctly accented letter using ASCII, accented letters can be output on most printers by means of other ASCII codes. For example, the code that would normally produce a + symbol can, with an alternative font set, be printed as an accented character. Doing this, however, means that you must maintain a consistent platform for your output devices, and you will need a different printing set for each language, as no single set can cover all accents.

NB: Alphabetization. In the alphabetization lists, letters in brackets immediately following another letter are included in the sort for that letter, i.e. they are given the same value as the preceding letter for sorting purposes. Letters in brackets on their own are borrowed from other languages and are sorted, when found, in the position shown. When two “letters” are shown in upper case, they form a single letter in the language concerned. The alphabetization lists include only upper-case letters; when a lower-case form differs markedly in appearance from its upper-case equivalent, it is printed immediately next to the upper-case version.
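The bracket convention maps naturally onto a sort key: each group of letters sharing a sorting value gets one rank, and words are compared rank by rank. The Python sketch below applies this to the Danish list given later in this guide; the names DANISH and make_sort_key are illustrative, not part of any library. Multi-letter units such as AA (or CS in Hungarian) are matched before single letters. Letters of equal value tie, and Python’s sort then keeps their original order; full collation also has tie-breaking rules for accents and case, which this sketch does not attempt.

```python
import unicodedata
from typing import Callable, Dict, List

# Danish sorting values from the Danish list: each inner list is one value;
# bracketed letters (Á, É, ..., AA) share the preceding letter's value.
DANISH: List[List[str]] = [
    ["A", "Á"], ["B"], ["C"], ["D"], ["E", "É"], ["F"], ["G"], ["H"],
    ["I", "Í"], ["J"], ["K"], ["L"], ["M"], ["N"], ["O", "Ó"], ["P"],
    ["Q"], ["R"], ["S"], ["T"], ["U", "Ú"], ["V"], ["W"], ["X"],
    ["Y", "Ý"], ["Z"], ["Æ"], ["Ø"], ["Å", "AA"],
]

def make_sort_key(groups: List[List[str]]) -> Callable[[str], List[int]]:
    rank: Dict[str, int] = {unit: pos for pos, group in enumerate(groups) for unit in group}
    units = sorted(rank, key=len, reverse=True)  # longest first, so "AA" beats "A"

    def key(word: str) -> List[int]:
        text = unicodedata.normalize("NFC", word).upper()
        ranks: List[int] = []
        i = 0
        while i < len(text):
            for unit in units:
                if text.startswith(unit, i):
                    ranks.append(rank[unit])
                    i += len(unit)
                    break
            else:
                ranks.append(-1)  # space, hyphen, digit or unlisted letter: sorts first
                i += 1
        return ranks

    return key

danish_key = make_sort_key(DANISH)
towns = ["Århus", "Odense", "Aalborg", "Ærø", "Vejle"]
print(sorted(towns, key=danish_key))  # ['Odense', 'Vejle', 'Ærø', 'Aalborg', 'Århus']
```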

Albanian

Albanian is spoken by about 4 million people in Albania, Kosovo, Macedonia, Serbia, Montenegro, Bulgaria, Romania and Italy.

Letter	Alternative	ASCII code (page 437)	ASCII code (page 850)	Unicode™
â		131	131	00E2
Â		na*	182	00C2
ç		135	135	00E7
Ç		128	128	00C7
ë		137	137	00EB
Ë		na	211	00CB

*na = not available as an ASCII code in the specified code page

Albanian is alphabetized as follows:

   A B C Ç D DH E Ë F G GJ H I J K L LL M N NJ O P Q R RR S SH T TH U V (W) X XH Y Z ZH

Basque

Basque is spoken by some 700 000 people in the Spanish regions of País Vasco and Navarra, and in south-western France.

Letter	Alternative	ASCII code (page 437)	ASCII code (page 850)	Unicode™
ç		135	135	00E7
Ç		128	128	00C7
é		130	130	00E9
É		144	144	00C9
ñ		164	164	00F1
Ñ		165	165	00D1
ü		129	129	00FC
Ü		154	154	00DC

Basque is alphabetized as follows:

   A B C (Ç) D E F G H I J K L M N Ñ O P Q R S T U (Ü) V W X Y Z

Breton

Breton, a Celtic language, is spoken in the westernmost parts of Brittany, France, by some 600 000 people.

Apostrophes are used within words in Breton.

Letter	Alternative	ASCII code (page 437)	ASCII code (page 850)	Unicode™
â		131	131	00E2
Â		na*	182	00C2
ê		136	136	00EA
Ê		na	210	00CA
ñ		164	164	00F1
Ñ		165	165	00D1
ù		151	151	00F9
Ù		na	235	00D9
ü		129	129	00FC
Ü		154	154	00DC

*na = not available as an ASCII code in the specified code page

Breton is alphabetized as follows:

   A (Â) B (C) Ch C’h D E (Ê) F G H I J K L M N Ñ O P (Q) R S T U (Ù,Ü) V W (X) Y Z

Catalan

There are about 10 540 000 speakers of Catalan - in Spain it is spoken by some 5 980 000 people in Catalonia, 3 350 000 people in Valencia (where it is called Valencian and is considered by some a distinct language), 755 000 people in the Balearic Islands, 48 000 people in the eastern part of Aragon and 2 000 people in Murcia. It is the national language of Andorra and is spoken there by 38 000 people. There are 330 000 speakers in Roussillon in south-eastern France and 37 000 speakers in the town of Alghero (L’Alguer) on Sardinia, Italy.

Letter	Alternative	ASCII code (page 437)	ASCII code (page 850)	Unicode™
à		133	133	00E0
À		na*	183	00C0
ç		135	135	00E7
Ç		128	128	00C7
é		130	130	00E9
É		144	144	00C9
è		138	138	00E8
È		na	212	00C8
í		161	161	00ED
Í		na	214	00CD
ï		139	139	00EF
Ï		na	216	00CF
ŀ		na	na	0140
Ŀ		na	na	013F
ó		162	162	00F3
Ó		na	224	00D3
ò		149	149	00F2
Ò		na	227	00D2
ú		163	163	00FA
Ú		na	233	00DA
ü		129	129	00FC
Ü		154	154	00DC

*na = not available as an ASCII code in the specified code page

In Catalan, abbreviations of plurals are formed by doubling the letters. For example, the Catalan abbreviation for the United States is EE.UU. The full stops must be added after each doubled pair of letters, never between them.

Catalan is alphabetized as follows:

   A (À) B C (Ç) D E (É, È) F G H I (Í, Ï) J K L Ŀl M N O (Ó, Ò) P Q R S T U (Ú, Ü) V W X Y Z

In Catalan, a middle dot (“·”) may appear within words between two ells to indicate a difference in pronunciation: “ll” represents a single sound, as in Spanish, whereas “l·l” represents a doubled l sound, closer to the English l. Examples are “Paral·lel” and “Col·legi”. In databases this dot may be found as a full stop (“.”) or a hyphen (“-”). It should not be removed.

Croatian

Serbian and Croatian are essentially the same language, but Serbian is usually written in Cyrillic script and Croatian in Latin script. Croatian is spoken by about 6 million people in Croatia, Slovenia, Bosnia and Herzegovina, and Serbia.

Letter	Alternative	ASCII code (page 437)	ASCII code (page 850)	Unicode™
č	cc	na*	na*	010D
Č	Cc	na	na	010C
ć	ch	na	na	0107
Ć	Ch	na	na	0106
đ	dj	na	na	0111
Đ	Dj	na	na	0110
š	sh	na	na	0161
Š	Sh	na	na	0160
ž	zh	na	na	017E
Ž	Zh	na	na	017D

*na = not available as an ASCII code in the specified code page

Croatian is alphabetized as follows:

   A B C Č Ć D DŽ Đ E F G H I J K L LJ M N NJ O P Q R S Š T U V (W) (X) (Y) Z Ž

Czech

Czech is spoken by about 10 million people in Czechia (the Czech Republic).

Letter	Alternative	ASCII code (page 437)	ASCII code (page 850)	Unicode™
á		160	160	00E1
Á		na*	181	00C1
č		na	na*	010D
Č		na	na	010C
ď		na	na	010F
Ď		na	na	010E
é		130	130	00E9
É		144	144	00C9
ě		na	na	011B
Ě		na	na	011A
í		161	161	00ED
Í		na	214	00CD
ň		na	na	0148
Ň		na	na	0147
ó		162	162	00F3
Ó		na	224	00D3
ř		na	na	0159
Ř		na	na	0158
š		na	na	0161
Š		na	na	0160
ť		na	na	0165
Ť		na	na	0164
ú		163	163	00FA
Ú		na	233	00DA
ů		na	na	016F
Ů		na	na	016E
ý		na	236	00FD
Ý		na	237	00DD
ž		na	na	017E
Ž		na	na	017D

*na = not available as an ASCII code in the specified code page

Czech is alphabetized as follows:

   A Á B C Č D Ď E É Ě F G H CH I Í J K L M N Ň O Ó P Q R Ř S Š T Ťť U Ú Ů V W X Y Ý Z Ž

Danish

Danish is spoken by 5 million people in Denmark, as well as some inhabitants of the Faeroe Islands / Faroe Islands, Greenland and northern Germany.

Letter	Alternative	ASCII code (page 437)	ASCII code (page 850)	Unicode™
á		160	160	00E1
Á		na*	181	00C1
å	aa	134	134	00E5
Å	AA	143	143	00C5
æ	ae	145	145	00E6
Æ	AE	146	146	00C6
é		130	130	00E9
É		144	144	00C9
í		161	161	00ED
Í		na	214	00CD
ó		162	162	00F3
Ó		na	224	00D3
ø	oe	na	155	00F8
Ø	OE	na	157	00D8
ú		163	163	00FA
Ú		na	233	00DA
ý		na	236	00FD
Ý		na	237	00DD

*na = not available as an ASCII code in the specified code page

Note: aa was officially replaced by å in 1948 in all words, but aa remains allowed in place names such as Aalborg and in personal names.

Danish is alphabetized as follows:

   A (Á) B C D E (É) F G H I (Í) J K L M N O (Ó) P Q R S T U (Ú) V W X Y (Ý) Z Æ Ø Å (AA)

Dutch

Dutch is spoken by about 14 million people in the Netherlands and about 5 million Belgians. There is a small Dutch-speaking minority in northern France.

Letter	Alternative	ASCII code (page 437)	ASCII code (page 850)	Unicode™
æ	ae	145	145	00E6
Æ	AE	146	146	00C6
ä	a	132	132	00E4
Ä	A	142	142	00C4
ë	e	137	137	00EB
Ë	E	na*	211	00CB
é	e	130	130	00E9
É	E	144	144	00C9
è	e	138	138	00E8
È	E	na	212	00C8
ê	e	136	136	00EA
Ê	E	na	210	00CA
ï	i	139	139	00EF
Ï	I	na	216	00CF
ö	o	148	148	00F6
Ö	O	153	153	00D6
ó	o	162	162	00F3
Ó	O	na	224	00D3
ò	o	149	149	00F2
Ò	O	na	227	00D2
ô	o	147	147	00F4
Ô	O	na	226	00D4
ü	u	129	129	00FC
Ü	U	154	154	00DC
ij	ij or y	na	na	0133
IJ	IJ or Y	na	na	0132

*na = not available as an ASCII code in the specified code page

NB Only ë and é and their upper-case equivalents are commonly found in Dutch. The letter ij is a single letter in the Dutch alphabet, coming between y and z, but in normal usage it is always typed as two letters, i and j, or written as y. You should do the same. Note, however, that when ij occurs at the beginning of a proper noun, both the I and the J must be upper case, e.g. Krimpen aan de IJssel.
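Generic capitalization functions do not know this rule: Python’s str.capitalize(), for instance, turns ijssel into Ijssel. A minimal Python sketch that treats a leading ij as a single letter (the function name capitalize_dutch is illustrative):

```python
def capitalize_dutch(word: str) -> str:
    """Capitalize a Dutch word, treating a leading ij as one letter (IJssel)."""
    if word[:2].lower() == "ij":
        return "IJ" + word[2:]
    return word[:1].upper() + word[1:]

print("ijssel".capitalize())        # Ijssel : generic capitalization gets it wrong
print(capitalize_dutch("ijssel"))   # IJssel
print(capitalize_dutch("krimpen"))  # Krimpen
```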

Dutch is alphabetized as follows:

   A (Ä) B C D E (Ë) F G H I (Ï) J K L M N O (Ö) P Q R S T U (Ü) V W X Y IJ Z

English

English is spoken by about 322 000 000 people throughout the World, and in many countries as a second language. It is spoken in the United Kingdom, Ireland, the United States of America, Canada, Australia, New Zealand, South Africa and many former British colonies.

Letter	Alternative	ASCII code (page 437)	ASCII code (page 850)	Unicode™
à	a	133	133	00E0
À	A	na*	183	00C0
ç	c	135	135	00E7
Ç	C	128	128	00C7
ë	e	137	137	00EB
Ë	E	na	211	00CB
é	e	130	130	00E9
É	E	144	144	00C9
è	e	138	138	00E8
È	E	na	212	00C8
ê	e	136	136	00EA
Ê	E	na	210	00CA
ï	i	139	139	00EF
Ï	I	na	216	00CF
ñ	n	164	164	00F1
Ñ	N	165	165	00D1
ô	o	147	147	00F4
Ô	O	na	226	00D4
ö	o	148	148	00F6
Ö	O	153	153	00D6

*na = not available as an ASCII code in the specified code page

English words are often written without diacritical marks because of ignorance, but many words, especially of French origin, such as façade, rôle, éclair, belovèd, naïve and so on, should correctly be written using a diacritic mark.

English is alphabetized as follows:

   A (À) B C (Ç) D E (É È Ê Ë) F G H I (Ï) J K L M N (Ñ) O (Ö Ô) P Q R S T U V W X Y Z

Estonian

Estonian is spoken by almost a million people in Estonia, Russia and Latvia.

It is written in the Latin, not the Cyrillic, script.

Letter	Alternative	ASCII code (page 437)	ASCII code (page 850)	Unicode™
ä		132	132	00E4
Ä		142	142	00C4
ö		148	148	00F6
Ö		153	153	00D6
õ		na*	228	00F5
Õ		na	229	00D5
š		na	na*	0161
Š		na	na	0160
ü		129	129	00FC
Ü		154	154	00DC
ž		na	na	017E
Ž		na	na	017D

*na = not available as an ASCII code in the specified code page

Estonian is alphabetized as follows:

   A B (C) D E F G H I J K L M N O P (Q) R S Š Z Ž T U V (W) Õ Ä Ö Ü (X) (Y)

Faroese

Faroese is spoken by most of the 40 000 inhabitants of the Faeroe Islands / Faroe Islands. It is related to Icelandic and resembles Old Norse.

Letter	Alternative	ASCII code (page 437)	ASCII code (page 850)	Unicode™
á		160	160	00E1
Á		na*	181	00C1
æ		145	145	00E6
Æ		146	146	00C6
ð		na	208	00F0
Ð		na	209	00D0
í		161	161	00ED
Í		na	214	00CD
ó		162	162	00F3
Ó		na	224	00D3
ø		na	155	00F8
Ø		na	157	00D8
ú		163	163	00FA
Ú		na	233	00DA
ý		na	236	00FD
Ý		na	237	00DD

*na = not available as an ASCII code in the specified code page

Faroese is alphabetized as follows:

   A Á B C D Ð E F G H I Í J K L M N O Ó P Q R S T U Ú V W X Y Ý Z Æ Ø

Finnish

Finnish is spoken by about 4.5 million people in Finland, and by about 50,000 people in Russia and 30,000 in northern Sweden.

Letter	Alternative	ASCII code (page 437)	ASCII code (page 850)	Unicode™
ä	a	132	132	00E4
Ä	A	142	142	00C4
å	a	134	134	00E5
Å	A	143	143	00C5
ö	o	148	148	00F6
Ö	O	153	153	00D6
š	s	na*	na	0161
Š	S	na	na	0160
ž	z	na	na	017E
Ž	Z	na	na	017D

*na = not available as an ASCII code in the specified code page

Finnish is alphabetized as follows:

   A B (C) D E F G H I J K L M N O P (Q) R S Š T U V (W) (X) Y Z Ž (Å) Ä Ö

French

French is spoken by about 72 million people throughout the World; 56 million people in France and Monaco, 6 million people in Canada, 4 million people in Belgium, 3 million people in Switzerland, 1 million in the United States of America and about 300 000 people in Luxembourg. It is also spoken, often as a second language, by inhabitants of France’s ex-colonies.

Letter	Alternative	ASCII code (page 437)	ASCII code (page 850)	Unicode™
à	a	133	133	00E0
À	A	na*	183	00C0
â	a	131	131	00E2
Â	A	na	182	00C2
æ	ae	145	145	00E6
Æ	AE	146	146	00C6
ç	c	135	135	00E7
Ç	C	128	128	00C7
ë	e	137	137	00EB
Ë	E	na	211	00CB
é	e	130	130	00E9
É	E	144	144	00C9
è	e	138	138	00E8
È	E	na	212	00C8
ê	e	136	136	00EA
Ê	E	na	210	00CA
î	i	140	140	00EE
Î	I	na	215	00CE
ï	i	139	139	00EF
Ï	I	na	216	00CF
ô	o	147	147	00F4
Ô	O	na	226	00D4
œ	oe	na	na*	0153
Œ	OE	na	na	0152
ù	u	151	151	00F9
Ù	U	na	235	00D9
û	u	150	150	00FB
Û	U	na	234	00DB
ü	u	129	129	00FC
Ü	U	154	154	00DC
ÿ	y	152	152	00FF
Ÿ	Y	na	na	0178

*na = not available as an ASCII code in the specified code page

French-speakers rarely assign accents to upper-case letters. Some listings will simply omit accents in upper-case letters, others will use the lower-case accented equivalent even where the rest of the word is in upper case.
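Because the same French word may appear with or without accents on upper-case letters, matching records is safer if accents are ignored for comparison only, while the stored data keeps them. One common approach, sketched here in Python, decomposes text with Unicode normalization (NFD) and drops the combining marks. Letters such as Œ, Ø or Ł have no decomposition and are left unchanged; the function names are illustrative.

```python
import unicodedata

def strip_accents(text: str) -> str:
    """Remove combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def same_ignoring_accents(a: str, b: str) -> bool:
    """Compare two strings for matching purposes, ignoring accents and case."""
    return strip_accents(a).casefold() == strip_accents(b).casefold()

print(same_ignoring_accents("ECOLE", "École"))    # True
print(same_ignoring_accents("EVEQUE", "évêque"))  # True
print(strip_accents("Œuvre Łódź"))                # Œuvre Łodz : Œ and Ł do not decompose
```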

French is alphabetized as follows:

   A (Â, À, Æ) B C (Ç) D E (É, Ê, È, Ë) F G H I (Î, Ï) J K L M N O (Ô, Œ) P Q R S T U (Û, Ù, Ü) V W X Y (Ÿ) Z

Friesian or Frisian (West)

There are about 300,000 speakers of West Friesian in the province of Friesland in the northern Netherlands.

Letter	Alternative	ASCII code (page 437)	ASCII code (page 850)	Unicode™
â		131	131	00E2
Â		na*	182	00C2
ä		132	132	00E4
Ä		142	142	00C4
é		130	130	00E9
É		144	144	00C9
ê		136	136	00EA
Ê		na	210	00CA
ë		137	137	00EB
Ë		na	211	00CB
ï		139	139	00EF
Ï		na	216	00CF
ô		147	147	00F4
Ô		na	226	00D4
ö		148	148	00F6
Ö		153	153	00D6
û		150	150	00FB
Û		na	234	00DB
ú		163	163	00FA
Ú		na	233	00DA
ü		129	129	00FC
Ü		154	154	00DC

*na = not available as an ASCII code in the specified code page

West Friesian is alphabetized as follows:

   A (Â, Ä) B C D E (É, Ê, Ë) F G H I (Ï, Y) J K L M N O (Ô, Ö) P Q R S T U (Ú, Û, Ü) V W X Z

Friulian

There are some 600 000 speakers of Friulian in Northeast Italy and a small number in Slovenia.

Letter	Alternative	ASCII code (page 437)	ASCII code (page 850)	Unicode™
à		133	133	00E0
À		na*	183	00C0
á		160	160	00E1
Á		na	181	00C1
â		131	131	00E2
Â		na	182	00C2
è		138	138	00E8
È		na	212	00C8
ì		141	141	00EC
Ì		na	222	00CC
ò		149	149	00F2
Ò		na	227	00D2
ù		151	151	00F9
Ù		na	235	00D9

*na = not available as an ASCII code in the specified code page

Friulian is alphabetized as follows:

   A À Á Â B C D E È F G H I Ì J K L M N O Ò P Q R S T U (Ù) V W X Y Z

Gaelic

Gaelic is spoken in two distinct varieties in Scotland and Ireland. It has almost 20 000 speakers in the former and some 500 000 in the latter.

Letter	Alternative	ASCII code (page 437)	ASCII code (page 850)	Unicode™
à		133	133	00E0
À		na*	183	00C0
á		160	160	00E1
Á		na	181	00C1
é		130	130	00E9
É		144	144	00C9
è		138	138	00E8
È		na	212	00C8
í		161	161	00ED
Í		na	214	00CD
ì		141	141	00EC
Ì		na	222	00CC
ó		162	162	00F3
Ó		na	224	00D3
ò		149	149	00F2
Ò		na	227	00D2
ú		163	163	00FA
Ú		na	233	00DA
ù		151	151	00F9
Ù		na	235	00D9

*na = not available as an ASCII code in the specified code page

Irish is alphabetized as follows:

   A (Á) B C D E (É) F G H I (Í) J (K) L M N O (Ó) P Q R S T U (Ú) V W X Y Z

The Scottish Gaelic alphabet has 18 letters and is alphabetized as follows:

   A (À, Á) B C D E (È, É) F G H I (Ì) (J) (K) L M N O (Ò, Ó) P (Q) R S T U (Ù) (V) (W) (X) (Y) (Z)

Galician

Galician is spoken by over 3 million people in Spain.

Letter	Alternative	ASCII code (page 437)	ASCII code (page 850)	Unicode™
á		160	160	00E1
Á		na*	181	00C1
é		130	130	00E9
É		144	144	00C9
í		161	161	00ED
Í		na	214	00CD
ñ		164	164	00F1
Ñ		165	165	00D1
ó		162	162	00F3
Ó		na	224	00D3
ú		163	163	00FA
Ú		na	233	00DA
ü		129	129	00FC
Ü		154	154	00DC

*na = not available as an ASCII code in the specified code page

Galician is alphabetized as follows:

   A (Á) B C D E (É) F G H I (Í) J K L M N Ñ O (Ó) P Q R S T U (Ú, Ü) V W X Y Z

German

German is one of the most widely spread languages in Europe, spoken by about 80 million people in Germany, 7.7 million people in Austria, 4.4 million people in Switzerland, 66 000 people in Belgium, 28 500 people in the South Tyrol region of Italy and 29 000 people in Liechtenstein, as well as by smaller minorities in southern Denmark, eastern France and Luxembourg. There are large regional variations in the form of German spoken, especially between Germany, Switzerland and Austria.

Letter	Alternative	ASCII code (page 437)	ASCII code (page 850)	Unicode™
ä	ae	132	132	00E4
Ä	AE	142	142	00C4
ö	oe	148	148	00F6
Ö	OE	153	153	00D6
ü	ue	129	129	00FC
Ü	UE	154	154	00DC
ß	ss	225	225	00DF

The scharfes S (“ß”) is used only in lower case; its upper-case equivalent is SS. It is NOT used in Swiss German. Its use is also governed by the new German spelling rules, which became binding on 1 August 2005 but which Bavaria and North Rhine-Westphalia did not initially adopt. Where alternative spellings are possible, this has been noted within the text of this book.
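Where a system can only hold plain ASCII, the German alternatives in the table above (ae, oe, ue, ss) can be applied mechanically. The Python sketch below does so, choosing AE or Ae according to whether the following letter is upper or lower case; the function name german_fallback is illustrative. The conversion is one-way: it cannot be reversed reliably, since ae, oe and ue also occur naturally (Goethe, Mauer).

```python
# Plain-ASCII alternatives for German letters, following the German table.
BASE = {"ä": "a", "ö": "o", "ü": "u", "Ä": "A", "Ö": "O", "Ü": "U"}

def german_fallback(text: str) -> str:
    out = []
    for i, ch in enumerate(text):
        prev, nxt = text[i - 1 : i], text[i + 1 : i + 2]
        if ch in "äöü":
            out.append(BASE[ch] + "e")
        elif ch in "ÄÖÜ":
            # Capitalized word (Übersee -> Uebersee) or capitals (ÜBERSEE -> UEBERSEE).
            out.append(BASE[ch] + ("e" if nxt.islower() else "E"))
        elif ch == "ß":
            # Following the table: SS when the word is set in capitals, else ss.
            out.append("SS" if prev.isupper() else "ss")
        else:
            out.append(ch)
    return "".join(out)

print(german_fallback("Düsseldorf, Übersee, MÜNCHEN, Straße"))
# Duesseldorf, Uebersee, MUENCHEN, Strasse
print("Straße".upper())  # STRASSE : Python's own upper-casing also maps ß to SS
```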

Under the new rules, three identical consonants may be written together without a hyphen. As this change is recent, you may still find a hyphen where three s’s come together in German, as follows:

   Strauss-strasse

Instead of

   Straussstrasse

All nouns in German are written with the first letter as a capital.

German is alphabetized as follows:

   A (Ä) B C D E F G H I J K L M N O (Ö) P Q R S (ß=SS) T U (Ü) V W X Y Z

Greek

Greek is spoken by about 10 million people in Greece and Cyprus. It has its own alphabet which has to be transliterated for use with databases containing Roman script. The table below gives the transliteration symbol of each Greek character. Each character is given in upper case then lower case version:

Letter	Name	Transliteration	Unicode™
Αα	álfa	a	0391, 03B1
Ββ	víta	v	0392, 03B2
Γγ	gháma	gh (y before an e sound)	0393, 03B3
Δδ	thélta	th	0394, 03B4
Εε	épsilon	e	0395, 03B5
Ζζ	zíta	z	0396, 03B6
Ηη	íta	i	0397, 03B7
Θθ	thíta	th	0398, 03B8
Ιι	yóta	i	0399, 03B9
Κκ	kápa	k	039A, 03BA
Λλ	lámtha	l	039B, 03BB
Μμ	mi	m	039C, 03BC
Νν	ni	n	039D, 03BD
Ξξ	ksi	ks	039E, 03BE
Οο	ómikron	o	039F, 03BF
Ππ	pi	p	03A0, 03C0
Ρρ	ro	r	03A1, 03C1
Σσς	sigma	s	03A3, 03C3, 03C2
Ττ	taf	t	03A4, 03C4
Υυ	ípsilon	i	03A5, 03C5
Φφ	fi	f	03A6, 03C6
Χχ	hi	h	03A7, 03C7
Ψψ	psi	ps	03A8, 03C8
Ωω	omégha	o	03A9, 03C9

Note that the letter ς is only used at the end of a word.
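A letter-by-letter transliteration following the table above can be sketched in Python as shown below. It is deliberately simple: accents such as the tonos are dropped, both forms of sigma become s, and context-dependent rules (gh becoming y before an e sound) and letter combinations are not handled, so its output needs checking. The names LATIN, GREEK_TO_LATIN and transliterate_greek are illustrative.

```python
import unicodedata

# Transliterations from the table above, in code point order α..ω
# (U+03B1..U+03C9); U+03C2 is the final sigma ς, which sits before σ.
LATIN = ["a", "v", "gh", "th", "e", "z", "i", "th", "i", "k", "l", "m", "n",
         "ks", "o", "p", "r", "s", "s", "t", "i", "f", "h", "ps", "o"]
GREEK_TO_LATIN = {chr(0x03B1 + n): latin for n, latin in enumerate(LATIN)}

def transliterate_greek(text: str) -> str:
    out = []
    for i, ch in enumerate(text):
        # Drop the tonos and other accents first (ή -> η).
        base = "".join(c for c in unicodedata.normalize("NFD", ch)
                       if not unicodedata.combining(c))
        latin = GREEK_TO_LATIN.get(base.lower())
        if latin is None:
            out.append(ch)  # not a Greek letter: keep unchanged
        elif base.isupper():
            # Word in capitals gives "TH"; capitalized word gives "Th".
            nxt = text[i + 1 : i + 2]
            out.append(latin.upper() if nxt.isupper() else latin.capitalize())
        else:
            out.append(latin)
    return "".join(out)

print(transliterate_greek("Αθήνα"))        # Athina
print(transliterate_greek("ΘΕΣΣΑΛΟΝΙΚΗ"))  # THESSALONIKI
```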

Greenlandic

Greenlandic is spoken by 40 000 people in Greenland and a further 7 000 people in Denmark.

Letter	Alternative	ASCII code (page 437)	ASCII code (page 850)	Unicode™
ĸ		na*	na	0138
æ		145	145	00E6
Æ		146	146	00C6
ø		na	155	00F8
Ø		na	157	00D8
å		134	134	00E5
Å		143	143	00C5

*na = not available as an ASCII code in the specified code page

Greenlandic is alphabetized as follows:

   A B C D E F G H I J K L M N O P Q (ĸ) R S T U V W X Y Z Æ Ø Å

Hungarian

Hungarian is spoken by about 10 million people in Hungary, 1.5 million people in Romania, and by minorities in Slovakia, Slovenia, Croatia and Serbia.

Letter	Alternative	ASCII code (page 437)	ASCII code (page 850)	Unicode™
á	a’	160	160	00E1
Á	A’	na*	181	00C1
é	e’	130	130	00E9
É	E’	144	144	00C9
í	i’	161	161	00ED
Í	I’	na	214	00CD
ó	o’	162	162	00F3
Ó	O’	na	224	00D3
ö		148	148	00F6
Ö		153	153	00D6
ő		na	na*	0151
Ő		na	na	0150
ú		163	163	00FA
Ú		na	233	00DA
ü		129	129	00FC
Ü		154	154	00DC
ű		na	na	0171
Ű		na	na	0170

*na = not available as an ASCII code in the specified code page

Hungarian is alphabetized as follows:

   A Á B C CS D E É F G GY H I Í J K L LY M N NY O Ó Ö Ő P Q R S SZ T TY U Ú Ü Ű V W X Y Z ZS

Icelandic

Icelandic is spoken by 250 000 people in Iceland.

Letter	Alternative	ASCII code (page 437)	ASCII code (page 850)	Unicode™
á	a	160	160	00E1
Á	A	na*	181	00C1
æ	ae	145	145	00E6
Æ	AE	146	146	00C6
ð	d	na	208	00F0
Ð	D	na	209	00D0
é	e	130	130	00E9
É	E	144	144	00C9
í	i	161	161	00ED
Í	I	na	214	00CD
ó	o	162	162	00F3
Ó	O	na	224	00D3
ö	o	148	148	00F6
Ö	O	153	153	00D6
œ	oe	na	na*	0153
Œ	OE	na	na	0152
þ	th	na	231	00FE
Þ	TH	na	232	00DE
ú	u	163	163	00FA
Ú	U	na	233	00DA
ý	y	na	236	00FD
Ý	Y	na	237	00DD

*na = not available as an ASCII code in the specified code page

Icelandic is alphabetized as follows:

   A Á B C D Ð E É F G H I Í J K L M N O Ó P Q R S T U Ú V W X Y Ý Z Þ Æ Ö

Italian

Italian is spoken by about 57 million people in Italy, San Marino and the Holy See and about 500 000 people in Switzerland.

Letter	Alternative	ASCII code (page 437)	ASCII code (page 850)	Unicode™
à		133	133	00E0
À		na*	183	00C0
é	e’	130	130	00E9
É	E’	144	144	00C9
è		138	138	00E8
È		na	212	00C8
í	i’	161	161	00ED
Í	I’	na	214	00CD
ì		141	141	00EC
Ì		na	222	00CC
ï		139	139	00EF
Ï		na	216	00CF
ó	o’	162	162	00F3
Ó	O’	na	224	00D3
ò		149	149	00F2
Ò		na	227	00D2
ú	u’	163	163	00FA
Ú	U’	na	233	00DA
ù		151	151	00F9
Ù		na	235	00D9

*na = not available as an ASCII code in the specified code page

Italian is alphabetized as follows:

   A (À) B C D E (É, È) F G H I (Í, Ì, Ï) J K L M N O (Ó, Ò) P Q R S T U (Ú, Ù) V W X Y Z

Ladin

There are about 30 000 speakers of Ladin in Northeast Italy.

Letter	Alternative	ASCII code (page 437)	ASCII code (page 850)	Unicode™
â		131	131	00E2
Â		na*	182	00C2
á		160	160	00E1
Á		na	181	00C1
à		133	133	00E0
À		na	183	00C0
ê		136	136	00EA
Ê		na	210	00CA
é		130	130	00E9
É		144	144	00C9
è		138	138	00E8
È		na	212	00C8
î		140	140	00EE
Î		na	215	00CE
í		161	161	00ED
Í		na	214	00CD
ì		141	141	00EC
Ì		na	222	00CC
ô		147	147	00F4
Ô		na	226	00D4
ó		162	162	00F3
Ó		na	224	00D3
ò		149	149	00F2
Ò		na	227	00D2
û		150	150	00FB
Û		na	234	00DB
ú		163	163	00FA
Ú		na	233	00DA
ù		151	151	00F9
Ù		na	235	00D9

*na = not available as an ASCII code in the specified code page

Ladin is alphabetized as follows:

   A (Â Á À) B C D E (Ê É È) F G H I (Î Í Ì) J K L M N O (Ô Ó Ò) P Q R S T U (Û Ú Ù) V W X Y Z

Latvian

There are about 1 400 000 speakers of Latvian, mainly in Latvia.

Letter	Alternative	ASCII code (page 437)	ASCII code (page 850)	Unicode™
ā		na*	na	0101
Ā		na	na	0100
č		na	na	010D
Č		na	na	010C
ē		na	na	0113
Ē		na	na	0112
ģ		na	na	0123
Ģ		na	na	0122
ī		na	na	012B
Ī		na	na	012A
ķ		na	na	0137
Ķ		na	na	0136
ļ		na	na	013C
Ļ		na	na	013B
ņ		na	na	0146
Ņ		na	na	0145
ō		na	na	014D
Ō		na	na	014C
ŗ		na	na	0157
Ŗ		na	na	0156
š		na	na	0161
Š		na	na	0160
ū		na	na	016B
Ū		na	na	016A
ž		na	na	017E
Ž		na	na	017D

*na = not available as an ASCII code in the specified code page

Latvian is alphabetized as follows:

   A (Ā) B C Č D E (Ē) F G Ģģ H I (Ī) J K Ķ L Ļ M N Ņ O (Ō) P (Q) R Ŗ S Š T U (Ū) V (W) (X) (Y) Z Ž

Letzebuergesch

Letzebuergesch, a language related to German, is the official language of Luxembourg and is spoken by some 350 000 people.

Letter	Alternative	ASCII code (page 437)	ASCII code (page 850)	Unicode™
â		131	131	00E2
Â		na*	182	00C2
ä		132	132	00E4
Ä		142	142	00C4
é		130	130	00E9
É		144	144	00C9
ë		137	137	00EB
Ë		na	211	00CB
M̂		na	na	004D+0302
m̂		na	na	006D+0302
N̂		na	na	004E+0302
n̂		na	na	006E+0302
ö		148	148	00F6
Ö		153	153	00D6
ô		147	147	00F4
Ô		na	226	00D4
ü		129	129	00FC
Ü		154	154	00DC

*na = not available as an ASCII code in the specified code page

Letzebuergesch is alphabetized as follows:

   A (Â, Ä) B C D E (Ë, É) F G H I J K L M (M̂) N (N̂) O (Ô, Ö) P Q R S T U (Ü) V W X Y Z
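The Unicode column shows M and N with a circumflex as two code points (004D+0302, 004E+0302), because Unicode has no single precomposed character for them. The Python sketch below shows the practical consequence: normalizing to composed form (NFC) merges A plus a combining circumflex into Â, but leaves M plus a combining circumflex as two code points. It is therefore worth normalizing text consistently before storing or comparing it, and allowing for such letters when setting field lengths.

```python
import unicodedata

m_hat = "M\u0302"  # M + COMBINING CIRCUMFLEX ACCENT (004D+0302)
a_hat = "A\u0302"  # A + COMBINING CIRCUMFLEX ACCENT (0041+0302)

composed_m = unicodedata.normalize("NFC", m_hat)
composed_a = unicodedata.normalize("NFC", a_hat)

print(len(composed_m))                         # 2 : no precomposed M with circumflex
print(composed_a, f"U+{ord(composed_a):04X}")  # Â U+00C2 : one code point
print(unicodedata.normalize("NFD", composed_a) == a_hat)  # True : decomposes back
```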

Lithuanian

There are almost 3 million speakers of Lithuanian, mainly in Lithuania.

Letter	Alternative	ASCII code (page 437)	ASCII code (page 850)	Unicode™
ą		na*	na	0105
Ą		na	na	0104
č		na	na	010D
Č		na	na	010C
ę		na	na	0119
Ę		na	na	0118
ė		na	na	0117
Ė		na	na	0116
į		na	na	012F
Į		na	na	012E
š		na	na	0161
Š		na	na	0160
ų		na	na	0173
Ų		na	na	0172
ū		na	na	016B
Ū		na	na	016A
ž		na	na	017E
Ž		na	na	017D

*na = not available as an ASCII code in the specified code page

Lithuanian is alphabetized as follows:

   A (Ą) B C Č D E (Ę Ė) F G H I (Į Y) J K L M N O P (Q) R S Š T U (Ų Ū) V (W) (X) Z Ž

Maltese

Maltese, a Semitic language descended from Arabic with strong Romance influences, is spoken by about 400 000 people in Malta.

Letter	Alternative	ASCII code (page 437)	ASCII code (page 850)	Unicode™
à		133	133	00E0
À		na*	183	00C0
ċ		na	na*	010B
Ċ		na	na	010A
è		138	138	00E8
È		na	212	00C8
ġ		na	na	0121
Ġ		na	na	0120
għ		na	na	g+0127
GĦ		na	na	G+0126
ħ		na	na	0127
Ħ		na	na	0126
ì		141	141	00EC
Ì		na	222	00CC
î		140	140	00EE
Î		na	215	00CE
ò		149	149	00F2
Ò		na	227	00D2
ù		151	151	00F9
Ù		na	235	00D9
ż		na	na	017C
Ż		na	na	017B

*na = not available as an ASCII code in the specified code page

Maltese is alphabetized as follows:

   A (À) B Ċ (C) D E (È) F Ġ G GĦ H Ħ I (Ì, Î) IE J K L M N O (Ò) P Q R S T U (Ù) V W X (Y) Ż Z

Norwegian

Norwegian is spoken by about 4 million people in Norway.

Letter	Alternative	ASCII code (page 437)	ASCII code (page 850)	Unicode™
å	aa	134	134	00E5
Å	AA	143	143	00C5
æ	ae	145	145	00E6
Æ	AE	146	146	00C6
é		130	130	00E9
É		144	144	00C9
ó		162	162	00F3
Ó		na*	224	00D3
ò		149	149	00F2
Ò		na	227	00D2
ø	oe	na	155	00F8
Ø	OE	na	157	00D8

*na = not available as an ASCII code in the specified code page

Norwegian is alphabetized as follows:

   A B C D E (É) F G H I J K L M N O (Ó, Ò) P Q R S T U V W X Y Z Æ Ø Å

Polish

Polish is spoken by about 35 million people, mainly in Poland.

Letter	Alternative	ASCII code (page 437)	ASCII code (page 850)	Unicode™
ą		na*	na*	0105
Ą		na	na	0104
ć	c’	na	na	0107
Ć	C’	na	na	0106
ę		na	na	0119
Ę		na	na	0118
ł		na	na	0142
Ł		na	na	0141
ń	n’	na	na	0144
Ń	N’	na	na	0143
ó		162	162	00F3
Ó		na	224	00D3
ś	s’	na	na	015B
Ś	S’	na	na	015A
ź	z’	na	na	017A
Ź	Z’	na	na	0179
ż		na	na	017C
Ż		na	na	017B

*na = not available as an ASCII code in the specified code page

The letters Q, V and X are only used in foreign words.

Where an accent cannot be reproduced, it is usually just dropped. However, an apostrophe or a comma may be added immediately after the letter to clarify meaning. For example: was (= you, plural), wa,s (= wąs, moustache).

In Polish, opening quotation marks are written at the baseline of the text, as follows: „Hello”. However, many computer systems, even in Poland, do not support this, and standard “Western” quotation marks are used instead.
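Where the target system can store Unicode, straight double quotes can be converted to Polish quotation marks: „ (U+201E, DOUBLE LOW-9 QUOTATION MARK) to open and ” (U+201D, RIGHT DOUBLE QUOTATION MARK) to close. The sketch below, with the hypothetical name `to_polish_quotes`, only handles simple, non-nested pairs; an unpaired quote is left as it is.

```python
import re

OPEN_PL, CLOSE_PL = "\u201e", "\u201d"  # „ and ”


def to_polish_quotes(text: str) -> str:
    """Convert each straight "..." pair into a „...” pair."""
    return re.sub(r'"([^"]*)"', OPEN_PL + r"\1" + CLOSE_PL, text)


print(to_polish_quotes('Powiedział "Dzień dobry".'))
```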

Polish is alphabetized as follows:

   A Ą B C Ć D E Ę F G H I J K L Ł M N Ń O Ó P R S Ś T U W Y Z Ź Ż

Portuguese

Portuguese is spoken by about 10 million people in Portugal and 163 million people in Brazil. There are also many second-language speakers in Portugal’s ex-colonies. In Portugal, there are Northern dialects and Central-Southern dialects.

Letter	Alternative	ASCII code (page 437)	ASCII code (page 850)	Unicode™
á		160	160	00E1
Á		na*	181	00C1
à		133	133	00E0
À		na	183	00C0
â		131	131	00E2
Â		na	182	00C2
ã		na	198	00E3
Ã		na	199	00C3
ç		135	135	00E7
Ç		128	128	00C7
é		130	130	00E9
É		144	144	00C9
è		138	138	00E8
È		na	212	00C8
ê		136	136	00EA
Ê		na	210	00CA
í		161	161	00ED
Í		na	214	00CD
ì		141	141	00EC
Ì		na	222	00CC
ó		162	162	00F3
Ó		na	224	00D3
ò		149	149	00F2
Ò		na	227	00D2
ô		147	147	00F4
Ô		na	226	00D4
õ		na	228	00F5
Õ		na	229	00D5
ú		163	163	00FA
Ú		na	233	00DA
ü		129	129	00FC
Ü		154	154	00DC

*na = not available as an ASCII code in the specified code page

Portuguese accents should not be replaced by apostrophes as apostrophes may be used in Portuguese to indicate certain contractions, such as Sant’Ana.

Portuguese is alphabetized as follows:

   A (Á, À, Â, Ã) B C (Ç) D E (É, Ê) F G H I (Í) J K L M N O (Ó, Ô, Õ) P Q R S T U (Ú, Ü) V W X Y Z
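In the alphabet above, accented letters are shown in parentheses after their base letter, meaning they sort as that letter. A plain code-point sort does not behave this way: é (U+00E9) sorts after every unaccented letter. The sketch below builds an accent-insensitive key by decomposing each word and dropping combining marks, then uses the original word as a tie-breaker; the function names are hypothetical, and a full locale-aware collation library will handle more cases.

```python
import unicodedata


def base_letters(word: str) -> str:
    """Remove diacritics (NFD, then drop combining marks) and fold case."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


def portuguese_sort_key(word: str):
    # Primary: base letters, so é sorts with e and ç with c.
    # Secondary: the original word, giving otherwise-equal words a stable order.
    return (base_letters(word), word)


words = ["exame", "ética", "estado"]
print(sorted(words))                           # plain code-point order
print(sorted(words, key=portuguese_sort_key))  # accents ignored at the first level
```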

Provençal

Provençal is a Romance language related to French and Catalan, found in south-eastern France and north-western Italy. It has a number of dialects.

Letter	Alternative	ASCII code (page 437)	ASCII code (page 850)	Unicode™
à		133	133	00E0
À		na*	183	00C0
ç		135	135	00E7
Ç		128	128	00C7
é		130	130	00E9
É		144	144	00C9
è		138	138	00E8
È		na	212	00C8
ó		162	162	00F3
Ó		na	224	00D3

*na = not available as an ASCII code in the specified code page

Rhaeto-Romance

Dialects of Rhaeto-Romance are spoken in Switzerland, western Austria and northern Italy by about 500 000 people.

Letter	Alternative	ASCII code (page 437)	ASCII code (page 850)	Unicode™
à		133	133	00E0
À		na*	183	00C0
è		138	138	00E8
È		na	212	00C8
é		130	130	00E9
É		144	144	00C9
ì		141	141	00EC
Ì		na	222	00CC
ò		149	149	00F2
Ò		na	227	00D2
ù		151	151	00F9
Ù		na	235	00D9

*na = not available as an ASCII code in the specified code page

Rhaeto-Romance is alphabetized as follows:

   A (À) B C D E (È, É) F G H I (Ì) J (K) L M N O (Ò) P Q R S T U (Ù) V (W) X (Y) Z

Romanian

Romanian is spoken by about 20 million people in Romania. A closely related variety, formerly written in the Cyrillic alphabet, is spoken in Moldova.

Letter	Alternative	ASCII code (page 437)	ASCII code (page 850)	Unicode™
à		133	133	00E0
À		na*	183	00C0
â		131	131	00E2
Â		na	182	00C2
ă		na	na*	0103
Ă		na	na	0102
è		138	138	00E8
È		na	212	00C8
ì		141	141	00EC
Ì		na	222	00CC
î		140	140	00EE
Î		na	215	00CE
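ș	ş	na	na	0219
Ș	Ş	na	na	0218
ț	ţ	na	na	021B
Ț	Ţ	na	na	021A
ù		151	151	00F9
Ù		na	235	00D9

*na = not available as an ASCII code in the specified code page

The comma-below letters ș and ț are the standard Romanian forms; the cedilla letters ş and ţ, listed as alternatives, were widely substituted for them in older character sets.

Romanian is alphabetized as follows:

   A Ă Â B C D E F G H I Î J K L M N O P (Q) R S Ș T Ț U V W X (Y) Z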
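Unicode distinguishes the comma-below letters ș (U+0219) and ț (U+021B) used in standard Romanian from the cedilla letters ş (U+015F) and ţ (U+0163) that legacy encodings substituted for them. Text imported from older systems can be cleaned with a simple mapping, sketched below under the hypothetical name `fix_romanian_commas`. Apply it only to Romanian data: in Turkish, ş with a cedilla is the correct letter.

```python
# Legacy cedilla letters mapped to the comma-below letters of standard Romanian.
CEDILLA_TO_COMMA = str.maketrans({
    "\u015f": "\u0219",  # ş -> ș
    "\u015e": "\u0218",  # Ş -> Ș
    "\u0163": "\u021b",  # ţ -> ț
    "\u0162": "\u021a",  # Ţ -> Ț
})


def fix_romanian_commas(text: str) -> str:
    """Replace cedilla forms with comma-below forms (Romanian text only)."""
    return text.translate(CEDILLA_TO_COMMA)


print(fix_romanian_commas("Timi\u015foara"))
```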

Romany

Romany is the language of Europe’s Roma people. It has many dialects and forms, each strongly influenced by the indigenous languages of the region its speakers inhabit.

Apostrophes are used in Romany words.

Letter	Alternative	ASCII code (page 437)	ASCII code (page 850)	Unicode™
č		na*	na*	010D
š		na	na	0161
ž		na	na	017E

*na = not available as an ASCII code in the specified code page

Russian

Russian is spoken by about 143 million people in Armenia, Azerbaijan, Belarus, Estonia, Georgia, Kazakhstan, Kyrgyzstan, Latvia, Lithuania, Moldova, Russia, Tajikistan, Turkmenistan, Ukraine and Uzbekistan. It is written in the Cyrillic alphabet, which has to be transliterated for use in databases containing Latin script. Each letter is given in upper case, then lower case:

Letter	Transliteration	Unicode™
Аа	a	0410, 0430
Бб	b	0411, 0431
Вв	v	0412, 0432
Гг	g	0413, 0433
Дд	d	0414, 0434
Ее	e	0415, 0435
Ёё	yo	0401, 0451
Жж	zh (as the s in leisure)	0416, 0436
Зз	z	0417, 0437
Ии	ee	0418, 0438
Йй	i	0419, 0439
Кк	k	041A, 043A
Лл	l	041B, 043B
Мм	m	041C, 043C
Нн	n	041D, 043D
Оо	o	041E, 043E
Пп	p	041F, 043F
Рр	r	0420, 0440
Сс	s	0421, 0441
Тт	t	0422, 0442
Уу	u (oo)	0423, 0443
Фф	f	0424, 0444
Хх	kh (as the ch in loch)	0425, 0445
Цц	ts	0426, 0446
Чч	ch	0427, 0447
Шш	sh	0428, 0448
Щщ	shch	0429, 0449
Ъъ	hard sign (not pronounced)	042A, 044A
Ыы	iy	042B, 044B
Ьь	soft sign (not pronounced)	042C, 044C
Ээ	e	042D, 044D
Юю	yoo	042E, 044E
Яя	ya	042F, 044F

Russian is alphabetized as in the table above.
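Transliteration can be applied letter by letter. The sketch below fills in only part of the table (the remaining rows would be added the same way); its keys are written as escapes matching the lower-case codes in the Unicode™ column. The function name `transliterate` and the capitalization rule (capitalize only the first Latin letter, so Щ becomes "Shch") are assumptions, not something the table specifies.

```python
# Partial mapping from the lower-case rows of the table; extend with the remaining rows.
TRANSLIT = {
    "\u0430": "a", "\u0431": "b", "\u0432": "v", "\u0433": "g",     # а б в г
    "\u0434": "d", "\u0435": "e", "\u043a": "k", "\u043b": "l",     # д е к л
    "\u043c": "m", "\u043d": "n", "\u043e": "o", "\u043f": "p",     # м н о п
    "\u0440": "r", "\u0441": "s", "\u0442": "t", "\u0443": "u",     # р с т у
    "\u0444": "f", "\u0446": "ts", "\u0447": "ch", "\u0448": "sh",  # ф ц ч ш
    "\u0449": "shch",                                               # щ
}


def transliterate(text: str) -> str:
    out = []
    for ch in text:
        latin = TRANSLIT.get(ch.lower())
        if latin is None:
            out.append(ch)                  # not in the (partial) table: keep unchanged
        elif ch.isupper():
            out.append(latin.capitalize())  # Щ -> "Shch", not "SHCH"
        else:
            out.append(latin)
    return "".join(out)


print(transliterate("\u041c\u043e\u0441\u043a\u0432\u0430"))  # Москва
```

Text written entirely in capitals would need an additional rule, since this sketch turns Щ into "Shch" even inside an all-caps word.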

Sámi

Sámi is spoken in the north of Norway, Sweden, Finland and Russia by some 35 000 people.

There are around ten Sámi languages ^[1], depending on definition: Kildin (written in the Cyrillic script); Akkala, Inari, Lule, Northern, Pite, Skolt, Southern, Ter and Ume (written in the Latin script).

The table below covers the diacritical characters used by the varieties written in the Latin script.

Letter	Alternative	ASCII code (page 437)	ASCII code (page 850)	Unicode™
á		160	160	00E1
Á		na*	181	00C1
å		134	134	00E5
Å		143	143	00C5
ä		132	132	00E4
Ä		142	142	00C4
â		131	131	00E2
Â		na	182	00C2
č		na	na*	010D
Č		na	na	010C
đ		na	na	0111
Đ		na	na	0110
ń		na	na	0144
Ń		na	na	0143
ŋ		na	na	014B
Ŋ		na	na	014A
ö		148	148	00F6
Ö		153	153	00D6
õ		na	228	00F5
Õ		na	229	00D5
ø		na	155	00F8
Ø		na	157	00D8
š		na	na	0161
Š		na	na	0160
ŧ		na	na	0167
Ŧ		na	na	0166
ž		na	na	017E
Ž		na	na	017D
ǯ		na	na	01EF
Ǯ		na	na	01EE

Diacritical characters borrowed from Finnish, Swedish and Norwegian are also used.

*na = not available as an ASCII code in the specified code page
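A common pitfall when converting Sámi data from DOS code page 850: byte 209 holds Ð (U+00D0, LATIN CAPITAL LETTER ETH), which looks identical to the Sámi Đ (U+0110, LATIN CAPITAL LETTER D WITH STROKE) but is a different character, so converted text will not match text typed in Unicode. The sketch below checks this with Python's built-in `cp850` codec; `encodable` is a hypothetical helper.

```python
import unicodedata


def encodable(ch: str, codec: str) -> bool:
    """True if ch can be stored in the given one-byte code page."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


byte_209 = bytes([209]).decode("cp850")
print(f"{ord(byte_209):04X}", unicodedata.name(byte_209))  # what code page 850 holds at 209
print(encodable("\u0110", "cp850"))                        # Sámi Đ (D with stroke)
print(encodable("\u014b", "cp850"))                        # Sámi ŋ (eng)
```

If legacy data used the eth as a stand-in, replacing U+00D0 with U+0110 (and U+00F0 with U+0111) after conversion restores the intended Sámi letters.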

Slovak

Slovak is spoken by about 5 million people, mainly in Slovakia. It is written in the Latin script, and loss of the diacritical marks should be avoided.

Letter	Alternative	ASCII code (page 437)	ASCII code (page 850)	Unicode™
á		160	160	00E1
Á		na*	181	00C1
ä		132	132	00E4
Ä		142	142	00C4
č		na	na*	010D
Č		na	na	010C
ď	d’	na	na	010F
Ď		na	na	010E
é		130	130	00E9
É		144	144	00C9
í		161	161	00ED
Í		na	214	00CD
ĺ		na	na	013A
Ĺ		na	na	0139
ľ	l’	na	na	013E
Ľ		na	na	013D
ň		na	na	0148
Ň		na	na	0147
ó		162	162	00F3
Ó		na	224	00D3
ô		147	147	00F4
Ô		na	226	00D4
ŕ		na	na	0155
Ŕ		na	na	0154
š		na	na	0161
Š		na	na	0160
ť	t’	na	na	0165
Ť		na	na	0164
ú		163	163	00FA
Ú		na	233	00DA
ý		na	236	00FD
Ý		na	237	00DD
ž		na	na	017E
Ž		na	na	017D

*na = not available as an ASCII code in the specified code page

Slovak is alphabetized as follows:

   A Á Ä B C Č D Ď DZ DŽ E É F G H CH I Í J K L Ĺ Ľ M N Ň O Ó Ô P Q R Ŕ S Š T Ť U Ú V W X Y Ý Z Ž

Slovenian

Slovenian is spoken by about 1.5 million people in Slovenia and small parts of Hungary and Italy.

Letter	Alternative	ASCII code (page 437)	ASCII code (page 850)	Unicode™
č	ch	na*	na*	010D
Č	Ch	na	na	010C
š	sh	na	na	0161
Š	Sh	na	na	0160
ž	zh	na	na	017E
Ž	Zh	na	na	017D

*na = not available as an ASCII code in the specified code page

Slovenian is alphabetized as follows:

   A B C Č D E F G H I J K L M N O P (Q) R S Š T U V (W) (X) (Y) Z Ž
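In the Slovenian alphabet above, Č, Š and Ž follow C, S and Z, but a plain code-point sort puts them after every unaccented letter. The sketch below, using the hypothetical helper `make_sort_key`, builds a sort key from an ordered list of letters; it matches the longest unit first, so the same helper can also handle multi-letter units such as a Slovak or Welsh "ch" if they are included in the list. Full collation standards are more nuanced than this simple ranking.

```python
import unicodedata
from typing import Callable, List


def make_sort_key(alphabet: List[str]) -> Callable[[str], List[int]]:
    """Build a sort key from an ordered list of letters (multi-letter units are allowed)."""
    rank = {letter: i for i, letter in enumerate(alphabet)}
    units = sorted(rank, key=len, reverse=True)  # match longest units first, e.g. "ch" before "c"

    def key(word: str) -> List[int]:
        w = unicodedata.normalize("NFC", word.lower())  # use precomposed č, š, ž
        out: List[int] = []
        i = 0
        while i < len(w):
            for unit in units:
                if w.startswith(unit, i):
                    out.append(rank[unit])
                    i += len(unit)
                    break
            else:
                # Not in the alphabet (digit, space, foreign letter): sort after every listed letter.
                out.append(len(alphabet) + ord(w[i]))
                i += 1
        return out

    return key


SLOVENIAN = "a b c č d e f g h i j k l m n o p q r s š t u v w x y z ž".split()
slovenian_key = make_sort_key(SLOVENIAN)

words = ["žaba", "zima", "čaj", "dan", "cesta"]
print(sorted(words))                     # code-point order: č and ž land after z
print(sorted(words, key=slovenian_key))  # order of the alphabet list
```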

Sorbian

Sorbian is a Slavic language spoken by some 50 000 people in two distinct dialects in the south-easternmost part of the former East Germany.

Letter	Alternative	ASCII code (page 437)	ASCII code (page 850)	Unicode™
ć		na*	na*	0107
Ć		na	na	0106
č		na	na	010D
Č		na	na	010C
ě		na	na	011B
Ě		na	na	011A
ł		na	na	0142
Ł		na	na	0141
ń		na	na	0144
Ń		na	na	0143
ó		162	162	00F3
Ó		na	224	00D3
ř		na	na	0159
Ř		na	na	0158
ŕ		na	na	0155
Ŕ		na	na	0154
ś		na	na	015B
Ś		na	na	015A
š		na	na	0161
Š		na	na	0160
ź		na	na	017A
Ź		na	na	0179
ž		na	na	017E
Ž		na	na	017D

*na = not available as an ASCII code in the specified code page

Upper Sorbian is alphabetized as follows:

   A B C Č D DŹ E Ě F G H CH I J K Ł L M N Ń O Ó P R Ř S Š T Ć U W X Y Z Ž

Lower Sorbian is alphabetized as follows:

   A B Bj C Č Ć D DŹ E Ě F G H CH I J K L Ł M MJ N Ń NJ O P PJ (Q) R Ŕ RJ S Ś Š TŚ TŠ T U (V) W WJ (X) Y Z Ź Ž

Spanish

Spanish is spoken by about 27.5 million people in Spain, 81 million people in Central America, 18 million people in the Caribbean, 90 million people in South America, 22 million people in the United States of America and by many others as a second language, especially in Spain’s ex-colonies.

Letter	Alternative	ASCII code (page 437)	ASCII code (page 850)	Unicode™
á	a	160	160	00E1
Á	A	na*	181	00C1
é	e	130	130	00E9
É	E	144	144	00C9
í	i	161	161	00ED
Í	I	na	214	00CD
ñ	n	164	164	00F1
Ñ	N	165	165	00D1
ó	o	162	162	00F3
Ó	O	na	224	00D3
ú	u	163	163	00FA
Ú	U	na	233	00DA
ü	u	129	129	00FC
Ü	U	154	154	00DC

*na = not available as an ASCII code in the specified code page

In Spanish, the letters of an abbreviation are doubled to indicate a plural. For example, the Spanish abbreviation for the United States (Estados Unidos) is EE.UU. The full stops must be added after each doubled pair of letters, never between the two letters of a pair.

Spanish is alphabetized as follows:

   A (Á) B C CH D E (É) F G H I (Í) J K L LL M N Ñ O (Ó) P Q R RR S T U (Ú, Ü) V W X Y Z

The Real Academia Española and the other Spanish language academies have removed “ch” and “ll” as separate letters of the Spanish alphabet, though most Spanish speakers still use them as such.

Swedish

Swedish is spoken by about 8 million people in Sweden and about 300 000 people in western and southern Finland.

Letter	Alternative	ASCII code (page 437)	ASCII code (page 850)	Unicode™
å		134	134	00E5
Å		143	143	00C5
ä	ae	132	132	00E4
Ä	AE	142	142	00C4
é		130	130	00E9
É		144	144	00C9
ö	oe	148	148	00F6
Ö	OE	153	153	00D6
ü		129	129	00FC
Ü		154	154	00DC

*na = not available as an ASCII code in the specified code page

Swedish is alphabetized as follows:

   A B C D E (É) F G H I J K L M N O P Q R S T U V (W) X Y (Ü) Z Å Ä Ö

Note: in Swedish, in certain grammatical forms, a colon is used in contractions and abbreviations where an apostrophe (or nothing) might be used in English, for example S:t for Sankt (“Saint”) or FN:s (“the UN’s”).

Turkish

Turkish is spoken by about 57 million people in Turkey, and by minorities in Greece, Bulgaria and Cyprus.

Letter	Alternative	ASCII code (page 437)	ASCII code (page 850)	Unicode™
â		131	131	00E2
Â		na*	182	00C2
ç		135	135	00E7
Ç		128	128	00C7
ğ		na	na*	011F
Ğ		na	na	011E
İ		na	na	0130
ı		na	213	0131
î		140	140	00EE
Î		na	215	00CE
ö		148	148	00F6
Ö		153	153	00D6
ş		na	na	015F
Ş		na	na	015E
ü		129	129	00FC
Ü		154	154	00DC
û		150	150	00FB
Û		na	234	00DB

*na = not available as an ASCII code in the specified code page

The distinction between the dotted and the dotless I is very important. They are distinct letters and are pronounced very differently. Avoid the temptation to replace a dotless ı with a dotted i.
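The distinction is easily lost when changing case. Python's default `str.upper()` and `str.lower()` use language-independent Unicode mappings: "i".upper() gives a plain "I", and "İ".lower() gives "i" followed by a combining dot above (U+0307). The sketch below applies the Turkish pairs (i/İ, ı/I) before falling back to the default mapping; `turkish_upper` and `turkish_lower` are hypothetical helpers, not standard-library functions.

```python
def turkish_upper(text: str) -> str:
    """Upper-case with Turkish rules: i -> İ, ı -> I."""
    return text.replace("i", "\u0130").replace("\u0131", "I").upper()


def turkish_lower(text: str) -> str:
    """Lower-case with Turkish rules: I -> ı, İ -> i."""
    return text.replace("I", "\u0131").replace("\u0130", "i").lower()


print("istanbul".upper())          # default mapping loses the dot
print(turkish_upper("istanbul"))   # keeps the dotted capital İ
print(turkish_lower("D\u0130YARBAKIR"))
```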

Turkish is alphabetized as follows:

   A (Â) B C Ç D E F G Ğ H Iı İi J K L M N O Ö P (Q) R S Ş T U Ü Û V (W) (X) Y Z

Welsh

Welsh is spoken by about 600 000 people in Wales.

There are no accepted alternatives to accented characters in Welsh.

Letter	Alternative	ASCII code (page 437)	ASCII code (page 850)	Unicode™
ä		132	132	00E4
Ä		142	142	00C4
â		131	131	00E2
Â		na*	182	00C2
á		160	160	00E1
Á		na	181	00C1
à		133	133	00E0
À		na	183	00C0
ë		137	137	00EB
Ë		na	211	00CB
ê		136	136	00EA
Ê		na	210	00CA
é		130	130	00E9
É		144	144	00C9
è		138	138	00E8
È		na	212	00C8
ï		139	139	00EF
Ï		na	216	00CF
î		140	140	00EE
Î		na	215	00CE
í		161	161	00ED
Í		na	214	00CD
ì		141	141	00EC
Ì		na	222	00CC
ö		148	148	00F6
Ö		153	153	00D6
ô		147	147	00F4
Ô		na	226	00D4
ó		162	162	00F3
Ó		na	224	00D3
ò		149	149	00F2
Ò		na	227	00D2
û		150	150	00FB
Û		na	234	00DB
ü		129	129	00FC
Ü		154	154	00DC
ú		163	163	00FA
Ú		na	233	00DA
ù		151	151	00F9
Ù		na	235	00D9
ŵ		na	na*	0175
Ŵ		na	na	0174
ẅ		na	na	1E85 (0077+0308)
Ẅ		na	na	1E84 (0057+0308)
ẃ		na	na	1E83 (0077+0301)
Ẃ		na	na	1E82 (0057+0301)
ẁ		na	na	1E81 (0077+0300)
Ẁ		na	na	1E80 (0057+0300)
v̈		na	na	0076+0308
V̈		na	na	0056+0308
v̂		na	na	0076+0302
V̂		na	na	0056+0302
ŷ		na	na	0177
Ŷ		na	na	0176
ÿ		152	152	00FF
Ÿ		na	na	0178
ý		na	236	00FD
Ý		na	237	00DD
ỳ		na	na	1EF3 (0079+0300)
Ỳ		na	na	1EF2 (0059+0300)

*na = not available as an ASCII code in the specified code page
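Several Welsh letters (ẃ, ẁ, ẅ, ỳ and their capitals) can be stored either as one precomposed code point or as a base letter followed by a combining mark, while v̈ and v̂ exist only as two-code-point sequences. The two forms look the same but compare unequal, so it is worth normalizing all data to one form (NFC is the usual choice) before matching or deduplicating. The sketch below uses Python's `unicodedata.normalize`; the helper name `nfc_codes` is hypothetical.

```python
import unicodedata


def nfc_codes(seq: str) -> str:
    """Normalize to NFC and list the resulting code points in the table's hex style."""
    return "+".join(f"{ord(c):04X}" for c in unicodedata.normalize("NFC", seq))


print(nfc_codes("w\u0301"))  # w + combining acute composes to one code point
print(nfc_codes("y\u0300"))  # y + combining grave composes to one code point
print(nfc_codes("v\u0308"))  # no precomposed v with diaeresis exists
```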

The Welsh alphabet has 31 letters and is alphabetized as follows:

   A (Â, Á, À, Ä) B C CH D DD E (Ê, É, È, Ë) F FF G NG H I (Î, Í, Ì, Ï) J K L LL M N O (Ô, Ó, Ò, Ö) P PH (Q) R RH S T TH U (Û, Ú, Ù, Ü) (V) W (Ŵ, Ẃ, Ẁ, Ẅ) (X) Y (Ŷ, Ý, Ỳ, Ÿ) Z

References

^ Ludvig Solnør, 18 December 2022; Ethnologue, https://www.ethnologue.com/browse/names/s, 19 December 2022; Wikipedia, https://en.wikipedia.org/wiki/Sámi_languages, 19 December 2022

Every effort is made to keep this resource updated. If you find any errors, or have any questions or requests, please don't hesitate to contact the author.

All information copyright Graham Rhind 2025. Any information used should be acknowledged and referenced.