Global Sourcebook for International Data Management
by Graham Rhind
Tips for practical management of data in different languages
Whether you hold your address database information in DOS, Windows™ or on another platform; whether your data is for a single country or for many; you will be certain to be faced with the challenge of storing diacritical marks (accents).
Most of the World’s languages (including English) contain one or more diacritical marks. These marks indicate that the letter has a different phonetic value or is stressed differently to the same letter without the mark. Though the marks may appear strange (and possibly irrelevant) to non-speakers of the language concerned, they are required to ensure that the text remains accurate and logical.
The first computer operating systems used a set of character codes called ASCII (American Standard Code for Information Interchange). These are reproduced by holding down the ALT key and typing a 1-, 2- or 3-number code on the key pad. Limited at that time by computer hardware and software to 256 characters (the 128 of ASCII proper plus 128 extended codes), this system cannot represent every character used in the World’s more than 6 000 languages. To overcome this limitation, a number of code pages were created, each of which was supposed to contain sufficient letters with diacritical marks to represent the alphabets of at least one language.
The most common code page used by English-speakers is 437. This contains the following letters with diacritical marks:
á, â, Ä, ä, à, Å, å, Æ, æ, Ç, ç, ê, ë, è, É, é, ï, î, ì, í, ñ, Ñ, ô, Ö, ö, ò, ó, ß, Ü, ü, û, ù, ú, ÿ
This is sufficient for a number of Western-European and Scandinavian languages, but there are serious omissions. A number of languages in the same group, such as Icelandic, are not represented. Certain letters in Scandinavian languages, such as the "Ø" in Danish and Norwegian are missing. Even for French, the letter “Œ” cannot be reproduced. Equally, a number of upper-case equivalents of lower-case letters with diacritical marks cannot be reproduced using this code page.
Code page 850 supplements code page 437 with the following diacritical marks:
Á, Â, À, ã, Ã, ð, Ð, Ê, Ë, È, Í, Ï, Ì, Ó, Ô, Ò, õ, Õ, Þ, þ, Ú, Û, Ù, ý, Ý.
Further code pages are 852 (Slavic languages/Hungarian); 860 (Portuguese); and 865 (Scandinavian).
Whilst this system for the most part allows the storage of data for a single country or language area, problems arise when trying to store data from more than one country or language area. Using code page 850, for example, will allow the storage of (most) addresses for Western Europe, but it cannot allow the correct storage of addresses from Eastern Europe, even for those languages written in the Latin script. In Hungarian, for example, code page 850 misses ő, Ő, ű and Ű. For Czech, the following letters with diacritical marks are missing: Č, č, Ď, ď, Ě, ě, Ň, ň, Ř, ř, Š, š, Ť, ť, Ů, ů, Ž and ž. Clearly, no accurate representation of the language can succeed without these characters.
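The check described above can be sketched in a few lines of Python. This is only an illustration: the function name missing_in_code_page is hypothetical, and it assumes that Python's built-in cp850 and cp852 codecs match the DOS code pages of the same numbers.

```python
def missing_in_code_page(text, code_page):
    """Return the characters of text that the given code page cannot encode."""
    missing = []
    for ch in text:
        try:
            ch.encode(code_page)
        except UnicodeEncodeError:
            # The code page has no code for this character.
            missing.append(ch)
    return missing


# The four Hungarian letters named above, tested against two code pages.
print(missing_in_code_page("őŐűŰ", "cp850"))
print(missing_in_code_page("őŐűŰ", "cp852"))
```

Running the same check over a whole address file before choosing a code page shows which letters would be lost.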
On the DOS/Windows™ platform, a character can look different on different code pages but retain the same code value. Of the Icelandic street type HLÍÐ, for example, only the HL will appear correctly when represented using code page 437, though the ASCII codes of its constituent letters remain the same. Changing the code page to 850 will allow this word to be represented correctly.
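The HLÍÐ example can be reproduced as a short Python sketch, assuming that Python's cp437 and cp850 codecs stand in for the DOS code pages. The stored code values stay the same; only their interpretation changes.

```python
word = "HLÍÐ"

# Store the word as code page 850 byte values.
stored = word.encode("cp850")

# Read the same, unchanged bytes back under each code page.
as_850 = stored.decode("cp850")
as_437 = stored.decode("cp437")

print(list(stored))  # the numeric code of each letter
print(as_850)        # the original word
print(as_437)        # only the first two letters survive
```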
Changing the code page is done differently under different operating platforms - refer to your documentation. In DOS/Windows™ systems, the code page (for screen output) is set in the autoexec.bat file by adding the lines:
MODE CON CODEPAGE PREPARE=((nnn) [drive path filename])
MODE CON CODEPAGE SELECT=nnn
where nnn is the code page number and drive path filename is the file containing the code page information (having the extension .CPI). For example, for code page 850, the lines might read:
MODE CON CODEPAGE PREPARE=((850) C:\WINDOWS\COMMAND\EGA.CPI)
MODE CON CODEPAGE SELECT=850
To check which code page is active, type
MODE
at the DOS prompt.
ANSI codes (American National Standards Institute) are used with systems such as Microsoft Windows™. Each ANSI page allows the use of fonts which in turn allow for the reproduction of 224 different characters. This allows the reproduction of most of the characters required for the World’s languages in such systems as word-processing programs by changing the font within the document. There are, however, few database programs which allow fonts to be changed within fields or within tables. These programs take as default the ANSI page defined by the underlying Windows™ operating system.
Whilst ANSI allows the output of letters with diacritical marks (assuming a compliant printer and font type), the large problem of data transfer between platforms and software remains. Transferring data between one ASCII page and another; one ANSI page and another; or between ASCII and ANSI pages will necessarily change the appearance (and in the last case the value) of the letter that you are trying to reproduce. This can cause immense damage to the data and make correct storage and output impossible.
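A minimal Python sketch of such a transfer follows. It assumes code page 437 as the ASCII (DOS) page and Windows code page 1252 as the ANSI page; the helper name convert is hypothetical. It shows that the value of the letter changes in a correct conversion, and that skipping the conversion damages the data.

```python
def convert(data, source_page, target_page):
    """Re-encode bytes from one code page to another."""
    return data.decode(source_page).encode(target_page)


dos_bytes = "é".encode("cp437")                    # value 130 on the DOS page
win_bytes = convert(dos_bytes, "cp437", "cp1252")  # value 233 on the ANSI page

# Reading the DOS byte as if it were ANSI, without converting, gives a
# different character altogether.
misread = dos_bytes.decode("cp1252")

print(list(dos_bytes), list(win_bytes), misread)
```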
An alternative methodology for storing character values is The Unicode™ Standard. This avoids the limitations of hardware, platform and software by not attempting to represent a diacritic using a single character but instead using a four-character code (composed of standard alpha-numeric characters with the same values regardless of code page) to indicate which diacritic mark is to be represented. This code in turn is usually enclosed between other symbols (e.g. <00C1>) to distinguish from the rest of the text. This code system is increasingly being included in new operating systems and software. Here are a number of Unicode™ values:
Á | 00C1 |
á | 00E1 |
à | 00C3 |
É | 00C9 |
Ñ | 00D1 |
Ò | 00D2 |
Ð | 00D0 |
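Values of this kind can be looked up rather than memorized. The following Python sketch prints the four-character hexadecimal value of a letter; the function name unicode_value is hypothetical, and the letter is normalized first on the assumption that it may arrive as a base letter plus a combining mark.

```python
import unicodedata


def unicode_value(letter):
    """Return the four-digit hexadecimal Unicode value of a single letter."""
    composed = unicodedata.normalize("NFC", letter)
    return f"{ord(composed):04X}"


for letter in "ÁáÃÉÑÒ":
    print(letter, unicode_value(letter))
```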
Unfortunately, I can only draw your attention to the issue of diacritical marks without providing a solution for reproducing these marks for all languages within a single database system. To the best of my knowledge, an all encompassing solution does not yet exist. Aim, however, to use operating systems and software which have Unicode™ support so that you will in the near future be able to handle diacritical marks better than is currently possible.
Accents
The tables below cover a large number of European languages and some other languages written in Latin script.
The tables show the diacritical marks contained in the language concerned, the alternative ‘non-ASCII’ equivalent where available (i.e. an alternative without diacritical marks), the ASCII code of the accented letter (if available) in code pages 437 and 850, and the Unicode™ value.
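For letters not listed, a row in the same layout can be derived programmatically. The sketch below is an illustration only: the names code_in_page and table_row are hypothetical, Python's cp437 and cp850 codecs are assumed to match the code pages used here, and None stands for "na".

```python
def code_in_page(letter, code_page):
    """Return the numeric code of letter in code_page, or None if absent."""
    try:
        encoded = letter.encode(code_page)
    except UnicodeEncodeError:
        return None
    return encoded[0]


def table_row(letter):
    """Return (letter, code in page 437, code in page 850, Unicode value)."""
    return (
        letter,
        code_in_page(letter, "cp437"),
        code_in_page(letter, "cp850"),
        f"{ord(letter):04X}",
    )


for letter in "âÂč":
    print(table_row(letter))
```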
Whilst it is not always possible to type the correctly accented letter using ASCII, accented letters can be output on most printers using alternative ASCII codes. Thus, for example, typing the code to produce a + symbol would, by using an alternative font set, be output as an accented character. To do this, however, means that you have to maintain a consistent platform for your output devices and you will have to use a different printing set for each language to be printed as all accents cannot be covered in a single set.
NB: Alphabetization: in the alphabetization lists, letters between brackets immediately following another letter indicate that this letter is included in the sort for the previous letter, i.e. it is given the same value as the preceding letter for sorting purposes. Those between brackets on their own are letters borrowed from other languages and are sorted, when found, in the position shown. When two “letters” are shown in upper case, these letters form a single letter in the language concerned. The alphabetization tables include only upper case versions of letters. When a lower case form of a letter differs markedly in appearance from its upper case equivalent, the lower case version is printed immediately next to the upper case version.
Albanian
Spoken by about 4 million people, in Albania, Kosovo, Macedonia, Serbia, Montenegro, Bulgaria, Romania and Italy.
Letter | Alternative | ASCII code (page 437) | ASCII code (page 850) | Unicode™ |
â | | 131 | 131 | 00E2 |
Â | | na* | 182 | 00C2 |
ç | | 135 | 135 | 00E7 |
Ç | | 128 | 128 | 00C7 |
ë | | 137 | 137 | 00EB |
Ë | | na | 211 | 00CB |
*na = not available as an ASCII code in the specified code page
Albanian is alphabetized as follows:
A B C Ç D E Ë F G H I J K L M N O P Q R S T U V W X Y Z
Basque
Basque is spoken by some 700 000 people in the Spanish regions of País-Vasco and Navarra, and in south-eastern France.
Letter | Alternative | ASCII code (page 437) | ASCII code (page 850) | Unicode™ |
ç | | 135 | 135 | 00E7 |
Ç | | 128 | 128 | 00C7 |
é | | 130 | 130 | 00E9 |
É | | 144 | 144 | 00C9 |
ñ | | 164 | 164 | 00F1 |
Ñ | | 165 | 165 | 00D1 |
ü | | 129 | 129 | 00FC |
Ü | | 154 | 154 | 00DC |
Basque is alphabetized as follows:
A B C (Ç) D E F G H I J K L M N Ñ O P Q R S T U (Ü) V W X Y Z
Breton
Breton, a Celtic language, is spoken in the westernmost parts of Brittany, France, by some 600 000 people.
Apostrophes are used within words in Breton.
Letter | Alternative | ASCII code (page 437) | ASCII code (page 850) | Unicode™ |
â | | 131 | 131 | 00E2 |
Â | | na* | 182 | 00C2 |
ê | | 136 | 136 | 00EA |
Ê | | na | 210 | 00CA |
ñ | | 164 | 164 | 00F1 |
Ñ | | 165 | 165 | 00D1 |
ù | | 151 | 151 | 00F9 |
Ù | | na | 235 | 00D9 |
ü | | 129 | 129 | 00FC |
Ü | | 154 | 154 | 00DC |
*na = not available as an ASCII code in the specified code page
Breton is alphabetized as follows:
A (Â) B (C) Ch C’h D E (Ê) F G H I J K L M N Ñ O P (Q) R S T U (Ù,Ü) V W (X) Y Z
Catalan
There are about 10 540 000 speakers of Catalan - in Spain it is spoken by some 5 980 000 people in Catalonia, 3 350 000 people in Valencia (where it is called Valencian and is considered by some a distinct language), 755 000 people in the Balearic Islands, 48 000 people in the eastern part of Aragon and 2 000 people in Murcia. It is the national language of Andorra and is spoken there by 38 000 people. There are 330 000 speakers in Roussillon in south-eastern France and 37 000 speakers in the town of Alghero (L’Alguer) on Sardinia, Italy.
Letter | Alternative | ASCII code (page 437) | ASCII code (page 850) | Unicode™ |
à | | 133 | 133 | 00E0 |
À | | na* | 183 | 00C0 |
ç | | 135 | 135 | 00E7 |
Ç | | 128 | 128 | 00C7 |
é | | 130 | 130 | 00E9 |
É | | 144 | 144 | 00C9 |
è | | 138 | 138 | 00E8 |
È | | na | 212 | 00C8 |
í | | 161 | 161 | 00ED |
Í | | na | 214 | 00CD |
ï | | 139 | 139 | 00EF |
Ï | | na | 216 | 00CF |
ŀ | | na | na | 0140 |
Ŀ | | na | na | 013F |
ó | | 162 | 162 | 00F3 |
Ó | | na | 224 | 00D3 |
ò | | 149 | 149 | 00F2 |
Ò | | na | 227 | 00D2 |
ú | | 163 | 163 | 00FA |
Ú | | na | 233 | 00DA |
*na = not available as an ASCII code in the specified code page
In Catalan, abbreviations of plurals are doubled. For example, the Catalan abbreviation for the United States is EE.UU. The full stops must be added after each doubled letter, never between the two letters of a pair.
Catalan is alphabetized as follows:
A (À) B C (Ç) D E (É, È) F G H I (Í, Ï) J K L Ŀl M N O (Ó, Ò) P Q R S T U (Ú, Ü) V W X Y Z
In Catalan a middle dot (“·”) may appear within words between ells to indicate differences in pronunciation. “ll” will indicate a single sound as in Spanish. “l·l” will indicate an ell sound more akin to the English. For example “Paral·lel”, “Col·legi”. This dot may be found in databases as a “.” or a hyphen (“-”). It should not be removed.
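Where the dot has been stored as a full stop or hyphen, it can be restored with a pattern match. The Python sketch below is a rough illustration: the function name restore_middle_dot is hypothetical, and it assumes that any “.” or “-” found directly between two ells stands for the middle dot, which may not hold for abbreviations or hyphenated compounds.

```python
import re

# A "." or "-" with an ell immediately before and after it.
_BETWEEN_ELLS = re.compile(r"(?<=[lL])[.\-](?=[lL])")


def restore_middle_dot(text):
    """Replace a stand-in full stop or hyphen between ells with a middle dot."""
    return _BETWEEN_ELLS.sub("·", text)


print(restore_middle_dot("Paral.lel"))
print(restore_middle_dot("Col-legi"))
```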
Croatian
Although Serbian and Croat are basically the same language, the former is written in Cyrillic script, the latter in Roman. Croatian is spoken by about 6 million people in Croatia, Slovenia, Bosnia-Hercegovina and Serbia.
Letter | Alternative | ASCII code (page 437) | ASCII code (page 850) | Unicode™ |
č | cc | na* | na* | 010D |
Č | Cc | na | na | 010C |
ć | ch | na | na | 0107 |
Ć | Ch | na | na | 0106 |
đ | dj | na | na | 0111 |
Đ | Dj | na | na | 0110 |
š | sh | na | na | 0161 |
Š | Sh | na | na | 0160 |
ž | zz | na | na | 017E |
Ž | Zh | na | na | 017D |
*na = not available as an ASCII code in the specified code page
Croatian is alphabetized as follows:
A B C Ć Č D Đ E F G H I J K L M N O P Q R S Š T U V (W) (X) (Y) Z Ž
Czech
Czech is spoken by about 10 million people in Czechia (the Czech Republic).
Letter | Alternative | ASCII code (page 437) | ASCII code (page 850) | Unicode™ |
á | | 160 | 160 | 00E1 |
Á | | na* | 181 | 00C1 |
č | | na | na* | 010D |
Č | | na | na | 010C |
ď | | na | na | 010F |
Ď | | na | na | 010E |
é | | 130 | 130 | 00E9 |
É | | 144 | 144 | 00C9 |
ě | | na | na | 011B |
Ě | | na | na | 011A |
í | | 161 | 161 | 00ED |
Í | | na | 214 | 00CD |
ň | | na | na | 0148 |
Ň | | na | na | 0147 |
ó | | 162 | 162 | 00F3 |
Ó | | na | 224 | 00D3 |
ř | | na | na | 0159 |
Ř | | na | na | 0158 |
š | | na | na | 0161 |
Š | | na | na | 0160 |
ť | | na | na | 0165 |
Ť | | na | na | 0164 |
ú | | 163 | 163 | 00FA |
Ú | | na | 233 | 00DA |
ů | | na | na | 016F |
Ů | | na | na | 016E |
ý | | na | 236 | 00FD |
Ý | | na | 237 | 00DD |
ž | | na | na | 017E |
Ž | | na | na | 017D |
*na = not available as an ASCII code in the specified code page
Czech is alphabetized as follows:
A Á B C Č D Ď E É Ě F G H Ch I Í J K L M N Ň O Ó P Q R Ř S Š T Ť U Ú Ů V W X Y Ý Z Ž
Danish
Danish is spoken by 5 million people in Denmark, as well as some inhabitants of the Faeroe Islands / Faroe Islands, Greenland and northern Germany.
Letter | Alternative | ASCII code (page 437) | ASCII code (page 850) | Unicode™ |
á | | 160 | 160 | 00E1 |
Á | | na* | 181 | 00C1 |
å | aa | 134 | 134 | 00E5 |
Å | AA | 143 | 143 | 00C5 |
æ | ae | 145 | 145 | 00E6 |
Æ | AE | 146 | 146 | 00C6 |
é | | 130 | 130 | 00E9 |
É | | 144 | 144 | 00C9 |
í | | 161 | 161 | 00ED |
Í | | na | 214 | 00CD |
ó | | 162 | 162 | 00F3 |
Ó | | na | 224 | 00D3 |
ø | oe | na | 155 | 00F8 |
Ø | OE | na | 157 | 00D8 |
ú | | 163 | 163 | 00FA |
Ú | | na | 233 | 00DA |
ý | | na | 236 | 00FD |
Ý | | na | 237 | 00DD |
*na = not available as an ASCII code in the specified code page
Note: aa was officially replaced by å in 1948 in all words, but aa remains allowed in place names such as Aalborg and in personal names.
Danish is alphabetized as follows:
A (Á) B C D E (É) F G H I (Í) J K L M N O (Ó) P Q R S T U (Ú) V W X Y (Ý) Z Æ Ø Å (AA)
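An alphabetization list of this kind can be turned into a sort key. The Python sketch below does so for the Danish order given above; the names DANISH_ORDER, SAME_AS and danish_sort_key are hypothetical. It makes two simplifying assumptions: every “aa” is treated as å, which is not true of all words, and characters outside the alphabet (spaces, hyphens) are ignored.

```python
# Letters in the order of the list above; Æ, Ø and Å come after Z.
DANISH_ORDER = "ABCDEFGHIJKLMNOPQRSTUVWXYZÆØÅ"

# Bracketed letters take the value of the letter they follow.
SAME_AS = {"Á": "A", "É": "E", "Í": "I", "Ó": "O", "Ú": "U", "Ý": "Y"}


def danish_sort_key(text):
    """Return a list of letter positions usable as a sort key."""
    upper = text.upper().replace("AA", "Å")
    key = []
    for ch in upper:
        base = SAME_AS.get(ch, ch)
        if base in DANISH_ORDER:
            key.append(DANISH_ORDER.index(base))
    return key


towns = ["Århus", "Ribe", "Aalborg", "Ærø", "Odense", "Østerbro"]
print(sorted(towns, key=danish_sort_key))
```

A plain sort by code value would instead place Aalborg first and scatter Æ, Ø and Å according to whichever code page is active.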
Dutch
Dutch is spoken by about 14 million people in The Netherlands and about 5 million Belgians. There is a small Dutch-speaking minority in northern France.
Letter | Alternative | ASCII code (page 437) | ASCII code (page 850) | Unicode™ |
æ | ae | 145 | 145 | 00E6 |
Æ | AE | 146 | 146 | 00C6 |
ä | a | 132 | 132 | 00E4 |
Ä | A | 142 | 142 | 00C4 |
ë | e | 137 | 137 | 00EB |
Ë | E | na* | 211 | 00CB |
é | e | 130 | 130 | 00E9 |
É | E | 144 | 144 | 00C9 |
è | e | 138 | 138 | 00E8 |
È | E | na | 212 | 00C8 |
ê | e | 136 | 136 | 00EA |
Ê | E | na | 210 | 00CA |
ï | i | 139 | 139 | 00EF |
Ï | I | na | 216 | 00CF |
ö | o | 148 | 148 | 00F6 |
Ö | O | 153 | 153 | 00D6 |
ó | o | 162 | 162 | 00F3 |
Ó | O | na | 224 | 00D3 |
ò | o | 149 | 149 | 00F2 |
Ò | O | na | 227 | 00D2 |
ô | o | 147 | 147 | 00F4 |
Ô | O | na | 226 | 00D4 |
ü | u | 129 | 129 | 00FC |
Ü | U | 154 | 154 | 00DC |
ij | ij or y | 152 | 152 | 0133 |
IJ | IJ or Y | na | na* | 0132 |
*na = not available as an ASCII code in the specified code page
NB: Only ë and é and their upper case equivalents are commonly found in Dutch. Note that the letter ij is a single letter in the Dutch alphabet, coming between y and z, but it is always, without exception, typed as two letters - i and j - in normal usage, or it is written as a y. You should also do this. Note, however, that when these occur at the beginning of a proper noun, both the I and the J must be in upper case, e.g. Krimpen aan de IJssel.
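Standard capitalization routines upper-case only the first character and so produce “Ijssel”. A minimal Python sketch of a Dutch-aware alternative follows; the function name capitalize_dutch is hypothetical, and it assumes a single word with ij typed as two letters.

```python
def capitalize_dutch(word):
    """Capitalize one word, keeping an initial IJ entirely in upper case."""
    if word[:2].lower() == "ij":
        return "IJ" + word[2:].lower()
    return word[:1].upper() + word[1:].lower()


print(capitalize_dutch("ijssel"))
print(capitalize_dutch("amsterdam"))
```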
Dutch is alphabetized as follows:
A (Ä) B C D E (Ë) F G H I (Ï) J K L M N O (Ö) P Q R S T U (Ü) V W X Y IJ Z
English
English is spoken by about 322 000 000 people throughout the World, in many countries as a second language. It is spoken in the United Kingdom, Ireland, the United States of America, Canada, Australia, New Zealand, South Africa and many former British colonies.
Letter | Alternative | ASCII code (page 437) | ASCII code (page 850) | Unicode™ |
à | a | 133 | 133 | 00E0 |
À | A | na* | 183 | 00C0 |
ç | c | 135 | 135 | 00E7 |
Ç | C | 128 | 128 | 00C7 |
ë | e | 137 | 137 | 00EB |
Ë | E | na | 211 | 00CB |
é | e | 130 | 130 | 00E9 |
É | E | 144 | 144 | 00C9 |
è | e | 138 | 138 | 00E8 |
È | E | na | 212 | 00C8 |
ê | e | 136 | 136 | 00EA |
Ê | E | na | 210 | 00CA |
ï | i | 139 | 139 | 00EF |
Ï | I | na | 216 | 00CF |
ñ | n | 164 | 164 | 00F1 |
Ñ | N | 165 | 165 | 00D1 |
ô | o | 147 | 147 | 00F4 |
Ô | O | na | 226 | 00D4 |
ö | o | 148 | 148 | 00F6 |
Ö | O | 153 | 153 | 00D6 |
*na = not available as an ASCII code in the specified code page
English words are often written without diacritical marks because of ignorance, but many words, especially of French origin, such as façade, rôle, éclair, belovèd, naïve and so on, should correctly be written using a diacritic mark.
English is alphabetized as follows:
A (À) B C (Ç) D E (É È Ê Ë) F G H I (Ï) J K L M N (Ñ) O (Ö Ô) P Q R S T U V W X Y Z
Estonian
Estonian is spoken by almost a million people in Estonia, Russia and Latvia.
It is written in the Latin, not the Cyrillic, script.
Letter | Alternative | ASCII code (page 437) | ASCII code (page 850) | Unicode™ |
ä | | 132 | 132 | 00E4 |
Ä | | 142 | 142 | 00C4 |
ö | | 148 | 148 | 00F6 |
Ö | | 153 | 153 | 00D6 |
õ | | na* | 228 | 00F5 |
Õ | | na | 229 | 00D5 |
š | | na | na* | 0161 |
Š | | na | na | 0160 |
ü | | 129 | 129 | 00FC |
Ü | | 154 | 154 | 00DC |
ž | | na | na | 017E |
Ž | | na | na | 017D |
*na = not available as an ASCII code in the specified code page
Estonian is alphabetized as follows:
A B (C) D E F G H I J K L M N O P (Q) R S Š Z Ž T U V (W) (X) Õ Ä Ö Ü (Y)
Faroese
Faroese is spoken by most of the 40 000 inhabitants of the Faeroe Islands / Faroe Islands. It is related to Icelandic and resembles old Norse.
Letter | Alternative | ASCII code (page 437) | ASCII code (page 850) | Unicode™ |
á | | 160 | 160 | 00E1 |
Á | | na* | 181 | 00C1 |
æ | | 145 | 145 | 00E6 |
Æ | | 146 | 146 | 00C6 |
ð | | na | 208 | 00F0 |
Ð | | na | 209 | 00D0 |
í | | 161 | 161 | 00ED |
Í | | na | 214 | 00CD |
ó | | 162 | 162 | 00F3 |
Ó | | na | 224 | 00D3 |
ø | | na | 155 | 00F8 |
Ø | | na | 157 | 00D8 |
ú | | 163 | 163 | 00FA |
Ú | | na | 233 | 00DA |
ý | | na | 236 | 00FD |
Ý | | na | 237 | 00DD |
*na = not available as an ASCII code in the specified code page
Faroese is alphabetized as follows:
A Á B C D Ð E F G H I Í J K L M N O Ó P Q R S T U Ú V W X Y Ý Z Æ Ø
Finnish
Finnish is spoken by about 4.5 million people in Finland, and by about 50 000 people in Russia and 30 000 in northern Sweden.
Letter | Alternative | ASCII code (page 437) | ASCII code (page 850) | Unicode™ |
ä | a | 132 | 132 | 00E4 |
Ä | A | 142 | 142 | 00C4 |
å | a | 134 | 134 | 00E5 |
Å | A | 143 | 143 | 00C5 |
ö | o | 148 | 148 | 00F6 |
Ö | O | 153 | 153 | 00D6 |
š | s | na* | na | 0161 |
Š | S | na | na | 0160 |
ž | z | na | na | 017E |
Ž | Z | na | na | 017D |
*na = not available as an ASCII code in the specified code page
Finnish is alphabetized as follows:
A B (C) D E F G H I J K L M N O P (Q) R S Š T U V (W) (X) Y Z Ž (Å) Ä Ö
French
French is spoken by about 72 million people throughout the World; 56 million people in France and Monaco, 6 million people in Canada, 4 million people in Belgium, 3 million people in Switzerland, 1 million in the United States of America and about 300 000 people in Luxembourg. It is also spoken, often as a second language, by inhabitants of France’s ex-colonies.
Letter | Alternative | ASCII code (page 437) | ASCII code (page 850) | Unicode™ |
à | a | 133 | 133 | 00E0 |
À | A | na* | 183 | 00C0 |
â | a | 131 | 131 | 00E2 |
 | A | na | 182 | 00C2 |
æ | ae | 145 | 145 | 00E6 |
Æ | AE | 146 | 146 | 00C6 |
ç | c | 135 | 135 | 00E7 |
Ç | C | 128 | 128 | 00C7 |
ë | e | 137 | 137 | 00EB |
Ë | E | na | 211 | 00CB |
é | e | 130 | 130 | 00E9 |
É | E | 144 | 144 | 00C9 |
è | e | 138 | 138 | 00E8 |
È | E | na | 212 | 00C8 |
ê | e | 136 | 136 | 00EA |
Ê | E | na | 210 | 00CA |
î | i | 140 | 140 | 00EE |
Î | I | na | 215 | 00CE |
ï | i | 139 | 139 | 00EF |
Ï | I | na | 216 | 00CF |
ô | o | 147 | 147 | 00F4 |
Ô | O | na | 226 | 00D4 |
œ | oe | na | na* | 0153 |
Œ | OE | na | na | 0152 |
ù | u | 151 | 151 | 00F9 |
Ù | U | na | 235 | 00D9 |
û | u | 150 | 150 | 00FB |
Û | U | na | 234 | 00DB |
ü | u | 129 | 129 | 00FC |
Ü | U | 154 | 154 | 00DC |
ÿ | y | 152 | 152 | 00FF |
Ÿ | Y | na | na | 0178 |
*na = not available as an ASCII code in the specified code page
French-speakers rarely assign accents to upper-case letters. Some listings will simply omit accents in upper-case letters, others will use the lower-case accented equivalent even where the rest of the word is in upper case.
French is alphabetized as follows:
A (Â, À, Æ) B C (Ç) D E (É, Ê, È, Ë) F G H I (Î, Ï) J K L M N O (Ô, Œ) P Q R S T U (Û, Ù) V W X Y (Ÿ) Z
Friesian or Frisian (West)
There are about 300 000 speakers of West Friesian in the province of Friesland in the northern Netherlands.
Letter | Alternative | ASCII code (page 437) | ASCII code (page 850) | Unicode™ |
â | | 131 | 131 | 00E2 |
Â | | na* | 182 | 00C2 |
ä | | 132 | 132 | 00E4 |
Ä | | 142 | 142 | 00C4 |
é | | 130 | 130 | 00E9 |
É | | 144 | 144 | 00C9 |
ê | | 136 | 136 | 00EA |
Ê | | na | 210 | 00CA |
ë | | 137 | 137 | 00EB |
Ë | | na | 211 | 00CB |
ï | | 139 | 139 | 00EF |
Ï | | na | 216 | 00CF |
ô | | 147 | 147 | 00F4 |
Ô | | na | 226 | 00D4 |
ö | | 148 | 148 | 00F6 |
Ö | | 153 | 153 | 00D6 |
û | | 150 | 150 | 00FB |
Û | | na | 234 | 00DB |
ú | | 163 | 163 | 00FA |
Ú | | na | 233 | 00DA |
ü | | 129 | 129 | 00FC |
Ü | | 154 | 154 | 00DC |
*na = not available as an ASCII code in the specified code page
West Friesian is alphabetized as follows:
A (Â, Ä) B C D E (É, Ê, Ë) F G H I (Ï, Y) J K L M N O (Ô, Ö) P Q R S T U (Ú, Û, Ü) V W X Z
Friulian
There are some 600 000 speakers of Friulian in Northeast Italy and a small number in Slovenia.
Letter | Alternative | ASCII code (page 437) | ASCII code (page 850) | Unicode™ |
à | | 133 | 133 | 00E0 |
À | | na* | 183 | 00C0 |
á | | 160 | 160 | 00E1 |
Á | | na | 181 | 00C1 |
â | | 131 | 131 | 00E2 |
Â | | na | 182 | 00C2 |
è | | 138 | 138 | 00E8 |
È | | na | 212 | 00C8 |
ì | | 141 | 141 | 00EC |
Ì | | na | 222 | 00CC |
ò | | 149 | 149 | 00F2 |
Ò | | na | 227 | 00D2 |
ù | | 151 | 151 | 00F9 |
Ù | | na | 235 | 00D9 |
*na = not available as an ASCII code in the specified code page
Friulian is alphabetized as follows:
A À Á Â B C D E È F G H I Ì J K L M N O Ò P Q R S T U (Ù) V W X Y Z
Gaelic
Gaelic is spoken in two distinct varieties in Scotland and Ireland. It has almost 20 000 speakers in the former and some 500 000 in the latter.
Letter | Alternative | ASCII code (page 437) | ASCII code (page 850) | Unicode™ |
à | | 133 | 133 | 00E0 |
À | | na* | 183 | 00C0 |
á | | 160 | 160 | 00E1 |
Á | | na | 181 | 00C1 |
é | | 130 | 130 | 00E9 |
É | | 144 | 144 | 00C9 |
è | | 138 | 138 | 00E8 |
È | | na | 212 | 00C8 |
í | | 161 | 161 | 00ED |
Í | | na | 214 | 00CD |
ì | | 141 | 141 | 00EC |
Ì | | na | 222 | 00CC |
ó | | 162 | 162 | 00F3 |
Ó | | na | 224 | 00D3 |
ò | | 149 | 149 | 00F2 |
Ò | | na | 227 | 00D2 |
ú | | 163 | 163 | 00FA |
Ú | | na | 233 | 00DA |
ù | | 151 | 151 | 00F9 |
Ù | | na | 235 | 00D9 |
*na = not available as an ASCII code in the specified code page
Irish is alphabetized as follows:
A (Á) B C D E (É) F G H I (Í) J (K) L M N O (Ó) P Q R S T U (Ú) V W X Y Z
The Scottish Gaelic alphabet has 18 letters and is alphabetized as follows:
A (À, Á) B C D E (È, É) F G H I (Ì) (J) (K) L M N O (Ò, Ó) P (Q) R S T U (Ù) (V) (W) (X) (Y) (Z)
Galician
Galician is spoken by over 3 million people in Spain.
Letter | Alternative | ASCII code (page 437) | ASCII code (page 850) | Unicode™ |
á | | 160 | 160 | 00E1 |
Á | | na* | 181 | 00C1 |
é | | 130 | 130 | 00E9 |
É | | 144 | 144 | 00C9 |
í | | 161 | 161 | 00ED |
Í | | na | 214 | 00CD |
ñ | | 164 | 164 | 00F1 |
Ñ | | 165 | 165 | 00D1 |
ó | | 162 | 162 | 00F3 |
Ó | | na | 224 | 00D3 |
ú | | 163 | 163 | 00FA |
Ú | | na | 233 | 00DA |
ü | | 129 | 129 | 00FC |
Ü | | 154 | 154 | 00DC |
*na = not available as an ASCII code in the specified code page
Galician is alphabetized as follows:
A Á B C D E (É) F G H I (Í) J K L M N Ñ O (Ó) P Q R S T U (Ú Ü) V W X Y Z
German
German is one of the most widely spread languages in Europe, spoken by about 80 million people in Germany, 7.7 million people in Austria, 4.4 million people in Switzerland, 66 000 people in Belgium, 28 500 people in the South Tyrol region of Italy and 29 000 people in Liechtenstein, as well as by smaller minorities in southern Denmark, eastern France and Luxembourg. There are large regional variations in the form of German spoken, especially between Germany, Switzerland and Austria.
Letter | Alternative | ASCII code (page 437) | ASCII code (page 850) | Unicode™ |
ä | ae | 132 | 132 | 00E4 |
Ä | AE | 142 | 142 | 00C4 |
ö | oe | 148 | 148 | 00F6 |
Ö | OE | 153 | 153 | 00D6 |
ü | ue | 129 | 129 | 00FC |
Ü | UE | 154 | 154 | 00DC |
ß | ss | 225 | 225 | 00DF |
The Scharfes S (“ß”) is used only in lower case. The upper case equivalent is SS. This symbol is NOT used in Swiss-German. Its use is also defined by the new German spelling rules, introduced on 1st August 2005, but which are not used in Bavaria or Nordrhein-Westfalen. Where alternate spellings are possible, this has been noted within the text of this book.
It is now allowed to write three consonants together without a hyphen. As this change is recent, it may occur that where three esses come together in German, a hyphen is used as follows:
Strauss-strasse
Instead of
Straussstrasse
All nouns in German are written with the first letter as a capital.
German is alphabetized as follows:
A (Ä) B C D E F G H I J K L M N O (Ö) P Q R S (ß=SS) T U (Ü) V W X Y Z
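The Alternative column of the German table can be applied mechanically where a system cannot hold the accented letters. The Python sketch below is an illustration only: the names GERMAN_ALTERNATIVES and to_alternative are hypothetical, and the mapping is taken directly from that column.

```python
# One entry per row of the German table: letter -> alternative.
GERMAN_ALTERNATIVES = str.maketrans({
    "ä": "ae", "Ä": "AE",
    "ö": "oe", "Ö": "OE",
    "ü": "ue", "Ü": "UE",
    "ß": "ss",
})


def to_alternative(text):
    """Replace each German accented letter with its alternative spelling."""
    return text.translate(GERMAN_ALTERNATIVES)


print(to_alternative("Straße"))
print(to_alternative("Müller"))
```

The replacement is not safely reversible: “ue” in a name such as Bauer is not an alternative spelling of ü.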
Greek
Greek is spoken by about 10 million people in Greece and Cyprus. It has its own alphabet which has to be transliterated for use with databases containing Roman script. The table below gives the transliteration symbol of each Greek character. Each character is given in upper case then lower case version:
Letter | Name | Transliteration | Unicode™ |
Αα | álfa | a | 0391, 03B1 |
Ββ | víta | v | 0392, 03B2 |
Γγ | gháma | gh (y before an e sound) | 0393, 03B3 |
Δδ | thélta | th | 0394, 03B4 |
Εε | épsilon | e | 0395, 03B5 |
Ζζ | zíta | z | 0396, 03B6 |
Ηη | íta | i | 0397, 03B7 |
Θθ | thíta | th | 0398, 03B8 |
Ιι | yóta | i | 0399, 03B9 |
Κκ | kápa | k | 039A, 03BA |
Λλ | lámtha | l | 039B, 03BB |
Μμ | mi | m | 039C, 03BC |
Νν | ni | n | 039D, 03BD |
Ξξ | ksi | ks | 039E, 03BE |
Οο | ómikron | o | 039F, 03BF |
Ππ | pi | p | 03A0, 03C0 |
Ρρ | ro | r | 03A1, 03C1 |
Σσς | sigma | s | 03A3, 03C3, 03C2 |
Ττ | taf | t | 03A4, 03C4 |
Υυ | ípsilon | i | 03A5, 03C5 |
Φφ | fi | f | 03A6, 03C6 |
Χχ | hi | h | 03A7, 03C7 |
Ψψ | psi | ps | 03A8, 03C8 |
Ωω | omégha | o | 03A9, 03C9 |
Note that the letter ς is only used at the end of a word.
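The transliteration table can be applied letter by letter. The Python sketch below is a simplified illustration: the names TRANSLITERATION and transliterate_greek are hypothetical, accents are stripped before lookup, the result is returned in lower case, and the rule that γ becomes y before an e sound is not implemented.

```python
import unicodedata

# Lower case Greek letter -> transliteration, taken from the table above.
TRANSLITERATION = {
    "α": "a", "β": "v", "γ": "gh", "δ": "th", "ε": "e", "ζ": "z",
    "η": "i", "θ": "th", "ι": "i", "κ": "k", "λ": "l", "μ": "m",
    "ν": "n", "ξ": "ks", "ο": "o", "π": "p", "ρ": "r", "σ": "s",
    "ς": "s", "τ": "t", "υ": "i", "φ": "f", "χ": "h", "ψ": "ps",
    "ω": "o",
}


def transliterate_greek(text):
    """Transliterate Greek text to lower case Roman script."""
    # Decompose accented letters, then drop the combining accent marks.
    decomposed = unicodedata.normalize("NFD", text.lower())
    bare = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Characters not in the table (spaces, digits) pass through unchanged.
    return "".join(TRANSLITERATION.get(ch, ch) for ch in bare)


print(transliterate_greek("Αθήνα"))
print(transliterate_greek("Πάτρα"))
```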
Greenlandic
Greenlandic is spoken by 40 000 people in Greenland and a further 7 000 people in Denmark.
Letter | Alternative | ASCII code (page 437) | ASCII code (page 850) | Unicode™ |
K’ | | na* | na | 0138 |
æ | | 145 | 145 | 00E6 |
Æ | | 146 | 146 | 00C6 |
ø | | na | 155 | 00F8 |
Ø | | na | 157 | 00D8 |
å | | 134 | 134 | 00E5 |
Å | | 143 | 143 | 00C5 |
*na = not available as an ASCII code in the specified code page
Greenlandic is alphabetized as follows:
A B C D E F G H I J K L M N O P Q (K’) R S T U V W X Y Z Æ Ø Å
Hungarian
Hungarian is spoken by about 10 million people in Hungary, 1.5 million people in Romania, and by minorities in Slovakia, Slovenia, Croatia and Serbia.
Letter | Alternative | ASCII code (page 437) | ASCII code (page 850) | Unicode™ |
á | a’ | 160 | 160 | 00E1 |
Á | A’ | na* | 181 | 00C1 |
é | e’ | 130 | 130 | 00E9 |
É | E’ | 144 | 144 | 00C9 |
í | i’ | 161 | 161 | 00ED |
Í | I’ | na | 214 | 00CD |
ó | o’ | 162 | 162 | 00F3 |
Ó | O’ | na | 224 | 00D3 |
ö | | 148 | 148 | 00F6 |
Ö | | 153 | 153 | 00D6 |
ő | | na | na* | 0151 |
Ő | | na | na | 0150 |
ú | | 163 | 163 | 00FA |
Ú | | na | 233 | 00DA |
ü | | 129 | 129 | 00FC |
Ü | | 154 | 154 | 00DC |
ű | | na | na | 0171 |
Ű | | na | na | 0170 |
*na = not available as an ASCII code in the specified code page
Hungarian is alphabetized as follows:
A Á B C CS D E É F G GY H I Í J K L LY M N NY O Ó Ö Ő P Q R S SZ T TY U Ú Ü Ű V W X Y Z ZS
Icelandic
Icelandic is spoken by 250 000 people in Iceland.
Letter | Alternative | ASCII code (page 437) | ASCII code (page 850) | Unicode™ |
á | a | 160 | 160 | 00E1 |
Á | A | na* | 181 | 00C1 |
æ | ae | 145 | 145 | 00E6 |
Æ | AE | 146 | 146 | 00C6 |
ð | d | na | 208 | 00F0 |
Ð | D | na | 209 | 00D0 |
é | e | 130 | 130 | 00E9 |
É | E | 144 | 144 | 00C9 |
í | i | 161 | 161 | 00ED |
Í | I | na | 214 | 00CD |
ó | o | 162 | 162 | 00F3 |
Ó | O | na | 224 | 00D3 |
ö | o | 148 | 148 | 00F6 |
Ö | O | 153 | 153 | 00D6 |
œ | oe | na | na* | 0153 |
Œ | OE | na | na | 0152 |
þ | th | na | 231 | 00FE |
Þ | TH | na | 232 | 00DE |
ú | u | 163 | 163 | 00FA |
Ú | U | na | 233 | 00DA |
ý | y | na | 236 | 00FD |
Ý | Y | na | 237 | 00DD |
*na = not available as an ASCII code in the specified code page
Icelandic is alphabetized as follows:
A Á B C D Ð E É F G H I Í J K L M N O Ó P Q R S T U Ú V W X Y Ý Z Þ Æ Ö
Italian
Italian is spoken by about 57 million people in Italy, San Marino and the Holy See and about 500 000 people in Switzerland.
Letter | Alternative | ASCII code (page 437) | ASCII code (page 850) | Unicode™ |
à | | 133 | 133 | 00E0 |
À | | na* | 183 | 00C0 |
é | e’ | 130 | 130 | 00E9 |
É | E’ | 144 | 144 | 00C9 |
è | | 138 | 138 | 00E8 |
È | | na | 212 | 00C8 |
í | i’ | 161 | 161 | 00ED |
Í | I’ | na | 214 | 00CD |
ì | | 141 | 141 | 00EC |
Ì | | na | 222 | 00CC |
ï | | 139 | 139 | 00EF |
Ï | | na | 216 | 00CF |
ó | o’ | 162 | 162 | 00F3 |
Ó | O’ | na | 224 | 00D3 |
ò | | 149 | 149 | 00F2 |
Ò | | na | 227 | 00D2 |
ú | u’ | 163 | 163 | 00FA |
Ú | U’ | na | 233 | 00DA |
ù | | 151 | 151 | 00F9 |
Ù | | na | 235 | 00D9 |
*na = not available as an ASCII code in the specified code page
Italian is alphabetized as follows:
A (À) B C D E (É, È) F G H I (Í, Ì, Ï) J K L M N O (Ó,Ò) P Q R S T U (Ú, Ù) V W X Y Z
Ladin
There are about 30 000 speakers of Ladin in Northeast Italy.
Letter | Alternative | ASCII code (page 437) | ASCII code (page 850) | Unicode™ |
â | | 131 | 131 | 00E2 |
Â | | na* | 182 | 00C2 |
á | | 160 | 160 | 00E1 |
Á | | na | 181 | 00C1 |
à | | 133 | 133 | 00E0 |
À | | na | 183 | 00C0 |
ê | | 136 | 136 | 00EA |
Ê | | na | 210 | 00CA |
é | | 130 | 130 | 00E9 |
É | | 144 | 144 | 00C9 |
è | | 138 | 138 | 00E8 |
È | | na | 212 | 00C8 |
î | | 140 | 140 | 00EE |
Î | | na | 215 | 00CE |
í | | 161 | 161 | 00ED |
Í | | na | 214 | 00CD |
ì | | 141 | 141 | 00EC |
Ì | | na | 222 | 00CC |
ô | | 147 | 147 | 00F4 |
Ô | | na | 226 | 00D4 |
ó | | 162 | 162 | 00F3 |
Ó | | na | 224 | 00D3 |
ò | | 149 | 149 | 00F2 |
Ò | | na | 227 | 00D2 |
û | | 150 | 150 | 00FB |
Û | | na | 234 | 00DB |
ú | | 163 | 163 | 00FA |
Ú | | na | 233 | 00DA |
ù | | 151 | 151 | 00F9 |
Ù | | na | 235 | 00D9 |
*na = not available as an ASCII code in the specified code page
Ladin is alphabetized as follows:
A (Â Á À) B C D E (Ê É È) F G H I (Î Í Ì) J K L M N O (Ô Ó Ò) P Q R S T U (Û Ú Ù) V W X Y Z
Latvian
There are about 1 400 000 speakers of Latvian, mainly in Latvia.
Letter | Alternative | ASCII code (page 437) | ASCII code (page 850) | Unicode™ |
ā | | na* | na | 0101 |
Ā | | na | na | 0100 |
č | | na | na | 010D |
Č | | na | na | 010C |
ē | | na | na | 0113 |
Ē | | na | na | 0112 |
ģ | | na | na | 0123 |
Ģ | | na | na | 0122 |
ī | | na | na | 012B |
Ī | | na | na | 012A |
ķ | | na | na | 0137 |
Ķ | | na | na | 0136 |
ļ | | na | na | 013C |
Ļ | | na | na | 013B |
ņ | | na | na | 0146 |
Ņ | | na | na | 0145 |
ō | | na | na | 014D |
Ō | | na | na | 014C |
ŗ | | na | na | 0157 |
Ŗ | | na | na | 0156 |
š | | na | na | 0161 |
Š | | na | na | 0160 |
ū | | na | na | 016B |
Ū | | na | na | 016A |
ž | | na | na | 017E |
Ž | | na | na | 017D |
*na = not available as an ASCII code in the specified code page
Latvian is alphabetized as follows:
A (Ā) B C Č D E (Ē) F G Ģ H I (Ī) J K Ķ L Ļ M N Ņ O (Ō) P (Q) R Ŗ S Š T U (Ū) V (W) (X) (Y) Z Ž
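Sorting by raw character code places every Latvian letter with a diacritical mark after Z. The sketch below shows one possible way to sort according to the order listed above, assuming that letters shown in parentheses share a rank with their base letter while the others are letters in their own right. The names `GROUPS`, `RANK` and `latvian_key` are hypothetical, and a production system would more likely rely on a locale-aware collation library.

```python
# One entry per rank; parenthesised variants sit in the same entry as their base letter.
GROUPS = [
    "AĀ", "B", "C", "Č", "D", "EĒ", "F", "G", "Ģ", "H", "IĪ", "J", "K", "Ķ",
    "L", "Ļ", "M", "N", "Ņ", "OŌ", "P", "Q", "R", "Ŗ", "S", "Š", "T", "UŪ",
    "V", "W", "X", "Y", "Z", "Ž",
]
RANK = {letter: rank for rank, group in enumerate(GROUPS) for letter in group}


def latvian_key(word: str) -> list:
    """Map each letter to its rank; unknown characters sort after Ž."""
    return [RANK.get(letter, len(GROUPS)) for letter in word.upper()]


names = ["Zane", "Čakste", "Cēsis", "Šalna", "Sigulda", "Ābele", "Auce"]
print(sorted(names))                   # code order: Ābele, Čakste and Šalna land after Zane
print(sorted(names, key=latvian_key))  # Ābele, Auce, Cēsis, Čakste, Sigulda, Šalna, Zane
```

The same approach applies to the other alphabets in this section by swapping in the relevant letter order.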
Letzebuergesch
Letzebuergesch, a language related to German, is the official language of Luxembourg and is spoken by some 350 000 people.
Letter | Alternative | ASCII code (page 437) | ASCII code (page 850) | Unicode™ |
â | | 131 | 131 | 00E2 |
 | | na* | 182 | 00C2 |
ä | | 132 | 132 | 00E4 |
Ä | | 142 | 142 | 00C4 |
é | | 130 | 130 | 00E9 |
É | | 144 | 144 | 00C9 |
ë | | 137 | 137 | 00EB |
Ë | | na | 211 | 00CB |
M^ | | na | na | 004D+0302 |
m^ | | na | na | 006D+0302 |
N^ | | na | na | 004E+0302 |
n^ | | na | na | 006E+0302 |
ö | | 148 | 148 | 00F6 |
Ö | | 153 | 153 | 00D6 |
ô | | 147 | 147 | 00F4 |
Ô | | na | 226 | 00D4 |
ü | | 129 | 129 | 00FC |
Ü | | 154 | 154 | 00DC |
*na = not available as an ASCII code in the specified code page
Letzebuergesch is alphabetized as follows:
A (Ä) B C D E (Ë, É) F G H I J K L M (M^) N (N^) O (Ö) P Q R S T U (Ü) V W X Y Z
Lithuanian
There are almost 3 million speakers of Lithuanian, mainly in Lithuania.
Letter | Alternative | ASCII code (page 437) | ASCII code (page 850) | Unicode™ |
ą | | na* | na | 0105 |
Ą | | na | na | 0104 |
č | | na | na | 010D |
Č | | na | na | 010C |
ę | | na | na | 0119 |
Ę | | na | na | 0118 |
ė | | na | na | 0117 |
Ė | | na | na | 0116 |
į | | na | na | 012F |
Į | | na | na | 012E |
š | | na | na | 0161 |
Š | | na | na | 0160 |
ų | | na | na | 0173 |
Ų | | na | na | 0172 |
ū | | na | na | 016B |
Ū | | na | na | 016A |
ž | | na | na | 017E |
Ž | | na | na | 017D |
*na = not available as an ASCII code in the specified code page
Lithuanian is alphabetized as follows:
A (Ą) B C Č D E (Ę Ė) F G H I (Į Y) J K L M N O P (Q) R S Š T U (Ų Ū) V (W) (X) Z Ž
Maltese
Maltese, an ancient Arabic language with strong Romance influences, is spoken by about 400 000 people in Malta.
Letter | Alternative | ASCII code (page 437) | ASCII code (page 850) | Unicode™ |
à | | 133 | 133 | 00E0 |
À | | na* | 183 | 00C0 |
ċ | | na | na* | 010B |
Ċ | | na | na | 010A |
è | | 138 | 138 | 00E8 |
È | | na | 212 | 00C8 |
ġ | | na | na | 0121 |
Ġ | | na | na | 0120 |
għ | | na | na | g+0127 |
GĦ | | na | na | G+0126 |
ħ | | na | na | 0127 |
Ħ | | na | na | 0126 |
ì | | 141 | 141 | 00EC |
Ì | | na | 222 | 00CC |
ĩ | | na | na | 0129 |
Ĩ | | na | na | 0128 |
ò | | 149 | 149 | 00F2 |
Ò | | na | 227 | 00D2 |
ù | | 151 | 151 | 00F9 |
Ù | | na | 235 | 00D9 |
ż | | na | na | 017C |
Ż | | na | na | 017B |
*na = not available as an ASCII code in the specified code page
Maltese is alphabetized as follows:
A (À) B Ċ (C) D E (È) F Ġ G H Ħ I (Ì, Î) J K L M N Għ O (Ò) P Q R S T U (Ù) V W X (Y) Ż Z
Norwegian
Norwegian is spoken by about 4 million people in Norway.
Letter | Alternative | ASCII code (page 437) | ASCII code (page 850) | Unicode™ |
å | aa | 134 | 134 | 00E5 |
Å | AA | 143 | 143 | 00C5 |
æ | ae | 145 | 145 | 00E6 |
Æ | AE | 146 | 146 | 00C6 |
é | | 130 | 130 | 00E9 |
É | | 144 | 144 | 00C9 |
ó | | 162 | 162 | 00F3 |
Ó | | na* | 224 | 00D3 |
ò | | 149 | 149 | 00F2 |
Ò | | na | 227 | 00D2 |
ø | oe | na | 155 | 00F8 |
Ø | OE | na | 157 | 00D8 |
*na = not available as an ASCII code in the specified code page
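Where a system cannot store the Norwegian letters at all, the Alternative column above can be applied as a simple substitution. This is a minimal sketch under that assumption; `NORWEGIAN_ALTERNATIVES` and `with_alternatives` are illustrative names, and letters for which the table lists no alternative (é, ó, ò) are deliberately left untouched.

```python
# Alternatives exactly as listed in the table; str.maketrans accepts
# single-character keys mapped to replacement strings of any length.
NORWEGIAN_ALTERNATIVES = str.maketrans({
    "å": "aa", "Å": "AA",
    "æ": "ae", "Æ": "AE",
    "ø": "oe", "Ø": "OE",
})


def with_alternatives(text: str) -> str:
    """Replace each letter that has a listed alternative; keep everything else."""
    return text.translate(NORWEGIAN_ALTERNATIVES)


print(with_alternatives("Tromsø"))        # Tromsoe
print(with_alternatives("Håkon Færder"))  # Haakon Faerder
```

The substitution is not reversible in general (an "aa" in the output may or may not have been an "å"), so it is better suited to matching or export than to storage.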
Norwegian is alphabetized as follows:
A B C D E É F G H I J K L M N O Ó Ò P Q R S T U V W X Y Z Æ Ø Å
Polish
Polish is spoken by about 35 million people, mainly in Poland.
Letter | Alternative | ASCII code (page 437) | ASCII code (page 850) | Unicode™ |
ą | | na* | na* | 0105 |
Ą | | na | na | 0104 |
ć | c’ | na | na | 0107 |
Ć | C’ | na | na | 0106 |
ę | | na | na | 0119 |
Ę | | na | na | 0118 |
ł | | na | na | 0142 |
Ł | | na | na | 0141 |
ń | n’ | na | na | 0144 |
Ń | N’ | na | na | 0143 |
ó | | 162 | 162 | 00F3 |
Ó | | na | 224 | 00D3 |
ś | s’ | na | na | 015B |
Ś | S’ | na | na | 015A |
ź | z’ | na | na | 017A |
Ź | Z’ | na | na | 0179 |
ż | | na | na | 017C |
Ż | | na | na | 017B |
*na = not available as an ASCII code in the specified code page
The letters Q, V and X are only used in foreign words.
Where an accent cannot be reproduced, it is usually just dropped. However, an apostrophe or a comma may be added immediately following the accented letter to clarify meaning. For example: was (=to you), wa,s (=moustache).
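Both fallbacks described above can be sketched with Unicode decomposition. The example below is an illustration only: it assumes NFD normalization is available through Python's `unicodedata` module, and it handles ł/Ł with an explicit mapping because those letters do not decompose into a base letter plus a combining mark. The names `drop_accents` and `mark_accents` are hypothetical.

```python
import unicodedata

# The stroke in ł/Ł is part of the letter, so it needs its own mapping.
STROKE_LETTERS = str.maketrans({"ł": "l", "Ł": "L"})


def drop_accents(text: str) -> str:
    """Remove diacritical marks, keeping only the base letters."""
    decomposed = unicodedata.normalize("NFD", text.translate(STROKE_LETTERS))
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def mark_accents(text: str, marker: str = ",") -> str:
    """Drop each accent but add a marker straight after the affected letter."""
    pieces = []
    for ch in text:
        plain = drop_accents(ch)
        pieces.append(plain + marker if plain != ch else plain)
    return "".join(pieces)


print(drop_accents("wąs"))   # was
print(mark_accents("wąs"))   # wa,s
print(drop_accents("Łódź"))  # Lodz
```

Passing `marker="’"` would give the apostrophe form instead of the comma form.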
Opening quotation marks in Polish are usually written on the same level as the text, as follows: ,,Hello’’. However, most computer systems, even in Poland, do not allow this, and the standard “Western” quotation marks are used instead.
Polish is alphabetized as follows:
A Ą B C Ć D E Ę F G H I J K L Ł M N Ń O Ó P R S Ś T U W Y Z Ź Ż
Portuguese
Portuguese is spoken by about 10 million people in Portugal and 163 million people in Brazil. There are also many second-language speakers in Portugal’s ex-colonies. In Portugal, there are Northern dialects and Central-Southern dialects.
Letter | Alternative | ASCII code (page 437) | ASCII code (page 850) | Unicode™ |
á | | 160 | 160 | 00E1 |
Á | | na* | 181 | 00C1 |
à | | 133 | 133 | 00E0 |
À | | na | 183 | 00C0 |
â | | 131 | 131 | 00E2 |
 | | na | 182 | 00C2 |
ã | | na | 198 | 00E3 |
à | | na | 199 | 00C3 |
ç | | 135 | 135 | 00E7 |
Ç | | 128 | 128 | 00C7 |
é | | 130 | 130 | 00E9 |
É | | 144 | 144 | 00C9 |
è | | 138 | 138 | 00E8 |
È | | na | 212 | 00C8 |
ê | | 136 | 136 | 00EA |
Ê | | na | 210 | 00CA |
í | | 161 | 161 | 00ED |
Í | | na | 214 | 00CD |
ì | | 141 | 141 | 00EC |
Ì | | na | 222 | 00CC |
ó | | 162 | 162 | 00F3 |
Ó | | na | 224 | 00D3 |
ò | | 149 | 149 | 00F2 |
Ò | | na | 227 | 00D2 |
ô | | 147 | 147 | 00F4 |
Ô | | na | 226 | 00D4 |
õ | | na | 228 | 00F5 |
Õ | | na | 229 | 00D5 |
ú | | 163 | 163 | 00FA |
Ú | | na | 233 | 00DA |
ü | | 129 | 129 | 00FC |
Ü | | 154 | 154 | 00DC |
*na = not available as an ASCII code in the specified code page
Portuguese accents should not be replaced by apostrophes as apostrophes may be used in Portuguese to indicate certain contractions, such as Sant’Ana.
Portuguese is alphabetized as follows:
A (Á, À, Â, Ã) B C (Ç) D E (É, Ê) F G H I (Í) J K L M N O (Ó, Ô, Õ) P Q R S T U (Ú, Ü) V W X Y Z
Provençals
Provençals is a Romance language related to French and Catalan, found in south-eastern France and north-western Italy. It has a number of dialects.
Letter | Alternative | ASCII code (page 437) | ASCII code (page 850) | Unicode™ |
à | | 133 | 133 | 00E0 |
À | | na* | 183 | 00C0 |
ç | | 135 | 135 | 00E7 |
Ç | | 128 | 128 | 00C7 |
é | | 130 | 130 | 00E9 |
É | | 144 | 144 | 00C9 |
è | | 138 | 138 | 00E8 |
È | | na | 212 | 00C8 |
ó | | 162 | 162 | 00F3 |
Ó | | na | 224 | 00D3 |
*na = not available as an ASCII code in the specified code page
Rhaeto-Romance
Dialects of Rhaeto-Romance are spoken in Switzerland, western Austria and northern Italy by about 500 000 people.
Letter | Alternative | ASCII code (page 437) | ASCII code (page 850) | Unicode™ |
à | | 133 | 133 | 00E0 |
À | | na* | 183 | 00C0 |
è | | 138 | 138 | 00E8 |
È | | na | 212 | 00C8 |
é | | 130 | 130 | 00E9 |
É | | 144 | 144 | 00C9 |
ì | | 141 | 141 | 00EC |
Ì | | na | 222 | 00CC |
ò | | 149 | 149 | 00F2 |
Ò | | na | 227 | 00D2 |
ù | | 151 | 151 | 00F9 |
Ù | | na | 235 | 00D9 |
*na = not available as an ASCII code in the specified code page
Rhaeto-Romance is alphabetized as follows:
A (À) B C D E (È,É) F G H I (Ì) J (K) L M N O (Ò) P Q R S T U (Ù) V (W) X (Y) Z
Romanian
Romanian is spoken by about 20 million people in Romania. A similar language to Romanian, written using the Cyrillic alphabet, is spoken in Moldova.
Letter | Alternative | ASCII code (page 437) | ASCII code (page 850) | Unicode™ |
à | | 133 | 133 | 00E0 |
À | | na* | 183 | 00C0 |
â | | 131 | 131 | 00E2 |
 | | na | 182 | 00C2 |
ă | | na | na* | 0103 |
Ă | | na | na | 0102 |
è | | 138 | 138 | 00E8 |
È | | na | 212 | 00C8 |
ì | | 141 | 141 | 00EC |
Ì | | na | 222 | 00CC |
î | | 140 | 140 | 00EE |
Î | | na | 215 | 00CE |
ş | | na | na | 015F |
Ş | | na | na | 015E |
ţ | | na | na | 0163 |
Ţ | | na | na | 0162 |
ù | | 151 | 151 | 00F9 |
Ù | | na | 235 | 00D9 |
*na = not available as an ASCII code in the specified code page
Romanian is alphabetized as follows:
A Â Ă B C D E F G H I Î J K L M N O P (Q) R S Ş T Ţ U V W X (Y) Z
Romany
Romany is the language of Europe’s Roma people. It has many dialects and forms, each strongly influenced by the indigenous languages of the region its speakers inhabit.
Apostrophes are used in Romany words.
Letter | Alternative | ASCII code (page 437) | ASCII code (page 850) | Unicode™ |
č | | na* | na* | 010D |
š | | na | na | 0161 |
ž | | na | na | 017E |
*na = not available as an ASCII code in the specified code page
Russian
Russian is spoken by about 143 million people in Armenia, Azerbaijan, Belarus, Estonia, Georgia, Kazakhstan, Kyrgyzstan, Latvia, Lithuania, Moldova, Russia, Tajikistan, Turkmenistan, Ukraine & Uzbekistan. It has its own alphabet which has to be transliterated for use with databases containing Roman script. Each character is given in upper case then lower case version:
Letter | Transliteration | Unicode™ |
Аa | a | 0410, 0430 |
Бб | b | 0411, 0431 |
Вв | v | 0412, 0432 |
Гг | g | 0413, 0433 |
Дд | d | 0414, 0434 |
Ее | e | 0415, 0435 |
Ёё | yo | 0401, 0451 |
Жж | su (as in leisure) | 0416, 0436 |
Зз | z | 0417, 0437 |
Ии | ee | 0418, 0438 |
Йй | i | 0419, 0439 |
Кк | k | 041A, 043A |
Лл | l | 041B, 043B |
Мм | m | 041C, 043C |
Нн | n | 041D, 043D |
Оо | o | 041E, 043E |
Пп | p | 041F, 043F |
Рр | r | 0420, 0440 |
Сс | s | 0421, 0441 |
Тт | t | 0422, 0442 |
Уу | u (oo) | 0423, 0443 |
Фф | f | 0424, 0444 |
Хх | ch (as in loch) | 0425, 0445 |
Цц | ts | 0426, 0446 |
Чч | ch | 0427, 0447 |
Шш | sh | 0428, 0448 |
Щщ | shch | 0429, 0449 |
Ъъ | hard sign (not pronounced) | 042A, 044A |
Ыы | iy | 042B, 044B |
Ьь | soft sign (not pronounced) | 042C, 044C |
Ээ | e | 042D, 044D |
Юю | yoo | 042E, 044E |
Яя | ya | 042F, 044F |
Russian is alphabetized as in the table above.
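The table above can be turned into a lookup for transliterating Russian text into Roman script. The sketch below is an illustration rather than a standard scheme. It takes the Transliteration column as given, with three assumptions of its own: Ж and Х, for which the table gives pronunciation hints, are rendered with the common conventions "zh" and "kh", and the hard and soft signs are dropped because they are not pronounced. `LATIN`, `TRANSLIT` and `transliterate` are hypothetical names.

```python
# Latin values for the lower-case letters U+0430..U+044F, in table order.
# "zh" and "kh" are assumptions; the two empty strings drop the hard and soft signs.
LATIN = [
    "a", "b", "v", "g", "d", "e", "zh", "z", "ee", "i", "k", "l", "m", "n", "o", "p",
    "r", "s", "t", "u", "f", "kh", "ts", "ch", "sh", "shch", "", "iy", "", "e", "yoo", "ya",
]
TRANSLIT = {chr(0x0430 + offset): latin for offset, latin in enumerate(LATIN)}
TRANSLIT["\u0451"] = "yo"  # the letter ё sits outside the main block


def transliterate(text: str) -> str:
    """Replace each Cyrillic letter; leave any other character unchanged."""
    pieces = []
    for ch in text:
        latin = TRANSLIT.get(ch.lower())
        if latin is None:
            pieces.append(ch)
        elif ch.isupper():
            pieces.append(latin.capitalize())
        else:
            pieces.append(latin)
    return "".join(pieces)


print(transliterate("Москва"))  # Moskva
print(transliterate("Чехов"))   # Chekhov
```

Several competing transliteration schemes exist, so whichever mapping is chosen should be applied consistently across a database.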
Sámi
Sámi is spoken in the north of Norway, Sweden, Finland and Russia by some 35 000 people.
There are around ten versions of Sámi [1], depending on definition: Kildin (using the Cyrillic script); Akkala, Inari, Lule, Northern, Pite, Skolt, Southern, Ter and Ume (using the Latin script).
This table covers all diacritical characters used by the versions written in the Latin script.
Letter | Alternative | ASCII code (page 437) | ASCII code (page 850) | Unicode™ |
á | | 160 | 160 | 00E1 |
Á | | na* | 181 | 00C1 |
å | | 134 | 134 | 00E5 |
Å | | 143 | 143 | 00C5 |
ä | | 132 | 132 | 00E4 |
Ä | | 142 | 142 | 00C4 |
â | | 131 | 131 | 00E2 |
 | | na | 182 | 00C2 |
č | | na | na* | 010D |
Č | | na | na | 010C |
Đ | | na | 209 | 0110 |
đ | | na | na | 0111 |
ń | | na | na | 0144 |
Ń | | na | na | 0143 |
ŋ | | na | na | 014B |
Ŋ | | na | na | 014A |
ö | | 148 | 148 | 00F6 |
Ö | | 153 | 153 | 00D6 |
õ | | na | 228 | 00F5 |
Õ | | na | 229 | 00D5 |
ø | | na | 155 | 00F8 |
Ø | | na | 157 | 00D8 |
š | | na | na | 0161 |
Š | | na | na | 0160 |
Ŧ | | na | na | 0166 |
ŧ | | na | na | 0167 |
ž | | na | na | 017E |
Ž | | na | na | 017D |
Ǯ | | na | na | 01EE |
Diacritical characters borrowed from Finnish, Swedish and Norwegian are also used.
*na = not available as an ASCII code in the specified code page
Slovak
Slovak is spoken by about 5 million people, mainly in Slovakia. It is written in the Latin script, and loss of the diacritical marks should be avoided.
Letter | Alternative | ASCII code (page 437) | ASCII code (page 850) | Unicode™ |
á | | 160 | 160 | 00E1 |
Á | | na* | 181 | 00C1 |
č | | na | na* | 010D |
Č | | na | na | 010C |
ď or dv | d’ | na | na | 010F |
Ď | | na | na | 010E |
é | | 130 | 130 | 00E9 |
É | | 144 | 144 | 00C9 |
ě | | na | na | 011B |
Ě | | na | na | 011A |
í | | 161 | 161 | 00ED |
Í | | na | 214 | 00CD |
ĺ | l’ | na | na | 013A |
Ĺ | L’ | na | na | 0139 |
ň | | na | na | 0148 |
Ň | | na | na | 0147 |
ó | | 162 | 162 | 00F3 |
Ó | | na | 224 | 00D3 |
ř | | na | na | 0159 |
Ř | | na | na | 0158 |
š | | na | na | 0161 |
Š | | na | na | 0160 |
ť or tv | t’ | na | na | 0165 |
Ť | | na | na | 0164 |
ú | | 163 | 163 | 00FA |
Ú | | na | 233 | 00DA |
ů | | na | na | 016F |
Ů | | na | na | 016E |
ý | | na | 236 | 00FD |
Ý | | na | 237 | 00DD |
ž | | na | na | 017E |
Ž | | na | na | 017D |
*na = not available as an ASCII code in the specified code page
Slovak is alphabetized as follows:
A Á B C Č D Ď E É Ě F G H CH I Í J K L M N Ň O Ó P Q R Ř S Š T Ť U Ú Ů V W X Y Ý Z Ž
Slovenian
Slovenian is spoken by about 1.5 million people in Slovenia and small parts of Hungary and Italy.
Letter | Alternative | ASCII code (page 437) | ASCII code (page 850) | Unicode™ |
č | ch | na* | na* | 010D |
Č | Ch | na | na | 010C |
š | sh | na | na | 0161 |
Š | Sh | na | na | 0160 |
ž | zh | na | na | 017E |
Ž | Zh | na | na | 017D |
*na = not available as an ASCII code in the specified code page
Slovenian is alphabetized as follows:
A B C Č D E F G H I J K L M N O P (Q) R S Š T U V (W) (X) (Y) Z Ž
Sorbian
Sorbian is a Slavic language spoken by some 50 000 people in two distinct dialects in the south-easternmost part of the former East Germany.
Letter | Alternative | ASCII code (page 437) | ASCII code (page 850) | Unicode™ |
ć | | na* | na* | 0107 |
Ć | | na | na | 0106 |
č | | na | na | 010D |
Č | | na | na | 010C |
ě | | na | na | 011B |
Ě | | na | na | 011A |
ł | | na | na | 0142 |
Ł | | na | na | 0141 |
ń | | na | na | 0144 |
Ń | | na | na | 0143 |
ó | | 162 | 162 | 00F3 |
Ó | | na | 224 | 00D3 |
ř | | na | na | 0159 |
Ř | | na | na | 0158 |
ś | | na | na | 015B |
Ś | | na | na | 015A |
š | | na | na | 0161 |
Š | | na | na | 0160 |
ź | | na | na | 017A |
Ź | | na | na | 0179 |
ž | | na | na | 017E |
Ž | | na | na | 017D |
*na = not available as an ASCII code in the specified code page
Upper Sorbian is alphabetized as follows:
A B C Č D DŹ E Ě F G H CH I J K Ł L M N Ń O Ó P R Ř S Š T Ć U W X Y Z Ž
Lower Sorbian is alphabetized as follows:
A B Bj C Č Ć D DŹ E Ě F G H CH I J K L Ł M MJ N Ń NJ O P PJ (Q) R Ŕ RJ S Ś Š TŚ TŠ T U (V) W WJ (X) Y Z Ź Ž
Spanish
Spanish is spoken by about 27.5 million people in Spain, 81 million people in Central America, 18 million people in the Caribbean, 90 million people in South America, 22 million people in the United States of America and by many others as a second language, especially in Spain’s ex-colonies.
Letter | Alternative | ASCII code (page 437) | ASCII code (page 850) | Unicode™ |
á | a | 160 | 160 | 00E1 |
Á | A | na* | 181 | 00C1 |
é | e | 130 | 130 | 00E9 |
É | E | 144 | 144 | 00C9 |
í | i | 161 | 161 | 00ED |
Í | I | na | 214 | 00CD |
ñ | n | 164 | 164 | 00F1 |
Ñ | N | 165 | 165 | 00D1 |
ó | o | 162 | 162 | 00F3 |
Ó | O | na | 224 | 00D3 |
ú | u | 163 | 163 | 00FA |
Ú | U | na | 233 | 00DA |
ü | u | 129 | 129 | 00FC |
Ü | U | 154 | 154 | 00DC |
*na = not available as an ASCII code in the specified code page
In Spanish, abbreviations of plurals are doubled. For example, the Spanish abbreviation for the United States is EE.UU. The full stop must be added after each doubled letter, never between the two letters.
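The doubling rule can be expressed in a few lines. This is a minimal sketch, assuming the input is a plural phrase in which every word contributes its initial; `plural_abbreviation` is an illustrative name, and real data would need a stop-word list for articles and prepositions.

```python
def plural_abbreviation(phrase: str) -> str:
    """Double each word's initial and close each pair with a full stop."""
    return "".join(word[0].upper() * 2 + "." for word in phrase.split())


print(plural_abbreviation("Estados Unidos"))   # EE.UU.
print(plural_abbreviation("Fuerzas Armadas"))  # FF.AA.
```

A validation routine can use the same pattern in reverse to spot mis-punctuated forms such as "E.E.U.U.".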
Spanish is alphabetized as follows:
A (Á) B C CH D E (É) F G H I (Í) J K L LL M N Ñ O (Ó) P Q R RR S T U (Ú, Ü) V W X Y Z
A law passed in Spain has removed “ch” and “ll” as separate letters of the Spanish alphabet, though most Spanish-speakers still use them as such.
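Where the traditional order is wanted, with CH, LL and RR treated as single letters, a sort key has to read two characters at a time. The sketch below illustrates one way to do that under the alphabet listed above; `ALPHABET`, `DIGRAPHS`, `FOLD` and `traditional_key` are hypothetical names, and accented vowels are folded onto their base letters as the parentheses in the list suggest.

```python
ALPHABET = [
    "A", "B", "C", "CH", "D", "E", "F", "G", "H", "I", "J", "K", "L", "LL", "M",
    "N", "Ñ", "O", "P", "Q", "R", "RR", "S", "T", "U", "V", "W", "X", "Y", "Z",
]
RANK = {letter: rank for rank, letter in enumerate(ALPHABET)}
DIGRAPHS = {"CH", "LL", "RR"}
FOLD = str.maketrans("ÁÉÍÓÚÜ", "AEIOUU")  # accented vowels sort with the base vowel


def traditional_key(word: str) -> list:
    """Build a list of letter ranks, consuming CH, LL and RR as one letter each."""
    text = word.upper().translate(FOLD)
    key, i = [], 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in DIGRAPHS:
            key.append(RANK[pair])
            i += 2
        else:
            key.append(RANK.get(text[i], len(ALPHABET)))
            i += 1
    return key


words = ["llave", "chico", "nube", "luz", "ñandú", "dama", "cuna"]
print(sorted(words, key=traditional_key))
# cuna, chico, dama, luz, llave, nube, ñandú
```

Removing "CH", "LL" and "RR" from `ALPHABET` and `DIGRAPHS` gives the order that follows the law mentioned above.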
Swedish
Swedish is spoken by about 8 million people in Sweden and about 300 000 people in western and southern Finland.
Letter | Alternative | ASCII code (page 437) | ASCII code (page 850) | Unicode™ |
å | | 134 | 134 | 00E5 |
Å | | 143 | 143 | 00C5 |
ä | ae | 132 | 132 | 00E4 |
Ä | AE | 142 | 142 | 00C4 |
é | | 130 | 130 | 00E9 |
É | | 144 | 144 | 00C9 |
ö | oe | 148 | 148 | 00F6 |
Ö | OE | 153 | 153 | 00D6 |
ü | | 129 | 129 | 00FC |
Ü | | 154 | 154 | 00DC |
*na = not available as an ASCII code in the specified code page
Swedish is alphabetized as follows:
A B C D E (É) F G H I J K L M N O P Q R S T U V (W) X Y (Ü) Z Å Ä Ö
Note: in Swedish, in certain grammatical forms, a colon is used in contractions and abbreviations where an apostrophe (or nothing) might be used in English.
Turkish
Turkish is spoken by about 57 million people in Turkey, and by minorities in Greece, Bulgaria and Cyprus.
Letter | Alternative | ASCII code (page 437) | ASCII code (page 850) | Unicode™ |
â | | 131 | 131 | 00E2 |
 | | na* | 182 | 00C2 |
ç | | 135 | 135 | 00E7 |
Ç | | 128 | 128 | 00C7 |
ğ | | na | na* | 011F |
Ğ | | na | na | 011E |
İ | | na | na | 0130 |
ı | | na | 213 | 0131 |
î | | 140 | 140 | 00EE |
Î | | na | 215 | 00CE |
ö | | 148 | 148 | 00F6 |
Ö | | 153 | 153 | 00D6 |
ş | | na | na | 015F |
Ş | | na | na | 015E |
ü | | 129 | 129 | 00FC |
Ü | | 154 | 154 | 00DC |
û | | 150 | 150 | 00FB |
Û | | na | 234 | 00DB |
*na = not available as an ASCII code in the specified code page
The distinction between the dotted and dotless I is very important. They are distinct letters and are pronounced very differently. Avoid the temptation to replace a dotless ı with a dotted i.
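Generic case conversion is a common way for the two letters to be confused, because it pairs i with I regardless of language. The following sketch shows one possible workaround in Python, whose built-in `str.upper` and `str.lower` are not locale-aware; `turkish_upper` and `turkish_lower` are illustrative helper names.

```python
def turkish_upper(text: str) -> str:
    """Upper-case text, pairing i with İ and ı with I."""
    return text.replace("i", "İ").replace("ı", "I").upper()


def turkish_lower(text: str) -> str:
    """Lower-case text, pairing İ with i and I with ı."""
    return text.replace("İ", "i").replace("I", "ı").lower()


print("izmir".upper())           # IZMIR   (the dot is lost)
print(turkish_upper("izmir"))    # İZMİR
print("ISPARTA".lower())         # isparta (a dot is added)
print(turkish_lower("ISPARTA"))  # ısparta
```

These helpers should only be applied to data known to be Turkish, since they would corrupt text in other languages.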
Turkish is alphabetized as follows:
A (Â) B C Ç D E F G Ğ H Iı İi J K L M N O Ö P (Q) R S Ş T U Ü Û V (W) (X) Y Z
Welsh
Welsh is spoken by about 600 000 people in Wales.
There are no accepted alternatives to accented characters in Welsh.
Letter | Alternative | ASCII code (page 437) | ASCII code (page 850) | Unicode™ |
ä | | 132 | 132 | 00E4 |
Ä | | 142 | 142 | 00C4 |
â | | 131 | 131 | 00E2 |
 | | na* | 182 | 00C2 |
á | | 160 | 160 | 00E1 |
Á | | na | 181 | 00C1 |
à | | 133 | 133 | 00E0 |
À | | na | 183 | 00C0 |
ë | | 137 | 137 | 00EB |
Ë | | na | 211 | 00CB |
ê | | 136 | 136 | 00EA |
Ê | | na | 210 | 00CA |
é | | 130 | 130 | 00E9 |
É | | 144 | 144 | 00C9 |
è | | 138 | 138 | 00E8 |
È | | na | 212 | 00C8 |
ï | | 139 | 139 | 00EF |
Ï | | na | 216 | 00CF |
î | | 140 | 140 | 00EE |
Î | | na | 215 | 00CE |
í | | 161 | 161 | 00ED |
Í | | na | 214 | 00CD |
ì | | 141 | 141 | 00EC |
Ì | | na | 222 | 00CC |
ö | | 148 | 148 | 00F6 |
Ö | | 153 | 153 | 00D6 |
ô | | 147 | 147 | 00F4 |
Ô | | na | 226 | 00D4 |
ó | | 162 | 162 | 00F3 |
Ó | | na | 224 | 00D3 |
ò | | 149 | 149 | 00F2 |
Ò | | na | 227 | 00D2 |
û | | 150 | 150 | 00FB |
Û | | na | 234 | 00DB |
ü | | 129 | 129 | 00FC |
Ü | | 154 | 154 | 00DC |
ú | | 163 | 163 | 00FA |
Ú | | na | 233 | 00DA |
ù | | 151 | 151 | 00F9 |
Ù | | na | 235 | 00D9 |
ŵ | | na | na* | 0175 |
Ŵ | | na | na | 0174 |
ẅ | | na | na | 0077+0308 |
Ẅ | | na | na | 0057+0308 |
ẃ | | na | na | 0077+0301 |
Ẃ | | na | na | 0057+0301 |
ẁ | | na | na | 0077+0300 |
Ẁ | | na | na | 0057+0300 |
v.. | | na | na | 0076+0308 |
V.. | | na | na | 0056+0308 |
v^ | | na | na | 0076+0302 |
V^ | | na | na | 0056+0302 |
ŷ | | na | na | 0177 |
Ŷ | | na | na | 0176 |
ÿ | | 152 | 152 | 00FF |
Ÿ | | na | na | 0178 |
ý | | na | 236 | 00FD |
Ý | | na | 237 | 00DD |
ỳ | | na | na | 0079+0300 |
Ỳ | | na | na | 0059+0300 |
*na = not available as an ASCII code in the specified code page
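Several Welsh letters in the table are given as a base letter plus a combining mark (for example 0077+0308). Whether such a sequence can be stored as a single character depends on whether Unicode defines a precomposed form. The sketch below checks this with NFC normalization from Python's `unicodedata` module; `compose` is an illustrative helper name.

```python
import unicodedata


def compose(sequence: str) -> str:
    """Combine a base letter and its combining marks where Unicode allows it."""
    return unicodedata.normalize("NFC", sequence)


w_diaeresis = compose("w\u0308")  # w + combining diaeresis
v_diaeresis = compose("v\u0308")  # v + combining diaeresis

print(len(w_diaeresis), f"{ord(w_diaeresis):04X}")  # 1 1E85 (a precomposed form exists)
print(len(v_diaeresis))                             # 2 (no precomposed form; stays a sequence)
```

Field lengths and string comparisons need to allow for the second case, where one visible letter occupies two code points.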
The Welsh alphabet has 31 letters and is alphabetized as follows:
A (Â, Á, À, Ä) B C CH D DD E (Ê, É, È, Ë) F FF G NG H I (Î, Í, Ì, Ï) J K L LL M N O (Ô, Ó, Ò, Ö) P PH (Q) R RH S T TH U (Û, Ú, Ù, Ü) (V) W (Ŵ, Ẃ, Ẁ, Ẅ) (X) Y (Ŷ, Ý, Ỳ, Ÿ) Z
Every effort is made to keep this resource updated. If you find any errors, or have any questions or requests, please don't hesitate to contact the author.
All information copyright Graham Rhind 2024. Any information used should be acknowledged and referenced.