Diacritical marks

Global Sourcebook for International Data Management

                                         by Graham Rhind




Whether you hold your address data in DOS, Windows™ or on another platform, and whether your data covers a single country or many, you will certainly be faced with the challenge of storing diacritical marks (accents).

Most of the World’s languages (including English) use one or more diacritical marks. These marks indicate that a letter has a different phonetic value from, or is stressed differently from, the same letter without the mark. Though the marks may appear strange (and possibly irrelevant) to non-speakers of the language concerned, they are required to keep the text accurate and logical.

The first computer operating systems used a set of character codes called ASCII (American Standard Code for Information Interchange). ASCII proper defines only 128 characters (codes 0 to 127), none of them accented; the PC extended this to 256 codes, and it is these extended codes (128 to 255) that hold the letters with diacritical marks. They can be typed by holding down the ALT key and entering a one- to three-digit code on the numeric keypad. Limited at that time by computer hardware and software to 256 characters, this system cannot possibly represent every character used in the World’s more than 6 000 languages. To overcome this limitation, a number of code pages were created, each of which was intended to contain enough letters with diacritical marks to represent the alphabet of at least one language.

The most common code page used by English-speakers is 437. This contains the following letters with diacritical marks:

   á, â, Ä, ä, à, Å, å, Æ, æ, Ç, ç, ê, ë, è, É, é, ï, î, ì, í, ñ, Ñ, ô, Ö, ö, ò, ó, ß, Ü, ü, û, ù, ú, ÿ

This is sufficient for a number of Western European and Scandinavian languages, but there are serious omissions. Some languages in the same group, such as Icelandic, are not fully represented. Certain letters in Scandinavian languages, such as the “Ø” in Danish and Norwegian, are missing. Even for French, the letter “Œ” cannot be reproduced. Equally, a number of upper-case equivalents of lower-case letters with diacritical marks cannot be reproduced using this code page.

Code page 850 supplements code page 437 with the following letters with diacritical marks:

   Á, Â, À, ã, Ã, ð, Ð, Ê, Ë, È, Í, Î, Ï, Ì, Ó, Ô, Ò, õ, Õ, ø, Ø, Þ, þ, Ú, Û, Ù, ý, Ý.

Further code pages are 852 (Slavic languages/Hungarian); 860 (Portuguese); and 865 (Scandinavian).

Whilst this system mostly allows the data of a single country or language area to be stored, problems arise when data from more than one country or language area must be stored together. Code page 850, for example, allows the storage of (most) Western European addresses, but not the correct storage of addresses from Eastern Europe, even for languages written in the Latin script. For Hungarian, code page 850 lacks ő, Ő, ű and Ű. For Czech, the following letters with diacritical marks are missing: Č, č, Ď, ď, Ě, ě, Ň, ň, Ř, ř, Š, š, Ť, ť, Ů, ů, Ž and ž. Clearly, the language cannot be represented accurately without these characters.
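Before choosing a code page, you can check which of a language's letters it can hold by trying to encode each one. The sketch below is a minimal illustration, assuming Python 3.9 or later, whose standard library includes codecs named cp437, cp850 and cp852; the function name `missing_letters` is hypothetical.

```python
def missing_letters(letters: str, codepage: str) -> list[str]:
    """Return the letters that have no slot in the given code page."""
    missing = []
    for letter in letters:
        try:
            letter.encode(codepage)  # raises if the code page cannot hold the letter
        except UnicodeEncodeError:
            missing.append(letter)
    return missing


hungarian = "áÁéÉíÍóÓöÖőŐúÚüÜűŰ"
print(missing_letters(hungarian, "cp850"))  # ['ő', 'Ő', 'ű', 'Ű']
print(missing_letters(hungarian, "cp852"))  # []
```

Running the same check with the letters from each language table in this chapter shows quickly which single code page, if any, can serve a mixed database.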

On the DOS/Windows™ platform, a character can look different on different code pages while keeping the same code value. With the Icelandic street type HLÍÐ, for example, only the HL will appear correctly when displayed using code page 437, although the codes of its letters remain the same. Changing the code page to 850 will allow this word to be displayed correctly.
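The effect described above can be reproduced by storing a word under one code page and displaying the same bytes under another. A minimal sketch, assuming Python 3 and its built-in cp850 and cp437 codecs:

```python
# Bytes written by a system running code page 850.
stored = "HLÍÐ".encode("cp850")
print(list(stored))             # [72, 76, 214, 209]

# The same bytes displayed under two different code pages.
print(stored.decode("cp850"))   # HLÍÐ
print(stored.decode("cp437"))   # HL╓╤
```

The stored values 214 and 209 never change; only their interpretation does, which is why Í and Ð turn into box-drawing symbols under code page 437.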

Changing the code page is done differently on different operating platforms, so refer to your documentation. In DOS/Windows™ systems, the code page for screen output is set in the autoexec.bat file by adding the lines below (the DISPLAY.SYS console driver must also be loaded in CONFIG.SYS for these commands to work):

   MODE CON CODEPAGE PREPARE=((nnn) [drive path filename])
   MODE CON CODEPAGE SELECT=nnn

where nnn is the code page number and drive path filename is the file containing the code page information (having the extension .CPI). For example, for code page 850, the lines might read:

   MODE CON CODEPAGE PREPARE=((850) C:\WINDOWS\COMMAND\EGA.CPI)
   MODE CON CODEPAGE SELECT=850

To check which code page is active, type

   MODE

at the DOS prompt.

ANSI codes (American National Standards Institute) are used with systems such as Microsoft Windows™. Each ANSI page allows the use of fonts, which in turn allow the reproduction of 224 different characters. In programs such as word processors, most of the characters required for the World’s languages can therefore be reproduced by changing the font within the document. Few database programs, however, allow fonts to be changed within fields or tables; they use the ANSI page defined by the underlying Windows™ operating system.

Whilst ANSI allows the output of letters with diacritical marks (assuming a compliant printer and font), the larger problem of transferring data between platforms and software remains. Transferring data between one ASCII page and another, between one ANSI page and another, or between ASCII and ANSI pages will change the appearance (and in the last case the value) of the letter that you are trying to reproduce. This can cause immense damage to the data and make correct storage and output impossible.
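Damage of this kind usually comes from copying raw codes from one page to another. A safer approach is to decode with the source page and re-encode with the target page, so that each letter keeps its identity, or is visibly flagged when the target page has no slot for it. The sketch below assumes Python 3 and uses cp1252 as the Windows “ANSI” page for Western Europe; the helper `transcode` is hypothetical.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Re-encode text from one code page to another via Unicode.

    Letters missing from the target page become '?', so the loss is
    visible instead of silently turning into a different letter.
    """
    return data.decode(source).encode(target, errors="replace")


dos_bytes = "café".encode("cp437")             # é is stored as code 130
print(dos_bytes.decode("cp1252"))                # caf‚  (raw copy: 130 read as a quotation mark)
print(transcode(dos_bytes, "cp437", "cp1252"))   # b'caf\xe9'  (é is code 233 in 1252)
print(transcode("Dvořák".encode("cp852"), "cp852", "cp1252"))  # b'Dvo?\xe1k'
```

Replacing unconvertible letters with "?" is a design choice: it loses the letter, but it does not disguise the loss as a different accented letter.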

An alternative methodology is The Unicode™ Standard. Instead of reusing the same 256 codes with a different meaning on each code page, Unicode gives every character, including each letter with a diacritical mark, a unique number of its own (a code point), which identifies the same character regardless of hardware, platform, software or font. Code points are conventionally written as hexadecimal numbers of at least four digits, often marked off from the surrounding text (e.g. U+00C1 or <00C1> for Á); in files they are stored using an encoding such as UTF-8 or UTF-16. Unicode™ support is increasingly being included in new operating systems and software. Here are a number of Unicode™ values:

Á 00C1
á 00E1
Ã 00C3
É 00C9
Ñ 00D1
Ò 00D2
ø 00F8
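The values listed above are code points, which are separate from the bytes used to store them. This sketch assumes Python 3.9 or later (`bytes.hex` with a separator needs 3.8); `describe` is a hypothetical helper, and UTF-8 is only one of several possible encodings.

```python
import unicodedata


def describe(letter: str) -> str:
    """Show a character's Unicode code point, official name and UTF-8 bytes."""
    return (f"{letter}  U+{ord(letter):04X}  {unicodedata.name(letter)}  "
            f"UTF-8: {letter.encode('utf-8').hex(' ')}")


for letter in "ÁáÉÑÒ":
    print(describe(letter))
# First line printed:
# Á  U+00C1  LATIN CAPITAL LETTER A WITH ACUTE  UTF-8: c3 81
```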

Unfortunately, I can only draw your attention to the issue of diacritical marks without providing a solution for reproducing these marks for all languages within a single database system. To the best of my knowledge, an all-encompassing solution does not yet exist. Aim, however, to use operating systems and software with Unicode™ support, so that you will in the near future be able to handle diacritical marks better than is currently possible.

Accents

The tables below cover a large number of European languages and some other languages written in Latin script.

The tables show the letters with diacritical marks used in the language concerned, the alternative plain-letter equivalent where one exists (i.e. a spelling without diacritical marks, for use where the accented letter cannot be stored), the ASCII code of the accented letter (if available) in code pages 437 and 850, and the Unicode™ value.

Whilst it is not always possible to type the correctly accented letter using ASCII, accented letters can be output on most printers by means of alternative ASCII codes. For example, the code that normally produces a + symbol could, with an alternative printer font set, be output as an accented character. Doing this, however, means that you must keep your output devices on a consistent platform, and you will need a different printing set for each language to be printed, because no single set can cover all accents.

NB: Alphabetization. In the alphabetization lists, letters in brackets immediately following another letter are included in the sort for that letter, i.e. they are given the same value as the preceding letter for sorting purposes. Letters in brackets on their own are letters borrowed from other languages and are sorted, when found, in the position shown. Where two “letters” are shown together in upper case, they form a single letter in the language concerned. The alphabetization lists include only upper-case letters; where a lower-case form differs markedly in appearance from its upper-case equivalent, the lower-case version is printed immediately next to the upper-case version.
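The bracket conventions above translate fairly directly into a sort key: each position in the alphabet gets a rank, letters sharing a position get the same rank, and multi-character letters are matched before single ones. This is a simplified sketch in Python 3.9 or later, not a full collation algorithm: it ignores accents that only break ties, and it matches AA greedily even where Danish compound-word rules would not. `make_sort_key` is a hypothetical helper, and the alphabets are based on the lists in this chapter.

```python
from typing import Callable


def make_sort_key(alphabet: list[str]) -> Callable[[str], tuple[int, ...]]:
    """Build a sort key from an alphabet given as a list of positions.

    Letters within one entry (separated by spaces) sort as equal, e.g.
    "A Á"; multi-character letters such as "CH" or "AA" are allowed.
    """
    rank: dict[str, int] = {}
    for position, group in enumerate(alphabet):
        for letter in group.split():
            rank[letter.upper()] = position
    longest = max(len(letter) for letter in rank)

    def key(word: str) -> tuple[int, ...]:
        text = word.upper()
        ranks: list[int] = []
        i = 0
        while i < len(text):
            # Try the longest possible letter first, so "CH" wins over "C".
            for size in range(min(longest, len(text) - i), 0, -1):
                chunk = text[i:i + size]
                if chunk in rank:
                    ranks.append(rank[chunk])
                    i += size
                    break
            else:
                i += 1  # skip characters outside the alphabet (spaces, hyphens, digits)
        return tuple(ranks)

    return key


CZECH = ("A Á B C Č D Ď E É Ě F G H CH I Í J K L M N Ň O Ó P Q R Ř "
         "S Š T Ť U Ú Ů V W X Y Ý Z Ž")
czech = make_sort_key(CZECH.split())
print(sorted(["Chlum", "Cibulka", "Hora", "Čech"], key=czech))
# ['Cibulka', 'Čech', 'Hora', 'Chlum']

danish = make_sort_key(["A Á", "B", "C", "D", "E É", "F", "G", "H", "I Í", "J",
                        "K", "L", "M", "N", "O Ó", "P", "Q", "R", "S", "T",
                        "U Ú", "V", "W", "X", "Y Ý", "Z", "Æ", "Ø", "Å AA"])
print(sorted(["Århus", "Aalborg", "Odense", "Ølstykke"], key=danish))
# ['Odense', 'Ølstykke', 'Aalborg', 'Århus']
```

Because Python's `str.upper()` turns ß into SS, a German alphabet built this way also follows the “ß=SS” rule without special handling.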

Albanian

Albanian is spoken by about 4 million people in Albania, Kosovo, Macedonia, Serbia, Montenegro, Bulgaria, Romania and Italy.

Letter Alternative ASCII code (page 437) ASCII code (page 850) Unicode™
â   131 131 00E2
Â   na* 182 00C2
ç   135 135 00E7
Ç   128 128 00C7
ë   137 137 00EB
Ë   na 211 00CB

*na = not available as an ASCII code in the specified code page

Albanian is alphabetized as follows:

   A B C Ç D E Ë F G H I J K L M N O P Q R S T U V W X Y Z

Basque

Basque is spoken by some 700 000 people in the Spanish regions of País-Vasco and Navarra, and in south-eastern France.

Letter Alternative ASCII code (page 437) ASCII code (page 850) Unicode™
ç   135 135 00E7
Ç   128 128 00C7
é   130 130 00E9
É   144 144 00C9
ñ   164 164 00F1
Ñ   165 165 00D1
ü   129 129 00FC
Ü   154 154 00DC

Basque is alphabetized as follows:

   A B C (Ç) D E F G H I J K L M N Ñ O P Q R S T U (Ü) V W X Y Z

Breton

Breton, a Celtic language, is spoken in the westernmost parts of Brittany, France, by some 600 000 people.

Apostrophes are used within words in Breton.

Letter Alternative ASCII code (page 437) ASCII code (page 850) Unicode™
â   131 131 00E2
Â   na* 182 00C2
ê   136 136 00EA
Ê   na 210 00CA
ñ   164 164 00F1
Ñ   165 165 00D1
ù   151 151 00F9
Ù   na 235 00D9
ü   129 129 00FC
Ü   154 154 00DC

*na = not available as an ASCII code in the specified code page

Breton is alphabetized as follows:

   A (Â) B (C) Ch C’h D E (Ê) F G H I J K L M N Ñ O P (Q) R S T U (Ù,Ü) V W (X) Y Z

Catalan

There are about 10 540 000 speakers of Catalan - in Spain it is spoken by some 5 980 000 people in Catalonia, 3 350 000 people in Valencia (where it is called Valencian and is considered by some a distinct language), 755 000 people in the Balearic Islands, 48 000 people in the eastern part of Aragon and 2 000 people in Murcia. It is the national language of Andorra and is spoken there by 38 000 people. There are 330 000 speakers in Roussillon in south-eastern France and 37 000 speakers in the town of Alghero (L’Alguer) on Sardinia, Italy.

Letter Alternative ASCII code (page 437) ASCII code (page 850) Unicode™
à   133 133 00E0
À   na* 183 00C0
ç   135 135 00E7
Ç   128 128 00C7
é   130 130 00E9
É   144 144 00C9
è   138 138 00E8
È   na 212 00C8
í   161 161 00ED
Í   na 214 00CD
ï   139 139 00EF
Ï   na 216 00CF
ŀ   na na 0140
Ŀ   na na 013F
ó   162 162 00F3
Ó   na 224 00D3
ò   149 149 00F2
Ò   na 227 00D2
ú   163 163 00FA
Ú   na 233 00DA

*na = not available as an ASCII code in the specified code page

In Catalan, abbreviations of plurals are formed by doubling the letters. For example, the Catalan abbreviation for the United States (Estats Units) is EE.UU. The full stops must be placed after each doubled letter, never between the two letters.

Catalan is alphabetized as follows:

   A (À) B C (Ç) D E (É, È) F G H I (Í, Ï) J K L Ŀl M N O (Ó, Ò) P Q R S T U (Ú, Ü) V W X Y Z

In Catalan a middle dot (“·”) may appear within words between ells to indicate differences in pronunciation. “ll” will indicate a single sound as in Spanish. “l·l” will indicate an ell sound more akin to the English. For example “Paral·lel”, “Col·legi”. This dot may be found in databases as a “.” or a hyphen (“-”). It should not be removed.
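Where such substitutes have crept into a database, they can be restored with a regular expression. The sketch below assumes Python 3 and treats a full stop or hyphen between two l's inside a word as a substitute for the middle dot; that is a heuristic, so review the changes before applying them to real data. The function name `restore_punt_volat` is hypothetical.

```python
import re

# A "." or "-" with a letter + l before it and l + a letter after it,
# e.g. "Paral.lel" or "COL-LEGI", is taken to be a substitute middle dot.
_SUBSTITUTE_DOT = re.compile(r"(?<=\w[lL])[.\-](?=[lL]\w)")


def restore_punt_volat(text: str) -> str:
    """Replace '.' or '-' between two l's with the middle dot (U+00B7)."""
    return _SUBSTITUTE_DOT.sub("\u00b7", text)


print(restore_punt_volat("Avinguda del Paral.lel"))  # Avinguda del Paral·lel
print(restore_punt_volat("COL-LEGI ALEMANY"))        # COL·LEGI ALEMANY
print(restore_punt_volat("Av. Lleida"))              # Av. Lleida (unchanged)
```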

Croatian

Serbian and Croatian are basically the same language, but Serbian is written mainly in Cyrillic script and Croatian in the Latin (Roman) script. Croatian is spoken by about 6 million people in Croatia, Slovenia, Bosnia-Hercegovina and Serbia.

Letter Alternative ASCII code (page 437) ASCII code (page 850) Unicode™
č cc na* na* 010D
Č Cc na na 010C
ć ch na na 0107
Ć Ch na na 0106
đ dj na na 0111
Đ Dj na na 0110
š sh na na 0161
Š Sh na na 0160
ž zh na na 017E
Ž Zh na na 017D

*na = not available as an ASCII code in the specified code page

Croatian is alphabetized as follows:

   A B C Č Ć D Đ E F G H I J K L M N O P Q R S Š T U V (W) (X) (Y) Z Ž

Czech

Czech is spoken by about 10 million people in Czechia (the Czech Republic).

Letter Alternative ASCII code (page 437) ASCII code (page 850) Unicode™
á   160 160 00E1
Á   na* 181 00C1
č   na na* 010D
Č   na na 010C
ď   na na 010F
Ď   na na 010E
é   130 130 00E9
É   144 144 00C9
ě   na na 011B
Ě   na na 011A
í   161 161 00ED
Í   na 214 00CD
ň   na na 0148
Ň   na na 0147
ó   162 162 00F3
Ó   na 224 00D3
ř   na na 0159
Ř   na na 0158
š   na na 0161
Š   na na 0160
ť   na na 0165
Ť   na na 0164
ú   163 163 00FA
Ú   na 233 00DA
ů   na na 016F
Ů   na na 016E
ý   na 236 00FD
Ý   na 237 00DD
ž   na na 017E
Ž   na na 017D

*na = not available as an ASCII code in the specified code page

Czech is alphabetized as follows:

   A Á B C Č D Ď E É Ě F G H Ch I Í J K L M N Ň O Ó P Q R Ř S Š T Ťť U Ú Ů V W X Y Ý Z Ž

Danish

Danish is spoken by 5 million people in Denmark, as well as some inhabitants of the Faeroe Islands / Faroe Islands, Greenland and northern Germany.

Letter Alternative ASCII code (page 437) ASCII code (page 850) Unicode™
á   160 160 00E1
Á   na* 181 00C1
å aa 134 134 00E5
Å AA 143 143 00C5
æ ae 145 145 00E6
Æ AE 146 146 00C6
é   130 130 00E9
É   144 144 00C9
í   161 161 00ED
Í   na 214 00CD
ó   162 162 00F3
Ó   na 224 00D3
ø oe na 155 00F8
Ø OE na 157 00D8
ú   163 163 00FA
Ú   na 233 00DA
ý   na 236 00FD
Ý   na 237 00DD

*na = not available as an ASCII code in the specified code page

Note: aa was officially replaced by å in 1948 in all words, but aa remains allowed in place names such as Aalborg and in personal names.

Danish is alphabetized as follows:

   A (Á) B C D E (É) F G H I (Í) J K L M N O (Ó) P Q R S T U (Ú) V W X Y (Ý) Z Æ Ø Å (AA)

Dutch

Dutch is spoken by about 14 million people in The Netherlands and about 5 million Belgians. There is a small Dutch-speaking minority in northern France.

Letter Alternative ASCII code (page 437) ASCII code (page 850) Unicode™
æ ae 145 145 00E6
Æ AE 146 146 00C6
ä a 132 132 00E4
Ä A 142 142 00C4
ë e 137 137 00EB
Ë E na* 211 00CB
é e 130 130 00E9
É E 144 144 00C9
è e 138 138 00E8
È E na 212 00C8
ê e 136 136 00EA
Ê E na 210 00CA
ï i 139 139 00EF
Ï I na 216 00CF
ö o 148 148 00F6
Ö O 153 153 00D6
ó o 162 162 00F3
Ó O na 224 00D3
ò o 149 149 00F2
Ò O na 227 00D2
ô o 147 147 00F4
Ô O na 226 00D4
ü u 129 129 00FC
Ü U 154 154 00DC
ij ij or y 152 152 0133
IJ IJ or Y na na* 0132

*na = not available as an ASCII code in the specified code page

NB: Only ë and é and their upper-case equivalents are commonly found in Dutch. Note that ij is a single letter in the Dutch alphabet, coming between y and z, but in normal usage it is always, without exception, typed as two letters (i and j), or written as a y. You should do the same. Note, however, that when ij begins a proper noun, both the I and the J must be upper case, e.g. Krimpen aan de IJssel.
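Standard capitalisation functions get this wrong, because they treat i and j as separate letters. A minimal sketch in Python 3 that capitalises a leading ij as one letter and repairs data already stored as “Ij”; the helper names are hypothetical, and `fix_ij` assumes it is applied only to Dutch-language fields.

```python
import re


def dutch_capitalize(word: str) -> str:
    """Capitalise a Dutch word, treating a leading 'ij' as one letter."""
    if word[:2].lower() == "ij":
        return "IJ" + word[2:]
    return word[:1].upper() + word[1:]


def fix_ij(text: str) -> str:
    """Turn word-initial 'Ij' (e.g. from title-casing) into 'IJ'."""
    return re.sub(r"\bIj", "IJ", text)


print("ijssel".capitalize())                 # Ijssel  (wrong for Dutch)
print(dutch_capitalize("ijssel"))            # IJssel
print(fix_ij("Krimpen aan de Ijssel"))       # Krimpen aan de IJssel
```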

Dutch is alphabetized as follows:

   A (Ä) B C D E (Ë) F G H I (Ï) J K L M N O (Ö) P Q R S T U (Ü) V W X Y IJ Z

English

English is spoken by about 322 000 000 people throughout the World, and in many countries as a second language. It is spoken in the United Kingdom, Ireland, the United States of America, Canada, Australia, New Zealand, South Africa and many former British colonies.

Letter Alternative ASCII code (page 437) ASCII code (page 850) Unicode™
à a 133 133 00E0
À A na* 183 00C0
ç c 135 135 00E7
Ç C 128 128 00C7
ë e 137 137 00EB
Ë E na 211 00CB
é e 130 130 00E9
É E 144 144 00C9
è e 138 138 00E8
È E na 212 00C8
ê e 136 136 00EA
Ê E na 210 00CA
ï i 139 139 00EF
Ï I na 216 00CF
ñ n 164 164 00F1
Ñ N 165 165 00D1
ô o 147 147 00F4
Ô O na 226 00D4
ö o 148 148 00F6
Ö O 153 153 00D6

*na = not available as an ASCII code in the specified code page

English words are often written without diacritical marks because of ignorance, but many words, especially of French origin, such as façade, rôle, éclair, belovèd, naïve and so on, should correctly be written using a diacritic mark.

English is alphabetized as follows:

   A (À) B C (Ç) D E (É È Ê Ë) F G H I (Ï) J K L M N (Ñ) O (Ö Ô) P Q R S T U V W X Y Z

Estonian

Estonian is spoken by almost a million people in Estonia, Russia and Latvia.

It is written in the Latin, not the Cyrillic, script.

Letter Alternative ASCII code (page 437) ASCII code (page 850) Unicode™
ä   132 132 00E4
Ä   142 142 00C4
ö   148 148 00F6
Ö   153 153 00D6
õ   na* 228 00F5
Õ   na 229 00D5
š   na na* 0161
Š   na na 0160
ü   129 129 00FC
Ü   154 154 00DC
ž   na na 017E
Ž   na na 017D

*na = not available as an ASCII code in the specified code page

Estonian is alphabetized as follows:

   A B (C) D E F G H I J K L M N O P (Q) R S Š Z Ž T U V (W) Õ Ä Ö Ü (X) (Y)

Faroese

Faroese is spoken by most of the 40 000 inhabitants of the Faeroe Islands / Faroe Islands. It is related to Icelandic and resembles Old Norse.

Letter Alternative ASCII code (page 437) ASCII code (page 850) Unicode™
á   160 160 00E1
Á   na* 181 00C1
æ   145 145 00E6
Æ   146 146 00C6
ð   na 208 00F0
Ð   na 209 00D0
í   161 161 00ED
Í   na 214 00CD
ó   162 162 00F3
Ó   na 224 00D3
ø   na 155 00F8
Ø   na 157 00D8
ú   163 163 00FA
Ú   na 233 00DA
ý   na 236 00FD
Ý   na 237 00DD

*na = not available as an ASCII code in the specified code page

Faroese is alphabetized as follows:

   A Á B C D Ð E F G H I Í J K L M N O Ó P Q R S T U Ú V W X Y Ý Z Æ Ø

Finnish

Finnish is spoken by about 4.5 million people in Finland, and by about 50 000 people in Russia and 30 000 in northern Sweden.

Letter Alternative ASCII code (page 437) ASCII code (page 850) Unicode™
ä a 132 132 00E4
Ä A 142 142 00C4
å a 134 134 00E5
Å A 143 143 00C5
ö o 148 148 00F6
Ö O 153 153 00D6
š s na* na 0161
Š S na na 0160
ž z na na 017E
Ž Z na na 017D

*na = not available as an ASCII code in the specified code page

Finnish is alphabetized as follows:

   A B (C) D E F G H I J K L M N O P (Q) R S Š T U V (W) (X) Y Z Ž (Å) Ä Ö

French

French is spoken by about 72 million people throughout the World; 56 million people in France and Monaco, 6 million people in Canada, 4 million people in Belgium, 3 million people in Switzerland, 1 million in the United States of America and about 300 000 people in Luxembourg. It is also spoken, often as a second language, by inhabitants of France’s ex-colonies.

Letter Alternative ASCII code (page 437) ASCII code (page 850) Unicode™
à a 133 133 00E0
À A na* 183 00C0
â a 131 131 00E2
Â A na 182 00C2
æ ae 145 145 00E6
Æ AE 146 146 00C6
ç c 135 135 00E7
Ç C 128 128 00C7
ë e 137 137 00EB
Ë E na 211 00CB
é e 130 130 00E9
É E 144 144 00C9
è e 138 138 00E8
È E na 212 00C8
ê e 136 136 00EA
Ê E na 210 00CA
î i 140 140 00EE
Î I na 215 00CE
ï i 139 139 00EF
Ï I na 216 00CF
ô o 147 147 00F4
Ô O na 226 00D4
œ oe na na* 0153
Œ OE na na 0152
ù u 151 151 00F9
Ù U na 235 00D9
û u 150 150 00FB
Û U na 234 00DB
ü u 129 129 00FC
Ü U 154 154 00DC
ÿ y 152 152 00FF
Ÿ Y na na 0178

*na = not available as an ASCII code in the specified code page

French-speakers rarely assign accents to upper-case letters. Some listings will simply omit accents in upper-case letters, others will use the lower-case accented equivalent even where the rest of the word is in upper case.
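Because accents may be missing on capitals, it can help to compare accent-free forms when matching or de-duplicating French records, so that ÉCOLE, Ecole and école are treated as the same word. The stored data should keep its accents; the folded form is for comparison only. A sketch, assuming Python 3 and its `unicodedata` module; `fold` is a hypothetical helper.

```python
import unicodedata


def fold(text: str) -> str:
    """Accent- and case-insensitive form of a string, for matching only."""
    decomposed = unicodedata.normalize("NFD", text)   # É -> E + combining acute
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()


print(fold("ÉCOLE NORMALE") == fold("Ecole normale"))  # True
print(fold("Île-de-France"))                           # ile-de-france
print(fold("Œuvre"))                                   # œuvre
```

Ligatures such as œ and æ have no decomposition, so they survive folding; map them separately if your matching needs it.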

French is alphabetized as follows:

   A (Â, À, Æ) B C (Ç) D E (É, Ê, È, Ë) F G H I (Î, Ï) J K L M N O (Ô, Œ) P Q R S T U (Û, Ù, Ü) V W X Y (Ÿ) Z

Friesian or Frisian (West)

There are about 300 000 speakers of West Friesian in the province of Friesland in the northern Netherlands.

Letter Alternative ASCII code (page 437) ASCII code (page 850) Unicode™
â   131 131 00E2
Â   na* 182 00C2
ä   132 132 00E4
Ä   142 142 00C4
é   130 130 00E9
É   144 144 00C9
ê   136 136 00EA
Ê   na 210 00CA
ë   137 137 00EB
Ë   na 211 00CB
ï   139 139 00EF
Ï   na 216 00CF
ô   147 147 00F4
Ô   na 226 00D4
ö   148 148 00F6
Ö   153 153 00D6
û   150 150 00FB
Û   na 234 00DB
ú   163 163 00FA
Ú   na 233 00DA
ü   129 129 00FC
Ü   154 154 00DC

*na = not available as an ASCII code in the specified code page

West Friesian is alphabetized as follows:

   A (Â, Ä) B C D E (É, Ê, Ë) F G H I (Ï, Y) J K L M N O (Ô, Ö) P Q R S T U (Ú, Û, Ü) V W X Z

Friulian

There are some 600 000 speakers of Friulian in Northeast Italy and a small number in Slovenia.

Letter Alternative ASCII code (page 437) ASCII code (page 850) Unicode™
à   133 133 00E0
À   na* 183 00C0
á   160 160 00E1
Á   na 181 00C1
â   131 131 00E2
Â   na 182 00C2
è   138 138 00E8
È   na 212 00C8
ì   141 141 00EC
Ì   na 222 00CC
ò   149 149 00F2
Ò   na 227 00D2
ù   151 151 00F9
Ù   na 235 00D9

*na = not available as an ASCII code in the specified code page

Friulian is alphabetized as follows:

   A À Á Â B C D E È F G H I Ì J K L M N O Ò P Q R S T U (Ù) V W X Y Z

Gaelic

Gaelic is spoken in two distinct varieties in Scotland and Ireland. It has almost 20 000 speakers in the former and some 500 000 in the latter.

Letter Alternative ASCII code (page 437) ASCII code (page 850) Unicode™
à   133 133 00E0
À   na* 183 00C0
á   160 160 00E1
Á   na 181 00C1
é   130 130 00E9
É   144 144 00C9
è   138 138 00E8
È   na 212 00C8
í   161 161 00ED
Í   na 214 00CD
ì   141 141 00EC
Ì   na 222 00CC
ó   162 162 00F3
Ó   na 224 00D3
ò   149 149 00F2
Ò   na 227 00D2
ú   163 163 00FA
Ú   na 233 00DA
ù   151 151 00F9
Ù   na 235 00D9

*na = not available as an ASCII code in the specified code page

Irish is alphabetized as follows:

   A (Á) B C D E (É) F G H I (Í) J (K) L M N O (Ó) P Q R S T U (Ú) V W X Y Z

The Scottish Gaelic alphabet has 18 letters and is alphabetized as follows:

   A (À, Á) B C D E (È, É) F G H I (Ì) (J) (K) L M N O (Ò, Ó) P (Q) R S T U (Ù) (V) (W) (X) (Y) (Z)

Galician

Galician is spoken by over 3 million people in Spain.

Letter Alternative ASCII code (page 437) ASCII code (page 850) Unicode™
á   160 160 00E1
Á   na* 181 00C1
é   130 130 00E9
É   144 144 00C9
í   161 161 00ED
Í   na 214 00CD
ñ   164 164 00F1
Ñ   165 165 00D1
ó   162 162 00F3
Ó   na 224 00D3
ú   163 163 00FA
Ú   na 233 00DA
ü   129 129 00FC
Ü   154 154 00DC

*na = not available as an ASCII code in the specified code page

Galician is alphabetized as follows:

   A (Á) B C D E (É) F G H I (Í) J K L M N Ñ O (Ó) P Q R S T U (Ú Ü) V W X Y Z

German

German is one of the most widely spread languages in Europe, spoken by about 80 million people in Germany, 7.7 million people in Austria, 4.4 million people in Switzerland, 66 000 people in Belgium, 28 500 people in the South Tyrol region of Italy and 29 000 people in Liechtenstein, as well as by smaller minorities in southern Denmark, eastern France and Luxembourg. There are large regional variations in the form of German spoken, especially between Germany, Switzerland and Austria.

Letter Alternative ASCII code (page 437) ASCII code (page 850) Unicode™
ä ae 132 132 00E4
Ä AE 142 142 00C4
ö oe 148 148 00F6
Ö OE 153 153 00D6
ü ue 129 129 00FC
Ü UE 154 154 00DC
ß ss 225 225 00DF

The sharp s (Scharfes S, “ß”) is used only in lower case; its upper-case equivalent is SS. It is not used in Swiss German. Its use is also governed by the new German spelling rules, introduced on 1 August 2005 but not adopted in Bavaria or North Rhine-Westphalia. Where alternative spellings are possible, this has been noted within the text of this book.
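When a German name must be stored in a system that cannot hold umlauts or ß, the Alternative column of the table above can be applied mechanically. A sketch assuming Python 3; `german_to_ascii` is hypothetical, and writing Ae rather than AE before a lower-case letter is an assumption about house style rather than a rule from this chapter.

```python
_LOWER = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}
_UPPER = {"Ä": "AE", "Ö": "OE", "Ü": "UE"}


def german_to_ascii(text: str) -> str:
    """Spell German umlauts and ß with plain ASCII letters."""
    out = []
    for i, ch in enumerate(text):
        if ch in _LOWER:
            out.append(_LOWER[ch])
        elif ch in _UPPER:
            following = text[i + 1:i + 2]
            # "Äpfel" -> "Aepfel", but "MÜNCHEN" -> "MUENCHEN"
            out.append(_UPPER[ch].capitalize() if following.islower() else _UPPER[ch])
        else:
            out.append(ch)
    return "".join(out)


print(german_to_ascii("Straußstraße"))  # Straussstrasse
print(german_to_ascii("Äpfel"))         # Aepfel
print(german_to_ascii("MÜNCHEN"))       # MUENCHEN
print("Straße".upper())                 # STRASSE (Python applies the SS rule itself)
```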

Under the new rules, three identical consonants may be written together without a hyphen. Because this change is recent, you may still find a hyphen used where three s's come together in German, as follows:

   Strauss-strasse

Instead of

   Straussstrasse

All nouns in German are written with the first letter as a capital.

German is alphabetized as follows:

   A (Ä) B C D E F G H I J K L M N O (Ö) P Q R S (ß=SS) T U (Ü) V W X Y Z

Greek

Greek is spoken by about 10 million people in Greece and Cyprus. It has its own alphabet which has to be transliterated for use with databases containing Roman script. The table below gives the transliteration symbol of each Greek character. Each character is given in upper case then lower case version:

Letter Name Transliteration Unicode™
Αα álfa a 0391, 03B1
Ββ víta v 0392, 03B2
Γγ gháma gh (y before an e sound) 0393, 03B3
Δδ thélta th 0394, 03B4
Εε épsilon e 0395, 03B5
Ζζ zíta z 0396, 03B6
Ηη íta i 0397, 03B7
Θθ thíta th 0398, 03B8
Ιι yóta i 0399, 03B9
Κκ kápa k 039A, 03BA
Λλ lámtha l 039B, 03BB
Μμ mi m 039C, 03BC
Νν ni n 039D, 03BD
Ξξ ksi ks 039E, 03BE
Οο ómikron o 039F, 03BF
Ππ pi p 03A0, 03C0
Ρρ ro r 03A1, 03C1
Σσς sigma s 03A3, 03C3, 03C2
Ττ taf t 03A4, 03C4
Υυ ípsilon i 03A5, 03C5
Φφ fi f 03A6, 03C6
Χχ hi h 03A7, 03C7
Ψψ psi ps 03A8, 03C8
Ωω omégha o 03A9, 03C9

Note that the letter ς is only used at the end of a word.
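The transliteration table can be applied letter by letter. This sketch assumes Python 3; it strips Greek accent marks first (the table lists only unaccented letters), implements the γ rule as “y before ε or αι”, which is an interpretation of “before an e sound”, and does not handle letter combinations such as ου or μπ. `greek_to_latin` is a hypothetical helper.

```python
import unicodedata

# Transliterations from the table, keyed by lower-case Greek letter.
GREEK_TO_LATIN = {
    "α": "a", "β": "v", "γ": "gh", "δ": "th", "ε": "e", "ζ": "z", "η": "i",
    "θ": "th", "ι": "i", "κ": "k", "λ": "l", "μ": "m", "ν": "n", "ξ": "ks",
    "ο": "o", "π": "p", "ρ": "r", "σ": "s", "ς": "s", "τ": "t", "υ": "i",
    "φ": "f", "χ": "h", "ψ": "ps", "ω": "o",
}


def greek_to_latin(text: str) -> str:
    # Split accented letters (ά, ή, ϊ ...) into base letter + mark, drop the marks.
    plain = "".join(c for c in unicodedata.normalize("NFD", text)
                    if not unicodedata.combining(c))
    out = []
    for i, ch in enumerate(plain):
        low = ch.lower()
        latin = GREEK_TO_LATIN.get(low)
        if latin is None:          # spaces, digits, punctuation, Latin letters
            out.append(ch)
            continue
        if low == "γ":
            following = plain[i + 1:i + 3].lower()
            if following.startswith("ε") or following == "αι":
                latin = "y"        # γ before an e sound
        out.append(latin.capitalize() if ch.isupper() else latin)
    return "".join(out)


print(greek_to_latin("Αθήνα"))  # Athina
print(greek_to_latin("Πάτρα"))  # Patra
print(greek_to_latin("γάλα"))   # ghala
```

Words written entirely in capitals come out mixed (e.g. “ThIVA”), because only the first letter of a two-letter transliteration is capitalised; upper-case the result if the source field is all capitals.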

Greenlandic

Greenlandic is spoken by 40 000 people in Greenland and a further 7 000 people in Denmark.

Letter Alternative ASCII code (page 437) ASCII code (page 850) Unicode™
ĸ K’ na* na 0138
æ   145 145 00E6
Æ   146 146 00C6
ø   na 155 00F8
Ø   na 157 00D8
å   134 134 00E5
Å   143 143 00C5

*na = not available as an ASCII code in the specified code page

Greenlandic is alphabetized as follows:

   A B C D E F G H I J K L M N O P Q (K’) R S T U V W X Y Z Æ Ø Å

Hungarian

Hungarian is spoken by about 10 million people in Hungary, 1.5 million people in Romania, and by minorities in Slovakia, Slovenia, Croatia and Serbia.

Letter Alternative ASCII code (page 437) ASCII code (page 850) Unicode™
á a’ 160 160 00E1
Á A’ na* 181 00C1
é e’ 130 130 00E9
É E’ 144 144 00C9
í i’ 161 161 00ED
Í I’ na 214 00CD
ó o’ 162 162 00F3
Ó O’ na 224 00D3
ö   148 148 00F6
Ö   153 153 00D6
ő   na na* 0151
Ő   na na 0150
ú   163 163 00FA
Ú   na 233 00DA
ü   129 129 00FC
Ü   154 154 00DC
ű   na na 0171
Ű   na na 0170

*na = not available as an ASCII code in the specified code page

Hungarian is alphabetized as follows:

   A Á B C CS D E É F G GY H I Í J K L LY M N NY O Ó Ö Ő P Q R S SZ T TY U Ú Ü Ű V W X Y Z ZS

Icelandic

Icelandic is spoken by 250 000 people in Iceland.

Letter Alternative ASCII code (page 437) ASCII code (page 850) Unicode™
á a 160 160 00E1
Á A na* 181 00C1
æ ae 145 145 00E6
Æ AE 146 146 00C6
ð d na 208 00F0
Ð D na 209 00D0
é e 130 130 00E9
É E 144 144 00C9
í i 161 161 00ED
Í I na 214 00CD
ó o 162 162 00F3
Ó O na 224 00D3
ö o 148 148 00F6
Ö O 153 153 00D6
œ oe na na* 0153
Œ OE na na 0152
þ th na 232 00FE
Þ TH na 231 00DE
ú u 163 163 00FA
Ú U na 233 00DA
ý y na 236 00FD
Ý Y na 237 00DD

*na = not available as an ASCII code in the specified code page

Icelandic is alphabetized as follows:

   A Á B C D Ð E É F G H I Í J K L M N O Ó P Q R S T U Ú V W X Y Ý Z Þ Æ Ö

Italian

Italian is spoken by about 57 million people in Italy, San Marino and the Holy See and about 500 000 people in Switzerland.

Letter Alternative ASCII code (page 437) ASCII code (page 850) Unicode™
à   133 133 00E0
À   na* 183 00C0
é e’ 130 130 00E9
É E’ 144 144 00C9
è   138 138 00E8
È   na 212 00C8
í i’ 161 161 00ED
Í I’ na 214 00CD
ì   141 141 00EC
Ì   na 222 00CC
ï   139 139 00EF
Ï   na 216 00CF
ó o’ 162 162 00F3
Ó O’ na 224 00D3
ò   149 149 00F2
Ò   na 227 00D2
ú u’ 163 163 00FA
Ú U’ na 233 00DA
ù   151 151 00F9
Ù   na 235 00D9

*na = not available as an ASCII code in the specified code page

Italian is alphabetized as follows:

   A (À) B C D E (É, È) F G H I (Í, Ì, Ï) J K L M N O (Ó,Ò) P Q R S T U (Ú, Ù) V W X Y Z

Ladin

There are about 30 000 speakers of Ladin in Northeast Italy.

Letter Alternative ASCII code (page 437) ASCII code (page 850) Unicode™
â   131 131 00E2
Â   na* 182 00C2
á   160 160 00E1
Á   na 181 00C1
à   133 133 00E0
À   na 183 00C0
ê   136 136 00EA
Ê   na 210 00CA
é   130 130 00E9
É   144 144 00C9
è   138 138 00E8
È   na 212 00C8
î   140 140 00EE
Î   na 215 00CE
í   161 161 00ED
Í   na 214 00CD
ì   141 141 00EC
Ì   na 222 00CC
ô   147 147 00F4
Ô   na 226 00D4
ó   162 162 00F3
Ó   na 224 00D3
ò   149 149 00F2
Ò   na 227 00D2
û   150 150 00FB
Û   na 234 00DB
ú   163 163 00FA
Ú   na 233 00DA
ù   151 151 00F9
Ù   na 235 00D9

*na = not available as an ASCII code in the specified code page

Ladin is alphabetized as follows:

   A (Â Á À) B C D E (Ê É È) F G H I (Î Í Ì) J K L M N O (Ô Ó Ò) P Q R S T U (Û Ú Ù) V W X Y Z

Latvian

There are about 1 400 000 speakers of Latvian, mainly in Latvia.

Letter Alternative ASCII code (page 437) ASCII code (page 850) Unicode™
ā   na* na 0101
Ā   na na 0100
č   na na 010D
Č   na na 010C
ē   na na 0113
Ē   na na 0112
ģ   na na 0123
Ģ   na na 0122
ī   na na 012B
Ī   na na 012A
ķ   na na 0137
Ķ   na na 0136
ļ   na na 013C
Ļ   na na 013B
ņ   na na 0146
Ņ   na na 0145
ō   na na 014D
Ō   na na 014C
ŗ   na na 0157
Ŗ   na na 0156
š   na na 0161
Š   na na 0160
ū   na na 016B
Ū   na na 016A
ž   na na 017E
Ž   na na 017D

*na = not available as an ASCII code in the specified code page

Latvian is alphabetized as follows:

   A (Ā) B C Č D E (Ē) F G Ģģ H I (Ī) J K Ķ L Ļ M N Ņ O (Ō) P (Q) R Ŗ S Š T U (Ū) V (W) (X) (Y) Z Ž

Letzebuergesch

Letzebuergesch, a language related to German, is the official language of Luxembourg and is spoken by some 350 000 people.

Letter Alternative ASCII code (page 437) ASCII code (page 850) Unicode™
â   131 131 00E2
Â   na* 182 00C2
ä   132 132 00E4
Ä   142 142 00C4
é   130 130 00E9
É   144 144 00C9
ë   137 137 00EB
Ë   na 211 00CB
M̂   na na 004D+0302
m̂   na na 006D+0302
N̂   na na 004E+0302
n̂   na na 006E+0302
ö   148 148 00F6
Ö   153 153 00D6
ô   147 147 00F4
Ô   na 226 00D4
ü   129 129 00FC
Ü   154 154 00DC

*na = not available as an ASCII code in the specified code page

Letzebuergesch is alphabetized as follows:

   A (Ä) B C D E (Ë, É) F G H I J K L M (M̂) N (N̂) O (Ö) P Q R S T U (Ü) V W X Y Z
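Because M and N with a circumflex have no single Unicode code point, they are stored as a base letter followed by the combining circumflex U+0302, as the two-part values in the table show. Unicode normalization (form NFC) combines such sequences only where a precomposed letter exists, so e + acute collapses to é while m + circumflex stays as two code points. A sketch assuming Python 3's `unicodedata` module; `code_points` is a hypothetical helper.

```python
import unicodedata


def code_points(text: str) -> list[str]:
    """List the code points of a string after NFC normalization."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFC", text)]


print(code_points("e\u0301"))  # ['U+00E9']            precomposed é exists
print(code_points("m\u0302"))  # ['U+006D', 'U+0302']  no precomposed form
print(code_points("N\u0302"))  # ['U+004E', 'U+0302']
```

Normalising all text to the same form before storing or comparing it avoids treating a decomposed é and a precomposed é as different strings; field-length checks should also allow for letters that occupy two code points.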

Lithuanian

There are almost 3 million speakers of Lithuanian, mainly in Lithuania.

Letter Alternative ASCII code (page 437) ASCII code (page 850) Unicode™
ą   na* na 0105
Ą   na na 0104
č   na na 010D
Č   na na 010C
ę   na na 0119
Ę   na na 0118
ė   na na 0117
Ė   na na 0116
į   na na 012F
Į   na na 012E
š   na na 0161
Š   na na 0160
ų   na na 0173
Ų   na na 0172
ū   na na 016B
Ū   na na 016A
ž   na na 017E
Ž   na na 017D

*na = not available as an ASCII code in the specified code page

Lithuanian is alphabetized as follows:

   A (Ą) B C Č D E (Ę Ė) F G H I (Į Y) J K L M N O P (Q) R S Š T U (Ų Ū) V (W) (X) Z Ž

Maltese

Maltese, a Semitic language descended from Arabic with strong Romance influences, is spoken by about 400 000 people in Malta.

Letter Alternative ASCII code (page 437) ASCII code (page 850) Unicode™
à   133 133 00E0
À   na* 183 00C0
ċ   na na* 010B
Ċ   na na 010A
è   138 138 00E8
È   na 212 00C8
ġ   na na 0121
Ġ   na na 0120
għ   na na 0067+0127
Għ   na na 0047+0127
ħ   na na 0127
Ħ   na na 0126
ì   141 141 00EC
Ì   na 222 00CC
î   140 140 00EE
Î   na 215 00CE
ò   149 149 00F2
Ò   na 227 00D2
ù   151 151 00F9
Ù   na 235 00D9
ż   na na 017C
Ż   na na 017B

*na = not available as an ASCII code in the specified code page

Maltese is alphabetized as follows:

   A (À) B Ċ (C) D E (È) F Ġ G Għ H Ħ I (Ì, Î) J K L M N O (Ò) P Q R S T U (Ù) V W X (Y) Ż Z

Norwegian

Norwegian is spoken by about 4 million people in Norway.

Letter Alternative ASCII code (page 437) ASCII code (page 850) Unicode™
å aa 134 134 00E5
Å AA 143 143 00C5
æ ae 145 145 00E6
Æ AE 146 146 00C6
é   130 130 00E9
É   144 144 00C9
ó   162 162 00F3
Ó   na* 224 00D3
ò   149 149 00F2
Ò   na 227 00D2
ø oe na 155 00F8
Ø OE na 157 00D8

*na = not available as an ASCII code in the specified code page

Norwegian is alphabetized as follows:

   A B C D E (É) F G H I J K L M N O (Ó, Ò) P Q R S T U V W X Y Z Æ Ø Å

Polish

Polish is spoken by about 35 million people, mainly in Poland.

Letter Alternative ASCII code (page 437) ASCII code (page 850) Unicode™
ą   na* na* 0105
Ą   na na 0104
ć c’ na na 0107
Ć C’ na na 0106
ę   na na 0119
Ę   na na 0118
ł   na na 0142
Ł   na na 0141
ń n’ na na 0144
Ń N’ na na 0143
ó   162 162 00F3
Ó   na 224 00D3
ś s’ na na 015B
Ś S’ na na 015A
ź z’ na na 017A
Ź Z’ na na 0179
ż   na na 017C
Ż   na na 017B

*na = not available as an ASCII code in the specified code page

The letters Q, V and X are only used in foreign words.

Where an accent cannot be reproduced, it is usually just dropped. However, an apostrophe or a comma may be added immediately after the accented letter to clarify meaning. For example: was (= you) and wa,s for wąs (= moustache).
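Unicode canonical decomposition (NFD) makes it easy to automate dropping accents, but not every letter in these tables decomposes. For instance, ł (U+0142) is a separate base letter with no decomposition, so it survives a naive NFD-based strip. Below is a minimal sketch using Python's standard `unicodedata` module. The helper `strip_diacritics` and the fallback table `_NO_DECOMPOSITION` are hypothetical, and the table is deliberately incomplete. Where a language table gives an Alternative spelling (Norwegian ø → oe, for example), that spelling is preferable to blind stripping.

```python
import unicodedata

# Letters with no Unicode decomposition, mapped to a plain base letter.
# Illustrative and incomplete: extend for the languages you handle.
_NO_DECOMPOSITION = str.maketrans({
    "ł": "l", "Ł": "L",   # Polish, Sorbian
    "đ": "d", "Đ": "D",   # Sámi
    "ħ": "h", "Ħ": "H",   # Maltese
    "ŧ": "t", "Ŧ": "T",   # Sámi
})

def strip_diacritics(text: str) -> str:
    """Return text with combining marks removed. This is lossy!"""
    decomposed = unicodedata.normalize("NFD", text.translate(_NO_DECOMPOSITION))
    # Combining marks (ogonek, acute, caron, ...) have a non-zero combining class.
    return "".join(c for c in decomposed if not unicodedata.combining(c))

print(strip_diacritics("wąs"))   # was
print(strip_diacritics("Łódź"))  # Lodz
```

As the wąs example shows, stripping merges words that are distinct in the original language, so it is safer to keep the original spelling alongside any stripped search key.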

Opening quotation marks in Polish are usually written at the baseline, like a double comma, with the closing marks raised: „Hello”. However, most computer systems, even in Poland, do not allow this, and the standard “Western” quotation marks are used.

Polish is alphabetized as follows:

   A Ą B C Ć D E Ę F G H I J K L Ł M N Ń O Ó P R S Ś T U W Y Z Ź Ż

Portuguese

Portuguese is spoken by about 10 million people in Portugal and 163 million people in Brazil. There are also many second-language speakers in Portugal’s ex-colonies. In Portugal, there are Northern dialects and Central-Southern dialects.

Letter Alternative ASCII code (page 437) ASCII code (page 850) Unicode™
á   160 160 00E1
Á   na* 181 00C1
à   133 133 00E0
À   na 183 00C0
â   131 131 00E2
Â   na 182 00C2
ã   na 198 00E3
à   na 199 00C3
ç   135 135 00E7
Ç   128 128 00C7
é   130 130 00E9
É   144 144 00C9
è   138 138 00E8
È   na 212 00C8
ê   136 136 00EA
Ê   na 210 00CA
í   161 161 00ED
Í   na 214 00CD
ì   141 141 00EC
Ì   na 222 00CC
ó   162 162 00F3
Ó   na 224 00D3
ò   149 149 00F2
Ò   na 227 00D2
ô   147 147 00F4
Ô   na 226 00D4
õ   na 228 00F5
Õ   na 229 00D5
ú   163 163 00FA
Ú   na 233 00DA
ü   129 129 00FC
Ü   154 154 00DC

*na = not available as an ASCII code in the specified code page

Portuguese accents should not be replaced by apostrophes as apostrophes may be used in Portuguese to indicate certain contractions, such as Sant’Ana.

Portuguese is alphabetized as follows:

   A (Á, À, Â, Ã) B C (Ç) D E (É, Ê) F G H I (Í) J K L M N O (Ó, Ô, Õ) P Q R S T U (Ú, Ü) V W X Y Z

Provençal

Provençal is a Romance language related to French and Catalan, found in south-eastern France and north-western Italy. It has a number of dialects.

Letter Alternative ASCII code (page 437) ASCII code (page 850) Unicode™
à   133 133 00E0
À   na* 183 00C0
ç   135 135 00E7
Ç   128 128 00C7
é   130 130 00E9
É   144 144 00C9
è   138 138 00E8
È   na 212 00C8
ó   162 162 00F3
Ó   na 224 00D3

*na = not available as an ASCII code in the specified code page

Rhaeto-Romance

Dialects of Rhaeto-Romance are spoken in Switzerland, western Austria and northern Italy by about 500 000 people.

Letter Alternative ASCII code (page 437) ASCII code (page 850) Unicode™
à   133 133 00E0
À   na* 183 00C0
è   138 138 00E8
È   na 212 00C8
é   130 130 00E9
É   144 144 00C9
ì   141 141 00EC
Ì   na 222 00CC
ò   149 149 00F2
Ò   na 227 00D2
ù   151 151 00F9
Ù   na 235 00D9

*na = not available as an ASCII code in the specified code page

Rhaeto-Romance is alphabetized as follows:

   A (À) B C D E (È,É) F G H I (Ì) J (K) L M N O (Ò) P Q R S T U (Ù) V (W) X (Y) Z

Romanian

Romanian is spoken by about 20 million people in Romania. A closely related variety is spoken in Moldova. It was written in the Cyrillic alphabet until 1989 and now uses the Latin alphabet.

Letter Alternative ASCII code (page 437) ASCII code (page 850) Unicode™
à   133 133 00E0
À   na* 183 00C0
â   131 131 00E2
Â   na 182 00C2
ă   na na* 0103
Ă   na na 0102
è   138 138 00E8
È   na 212 00C8
ì   141 141 00EC
Ì   na 222 00CC
î   140 140 00EE
Î   na 215 00CE
ş   na na 015F
Ş   na na 015E
ţ   na na 0163
Ţ   na na 0162
ù   151 151 00F9
Ù   na 235 00D9

*na = not available as an ASCII code in the specified code page
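The table lists ş and ţ with a cedilla (U+015F, U+0163), which is what most older character sets offered. Standard Romanian spelling uses a comma below instead: ș U+0219, Ș U+0218, ț U+021B, Ț U+021A. The two forms look alike but compare as different strings, so data from mixed sources often needs folding to one form before matching. Below is a minimal sketch that standardizes on the comma-below forms. The name `normalize_romanian` is hypothetical. It should not be applied to Turkish text, where ş with a cedilla is correct.

```python
# Cedilla forms (as listed in the table) -> comma-below forms.
_CEDILLA_TO_COMMA = str.maketrans({
    "\u015F": "\u0219",  # ş -> ș
    "\u015E": "\u0218",  # Ş -> Ș
    "\u0163": "\u021B",  # ţ -> ț
    "\u0162": "\u021A",  # Ţ -> Ț
})

def normalize_romanian(text: str) -> str:
    """Replace s/t with cedilla by s/t with comma below (Romanian data only)."""
    return text.translate(_CEDILLA_TO_COMMA)

print(normalize_romanian("Timi\u015Foara"))  # Timișoara (with U+0219)
```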

Romanian is alphabetized as follows:

   A Ă Â B C D E F G H I Î J K L M N O P (Q) R S Ş T Ţ U V W X (Y) Z

Romany

Romany is the language of Europe’s Roma people. It has many dialects and forms, each strongly influenced by the indigenous languages of the region its speakers inhabit.

Apostrophes are used in Romany words.

Letter Alternative ASCII code (page 437) ASCII code (page 850) Unicode™
č   na* na* 010D
š   na na 0161
ž   na na 017E

*na = not available as an ASCII code in the specified code page

Russian

Russian is spoken by about 143 million people in Armenia, Azerbaijan, Belarus, Estonia, Georgia, Kazakhstan, Kyrgyzstan, Latvia, Lithuania, Moldova, Russia, Tajikistan, Turkmenistan, Ukraine and Uzbekistan. It is written in the Cyrillic alphabet, which has to be transliterated for use in databases that hold Roman script. Each character is given below in its upper-case and then its lower-case form:

Letter Transliteration Unicode™
Аа a 0410, 0430
Бб b 0411, 0431
Вв v 0412, 0432
Гг g 0413, 0433
Дд d 0414, 0434
Ее e 0415, 0435
Ёё yo 0401, 0451
Жж zh (like the s in leisure) 0416, 0436
Зз z 0417, 0437
Ии ee 0418, 0438
Йй i 0419, 0439
Кк k 041A, 043A
Лл l 041B, 043B
Мм m 041C, 043C
Нн n 041D, 043D
Оо o 041E, 043E
Пп p 041F, 043F
Рр r 0420, 0440
Сс s 0421, 0441
Тт t 0422, 0442
Уу u (oo) 0423, 0443
Фф f 0424, 0444
Хх ch (as in loch) 0425, 0445
Цц ts 0426, 0446
Чч ch 0427, 0447
Шш sh 0428, 0448
Щщ shch 0429, 0449
Ъъ hard sign (not pronounced) 042A, 044A
Ыы iy 042B, 044B
Ьь soft sign (not pronounced) 042C, 044C
Ээ e 042D, 044D
Юю yoo 042E, 044E
Яя ya 042F, 044F

Russian is alphabetized as in the table above.
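Transliteration of this kind is a per-character table lookup. Below is a minimal sketch that assumes the Latin equivalents given in the table above. Only a subset of letters is filled in; letters with descriptive entries, such as the hard and soft signs, are left for you to decide. The names `TRANSLIT` and `transliterate` are hypothetical. Characters not in the table pass through unchanged, which makes gaps easy to spot. For interchange with other organizations, a published scheme such as ISO 9 or BGN/PCGN may be preferable to an ad hoc table.

```python
# A subset of the table above (lower case only); extend with the rest as needed.
TRANSLIT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e",
    "з": "z", "к": "k", "л": "l", "м": "m", "н": "n", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "у": "u", "ф": "f",
    "ц": "ts", "ш": "sh", "щ": "shch", "я": "ya",
}

def transliterate(text: str, table: dict[str, str] = TRANSLIT) -> str:
    out = []
    for ch in text:
        latin = table.get(ch.lower())
        if latin is None:
            out.append(ch)                  # unmapped: keep as-is (hyphens, spaces, ...)
        elif ch.isupper():
            out.append(latin.capitalize())  # Щ -> "Shch", not "SHCH"
        else:
            out.append(latin)
    return "".join(out)

print(transliterate("Москва"))           # Moskva
print(transliterate("Санкт-Петербург"))  # Sankt-Peterburg
```

Capitalizing only the first Latin letter suits names written in mixed case. A word written entirely in capitals would need separate handling.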

Sámi

Sámi is spoken in the north of Norway, Sweden, Finland and Russia by some 35 000 people.

There are around ten Sámi languages or varieties [1], depending on definition: Kildin (using the Cyrillic script); Akkala, Inari, Lule, Northern, Pite, Skolt, Southern, Ter and Ume (using the Latin script).

This table covers all the diacritical characters used by the varieties written in the Latin script.

Letter Alternative ASCII code (page 437) ASCII code (page 850) Unicode™
á   160 160 00E1
Á   na* 181 00C1
å   134 134 00E5
Å   143 143 00C5
ä   132 132 00E4
Ä   142 142 00C4
â   131 131 00E2
Â   na 182 00C2
č   na na* 010D
Č   na na 010C
đ   na na 0111
Đ   na 209 0110
ń   na na 0144
Ń   na na 0143
ŋ   na na 014B
Ŋ   na na 014A
ö   148 148 00F6
Ö   153 153 00D6
õ   na 228 00F5
Õ   na 229 00D5
ø   na 155 00F8
Ø   na 157 00D8
š   na na 0161
Š   na na 0160
ŧ   na na 0167
Ŧ   na na 0166
ž   na na 017E
Ž   na na 017D
ǯ   na na 01EF
Ǯ   na na 01EE

Diacritical characters borrowed from Finnish, Swedish and Norwegian are also used.

*na = not available as an ASCII code in the specified code page
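The upper-case Sámi letter Đ (D with stroke, U+0110) is visually identical to Ð (eth, U+00D0). The value 209 in the page 850 column actually decodes to eth, so text converted from code page 850 can end up containing the wrong character. Below is a minimal sketch that shows the decoding with Python's standard `cp850` codec and folds eth back to D with stroke. The helper `fix_sami_d` is hypothetical and should only be run on data known to be Sámi, since eth is a correct letter in Icelandic and Faroese.

```python
# Sámi uses D WITH STROKE; ETH looks the same in upper case.
_ETH_TO_D_STROKE = str.maketrans({"\u00D0": "\u0110",   # Ð -> Đ
                                  "\u00F0": "\u0111"})  # ð -> đ

def fix_sami_d(text: str) -> str:
    """Replace eth lookalikes with d with stroke (Sámi data only)."""
    return text.translate(_ETH_TO_D_STROKE)

legacy = bytes([209]).decode("cp850")      # byte 209 from a code page 850 file
print(f"U+{ord(legacy):04X}")              # U+00D0 (eth), not U+0110
print(f"U+{ord(fix_sami_d(legacy)):04X}")  # U+0110
```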

Slovak

Slovak is spoken by about 5 million people, mainly in Slovakia, and is written in the Latin script. Loss of the diacritical marks should be avoided.

Letter Alternative ASCII code (page 437) ASCII code (page 850) Unicode™
á   160 160 00E1
Á   na* 181 00C1
č   na na* 010D
Č   na na 010C
ď or dv d’ na na 010F
Ď   na na 010E
é   130 130 00E9
É   144 144 00C9
ě   na na 011B
Ě   na na 011A
í   161 161 00ED
Í   na 214 00CD
ĺ l’ na na 013A
Ĺ L’ na na 0139
ň   na na 0148
Ň   na na 0147
ó   162 162 00F3
Ó   na 224 00D3
ř   na na 0159
Ř   na na 0158
š   na na 0161
Š   na na 0160
ť or tv t’ na na 0165
Ť   na na 0164
ú   163 163 00FA
Ú   na 233 00DA
ů   na na 016F
Ů   na na 016E
ý   na 236 00FD
Ý   na 237 00DD
ž   na na 017E
Ž   na na 017D

*na = not available as an ASCII code in the specified code page

Slovak is alphabetized as follows:

   A Á Ä B C Č D Ď DZ DŽ E É F G H CH I Í J K L Ĺ Ľ M N Ň O Ó Ô P Q R Ŕ S Š T Ť U Ú V W X Y Ý Z Ž

Slovenian

Slovenian is spoken by about 1.5 million people in Slovenia and small parts of Hungary and Italy.

Letter Alternative ASCII code (page 437) ASCII code (page 850) Unicode™
č ch na* na* 010D
Č Ch na na 010C
š sh na na 0161
Š Sh na na 0160
ž zh na na 017E
Ž Zh na na 017D

*na = not available as an ASCII code in the specified code page

Slovenian is alphabetized as follows:

   A B C Č D E F G H I J K L M N O P (Q) R S Š T U V (W) (X) (Y) Z Ž 

Sorbian

Sorbian is a Slavic language spoken by some 50 000 people in two distinct dialects in the south-easternmost part of the former East Germany.

Letter Alternative ASCII code (page 437) ASCII code (page 850) Unicode™
ć   na* na* 0107
Ć   na na 0106
č   na na 010D
Č   na na 010C
ě   na na 011B
Ě   na na 011A
ł   na na 0142
Ł   na na 0141
ń   na na 0144
Ń   na na 0143
ó   162 162 00F3
Ó   na 224 00D3
ř   na na 0159
Ř   na na 0158
ś   na na 015B
Ś   na na 015A
š   na na 0161
Š   na na 0160
ź   na na 017A
Ź   na na 0179
ž   na na 017E
Ž   na na 017D

*na = not available as an ASCII code in the specified code page

Upper Sorbian is alphabetized as follows:

   A B C Č D DŹ E Ě F G H CH I J K Ł L M N Ń O Ó P R Ř S Š T Ć U W X Y Z Ž

Lower Sorbian is alphabetized as follows:

   A B Bj C Č Ć D DŹ E Ě F G H CH I J K L Ł M MJ N Ń NJ O P PJ (Q) R Ŕ RJ S Ś Š TŚ TŠ T U (V) W WJ (X) Y Z Ź Ž

Spanish

Spanish is spoken by about 27.5 million people in Spain, 81 million people in Central America, 18 million people in the Caribbean, 90 million people in South America, 22 million people in the United States of America and by many others as a second language, especially in Spain’s ex-colonies.

Letter Alternative ASCII code (page 437) ASCII code (page 850) Unicode™
á a 160 160 00E1
Á A na* 181 00C1
é e 130 130 00E9
É E 144 144 00C9
í i 161 161 00ED
Í I na 214 00CD
ñ n 164 164 00F1
Ñ N 165 165 00D1
ó o 162 162 00F3
Ó O na 224 00D3
ú u 163 163 00FA
Ú U na 233 00DA
ü u 129 129 00FC
Ü U 154 154 00DC

*na = not available as an ASCII code in the specified code page

In Spanish, the abbreviation of a plural is formed by doubling its letters. For example, the Spanish abbreviation for the United States (Estados Unidos) is EE.UU. The full stop follows each doubled pair of letters and never comes between the two letters of a pair.
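The doubling rule is mechanical enough to sketch. Below is a minimal illustration that builds the abbreviation from the initial of each plural word. The name `plural_abbreviation` is hypothetical. Abbreviations of this kind are conventional, so in practice a lookup list of accepted forms is safer than generating them.

```python
def plural_abbreviation(words: list[str]) -> str:
    """Double each word's initial and close each pair with a full stop."""
    return "".join(w[0].upper() * 2 + "." for w in words)

print(plural_abbreviation(["Estados", "Unidos"]))  # EE.UU.
```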

Spanish is alphabetized as follows:

   A (Á) B C CH D E (É) F G H I (Í) J K L LL M N Ñ O (Ó) P Q R RR S T U (Ú, Ü) V W X Y Z

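The Spanish language academies (the Real Academia Española and its sister academies) have removed “ch” and “ll” as separate letters of the Spanish alphabet (for sorting purposes in 1994, and from the alphabet itself in 2010), though most Spanish-speakers still use them as such.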

Swedish

Swedish is spoken by about 8 million people in Sweden and about 300 000 people in western and southern Finland.

Letter Alternative ASCII code (page 437) ASCII code (page 850) Unicode™
å   134 134 00E5
Å   143 143 00C5
ä ae 132 132 00E4
Ä AE 142 142 00C4
é   130 130 00E9
É   144 144 00C9
ö oe 148 148 00F6
Ö OE 153 153 00D6
ü   129 129 00FC
Ü   154 154 00DC

*na = not available as an ASCII code in the specified code page

Swedish is alphabetized as follows:

   A B C D E (É) F G H I J K L M N O P Q R S T U V (W) X Y (Ü) Z Å Ä Ö

Note: in Swedish, in certain grammatical forms, a colon is used in contractions and abbreviations where an apostrophe (or nothing) might be used in English.


Turkish

Turkish is spoken by about 57 million people in Turkey, and by minorities in Greece, Bulgaria and Cyprus.

Letter Alternative ASCII code (page 437) ASCII code (page 850) Unicode™
â   131 131 00E2
Â   na* 182 00C2
ç   135 135 00E7
Ç   128 128 00C7
ğ   na na* 011F
Ğ   na na 011E
İ   na na 0130
ı   na 213 0131
î   140 140 00EE
Î   na 215 00CE
ö   148 148 00F6
Ö   153 153 00D6
ş   na na 015F
Ş   na na 015E
ü   129 129 00FC
Ü   154 154 00DC
û   150 150 00FB
Û   na 234 00DB

*na = not available as an ASCII code in the specified code page

The distinction between the dotted and dotless I is very important. İ/i and I/ı are distinct letters and are pronounced very differently. Do not replace a dotless ı with a dotted i, or vice versa.
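Default case mapping in most programming languages is not Turkish-aware. In Python, for example, `"i".upper()` returns `"I"` and `"I".lower()` returns `"i"`, silently turning one Turkish letter into the other. Below is a minimal sketch that maps the two i's explicitly before applying the default case mapping. The function names are hypothetical. Locale-aware libraries such as ICU offer the same behaviour more completely.

```python
# Map the i's first, then let the default case mapping handle the rest.
_TO_UPPER = str.maketrans({"i": "İ", "ı": "I"})
_TO_LOWER = str.maketrans({"I": "ı", "İ": "i"})

def turkish_upper(text: str) -> str:
    return text.translate(_TO_UPPER).upper()

def turkish_lower(text: str) -> str:
    # Without the translate step, "İ".lower() gives "i" + U+0307 (two code points).
    return text.translate(_TO_LOWER).lower()

print(turkish_upper("istanbul"))  # İSTANBUL
print(turkish_lower("ISPARTA"))   # ısparta
print(turkish_lower("İZMİR"))     # izmir
```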

Turkish is alphabetized as follows:

   A (Â) B C Ç D E F G Ğ H Iı İi J K L M N O Ö P (Q) R S Ş T U Ü Û V (W) (X) Y Z

Welsh

Welsh is spoken by about 600 000 people in Wales.

There are no accepted alternatives to accented characters in Welsh.

Letter Alternative ASCII code (page 437) ASCII code (page 850) Unicode™
ä   132 132 00E4
Ä   142 142 00C4
â   131 131 00E2
Â   na* 182 00C2
á   160 160 00E1
Á   na 181 00C1
à   133 133 00E0
À   na 183 00C0
ë   137 137 00EB
Ë   na 211 00CB
ê   136 136 00EA
Ê   na 210 00CA
é   130 130 00E9
É   144 144 00C9
è   138 138 00E8
È   na 212 00C8
ï   139 139 00EF
Ï   na 216 00CF
î   140 140 00EE
Î   na 215 00CE
í   161 161 00ED
Í   na 214 00CD
ì   141 141 00EC
Ì   na 222 00CC
ö   148 148 00F6
Ö   153 153 00D6
ô   147 147 00F4
Ô   na 226 00D4
ó   162 162 00F3
Ó   na 224 00D3
ò   149 149 00F2
Ò   na 227 00D2
û   150 150 00FB
Û   na 234 00DB
ü   129 129 00FC
Ü   154 154 00DC
ú   163 163 00FA
Ú   na 233 00DA
ù   151 151 00F9
Ù   na 235 00D9
ŵ   na na* 0175
Ŵ   na na 0174
ẅ   na na 0077+0308
Ẅ   na na 0057+0308
ẃ   na na 0077+0301
Ẃ   na na 0057+0301
ẁ   na na 0077+0300
Ẁ   na na 0057+0300
v̈   na na 0076+0308
V̈   na na 0056+0308
v̂   na na 0076+0302
V̂   na na 0056+0302
ŷ   na na 0177
Ŷ   na na 0176
ÿ   152 152 00FF
Ÿ   na na 0178
ý   na 236 00FD
Ý   na 237 00DD
ỳ   na na 0079+0300
Ỳ   na na 0059+0300

*na = not available as an ASCII code in the specified code page
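The rows above that are given as two code points (0077+0308 and so on) consist of a base letter followed by a combining mark. For w with diaeresis, acute or grave, and for y with grave (and their capitals), Unicode also has single precomposed characters in its Latin Extended Additional block. As a result, the same letter can reach a database in either form. Normalizing to NFC before storing or comparing makes the two forms identical. v with diaeresis or circumflex has no precomposed form and stays as two code points. Below is a minimal sketch using Python's standard `unicodedata` module. The helper `same_text` is hypothetical.

```python
import unicodedata

# NFC composes a base letter + combining mark into one code point where one exists.
for seq in ["w\u0308", "W\u0301", "y\u0300", "v\u0308"]:
    nfc = unicodedata.normalize("NFC", seq)
    print(seq, [f"{ord(c):04X}" for c in nfc])
# w + 0308 becomes ['1E85']; v + 0308 stays ['0076', '0308']

def same_text(a: str, b: str) -> bool:
    """Compare two strings regardless of composed or decomposed storage."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```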

The Welsh alphabet has 31 letters and is alphabetized as follows:

   A (Â, Á, À, Ä) B C CH D DD E (Ê, É, È, Ë) F FF G NG H I (Î, Í, Ì, Ï) J K L LL M N O (Ô, Ó, Ò, Ö) P PH (Q) R RH S T TH U (Û, Ú, Ù, Ü) (V) W (Ŵ, Ẃ, Ẁ, Ẅ) (X) Y (Ŷ, Ý, Ỳ, Ÿ) Z
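Because several Welsh letters are written with two characters, a plain character-by-character sort interleaves words beginning with ll among those beginning with l, whereas Welsh files them all after l. Below is a minimal sketch that assumes the letter order given above, with accented vowels sorted with their base letters as the parentheses indicate. The names `welsh_letters` and `welsh_key` are hypothetical. Greedy matching of two-character letters is a simplification: some words contain two adjacent characters (such as n followed by g) that belong to separate letters, and those need a dictionary lookup rather than a rule.

```python
import unicodedata

# Letter order from the alphabet line above (optional letters included).
WELSH = ["a", "b", "c", "ch", "d", "dd", "e", "f", "ff", "g", "ng", "h", "i",
         "j", "k", "l", "ll", "m", "n", "o", "p", "ph", "q", "r", "rh", "s",
         "t", "th", "u", "v", "w", "x", "y", "z"]
_WELSH_RANK = {letter: i for i, letter in enumerate(WELSH)}
_DIGRAPHS = {letter for letter in WELSH if len(letter) == 2}

def welsh_letters(word: str) -> list[str]:
    """Split a word into Welsh letters, taking two-character letters greedily."""
    # Accented vowels sort with their base letter, so drop combining marks first.
    base = "".join(c for c in unicodedata.normalize("NFD", word.lower())
                   if not unicodedata.combining(c))
    letters, i = [], 0
    while i < len(base):
        if base[i:i + 2] in _DIGRAPHS:
            letters.append(base[i:i + 2])
            i += 2
        else:
            letters.append(base[i])
            i += 1
    return letters

def welsh_key(word: str) -> list[int]:
    # Characters outside the alphabet sort after z.
    return [_WELSH_RANK.get(letter, len(WELSH)) for letter in welsh_letters(word)]

print(welsh_letters("llyfrgell"))                     # ['ll', 'y', 'f', 'r', 'g', 'e', 'll']
print(sorted(["llan", "lol", "lyn"], key=welsh_key))  # ['lol', 'lyn', 'llan']
```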

References

  1. Ludvig Solnør, 2022-12-18; Ethnologue, https://www.ethnologue.com/browse/names/s, 2022-12-19; Wikipedia, https://en.wikipedia.org/wiki/Sámi_languages, 2022-12-19

Every effort is made to keep this resource updated. If you find any errors, or have any questions or requests, please don't hesitate to contact the author.

All information copyright Graham Rhind 2024. Any information used should be acknowledged and referenced.