LISTSERV Maestro 8.1-4 Help


Character Encoding Settings

In addition to the most commonly used character encodings (or, in short, "encodings") US-ASCII and ISO-8859-1 (West European, Latin 1), LISTSERV Maestro also supports the use of various other encodings. The information presented here outlines the different encodings available. If you are not familiar with encodings and their usage, please read below for an introduction to the topic.

LISTSERV Maestro allows you to choose among the encodings listed in the table below for encoding email messages. All of the encodings have their advantages and disadvantages; therefore, make sure that you carefully consider which encoding to use. See below for possible pitfalls.

Standard:
US-ASCII American / English, contains the common letters, digits and characters.
US-ASCII is contained in all the following encodings, meaning that in all those encodings, the values 0-127 map to the same characters, the ones defined by US-ASCII.
ISO-8859-1 West European, Latin 1 - adds characters for the more common West European languages to US-ASCII
ISO-8859-2 East European, Latin 2 - adds characters for the Central and East European languages to US-ASCII
ISO-8859-3 South European, Latin 3 - adds characters for the South European languages to US-ASCII
ISO-8859-4 North European, Latin 4 - adds characters for the North European languages to US-ASCII
 
Advanced:
ISO-8859-5 Cyrillic - adds the Cyrillic characters to US-ASCII
ISO-8859-6 Arabic - adds the basic Arabic alphabet to US-ASCII
ISO-8859-7 Greek - adds the Greek characters to US-ASCII
ISO-8859-8 Hebrew - adds the Hebrew characters to US-ASCII
ISO-8859-9 Turkish - very similar to ISO-8859-1, but replaces some rarely used characters with Turkish ones
ISO-8859-15 Similar to ISO-8859-1, but replaces the international currency symbol '¤' with the Euro symbol '€', and a few other rarely used symbols with letters such as 'Œ' and 'Š'
GB-2312 Simplified Chinese - mostly used in mainland China and Singapore
Big5 Traditional Chinese - mostly used in Taiwan and Hong Kong
ISO-2022-JP Japanese
EUC-JP Japanese
Shift-JIS Japanese
EUC-KR Korean
KS-5601 Korean
UTF-8 International Unicode encoding, in UTF-8 format.
Unicode is the most complete encoding. Where US-ASCII contains only 128 different characters and the ISO-8859 encodings contain at most 256, Unicode contains well over 100,000, making room for most of the characters of the world, including Asian characters and symbol characters, in a single encoding.
 
LISTSERV Maestro determines optimal encoding automatically (but not Unicode)
LISTSERV Maestro will automatically choose either US-ASCII, one of the ISO-8859 encodings, one of the Chinese encodings, EUC-JP, or Shift-JIS, depending on which characters are actually used in the mail. It will not choose UTF-8.
Note: Due to its definition, the encoding ISO-2022-JP is never chosen automatically. If your mail is viewed best with this encoding, you have to select it manually.
LISTSERV Maestro determines optimal encoding automatically (allow Unicode)
LISTSERV Maestro will automatically choose either US-ASCII or any of the other encodings (ignoring ISO-2022-JP, see above), or even UTF-8, depending on which characters are actually used in the mail.
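
The two automatic options can be pictured with a small Python sketch. This is only an illustration of the idea "pick the first encoding that can represent every character in the mail"; it is not LISTSERV Maestro's actual algorithm. The function name pick_encoding, the candidate order, and the use of Python's codec names are all assumptions made for this example.

```python
# Candidate encodings, tried in order. The order and the Python codec
# names are assumptions for this sketch, not taken from LISTSERV Maestro.
LEGACY = (
    [("US-ASCII", "ascii")]
    + [(f"ISO-8859-{n}", f"iso8859_{n}") for n in (1, 2, 3, 4, 5, 6, 7, 8, 9, 15)]
    + [("GB-2312", "gb2312"), ("Big5", "big5"),
       ("EUC-JP", "euc_jp"), ("Shift-JIS", "shift_jis")]
)


def pick_encoding(text, allow_unicode=False):
    """Return the label of the first candidate that can encode all of text.

    ISO-2022-JP is deliberately left out, mirroring the note that it is
    never chosen automatically. Returns None if no candidate fits.
    """
    candidates = list(LEGACY)
    if allow_unicode:
        candidates += [("EUC-KR", "euc_kr"), ("UTF-8", "utf_8")]
    for label, codec in candidates:
        try:
            text.encode(codec)
        except UnicodeEncodeError:
            continue  # at least one character is missing from this encoding
        return label
    return None


print(pick_encoding("Hello"))
print(pick_encoding("Grüße"))
print(pick_encoding("שלום مرحبا", allow_unicode=True))
```

In this sketch, text that mixes scripts with no common legacy encoding (such as Hebrew and Arabic together) only finds a match when Unicode is allowed.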

Pitfalls to Consider When Choosing an Encoding


Mail Merge and Encodings

If LISTSERV Maestro uses a certain encoding because the user has selected a specific encoding, or has told LISTSERV Maestro to determine the encoding automatically, then the entire email message, in all its copies to all its recipients, will be sent using this same encoding.

This can create problems when using mail merge in conjunction with certain recipient types:


A Short Introduction to Encodings

What is an encoding? Why are they used?

Computers store all information as numbers, not as letters and text. Reading only numbers is extremely difficult for human beings; therefore, encodings were introduced. An encoding (also called character set, character encoding, code page, or character page) is simply a table that matches numbers to letters, or more precisely, characters. This matching of numbers to characters is called mapping.

For example, in the US-ASCII encoding the number 65 represents the letter 'A', 66 represents the letter 'B', and so on, while 97 represents 'a', 98 represents 'b', and so on. Not only letters are represented, but also digits (49 stands for '1'), punctuation marks (46 for '.'), and other characters. The @-character is assigned the value 64.
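
The mappings listed above can be checked with a few lines of Python. This is a minimal sketch that relies only on Python's built-in "ascii" codec; the variable name codes is chosen for this example.

```python
# Encode each character with US-ASCII and read back its numeric value.
codes = {ch: ch.encode("ascii")[0] for ch in "ABab1.@"}

for ch, number in codes.items():
    print(f"{ch!r} is stored as {number}")
```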

When you give a computer a sequence of numbers like 77, 97, 105, and 108 and tell it that these numbers map to characters from the US-ASCII encoding, then the computer will determine that it is supposed to display these four numbers as the character string "Mail" on your screen.
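
The reverse direction, from numbers back to text, looks like this in Python. The sketch assumes nothing beyond the built-in "ascii" codec; the variable names are illustrative.

```python
# The four numbers from the example, interpreted as US-ASCII.
numbers = [77, 97, 105, 108]
text = bytes(numbers).decode("ascii")

print(text)
```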

Apart from US-ASCII, which only maps the numbers 0-127 to characters, there are many other encodings. The most widely used ones in the western hemisphere are the encodings from the ISO-8859 family, each of which defines a mapping of the numbers 0-255.

ISO-8859-1, the so-called "Latin 1" encoding for West European languages, contains all sorts of "special" characters that are required by various European languages, such as 'ö' and 'ä' used in German and various Scandinavian languages, or 'é' and 'ç' used in French. In comparison, ISO-8859-7 is for the Greek language, and contains all the Greek letters, such as 'α' and 'β'.

What all ISO-8859 encodings have in common is that they contain the US-ASCII mapping, meaning that the numbers 0-127 are mapped to exactly the same characters as in US-ASCII. The remaining numbers, 128-255, are used to include all the special characters for the language or group of languages for which they are designed.
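
The shared US-ASCII range can be verified with a short Python sketch. It uses Python's own codec names (iso8859_1, iso8859_2, and so on), which are an assumption of this example rather than names used by LISTSERV Maestro.

```python
# Decode the numbers 0-127 with US-ASCII and with each ISO-8859 encoding
# from the table, and check that the resulting characters are identical.
low_half = bytes(range(128))
ascii_text = low_half.decode("ascii")

shared = {
    f"ISO-8859-{n}": low_half.decode(f"iso8859_{n}") == ascii_text
    for n in (1, 2, 3, 4, 5, 6, 7, 8, 9, 15)
}

for name, same in shared.items():
    print(name, "matches US-ASCII for 0-127:", same)
```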

Different ISO-8859 encodings map the numbers 128-255 to different characters. For example, ISO-8859-1 maps the number 225 to the character 'á' (used, for example, in Spanish), while the same number in the ISO-8859-7 encoding means the Greek character 'α'. As a result, simply giving a computer the number 225 and telling it that it is supposed to be a character is not enough. The computer also needs to know which character set to choose the character from. When computers transfer data between themselves, including mail transfer, one computer sends mail to another computer by sending a sequence of numbers. The receiving computer needs to know which encoding to use to map these numbers back to characters, so that the correct characters are displayed to the user.
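
The ambiguity of the number 225 can be reproduced in Python. This is a minimal sketch using Python's built-in codecs; the variable names are illustrative.

```python
# The same number, interpreted with two different ISO-8859 encodings.
value = bytes([225])

latin1 = value.decode("iso8859_1")  # West European, Latin 1
greek = value.decode("iso8859_7")   # Greek

print("ISO-8859-1:", latin1)
print("ISO-8859-7:", greek)
```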

Therefore, when mail is transferred, it needs to carry information about which encoding to use to interpret the numbers as characters. The sending computer can determine this encoding (as LISTSERV Maestro does if you choose one of the "LISTSERV Maestro determines optimal encoding automatically" options), but unforeseen problems may cause it to select an encoding that is not really the best choice (see above for the drawbacks associated with the automatic choices).
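
In Internet mail, this encoding information is normally carried in the "charset" parameter of the Content-Type header. The following Python sketch builds such a message with the standard library's email package. It illustrates the general mechanism only and is not how LISTSERV Maestro builds its messages; the subject and body text are made up for the example.

```python
from email.message import EmailMessage

# Build a message whose body is encoded with ISO-8859-1 and which
# declares that encoding in its Content-Type header.
msg = EmailMessage()
msg["Subject"] = "Greetings"
msg.set_content("Grüße aus Köln", charset="iso-8859-1")

declared = msg.get_content_charset()

print(msg["Content-Type"])
print("Declared encoding:", declared)
```

A receiving mail program reads the declared charset and uses it to map the transferred numbers back to characters.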

To provide the most flexibility, LISTSERV Maestro offers you the option of defining which encoding to use for encoding your email. Alternatively, you can choose to let LISTSERV Maestro select the encoding, but be aware of the potential problems (see above).

© 2002-2017 L-Soft Sweden AB. All rights reserved.