Implementation of 8-bit coded Character Sets in Ada
Strohmeier Alfred, Genillard Christian, Weber Mats
Published in Ada Letters (1990), Vol. 10, No. 6, pp. 47-60
Swiss Federal Institute of Technology

Keywords: character set, ISO, ASCII, Ada.

Summary: A general frame is proposed for implementing 8-bit coded character sets in Ada in conformance with current international standards. This frame is then applied to what is termed Latin alphabet No. 1, which covers Western European languages. Also shown is how the basic needs for text input-output can be satisfied by generalizing the predefined package TEXT_IO.

1. Introduction

Pure English texts may be recorded and processed using the well-known ASCII graphic characters alone. Other European languages, although written in the Latin alphabet, need more characters: some are letters in their own right, such as the German sharp s or the Icelandic thorn; others are letters with a diacritical mark; still others are special characters such as the Spanish "¿". For all these languages, and indeed also for English when foreign words are part of a text, there is a strong need to define more extensive character sets. The same holds for languages using non-Latin alphabets, such as Cyrillic, Arabic, Greek or Hebrew.

In this paper we show how 8-bit coded character sets for these alphabets can be implemented in Ada in conformance with international standards. We first propose a general frame and then apply it to implement Latin alphabet No. 1.

We claim that the international acceptance of Ada and its overall usefulness would be improved if secondary standards on 8-bit coded character sets were agreed on. This paper is an attempt in this direction. Waiting for Ada-9X, which still has a long way to go, is not a solution in our opinion.

The definition of character sets using 16-bit, or even 32-bit, codes is beyond the scope of this paper.
Such character sets are essential for nonalphabetical Far Eastern languages and for accommodating several 8-bit alphabets within a single character set, but the difficulties are greater than they may seem at first (e.g. see 3.2).

In section 2, we summarize some international standards in the field of character sets, and show how they may be consolidated to define 8-bit coded character sets. Section 3 presents our design choices and the rationale behind them. The specification of a package implementing Latin alphabet No. 1 is postponed to the appendix, but section 4 compares this character set with those of "real" machines. Finally, in section 6 we propose a generic input-output package which is closely related to the predefined package TEXT_IO but which is applicable to a large range of 8-bit coded character sets.

2. Overview of Standards

2.1. ISO definitions

The terminology used in this paper is that of the International Organization for Standardization (ISO) in the field of coded character sets.

A bit combination is an ordered set of bits that represents a character.
A character is a member of a set of elements used for the organization, control or representation of data.
A coded character set or code is a set of unambiguous rules that establishes a character set and the one-to-one relationship between each character of the set and its coded representation.
A code table shows the character allocated to each bit combination in a code.
A control function is an action that affects the recording, processing, transmission or interpretation of data.
A control function whose coded representation consists of a single combination of bits is called a control character.
A graphic character is a character, other than a control function, that has a visual representation, normally handwritten, printed or displayed.

2.2. Structure of an 8-bit coded character set

ISO 4873 specifies the structure of 8-bit coded character sets, but does not define a single code table.
The bits of the bit combinations of an 8-bit code are identified by b8, b7, b6, b5, b4, b3, b2 and b1, where b8 is the most significant bit and b1 the least significant bit. The 256 positions of an 8-bit code table may be arranged in 16 columns and 16 rows, both numbered from 00 to 15 (fig. 1). The column number is obtained by interpreting the 4 most significant bits as a number (these bits are given the weights 8, 4, 2 and 1 respectively), whereas interpreting the 4 least significant bits yields the row number. A bit combination or a code table position may therefore be identified by a notation of the form column number / row number.

Within the 8-bit code, ISO 4873 identifies subsets designated as C0, C1, G0 and G1. Roughly speaking, C0 (columns 00 and 01) and C1 (columns 08 and 09) contain control characters, whereas G0 (columns 02 to 07) and G1 (columns 10 to 15) contain graphic characters. The following special cases should be noted: bit combinations 00/14 and 00/15 must be unused; the control character ESCAPE must be allocated to bit combination 01/11; bit combination 07/15 represents the control character DELETE; and the character SPACE, represented by bit combination 02/00, may be interpreted as a control character, a graphic character, or both.

2.3. Mapping the ASCII character set

ISO 646 defines a 7-bit coded character set with options for a number of positions in the code table. Once these options are exercised, the result is a National Version, ASCII being the US National Version [ANSI X3.4]. ISO 4873 specifies that columns 02 to 07 (subset G0) are identical with those of ISO 646, including its options. Moreover, if control characters described in ISO 646 are used, their allocation to the table positions of columns 00 and 01 (subset C0) is the same. However, bit combinations 00/14 and 00/15 must be unused in ISO 4873, whereas they correspond to the control functions SHIFT-OUT (SO) and SHIFT-IN (SI) in ISO 646, as they do in ASCII.
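The column/row notation of section 2.2 can be illustrated with a short Ada sketch. The names CODE, SUBSET, COLUMN, ROW, SUBSET_OF and NOTATION are hypothetical and do not appear in any of the standards; only the arithmetic (column from the four most significant bits, row from the four least significant bits) and the column ranges of C0, G0, C1 and G1 are taken from the description above.

```ada
with TEXT_IO;

procedure CODE_POSITIONS is

   -- Hypothetical names, introduced for illustration only.
   subtype CODE is INTEGER range 0 .. 255;
   type SUBSET is (C0, G0, C1, G1);

   -- Column number: the four most significant bits (b8..b5).
   function COLUMN (POS : CODE) return INTEGER is
   begin
      return POS / 16;
   end COLUMN;

   -- Row number: the four least significant bits (b4..b1).
   function ROW (POS : CODE) return INTEGER is
   begin
      return POS mod 16;
   end ROW;

   -- Rough classification by column only; special cases such as
   -- DELETE (07/15) and SPACE (02/00) are not singled out.
   function SUBSET_OF (POS : CODE) return SUBSET is
   begin
      case COLUMN (POS) is
         when 0 .. 1 => return C0;
         when 2 .. 7 => return G0;
         when 8 .. 9 => return C1;
         when others => return G1;
      end case;
   end SUBSET_OF;

   function TWO_DIGITS (N : INTEGER) return STRING is
      NUMERALS : constant STRING := "0123456789";
   begin
      return NUMERALS (N / 10 + 1) & NUMERALS (N mod 10 + 1);
   end TWO_DIGITS;

   -- Builds the "column/row" notation, e.g. "01/11" for ESCAPE.
   function NOTATION (POS : CODE) return STRING is
   begin
      return TWO_DIGITS (COLUMN (POS)) & "/" & TWO_DIGITS (ROW (POS));
   end NOTATION;

   procedure SHOW (POS : CODE) is
   begin
      TEXT_IO.PUT_LINE (NOTATION (POS) & " " & SUBSET'IMAGE (SUBSET_OF (POS)));
   end SHOW;

begin
   SHOW (27);   -- ESCAPE: prints "01/11 C0"
   SHOW (32);   -- SPACE:  prints "02/00 G0"
   SHOW (127);  -- DELETE: prints "07/15 G0" (column rule only)
   SHOW (160);  -- first G1 position: prints "10/00 G1"
end CODE_POSITIONS;
```

Note that a classification by column alone places DELETE in G0, although it is a control character; this is one of the special cases listed in section 2.2.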
In Ada, the predefined type CHARACTER together with the predefined package ASCII implements the 7-bit coded character set ASCII [LRM, C(13) and C(15)].

2.4. 8-bit coded graphic character sets

ISO 8859 consists of several parts. Each part specifies a set of up to 191 graphic characters and their coded representations by means of a single 8-bit code. Each set is intended for use by a group of languages: Western, Eastern, Northern and Southern Europe, Cyrillic, Greek, Arabic and Hebrew.

The ISO 8859 graphic character sets are in conformance with ISO 4873. In each part, subset G0 (95 graphic characters including the character SPACE) is exercised as the US National Version ASCII of ISO 646, supplemented by a specific G1 subset (up to 96 graphic characters).

2.5. Additional control characters

ISO 6429 specifies additional control functions which are elements of the C1 set, and the bit combinations used for their representation. Seven positions in the table are left open: 08/00, 08/01, 08/02, 08/03, 09/08, 09/09 and 09/10. They are reserved for future standardization and are not available for private use.

2.6. Consolidation of standard character sets

As we have seen, ASCII is a 7-bit coded character set, whereas ISO 8859 defines several 8-bit coded graphic character sets, but no single standard defines an 8-bit coded character set. However, several standards may be combined to build such a code, in fact one for each part of ISO 8859. As we will see, the result is in conformance with the standards involved, apart from one exception where their reconciliation is impossible.

The overall structure of the coded character set is given by ISO 4873.
The ASCII code is mapped to sets C0 and G0, and the same holds for the predefined type CHARACTER of Ada. In this way, sets C0 and G0 are in conformance with ISO 646, and set G0 is in conformance with all parts of ISO 8859. Set C1 is defined by using ISO 6429, keeping in mind that seven positions are left open by this standard. Set G1 is defined in conformance with ISO 8859, each part giving rise to another 8-bit character code, but all sharing the same C0, C1 and G0 sets.

There remains one inconsistency which cannot be reconciled. As we have already seen, code positions 00/14 and 00/15 are reserved in ISO 4873, while they are defined as SHIFT-OUT and SHIFT-IN in both ASCII and ISO 646.

3. Design Decisions

3.1. Implement a complete 8-bit coded character set, and not only a subset

Rationale: It would be possible to implement only the graphic character set as defined by some part of ISO 8859, i.e. sets G0 and G1 (without 07/15), or even only set G1, as set G0 is already provided by the predefined type CHARACTER. With such a design choice, most applications would have to switch between several coded character sets, i.e. between several types. This is cumbersome at best, and often impossible in a strongly typed language: for example, all the elements of an array (string) must belong to the same type.

3.2. The character set is implemented as an enumeration type

Rationale: It is desirable to be able to write case statements whose selectors are characters, to iterate over some part of the character set and to use characters as the index type of arrays. The only choice is then a discrete type, i.e. an enumeration type or an integer type. An enumeration type is a better model for a character set, as arithmetic operations are meaningless.

Note, however, that this approach cannot be extended to 16-bit or 32-bit coded character sets for the following reasons. First of all, 2^16 (65,536) enumeration literals may be beyond the limit of some compilers.
Even worse, the implementation of attribute 'IMAGE requires enumeration literals to be included in the generated code, which thus becomes bulky.

3.3. The character set is implemented as a character type [LRM 3.5.2 (1)]

Rationale: Character and string literals are a convenient way to designate values.

3.4. For identical position numbers, use enumeration literals which are homographs of those of the predefined type CHARACTER or of constants associated with control characters by the package ASCII

Rationale: Thus, the ASCII character set is viewed as a subset of the 8-bit code, and type conversions to and from the predefined type CHARACTER are straightforward. As a consequence, literals SO and SI are associated with bit combinations 00/14 and 00/15, although ISO 4873 specifies that these must be unused.

3.5. Establish some kind of visibility between bit combinations and enumeration literals

Rationale: We claim that it is useful to think about a character interchangeably in terms of the associated enumeration literal and of its bit combination interpreted as a number. This can be achieved by providing enumeration literals for all of the 256 possible bit combinations or, otherwise stated, by using a contiguous representation beginning at 0. Position numbers and internal codes are then identical, and the 'POS and 'VAL attributes provide the conversions between bit combinations and enumeration literals. Moreover, the use of an enumeration representation clause, which may lead to less efficient programs [LRM 13.3 (6)], is avoided.

Perhaps we should recall here that internal codes specified by an enumeration representation clause are never visible (without resorting to unchecked conversion), that attributes 'POS and 'VAL are always related to position numbers, and that similar attributes for the internal codes of enumeration literals are missing in Ada.
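The difference between position numbers and internal codes can be shown on a small example. The sketch below uses a hypothetical three-literal type SIGNAL with a non-contiguous representation clause (neither the type nor the names BYTE and INTERNAL_CODE come from the package in the appendix). It assumes a compiler that accepts the size clauses as written.

```ada
with TEXT_IO;
with UNCHECKED_CONVERSION;

procedure POSITION_VERSUS_CODE is

   -- Hypothetical type with a non-contiguous representation.
   type SIGNAL is (RED, AMBER, GREEN);
   for SIGNAL use (RED => 1, AMBER => 4, GREEN => 16);
   for SIGNAL'SIZE use 8;

   -- Hypothetical 8-bit integer type used to look at the internal code.
   type BYTE is range 0 .. 255;
   for BYTE'SIZE use 8;

   -- The internal code is only reachable through unchecked conversion.
   function INTERNAL_CODE is new UNCHECKED_CONVERSION (SIGNAL, BYTE);

begin
   -- 'POS yields the position number (2), not the internal code.
   TEXT_IO.PUT_LINE ("position number:" & INTEGER'IMAGE (SIGNAL'POS (GREEN)));

   -- The unchecked conversion exposes the internal code (16).
   TEXT_IO.PUT_LINE ("internal code:  " & BYTE'IMAGE (INTERNAL_CODE (GREEN)));

   -- 'VAL is likewise defined on position numbers.
   TEXT_IO.PUT_LINE ("SIGNAL'VAL (2) = " & SIGNAL'IMAGE (SIGNAL'VAL (2)));
end POSITION_VERSUS_CODE;
```

With a contiguous representation starting at 0, as chosen for the character type, both numbers coincide and no unchecked conversion is needed.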
As a consequence of our design choice, enumeration literals must be provided even for positions in the code table left open, reserved or unused by ISO standards (i.e. 00/14 and 00/15, already mentioned, and also 08/00, 08/01, 08/02, 08/03, 09/08, 09/09 and 09/10).

3.6. Naming of characters

The various ISO standards assign acronyms to control characters and at least one name to each graphic character. These names were used for enumeration literals which are identifiers, for constant objects and for object renaming declarations. However, some names have been shortened: the words "LETTER", "WITH", "MARK" and "ACCENT" are always suppressed, and the acronyms "UC" (upper case) and "LC" (lower case) are used instead of "CAPITAL LETTER" and "SMALL LETTER" respectively.

ISO uses the following name aliases: "HYPHEN" for "MINUS SIGN", "PARAGRAPH SIGN" for "SECTION SIGN", and "IS4", "IS3", "IS2", "IS1" for "FS", "GS", "RS" and "US", respectively. These aliases and the names of the special characters are declared in the local package ISO_ALIASES.

Unfortunately, the names of special characters used by the predefined package ASCII [LRM C(15)] are different from those of ISO. Their definitions are provided by the local package LRM_ALIASES, which also declares identifiers for lower case ASCII letters (LC_A, etc.).

3.7. Constant objects versus constant functions

From an abstract viewpoint, constant objects and constant functions are semantically equivalent. However, in Ada, a constant may be a static expression, whereas a function call whose name is an identifier never has this property. On the other hand, a function may be overloaded, but not a constant.

As Ada did in the predefined package ASCII [LRM C(15)], we used constants for declaring character name aliases. As a consequence, these name aliases may be used as choices in case statements.

3.8. Inclusion of a string type

A string type definition is included in the package.
In this way, the character type and the string type are coupled. This could be avoided, but we decided to imitate closely Ada's style for the predefined types CHARACTER and STRING.

3.9. Inclusion of conversion functions

Conversion functions between the predefined types CHARACTER and STRING and the newly defined types are included. This simply models the fact that the values of the predefined type CHARACTER are a subset of the 8-bit coded character set.

4. Implementation

In conformance with the above discussion, the appendix shows a package specification implementing the ISO Latin alphabet No. 1 [ISO 8859, part 1]. The other Latin alphabets [ISO 8859, parts 2, 3, 4 and 9], and also mixed alphabets such as Latin/Cyrillic, Latin/Arabic, Latin/Greek and Latin/Hebrew [ISO 8859, parts 5, 6, 7 and 8], could be implemented in a similar manner.

As far as we know, no "real-world" machine implements precisely an ISO Latin alphabet, and our package specification must therefore be adapted when used in "real" applications where physical devices such as keyboards, visual displays and printers are involved.

For their VAX/VMS systems, Digital Equipment Corporation uses what they call the DEC Multinational Character Set, which is in fact a close relative of the ISO Latin alphabet No. 1 [DEC, Appendix A]. The differences are the following: 15 graphic characters are reserved and 5 have other definitions, all being located in set G1. The implementation of this character set is thus straightforward, and nearly compatible with that of the ISO Latin alphabet No. 1.

Apple chose another approach for their Macintosh machines. Set C1 is not used for additional control characters; instead, non-ASCII graphic characters occupy the table positions from 08/00 to 13/08, those from 13/09 to 15/15 being left open [MAC, vol I, p. 247].

As far as we know, the Unix world always uses the ASCII character set, and standardization of one or several 8-bit coded character sets is still lacking.
By convention, the package implementing character set x is named x_Character_Set and exports types x_Character and x_String.

5. Portability Issues

The approach shown in section 4 makes it possible to write applications using an 8-bit character set, or even several 8-bit character sets at the same time (e.g. in a conversion program). We believe, however, that most applications will just use the 8-bit character set of the underlying system along with some utilities such as conversion to upper case, without needing to know the details of the character set such as the position numbers of individual characters.

Within the frame defined in section 4, porting such applications from one system to another can be tedious because the names of the package and types implementing the character set must be changed everywhere. For instance, when porting from Macintosh to VAX/VMS, every occurrence of Macintosh_String must be changed to DEC_String.

For this reason, on each different system, we provide a package Local_Character_Set which exports types Local_Character and Local_String and implements the system's most used character set. This package, although not portable itself, eases the porting of applications that are not too sensitive to details of the character set they use.

6. Text Input-Output

For an application programmer, a character set is not enough: some means for basic text input-output is needed. In our approach the predefined package TEXT_IO will serve as a model [LRM 14.3]. When analyzing this package, the following may be observed:
We therefore designed a generic text input-output package whose specification has the following structure:

with IO_EXCEPTIONS;
generic
   type CHARACTER_TYPE is (<>);
   type STRING_TYPE is array (POSITIVE range <>) of CHARACTER_TYPE;
package TEXT_IO_G is
   ...
   procedure OPEN (FILE : in out FILE_TYPE;
                   MODE : in FILE_MODE;
                   NAME : in STRING;
                   FORM : in STRING := "");
   ...
   procedure PUT (FILE : in FILE_TYPE; ITEM : in CHARACTER_TYPE);
   procedure PUT (ITEM : in CHARACTER_TYPE);
   ...
   procedure PUT (FILE : in FILE_TYPE; ITEM : in STRING_TYPE);
   procedure PUT (ITEM : in STRING_TYPE);
   ...
end TEXT_IO_G;

Its declarative part is almost the same as that of the predefined package TEXT_IO. The NAME and FORM parameters of procedures CREATE and OPEN, and the results of functions NAME and FORM, keep the predefined type STRING, whereas the ITEM parameters of type CHARACTER and STRING are now of type CHARACTER_TYPE and STRING_TYPE respectively.

As for TEXT_IO, the effect of input or output of control characters is not defined. Moreover, the generic actual type of CHARACTER_TYPE must include in its positions 32 to 126 the graphic characters of the ASCII code. This last condition is sufficient for implementing the local generic packages INTEGER_IO, FLOAT_IO and ENUMERATION_IO. It may be dropped if these packages are not provided within TEXT_IO_G. It may also be weakened as follows: all actual character sets contain, at predefined fixed positions, all the characters used in Ada for character literals, numeric literals and identifiers, i.e. the upper and lower case letters without diacritical marks, the digits and some special characters like "#", "+", "-", "_", "." and SPACE.

We implemented the package TEXT_IO_G, including the local generic packages INTEGER_IO, FLOAT_IO and ENUMERATION_IO, for DEC's Ada compiler on VAX/VMS machines.
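To make the generic parameter profile of TEXT_IO_G more concrete, here is a minimal sketch with the same two generic formal types. It is not the VAX/VMS implementation mentioned above: the package name MINI_TEXT_IO_G, the function IMAGE and the types LOCAL_CHARACTER and LOCAL_STRING are hypothetical, file handling is omitted, and output is simply delegated to the predefined TEXT_IO. The sketch assumes that the position numbers of the actual character type are code values accepted by the predefined type CHARACTER of the compiler in use.

```ada
with TEXT_IO;

procedure GENERIC_PUT_SKETCH is

   -- Hypothetical stand-in for an 8-bit character type: a distinct
   -- type derived from CHARACTER, with its own string type.
   type LOCAL_CHARACTER is new CHARACTER;
   type LOCAL_STRING is array (POSITIVE range <>) of LOCAL_CHARACTER;

   generic
      type CHARACTER_TYPE is (<>);
      type STRING_TYPE is array (POSITIVE range <>) of CHARACTER_TYPE;
   package MINI_TEXT_IO_G is
      function IMAGE (ITEM : in STRING_TYPE) return STRING;
      procedure PUT (ITEM : in CHARACTER_TYPE);
      procedure PUT (ITEM : in STRING_TYPE);
   end MINI_TEXT_IO_G;

   package body MINI_TEXT_IO_G is

      -- Converts position by position; assumes every position number
      -- of CHARACTER_TYPE is also a position of CHARACTER.
      function IMAGE (ITEM : in STRING_TYPE) return STRING is
         RESULT : STRING (ITEM'RANGE);
      begin
         for I in ITEM'RANGE loop
            RESULT (I) := CHARACTER'VAL (CHARACTER_TYPE'POS (ITEM (I)));
         end loop;
         return RESULT;
      end IMAGE;

      procedure PUT (ITEM : in CHARACTER_TYPE) is
      begin
         TEXT_IO.PUT (CHARACTER'VAL (CHARACTER_TYPE'POS (ITEM)));
      end PUT;

      procedure PUT (ITEM : in STRING_TYPE) is
      begin
         TEXT_IO.PUT (IMAGE (ITEM));
      end PUT;

   end MINI_TEXT_IO_G;

   -- One instance for the predefined types, one for the stand-in types.
   package ASCII_IO is new MINI_TEXT_IO_G (CHARACTER, STRING);
   package LOCAL_IO is new MINI_TEXT_IO_G (LOCAL_CHARACTER, LOCAL_STRING);

begin
   ASCII_IO.PUT ("predefined types, ");
   LOCAL_IO.PUT ("derived types");  -- the literal is of type LOCAL_STRING
   TEXT_IO.NEW_LINE;
end GENERIC_PUT_SKETCH;
```

In both instances the string literals are resolved to the actual string type of the instance, which is the benefit of declaring the character set as a character type.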
The following restrictions apply to the use of our implementation: the values of the generic actual type of CHARACTER_TYPE must be 8-bit quantities, and the generic actual type of STRING_TYPE must be packed.

This package has then been instantiated with the ASCII character set and with DEC's Multinational Character Set. In the first case, the result is a package semantically equivalent to the predefined package TEXT_IO. In both cases the loss in execution speed is less than 10%.

References

[ANSI X3.4] American National Standard Code for Information Interchange (ASCII); American National Standards Institute, 1977.
[DEC] Guide to Using VMS; Digital Equipment Corporation, April 1988 (Order Number: AA-LA05A-TE).
[ISO 646] ISO 7-bit coded character set for information interchange; International Organization for Standardization, 1983 (second edition).
[ISO 4873] ISO 8-bit code for information interchange - Structure and rules for implementation; International Organization for Standardization, 1986 (second edition).
[ISO 6429] ISO 7-bit and 8-bit coded character sets - Additional control functions for character-imaging devices; International Organization for Standardization, 1983.
[ISO 8859] 8-bit single-byte coded graphic character sets; International Organization for Standardization, 1987.
[LRM] Reference Manual for the Ada Programming Language, ANSI/MIL-STD 1815A, 1983.
[MAC] Inside Macintosh, Apple Computer; Addison-Wesley, 1985.
Appendix

--+ TITLE: Latin alphabet No. 1 (ISO 8859-1) 8-bit character set.
--+ AUTHORS: Christian Genillard and Alfred Strohmeier.
--+          Swiss Federal Institute of Technology (EPFL),
--+          1015 Lausanne, Switzerland.
--+ DATE: February 1990.
package LATIN1_CHARACTER_SET is
----------------------------
--+ OVERVIEW:
--+ This package implements a character type (and a string type), coded on
--+ eight bits, which is based on the Latin alphabet No. 1 defined by
--+ ISO 8859-1.
--+ The codes 0..31 and 127 are control characters as defined in the LRM and
--+ ISO 646 p. 8 and p. 4.
--+ The codes 32..126 are graphic characters as defined in the LRM and
--+ ISO 646 and ISO 8859-1 (Latin alphabet No. 1).
--+ The codes 0..127 are called ASCII characters. They may be converted to
--+ elements of the character set predefined in Ada by the type CHARACTER.
--+ The codes 128..159 are control characters as defined in ISO 6429 p.4.
--+ The codes 160..255 are graphic characters as defined in ISO 8859-1.
type LATIN1_CHARACTER is
(NUL, SOH, STX, ETX, EOT, ENQ, ACK, BEL, -- 0..7
BS, HT, LF, VT, FF, CR, SO, SI, -- 8..15
DLE, DC1, DC2, DC3, DC4, NAK, SYN, ETB, -- 16..23
CAN, EM, SUB, ESC, FS, GS, RS, US, -- 24..31
' ', '!', '"', '#', '$', '%', '&', ''', -- 32..39
'(', ')', '*', '+', ',', '-', '.', '/', -- 40..47
'0', '1', '2', '3', '4', '5', '6', '7', -- 48..55
'8', '9', ':', ';', '<', '=', '>', '?', -- 56..63
'@', 'A', 'B', 'C', 'D', 'E', 'F', 'G', -- 64..71
'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', -- 72..79
'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', -- 80..87
'X', 'Y', 'Z', '[', '\', ']', '^', '_', -- 88..95
'`', 'a', 'b', 'c', 'd', 'e', 'f', 'g', -- 96..103
'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', -- 104..111
'p', 'q', 'r', 's', 't', 'u', 'v', 'w', -- 112..119
'x', 'y', 'z', '{', '|', '}', '~', DEL, -- 120..127
RESERVED_128,
RESERVED_129,
RESERVED_130,
RESERVED_131,
IND, NEL, SSA, ESA, -- 128..135
HTS, HTJ, VTS, PLD, PLU, RI, SS2, SS3, -- 136..143
DCS, PU1, PU2, STS, CCH, MW, SPA, EPA, -- 144..151
RESERVED_152,
RESERVED_153,
RESERVED_154,
CSI, ST, OSC, PM, APC, -- 152..159
NO_BREAK_SPACE, -- Also NBSP
INVERTED_EXCLAMATION,
CENT_SIGN,
POUND_SIGN,
CURRENCY_SIGN,
YEN_SIGN,
BROKEN_BAR,
SECTION_SIGN, -- Also PARAGRAPH_SIGN -- 160..167
DIAERESIS,
COPYRIGHT_SIGN,
FEMININE_ORDINAL_INDICATOR,
LEFT_ANGLE_QUOTATION,
NOT_SIGN,
SOFT_HYPHEN,
REGISTERED_TRADE_MARK_SIGN,
MACRON, -- 168..175
DEGREE_SIGN, -- Also RING_ABOVE
PLUS_MINUS_SIGN,
SUPERSCRIPT_TWO,
SUPERSCRIPT_THREE,
ACUTE,
MICRO_SIGN,
PILCROW_SIGN,
MIDDLE_DOT, -- 176..183
CEDILLA,
SUPERSCRIPT_ONE,
MASCULINE_ORDINAL_INDICATOR,
RIGHT_ANGLE_QUOTATION,
FRACTION_ONE_QUARTER,
FRACTION_ONE_HALF,
FRACTION_THREE_QUARTERS,
INVERTED_QUESTION, -- 184..191
UC_A_GRAVE,
UC_A_ACUTE,
UC_A_CIRCUMFLEX,
UC_A_TILDE,
UC_A_DIAERESIS,
UC_A_RING,
UC_AE_DIPHTHONG,
UC_C_CEDILLA, -- 192..199
UC_E_GRAVE,
UC_E_ACUTE,
UC_E_CIRCUMFLEX,
UC_E_DIAERESIS,
UC_I_GRAVE,
UC_I_ACUTE,
UC_I_CIRCUMFLEX,
UC_I_DIAERESIS, -- 200..207
UC_ICELANDIC_ETH,
UC_N_TILDE,
UC_O_GRAVE,
UC_O_ACUTE,
UC_O_CIRCUMFLEX,
UC_O_TILDE,
UC_O_DIAERESIS,
MULTIPLICATION_SIGN, -- 208..215
UC_O_OBLIQUE_STROKE,
UC_U_GRAVE,
UC_U_ACUTE,
UC_U_CIRCUMFLEX,
UC_U_DIAERESIS,
UC_Y_ACUTE,
UC_ICELANDIC_THORN,
LC_GERMAN_SHARP_S, -- 216..223
LC_A_GRAVE,
LC_A_ACUTE,
LC_A_CIRCUMFLEX,
LC_A_TILDE,
LC_A_DIAERESIS,
LC_A_RING,
LC_AE_DIPHTHONG,
LC_C_CEDILLA, -- 224..231
LC_E_GRAVE,
LC_E_ACUTE,
LC_E_CIRCUMFLEX,
LC_E_DIAERESIS,
LC_I_GRAVE,
LC_I_ACUTE,
LC_I_CIRCUMFLEX,
LC_I_DIAERESIS, -- 232..239
LC_ICELANDIC_ETH,
LC_N_TILDE,
LC_O_GRAVE,
LC_O_ACUTE,
LC_O_CIRCUMFLEX,
LC_O_TILDE,
LC_O_DIAERESIS,
DIVISION_SIGN, -- 240..247
LC_O_OBLIQUE_STROKE,
LC_U_GRAVE,
LC_U_ACUTE,
LC_U_CIRCUMFLEX,
LC_U_DIAERESIS,
LC_Y_ACUTE,
LC_ICELANDIC_THORN,
LC_Y_DIAERESIS); -- 248..255
for LATIN1_CHARACTER'SIZE use 8;
package ISO_ALIASES is
IS4 : constant LATIN1_CHARACTER := FS; -- ISO 6429
IS3 : constant LATIN1_CHARACTER := GS; -- ISO 6429
IS2 : constant LATIN1_CHARACTER := RS; -- ISO 6429
IS1 : constant LATIN1_CHARACTER := US; -- ISO 6429
SPACE : constant LATIN1_CHARACTER := ' ';
EXCLAMATION : constant LATIN1_CHARACTER := '!';
QUOTATION : constant LATIN1_CHARACTER := '"';
NUMBER_SIGN : constant LATIN1_CHARACTER := '#';
DOLLAR_SIGN : constant LATIN1_CHARACTER := '$';
PERCENT_SIGN : constant LATIN1_CHARACTER := '%';
AMPERSAND : constant LATIN1_CHARACTER := '&';
APOSTROPHE : constant LATIN1_CHARACTER := ''';
LEFT_PARENTHESIS : constant LATIN1_CHARACTER := '(';
RIGHT_PARENTHESIS : constant LATIN1_CHARACTER := ')';
ASTERISK : constant LATIN1_CHARACTER := '*';
PLUS_SIGN : constant LATIN1_CHARACTER := '+';
COMMA : constant LATIN1_CHARACTER := ',';
HYPHEN : constant LATIN1_CHARACTER := '-';
MINUS_SIGN : LATIN1_CHARACTER renames HYPHEN;
FULL_STOP : constant LATIN1_CHARACTER := '.';
SOLIDUS : constant LATIN1_CHARACTER := '/';
COLON : constant LATIN1_CHARACTER := ':';
SEMICOLON : constant LATIN1_CHARACTER := ';';
LESS_THAN_SIGN : constant LATIN1_CHARACTER := '<';
EQUALS_SIGN : constant LATIN1_CHARACTER := '=';
GREATER_THAN_SIGN : constant LATIN1_CHARACTER := '>';
QUESTION : constant LATIN1_CHARACTER := '?';
COMMERCIAL_AT : constant LATIN1_CHARACTER := '@';
LEFT_SQUARE_BRACKET : constant LATIN1_CHARACTER := '[';
REVERSE_SOLIDUS : constant LATIN1_CHARACTER := '\';
RIGHT_SQUARE_BRACKET : constant LATIN1_CHARACTER := ']';
CIRCUMFLEX : constant LATIN1_CHARACTER := '^';
LOW_LINE : constant LATIN1_CHARACTER := '_';
GRAVE : constant LATIN1_CHARACTER := '`';
LEFT_CURLY_BRACKET : constant LATIN1_CHARACTER := '{';
VERTICAL_LINE : constant LATIN1_CHARACTER := '|';
RIGHT_CURLY_BRACKET : constant LATIN1_CHARACTER := '}';
TILDE : constant LATIN1_CHARACTER := '~';
NBSP : constant LATIN1_CHARACTER := NO_BREAK_SPACE;
PARAGRAPH_SIGN : constant LATIN1_CHARACTER := SECTION_SIGN;
RING_ABOVE : constant LATIN1_CHARACTER := DEGREE_SIGN;
end ISO_ALIASES;
package LRM_ALIASES is
EXCLAM : constant LATIN1_CHARACTER := '!';
QUOTATION : constant LATIN1_CHARACTER := '"';
SHARP : constant LATIN1_CHARACTER := '#';
DOLLAR : constant LATIN1_CHARACTER := '$';
PERCENT : constant LATIN1_CHARACTER := '%';
AMPERSAND : constant LATIN1_CHARACTER := '&';
COLON : constant LATIN1_CHARACTER := ':';
SEMICOLON : constant LATIN1_CHARACTER := ';';
QUERY : constant LATIN1_CHARACTER := '?';
AT_SIGN : constant LATIN1_CHARACTER := '@';
L_BRACKET : constant LATIN1_CHARACTER := '[';
BACK_SLASH : constant LATIN1_CHARACTER := '\';
R_BRACKET : constant LATIN1_CHARACTER := ']';
CIRCUMFLEX : constant LATIN1_CHARACTER := '^';
UNDERLINE : constant LATIN1_CHARACTER := '_';
GRAVE : constant LATIN1_CHARACTER := '`';
L_BRACE : constant LATIN1_CHARACTER := '{';
BAR : constant LATIN1_CHARACTER := '|';
R_BRACE : constant LATIN1_CHARACTER := '}';
TILDE : constant LATIN1_CHARACTER := '~';
LC_A : constant LATIN1_CHARACTER := 'a';
LC_B : constant LATIN1_CHARACTER := 'b';
LC_C : constant LATIN1_CHARACTER := 'c';
LC_D : constant LATIN1_CHARACTER := 'd';
LC_E : constant LATIN1_CHARACTER := 'e';
LC_F : constant LATIN1_CHARACTER := 'f';
LC_G : constant LATIN1_CHARACTER := 'g';
LC_H : constant LATIN1_CHARACTER := 'h';
LC_I : constant LATIN1_CHARACTER := 'i';
LC_J : constant LATIN1_CHARACTER := 'j';
LC_K : constant LATIN1_CHARACTER := 'k';
LC_L : constant LATIN1_CHARACTER := 'l';
LC_M : constant LATIN1_CHARACTER := 'm';
LC_N : constant LATIN1_CHARACTER := 'n';
LC_O : constant LATIN1_CHARACTER := 'o';
LC_P : constant LATIN1_CHARACTER := 'p';
LC_Q : constant LATIN1_CHARACTER := 'q';
LC_R : constant LATIN1_CHARACTER := 'r';
LC_S : constant LATIN1_CHARACTER := 's';
LC_T : constant LATIN1_CHARACTER := 't';
LC_U : constant LATIN1_CHARACTER := 'u';
LC_V : constant LATIN1_CHARACTER := 'v';
LC_W : constant LATIN1_CHARACTER := 'w';
LC_X : constant LATIN1_CHARACTER := 'x';
LC_Y : constant LATIN1_CHARACTER := 'y';
LC_Z : constant LATIN1_CHARACTER := 'z';
end LRM_ALIASES;
type LATIN1_STRING is array (POSITIVE range <>) of LATIN1_CHARACTER;
pragma PACK(LATIN1_STRING);
function TO_LATIN1_CHARACTER (ITEM : CHARACTER) return LATIN1_CHARACTER;
function TO_CHARACTER (ITEM : LATIN1_CHARACTER) return CHARACTER;
--+ OVERVIEW:
--+ Returns the element of the other character type which is at the same
--+ position in the enumeration.
--+ ERRORS:
--+ Raises NON_ASCII_ERROR if TO_CHARACTER is called with an ITEM which
--+ is not an ASCII character.
function TO_LATIN1_STRING (ITEM : STRING) return LATIN1_STRING;
function TO_STRING (ITEM : LATIN1_STRING) return STRING;
--+ OVERVIEW:
--+ Returns a string converted character by character to the other string
--+ type using the conversion functions defined for characters.
--+ ERRORS:
--+ Raises NON_ASCII_ERROR if TO_STRING is called with an ITEM containing
--+ at least one component which is not an ASCII character.
function IS_ASCII (ITEM : LATIN1_CHARACTER) return BOOLEAN;
function IS_ASCII (ITEM : LATIN1_STRING) return BOOLEAN;
--+ OVERVIEW:
--+ Checks if ITEM is convertible to an ASCII string or character.
function IS_GRAPHIC (ITEM : LATIN1_CHARACTER) return BOOLEAN;
function IS_GRAPHIC (ITEM : LATIN1_STRING) return BOOLEAN;
--+ OVERVIEW:
--+ Checks if ITEM is made of graphic characters.
NON_ASCII_ERROR : exception;
-- Raised by TO_CHARACTER and TO_STRING when their argument
-- contains non-ASCII characters.
pragma INLINE (TO_LATIN1_CHARACTER, TO_CHARACTER, IS_ASCII, IS_GRAPHIC);
end LATIN1_CHARACTER_SET;
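The package body is not part of this paper. As an illustration only, the conversion and classification functions could be written along the following lines, relying on the fact that position numbers and bit combinations coincide (section 3.5). To keep the sketch short and self-contained, LATIN1_CHARACTER is replaced here by a hypothetical stand-in derived from CHARACTER, which assumes a compiler whose predefined CHARACTER has 256 values; the constant E_ACUTE is also introduced for the example and stands for position 233 (LC_E_ACUTE in the specification above).

```ada
with TEXT_IO;

procedure CONVERSION_SKETCH is

   -- Stand-in only: the real type is the 256-literal enumeration
   -- of the appendix. Assumes CHARACTER'POS (CHARACTER'LAST) = 255.
   type LATIN1_CHARACTER is new CHARACTER;
   type LATIN1_STRING is array (POSITIVE range <>) of LATIN1_CHARACTER;

   NON_ASCII_ERROR : exception;

   -- Position 233 is LC_E_ACUTE in the appendix.
   E_ACUTE : constant LATIN1_CHARACTER := LATIN1_CHARACTER'VAL (233);

   function IS_ASCII (ITEM : LATIN1_CHARACTER) return BOOLEAN is
   begin
      return LATIN1_CHARACTER'POS (ITEM) <= 127;
   end IS_ASCII;

   -- Graphic characters occupy codes 32..126 and 160..255.
   function IS_GRAPHIC (ITEM : LATIN1_CHARACTER) return BOOLEAN is
      CODE : constant INTEGER := LATIN1_CHARACTER'POS (ITEM);
   begin
      return CODE in 32 .. 126 or CODE in 160 .. 255;
   end IS_GRAPHIC;

   function TO_LATIN1_CHARACTER (ITEM : CHARACTER) return LATIN1_CHARACTER is
   begin
      return LATIN1_CHARACTER'VAL (CHARACTER'POS (ITEM));
   end TO_LATIN1_CHARACTER;

   function TO_CHARACTER (ITEM : LATIN1_CHARACTER) return CHARACTER is
   begin
      if not IS_ASCII (ITEM) then
         raise NON_ASCII_ERROR;
      end if;
      return CHARACTER'VAL (LATIN1_CHARACTER'POS (ITEM));
   end TO_CHARACTER;

   -- Converts component by component; propagates NON_ASCII_ERROR.
   function TO_STRING (ITEM : LATIN1_STRING) return STRING is
      RESULT : STRING (ITEM'RANGE);
   begin
      for I in ITEM'RANGE loop
         RESULT (I) := TO_CHARACTER (ITEM (I));
      end loop;
      return RESULT;
   end TO_STRING;

begin
   TEXT_IO.PUT_LINE (TO_STRING ("cafe"));  -- pure ASCII, converts
   begin
      TEXT_IO.PUT_LINE (TO_STRING ("caf" & E_ACUTE));
   exception
      when NON_ASCII_ERROR =>
         TEXT_IO.PUT_LINE ("NON_ASCII_ERROR raised");
   end;
end CONVERSION_SKETCH;
```

With the real enumeration type of the appendix, the same bodies apply unchanged, since they use only the 'POS and 'VAL attributes.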