This is a cut-down version of the Wikipedia page. See User_blog:Enojado271/Imp_2_Map_Gen_Keys:_In_Summary#comm-2793 for its main relevance to Imperialism and Imperialism 2.
ASCII, abbreviated from American Standard Code for Information Interchange, is a character-encoding scheme. Originally based on the English alphabet, it encodes 128 specified characters into 7-bit binary integers. The characters encoded are the digits 0 to 9, lowercase letters a to z, uppercase letters A to Z, basic punctuation symbols, control codes that originated with Teletype machines, and the space. For example, lowercase e becomes binary 1100101.
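The mapping from a character to its 7-bit code can be sketched in Python. The helper name `to_ascii_bits` is hypothetical and exists only for this illustration; it relies on the built-in `ord` and `format` functions.

```python
def to_ascii_bits(ch: str) -> str:
    """Return the 7-bit binary string for a single ASCII character.

    Hypothetical helper for illustration; not part of any standard.
    """
    code = ord(ch)
    if code > 0x7F:
        raise ValueError(f"{ch!r} is not an ASCII character")
    # "07b" pads the binary representation to seven digits.
    return format(code, "07b")


print(to_ascii_bits("e"))  # 1100101
```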
ASCII codes represent text in computers, communications equipment, and other devices that use text. Most modern character-encoding schemes are based on ASCII, though they support many additional characters.
ASCII includes definitions for 128 characters: 33 are non-printing control characters (many now obsolete) that affect how text and space are processed and 95 printable characters, including the space (which is considered an invisible graphic).
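The 33/95 split can be checked with a short sketch. It assumes the usual convention that the control characters are codes 0 to 31 plus 127 (DEL), and that the printable characters are codes 32 to 126.

```python
# Control characters: codes 0x00-0x1F plus DEL (0x7F).
control = [c for c in range(128) if c < 0x20 or c == 0x7F]

# Printable characters: space (0x20) through tilde (0x7E).
printable = [c for c in range(128) if 0x20 <= c <= 0x7E]

print(len(control), len(printable))  # 33 95
```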
The IANA prefers the name US-ASCII. ASCII was the most common character encoding on the World Wide Web until December 2007, when it was surpassed by UTF-8, which includes ASCII as a subset.
The code itself was patterned so that most control codes were together, and all graphic codes were together, for ease of identification. The first two columns (32 positions) were reserved for control characters. The "space" character had to come before graphics to make sorting easier, so it became position 20hex; for the same reason, many special signs commonly used as separators were placed before digits. The committee decided it was important to support upper case 64-character alphabets, and chose to pattern ASCII so it could be reduced easily to a usable 64-character set of graphic codes, as was done in the DEC SIXBIT code. Lower case letters were therefore not interleaved with upper case. To keep options available for lower case letters and other graphics, the special and numeric codes were arranged before the letters, and the letter "A" was placed in position 41hex to match the draft of the corresponding British standard. The digits 0–9 were arranged so they correspond to values in binary prefixed with 011, making conversion with binary-coded decimal straightforward.
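Some consequences of this layout can be seen in a few lines of Python. This is only a sketch; the function names are made up for the example, and the observation that upper and lower case letters differ by the single bit 20hex is a well-known property of the final code rather than something stated above.

```python
def digit_value(ch: str) -> int:
    """Digits are 011 followed by their 4-bit binary value, so masking
    off the low four bits recovers the number (as in BCD)."""
    return ord(ch) & 0x0F


def in_64_char_subset(ch: str) -> bool:
    """True for codes 20hex-5Fhex: space, specials, digits, upper case."""
    return 0x20 <= ord(ch) <= 0x5F


print(format(ord("7"), "07b"), digit_value("7"))  # 0110111 7
print(hex(ord("A")))                              # 0x41
print(hex(ord("a") ^ ord("A")))                   # 0x20
print(in_64_char_subset("Z"), in_64_char_subset("z"))  # True False
```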
ASCII was incorporated into the Unicode character set as the first 128 symbols, so the 7-bit ASCII characters have the same numeric codes in both sets. This allows UTF-8 to be backward compatible with 7-bit ASCII, as a UTF-8 file containing only ASCII characters is identical to an ASCII file containing the same sequence of characters. Even more importantly, forward compatibility is ensured as software that recognizes only 7-bit ASCII characters as special and does not alter bytes with the highest bit set (as is often done to support 8-bit ASCII extensions such as ISO-8859-1) will preserve UTF-8 data unchanged.
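Both directions of compatibility can be illustrated with a small Python sketch. The function `ascii_only_upper` is a hypothetical stand-in for "software that recognizes only 7-bit ASCII characters as special": it changes ASCII lowercase letters and passes every other byte through untouched.

```python
text = "Plain ASCII text"

# Backward compatibility: ASCII-only text has identical bytes in both encodings.
print(text.encode("ascii") == text.encode("utf-8"))  # True


def ascii_only_upper(data: bytes) -> bytes:
    """Upper-case ASCII letters a-z; leave all other bytes unchanged,
    including those with the highest bit set."""
    return bytes(b - 0x20 if 0x61 <= b <= 0x7A else b for b in data)


# Forward compatibility: the two bytes that encode "ï" survive untouched.
utf8_data = "naïve".encode("utf-8")
print(ascii_only_upper(utf8_data).decode("utf-8"))  # NAïVE
```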
ASCII control code chart
|Binary||Oct||Dec||Hex||Abbr||Control picture||Caret notation||C escape||Name|
|000 0000||000||0||00||NUL||␀||^@||\0||Null character|
|000 0001||001||1||01||SOH||␁||^A||||Start of Header|
|000 0010||002||2||02||STX||␂||^B||||Start of Text|
|000 0011||003||3||03||ETX||␃||^C||||End of Text|
|000 0100||004||4||04||EOT||␄||^D||||End of Transmission|
|000 1000||010||8||08||BS||␈||^H||\b||Backspace|
|000 1001||011||9||09||HT||␉||^I||\t||Horizontal Tab|
|000 1010||012||10||0A||LF||␊||^J||\n||Line feed|
|000 1011||013||11||0B||VT||␋||^K||\v||Vertical Tab|
|000 1100||014||12||0C||FF||␌||^L||\f||Form feed|
|000 1101||015||13||0D||CR||␍||^M||\r||Carriage return|
|000 1110||016||14||0E||SO||␎||^N||||Shift Out|
|000 1111||017||15||0F||SI||␏||^O||||Shift In|
|001 0000||020||16||10||DLE||␐||^P||||Data Link Escape|
|001 0001||021||17||11||DC1||␑||^Q||||Device Control 1 (oft. XON)|
|001 0010||022||18||12||DC2||␒||^R||||Device Control 2|
|001 0011||023||19||13||DC3||␓||^S||||Device Control 3 (oft. XOFF)|
|001 0100||024||20||14||DC4||␔||^T||||Device Control 4|
|001 0101||025||21||15||NAK||␕||^U||||Negative Acknowledgment|
|001 0110||026||22||16||SYN||␖||^V||||Synchronous idle|
|001 0111||027||23||17||ETB||␗||^W||||End of Transmission Block|
|001 1001||031||25||19||EM||␙||^Y||||End of Medium|
|001 1011||033||27||1B||ESC||␛||^[||\e||Escape|
|001 1100||034||28||1C||FS||␜||^\||||File Separator|
|001 1101||035||29||1D||GS||␝||^]||||Group Separator|
|001 1110||036||30||1E||RS||␞||^^||||Record Separator|
|001 1111||037||31||1F||US||␟||^_||||Unit Separator|
|111 1111||177||127||7F||DEL||␡||^?||||Delete|
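Two of the chart's columns follow simple rules that can be sketched in Python: the caret form is the code with bit 40hex flipped, and the visible symbols come from the Unicode "Control Pictures" block starting at U+2400. The function names below are hypothetical and only serve the illustration.

```python
def caret_notation(code: int) -> str:
    """Return the caret form of a control code, e.g. 0x0A -> '^J'."""
    if not (code < 0x20 or code == 0x7F):
        raise ValueError("not an ASCII control code")
    # Flipping bit 0x40 maps 0x00-0x1F to '@'..'_' and 0x7F to '?'.
    return "^" + chr(code ^ 0x40)


def control_picture(code: int) -> str:
    """Return the Unicode control picture for a control code."""
    if code == 0x7F:
        return chr(0x2421)  # the symbol for DEL sits apart at U+2421
    if code < 0x20:
        return chr(0x2400 + code)
    raise ValueError("not an ASCII control code")


for code in (0x00, 0x0A, 0x1B, 0x7F):
    print(f"{code:07b} {code:03o} {code:3d} {code:02X}",
          control_picture(code), caret_notation(code))
```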
ASCII printable characters
Code 20hex, the space character, denotes the space between words, as produced by the space-bar of a keyboard. Since the space character is considered an invisible graphic (rather than a control character) and thus would not normally be visible, it is represented here by Unicode character U+2420 "␠"; Unicode characters U+2422 "␢" and U+2423 "␣" are also available for use when a visible representation of a space is necessary.
Code 7Fhex corresponds to the non-printable "Delete" (DEL) control character and is therefore omitted from this chart; it is covered in the previous section's chart. Earlier versions of ASCII used the up-arrow instead of the caret (5Ehex) and the left-arrow instead of the underscore (5Fhex).
ASCII printable code chart
As computer technology spread throughout the world, different standards bodies and corporations developed many variations of ASCII to facilitate the expression of non-English languages that used Roman-based alphabets. One could class some of these variations as "ASCII extensions", although some misuse that term to represent all variants, including those that do not preserve ASCII's character map in the 7-bit range. Furthermore, the ASCII extensions themselves have also been mislabelled as ASCII.
The PETSCII code that Commodore International used for its 8-bit systems is probably unique among post-1970 codes in being based on ASCII-1963 rather than the more common ASCII-1967, which is found on, for example, the ZX Spectrum computer. Atari 8-bit computers and Galaksija computers also used ASCII variants.
Eventually, as 8-, 16- and 32-bit (and later 64-bit) computers began to replace 18- and 36-bit computers as the norm, it became common to use an 8-bit byte to store each character in memory, providing an opportunity for extended, 8-bit relatives of ASCII. In most cases these developed as true extensions of ASCII, leaving the original character mapping intact but adding character definitions after the first 128 (i.e., 7-bit) characters.
Most early home computer systems developed their own 8-bit character sets containing line-drawing and game glyphs, and often filled in some or all of the control characters from 0 to 31 with more graphics. Kaypro CP/M computers used the "upper" 128 characters for the Greek alphabet. The IBM PC defined code page 437, which replaced the control characters with graphic symbols such as smiley faces, and mapped additional graphic characters to the upper 128 positions. Operating systems such as DOS supported these code pages, and manufacturers of IBM PCs supported them in hardware. Digital Equipment Corporation developed the Multinational Character Set (DEC-MCS) for use in the popular VT220 terminal; this was one of the first extensions designed more for international languages than for block graphics. The Macintosh defined Mac OS Roman, and PostScript also defined a set; both contained international letters and typographic punctuation marks instead of graphics, more like modern character sets. The ISO/IEC 8859 standard (derived from the DEC-MCS) finally provided a standard that most systems copied (at least as accurately as they copied ASCII, but with many substitutions). A popular further extension designed by Microsoft, Windows-1252 (often mislabeled as ISO-8859-1), added the typographic punctuation marks needed for traditional text printing.
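The practical effect of these differing 8-bit extensions can be seen by decoding the same bytes under several of them. The sketch below assumes Python's built-in `cp437`, `cp1252` and `latin-1` codecs as stand-ins for code page 437, Windows-1252 and ISO-8859-1; the byte values were chosen only for illustration.

```python
# Bytes below 0x80 decode identically in every ASCII extension.
for codec in ("cp437", "cp1252", "latin-1"):
    assert b"ASCII".decode(codec) == "ASCII"

# Byte 0xE9 is an accented letter in two sets but a Greek letter in CP437.
print(bytes([0xE9]).decode("latin-1"))  # é
print(bytes([0xE9]).decode("cp1252"))   # é
print(bytes([0xE9]).decode("cp437"))    # Θ

# Byte 0x93 shows why Windows-1252 is not ISO-8859-1: it is a curly
# quotation mark in the former and a C1 control code in the latter.
print(repr(bytes([0x93]).decode("cp1252")))   # '“'
print(repr(bytes([0x93]).decode("latin-1")))  # '\x93'
```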
Unicode and the ISO/IEC 10646 Universal Character Set (UCS) have a much wider array of characters, and their various encoding forms have begun to supplant ISO/IEC 8859 and ASCII rapidly in many environments. While ASCII is limited to 128 characters, Unicode and the UCS support more characters by separating the concepts of unique identification (using natural numbers called code points) and encoding (to 8-, 16- or 32-bit binary formats, called UTF-8, UTF-16 and UTF-32).
To allow backward compatibility, the 128 ASCII and 256 ISO-8859-1 (Latin 1) characters are assigned Unicode/UCS code points that are the same as their codes in the earlier standards. Therefore, ASCII can be considered a 7-bit encoding scheme for a very small subset of Unicode/UCS, and ASCII (when prefixed with 0 as the eighth bit) is valid UTF-8.
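The separation between code points and encoding forms, and the reuse of the older codes as code points, can be sketched as follows. The big-endian codec names (`utf-16-be`, `utf-32-be`) are used only so that no byte-order mark appears in the output.

```python
ch = "A"
print(hex(ord(ch)))            # 0x41, the code point, same as the ASCII code

# One code point, three encoding forms.
print(ch.encode("utf-8"))      # b'A'
print(ch.encode("utf-16-be"))  # b'\x00A'
print(ch.encode("utf-32-be"))  # b'\x00\x00\x00A'

# Every ISO-8859-1 byte value decodes to the code point of the same number.
latin1 = bytes(range(256)).decode("latin-1")
print([ord(c) for c in latin1] == list(range(256)))  # True

# Every 7-bit ASCII byte sequence is already valid UTF-8.
ascii_bytes = bytes(range(128))
print(ascii_bytes.decode("utf-8") == ascii_bytes.decode("ascii"))  # True
```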
ASCII-code order is also called ASCIIbetical order. Collation of data is sometimes done in this order rather than "standard" alphabetical order (collating sequence). The main deviations in ASCII order are:
- All uppercase letters come before lowercase letters; for example, "Z" precedes "a"
- Digits and many punctuation marks come before letters; for example, "4" precedes "one"
- Numbers are sorted naïvely as strings; for example, "10" precedes "2"
An intermediate order, which is readily implemented, converts uppercase letters to lowercase before comparing ASCII values. Naïve number sorting can be averted by zero-filling all numbers (e.g. "02" will sort before "10" as expected), although this is an external fix and has nothing to do with the ordering itself.
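These deviations and the two workarounds can be reproduced with Python's `sorted`, which compares strings by code point and therefore gives ASCII order for ASCII-only input. The sample list and variable names are invented for the example.

```python
items = ["apple", "Zebra", "4", "one", "10", "2"]

# Plain ASCII order: digits first, then uppercase, then lowercase.
ascii_order = sorted(items)
print(ascii_order)  # ['10', '2', '4', 'Zebra', 'apple', 'one']

# Intermediate order: compare lowercased copies of the strings.
folded_order = sorted(items, key=str.lower)
print(folded_order)  # ['10', '2', '4', 'apple', 'one', 'Zebra']

# Zero-filling makes numeric strings sort by value.
numbers = sorted(["10", "2"], key=lambda s: s.zfill(2))
print(numbers)  # ['2', '10']
```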
- The ASCII subset of Unicode
- Scanned copy of American Standard Code for Information Interchange ASA standard X3.4-1963
|This page uses Creative Commons Licensed content from Wikipedia (view authors).|