FANDOM


File:ASCII Code Chart-Quick ref card.png

This is a cut-down version of the Wikipedia page. See User_blog:Enojado271/Imp_2_Map_Gen_Keys:_In_Summary#comm-2793 for its main relevance to Imperialism and Imperialism 2.

ASCII (Template:IPAc-en Template:Respell), abbreviated from American Standard Code for Information Interchange,[1] is a character-encoding scheme. Originally based on the English alphabet, it encodes 128 specified characters into 7-bit binary integers as shown by the ASCII chart on the right.[2] The characters encoded are numbers 0 to 9, lowercase letters a to z, uppercase letters A to Z, basic punctuation symbols, control codes that originated with Teletype machines, and space. For example, lowercase e would become binary 1100101.

ASCII codes represent text in computers, communications equipment, and other devices that use text. Most modern character-encoding schemes are based on ASCII, though they support many additional characters.

ASCII includes definitions for 128 characters: 33 are non-printing control characters (many now obsolete)[3] that affect how text and space are processed[4] and 95 printable characters, including the space (which is considered an invisible graphic[5][6]).

The IANA prefers the name US-ASCII.[7] ASCII was the most common character encoding on the World Wide Web until December 2007, when it was surpassed by UTF-8, which includes ASCII as a subset.[8][9][10]

OrganizationEdit

The code itself was patterned so that most control codes were together, and all graphic codes were together, for ease of identification. The first two columns (32 positions) were reserved for control characters.[11] The "space" character had to come before graphics to make sorting easier, so it became position 20hex;[12] for the same reason, many special signs commonly used as separators were placed before digits. The committee decided it was important to support upper case 64-character alphabets, and chose to pattern ASCII so it could be reduced easily to a usable 64-character set of graphic codes,[13] as was done in the DEC SIXBIT code. Lower case letters were therefore not interleaved with upper case. To keep options available for lower case letters and other graphics, the special and numeric codes were arranged before the letters, and the letter "A" was placed in position 41hex to match the draft of the corresponding British standard.[14] The digits 0–9 were arranged so they correspond to values in binary prefixed with 011, making conversion with binary-coded decimal straightforward.

ASCII was incorporated into the Unicode character set as the first 128 symbols, so the 7-bit ASCII characters have the same numeric codes in both sets. This allows UTF-8 to be backward compatible with 7-bit ASCII, as a UTF-8 file containing only ASCII characters is identical to an ASCII file containing the same sequence of characters. Even more importantly, forward compatibility is ensured as software that recognizes only 7-bit ASCII characters as special and does not alter bytes with the highest bit set (as is often done to support 8-bit ASCII extensions such as ISO-8859-1) will preserve UTF-8 data unchanged.[15]


ASCII control code chartEdit

Binary Oct Dec Hex Abbr [lower-alpha 1] [lower-alpha 2] [lower-alpha 3] Name
000 0000 000 0 00 NUL ^@ \0 Null character
000 0001 001 1 01 SOH ^A Start of Header
000 0010 002 2 02 STX ^B Start of Text
000 0011 003 3 03 ETX ^C End of Text
000 0100 004 4 04 EOT ^D End of Transmission
000 0101 005 5 05 ENQ ^E Enquiry
000 0110 006 6 06 ACK ^F Acknowledgment
000 0111 007 7 07 BEL ^G \a Bell
000 1000 010 8 08 BS ^H \b Backspace[lower-alpha 4][lower-alpha 5]
000 1001 011 9 09 HT ^I \t Horizontal Tab[lower-alpha 6]
000 1010 012 10 0A LF ^J \n Line feed
000 1011 013 11 0B VT ^K \v Vertical Tab
000 1100 014 12 0C FF ^L \f Form feed
000 1101 015 13 0D CR ^M \r Carriage return[lower-alpha 7]
000 1110 016 14 0E SO ^N Shift Out
000 1111 017 15 0F SI ^O Shift In
001 0000 020 16 10 DLE ^P Data Link Escape
001 0001 021 17 11 DC1 ^Q Device Control 1 (oft. XON)
001 0010 022 18 12 DC2 ^R Device Control 2
001 0011 023 19 13 DC3 ^S Device Control 3 (oft. XOFF)
001 0100 024 20 14 DC4 ^T Device Control 4
001 0101 025 21 15 NAK ^U Negative Acknowledgment
001 0110 026 22 16 SYN ^V Synchronous idle
001 0111 027 23 17 ETB ^W End of Transmission Block
001 1000 030 24 18 CAN ^X Cancel
001 1001 031 25 19 EM ^Y End of Medium
001 1010 032 26 1A SUB ^Z Substitute
001 1011 033 27 1B ESC ^[ \e[lower-alpha 8] Escape[lower-alpha 9]
001 1100 034 28 1C FS ^\ File Separator
001 1101 035 29 1D GS ^] Group Separator
001 1110 036 30 1E RS ^^[lower-alpha 10] Record Separator
001 1111 037 31 1F US ^_ Unit Separator
111 1111 177 127 7F DEL ^? Delete[lower-alpha 11][lower-alpha 5]
  1. Template:Cite web
  2. Template:Cite web
  3. Template:Cite book
  4. International Organization for Standardization (December 1, 1975). "The set of control characters for ISO 646". Internet Assigned Numbers Authority Registry. Alternate U.S. version: [1]. Accessed 2008-04-14.
  5. "RFC 20: ASCII format for Network Interchange", ANSI X3.4-1968, October 16, 1969.
  6. Mackenzie 1980, p. 223.
  7. Cite error: Invalid <ref> tag; no text was provided for refs named IANA
  8. Template:Cite web
  9. Template:Cite web
  10. Template:Cite web
  11. Mackenzie 1980, p. 220, Decisions 8,9.
  12. Mackenzie 1980, p. 237, Decision 10.
  13. Mackenzie 1980, p. 228, Decision 14.
  14. Mackenzie 1980, p. 238, Decision 18.
  15. Template:Cite web

Other representations might be used by specialist equipment, for example ISO 2047 graphics or hexadecimal numbers.

ASCII printable charactersEdit

Codes 20hex to 7Ehex, known as the printable characters, represent letters, digits, punctuation marks, and a few miscellaneous symbols. There are 95 printable characters in total.Template:Efn

Code 20hex, the space character, denotes the space between words, as produced by the space-bar of a keyboard. Since the space character is considered an invisible graphic (rather than a control character)[1][2] and thus would not normally be visible, it is represented here by Unicode character U+2420 "␠"; Unicode characters U+2422 "␢" and U+2423 "␣" are also available for use when a visible representation of a space is necessary.

Code 7Fhex corresponds to the non-printable "Delete" (DEL) control character and is therefore omitted from this chart; it is covered in the previous section's chart. Earlier versions of ASCII used the up-arrow instead of the caret (5Ehex) and the left-arrow instead of the underscore (5Fhex).[3]

ASCII printable code chartEdit

Binary Oct Dec Hex Glyph
010 0000 040 32 20 Template:Small
010 0001 041 33 21 !
010 0010 042 34 22 "
010 0011 043 35 23 #
010 0100 044 36 24 $
010 0101 045 37 25 %
010 0110 046 38 26 &
010 0111 047 39 27 '
010 1000 050 40 28 (
010 1001 051 41 29 )
010 1010 052 42 2A *
010 1011 053 43 2B +
010 1100 054 44 2C ,
010 1101 055 45 2D -
010 1110 056 46 2E .
010 1111 057 47 2F /
011 0000 060 48 30 0
011 0001 061 49 31 1
011 0010 062 50 32 2
011 0011 063 51 33 3
011 0100 064 52 34 4
011 0101 065 53 35 5
011 0110 066 54 36 6
011 0111 067 55 37 7
011 1000 070 56 38 8
011 1001 071 57 39 9
011 1010 072 58 3A :
011 1011 073 59 3B ;
011 1100 074 60 3C <
011 1101 075 61 3D =
011 1110 076 62 3E >
011 1111 077 63 3F ?
Binary Oct Dec Hex Glyph
100 0000 100 64 40 @
100 0001 101 65 41 A
100 0010 102 66 42 B
100 0011 103 67 43 C
100 0100 104 68 44 D
100 0101 105 69 45 E
100 0110 106 70 46 F
100 0111 107 71 47 G
100 1000 110 72 48 H
100 1001 111 73 49 I
100 1010 112 74 4A J
100 1011 113 75 4B K
100 1100 114 76 4C L
100 1101 115 77 4D M
100 1110 116 78 4E N
100 1111 117 79 4F O
101 0000 120 80 50 P
101 0001 121 81 51 Q
101 0010 122 82 52 R
101 0011 123 83 53 S
101 0100 124 84 54 T
101 0101 125 85 55 U
101 0110 126 86 56 V
101 0111 127 87 57 W
101 1000 130 88 58 X
101 1001 131 89 59 Y
101 1010 132 90 5A Z
101 1011 133 91 5B [
101 1100 134 92 5C \
101 1101 135 93 5D ]
101 1110 136 94 5E ^
101 1111 137 95 5F _
Binary Oct Dec Hex Glyph
110 0000 140 96 60 `
110 0001 141 97 61 a
110 0010 142 98 62 b
110 0011 143 99 63 c
110 0100 144 100 64 d
110 0101 145 101 65 e
110 0110 146 102 66 f
110 0111 147 103 67 g
110 1000 150 104 68 h
110 1001 151 105 69 i
110 1010 152 106 6A j
110 1011 153 107 6B k
110 1100 154 108 6C l
110 1101 155 109 6D m
110 1110 156 110 6E n
110 1111 157 111 6F o
111 0000 160 112 70 p
111 0001 161 113 71 q
111 0010 162 114 72 r
111 0011 163 115 73 s
111 0100 164 116 74 t
111 0101 165 117 75 u
111 0110 166 118 76 v
111 0111 167 119 77 w
111 1000 170 120 78 x
111 1001 171 121 79 y
111 1010 172 122 7A z
111 1011 173 123 7B {
111 1100 174 124 7C |
111 1101 175 125 7D }
111 1110 176 126 7E ~

VariantsEdit

As computer technology spread throughout the world, different standards bodies and corporations developed many variations of ASCII to facilitate the expression of non-English languages that used Roman-based alphabets. One could class some of these variations as "ASCII extensions", although some misuse that term to represent all variants, including those that do not preserve ASCII's character-map in the 7-bit range. Furthermore the ASCII extensions have also been mislabelled as ASCII.

Many other countries developed variants of ASCII to include non-English letters (e.g. é, ñ, ß, Ł), currency symbols (e.g. £, ¥), etc.

The PETSCII code Commodore International used for their 8-bit systems is probably unique among post-1970 codes in being based on ASCII-1963, instead of the more common ASCII-1967, such as found on the ZX Spectrum computer. Atari 8-bit computers and Galaksija computers also used ASCII variants.

8-bitEdit

Eventually, as 8-, 16- and 32-bit (and later 64-bit) computers began to replace 18- and 36-bit computers as the norm, it became common to use an 8-bit byte to store each character in memory, providing an opportunity for extended, 8-bit, relatives of ASCII. In most cases these developed as true extensions of ASCII, leaving the original character-mapping intact, but adding additional character definitions after the first 128 (i.e., 7-bit) characters.

Most early home computer systems developed their own 8-bit character sets containing line-drawing and game glyphs, and often filled in some or all of the control characters from 0-31 with more graphics. Kaypro CP/M computers used the "upper" 128 characters for the Greek alphabet. The IBM PC defined code page 437, which replaced the control-characters with graphic symbols such as smiley faces, and mapped additional graphic characters to the upper 128 positions. Operating systems such as DOS supported these code-pages, and manufacturers of IBM PCs supported them in hardware. Digital Equipment Corporation developed the Multinational Character Set (DEC-MCS) for use in the popular VT220 terminal; this was one of the first extensions designed more for international languages than for block graphics. The Macintosh defined Mac OS Roman and Postscript also defined a set, both of these contained both international letters and typographic punctuation marks instead of graphics, more like modern character sets. The ISO/IEC 8859 standard (derived from the DEC-MCS) finally provided a standard that most systems copied (at least as accurately as they copied ASCII, but with many substitutions). A popular further extension designed by Microsoft, Windows-1252 (often mislabeled as ISO-8859-1), added the typographic punctuation marks needed for traditional text printing.

ISO-8859-1, Windows-1252, and the original 7-bit ASCII were the most common character encodings until 2008 when UTF-8 became more common.[4]

UnicodeEdit

Unicode and the ISO/IEC 10646 Universal Character Set (UCS) have a much wider array of characters, and their various encoding forms have begun to supplant ISO/IEC 8859 and ASCII rapidly in many environments. While ASCII is limited to 128 characters, Unicode and the UCS support more characters by separating the concepts of unique identification (using natural numbers called code points) and encoding (to 8-, 16- or 32-bit binary formats, called UTF-8, UTF-16 and UTF-32).

To allow backward compatibility, the 128 ASCII and 256 ISO-8859-1 (Latin 1) characters are assigned Unicode/UCS code points that are the same as their codes in the earlier standards. Therefore, ASCII can be considered a 7-bit encoding scheme for a very small subset of Unicode/UCS, and ASCII (when prefixed with 0 as the eighth bit) is valid UTF-8.

OrderEdit

ASCII-code order is also called ASCIIbetical order.[5] Collation of data is sometimes done in this order rather than "standard" alphabetical order (collating sequence). The main deviations in ASCII order are:

  • All uppercase come before lowercase letters, for example, "Z" before "a"
  • Digits and many punctuation marks come before letters; for example, "4" precedes "one"
  • Numbers are sorted naïvely as strings; for example, "10" precedes "2"

An intermediate order—readily implemented—converts uppercase letters to lowercase before comparing ASCII values. Naïve number sorting can be averted by zero-filling all numbers (e.g. "02" will sort before "10" as expected), although this is an external fix and has nothing to do with the ordering itself.

See alsoEdit

NotesEdit

Template:Notelist

ReferencesEdit

Footnotes
  1. Cite error: Invalid <ref> tag; no text was provided for refs named FOOTNOTEMackenzie1980223
  2. Cite error: Invalid <ref> tag; no text was provided for refs named RFC20_1968
  3. ASA X3.4-1963.
  4. Cite error: Invalid <ref> tag; no text was provided for refs named utf-8-2008
  5. Template:Cite journal
Bibliography

External linksEdit

This page uses Creative Commons Licensed content from Wikipedia (view authors).

Cite error: <ref> tags exist for a group named "lower-alpha", but no corresponding <references group="lower-alpha"/> tag was found.

Ad blocker interference detected!


Wikia is a free-to-use site that makes money from advertising. We have a modified experience for viewers using ad blockers

Wikia is not accessible if you’ve made further modifications. Remove the custom ad blocker rule(s) and the page will load as expected.