C Character Set
Introduction:
A character set is a collection of permissible characters that a program can use in a variety of contexts. This article covers the history of character encodings, including the historical encoding system EBCDIC, ASCII, and the current standard, Unicode.
The character set in the C programming language is the collection of characters that can be used to write code, input and output data, and manipulate strings. It includes all the lowercase and uppercase letters of the English alphabet, the digits 0 to 9, and various special characters such as punctuation marks and mathematical symbols. Most C implementations use the ASCII character set, which contains 128 characters, although the C standard does not require it. Extended ASCII character sets contain 256 characters.
The C programming language also provides escape sequences to represent special characters in strings and in input/output operations. For example, the escape sequence "\n" represents a newline character, "\r" represents a carriage return character, and "\t" represents a tab character.
Generally, there are two types of character sets in C.
Source Character Set (SCS):
The Source Character Set is the collection of characters that can be used to write source code. It includes the Basic Character Set and the white-space characters. As its first step, before the preprocessing phase, the C preprocessor (CPP) translates the encoding of the source file into the Source Character Set, and the source code is then parsed into an internal representation.
Execution Character Set (ECS):
Character and string constants are stored in the ECS. This set includes the Basic Character Set, Control Characters, and Escape Sequences. It is the set of characters that the running program can interpret. After the preprocessing phase, CPP converts the encoding of character and string constants into the Execution Character Set.
The character classes in the following sections are also described in terms of the utility functions that C provides for them.
In CPP, both the Source Character Set and the Execution Character Set use UTF-8 encoding by default. The following compiler flags allow the user to change them.
-finput-charset is used to set SCS.
-fexec-charset is used to set ECS.
Basic Character Set
Alphabets: The set includes both uppercase and lowercase letters. In ASCII, lowercase letters fall within the range [97, 122], and uppercase letters fall within the range [65, 90].
Example: A, B, a, b, etc.
In ASCII, the code of a lowercase letter is exactly 32 greater than the code of its uppercase counterpart, so the two differ in a single bit.
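The fixed offset between the two cases can be checked with a short sketch. It assumes an ASCII execution character set, and the helper names `flip_case_ascii` and `show_case_offset` are illustrative, not part of the C library.

```c
#include <stdio.h>

/* Hypothetical helper: flips the case of an ASCII letter by toggling
 * the bit with value 32. Assumes an ASCII execution character set;
 * any non-letter is returned unchanged. */
int flip_case_ascii(int c)
{
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'))
        return c ^ 32;
    return c;
}

/* Prints the codes of one letter in both cases and their difference. */
void show_case_offset(void)
{
    printf("'A' = %d, 'a' = %d, difference = %d\n", 'A', 'a', 'a' - 'A');
}
```

In portable code, the library functions tolower and toupper are the safer choice, because they do not depend on the encoding.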
Utility Functions:
isalpha, islower, and isupper determine whether a character is a letter, a lowercase letter, or an uppercase letter, respectively. tolower and toupper convert a letter to the corresponding case.
Digits:
It includes the digits 0 through 9. The ASCII codes of the digits fall within the range [48, 57].
Example: 0, 1, 2, etc.
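Because the digit codes are consecutive, the numeric value of a digit character can be obtained by subtracting '0'. The following is a minimal sketch; the name `digit_value` is a hypothetical helper chosen for illustration.

```c
/* Hypothetical helper: returns the numeric value (0 to 9) of a decimal
 * digit character, or -1 if c is not a decimal digit. Relies only on
 * the digit codes being consecutive. */
int digit_value(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return -1;
}
```

For instance, `digit_value('7')` yields 7, since 55 - 48 = 7 in ASCII.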
Utility functions:
The function isdigit determines if the supplied character is a digit. The function isalnum determines if a character is an alphanumeric character.
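The classification and conversion functions mentioned so far can be combined as in the sketch below. The struct `char_counts` and the functions `count_classes` and `to_upper_in_place` are illustrative names, not library APIs; only the `<ctype.h>` calls are standard.

```c
#include <ctype.h>
#include <stddef.h>

/* Illustrative struct: tallies of character classes found in a string. */
struct char_counts {
    size_t upper;
    size_t lower;
    size_t digits;
    size_t alnum;
};

/* Counts uppercase letters, lowercase letters, digits, and alphanumeric
 * characters in s. The cast to unsigned char avoids undefined behaviour
 * when char is signed and holds a negative value. */
struct char_counts count_classes(const char *s)
{
    struct char_counts n = {0, 0, 0, 0};
    for (; *s != '\0'; ++s) {
        unsigned char c = (unsigned char)*s;
        if (isupper(c)) n.upper++;
        if (islower(c)) n.lower++;
        if (isdigit(c)) n.digits++;
        if (isalnum(c)) n.alnum++;
    }
    return n;
}

/* Converts every letter of s to uppercase in place. */
void to_upper_in_place(char *s)
{
    for (; *s != '\0'; ++s)
        *s = (char)toupper((unsigned char)*s);
}
```

For the string "Hello, C99!" this counts 2 uppercase letters, 4 lowercase letters, 2 digits, and 8 alphanumeric characters in total.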
Punctuation/Special Characters:
The following characters are classified as punctuation by the default C locale.
Utility functions:
The function ispunct determines whether a character is a punctuation character.
Character | ASCII | Detail |
---|---|---|
! | 33 | Bang, exclamation point, or exclamation mark. |
" | 34 | Inverted commas, quote marks, or quotations. |
# | 35 | Hash, number, pound, octothorpe, or sharp. |
$ | 36 | Dollar sign or generic currency. |
% | 37 | Percent. |
& | 38 | Ampersand, epershand, or and symbol. |
' | 39 | Single quote or apostrophe. |
( | 40 | Open or left parenthesis. |
) | 41 | Right or close parenthesis. |
* | 42 | Asterisk, sometimes called star; used as the multiplication sign. |
+ | 43 | Plus. |
, | 44 | Comma. |
- | 45 | Dash, hyphen, or minus sign. |
. | 46 | Period, dot, or full stop. |
/ | 47 | Solidus, virgule, whack, forward slash, and division symbol in mathematics. |
: | 58 | Colon. |
; | 59 | Semicolon. |
< | 60 | Less than, or opening angle bracket. |
= | 61 | Equal. |
> | 62 | Greater than, or closing angle bracket. |
? | 63 | Question mark. |
@ | 64 | Arobase, asperand, at, or the at symbol. |
[ | 91 | Open bracket. |
\ | 92 | Backslash or reverse solidus. |
] | 93 | Close bracket. |
^ | 94 | circumflex or caret. |
_ | 95 | Underscore. |
` | 96 | Backtick, backquote, grave, grave accent, or left/open quote. |
{ | 123 | Open brace, squiggly bracket, or curly bracket. |
\| | 124 | Vertical bar or pipe. |
} | 125 | Close brace, squiggly bracket, or curly bracket. |
~ | 126 | Tilde. |
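As a small illustration of how the punctuation class can be tested in code, the sketch below counts punctuation characters in a string with the standard function ispunct. The function name `count_punct` is a hypothetical helper, and the result assumes the default "C" locale.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns how many characters of s are classified
 * as punctuation by ispunct in the current locale. */
size_t count_punct(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (ispunct((unsigned char)*s))
            n++;
    }
    return n;
}
```

For "Hello, world! (C)" the count is 4: the comma, the exclamation mark, and the two parentheses. The space is not punctuation.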
Control Character Set
The ASCII codes of these characters run from 0 to 31 (inclusive), plus code 127. Although they are not visible, they still affect the program's output in several ways. For example, the \a (BEL) character may produce a beep or a screen flash when printed, and the \b (BS) character moves the cursor one step back; unlike the Backspace key on the keyboard, it does not delete the previous character.
Utility Functions:
The function iscntrl determines whether a character is a control character. Codes 9 to 13 are also control characters; they are listed with the white-space characters further down.
ASCII | Abbreviation |
---|---|
00 | NUL '\0' (null character) |
01 | SOH (start of heading) |
02 | STX (start of text) |
03 | ETX (end of text) |
04 | EOT (end of transmission) |
05 | ENQ (enquiry) |
06 | ACK (acknowledge) |
07 | BEL '\a' (bell) |
08 | BS '\b' (backspace) |
14 | SO (shift out) |
15 | SI (shift in) |
16 | DLE (data link escape) |
17 | DC1 (device control 1) |
18 | DC2 (device control 2) |
19 | DC3 (device control 3) |
20 | DC4 (device control 4) |
21 | NAK (negative ack.) |
22 | SYN (synchronous idle) |
23 | ETB (end of trans. blk) |
24 | CAN (cancel) |
25 | EM (end of medium) |
26 | SUB (substitute) |
27 | ESC (escape) |
28 | FS (file separator) |
29 | GS (group separator) |
30 | RS (record separator) |
31 | US (unit separator) |
127 | DEL (delete) |
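Since control characters are invisible, it is often useful to make them visible before displaying a string. The following sketch does so with the standard function iscntrl; the name `mask_controls` and the choice of '.' as the replacement are assumptions made for illustration.

```c
#include <ctype.h>

/* Hypothetical helper: replaces every control character in s with '.'
 * so that the string can be displayed without side effects such as
 * beeps or cursor movement. */
void mask_controls(char *s)
{
    for (; *s != '\0'; ++s) {
        if (iscntrl((unsigned char)*s))
            *s = '.';
    }
}
```

Applied to a buffer holding 'a', BEL, 'b', BS, and DEL, this leaves the text "a.b..".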
Escape Sequences:
The Execution Character Set includes these characters. They are introduced by the backslash (\) character. Although an escape sequence is written with two or more characters, the C preprocessor counts it as a single character.
Example: \a, \b, \t, etc.
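That an escape sequence counts as a single character can be observed through the length of a string literal. The sketch below is only an illustration; `show_escape_lengths` is a hypothetical name, and the printed code for '\n' assumes an ASCII system.

```c
#include <stdio.h>
#include <string.h>

/* Prints the length of a string literal that contains escape sequences,
 * and the numeric code of the newline character. */
void show_escape_lengths(void)
{
    const char *s = "a\tb\n";              /* 'a', tab, 'b', newline */
    printf("strlen = %zu\n", strlen(s));   /* 4 characters */
    printf("'\\n' has code %d\n", '\n');   /* 10 on ASCII systems */
}
```

Each of \t and \n takes two characters in the source file but occupies one character in the resulting string.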
Utility Functions:
The function isspace determines whether a character is a white-space character.
Character | ASCII | Detail |
---|---|---|
<space> | 32 | space (SPC) |
\t | 9 | horizontal tab (TAB) |
\n | 10 | newline (LF) |
\v | 11 | vertical tab (VT) |
\f | 12 | form feed (FF)
\r | 13 | carriage return (CR) |
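One common use of isspace is splitting text into words. The following is a minimal sketch under the assumption that a word is any maximal run of non-white-space characters; `count_words` is a hypothetical helper name.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: counts the words in s, where a word is a maximal
 * run of characters for which isspace returns zero. */
size_t count_words(const char *s)
{
    size_t words = 0;
    int in_word = 0;   /* 1 while the scan is inside a word */
    for (; *s != '\0'; ++s) {
        if (isspace((unsigned char)*s)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;
            words++;
        }
    }
    return words;
}
```

Spaces, tabs, and newlines all act as separators here, so "one\ttwo\nthree  four" contains 4 words.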
Example:
Let's take an example that prints all the printable characters:
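A minimal sketch of such a program is given here. It assumes an ASCII execution character set and the default "C" locale, and the names `classify` and `print_char_table` are illustrative choices rather than library functions.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: names the class of a character, using the labels
 * Alphabet, Digit, Punctuation, and Space. */
const char *classify(int c)
{
    if (isalpha(c)) return "Alphabet";
    if (isdigit(c)) return "Digit";
    if (ispunct(c)) return "Punctuation";
    if (isspace(c)) return "Space";
    return "Other";
}

/* Writes a header and one table row per printable character
 * (codes 32 to 126) to the given stream. */
void print_char_table(FILE *out)
{
    fprintf(out, "| Character | ASCII | Type |\n");
    fprintf(out, "| :-------: | ----: | :---------- |\n");
    for (int c = 32; c <= 126; ++c)
        fprintf(out, "| %c | %d | %s |\n", c, c, classify(c));
}
```

Calling `print_char_table(stdout)` from `main` prints one row per character in the three-column layout used under Output.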
Output:
| Character | ASCII | Type |
| :-------: | ----: | :---------- |
| | 32 | Space |
| ! | 33 | Punctuation |
| " | 34 | Punctuation |
| # | 35 | Punctuation |
| $ | 36 | Punctuation |
| % | 37 | Punctuation |
| &| 38 | Punctuation |
| ' | 39 | Punctuation |
| ( | 40 | Punctuation |
| ) | 41 | Punctuation |
| * | 42 | Punctuation |
| + | 43 | Punctuation |
| , | 44 | Punctuation |
| - | 45 | Punctuation |
| . | 46 | Punctuation |
| / | 47 | Punctuation |
| 0 | 48 | Digit |
| 1 | 49 | Digit |
| 2 | 50 | Digit |
| 3 | 51 | Digit |
| 4 | 52 | Digit |
| 5 | 53 | Digit |
| 6 | 54 | Digit |
| 7 | 55 | Digit |
| 8 | 56 | Digit |
| 9 | 57 | Digit |
| : | 58 | Punctuation |
| ; | 59 | Punctuation |
| <| 60 | Punctuation |
| = | 61 | Punctuation |
| >| 62 | Punctuation |
| ? | 63 | Punctuation |
| @ | 64 | Punctuation |
| A | 65 | Alphabet |
| B | 66 | Alphabet |
| C | 67 | Alphabet |
| D | 68 | Alphabet |
| E | 69 | Alphabet |
| F | 70 | Alphabet |
| G | 71 | Alphabet |
| H | 72 | Alphabet |
| I | 73 | Alphabet |
| J | 74 | Alphabet |
| K | 75 | Alphabet |
| L | 76 | Alphabet |
| M | 77 | Alphabet |
| N | 78 | Alphabet |
| O | 79 | Alphabet |
| P | 80 | Alphabet |
| Q | 81 | Alphabet |
| R | 82 | Alphabet |
| S | 83 | Alphabet |
| T | 84 | Alphabet |
| U | 85 | Alphabet |
| V | 86 | Alphabet |
| W | 87 | Alphabet |
| X | 88 | Alphabet |
| Y | 89 | Alphabet |
| Z | 90 | Alphabet |
| [ | 91 | Punctuation |
| \ | 92 | Punctuation |
| ] | 93 | Punctuation |
| ^ | 94 | Punctuation |
| _ | 95 | Punctuation |
| ` | 96 | Punctuation |
| a | 97 | Alphabet |
| b | 98 | Alphabet |
| c | 99 | Alphabet |
| d | 100 | Alphabet |
| e | 101 | Alphabet |
| f | 102 | Alphabet |
| g | 103 | Alphabet |
| h | 104 | Alphabet |
| i | 105 | Alphabet |
| j | 106 | Alphabet |
| k | 107 | Alphabet |
| l | 108 | Alphabet |
| m | 109 | Alphabet |
| n | 110 | Alphabet |
| o | 111 | Alphabet |
| p | 112 | Alphabet |
| q | 113 | Alphabet |
| r | 114 | Alphabet |
| s | 115 | Alphabet |
| t | 116 | Alphabet |
| u | 117 | Alphabet |
| v | 118 | Alphabet |
| w | 119 | Alphabet |
| x | 120 | Alphabet |
| y | 121 | Alphabet |
| z | 122 | Alphabet |
| { | 123 | Punctuation |
| \| | 124 | Punctuation |
| } | 125 | Punctuation |
| ~ | 126 | Punctuation |
Conclusion:
The Source Character Set (SCS) and Execution Character Set (ECS) are the two different character sets available in the C language.
Before preprocessing, CPP translates the source code into the SCS. After preprocessing, CPP converts character and string constants into the ECS.
Despite appearing blank, white-space characters affect the text.
Despite being invisible, control characters can perform a variety of tasks, such as ringing a bell (\a) or moving the cursor to the left (\b).
There are many useful functions to work with characters in ctype.h, such as isalpha and isdigit.