'C' Character Set

Introduction:

A character set is the collection of permissible characters that a program can use in a given context. This article covers the source and execution character sets of C, the basic character set they share, and the ASCII codes of the characters involved.

The character set in the C programming language is the collection of characters that can be used to write code, read and write data, and manipulate strings. It includes the uppercase and lowercase letters of the English alphabet, the digits 0 to 9, and various special characters such as punctuation marks and mathematical symbols. Most C implementations encode these characters in ASCII, which defines 128 characters. The various extended ASCII character sets define 256 characters.

Source Character Set (SCS):

The Source Character Set is the set of characters that can be used to write source code. It consists of the Basic Character Set and the white-space characters. In the first stage of translation, before the preprocessing phase begins, the C preprocessor (CPP) converts the encoding of the source file into the Source Character Set (SCS).

Execution Character Set (ECS):

The Execution Character Set is the set of characters that the running program can interpret, and it is the encoding in which character and string constants are stored. It consists of the Basic Character Set, the control characters, and the escape sequences. After the preprocessing phase, CPP converts character and string constants into the Execution Character Set (ECS).

The sections below describe each kind of character together with the C utility functions that work with it.

By default, CPP uses UTF-8 for both the Source Character Set and the Execution Character Set. The following compiler flags allow the user to change them.

-finput-charset is used to set SCS.

Usage: gcc main.c -finput-charset=UTF-8

-fexec-charset is used to set ECS.

Usage: gcc main.c -fexec-charset=UTF-8

Basic Character Set

The Source and Execution Character Sets have a number of characters in common. This shared collection of standard characters is called the Basic Character Set. Let's look at it in more detail below:

Alphabets: It has both capital and lowercase letters. Lowercase ASCII characters fall within the range [97, 122], and uppercase ASCII characters fall within the range [65, 90].

Example: A, B, a, b, etc.

In ASCII, the code of each lowercase letter is exactly 32 greater than the code of the corresponding uppercase letter.

Utility Functions:

isalpha, islower, and isupper determine whether a character is a letter, a lowercase letter, or an uppercase letter, respectively. tolower and toupper convert a letter to the requested case.

Digits:

It includes numbers 0 through 9 inclusively. The range of the ASCII digits is [48, 57].

Example: 0, 1, 2, etc.

Utility functions:

The function isdigit determines if the supplied character is a digit. The function isalnum determines if a character is an alphanumeric character.

Punctuation/Special Characters:

The following characters are classified as punctuation by the default C locale.

Utility functions:

The function ispunct determines if a character is a punctuation character. The ASCII code and the common names of each punctuation character are listed in the table below.

Character	ASCII	Detail
!	33	Exclamation mark, exclamation point, or bang.
"	34	Double quote, quotation mark, or inverted commas.
#	35	Hash, number sign, pound, octothorpe, or sharp.
$	36	Dollar sign or generic currency.
%	37	Percent.
&	38	Ampersand, epershand, or and symbol.
'	39	Single quote or apostrophe.
(	40	Open or left parenthesis.
)	41	Close or right parenthesis.
*	42	Asterisk or star; the multiplication sign in programming.
+	43	Plus.
,	44	Comma.
-	45	Dash, hyphen, or minus sign.
.	46	Period, dot, or full stop.
/	47	Forward slash, solidus, virgule, or whack; the division sign in programming.
:	58	Colon.
;	59	Semicolon.
<	60	Less than, or open angle bracket.
=	61	Equal.
>	62	Greater than, or close angle bracket.
?	63	Question mark.
@	64	At sign, arobase, or asperand.
[	91	Open bracket.
\	92	Backslash or reverse solidus.
]	93	Close bracket.
^	94	Caret or circumflex.
_	95	Underscore.
`	96	Backtick, backquote, grave accent, or open quote.
{	123	Open brace, squiggly bracket, or curly bracket.
|	124	Vertical bar or pipe.
}	125	Close brace, squiggly bracket, or curly bracket.
~	126	Tilde.

Control Character Set

These characters have the ASCII codes 0 to 31 (inclusive) and 127. They are not visible, but they still affect the program's output in several ways. For example, printing the \a (BEL) character may produce a beep or a screen flash, and printing the \b (BS) character moves the cursor one step back. Unlike the Backspace key on the keyboard, \b does not delete the previous character.

Utility Functions:

The function iscntrl determines if a character is a control character. Codes 9 to 13 are control characters as well; they are listed with the white-space characters later in this article.
ASCII	Abbreviation
00	NUL '\0' (null character)
01	SOH (start of heading)
02	STX (start of text)
03	ETX (end of text)
04	EOT (end of transmission)
05	ENQ (enquiry)
06	ACK (acknowledge)
07	BEL '\a' (bell)
08	BS '\b' (backspace)
14	SO (shift out)
15	SI (shift in)
16	DLE (data link escape)
17	DC1 (device control 1)
18	DC2 (device control 2)
19	DC3 (device control 3)
20	DC4 (device control 4)
21	NAK (negative ack.)
22	SYN (synchronous idle)
23	ETB (end of trans. blk)
24	CAN (cancel)
25	EM (end of medium)
26	SUB (substitute)
27	ESC (escape)
28	FS (file separator)
29	GS (group separator)
30	RS (record separator)
31	US (unit separator)
127	DEL (delete)
Escape Sequences:

The Execution Character Set includes these characters. Each one is written with a leading backslash (\) to distinguish it from an ordinary character. Although an escape sequence is written as two or more characters in the source code, it is translated into a single character.

Example: \a, \b, \t, etc.

White-space characters:

The Source Character Set includes these characters. They are not visible themselves, but they affect how the displayed text is laid out.

Utility Functions:

The function isspace determines whether a character is a white-space character.

Character	ASCII	Detail
<space>	32	space (SPC)
\t	9	horizontal tab (TAB)
\n	10	newline (LF)
\v	11	vertical tab (VT)
\f	12	form feed (FF)
\r	13	carriage return (CR)

Example:

Let's take an example that prints every printable ASCII character together with its code and type:

#include <stdio.h>
#include <ctype.h>

int main(void) {
    /* Header of the output table */
    printf("| Character | ASCII | Type |\n");
    printf("| :-------: | ----: | :---------- |\n");

    /* Walk through the printable ASCII range: 32 (space) to 126 (~) */
    for (int i = 32; i < 127; i++) {
        printf("| %3c | %3d | ", i, i);

        /* Classify the character with the <ctype.h> utility functions */
        if (isalpha(i))
            printf("Alphabet |\n");
        else if (isdigit(i))
            printf("Digit |\n");
        else if (ispunct(i))
            printf("Punctuation |\n");
        else if (isspace(i))
            printf("Space |\n");
        else if (iscntrl(i))
            printf("Control |\n");
    }
    return 0;
}

Output:

| Character | ASCII | Type        |
| :-------: | ----: | :---------- |
|           |  32   | Space       |
|    !      |  33   | Punctuation |
|    "      |  34   | Punctuation |
|    #      |  35   | Punctuation |
|    $      |  36   | Punctuation |
|    %      |  37   | Punctuation |
|    &      |  38   | Punctuation |
|    '      |  39   | Punctuation |
|    (      |  40   | Punctuation |
|    )      |  41   | Punctuation |
|    *      |  42   | Punctuation |
|    +      |  43   | Punctuation |
|    ,      |  44   | Punctuation |
|    -      |  45   | Punctuation |
|    .      |  46   | Punctuation |
|    /      |  47   | Punctuation |
|    0      |  48   | Digit       |
|    1      |  49   | Digit       |
|    2      |  50   | Digit       |
|    3      |  51   | Digit       |
|    4      |  52   | Digit       |
|    5      |  53   | Digit       |
|    6      |  54   | Digit       |
|    7      |  55   | Digit       |
|    8      |  56   | Digit       |
|    9      |  57   | Digit       |
|    :      |  58   | Punctuation |
|    ;      |  59   | Punctuation |
|    <      |  60   | Punctuation |
|    =      |  61   | Punctuation |
|    >      |  62   | Punctuation |
|    ?      |  63   | Punctuation |
|    @      |  64   | Punctuation |
|    A      |  65   | Alphabet    |
|    B      |  66   | Alphabet    |
|    C      |  67   | Alphabet    |
|    D      |  68   | Alphabet    |
|    E      |  69   | Alphabet    |
|    F      |  70   | Alphabet    |
|    G      |  71   | Alphabet    |
|    H      |  72   | Alphabet    |
|    I      |  73   | Alphabet    |
|    J      |  74   | Alphabet    |
|    K      |  75   | Alphabet    |
|    L      |  76   | Alphabet    |
|    M      |  77   | Alphabet    |
|    N      |  78   | Alphabet    |
|    O      |  79   | Alphabet    |
|    P      |  80   | Alphabet    |
|    Q      |  81   | Alphabet    |
|    R      |  82   | Alphabet    |
|    S      |  83   | Alphabet    |
|    T      |  84   | Alphabet    |
|    U      |  85   | Alphabet    |
|    V      |  86   | Alphabet    |
|    W      |  87   | Alphabet    |
|    X      |  88   | Alphabet    |
|    Y      |  89   | Alphabet    |
|    Z      |  90   | Alphabet    |
|    [      |  91   | Punctuation |
|    \      |  92   | Punctuation |
|    ]      |  93   | Punctuation |
|    ^      |  94   | Punctuation |
|    _      |  95   | Punctuation |
|    `      |  96   | Punctuation |
|    a      |  97   | Alphabet    |
|    b      |  98   | Alphabet    |
|    c      |  99   | Alphabet    |
|    d      | 100   | Alphabet    |
|    e      | 101   | Alphabet    |
|    f      | 102   | Alphabet    |
|    g      | 103   | Alphabet    |
|    h      | 104   | Alphabet    |
|    i      | 105   | Alphabet    |
|    j      | 106   | Alphabet    |
|    k      | 107   | Alphabet    |
|    l      | 108   | Alphabet    |
|    m      | 109   | Alphabet    |
|    n      | 110   | Alphabet    |
|    o      | 111   | Alphabet    |
|    p      | 112   | Alphabet    |
|    q      | 113   | Alphabet    |
|    r      | 114   | Alphabet    |
|    s      | 115   | Alphabet    |
|    t      | 116   | Alphabet    |
|    u      | 117   | Alphabet    |
|    v      | 118   | Alphabet    |
|    w      | 119   | Alphabet    |
|    x      | 120   | Alphabet    |
|    y      | 121   | Alphabet    |
|    z      | 122   | Alphabet    |
|    {      | 123   | Punctuation |
|    |      | 124   | Punctuation |
|    }      | 125   | Punctuation |
|    ~      | 126   | Punctuation |

Conclusion:

The Source Character Set (SCS) and the Execution Character Set (ECS) are the two character sets used in the C language.

CPP converts the source code into SCS before preprocessing, and converts character and string constants into ECS after preprocessing. White-space characters appear blank but still affect how the text is laid out. Control characters are not visible either, yet they can perform a variety of tasks, such as ringing a bell (\a) or moving the cursor one step back (\b).

There are many useful functions to work with characters in ctype.h, such as isalpha and isdigit.
