Methods to Represent Characters

In computing and machine-based telecommunications, a character is a unit of information that roughly corresponds to a grapheme, a grapheme-like unit, or a symbol in the written form of a natural language, such as a letter of an alphabet or a sign in a syllabary.

The keyboard is the most common way for a user to enter characters into a computer system. Smaller devices without a physical keyboard, such as smartphones, may use a virtual keyboard instead. Characters can also be entered using speech recognition software. However they are entered, characters must be converted into a form the computer can store, and this is done by encoding. Encoding is the process by which a sequence of characters, including letters, digits, punctuation marks, and symbols, is converted into a specific format for transmission or storage in computers.

Encoding methods

Various encoding schemes are used to represent character data in computers, including the American Standard Code for Information Interchange (ASCII), Extended Binary Coded Decimal Interchange Code (EBCDIC), ISCII, and Unicode with its encoding forms such as UTF-8. Each scheme assigns a unique code to every letter, digit, and symbol it covers. For example, in ASCII the capital letter A is represented by the number 65.
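The mapping between a character and its code can be checked directly. The following is a minimal Python sketch using only the built-in functions `ord` and `chr`; the variable names are chosen for illustration.

```python
# Convert between a character and its numeric code.
code = ord("A")                    # character -> code number
letter = chr(65)                   # code number -> character
ascii_bytes = "A".encode("ascii")  # the stored byte sequence

print(code, letter, list(ascii_bytes))  # 65 A [65]
```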

Indian Standard Code for Information Interchange (ISCII)

ISCII is the coding scheme used to represent the characters of Indian languages. It is an 8-bit code, so it can represent 256 (2^8) characters. It was developed by the Indian Department of Electronics in 1986-88 and has been approved by the Bureau of Indian Standards (BIS). The Indic script blocks of the Unicode standard were based on ISCII.
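As a rough illustration of where a figure such as 256 comes from: a fixed-width code with n bits has 2^n distinct bit patterns. The helper name `code_capacity` below is made up for this sketch.

```python
def code_capacity(bits: int) -> int:
    """Return how many distinct codes a fixed-width code of `bits` bits allows."""
    return 2 ** bits

# 7-bit, 8-bit, and 16-bit codes
for width in (7, 8, 16):
    print(width, "bits ->", code_capacity(width), "codes")
```

For an 8-bit scheme such as ISCII this gives 256 possible codes.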

Encoding

Computers and communication equipment represent characters using a character encoding that assigns each character to something, usually an integer represented as a sequence of digits (typically bits), that can be stored or transmitted across a network. ASCII and the UTF-8 encoding of Unicode are two common examples. Most character encodings map characters to integers or bit sequences; Morse code, by contrast, represents characters as sequences of electrical pulses of varying duration.
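The contrast between the two styles of encoding can be sketched in Python. The `MORSE` table below is a deliberately tiny, hand-written subset (only S and O), and `to_morse` is a hypothetical helper, not part of any library; dots and dashes stand in for short and long pulses.

```python
# Tiny illustrative subset of Morse code: "." is a short pulse, "-" a long one.
MORSE = {"S": "...", "O": "---"}

def to_morse(text: str) -> str:
    """Encode text as Morse symbols, separating letters with spaces."""
    return " ".join(MORSE[ch] for ch in text)

# The same text as integers, as an integer-based encoding such as ASCII sees it.
code_points = [ord(ch) for ch in "SOS"]

print(to_morse("SOS"))  # ... --- ...
print(code_points)      # [83, 79, 83]
```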

Unicode

Most modern computers use Unicode, the most widely used character coding standard after ASCII. Standard ASCII is a 7-bit code that can represent only 128 characters, and even its 8-bit extensions are limited to 256. As a result, ASCII can handle English, and its extensions some other European languages, but not languages such as Tamil, Malayalam, Kannada, and Telugu. Unicode was developed to accommodate the writing systems of all the world's languages in a single coding system. It was originally designed as a 16-bit code, which allows a maximum of 65,536 (2^16) characters; the standard has since been extended beyond 16 bits.
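The limitation can be observed in practice. The sketch below, which assumes only Python's standard library, tries to encode a Tamil letter (U+0B85, chosen arbitrarily as a sample) as ASCII and then looks it up in Unicode.

```python
import unicodedata

tamil_a = "\u0b85"  # a sample Tamil letter, written as an escape

# ASCII has no code for this character, so encoding fails.
try:
    tamil_a.encode("ascii")
    fits_in_ascii = True
except UnicodeEncodeError:
    fits_in_ascii = False

name = unicodedata.name(tamil_a)  # official Unicode character name
code_point = ord(tamil_a)         # its Unicode code point

print(fits_in_ascii, name, f"U+{code_point:04X}")
```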

UTF-8

UTF-8 is a variable-width character encoding used in electronic communication. It was designed to be backward compatible with ASCII: the first 128 Unicode characters, which correspond one-to-one with the ASCII characters, are encoded using a single byte with the same binary value as in ASCII. All other characters take two to four bytes.
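The variable width is easy to see by encoding a few characters and counting bytes. This is a small sketch using Python's built-in `str.encode`; the four sample characters are arbitrary picks, one for each possible width.

```python
# One sample character per UTF-8 width: A, e-acute, euro sign, an emoji.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

widths = [len(ch.encode("utf-8")) for ch in samples]
for ch, width in zip(samples, widths):
    print(f"U+{ord(ch):04X} -> {width} byte(s)")

# ASCII compatibility: an ASCII character yields the same single byte either way.
same_as_ascii = "A".encode("utf-8") == "A".encode("ascii")
```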

UTF-32

UTF-32 stands for 32-bit Unicode Transformation Format. It is a fixed-length encoding that uses exactly 32 bits (four bytes) for every Unicode code point, so the number of code points in a UTF-32 string is simply the number of bytes divided by four. The main benefit of UTF-32 is that code points can be indexed directly: finding the Nth code point in a sequence is a constant-time operation, whereas a variable-length encoding requires sequential access to locate it. (A code point is not always what a user perceives as a character, however: grapheme clusters and some emojis consist of several code points, so estimating the displayed width of a string is more complicated.) This makes UTF-32 a simple replacement in code that was written for ASCII and examines each position in a string using an integer index incremented by one.
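Both properties, counting by dividing the byte length by four and constant-time indexing, can be sketched in Python. The example assumes the standard `utf-32-le` codec (little-endian, no byte order mark); `nth_code_point` is a hypothetical helper written for this illustration.

```python
# Three code points: an ASCII letter, a Tamil letter, and an emoji.
text = "A\u0b85\U0001F600"
data = text.encode("utf-32-le")  # little-endian, no byte order mark

# Every code point takes exactly four bytes.
count = len(data) // 4

def nth_code_point(buf: bytes, n: int) -> int:
    """Read the n-th code point directly from its fixed 4-byte slot."""
    return int.from_bytes(buf[4 * n : 4 * n + 4], "little")

print(count, hex(nth_code_point(data, 2)))  # 3 0x1f600
```

With a variable-width encoding such as UTF-8, the same lookup would need to scan the bytes from the start.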

Conclusion

Binary numbers are a convenient way to represent values in a computer, but on their own they do not reflect the full variety of data that computers handle. Text is one such case: each character must first be assigned a number by an encoding such as ASCII, ISCII, or Unicode. All of the character codes discussed in this article are ultimately stored as binary numbers.

Character data and integer data are two different kinds of data in computing. Humans deal with information in symbolic form, as letters and numerals. Computers deal with binary data, in which each wire either carries an electric current or does not.
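To make the link between symbols and binary data concrete, the following sketch prints the bit pattern stored for each character of a short string. It assumes ASCII encoding and 8 bits per character; the sample text "Hi" is arbitrary.

```python
# Show each character of a string as the 8-bit pattern of its ASCII code.
text = "Hi"
bit_patterns = [format(byte, "08b") for byte in text.encode("ascii")]

for ch, bits in zip(text, bit_patterns):
    print(ch, "->", bits)
```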

Frequently asked questions

What are some examples of characters?

Ans. The following are examples of characters: letters, numerical digits, frequent punctuation marks (such as the le...

What exactly do you mean by "regular characters"?

Ans. Like an atom in regular expressions, the normal character designates the singleton set of strings that contains...

What is the maximum number of characters on a keyboard?

Ans. There is no universally accepted standard for the number of keys, buttons, or characters on a keyboard; neverth...

What is the method of using a code character set?

Ans. When text is encoded and saved using the ASCII character set, each character is allocated a denary (decimal) ch...

What is the most common method of representing computer characters or words?

Ans. When text is represented on a computer, most computers utilise the ASCII code, which makes it easy to transmit ...