The HTML charset (character set) tells the web browser which character encoding was used to store the text of an HTML page. To display the page correctly, the browser must know which character set (character encoding) to use. Let us look at HTML charsets in detail.
Character Encoding in HTML
There are several character encoding standards, and declaring the right one ensures that your web page displays correctly across different web browsers and platforms.
- UTF-8 is the default character encoding for HTML5. Before that, ASCII was the character set of the early web, and ISO-8859-1 was the default character set for older HTML versions (up to HTML 4).
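To see why the browser must know the encoding, here is a short Python sketch. It saves some text as UTF-8 bytes and then reads the same bytes back as if they were Windows-1252, which is roughly what happens when a page's declared charset does not match how the file was actually saved. The helper names save_as and read_as are hypothetical.

```python
def save_as(text: str, encoding: str) -> bytes:
    """Return the bytes a file saved in `encoding` would contain."""
    return text.encode(encoding)


def read_as(data: bytes, encoding: str) -> str:
    """Interpret raw bytes with `encoding`, like a browser using the declared charset."""
    # errors="replace" shows undecodable bytes as U+FFFD instead of raising
    return data.decode(encoding, errors="replace")


page_bytes = save_as("Café", "utf-8")        # é is stored as two bytes: C3 A9
print(read_as(page_bytes, "utf-8"))          # Café   (charset matches the file)
print(read_as(page_bytes, "windows-1252"))   # CafÃ©  (wrong charset: garbled text)
```

The two bytes of é are each shown as a separate Windows-1252 character (Ã and ©), which is the typical garbled text you see on pages with a wrong charset declaration.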
Let us discuss the various types of character encoding in detail.
ASCII Character Set
- ASCII, the American Standard Code for Information Interchange, was the first character encoding standard used on the web.
- ASCII stores each character as a 7-bit binary number and defines 128 different characters, including the digits (0-9), the English uppercase and lowercase letters (A-Z, a-z), and some special characters such as ! $ + - ( ) @ < >.
- The disadvantage of ASCII is that it has no characters for languages other than English, for example accented letters such as é.
- Today, ASCII is mostly used in programming, for example for characters in C/C++ programs.
- ASCII uses the values 0 to 31 (and 127) for control characters and 32 to 126 for letters, digits, and symbols; the values 128 to 255 are not used.
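The ranges above can be checked with a small Python sketch. The helpers ascii_code and ascii_category are hypothetical names; they simply apply the 0-31/127 (control) and 32-126 (printable) ranges and print each character's 7-bit binary code.

```python
def ascii_code(ch: str) -> int:
    """Return the ASCII code of one character, or raise ValueError if it is not ASCII."""
    code = ord(ch)
    if code > 127:
        raise ValueError(f"{ch!r} is not an ASCII character")
    return code


def ascii_category(code: int) -> str:
    """Classify an ASCII code using the ranges 0-31/127 and 32-126."""
    if 0 <= code <= 31 or code == 127:
        return "control"
    if 32 <= code <= 126:
        return "printable"
    raise ValueError(f"{code} is outside the ASCII range 0-127")


for ch in "Aa0@":
    code = ascii_code(ch)
    print(ch, code, format(code, "07b"), ascii_category(code))
# A 65 1000001 printable
# a 97 1100001 printable
# 0 48 0110000 printable
# @ 64 1000000 printable

print("café".isascii())  # False: é has no ASCII code
```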
ANSI Character Set
- The name ANSI comes from the American National Standards Institute.
- ANSI, also called Windows-1252, was the default character set in Microsoft Windows up to Windows 95.
- ANSI is an extended version of ASCII with extra international characters. It uses a full byte (8 bits) per character, which allows 256 values.
- ANSI is the same as ASCII for the values 0 to 127, defines its own set of characters (such as € and curly quotation marks) for the values 128 to 159, and has the same characters as UTF-8 for the values 160 to 255 (Unicode code points U+00A0 to U+00FF, which UTF-8 stores as two bytes each).
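A minimal Python sketch of the three ANSI ranges, assuming Python's built-in "cp1252" codec (Python's name for Windows-1252). The function decode_ansi is a hypothetical helper that decodes a single byte value.

```python
def decode_ansi(value: int) -> str:
    """Decode a single byte value (0-255) with Windows-1252 ("ANSI")."""
    # Python ships this code page as the "cp1252" codec.
    return bytes([value]).decode("cp1252")


print(decode_ansi(0x41))  # A  (0-127: same as ASCII)
print(decode_ansi(0x80))  # €  (128-159: ANSI-specific characters)
print(decode_ansi(0x93))  # “  (left curly double quotation mark)
print(decode_ansi(0xE9))  # é  (160-255: same as Unicode U+00A0..U+00FF)

# A few values in 128-159 (0x81, 0x8D, 0x8F, 0x90, 0x9D) have no character
# in Windows-1252, and Python's strict decoder rejects them.
try:
    decode_ansi(0x81)
except UnicodeDecodeError:
    print("0x81 is undefined in Windows-1252")
```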
ISO-8859-1 Character Set
- ISO-8859-1 is a standard of the International Organization for Standardization (ISO) and was the default character set for HTML 4.
- ISO-8859-1 is also an extended version of ASCII with extra international characters.
- It uses a full byte (8 bits) per character, which allows 256 values. HTML 4 also supports UTF-8.
- In ISO-8859-1, the values 128 to 159 are not defined as printable characters, so when a page declares ISO-8859-1, browsers usually decode it as Windows-1252, which uses those values for extra characters.
- ISO-8859-1 is the same as ASCII for the values 0 to 127, does not use the values 128 to 159, and has the same characters as UTF-8 for the values 160 to 255.
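The relationship between ISO-8859-1, Windows-1252, and UTF-8 can be sketched in Python, assuming the built-in "latin-1" (ISO-8859-1) and "cp1252" (Windows-1252) codecs; decode_latin1 is a hypothetical helper. In ISO-8859-1 every byte value maps to the Unicode code point with the same number.

```python
def decode_latin1(value: int) -> str:
    """Decode a single byte value (0-255) with ISO-8859-1 (Python codec "latin-1")."""
    return bytes([value]).decode("latin-1")


# Every byte value v decodes to the Unicode code point v.
print(all(ord(decode_latin1(v)) == v for v in range(256)))  # True

# Values 160-255 are the same characters in ISO-8859-1 and Windows-1252 ...
print(all(decode_latin1(v) == bytes([v]).decode("cp1252") for v in range(160, 256)))  # True

# ... but 128-159 differ: 0x80 is an unprintable control code in ISO-8859-1,
# while Windows-1252 uses it for the euro sign.
print(repr(decode_latin1(0x80)), repr(bytes([0x80]).decode("cp1252")))  # '\x80' '€'

# UTF-8 uses the same code points for 160-255 but stores each as two bytes.
print("é".encode("latin-1"), "é".encode("utf-8"))  # b'\xe9' b'\xc3\xa9'
```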
The web browser must know which character encoding standard the page uses.
In HTML 4, the ISO-8859-1 character encoding is specified with a <meta> tag in the head of the HTML document:
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">
A character encoding other than ISO-8859-1, such as ISO-8859-5 (Cyrillic), can be specified in HTML 4 with the same <meta> tag:
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-5">
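The following is a simplified Python sketch, built on the standard library's html.parser module, of how a program can read the charset declared in either the HTML 4 (http-equiv) or the HTML5 (charset attribute) form of the <meta> tag. The names CharsetFinder and declared_charset are hypothetical. Real browsers follow a more detailed sniffing algorithm, in which a byte order mark or the charset in the HTTP Content-Type header takes priority over the <meta> tag.

```python
from html.parser import HTMLParser
from typing import Optional


class CharsetFinder(HTMLParser):
    """Remember the first charset declared by a <meta> tag (simplified sketch)."""

    def __init__(self) -> None:
        super().__init__()
        self.charset: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attributes = {name.lower(): (value or "") for name, value in attrs}
        if "charset" in attributes:
            # HTML5 form: <meta charset="UTF-8">
            self.charset = attributes["charset"].strip()
        elif attributes.get("http-equiv", "").lower() == "content-type":
            # HTML 4 form: content="text/html;charset=ISO-8859-1"
            for part in attributes.get("content", "").split(";"):
                key, _, value = part.strip().partition("=")
                if key.lower() == "charset" and value:
                    self.charset = value.strip()
                    break


def declared_charset(html: str) -> Optional[str]:
    finder = CharsetFinder()
    finder.feed(html)
    finder.close()
    return finder.charset


print(declared_charset('<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-5">'))
# ISO-8859-5
```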
ISO-8859 Charset Variants
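The ISO-8859 family has several parts, each reusing the values 160 to 255 for a different alphabet, for example ISO-8859-1 (Latin-1, Western European), ISO-8859-5 (Cyrillic), and ISO-8859-7 (Greek). This Python sketch, assuming Python's built-in codecs for these parts, decodes the same byte with each one; decode_byte and VARIANTS are hypothetical names.

```python
# A few ISO-8859 parts and the scripts they cover.
VARIANTS = {
    "iso-8859-1": "Latin-1, Western European",
    "iso-8859-5": "Cyrillic",
    "iso-8859-7": "Greek",
}


def decode_byte(value: int, encoding: str) -> str:
    """Decode a single byte value with the given single-byte encoding."""
    return bytes([value]).decode(encoding)


for encoding, name in VARIANTS.items():
    ch = decode_byte(0xE9, encoding)
    print(f"{encoding} ({name}): 0xE9 -> {ch} (U+{ord(ch):04X})")
# iso-8859-1 (Latin-1, Western European): 0xE9 -> é (U+00E9)
# iso-8859-5 (Cyrillic): 0xE9 -> щ (U+0449)
# iso-8859-7 (Greek): 0xE9 -> ι (U+03B9)
```

Because the same byte means a different letter in each part, the page must declare which variant it uses.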
UTF-8 Character Set
- UTF-8 is the default character encoding for HTML5. It is one of the encodings defined by the Unicode Standard, which is maintained by the Unicode Consortium; UTF stands for Unicode Transformation Format. UTF-8 can display text in all the different languages.
- All the character sets above are limited and poorly suited to multilingual content, so the Unicode standard was designed to cover the characters and symbols of all the world's writing systems.
- Because ANSI and ISO-8859-1 were so limited, HTML 4 also supported UTF-8.
- UTF-8 is the same as ASCII for the values 0 to 127, does not use the values 128 to 159 for printable characters, and has the same characters as ANSI and ISO-8859-1 for the values 160 to 255. From the value 256 onwards, Unicode continues with well over 100,000 more characters, all of which UTF-8 can encode. Here the values are Unicode code points: UTF-8 stores 0 to 127 as one byte each and every higher code point as two to four bytes.
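UTF-8 is a variable-width encoding: code points 0 to 127 take one byte (identical to ASCII) and higher code points take two to four bytes. As an illustration, here is a hand-written Python encoder for a single code point that follows the standard UTF-8 bit layout. The name utf8_encode is our own; in practice you would simply call str.encode("utf-8").

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 by hand (for illustration only)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a valid Unicode scalar value: {cp:#x}")
    if cp <= 0x7F:   # 1 byte:  0xxxxxxx (same as ASCII)
        return bytes([cp])
    if cp <= 0x7FF:  # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:  # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


print(utf8_encode(ord("A")))  # b'A'
print(utf8_encode(ord("é")))  # b'\xc3\xa9'
print(utf8_encode(ord("€")))  # b'\xe2\x82\xac'
```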
In HTML5, the UTF-8 character encoding is specified with the charset attribute of the <meta> tag in the head of the HTML document:
<meta charset="UTF-8">
UTF Charset Variants
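Unicode text can be stored with several UTF variants, mainly UTF-8, UTF-16, and UTF-32, which differ in how many bytes each character takes; UTF-8 is the variant recommended for web pages. This Python sketch, assuming the built-in "utf-16-le" and "utf-32-le" codecs (the little-endian forms are used so that Python does not prepend a byte order mark), compares the sizes. ENCODINGS and encoded_sizes are hypothetical names.

```python
# Little-endian forms avoid the byte order mark that "utf-16"/"utf-32" would add.
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")


def encoded_sizes(text: str) -> dict:
    """Return how many bytes `text` takes in each UTF variant."""
    return {enc: len(text.encode(enc)) for enc in ENCODINGS}


for ch in ["A", "é", "€", "\U0001F600"]:  # the last one is the 😀 emoji
    print(f"U+{ord(ch):04X} {ch}: {encoded_sizes(ch)}")
# U+0041 A: {'utf-8': 1, 'utf-16-le': 2, 'utf-32-le': 4}
# U+00E9 é: {'utf-8': 2, 'utf-16-le': 2, 'utf-32-le': 4}
# U+20AC €: {'utf-8': 3, 'utf-16-le': 2, 'utf-32-le': 4}
# U+1F600 😀: {'utf-8': 4, 'utf-16-le': 4, 'utf-32-le': 4}
```

For mostly English or Western European text, UTF-8 is the most compact of the three, and its first 128 values remain compatible with ASCII.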
Basic Example of HTML Character Encoding
<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8">
    <title>HTML Character set example</title>
  </head>
  <body>
    <h2>Example of HTML Character set</h2>
    <p>Sample HTML text:</p>
    <textarea cols="45" rows="3">WELCOME TO CODEDEC, THIS IS ONLY A SMALL EXAMPLE SHOWING CHARSET IN HTML DOCUMENT.</textarea>
    <br>
    <input type="submit" value="submit">
  </body>
</html>