Encode URL in HTML

URL encoding converts characters that are not allowed to appear directly in a URL into a format that can be transmitted over the Internet. Let us understand HTML URL encoding.

What is a URL?

  • URL stands for Uniform Resource Locator. It is the web address of a document or other resource on the World Wide Web.
  • URLs can only be sent over the Internet using the ASCII character set: the letters A-Z and a-z, the digits 0-9, and a few special characters.
  • Its main purpose is to identify the address of a document available on a web server.

URL Syntax

scheme://prefix.domain:port/path/filename
  • The URL starts with the scheme, which identifies the protocol used to access the resource on the Internet. The common URL schemes are:
    1.  http  – HyperText Transfer Protocol.
    2.  https – Secure HyperText Transfer Protocol.
    3.  ftp   – File Transfer Protocol.
    4.  file  – A file on the computer.
  • The prefix comes before the domain name. For example, www.codedec.com consists of the prefix www. and the domain name codedec.com.
  • The port number identifies the service requested from the server. It is usually omitted from the URL, in which case the default is used: HTTP runs over port 80 by default.
  • The path defines the specific location of the resource the user wants to access, for example /desktop/frames/photos/. The filename identifies the name of the document or resource.
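The parts described above can be pulled out of a URL programmatically. The following is a minimal sketch using Python's standard urllib.parse module; the URL itself (including the port 443 and the filename index.html) is a hypothetical example chosen only to exercise every part of the syntax.

```python
from urllib.parse import urlsplit

# Hypothetical URL, not a real page: it only illustrates the syntax
# scheme://prefix.domain:port/path/filename
url = "https://www.codedec.com:443/desktop/frames/photos/index.html"

parts = urlsplit(url)
scheme = parts.scheme            # protocol used to access the resource
host = parts.hostname            # prefix plus domain name
port = parts.port                # explicit port, or None when omitted
prefix, _, domain = host.partition(".")              # split off "www"
directory, _, filename = parts.path.rpartition("/")  # path vs. filename

print(scheme, prefix, domain, port, directory + "/", filename)
```

When the port is left out of the URL, parts.port is None and the client falls back to the default port of the scheme.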

What is URL Encoding?

URL encoding, also known as percent-encoding, is the process of encoding URL data so that it can be safely transmitted over the Internet. A URL may contain only characters from the ASCII character set, therefore reserved, non-ASCII, and unsafe characters are translated into a valid format accepted by web browsers.

  • The default character set in HTML5 is UTF-8, and the data is encoded using this standard.
  • URL encoding replaces each byte of a non-ASCII character with a “%” followed by two hexadecimal digits (the %HH format).
  • URLs cannot contain spaces, so a space is replaced with %20, or with a plus (+) sign in form data.
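The two ways of encoding a space, and the %HH form of a non-ASCII character, can be seen in a short sketch. It uses quote and quote_plus from Python's standard urllib.parse module; the sample text is an assumption made for illustration.

```python
from urllib.parse import quote, quote_plus

# Sample value chosen for illustration: one non-ASCII letter and one space.
text = "café menu"

percent_style = quote(text)    # space becomes %20
form_style = quote_plus(text)  # space becomes + (form-data style)

print(percent_style)  # caf%C3%A9%20menu
print(form_style)     # caf%C3%A9+menu
```

The letter é takes two bytes in UTF-8 (C3 and A9), so it produces two %HH groups.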

ASCII Control characters:

ASCII control characters are non-printable and range from 00 to 1F in hex (0-31 decimal), plus 7F (127 decimal). These characters must be encoded.

Some ASCII Characters are given below.
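As an illustration, the sketch below encodes a handful of control characters with Python's urllib.parse.quote. The selection of characters is an assumption for the example and is not an exhaustive list.

```python
from urllib.parse import quote

# A few control characters from the 00-1F range, plus 7F (delete).
control_chars = {
    "tab": "\t",
    "line feed": "\n",
    "carriage return": "\r",
    "delete": "\x7f",
}

# Map each name to its percent-encoded form.
encoded = {name: quote(ch) for name, ch in control_chars.items()}

for name, value in encoded.items():
    print(f"{name}: {value}")
```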

Non-ASCII characters

Non-ASCII characters cannot be represented directly in a URL. They cover the range 80-FF in hex (128-255 decimal), that is, 128 characters. These characters must be encoded.

Some non-ASCII characters are given below.
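The encoded form of a non-ASCII character depends on the character set used to turn it into bytes. The sketch below, using Python's urllib.parse.quote, compares UTF-8 (the HTML5 default) with Latin-1, a single-byte set in which these characters occupy the 80-FF range. The sample characters are assumptions for illustration.

```python
from urllib.parse import quote

# "é" is code point E9, which lies inside the 80-FF range.
utf8_form = quote("é")                       # encoded as two UTF-8 bytes
latin1_form = quote("é", encoding="latin-1") # encoded as one Latin-1 byte

# "€" lies outside Latin-1, so only the UTF-8 form is shown.
euro_form = quote("€")

print(utf8_form, latin1_form, euro_form)
```

Here utf8_form is %C3%A9, latin1_form is %E9, and euro_form is %E2%82%AC, so the same character can yield different percent-encoded text under different character sets.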

Reserved characters

Reserved characters have a special meaning in URL syntax, so using them as data creates ambiguity when the URL is interpreted. For example, a colon ( : ) separates the scheme from the rest of the URL. If data conflicts with the reserved character set, the conflicting character is encoded.

Some Reserved Characters are given below.
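To see why reserved characters in data must be encoded, consider a query value that itself contains an ampersand, an equals sign, and a colon. This is a sketch using Python's urllib.parse; the value and the parameter name q are hypothetical.

```python
from urllib.parse import quote, urlencode

# Hypothetical data containing the reserved characters & = and :
value = "a&b=c:d"

# safe="" tells quote to encode every reserved character, including "/".
encoded_value = quote(value, safe="")

# urlencode builds a whole query string and encodes each value for us.
query = urlencode({"q": value})

print(encoded_value)  # a%26b%3Dc%3Ad
print(query)          # q=a%26b%3Dc%3Ad
```

Without the encoding, the server would read the ampersand as the start of a second parameter.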

Unsafe characters

Unsafe characters can be confused with URL syntax and create ambiguity when information is transmitted over the Internet. Examples are the left curly brace, right curly brace, pipe, backslash, caret, tilde, and left square bracket.

Some Unsafe Characters are given below.
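The characters named above can be run through an encoder to see their %HH forms. This sketch uses Python's urllib.parse.quote, which is one implementation among many. Note that it leaves the tilde unchanged, because the current URI standard (RFC 3986) treats the tilde as an unreserved character.

```python
from urllib.parse import quote

# The characters listed in the paragraph above.
unsafe_chars = ["{", "}", "|", "\\", "^", "~", "["]

# Map each character to its encoded form as produced by quote.
encoded = {ch: quote(ch) for ch in unsafe_chars}

for ch, value in encoded.items():
    print(ch, "->", value)
```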

Example of HTML URL Encoding

Decoded URL of the HTML page of the Codedec website:

https://codedec.com/course/step-by-step-html-tutorial/

Encoded URL of the HTML page of the Codedec website:

https%3A%2F%2Fcodedec.com%2Fcourse%2Fstep-by-step-html-tutorial%2F
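The encoded form above can be reproduced with a few lines of Python. This is a sketch using urllib.parse; passing safe="" is a choice made here so that the colon and slashes are encoded as well, which is what you want when the whole URL is carried as data, for example inside a query parameter.

```python
from urllib.parse import quote, unquote

url = "https://codedec.com/course/step-by-step-html-tutorial/"

# safe="" encodes every reserved character, including ":" and "/".
encoded_url = quote(url, safe="")

# unquote reverses the process and restores the original text.
decoded_url = unquote(encoded_url)

print(encoded_url)
print(decoded_url)
```

Letters, digits, hyphens, and dots are left unchanged, which is why only the colon and slashes turn into %3A and %2F.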