Description of encodings
For HTML entity encoding, the he library (“HTML entities”) by Mathias Bynens is used. The different encodings can be characterized as follows:
- URL encoding: According to the W3C/HTML standard, GET variables passed along in URLs have to be URL-encoded (also called percent-encoding). For example, a space is replaced by %20, and many other characters (colons, slashes, etc.) are replaced in the same way by a percent sign followed by two hexadecimal digits. Essentially, this is a simple 1-to-1 encoding that replaces each byte of a special character with a prefix and a numerical code. Non-ASCII characters are first converted to their UTF-8 bytes, so they produce several %XX sequences.
- HTML encoding: HTML has a number of special characters that distinguish structural HTML code from actual text content. Therefore the characters & < > " should not appear literally in text content. Where they are required there, they are replaced by the named entities &amp; &lt; &gt; &quot;. Likewise, many Unicode characters can be expressed by named entities such as &copy; (for the copyright symbol), or by their Unicode code point using a numeric reference such as &#39; (representing a single apostrophe).
- Base64 encoding: A single 8-bit byte can hold one of \(2^8=256\) different values. However, many of those values have a special interpretation in plain text, for example an end-of-line or end-of-text character. Base64 encoding splits binary data into 6-bit pieces. Each piece can be represented by one of \(2^6 = 64\) printable text characters (A-Z, a-z, 0-9, + and /). Therefore, 3 binary bytes are converted to 4 text characters. When the input length is not a multiple of 3, = is used to pad the final 4-character group. This encoding is used whenever binary data has to be included in text, for example in data: URLs and email attachments.
- Hex encoding: An 8-bit byte can be represented as a 2-digit number in base 16 (because \(16^2 = 256\)), better known as “hex” or “hexadecimal”. Each digit is one of the 16 printable characters 0-9 and A-F (often written in lowercase as a-f).
- Binary encoding: Binary encoding expresses an 8-bit byte as the eight 1s and 0s (bits) that make it up in base 2. This is the representation every digital computer uses internally for characters and numbers.
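The URL encoding described in the list above can be sketched in Python as follows. This is a minimal illustration, not the tool's implementation. It assumes UTF-8 input and treats only the RFC 3986 "unreserved" characters (letters, digits, - . _ ~) as safe. The helper name percent_encode is hypothetical, and its output is checked against the standard library's urllib.parse.quote.

```python
from urllib.parse import quote, unquote

# Bytes left untouched: the RFC 3986 "unreserved" set.
UNRESERVED = set(b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~")

def percent_encode(text: str) -> str:
    """Hypothetical helper: percent-encode every UTF-8 byte outside the unreserved set."""
    out = []
    for byte in text.encode("utf-8"):
        if byte in UNRESERVED:
            out.append(chr(byte))
        else:
            # '%' followed by the byte value as two uppercase hex digits
            out.append(f"%{byte:02X}")
    return "".join(out)

encoded = percent_encode("a b:c/d é")
print(encoded)                                 # a%20b%3Ac%2Fd%20%C3%A9
print(encoded == quote("a b:c/d é", safe=""))  # True
print(unquote(encoded))                        # a b:c/d é
```

HTML form submissions traditionally encode a space as + rather than %20. Python's urllib.parse.quote_plus follows that convention.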
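For the HTML encoding in the list above, the following sketch uses Python's standard html module instead of the he library, so its output may differ from the tool's. For example, html.escape also turns ' into &#x27;, and this sketch writes non-ASCII characters as decimal numeric references rather than named entities such as &copy;. The helper to_entities is hypothetical.

```python
import html

def to_entities(text: str) -> str:
    """Hypothetical helper: escape HTML special characters, then
    replace every non-ASCII character with a decimal numeric reference."""
    escaped = html.escape(text, quote=True)  # & < > " ' become entities
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in escaped)

print(to_entities('5 < 6 & "©"'))     # 5 &lt; 6 &amp; &quot;&#169;&quot;
print(html.unescape("&copy; &#39;"))  # © '
```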
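The Base64 scheme in the list above (6-bit pieces, 3 bytes to 4 characters, = padding) can be made concrete with a small hand-written encoder. This is an illustrative sketch only. The function b64encode_manual is hypothetical; in practice one would call base64.b64encode, which the sketch is compared against.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def b64encode_manual(data: bytes) -> str:
    """Hypothetical helper: encode bytes to Base64 one 3-byte group at a time."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Pack up to 3 bytes into a 24-bit integer, filling missing bytes with zeros.
        n = int.from_bytes(chunk + b"\x00" * (3 - len(chunk)), "big")
        # Split the 24 bits into four 6-bit indices into the 64-character alphabet.
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # 1 input byte yields 2 real characters, 2 bytes yield 3; pad the rest with '='.
        keep = len(chunk) + 1
        out.append("".join(chars[:keep]) + "=" * (4 - keep))
    return "".join(out)

print(b64encode_manual(b"Man"))  # TWFu
print(b64encode_manual(b"Ma"))   # TWE=
print(b64encode_manual(b"M"))    # TQ==
```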
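Hex and binary encoding, as described in the list above, both map each byte to a fixed number of digits: two in base 16 and eight in base 2. The following is a minimal sketch with hypothetical helper names to_hex and to_binary. It assumes the text is first converted to bytes as UTF-8; the tool's actual character encoding and hex letter case are not stated here.

```python
def to_hex(data: bytes) -> str:
    """Hypothetical helper: each byte becomes exactly two base-16 digits (zero-padded)."""
    return "".join(f"{b:02x}" for b in data)  # use :02X for uppercase A-F

def to_binary(data: bytes) -> str:
    """Hypothetical helper: each byte becomes exactly eight bits, most significant first."""
    return "".join(f"{b:08b}" for b in data)

data = "Hi!".encode("utf-8")
print(to_hex(data))     # 486921
print(to_binary(data))  # 010010000110100100100001
```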
If the option to “add encoding spaces” is activated, spaces are inserted into the encoded string. A Base64-encoded string is split into 4-character pieces (each representing 3 original bytes), a hex-encoded string into 2-character pieces (each representing 1 original byte), and a binary string into 8-character pieces (each representing 1 original byte).
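The effect of the “add encoding spaces” option can be approximated by a hypothetical group helper that splits an encoded string into fixed-size chunks. This is a sketch of the described behavior, not the tool's code.

```python
import base64

def group(encoded: str, size: int) -> str:
    """Hypothetical helper: insert a space after every `size` characters."""
    return " ".join(encoded[i:i + size] for i in range(0, len(encoded), size))

data = b"Hello"
b64_spaced = group(base64.b64encode(data).decode("ascii"), 4)  # 4 chars = 3 bytes
hex_spaced = group(data.hex(), 2)                               # 2 chars = 1 byte
bin_spaced = group("".join(f"{b:08b}" for b in data), 8)        # 8 chars = 1 byte

print(b64_spaced)  # SGVs bG8=
print(hex_spaced)  # 48 65 6c 6c 6f
print(bin_spaced)  # 01001000 01100101 01101100 01101100 01101111

# The spaces carry no data, so strip them before decoding.
print(base64.b64decode(b64_spaced.replace(" ", "")))  # b'Hello'
```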