Base 32 Encoding#
Using JavaScript/TypeScript and Node.js to understand the basics of how Base32 encoding
and decoding work, following the RFC 4648 standard format.
This can be a good exercise in your programming language of choice when you want to play around with encoding and binary data.
Quickstart#
Code snippets that you can quickly copy.
- TypeScript - This generally works for both web (browser) and server-side.
- Node.js - This only works in the server / Node.js environment, as it uses the
node:buffer module.
- Why not use String.fromCharCode(decimal)?
  String.fromCharCode(x) builds a string from UTF-16 code units, so it is only suitable for text data, NOT for arbitrary binary data. Hence, we'll use Uint8Array instead, since we're dealing with binary.
- To learn more about Buffers, see this post.
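As a rough illustration of the kind of encoder this section describes, here is a minimal sketch that works on a Uint8Array. It is not the original snippet from this page: the names `QUICK_ALPHABET` and `base32Encode` are assumptions made up for this example, and it only covers the padded RFC 4648 alphabet.

```typescript
// RFC 4648 Base 32 alphabet: values 0-25 map to A-Z, values 26-31 map to 2-7.
const QUICK_ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567';

// Hypothetical helper (name chosen for this sketch): bytes in, padded Base32 text out.
function base32Encode(bytes: Uint8Array): string {
  let output = '';
  let buffer = 0; // bits read from the input that have not been emitted yet
  let bitCount = 0; // how many bits in `buffer` are still pending

  for (let i = 0; i < bytes.length; i++) {
    // Append the next 8 bits on the right (most significant bit first).
    buffer = (buffer << 8) | bytes[i];
    bitCount += 8;

    // Emit one character for every complete 5-bit group.
    while (bitCount >= 5) {
      bitCount -= 5;
      output += QUICK_ALPHABET[(buffer >> bitCount) & 31];
    }

    // Keep only the leftover (not yet emitted) bits.
    buffer &= (1 << bitCount) - 1;
  }

  // Fewer than 5 bits left: pad with zero bits on the right to finish the group.
  if (bitCount > 0) {
    output += QUICK_ALPHABET[(buffer << (5 - bitCount)) & 31];
  }

  // Pad with "=" until the length is a multiple of 8 characters.
  while (output.length % 8 !== 0) {
    output += '=';
  }
  return output;
}
```

For example, `base32Encode(new Uint8Array([73, 85]))` (the bytes of "IU") should give `JFKQ====`.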
Read the section below to understand how the logic works.
How it works?#
This content follows the RFC 4648 standard format.
Base 32 encoding according to RFC 4648
The following description of base 32 is derived from [11] (with
corrections). This encoding may be referred to as "base32".
The Base 32 encoding is designed to represent arbitrary sequences of
octets in a form that needs to be case insensitive but that need not
be human readable.
A 33-character subset of US-ASCII is used, enabling 5 bits to be
represented per printable character. (The extra 33rd character, "=",
is used to signify a special processing function.)
The encoding process represents 40-bit groups of input bits as output
strings of 8 encoded characters. Proceeding from left to right, a
40-bit input group is formed by concatenating 5 8bit input groups.
These 40 bits are then treated as 8 concatenated 5-bit groups, each
of which is translated into a single character in the base 32
alphabet. When a bit stream is encoded via the base 32 encoding, the
bit stream must be presumed to be ordered with the most-significant-
bit first. That is, the first bit in the stream will be the high-
order bit in the first 8bit byte, the eighth bit will be the low-
order bit in the first 8bit byte, and so on.
Josefsson Standards Track [Page 8]
RFC 4648 Base-N Encodings October 2006
Each 5-bit group is used as an index into an array of 32 printable
characters. The character referenced by the index is placed in the
output string. These characters, identified in Table 3, below, are
selected from US-ASCII digits and uppercase letters.
Table 3: The Base 32 Alphabet
Value Encoding  Value Encoding  Value Encoding  Value Encoding
    0 A             9 J            18 S            27 3
    1 B            10 K            19 T            28 4
    2 C            11 L            20 U            29 5
    3 D            12 M            21 V            30 6
    4 E            13 N            22 W            31 7
    5 F            14 O            23 X
    6 G            15 P            24 Y         (pad) =
    7 H            16 Q            25 Z
    8 I            17 R            26 2
Special processing is performed if fewer than 40 bits are available
at the end of the data being encoded. A full encoding quantum is
always completed at the end of a body. When fewer than 40 input bits
are available in an input group, bits with value zero are added (on
the right) to form an integral number of 5-bit groups. Padding at
the end of the data is performed using the "=" character. Since all
base 32 input is an integral number of octets, only the following
cases can arise:
(1) The final quantum of encoding input is an integral multiple of 40
bits; here, the final unit of encoded output will be an integral
multiple of 8 characters with no "=" padding.
(2) The final quantum of encoding input is exactly 8 bits; here, the
final unit of encoded output will be two characters followed by
six "=" padding characters.
(3) The final quantum of encoding input is exactly 16 bits; here, the
final unit of encoded output will be four characters followed by
four "=" padding characters.
(4) The final quantum of encoding input is exactly 24 bits; here, the
final unit of encoded output will be five characters followed by
three "=" padding characters.
(5) The final quantum of encoding input is exactly 32 bits; here, the
final unit of encoded output will be seven characters followed by
one "=" padding character.
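The five padding cases quoted above all follow from one small calculation. The sketch below is only an illustration; the function name `base32PaddingLength` is an assumption made for this example and does not come from the RFC.

```typescript
// Hypothetical helper: number of "=" characters for an input of `byteLength` bytes.
function base32PaddingLength(byteLength: number): number {
  // Bits in the final, incomplete quantum: 0, 8, 16, 24 or 32.
  const leftoverBits = (byteLength % 5) * 8;
  if (leftoverBits === 0) {
    return 0; // case (1): the input is a whole number of 40-bit groups
  }
  // Characters needed to hold the leftover bits: 2, 4, 5 or 7.
  const dataChars = Math.ceil(leftoverBits / 5);
  // The final unit is always 8 characters long, so the rest is padding: 6, 4, 3 or 1.
  return 8 - dataChars;
}
```

With a 2-byte input such as "IU", the final quantum is 16 bits, so this gives 4 padding characters, matching case (3).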
Encoding#
Given input: IU
- Step 1: Convert each character to its decimal value based on the ASCII table [1]
  -> 73 85
- Step 2: Convert each decimal value to its binary form
  -> '01001001' '01010101'
  💡 Each binary value must be 8 bits long, so pad the start with zeros as necessary.
- Step 3: Concatenate all bits
  -> 0100100101010101
  💡 The length must be a multiple of 5, so pad the end with zeros as necessary.
  -> 01001001010101010000
- Step 4: Split the bits into separate groups of 5 bits each
  -> 01001 00101 01010 10000
- Step 5: Convert each 5-bit group to its corresponding decimal value
  -> 9 5 10 16
- Step 6: Convert each decimal value to its character in the Base 32 Alphabet
  Base 32 Alphabet Table
  Value Encoding  Value Encoding  Value Encoding  Value Encoding
      0 A             9 J            18 S            27 3
      1 B            10 K            19 T            28 4
      2 C            11 L            20 U            29 5
      3 D            12 M            21 V            30 6
      4 E            13 N            22 W            31 7
      5 F            14 O            23 X
      6 G            15 P            24 Y         (pad) =
      7 H            16 Q            25 Z
      8 I            17 R            26 2
  -> J F K Q
- Step 7: Add = as padding to fill the vacant positions, since Base 32 always emits complete groups of 8 characters (one group per 40 bits, i.e. 5 octets, of input).
Output: JFKQ====
In other words, the length of the output should be a multiple of 8.
Raw value: IU
Base 32 encoded: JFKQ====
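The steps above can also be followed in code. The sketch below deliberately mirrors the walkthrough with string operations rather than bit shifts, so every intermediate value can be inspected. The names `TRACE_ALPHABET`, `EncodeTrace` and `traceBase32Encode` are assumptions made for this example, and it assumes ASCII-only input.

```typescript
const TRACE_ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567';

// Hypothetical result shape, used only to expose each step of the walkthrough.
interface EncodeTrace {
  decimals: number[];
  bits: string;
  groups: string[];
  indices: number[];
  encoded: string;
}

function traceBase32Encode(input: string): EncodeTrace {
  // Step 1: character -> decimal value (assumes every character is ASCII).
  const decimals = input.split('').map((ch) => ch.charCodeAt(0));

  // Steps 2 and 3: decimal -> 8-bit binary (left-padded with zeros), then concatenate.
  let bits = decimals.map((d) => ('00000000' + d.toString(2)).slice(-8)).join('');

  // Step 3 (continued): pad the end with zeros until the length is a multiple of 5.
  while (bits.length % 5 !== 0) {
    bits += '0';
  }

  // Step 4: split into groups of 5 bits.
  const groups: string[] = [];
  for (let i = 0; i < bits.length; i += 5) {
    groups.push(bits.slice(i, i + 5));
  }

  // Step 5: each 5-bit group -> decimal value.
  const indices = groups.map((group) => parseInt(group, 2));

  // Step 6: each decimal value -> character of the Base 32 alphabet.
  let encoded = indices.map((index) => TRACE_ALPHABET[index]).join('');

  // Step 7: pad with "=" until the length is a multiple of 8.
  while (encoded.length % 8 !== 0) {
    encoded += '=';
  }

  return { decimals, bits, groups, indices, encoded };
}
```

Calling `traceBase32Encode('IU')` should reproduce the values from the walkthrough: decimals 73 and 85, groups 01001 00101 01010 10000, indices 9 5 10 16, and the encoded string JFKQ====.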
Decoding#
To decode, simply reverse the encoding steps.
Refer to this section to see the decode implementation (Line 29 for TypeScript, Line 32 for Node.js).
- TypeScript
const base32Decoded: Uint8Array = decode('JFKQ====');
console.log(new TextDecoder().decode(base32Decoded)); // IU
- Node.js
const base32Decoded: Buffer = decode('JFKQ====');
console.log(base32Decoded.toString()); // IU
Note
The TypeScript implementation can be used on both web and server-side / Node.js,
since TextDecoder is available as a global class in both browsers and Node.js.
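To make "reverse the logic" concrete, here is a minimal decoding sketch. It is not the decode function referenced above: `DECODE_ALPHABET` and `base32Decode` are names assumed for this example. It also accepts lowercase input and silently drops the trailing zero bits added during encoding, which a stricter decoder might want to validate.

```typescript
const DECODE_ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567';

// Hypothetical helper: padded (or unpadded) Base32 text in, raw bytes out.
function base32Decode(encoded: string): Uint8Array {
  // Undo step 7: strip the "=" padding. Uppercasing makes the input case insensitive.
  const clean = encoded.toUpperCase().replace(/=+$/, '');

  const bytes: number[] = [];
  let buffer = 0; // bits collected so far that have not been emitted
  let bitCount = 0; // how many bits in `buffer` are still pending

  for (let i = 0; i < clean.length; i++) {
    // Undo step 6: character -> 5-bit value.
    const value = DECODE_ALPHABET.indexOf(clean.charAt(i));
    if (value === -1) {
      throw new Error('Invalid Base32 character: ' + clean.charAt(i));
    }

    // Undo steps 4 and 5: append the 5 bits to the running bit stream.
    buffer = (buffer << 5) | value;
    bitCount += 5;

    // Undo steps 2 and 3: emit a byte as soon as 8 bits are available.
    if (bitCount >= 8) {
      bitCount -= 8;
      bytes.push((buffer >> bitCount) & 0xff);
      buffer &= (1 << bitCount) - 1;
    }
  }

  // Any bits still pending here are the zero bits added as filler while encoding.
  return new Uint8Array(bytes);
}
```

Decoding `JFKQ====` this way should give the bytes 73 and 85, which a TextDecoder turns back into "IU".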
Conclusion#
Purpose#
The purpose of Base32 encoding is primarily to represent binary data in a format that is:
- Human-Readable
  Base32-encoded strings use only the letters A-Z and the digits 2-7, which are easy to read and to transmit through text-based communication channels with little risk of being misinterpreted.
- Case Insensitive
  The Base32 alphabet contains uppercase letters only, so the encoding can be treated as case-insensitive, which simplifies handling in systems where case sensitivity is a concern.
- URL-Safe
  The characters of the Base32 alphabet can be included in URLs without additional encoding or escaping, making them suitable for use in web applications; only the = padding character may need to be escaped or omitted.
Use Cases#
- URL Shortening
  Base32 encoding can be used to generate short, human-readable identifiers for URLs in URL shortening services.
- Cryptographic Applications
  Base32 encoding is commonly used in cryptographic applications to represent keys, such as the shared secrets used by HMAC-based one-time password (HOTP/TOTP) authenticator apps.
- Data Integrity Verification
  Base32 encoding can be used to represent checksums or cryptographic hashes of data in a human-readable format for verification purposes.
- Data Exchange
  Base32 encoding can be used to represent binary data in text-based data exchange formats, such as JSON or XML, where binary data needs to be transmitted as text.
Overall, base32 encoding provides a balance between human readability and efficiency, making it suitable for various applications where binary data needs to be represented in a human-readable format.