
Base 32 Encoding

Using JavaScript/TypeScript and Node.js to understand the basics of how Base32 encoding and decoding work, following the RFC 4648 standard.

This can be a good exercise in any programming language if you want to play around with encoding and binary data.

Quickstart

Code snippets that you can quickly copy.

  • TypeScript - Works in both web (browser) and server-side environments.
  • Node.js - Works only on the server (Node.js), since it uses the node:buffer module.
const BASE32_ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567';

export const encode = (input: string): string => {
  let binary = '';
  let result = '';

  const bytes = new TextEncoder().encode(input); // UTF-8 bytes, so non-ASCII text works
  for (const decimal of bytes) {
    binary += decimal.toString(2).padStart(8, '0'); // exactly 8 bits per byte
  }

  while (binary.length % 5 !== 0) { // zero-fill to a multiple of 5 bits
    binary += '0';
  }

  for (let i = 0; i < binary.length; i += 5) {
    const fiveBits = binary.substring(i, i + 5);
    const decimal = parseInt(fiveBits, 2);
    result += BASE32_ALPHABET[decimal]; // 5-bit value -> alphabet character
  }

  while (result.length % 8 !== 0) { // "=" padding to a multiple of 8 characters
    result += '=';
  }

  return result;
};

export const decode = (input: string): Uint8Array => {
  input = input.replace(/=/g, '');

  let binary = '';
  const result: number[] = [];

  for (let i = 0; i < input.length; i++) {
    const decimal = BASE32_ALPHABET.indexOf(input[i]);
    if (decimal === -1) {
      throw new Error(`Invalid Base32 character: ${input[i]}`);
    }
    binary += decimal.toString(2).padStart(5, '0');
  }

  // Only complete 8-bit groups are bytes. The leftover bits (fewer than 8)
  // are the zero padding added during encoding, so they are dropped.
  for (let i = 0; i + 8 <= binary.length; i += 8) {
    const eightBits = binary.substring(i, i + 8);
    const decimal = parseInt(eightBits, 2);
    result.push(decimal); // (1)!
  }

  return new Uint8Array(result);
};
  1. Why not use String.fromCharCode(decimal)?

    String.fromCharCode(x) turns each number into a UTF-16 character, so it only makes sense for text. The decoded output is arbitrary binary data, and in UTF-8 text a single character can span several bytes, which String.fromCharCode would split into separate, wrong characters.

    Hence, we return the raw bytes in a Uint8Array instead.

import { Buffer } from 'node:buffer'; // (1)!

const BASE32_ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567';

export const encode = (input: string): string => {
  let binary = '';
  let result = '';

  const buffer = Buffer.from(input);

  for (const decimal of buffer) {
    binary += decimal.toString(2).padStart(8, '0');
  }

  while (binary.length % 5 !== 0) {
    binary += '0';
  }

  for (let i = 0; i < binary.length; i += 5) {
    const fiveBits = binary.substring(i, i + 5);
    const decimal = parseInt(fiveBits, 2);
    result += BASE32_ALPHABET[decimal];
  }

  while (result.length % 8 !== 0) {
    result += '=';
  }

  return result;
};

export const decode = (input: string): Buffer => {
  input = input.replace(/=/g, '');

  let binary = '';
  const bytes: number[] = [];

  for (let i = 0; i < input.length; i++) {
    const decimal = BASE32_ALPHABET.indexOf(input[i]);
    if (decimal === -1) {
      throw new Error(`Invalid Base32 character: ${input[i]}`);
    }
    binary += decimal.toString(2).padStart(5, '0');
  }

  // Collect complete 8-bit groups only; leftover bits are zero padding.
  for (let i = 0; i + 8 <= binary.length; i += 8) {
    const eightBits = binary.substring(i, i + 8);
    bytes.push(parseInt(eightBits, 2));
  }

  // Build the Buffer once instead of calling Buffer.concat for every byte.
  return Buffer.from(bytes);
};
  1. To learn more about Buffers, see this post.
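The first annotation in the TypeScript snippet explains why decode returns bytes instead of building a string with String.fromCharCode. A small illustration (not part of the original implementation): the character "é" is two bytes in UTF-8, and String.fromCharCode turns each byte into its own character, while TextDecoder reassembles them correctly.

```typescript
// "é" (U+00E9) is encoded in UTF-8 as two bytes: 0xC3 0xA9 (195, 169).
const bytes: Uint8Array = new TextEncoder().encode('é');

// Naive: each byte becomes its own UTF-16 character, giving "Ã©".
const naive: string = String.fromCharCode(...bytes);

// Correct: TextDecoder reads the bytes as UTF-8 and returns "é".
const decoded: string = new TextDecoder().decode(bytes);

console.log(Array.from(bytes), naive, decoded);
```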

Read the section below to understand how the logic works.

How it works

This content follows the RFC 4648 standard format.

Base 32 encoding according to RFC 4648
The following description of base 32 is derived from [11] (with
corrections).  This encoding may be referred to as "base32".

The Base 32 encoding is designed to represent arbitrary sequences of
octets in a form that needs to be case insensitive but that need not
be human readable.

A 33-character subset of US-ASCII is used, enabling 5 bits to be
represented per printable character.  (The extra 33rd character, "=",
is used to signify a special processing function.)

The encoding process represents 40-bit groups of input bits as output
strings of 8 encoded characters.  Proceeding from left to right, a
40-bit input group is formed by concatenating 5 8bit input groups.
These 40 bits are then treated as 8 concatenated 5-bit groups, each
of which is translated into a single character in the base 32
alphabet.  When a bit stream is encoded via the base 32 encoding, the
bit stream must be presumed to be ordered with the most-significant-
bit first.  That is, the first bit in the stream will be the high-
order bit in the first 8bit byte, the eighth bit will be the low-
order bit in the first 8bit byte, and so on.

Josefsson                   Standards Track                     [Page 8]

RFC 4648                    Base-N Encodings                October 2006


Each 5-bit group is used as an index into an array of 32 printable
characters.  The character referenced by the index is placed in the
output string.  These characters, identified in Table 3, below, are
selected from US-ASCII digits and uppercase letters.

                    Table 3: The Base 32 Alphabet

    Value Encoding  Value Encoding  Value Encoding  Value Encoding
        0 A             9 J            18 S            27 3
        1 B            10 K            19 T            28 4
        2 C            11 L            20 U            29 5
        3 D            12 M            21 V            30 6
        4 E            13 N            22 W            31 7
        5 F            14 O            23 X
        6 G            15 P            24 Y         (pad) =
        7 H            16 Q            25 Z
        8 I            17 R            26 2

Special processing is performed if fewer than 40 bits are available
at the end of the data being encoded.  A full encoding quantum is
always completed at the end of a body.  When fewer than 40 input bits
are available in an input group, bits with value zero are added (on
the right) to form an integral number of 5-bit groups.  Padding at
the end of the data is performed using the "=" character.  Since all
base 32 input is an integral number of octets, only the following
cases can arise:

(1) The final quantum of encoding input is an integral multiple of 40
    bits; here, the final unit of encoded output will be an integral
    multiple of 8 characters with no "=" padding.

(2) The final quantum of encoding input is exactly 8 bits; here, the
    final unit of encoded output will be two characters followed by
    six "=" padding characters.

(3) The final quantum of encoding input is exactly 16 bits; here, the
    final unit of encoded output will be four characters followed by
    four "=" padding characters.

(4) The final quantum of encoding input is exactly 24 bits; here, the
    final unit of encoded output will be five characters followed by
    three "=" padding characters.

(5) The final quantum of encoding input is exactly 32 bits; here, the
    final unit of encoded output will be seven characters followed by
    one "=" padding character.
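The five cases above follow from simple arithmetic: a final group of n bytes carries 8n bits, which fill ceil(8n / 5) characters, and the rest of the 8-character quantum is "=" padding. A small sketch that reproduces the list; `finalQuantum` is a hypothetical helper name, not something defined by the RFC.

```typescript
// Hypothetical helper: for a final input group of 1 to 5 bytes, return how
// many Base32 characters and how many "=" padding characters it produces.
const finalQuantum = (bytes: number): { chars: number; padding: number } => {
  if (!Number.isInteger(bytes) || bytes < 1 || bytes > 5) {
    throw new RangeError('a final quantum holds 1 to 5 bytes');
  }
  const chars = Math.ceil((bytes * 8) / 5); // each character carries 5 bits
  return { chars, padding: 8 - chars };     // fill up to a full 8-character quantum
};

for (let n = 1; n <= 5; n++) {
  const { chars, padding } = finalQuantum(n);
  console.log(`${n * 8} bits -> ${chars} characters + ${padding} "="`);
}
```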

Encoding

Given input: IU

Step 1: Convert each character to its decimal value using the ASCII table

-> 73 85
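A minimal sketch of this step using charCodeAt. It assumes ASCII input, where each character code fits in one byte; for other text, the UTF-8 bytes from TextEncoder would be the right starting point.

```typescript
const input = 'IU';

// Code unit of each character; for ASCII text this equals its ASCII value.
const decimals: number[] = Array.from(input, (ch) => ch.charCodeAt(0));

console.log(decimals); // [ 73, 85 ]
```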

Step 2: Convert each decimal to its binary form

-> '01001001' '01010101'

💡 Each binary value must be exactly 8 bits long, so pad the start with zeros as necessary.
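The same step in TypeScript: toString(2) gives the base-2 digits and padStart adds the leading zeros (variable names are illustrative).

```typescript
const decimals = [73, 85];

// Base-2 string of each value, left-padded with zeros to exactly 8 bits.
const bytesAsBinary: string[] = decimals.map((d) => d.toString(2).padStart(8, '0'));

console.log(bytesAsBinary); // [ '01001001', '01010101' ]
```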

Step 3: Concatenate all bits

-> 0100100101010101 0000

💡 The total length must be a multiple of 5, so append zeros at the end as necessary.

-> 01001001010101010000
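A sketch of this step: join the 8-bit strings, then append zeros until the length is divisible by 5 (variable names are illustrative).

```typescript
const bytesAsBinary = ['01001001', '01010101'];

// Join the 8-bit groups into one bit string (16 bits here).
let bits = bytesAsBinary.join('');

// Append zeros on the right until the length is a multiple of 5 (20 bits here).
while (bits.length % 5 !== 0) {
  bits += '0';
}

console.log(bits); // 01001001010101010000
```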

Step 4: Split the bits into separate groups of 5 bits

-> 01001 00101 01010 10000
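A sketch of the split, stepping through the bit string 5 characters at a time (variable names are illustrative).

```typescript
const bits = '01001001010101010000';

// Cut the bit string into consecutive 5-bit groups.
const groups: string[] = [];
for (let i = 0; i < bits.length; i += 5) {
  groups.push(bits.slice(i, i + 5));
}

console.log(groups.join(' ')); // 01001 00101 01010 10000
```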

Step 5: Convert each 5-bits to its corresponding decimal value

-> 9 5 10 16
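A sketch of this step: parseInt with radix 2 turns each 5-bit group into a value from 0 to 31 (variable names are illustrative).

```typescript
const groups = ['01001', '00101', '01010', '10000'];

// Parse each 5-bit group as a base-2 integer in the range 0..31.
const indices: number[] = groups.map((g) => parseInt(g, 2));

console.log(indices); // [ 9, 5, 10, 16 ]
```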

Step 6: Convert each decimal to its character in the Base 32 alphabet (1)

  1. Base 32 Alphabet Table

    Value Encoding  Value Encoding  Value Encoding  Value Encoding
        0 A             9 J            18 S            27 3
        1 B            10 K            19 T            28 4
        2 C            11 L            20 U            29 5
        3 D            12 M            21 V            30 6
        4 E            13 N            22 W            31 7
        5 F            14 O            23 X
        6 G            15 P            24 Y         (pad) =
        7 H            16 Q            25 Z
        8 I            17 R            26 2
    
-> J F K Q
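A sketch of the lookup: each value is simply a position in the alphabet string, the same BASE32_ALPHABET constant used in the Quickstart.

```typescript
const BASE32_ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567';
const indices = [9, 5, 10, 16];

// Each value 0..31 is a position in the alphabet string.
const chars: string = indices.map((n) => BASE32_ALPHABET.charAt(n)).join('');

console.log(chars); // JFKQ
```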

Step 7: Append "=" padding until the output length is a multiple of 8. Every 40 bits (5 octets) of input become exactly 8 characters, so a shorter final group is padded to a full 8-character quantum.

Output: JFKQ====
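A sketch of the padding step using padEnd to reach the next multiple of 8 (variable names are illustrative).

```typescript
const chars = 'JFKQ';

// Pad with "=" up to the next multiple of 8 characters (one 40-bit quantum).
const output: string = chars.padEnd(Math.ceil(chars.length / 8) * 8, '=');

console.log(output); // JFKQ====
```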

Octets

In other words, the length of the encoded output must be a multiple of 8 characters.

Raw value: IU
Base 32 encoded: JFKQ====

Decoding

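To decode, reverse the encoding steps: remove the "=" padding, map each character back to its 5-bit value, concatenate the bits, and regroup them into 8-bit bytes. Any leftover bits at the end (fewer than 8) are the zeros added during encoding and are discarded.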

See the decode function in the Quickstart section for the full implementation.
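A step-by-step sketch of decoding JFKQ==== by hand, mirroring the encoding steps in reverse (variable names are illustrative). Note the loop condition in step 4: four characters carry 20 bits, but only 16 of them form whole bytes. A loop that also converted the final partial chunk would turn the 4 padding bits into an extra 0 byte.

```typescript
const BASE32_ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567';
const encoded = 'JFKQ====';

// 1. Remove the "=" padding: "JFKQ".
const data = encoded.replace(/=+$/, '');

// 2. Map each character back to its 5-bit value: 9, 5, 10, 16.
const values: number[] = Array.from(data, (ch) => BASE32_ALPHABET.indexOf(ch));

// 3. Concatenate the 5-bit groups: "01001001010101010000" (20 bits).
const bits = values.map((v) => v.toString(2).padStart(5, '0')).join('');

// 4. Regroup into whole 8-bit bytes; the 4 leftover bits are encoding padding.
const bytes: number[] = [];
for (let i = 0; i + 8 <= bits.length; i += 8) {
  bytes.push(parseInt(bits.slice(i, i + 8), 2));
}

// 5. Interpret the bytes as UTF-8 text.
const text = new TextDecoder().decode(Uint8Array.from(bytes));

console.log(bytes, text); // [ 73, 85 ] IU
```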

const base32Decoded: Uint8Array = decode('JFKQ===='); // (1)!
console.log(new TextDecoder().decode(base32Decoded)); // IU
  1. See the decode function in the Quickstart section (TypeScript tab).
const base32Decoded: Buffer = decode('JFKQ===='); // (1)!
console.log(base32Decoded.toString()); // IU
  1. See the decode function in the Quickstart section (Node.js tab).

Note

The TypeScript implementation works both in the browser and on the server (Node.js), since TextDecoder is a standard Web API that Node.js also provides as a global.
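Because TextEncoder and TextDecoder are globals in both environments, the same code can round-trip any text, including non-ASCII characters, as long as encoding starts from UTF-8 bytes. The sketch below is an alternative to the string-based Quickstart code, not the author's implementation: it uses a bit accumulator with shifts instead of building binary strings, and the names toBase32/fromBase32 are hypothetical.

```typescript
const ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567';

// Encode bytes by feeding 8 bits at a time into an accumulator and
// emitting 5 bits at a time, most significant bit first (RFC 4648 order).
const toBase32 = (bytes: Uint8Array): string => {
  let acc = 0;   // bit accumulator; only the low 12 bits are ever needed
  let count = 0; // number of pending bits in acc
  let out = '';
  for (const byte of bytes) {
    acc = ((acc << 8) | byte) & 0xfff;
    count += 8;
    while (count >= 5) {
      out += ALPHABET.charAt((acc >>> (count - 5)) & 31);
      count -= 5;
    }
  }
  // Leftover bits (1 to 4) are zero-filled on the right to form a final 5-bit group.
  if (count > 0) {
    out += ALPHABET.charAt((acc << (5 - count)) & 31);
  }
  // "=" padding up to a multiple of 8 characters.
  return out.padEnd(Math.ceil(out.length / 8) * 8, '=');
};

// Decode by feeding 5 bits per character and emitting whole bytes only;
// leftover bits (fewer than 8) are the zero padding from encoding.
const fromBase32 = (input: string): Uint8Array => {
  const data = input.toUpperCase().replace(/=+$/, '');
  const out: number[] = [];
  let acc = 0;
  let count = 0;
  for (const ch of data) {
    const value = ALPHABET.indexOf(ch);
    if (value === -1) {
      throw new Error(`Invalid Base32 character: ${ch}`);
    }
    acc = ((acc << 5) | value) & 0xfff;
    count += 5;
    if (count >= 8) {
      out.push((acc >>> (count - 8)) & 0xff);
      count -= 8;
    }
  }
  return Uint8Array.from(out);
};

// Round trip non-ASCII text through its UTF-8 bytes.
const text = 'héllo, 世界';
const encoded = toBase32(new TextEncoder().encode(text));
const roundTrip = new TextDecoder().decode(fromBase32(encoded));
console.log(encoded);
console.log(roundTrip === text); // true
```

Keeping the accumulator masked to 12 bits is enough because at most 4 (encoding) or 7 (decoding) bits are pending before the next 8 or 5 bits are added.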

Conclusion

Purpose

The purpose of base32 encoding is primarily to represent binary data in a format that is:

Human-Readable

Base32-encoded strings use only the uppercase letters A-Z and the digits 2-7, which makes them easy to read, type, and transmit through text-based communication channels. Leaving out the digits 0, 1, 8 and 9 also avoids look-alike pairs such as 0/O, 1/I and 8/B.

Case Insensitive

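The RFC 4648 alphabet contains only uppercase letters and digits, so no two symbols differ only in case. A decoder can therefore accept lowercase input by converting it to uppercase first, which simplifies handling in systems where case sensitivity is a concern.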
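The decode functions in the Quickstart look characters up in the uppercase alphabet only. A decoder can honor case insensitivity by normalizing each character before the lookup; base32Value is a hypothetical helper used only for illustration.

```typescript
const BASE32_ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567';

// Hypothetical helper: 5-bit value of a single Base32 character, accepting
// lowercase as well as uppercase input. Returns -1 if not in the alphabet.
const base32Value = (ch: string): number => BASE32_ALPHABET.indexOf(ch.toUpperCase());

console.log(base32Value('q'), base32Value('Q')); // 16 16
console.log(base32Value('1'));                   // -1 ("1" is not in the alphabet)
```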

URL-Safe

The alphabet (A-Z and 2-7) contains no characters that need escaping in URLs. The "=" padding character is reserved in URL query strings, though, so it is commonly stripped (or percent-encoded) when a Base32 value is placed in a URL.
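A sketch of the common practice of dropping "=" before putting a value in a URL and restoring it before strict decoding. The helpers stripPadding and restorePadding and the example.com URL are hypothetical. Note that RFC 4648 only allows omitting padding when the specification using the encoding explicitly says so.

```typescript
// Remove trailing "=" so the value needs no escaping in a URL query string.
const stripPadding = (encoded: string): string => encoded.replace(/=+$/, '');

// Restore padding before handing the value to a strict RFC 4648 decoder.
const restorePadding = (unpadded: string): string =>
  unpadded.padEnd(Math.ceil(unpadded.length / 8) * 8, '=');

const token = stripPadding('JFKQ====');             // "JFKQ"
const url = `https://example.com/item?id=${token}`; // hypothetical URL
console.log(url);
console.log(restorePadding(token));                  // JFKQ====
```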

Use Cases

URL Shortening

Base32 encoding can be used to generate short, human-readable identifiers for URLs in URL shortening services.

Cryptographic Applications

Base32 encoding is commonly used to represent cryptographic keys and shared secrets, for example the secrets of HMAC-based one-time password schemes (HOTP and TOTP) that authenticator apps import.

Data Integrity Verification

Base32 encoding can be used to represent checksums or cryptographic hashes of data in a human-readable format for verification purposes.

Data Exchange

Base32 encoding can be used to represent binary data in text-based data exchange formats, such as JSON or XML, where binary data needs to be transmitted as text.

Overall, base32 encoding balances human readability against efficiency: every 5 bytes of input become 8 characters (a 60% size increase), which is acceptable for the many applications where binary data needs to be represented in a human-readable format.
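To put numbers on the efficiency side of that balance, a sketch computing padded output lengths. Base64 (also defined in RFC 4648, 3 bytes to 4 characters) is included only as a point of comparison, and both helper names are hypothetical.

```typescript
// Padded encoded length for n input bytes.
const base32Length = (n: number): number => Math.ceil(n / 5) * 8; // 5 bytes -> 8 chars
const base64Length = (n: number): number => Math.ceil(n / 3) * 4; // 3 bytes -> 4 chars

for (const n of [2, 20, 1000]) {
  console.log(`${n} bytes: base32 ${base32Length(n)} chars, base64 ${base64Length(n)} chars`);
}
```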
