Binary, Hex and Character Encoding

We’re used to counting in a base-10 numbering system, called the decimal system. The decimal system has 10 number symbols (0 through 9), and each place is worth 10 times as much as the place to its right.

Consider the number 231.

| Power of 10 | 10^2 | 10^1 | 10^0 |
|---|---|---|---|
| Place | 100’s place | 10’s place | 1’s place |
| Digit | 2 | 3 | 1 |
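A place-value number is a sum of each digit times its place value: 2 × 100 + 3 × 10 + 1 × 1 = 231. Here is a minimal sketch of that arithmetic. The function name `place_value_total` is hypothetical, made up for this example. The same function also works for the binary numbers later in this section if you pass `base=2`.

```python
def place_value_total(digits, base=10):
    """Add up digit * base**position, where the rightmost digit is position 0.
    (place_value_total is a hypothetical helper for this example.)"""
    total = 0
    for position, digit in enumerate(reversed(digits)):
        total += digit * base ** position
    return total

# 2 hundreds + 3 tens + 1 one
print(place_value_total([2, 3, 1]))  # 231
```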

Binary is a base-2 counting system, with just two number symbols: 0 and 1. It is the simplest positional counting system. It works for today’s computers because 0 represents “off” and 1 represents “on”, the two states of a digital circuit such as a CMOS logic gate.

Here’s the number 231 in binary.

| Power of 2 | 2^7 | 2^6 | 2^5 | 2^4 | 2^3 | 2^2 | 2^1 | 2^0 |
|---|---|---|---|---|---|---|---|---|
| Place | 128’s place | 64’s place | 32’s place | 16’s place | 8’s place | 4’s place | 2’s place | 1’s place |
| Bit | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 |

As you can see, it takes more digits to write a number in binary than in decimal: 231 needs three decimal digits but eight binary digits. The word “bit” is short for “binary digit”.
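Python can show you a number in binary directly. This short sketch uses the built-in `bin()` function and the `0b` prefix for binary literals to confirm the table above and to count the digits in each base.

```python
# bin() shows an integer in binary, with a "0b" prefix.
print(bin(231))  # 0b11100111

# A "0b" prefix writes a binary number directly in Python code.
print(0b11100111)  # 231

# Count the digits in each base (dropping the "0b" prefix).
decimal_digits = len(str(231))
binary_digits = len(bin(231)) - 2
print(decimal_digits, binary_digits)  # 3 8
```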

Converting from Decimal to Binary

Converting from decimal to binary is easy once you know the trick. You do it with a two-step algorithm that you repeat. Follow these steps.

  1. Is your number even or odd? If it’s odd, write a 1. If it’s even, write a 0. Write each new bit to the left of the bits you already have.

  2. Divide your number by 2 and ignore any fractional part.

  3. Repeat steps 1 and 2 until you get to zero!

Here’s how to convert 231.

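| Step | Number | Even or Odd? | Bits so far |
|---|---|---|---|
| 1 | 231 | Odd | 1 |
| 2 | 115 | Odd | 11 |
| 3 | 57 | Odd | 111 |
| 4 | 28 | Even | 0111 |
| 5 | 14 | Even | 00111 |
| 6 | 7 | Odd | 100111 |
| 7 | 3 | Odd | 1100111 |
| 8 | 1 | Odd | 11100111 |
| 9 | 0 | Stop! | 11100111 |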

Converting from Binary to Decimal

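Converting from binary to decimal is also easy. Write your binary number under the place values, then add together all of the place values where your number has a 1. Ignore the zeros.

| Place value | 128 | 64 | 32 | 16 | 8 | 4 | 2 | 1 |
|---|---|---|---|---|---|---|---|---|
| Bit | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 |
| Value to add | 128 | 64 | 32 | | | 4 | 2 | 1 |

128 + 64 + 32 + 4 + 2 + 1 = 231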
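The same add-up-the-place-values method can be sketched in Python. The helper name `binary_to_decimal` is hypothetical; the last line compares it with Python’s built-in `int()`, which parses a binary string when you pass base 2.

```python
def binary_to_decimal(bits):
    """Add up the place values wherever the bit string has a 1.
    (binary_to_decimal is a hypothetical helper for this example.)"""
    total = 0
    place = 1  # the rightmost bit is the 1's place
    for bit in reversed(bits):
        if bit == "1":
            total += place
        place *= 2  # each place to the left is worth twice as much
    return total

print(binary_to_decimal("11100111"))  # 231
print(int("11100111", 2))             # 231, using Python's built-in parser
```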

Exercise

Pick a few numbers and convert them. Use my program to check your work.

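import ipywidgets
from p4e.widgets import bind
from IPython.display import HTML, display

def convert(number):
    """Show the conversion of a number to binary, one step per row."""
    if number < 0:
        return HTML("<p>Please enter a number that is zero or larger.</p>")
    if number == 0:
        return HTML("<p>0 in binary is 0.</p>")
    binary = ""
    html = """<table><tr><th>Step</th><th>Binary</th></tr>"""
    while number > 0:
        if number % 2 == 0:
            # Even: the next bit, written on the left, is 0.
            binary = '0' + binary
            html += f"<tr><td>{number} is even.</td><td>{binary}</td></tr>"
        else:
            # Odd: the next bit, written on the left, is 1.
            binary = '1' + binary
            html += f"<tr><td>{number} is odd.</td><td>{binary}</td></tr>"
        # Integer division (//) drops the fractional part exactly,
        # even for very large numbers where number / 2 would lose precision.
        number = number // 2
    html += "</table>"
    return HTML(html)

num_widget = ipywidgets.IntText(
    description='Number:',
)

display(num_widget, bind('convert', {'number': num_widget}))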

Hexadecimal

Binary is hard to work with because it takes a lot of bits to write most numbers. Decimal is hard to work with because it’s cumbersome to convert between binary and decimal. So what’s a nerd to do?

The hexadecimal counting system (hex for short) is a base-16 counting system, so it has 16 number symbols. Hexadecimal borrows the letters A through F, written in either upper or lower case, to represent the values 10 through 15.

| Symbol | Value |
|---|---|
| 0-9 | Same as decimal |
| a | 10 |
| b | 11 |
| c | 12 |
| d | 13 |
| e | 14 |
| f | 15 |
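Python’s built-in `int()` reads hexadecimal digits when you pass base 16. This small sketch rebuilds the table above and shows that upper-case letters mean the same thing.

```python
# int() with base 16 reads a hexadecimal digit.
values = {symbol: int(symbol, 16) for symbol in "abcdef"}
print(values)  # {'a': 10, 'b': 11, 'c': 12, 'd': 13, 'e': 14, 'f': 15}

# Upper-case letters mean the same thing.
print(int("F", 16))  # 15
```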

The great thing about hexadecimal is that every group of four bits converts to exactly one hexadecimal digit, because four bits can hold 16 different values (0 through 15). Here’s how.

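| Bits | Hex |
|---|---|
| 0000 | 0 |
| 0001 | 1 |
| 0010 | 2 |
| 0011 | 3 |
| 0100 | 4 |
| 0101 | 5 |
| 0110 | 6 |
| 0111 | 7 |
| 1000 | 8 |
| 1001 | 9 |
| 1010 | a |
| 1011 | b |
| 1100 | c |
| 1101 | d |
| 1110 | e |
| 1111 | f |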
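A group of four bits is often called a nibble. This sketch generates the nibble table above with Python’s built-in `format()`; the dictionary name `nibble_to_hex` is just a name chosen for this example.

```python
# format(n, "04b") writes n as four binary digits, padded with zeros;
# format(n, "x") writes n as a lower-case hexadecimal digit.
nibble_to_hex = {format(n, "04b"): format(n, "x") for n in range(16)}

for bits, hex_digit in nibble_to_hex.items():
    print(bits, hex_digit)
```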

Hexadecimal numbers are often written with “0x” at the beginning, which makes them harder to confuse with decimal numbers. So when you see “10” think ten, and when you see “0x10” think sixteen. When you convert binary to hexadecimal, split your binary number into groups of four bits, starting from the right. If the leftmost group is shorter than four bits, pad it with zeros on the left.

Here’s how to convert 231:

| Bits | 1110 | 0111 |
|---|---|---|
| Hex | e | 7 |

So…

231 = 11100111 (binary) = 0xe7
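Here is a minimal sketch of the group-of-four method in Python. `binary_to_hex` is a hypothetical helper written for this example; the last lines compare it with Python’s built-in `hex()` function and `0x` literals.

```python
def binary_to_hex(bits):
    """Convert a string of bits to hexadecimal by splitting it into
    groups of four bits, starting from the right.
    (binary_to_hex is a hypothetical helper for this example.)"""
    # Pad on the left with zeros so the length is a multiple of 4.
    padded = bits.zfill((len(bits) + 3) // 4 * 4)
    groups = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    # Each 4-bit group is a number from 0 to 15, which is one hex digit.
    return "".join(format(int(group, 2), "x") for group in groups)

print(binary_to_hex("11100111"))  # e7
print(hex(231))                   # 0xe7  (Python's built-in)
print(0xe7)                       # 231   (a 0x prefix writes a hex literal)
print(10, 0x10)                   # 10 16
```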


Character Encoding

To the computer, all data is a bunch of binary numbers, including strings, pictures and videos. So how do you get from a bunch of numbers to a string? There is an agreed-upon way to convert the numbers into letters, called a character encoding.

One of the oldest character encodings still in use is ASCII, short for American Standard Code for Information Interchange. ASCII uses seven bits, which is enough for 128 (2^7) different characters. Why so many? To the computer, upper-case letters are different from lower-case letters, and there are digits, punctuation and a whole bunch of special characters. Special characters (also called control characters) control how a string looks or is printed.
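In Python, the built-in `ord()` function turns a character into its number and `chr()` goes the other way. For ASCII characters that number is the ASCII code. A short sketch; the sample string `"Hi!"` is an arbitrary choice.

```python
message = "Hi!"

# ord() turns each character into its number; for ASCII text that number
# is the ASCII code.
codes = [ord(character) for character in message]
print(codes)  # [72, 105, 33]

# chr() goes the other way, from a number back to a character.
print("".join(chr(code) for code in codes))  # Hi!

# Upper and lower case letters have different codes.
print(ord("A"), ord("a"))  # 65 97

# Every ASCII code fits in seven bits.
print(format(ord("A"), "07b"))  # 1000001
```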

There are a few special characters that are important to know.

| Binary | Hex | Decimal | String Notation | Meaning |
|---|---|---|---|---|
| 000 0000 | 0x00 | 0 | \0 | Null character. In C, every string must end with one. Python strings don’t need it. |
| 000 1001 | 0x09 | 9 | \t | Tab character. Adds an adjustable amount of whitespace. |
| 000 1010 | 0x0a | 10 | \n | Line feed. Starts a new line of text on UNIX. |
| 000 1101 | 0x0d | 13 | \r | Carriage return. On Windows, \r\n together start a new line of text. |
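Each escape sequence in the table is a single character whose number you can check with `ord()`. This sketch also shows that the Windows line ending is two characters, and that the built-in `str.splitlines()` treats `\r\n` as one line break. The sample text is made up for the example.

```python
# Each escape sequence is one character with the number shown in the table.
for escape in ["\0", "\t", "\n", "\r"]:
    print(repr(escape), ord(escape), hex(ord(escape)))

# The Windows line ending is two characters: carriage return, then line feed.
windows_newline = "\r\n"
print(len(windows_newline))  # 2

# splitlines() understands \n, \r\n and \r as line breaks.
print("line one\r\nline two".splitlines())  # ['line one', 'line two']
```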

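Many of the special characters are leftovers from the days when there were no computer monitors. In those days, people used the UNIX command line through a Teletype, a typewriter-like machine that printed everything on paper. That’s where the names come from: a carriage return moved the print head back to the start of the line, and a line feed advanced the paper by one line.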

It’s pretty common to use \t and \n in strings:

print('Special characters:\nNo tab\n\tOne tab\n\t\tTwo tabs\n\t\t\tThree tabs')

Enter the print statement into the cell below. Notice what the special characters do?

UTF Encoding

There’s a problem with ASCII. What if you want to exchange information with non-Americans? ASCII only contains the English alphabet, so you can’t write most other languages. When ASCII was developed in the early 1960s, few other countries had computers. Now, people all over the world have computers in their pockets. To address this problem, the Unicode standard gives a number (called a code point) to the characters of nearly every written language, plus emojis, and the UTF family of encodings turns those numbers into bytes. UTF is short for Unicode Transformation Format.

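In ASCII every character is one byte (8 bits, with only 7 used). In UTF-8, the most common UTF encoding, a character takes between one and four bytes (8 to 32 bits); UTF-16 uses two or four bytes per character and UTF-32 always uses four. The ord and chr functions convert between a single character and its code point number, but they don’t tell you which bytes store that character. That is the job of the encode and decode methods.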
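To see that UTF-8 characters really vary in size, here is a short sketch that uses `ord()` for the code point and the `encode` method described below for the bytes. The four sample characters are arbitrary choices.

```python
samples = ["A", "é", "€", "🤖"]

for character in samples:
    # ord() gives the code point; encode() gives the UTF-8 bytes.
    code_point = ord(character)
    utf8_bytes = character.encode("utf-8")
    print(character, hex(code_point), len(utf8_bytes), "byte(s)")
# A 0x41 1 byte(s)
# é 0xe9 2 byte(s)
# € 0x20ac 3 byte(s)
# 🤖 0x1f916 4 byte(s)
```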

Byte Strings

To understand what encode and decode do, you need to know about a type of string that I haven’t mentioned yet. The b-string (Python calls its type bytes) is used to express raw bytes. Raw bytes are just a bunch of numbers from 0 to 255. They only become text when you decode them with an encoding.

Here’s how you specify a b-string. You can write any byte as \x followed by two hexadecimal digits, so \x01 is how you write hexadecimal 0x01.

cat_bytes = b'\xF0\x9F\x98\xB8'

The decode method applies a character encoding to bytes and returns a string. See what happens when you decode the bytes from the previous example:

print(cat_bytes.decode('utf-8'))

Enter the example lines to see the decoded bytes:


Meow! Bytes don’t record which encoding produced them, so encode and decode can’t work out the character encoding on their own. You have to tell them (if you don’t, Python assumes UTF-8). See what happens to the code above if you try to decode the bytes as “ascii”.
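Here is a sketch of what to expect. It repeats the cat bytes from above so it runs on its own. It shows that a b-string is really a list of numbers, that decoding as ASCII raises a `UnicodeDecodeError` (ASCII only covers 0 to 127), and that passing `errors='replace'` to `decode` swaps each bad byte for the Unicode replacement character instead of failing.

```python
cat_bytes = b'\xF0\x9F\x98\xB8'

# A b-string is a sequence of numbers from 0 to 255.
print(list(cat_bytes))  # [240, 159, 152, 184]

# Decoding with the right encoding gives the cat face.
print(cat_bytes.decode('utf-8'))  # 😸

# These bytes are all above 127, so they are not valid ASCII.
try:
    cat_bytes.decode('ascii')
except UnicodeDecodeError as error:
    print('Error:', error)

# errors='replace' puts U+FFFD (the replacement character) in place of each bad byte.
print(cat_bytes.decode('ascii', errors='replace'))
```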

The encode() method does the reverse. Here’s how to convert the robot face emoji to the ‘utf-32’ format.

'🤖'.encode('utf-32')
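The output of the cell above starts with four extra bytes. That is a byte order mark (BOM), which tells a decoder whether the bytes are stored big-endian or little-endian. This sketch compares three codecs with the byte order spelled out (`-be` means big-endian), so the result is the same on every computer. `bytes.hex(" ")` needs Python 3.8 or later.

```python
robot = "🤖"

# Spell out the byte order so the output is the same on every computer.
for codec in ["utf-8", "utf-16-be", "utf-32-be"]:
    print(codec, robot.encode(codec).hex(" "))
# utf-8 f0 9f a4 96
# utf-16-be d8 3e dd 16
# utf-32-be 00 01 f9 16

# Plain 'utf-32' adds a 4-byte BOM in front of the 4-byte character.
print(len(robot.encode("utf-32")))  # 8
```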

Check out your favorite emojis at https://getemoji.com/. Try encoding them to see what they look like in bytes. After that, take a look at the standard encodings that are built into Python. Try encoding your favorite emoji with different codecs (short for coder-decoder).