Binary, Hex and Character Encoding
We’re used to counting in a base-10 numbering system, called the decimal system. In the decimal system each place is worth 10 times as much as the place to its right, and there are 10 number symbols (0 through 9).
Consider the number: 231.
| $10^2$ | $10^1$ | $10^0$ |
|---|---|---|
| 100’s place | 10’s place | 1’s place |
| 2 | 3 | 1 |
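Written out as a sum, the table says: multiply each digit by its place value and add the results.

\begin{align} 231 = 2 \times 10^2 + 3 \times 10^1 + 1 \times 10^0 = 200 + 30 + 1 \end{align}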
Binary is a base-2 counting system, where there are just two number symbols, 0 and 1. Binary is the simplest possible positional counting system. It works for today’s computers because 0 can represent “off” and 1 can represent “on”, the two states of a CMOS logic gate.
Here’s the number 231 in binary.
| $2^7$ | $2^6$ | $2^5$ | $2^4$ | $2^3$ | $2^2$ | $2^1$ | $2^0$ |
|---|---|---|---|---|---|---|---|
| 128’s place | 64’s place | 32’s place | 16’s place | 8’s place | 4’s place | 2’s place | 1’s place |
| 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 |
As you can see, it takes more digits to represent a number in binary than it does in decimal: 231 needs three decimal digits but eight binary digits. The word “bit” is short for “binary digit”.
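If you want to check a conversion without doing it by hand, Python’s built-in bin() function returns the binary digits of an integer as a string (with a “0b” prefix), and int() with a base of 2 goes the other way. A small sketch:

```python
# bin() returns a string that starts with the prefix '0b'
print(bin(231))            # 0b11100111

# int() with base 2 parses a string of binary digits
print(int('11100111', 2))  # 231

# Python also accepts binary literals written with a 0b prefix
print(0b11100111 == 231)   # True
```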
Converting from Decimal to Binary
Converting from decimal to binary is easy once you know the trick. You do it with a simple algorithm: two steps that you repeat. Follow these steps.
1. Is your number even or odd? If it’s odd, write a 1. If it’s even, write a 0. Write each new bit to the left of the bits you already have.
2. Divide your number by 2 and ignore any fractional part.
3. Repeat until you get to zero!
Here’s how to convert 231.
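| Step | Number | Even or Odd? | Bits so far |
|---|---|---|---|
| 1 | 231 | Odd | 1 |
| 2 | 115 | Odd | 11 |
| 3 | 57 | Odd | 111 |
| 4 | 28 | Even | 0111 |
| 5 | 14 | Even | 00111 |
| 6 | 7 | Odd | 100111 |
| 7 | 3 | Odd | 1100111 |
| 8 | 1 | Odd | 11100111 |
| 9 | 0 | Stop! | 11100111 |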
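The steps above can be sketched as a short Python function. This is a minimal version for non-negative integers, and the name to_binary is just a name chosen for this example. It uses the % operator to test for odd or even and // (integer division) to divide by 2 while dropping the fractional part.

```python
def to_binary(number):
    """Convert a non-negative integer to a string of bits using the even/odd method."""
    if number == 0:
        return '0'  # the loop below would produce an empty string for zero
    bits = ''
    while number > 0:
        # Odd numbers contribute a 1, even numbers a 0.
        # Each new bit goes on the left of the bits found so far.
        bits = str(number % 2) + bits
        # Divide by 2 and throw away any fractional part.
        number = number // 2
    return bits

print(to_binary(231))  # 11100111
```

You can compare its answers with bin(), remembering that bin() adds a “0b” prefix.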
Converting from Binary to Decimal
Converting from binary to decimal is also easy. Write your binary number under the place values, then add together all of the place values where your number has a 1. Ignore the zeros.
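| Place value | 128 | 64 | 32 | 16 | 8 | 4 | 2 | 1 |
|---|---|---|---|---|---|---|---|---|
| Bit | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 |
| Add | 128 | 64 | 32 | | | 4 | 2 | 1 |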
128 + 64 + 32 + 4 + 2 + 1 = 231
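The same add-up-the-place-values method can be sketched in Python. The function name from_binary is made up for this example; it walks the bits from right to left so that the position of each bit gives its power of 2.

```python
def from_binary(bits):
    """Add up the place values of every 1 in a string of bits."""
    total = 0
    # reversed() starts at the rightmost bit, which is the 1's place (2**0)
    for position, bit in enumerate(reversed(bits)):
        if bit == '1':
            total += 2 ** position  # zeros add nothing, so they are skipped
    return total

print(from_binary('11100111'))  # 231
```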
Exercise
Pick a few numbers and convert them. Use my program to check your work.
import ipywidgets
from p4e.widgets import bind
from IPython.display import HTML, display

def convert(number):
    """Show the conversion of a number to binary."""
    binary = ""
    html = """<table><tr><th>Step</th><th>Binary</th></tr>"""
    while number > 0:
        if number % 2 == 0:
            # Even: put a 0 to the left of the bits so far
            binary = '0' + binary
            html += f"<tr><td>{number} is even.</td><td>{binary}</td></tr>"
        else:
            # Odd: put a 1 to the left of the bits so far
            binary = '1' + binary
            html += f"<tr><td>{number} is odd.</td><td>{binary}</td></tr>"
        # Divide by 2 and drop the fractional part; // stays exact even for very large numbers
        number = number // 2
    html += "</table>"
    return HTML(html)

# A text box where you type the number to convert
num_widget = ipywidgets.IntText(
    description='Number:',
)
display(num_widget, bind('convert', {'number': num_widget}))
Hexadecimal
Binary is hard to work with because it takes a lot of bits to write most numbers. Decimal is hard to work with because it’s cumbersome to convert between binary and decimal. So what’s a nerd to do?
The hexadecimal counting system is a base-16 counting system. It has 16 number symbols. Hexadecimal borrows the letters A through F to represent the values 10 through 15.
| Symbol | Value |
|---|---|
| 0 through 9 | Same as decimal |
| a | 10 |
| b | 11 |
| c | 12 |
| d | 13 |
| e | 14 |
| f | 15 |
The great thing about hexadecimal is that you can easily convert four bits to one hexadecimal digit, because four bits have exactly 16 possible values. Here’s how.
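| Bits | Hex |
|---|---|
| 0000 | 0 |
| 0001 | 1 |
| 0010 | 2 |
| 0011 | 3 |
| 0100 | 4 |
| 0101 | 5 |
| 0110 | 6 |
| 0111 | 7 |
| 1000 | 8 |
| 1001 | 9 |
| 1010 | a |
| 1011 | b |
| 1100 | c |
| 1101 | d |
| 1110 | e |
| 1111 | f |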
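Hexadecimal numbers are often written with a “0x” at the beginning. That makes it harder to confuse them with decimal numbers. So when you see “10” think ten, and when you see “0x10” think sixteen. When you convert binary to hexadecimal, split your binary number into groups of four bits starting from the right (pad the leftmost group with zeros if it has fewer than four bits), then convert each group to one hex digit using the table above.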
Here’s how to convert 231:
| 1110 | 0111 |
|---|---|
| e | 7 |
So…
231 = 11100111 (binary) = 0xe7
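You can check hexadecimal conversions in Python too. The built-in hex() function and int() with base 16 are standard; bits_to_hex() is a hypothetical helper written for this example that follows the groups-of-four method step by step, padding on the left when the number of bits isn’t a multiple of four.

```python
HEX_DIGITS = '0123456789abcdef'

def bits_to_hex(bits):
    """Convert a string of bits to hex by translating each group of four bits."""
    # Pad on the left so the length is a multiple of four
    bits = bits.zfill((len(bits) + 3) // 4 * 4)
    digits = ''
    for start in range(0, len(bits), 4):
        group = bits[start:start + 4]        # e.g. '1110'
        digits += HEX_DIGITS[int(group, 2)]  # '1110' is 14, which is 'e'
    return '0x' + digits

print(bits_to_hex('11100111'))  # 0xe7
print(hex(231))                 # 0xe7
print(int('e7', 16))            # 231
print(0x10)                     # 16, because 0x10 is sixteen
```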
Character Encoding
To the computer, all data is a bunch of binary numbers, including strings, pictures and videos. So how do you get from a bunch of numbers to a string? There is an agreed-upon way to convert the numbers into letters. That agreed-upon way is called a character encoding.
The oldest character encoding that’s still in use is called ASCII, short for American Standard Code for Information Interchange. ASCII uses seven bits to represent 128 different characters. Why so many? To the computer upper case letters are different from lower case letters and there are a whole bunch of special characters. Special characters control how a string looks or is printed.
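Python’s built-in ord() function returns the number for a character and chr() goes the other way, so you can peek at ASCII codes directly. This sketch shows that upper and lower case letters have different codes and that every character in a plain English string fits in seven bits (codes below 2**7 = 128).

```python
# Upper and lower case letters are different characters with different codes
print(ord('A'), ord('a'))   # 65 97
print(chr(72), chr(105))    # H i

# Encoding a plain English string as ASCII gives one byte per character
data = 'Hello'.encode('ascii')
print(list(data))           # [72, 101, 108, 108, 111]

# Every ASCII code fits in seven bits
print(all(code < 2 ** 7 for code in data))  # True
```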
There are a few special characters that are important to know.
| Binary | Hex | Decimal | String notation | Meaning |
|---|---|---|---|---|
| 000 0000 | 0x00 | 0 | \0 | Null character. In C, a null character marks the end of every string; Python strings don’t need one. |
| 000 1001 | 0x09 | 9 | \t | Tab character. Adds an adjustable amount of whitespace. |
| 000 1010 | 0x0a | 10 | \n | Line feed. Starts a new line of text on UNIX. |
| 000 1101 | 0x0d | 13 | \r | Carriage return. On Windows, \r\n together start a new line of text. |
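The codes in the table can be checked with ord(), and Python’s format specifiers can print each code in hex and 7-bit binary. A small sketch, assuming Python 3:

```python
# Print each special character's code in decimal, hex and 7-bit binary
specials = [('null', '\0'), ('tab', '\t'), ('line feed', '\n'), ('carriage return', '\r')]
for name, char in specials:
    code = ord(char)
    print(f"{name:16} {code:3}  {code:#04x}  {code:07b}")

# splitlines() understands both UNIX (\n) and Windows (\r\n) line endings
print('one\ntwo'.splitlines(), 'one\r\ntwo'.splitlines())  # ['one', 'two'] ['one', 'two']
```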
Many of the special characters are leftovers from the days when there were no computer monitors. In those days the UNIX command line was typed out using a specially modified typewriter called a Teletype.
It’s pretty common to use \t and \n in strings:
print('Special characters:\nNo tab\n\tOne tab\n\t\tTwo tabs\n\t\t\tThree tabs')
Enter the print statement into the cell below. Notice what the special characters do.
UTF Encoding
There’s a problem with ASCII. What if you want to exchange information with non-Americans? ASCII only contains the English alphabet, so you can’t write things in other languages. When ASCII was developed in the 1960s, few other countries had computers. Now, people all over the world have computers in their pockets. To address this problem, the UTF family of encodings was created. UTF is short for Unicode Transformation Format: a UTF encoding stores Unicode characters as bytes, and Unicode has enough characters for every human language and emojis!
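In ASCII every character is one byte (8 bits, with only 7 used). In UTF-8, the most common UTF encoding, a character takes between one and four bytes (8 to 32 bits); UTF-32 always uses four. The ord() and chr() functions convert between a single character and its Unicode number (its code point), but they don’t tell you how that number is stored as bytes. The encode() and decode() functions do that job.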
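A quick way to see the difference is to compare a character’s code point with the number of bytes UTF-8 uses for it. In Python 3, ord() returns the code point (which can be much larger than 255) and encode() shows the bytes used to store it:

```python
for char in ['A', 'é', '€', '😸']:
    code_point = ord(char)             # the character's Unicode number
    utf8_bytes = char.encode('utf-8')  # how UTF-8 stores that number
    print(char, hex(code_point), len(utf8_bytes), 'byte(s)')

# chr() turns a code point back into a character, even one above 255
print(chr(0x1F638))                    # 😸
```

The four characters take 1, 2, 3 and 4 bytes in UTF-8.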
Byte Strings
In order to understand what encode() and decode() do, you need to know about a type of string that I haven’t mentioned yet. The b-string (byte string) is used to express raw bytes, and you write one by putting a b in front of the opening quote. Raw bytes are just a bunch of numbers, each from 0 to 255. They can be converted to strings using an encoding.
Here’s how you specify a b-string. Each hexadecimal value in the b-string starts with \x, so \x01 is how you write hexadecimal 0x01.
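Because a b-string is really a sequence of small numbers, indexing it or turning it into a list gives you integers. A short sketch (the variable name raw is just for this example):

```python
raw = b'\xF0\x9F\x98\xB8'     # four bytes written as hex escapes

print(len(raw))               # 4
print(raw[0])                 # 240, because 0xF0 is 240
print(list(raw))              # [240, 159, 152, 184]
print(b'\x01' == bytes([1]))  # True: \x01 is the single byte 0x01
```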
raw = b'\xF0\x9F\x98\xB8'
The decode() function applies a character encoding to bytes and returns a string. See what happens when you decode the bytes from the previous example:
print(raw.decode('utf8'))
Enter the example lines to see the decoded bytes.
Meow! The encode() and decode() functions cannot automatically determine the character encoding; you have to tell them. See what happens to the code above if you try to decode the bytes as “ascii”.
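Here is a sketch of what happens when the wrong encoding is used. Decoding these bytes as ASCII raises a UnicodeDecodeError, because ASCII only defines values below 128. The errors='replace' argument is a standard option of decode() that substitutes the replacement character U+FFFD for each byte it can’t decode instead of raising an error.

```python
raw = b'\xF0\x9F\x98\xB8'

print(raw.decode('utf-8'))  # 😸

try:
    raw.decode('ascii')
except UnicodeDecodeError as err:
    print('ascii failed:', err)

# Replace undecodable bytes with U+FFFD instead of raising an error
print(raw.decode('ascii', errors='replace'))  # four replacement characters
```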
The encode() function does the reverse. Here’s how to convert the smiling cat face emoji to the ‘utf32’ format.
'😸'.encode('utf32')
Check out your favorite emojis at https://getemoji.com/. Try encoding them to see what they look like in bytes. After that, take a look at the standard encodings that are built into Python. Try encoding your favorite emoji with different codecs (short for coder-decoder).
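To compare codecs side by side, a loop like this encodes a few characters with several of Python’s built-in codecs and decodes them again. It uses the little-endian variants 'utf-16-le' and 'utf-32-le' so the byte counts aren’t inflated by the byte order mark (BOM) that the plain 'utf16' and 'utf32' codecs put at the front.

```python
codecs = ['utf-8', 'utf-16-le', 'utf-32-le']

for char in ['A', 'é', '😸']:
    for codec in codecs:
        data = char.encode(codec)
        # Decoding with the same codec gives back the original character
        assert data.decode(codec) == char
        print(char, codec, data.hex(), f'({len(data)} bytes)')

# The plain 'utf32' codec adds a 4-byte byte order mark in front of the character
print(len('😸'.encode('utf32')))  # 8
```

Notice that UTF-8 is the most compact for English text, while UTF-32 spends four bytes on every character.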