Binary, Hex and Character Encoding#
We’re used to counting in a base-10 numbering system, called the decimal system. In the decimal system each place is worth 10 times as much as the place to its right and there are 10 number symbols (0 through 9).
Consider the number: 231.
| $10^2$ | $10^1$ | $10^0$ |
|---|---|---|
| 100’s place | 10’s place | 1’s place |
| 2 | 3 | 1 |
Binary is a base-2 counting system, where there are just two number symbols, 0 and 1. Binary is the simplest possible counting system. It works for today’s computers because 0 represents “off” and 1 represents “on”, the two states of a CMOS gate.
Here’s the number 231 in binary.
| $2^7$ | $2^6$ | $2^5$ | $2^4$ | $2^3$ | $2^2$ | $2^1$ | $2^0$ |
|---|---|---|---|---|---|---|---|
| 128’s place | 64’s place | 32’s place | 16’s place | 8’s place | 4’s place | 2’s place | 1’s place |
| 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 |
As you can see it takes more digits to represent a number in binary than it does in decimal. The word “bit” is short for “binary digit”.
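Python can show the same place values. The snippet below is a small sketch that uses only the built-in `format` function; the variable names (`number`, `bits`, `place_values`) are made up for illustration.

```python
# Write 231 in binary using Python's built-in formatting.
number = 231
bits = format(number, "b")      # binary digits, without any prefix
print(bits)                     # 11100111

# Pair each bit with its place value (128, 64, ..., 1).
place_values = [2 ** power for power in range(len(bits) - 1, -1, -1)]
for place, bit in zip(place_values, bits):
    print(f"{place:>3}'s place: {bit}")

# Binary needs 8 digits here, decimal only 3.
print(len(bits), len(str(number)))   # 8 3
```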
Converting from Decimal to Binary#
Converting from decimal to binary is easy once you know the trick. You do it with a two-step algorithm. Follow these steps, writing each new bit to the left of the bits you already have.
1. Is your number even or odd? If it’s odd write a 1. If it’s even write a 0.
2. Divide your number by 2 and ignore any fractional part.
Repeat until you get to zero!
Here’s how to convert 231.
| Step | Number | Even or Odd? | Bits so far |
|---|---|---|---|
| 1 | 231 | Odd | 1 |
| 2 | 115 | Odd | 11 |
| 3 | 57 | Odd | 111 |
| 4 | 28 | Even | 0111 |
| 5 | 14 | Even | 00111 |
| 6 | 7 | Odd | 100111 |
| 7 | 3 | Odd | 1100111 |
| 8 | 1 | Odd | 11100111 |
| 9 | 0 | Stop! | 11100111 |
Converting from Binary to Decimal#
Converting from binary to decimal is also easy. Write your binary number and add together all of the place values where your number has a one. Ignore the zeros.
| 128 | 64 | 32 | 16 | 8 | 4 | 2 | 1 |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 |
| 128 | 64 | 32 | | | 4 | 2 | 1 |
128 + 64 + 32 + 4 + 2 + 1 = 231
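The same addition can be written as a short function. This is a minimal sketch, and the name `binary_to_decimal` is made up for this example. It walks the bits from right to left and adds the place value wherever it finds a one.

```python
def binary_to_decimal(bits):
    """Add up the place values wherever the bit string has a '1'."""
    total = 0
    place_value = 1                  # the rightmost bit is the 1's place
    for bit in reversed(bits):
        if bit == "1":
            total += place_value
        place_value *= 2             # each place is worth twice the one to its right
    return total


print(binary_to_decimal("11100111"))   # 231
print(int("11100111", 2))              # Python's built-in conversion agrees: 231
```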
Exercise#
Pick a few numbers and convert them. Use my program to check your work.
[ ]:
import ipywidgets
from IPython.display import HTML, display
from p4e.widgets import bind


def convert(number):
    """Show the conversion of a number to binary."""
    binary = ""
    html = "<table><tr><th>Step</th><th>Binary</th></tr>"
    while number > 0:
        if number % 2 == 0:
            # Even: the next bit, written on the left, is 0.
            binary = '0' + binary
            html += f"<tr><td>{number} is even.</td><td>{binary}</td></tr>"
        else:
            # Odd: the next bit, written on the left, is 1.
            binary = '1' + binary
            html += f"<tr><td>{number} is odd.</td><td>{binary}</td></tr>"
        # Divide by 2 and drop the fractional part.
        number = number // 2
    html += "</table>"
    return HTML(html)


num_widget = ipywidgets.IntText(
    description='Number:',
)
display(num_widget, bind('convert', {'number': num_widget}))
Hexadecimal#
Binary is hard to work with because it takes a lot of bits to write most numbers. Decimal is hard to work with because it’s cumbersome to convert between binary and decimal. So what’s a nerd to do?
The hexadecimal counting system is a base-16 counting system. It has 16 number symbols. Hexadecimal borrows the letters A through F to represent number values.
| Symbol | Value |
|---|---|
| 0-9 | Same as decimal |
| a | 10 |
| b | 11 |
| c | 12 |
| d | 13 |
| e | 14 |
| f | 15 |
The great thing about hexadecimal is that you can convert four bits to a hexadecimal digit easily. Here’s how.
| Bits | Hex |
|---|---|
| 0000 | 0 |
| 0001 | 1 |
| 0010 | 2 |
| 0011 | 3 |
| 0100 | 4 |
| 0101 | 5 |
| 0110 | 6 |
| 0111 | 7 |
| 1000 | 8 |
| 1001 | 9 |
| 1010 | a |
| 1011 | b |
| 1100 | c |
| 1101 | d |
| 1110 | e |
| 1111 | f |
Hexadecimal numbers are often written with a “0x” at the beginning. That makes it harder to confuse them with decimal numbers. So when you see “10” think ten and when you see “0x10” think sixteen. When you convert binary to hexadecimal, split your binary number into groups of four bits, starting from the right.
Here’s how to convert 231:
| 1110 | 0111 |
|---|---|
| e | 7 |
So…
231 = 11100111 (binary) = 0xe7
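The grouping trick can be sketched in a few lines of Python. The helper name `binary_to_hex` and the constant `HEX_DIGITS` are made up for this example. The built-in `hex` function is shown alongside as a cross-check.

```python
HEX_DIGITS = "0123456789abcdef"


def binary_to_hex(bits):
    """Convert a bit string to hex one four-bit group at a time."""
    # Pad with zeros on the left so the length is a multiple of four.
    bits = bits.zfill((len(bits) + 3) // 4 * 4)
    digits = ""
    for start in range(0, len(bits), 4):
        group = bits[start:start + 4]
        # The value of the four bits (0 to 15) picks the hex symbol.
        digits += HEX_DIGITS[int(group, 2)]
    return "0x" + digits


print(binary_to_hex("11100111"))   # 0xe7
print(hex(231))                    # 0xe7
print(0x10)                        # 16
```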
[ ]:
Notice what the special characters do?
Character Encoding#
To the computer all data is a bunch of binary numbers, including strings, pictures and videos. So how do you get from a bunch of numbers to a string? There is an agreed upon way to convert the numbers into letters. The agreed upon way is called a character encoding.
The oldest character encoding that’s still in use is called ASCII, short for American Standard Code for Information Interchange. ASCII uses seven bits to represent 128 different characters. Why so many? To the computer upper case letters are different from lower case letters and there are a whole bunch of special characters. Special characters control how a string looks or is printed.
There are a few special characters that are important to know.
| Binary | Hex | Decimal | String Notation | Meaning |
|---|---|---|---|---|
| 000 0000 | 0x00 | 0 | \0 | Null character. Marks the end of a string in languages such as C. |
| 000 1001 | 0x09 | 9 | \t | Tab character. Adds an adjustable amount of whitespace. |
| 000 1010 | 0x0a | 10 | \n | Line feed. Starts a new line of text on UNIX. |
| 000 1101 | 0x0d | 13 | \r | Carriage return. On Windows \r\n together start a new line of text. |
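You can check the table from Python with the built-in `ord` and `chr` functions. The helper below, `describe`, is a made-up name for this sketch. It returns the decimal, hex and seven-bit binary forms of one character.

```python
def describe(char):
    """Return the decimal, hex and 7-bit binary forms of one character."""
    code = ord(char)                      # the number behind the character
    return code, f"0x{code:02x}", format(code, "07b")


for char in ["\0", "\t", "\n", "\r", "A", "a"]:
    # repr() shows special characters in string notation instead of printing them.
    print(repr(char), describe(char))

print(chr(65))   # A
```

Notice that “A” and “a” have different numbers, which is why upper and lower case count as different characters.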
Many of the special characters are leftovers from the days when there were no computer monitors. In those days the UNIX command line was typed out using a specially modified typewriter called a Teletype.
It’s pretty common to use \t and \n in strings:
print('Special characters:\nNo tab\n\tOne tab\n\t\tTwo tabs\n\t\t\tThree tabs')
Enter the print statement into the cell below.
UTF Encoding#
There’s a problem with ASCII. What if you want to interchange information with non-Americans? ASCII only contains the English alphabet, so you can’t write things in other languages. In the early 1960s when ASCII was developed few other countries had computers. Now, people all over the world have computers in their pockets. To address this problem the UTF family of encodings was created. UTF is short for Unicode Transformation Format. UTF encodings contain enough characters for every human language and emojis!
In ASCII every character is one byte (8 bits, with only 7 used). In UTF-8, the most common UTF encoding, a character takes between one and four bytes (8 to 32 bits). The `ord` and `chr` functions convert between a single character and its number, not between strings and bytes, so the functions `encode` and `decode` are used to do that job.
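To see the one-to-four byte range for yourself, here is a small sketch that encodes four sample characters as UTF-8 and counts the bytes. The sample characters are my own choice and are written as escape codes so the snippet stays plain ASCII.

```python
# Four sample characters: A, e with an acute accent, the euro sign, a cat emoji.
samples = ["A", "\u00e9", "\u20ac", "\U0001F638"]

byte_counts = []
for char in samples:
    encoded = char.encode("utf-8")        # string -> raw bytes
    byte_counts.append(len(encoded))
    print(f"U+{ord(char):04X} takes {len(encoded)} byte(s) in UTF-8")

print(byte_counts)   # [1, 2, 3, 4]
```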
Byte Strings#
In order to understand what `encode` and `decode` do you need to know about a type of string that I haven’t mentioned yet. The b-string is used to express raw bytes. Raw bytes are a bunch of numbers. They can be converted to strings using an encoding.
Here’s how you specify a b-string. Each hexadecimal value in the b-string starts with `\x`. So `\x01` is how you write hexadecimal 0x01.
cat_bytes = b'\xF0\x9F\x98\xB8'
The `decode` function applies a character encoding to bytes and returns a string. See what happens when you decode the bytes from the previous example:
print(cat_bytes.decode('utf-8'))
Enter the example lines to see the decoded bytes:
[ ]:
Meow! The `encode` and `decode` functions cannot automatically determine the character encoding. They assume UTF-8 unless you tell them otherwise. See what happens to the code above if you try to decode the bytes as “ascii”.
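In a program you may want to handle a wrong guess instead of letting it stop your code. The sketch below assumes the four cat bytes from the example. It wraps `decode` in a `try` block that catches Python’s built-in `UnicodeDecodeError`. The names `cat_bytes` and `try_decode` are made up for illustration.

```python
cat_bytes = b"\xF0\x9F\x98\xB8"


def try_decode(raw, encoding):
    """Decode raw bytes, or describe the problem if the encoding doesn't fit."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as error:
        # error.reason is Python's short explanation of what went wrong.
        return f"Cannot decode as {encoding}: {error.reason}"


print(try_decode(cat_bytes, "utf-8"))
print(try_decode(cat_bytes, "ascii"))
```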
The `encode()` function does the reverse. Here’s how to convert the robot face emoji to the ‘utf-32’ format.
[ ]:
'🤖'.encode('utf-32')
Check out your favorite emojis at https://getemoji.com/. Try encoding them to see what they look like in bytes. After that take a look at the standard encodings that are built into Python. Try encoding your favorite emoji with different codecs (short for coder-decoder).
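As a starting point, here is a minimal sketch that encodes one emoji with three of Python’s built-in codecs and checks that decoding brings it back. The robot face is written as the escape `\U0001F916`. The variable names are made up for illustration.

```python
robot = "\U0001F916"   # the robot face emoji

sizes = {}
for codec in ["utf-8", "utf-16", "utf-32"]:
    encoded = robot.encode(codec)
    # Decoding with the same codec must give back the original string.
    assert encoded.decode(codec) == robot
    sizes[codec] = len(encoded)
    print(codec, len(encoded), encoded)

print(sizes)   # {'utf-8': 4, 'utf-16': 6, 'utf-32': 8}
```

The ‘utf-16’ and ‘utf-32’ results are longer partly because Python puts a byte order mark in front of the character.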