Reading and Writing Files

Computers have two kinds of memory: volatile memory, which keeps its contents only as long as there is electricity, and non-volatile memory, which keeps its contents when the power is off. Variables are a form of volatile memory: they last only as long as the program is running. Files are a form of non-volatile memory: they are kept when the computer is off.

Files can be small or very, very large, much larger than the amount of RAM in a computer. Because files can be so large, they are accessed differently than variables. Like a book, a file is read (and written) a piece at a time, often one line at a time. When you want to access a file, you create a special kind of variable called a file handle.

File Paths

Every file has a unique location in the file system. On modern computers we’re used to navigating the file system visually, as a set of files nested inside folders. To use a file in your program, you have to specify its location with a file path. Think of a file path as a set of instructions for finding a file, with each instruction separated by a forward slash (/) character. There are two kinds of file paths:

  1. Absolute paths begin with a / and identify a file independent of the location of the notebook.

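  2. Relative paths don’t begin with a / and identify a file starting from the current working directory, which in a notebook is normally the folder that contains the notebook. A .. in a path means “go up one folder.”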

Here are some examples:

[ ]:
from PIL import Image

Image.open("images/python_list.png")
[ ]:
Image.open("../Labs/files/maze.png")
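To see the difference between the two kinds of path in code, the standard-library pathlib module can tell whether a path is absolute, and joining a relative path onto a starting folder shows how it is followed. This is a small sketch: the folder /home/student/notebooks is a made-up location standing in for the notebook’s folder, and PurePosixPath is used so the forward-slash rules behave the same on every operating system.

```python
import posixpath
from pathlib import PurePosixPath

# A made-up folder standing in for "the folder that contains the notebook"
notebook_dir = PurePosixPath("/home/student/notebooks")

relative = PurePosixPath("../Labs/files/maze.png")
absolute = PurePosixPath("/home/student/Labs/files/maze.png")

print(relative.is_absolute())   # False: it does not start with /
print(absolute.is_absolute())   # True: it starts with /

# Follow the relative path starting from the notebook folder.
# The / operator joins paths; normpath then applies the ".." (go up one folder) step.
joined = notebook_dir / relative
print(posixpath.normpath(str(joined)))   # /home/student/Labs/files/maze.png
```

Both paths end up naming the same file; the relative one just depends on where you start.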

File Handles

A file handle is a variable that gives us access to a file. A file handle is created with the open() function, which takes at least one argument: the path of the file to open. open() returns the file handle, so you have to assign it to a variable. For example:

file_handle = open('files/example.txt')

Enter the code in the next cell. What type is file_handle?

[ ]:

The variable file_handle can be used just like any other Python variable but it has a type you haven’t seen before:

print('The type of file_handle is:', type(file_handle))
[ ]:

The file handle gives your program a way to access the contents of the file; it does not contain the file’s data itself. A file handle has methods (functions attached to it) that read and write the file. In the next sections you’ll see how to use these methods to access files.
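The sketch below makes that distinction concrete. Because files/example.txt only exists in the course folder, it first creates a small throwaway file with the tempfile module, then opens it and inspects the handle. In CPython the printed type is _io.TextIOWrapper, the class used for files opened in text mode.

```python
import io
import os
import tempfile

# Create a throwaway file to stand in for files/example.txt
path = os.path.join(tempfile.mkdtemp(), "example.txt")
setup = open(path, "w")
setup.write("first line\nsecond line\n")
setup.close()

file_handle = open(path)
print(type(file_handle))                          # <class '_io.TextIOWrapper'>
print(isinstance(file_handle, io.TextIOWrapper))  # True
print(file_handle.mode)                           # r  (reading is the default)

# The handle carries methods for working with the file, not the text itself
print(hasattr(file_handle, "readline"), hasattr(file_handle, "write"))
file_handle.close()
```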

Reading Text Files

There are two ways to read a text file in Python: one line at a time, or the whole file at once. Try it by running the cell below, which opens the file example.txt:

[ ]:
file_handle = open('files/example.txt')

Now execute the code in this cell over and over:

[ ]:
file_handle.readline()

What do you notice? The file is read one line at a time: each time you execute readline(), the next line is returned. Once the end of the file is reached, readline() returns an empty string (''). Try it again after re-running the cell with the open() function. Now see what happens when you use the read() function:

[ ]:
file_handle.read()

What happens when you mix read() and readline()?

Do you see how calling readline() changes the current position in the file?
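If you want to check your answers, here is a short sketch using a made-up three-line file in a temporary folder (standing in for files/example.txt). It shows that read() picks up wherever readline() left off, and that readline() returns an empty string once the end of the file is reached.

```python
import os
import tempfile

# Throwaway three-line file standing in for files/example.txt
path = os.path.join(tempfile.mkdtemp(), "example.txt")
setup = open(path, "w")
setup.write("one\ntwo\nthree\n")
setup.close()

file_handle = open(path)
first = file_handle.readline()   # 'one\n'  (the newline is kept)
rest = file_handle.read()        # 'two\nthree\n'  (continues from the current position)
at_end = file_handle.readline()  # ''  (an empty string means end of file)
file_handle.close()

print(repr(first), repr(rest), repr(at_end))
```

A blank line in the middle of a file comes back as '\n', so an empty string can only mean the end of the file.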

In a practical program you want to do something with the data you read from the file. The example below opens the file and reads the first four lines into four different variables.

file_handle = open('files/example.txt')
line1 = file_handle.readline()
line2 = file_handle.readline()
line3 = file_handle.readline()
line4 = file_handle.readline()
file_handle.close()    # done with the file, so release the handle
print("Lines:", line1, line2, line3, line4)

Enter the program into the next cell and run it:

[ ]:

Confused? Use the debugger!
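Each line you read still ends in '\n', which is why printed lines often look double-spaced. A for loop over a file handle hands you one line per pass, just like calling readline() repeatedly, and str.rstrip("\n") removes the trailing newline. The file contents here are made up for the sketch.

```python
import os
import tempfile

# Throwaway file with made-up contents
path = os.path.join(tempfile.mkdtemp(), "fruit.txt")
setup = open(path, "w")
setup.write("apples\nbananas\ncherries\n")
setup.close()

file_handle = open(path)
count = 0
for line in file_handle:             # each pass gets the next line, like readline()
    count += 1
    print(count, line.rstrip("\n"))  # drop the newline so print doesn't double-space
file_handle.close()
```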

Reading Lines Into a List

You can read the entire contents of a file into a list of lines. That is extremely useful when you want to access the lines in any order. But beware: the entire file is read into the computer’s memory, so you have to be careful about how big the file is. This code example reads all of the lines of the example file into a list called lines.

file_handle = open('files/example.txt')
lines = list(file_handle)
file_handle.close()
print(lines)

Try the code in the next cell:

[ ]:

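File handles also have a readlines() method that produces the same list as list(file_handle). The sketch below, using a made-up file in a temporary folder, checks that the two agree and shows the random access a list gives you.

```python
import os
import tempfile

# Throwaway file with made-up contents
path = os.path.join(tempfile.mkdtemp(), "colors.txt")
setup = open(path, "w")
setup.write("red\ngreen\nblue\n")
setup.close()

file_handle = open(path)
lines = file_handle.readlines()   # the whole file, one list item per line
file_handle.close()

file_handle = open(path)
same = list(file_handle)          # the approach used above
file_handle.close()

print(lines)           # ['red\n', 'green\n', 'blue\n']
print(lines == same)   # True
print(lines[2])        # jump straight to the third line
```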
Writing Text Files

When you open a file you have to decide whether to read or write the file (you can do both, more on that later). If you only give one argument to open() the file is opened for reading. If you want to write to a file here’s how you start:

file_handle = open('output.txt', 'w')

Watch out! Opening an existing file for writing with 'w' erases its contents.

Type the open command into the next cell:

[ ]:

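To see the erasing happen, the sketch below writes a throwaway file, adds to it, and then reopens it with 'w'. It also uses the 'a' (append) mode of open(), which keeps the existing text and writes at the end; 'a' is not covered elsewhere in this lesson and is included only for contrast.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "output.txt")

fh = open(path, "w")
fh.write("first line\n")
fh.close()

fh = open(path, "a")       # 'a' (append) keeps what is there and writes at the end
fh.write("second line\n")
fh.close()

fh = open(path)
after_append = fh.read()
fh.close()
print(repr(after_append))  # 'first line\nsecond line\n'

fh = open(path, "w")       # reopening with 'w' empties the file right away
fh.close()

fh = open(path)
after_w = fh.read()
fh.close()
print(repr(after_w))       # ''
```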
When the second argument to open() is 'w', the file is opened for writing. Once the file is open, you can write to it using the write() method. For example:

file_handle.write('Hello file world!\n')

Add the write command to the next cell and run it:

[ ]:

Now find the file. Does it contain what you expect?

Notice that the newline character (\n) is part of the string. Unlike print(), the write() function does not add a newline to the end of the line. write() also does not take multiple arguments the way print() does. If you want to mix variables and words in the write() function, use an f-string. Update the code cell with this code and execute it:

name = 'Your Name Here'
file_handle.write(f'Hello, my name is {name}\n')

Notice that write() returns a number? For a file opened in text mode, that number is the count of characters written, which can differ from the number of bytes stored on disk. The number is useful because, as in the example above, when you write variables to a file you might not know in advance how much data is in them. Here’s a complete program that writes a file:

name = "Your name here"
file_handle = open('greeting.txt', 'w')
wrote = file_handle.write(f"Hello, my name is {name}\n")
print(f"I wrote {wrote} characters to the file")
file_handle.close()

Enter the complete program into the next cell:

[ ]:

Run the program and look inside greeting.txt.
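In text mode, write() counts characters, not bytes. The difference shows up with letters outside plain ASCII: in the UTF-8 encoding, é takes two bytes. The sketch below uses a made-up name containing é and sets encoding="utf-8" explicitly so the result does not depend on the computer’s default encoding.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "greeting.txt")
name = "José"   # a made-up name with one non-ASCII letter

fh = open(path, "w", encoding="utf-8")
wrote = fh.write(f"Hello, my name is {name}\n")
fh.close()

print(wrote)                   # 23 characters
print(os.path.getsize(path))   # more than 23 bytes, because é needs two bytes
```

On Windows the file is one byte larger still, because text mode writes each \n as \r\n there.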

Closing the File Handle

If you look at a file you’ve written to without closing its handle, you may notice the contents aren’t there yet. When you write to a file, the data is temporarily held in memory (buffered) to improve the performance of your program, and it is only guaranteed to reach the file once the buffer is flushed. When your program is done reading or writing a file, close the file handle using the close() method, which also flushes any buffered data:

file_handle.close()
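Closing is not the only way to get buffered data onto the disk. File handles also have a flush() method that writes out whatever is buffered while leaving the file open. In this sketch, a second handle (on a throwaway file) reads the text back after the flush.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "log.txt")

writer = open(path, "w")
writer.write("saved so far\n")
writer.flush()              # push the buffered text to the file now, without closing

reader = open(path)
contents = reader.read()
reader.close()
print(repr(contents))       # 'saved so far\n'

writer.close()              # close() also flushes anything still buffered
```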

It’s essential that you always remember to close the files you’ve opened. A program can hold only a limited number of files open at once. If your code “leaks” file handles by forgetting about them, it’s possible to exhaust that limit, and then the open() function will fail.
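Python’s with statement is a common way to make sure a file gets closed. The handle is closed automatically when the indented block finishes, even if an error happens inside it. This is a sketch on a throwaway file, not a replacement for understanding close().

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "greeting.txt")

# The file is closed automatically when the indented block ends
with open(path, "w") as file_handle:
    file_handle.write("Hello file world!\n")

print(file_handle.closed)   # True: already closed, nothing to forget

with open(path) as file_handle:
    text = file_handle.read()
print(repr(text))           # 'Hello file world!\n'
```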

File Types

So far you’ve seen how to make plain text files. Text files are as old as computers themselves. Most files these days have a specific format that holds non-text information like images, video, and structured data. A file’s dot extension (such as .png) typically identifies what type of file it is. Here are some file types that you can use in the notebook:

Extension    Type                                Open With
.txt         Plain text                          open(...)
.png .gif    Losslessly compressed image data    Image.open(...), Turtle(background='...')
.jpg .jpeg   Lossy compressed image data         Image.open(...), Turtle(background='...')
.wav         Uncompressed audio                  Audio(...)
.mp3         Compressed audio                    Audio(...)
.csv         Simple spreadsheet structured data  pandas.read_csv(...)
.xlsx        Microsoft Excel spreadsheet data    pandas.read_excel(...)
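An extension is just the end of a file’s name, so a program can check it with pathlib’s suffix attribute. The sketch below is a hypothetical helper, describe_file, that maps the extensions listed above to rough categories. It never looks inside the file, so a misnamed file would fool it.

```python
from pathlib import Path

# Hypothetical lookup table based on the file types listed above
FILE_TYPES = {
    ".txt": "Plain text",
    ".png": "Image", ".gif": "Image", ".jpg": "Image", ".jpeg": "Image",
    ".wav": "Audio", ".mp3": "Audio",
    ".csv": "Spreadsheet", ".xlsx": "Spreadsheet",
}

def describe_file(filename):
    """Guess a file's category from its extension (hypothetical helper)."""
    suffix = Path(filename).suffix.lower()   # treat '.PNG' and '.png' alike
    return FILE_TYPES.get(suffix, "Unknown")

print(describe_file("../Labs/files/maze.png"))   # Image
print(describe_file("files/mlb_players.csv"))    # Spreadsheet
print(describe_file("notes"))                    # Unknown (no extension)
```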

Image Files

Image files contain two-dimensional picture data that can be displayed in the notebook. The Python Imaging Library (PIL, provided today by the Pillow package) lets you load and manipulate images. Here’s an example of how to load an image using PIL:

from PIL import Image
pic = Image.open('../Labs/files/maze.png')
pic
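The object returned by Image.open knows its size in pixels and its colour mode, and you can read or change individual pixels. So that it runs without the course files, the sketch below creates a small image in memory, changes one pixel, saves it as a PNG in a temporary folder, and loads it back. It assumes Pillow is installed.

```python
import os
import tempfile
from PIL import Image

# Make a small white image in memory instead of loading one from disk
pic = Image.new("RGB", (120, 80), color="white")
print(pic.size, pic.mode)        # (120, 80) RGB   (width, height, colour mode)

# Paint one pixel red, then save and reload the image as a PNG file
pic.putpixel((10, 20), (255, 0, 0))
path = os.path.join(tempfile.mkdtemp(), "small.png")
pic.save(path)

reloaded = Image.open(path)
print(reloaded.size, reloaded.getpixel((10, 20)))   # (120, 80) (255, 0, 0)
```

Because PNG compression is lossless, the red pixel comes back exactly as it was saved.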

If you want to draw over the image with the Turtle you can use my Turtle library too:

from p4e.drawing import Turtle
tu = Turtle(image="../Labs/files/maze.png")
tu

Try both:

[ ]:

Audio Files

Audio files contain recorded (or generated) sounds, often music. You can embed an audio player in the notebook using the Audio display class like this:

from IPython.display import Audio
sound = Audio('files/conga_groove.wav')
sound
[ ]:

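An uncompressed .wav file is mostly a long list of numbers that describe the sound wave. As a sketch, the standard-library wave module can write one-second of a 440 Hz tone. The sample rate, volume, and file name are arbitrary choices for this example.

```python
import math
import os
import struct
import tempfile
import wave

RATE = 8000      # samples per second (an arbitrary choice)
SECONDS = 1
FREQ = 440.0     # the musical note A4, in hertz

path = os.path.join(tempfile.mkdtemp(), "tone.wav")
with wave.open(path, "wb") as out:
    out.setnchannels(1)      # mono
    out.setsampwidth(2)      # 16-bit samples
    out.setframerate(RATE)
    frames = bytearray()
    for i in range(RATE * SECONDS):
        sample = int(12000 * math.sin(2 * math.pi * FREQ * i / RATE))
        frames += struct.pack("<h", sample)   # little-endian signed 16-bit integer
    out.writeframes(bytes(frames))

with wave.open(path, "rb") as check:
    n_frames = check.getnframes()
    rate = check.getframerate()
print(n_frames, rate)        # 8000 8000
```

In the notebook, Audio(path) should then show a player for the generated tone.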
Spreadsheets

Spreadsheets are data structured into a table, with a heading in the first row and data in the rest. In a future lesson we’ll use spreadsheet data to do analysis. Spreadsheets can be loaded using the Pandas library. Here’s an example:

import pandas
df = pandas.read_csv('files/mlb_players.csv')
df
[ ]:

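To see how read_csv uses the first row as the column headings, the sketch below reads a tiny CSV from a string through io.StringIO. The rows are made up and are not taken from mlb_players.csv; only the Team and Height column names are borrowed from the examples in this section.

```python
import io
import pandas

# A tiny made-up CSV: the first row is the heading, the rest are data rows
csv_text = """Name,Team,Height
Player A,SF,74
Player B,LAD,72
Player C,SF,77
"""

df = pandas.read_csv(io.StringIO(csv_text))
print(list(df.columns))   # ['Name', 'Team', 'Height']
print(df.shape)           # (3, 3): 3 data rows, 3 columns
```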
The Pandas library comes with powerful selection and analysis functions. Here are some examples to try:

Do a statistical analysis of all of the numerical columns in the data:

df.describe()

Show the rows where the Team column is equal to SF (the San Francisco Giants):

df[df['Team'] == 'SF']

Show the tallest player:

df[df['Height'] == df['Height'].max()]

Plot a histogram of player heights:

df['Height'].hist()
[ ]:
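The selections above work in two steps: a comparison like df['Team'] == 'SF' produces a column of True/False values, one per row, and putting that inside df[...] keeps only the rows marked True. The sketch below shows both steps on a tiny made-up table that reuses the Team and Height column names; its rows are invented for illustration.

```python
import pandas

# Made-up rows that reuse the Team and Height column names from the examples
df = pandas.DataFrame({
    "Name": ["Player A", "Player B", "Player C"],
    "Team": ["SF", "LAD", "SF"],
    "Height": [74, 72, 77],
})

mask = df["Team"] == "SF"        # one True/False value per row
print(mask.tolist())             # [True, False, True]

giants = df[mask]                # keep only the rows where mask is True
print(len(giants))               # 2

tallest = df[df["Height"] == df["Height"].max()]
print(tallest["Name"].tolist())  # ['Player C']
```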