Reading and Writing Files

Computers have two kinds of memory: volatile memory, which is maintained only as long as there is electricity, and non-volatile memory, which is maintained when the power is off. Variables are a form of volatile memory; they only last as long as the program is running while the computer is on. Files are a form of non-volatile memory; they are kept when the computer is off.

Files can be small or very, very large, much larger than the amount of RAM in a computer. Because files can be so large, they are accessed differently than memory. Like a book, files are read (and written) one line at a time. When you want to access a file you create a special kind of variable called a file handle.

File Handles

A file handle is a variable that gives us access to a file. A file handle is created with the open() function. The open() function takes at least one argument, the name of the file to open. The file handle is returned so you have to assign it to a variable. For example:

file_handle = open('files/example.txt')

Enter the code in the next cell. What type is file_handle?

[ ]:

The variable file_handle can be used just like any other Python variable but it has a type you haven’t seen before:

print('The type of file_handle is:', type(file_handle))
[ ]:

The file handle gives your program a way to access the contents of the file; it does not contain the file’s data. A variable that contains a file handle has functions that are needed to read and write the file. In the next sections you’ll see how to use functions of a file handle to access files.
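Here’s a small sketch of the idea that a file handle describes an open file rather than holding its text. The temporary file and its contents are made up for this example so that it runs anywhere; the name, mode and closed attributes are extra details not covered above.

```python
import os
import tempfile

# Create a small throwaway file so the example runs anywhere.
# (The path and contents are invented for this sketch.)
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
setup = open(path, 'w')
setup.write('first line\nsecond line\n')
setup.close()

file_handle = open(path)

# The handle describes the open file; it is not the text itself.
print('Name:  ', file_handle.name)
print('Mode:  ', file_handle.mode)    # 'r' means open for reading
print('Closed:', file_handle.closed)  # False while the file is open

handle_mode = file_handle.mode
was_open = not file_handle.closed
file_handle.close()
```

Nothing in the output shows the words stored in the file. To get those you have to ask the handle to read them.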

Reading Files

There are two ways to read a file in Python: one line at a time, or the whole file at once. Try it out. Run the cell below to open the file example.txt:

[ ]:
file_handle = open('files/example.txt')

Now execute the code in this cell over and over:

[ ]:
file_handle.readline()

What do you notice? The file is read one line at a time: each time you execute readline() the next line is returned. When the file reaches the end, empty strings are returned. Try doing it again, but re-run the open() function first. Now see what happens when you use the read() function:

[ ]:
file_handle.read()

What happens when you mix read() and readline()?

Do you understand how calling ``readline`` changes the place in the file?
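If you want to check your answer, here’s a minimal sketch that mixes readline() and read(). It builds its own three-line file (the path and contents are invented for the example) so you can see exactly what each call returns.

```python
import os
import tempfile

# Build a throwaway three-line file (made up for this sketch).
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
setup = open(path, 'w')
setup.write('one\ntwo\nthree\n')
setup.close()

file_handle = open(path)
first = file_handle.readline()   # reads the first line and moves past it
rest = file_handle.read()        # reads everything after the first line
at_end = file_handle.readline()  # nothing is left, so this is an empty string
file_handle.close()

# repr() shows the newline characters instead of printing them.
print(repr(first))
print(repr(rest))
print(repr(at_end))
```

Each call picks up where the previous one stopped, so read() only returns what readline() hasn’t already consumed.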

In a practical program you want to do something with the data you get from the file. The example below opens the file and reads the first four lines into four different variables.

file_handle = open('files/example.txt')
line1 = file_handle.readline()
line2 = file_handle.readline()
line3 = file_handle.readline()
line4 = file_handle.readline()
print("Lines:", line1, line2, line3, line4)

Enter the program into the next cell and run it:

[ ]:

Confused? Use the debugger!

Writing Files

When you open a file you have to decide whether to read or write the file (you can do both, more on that later). If you only give one argument to open() the file is opened for reading. If you want to write to a file here’s how you start:

file_handle = open('output.txt', 'w')

Watch Out! Opening a file for writing erases the contents of the file.

Type the open command into the next cell:

[ ]:

When the second argument to open() is 'w' the file is opened for writing. Once the file is open you can write to it using the write function. For example:

file_handle.write('Hello file world!\n')

Add the write command to the next cell and run it:

[ ]:

Now find the file. Does it contain what you expect?

Notice that the newline character is in the string. Unlike print(), the write() function does not add a newline to the end of the line. The write function also does not take multiple arguments like the print() function. If you want to mix variables and words in the write function use an f-string. Update the code cell with this code and execute it:

name = 'Your Name Here'
file_handle.write(f'Hello, my name is {name}\n')

Notice that write returns a number? The number is the number of characters written to the file. That number can be useful because (as in the example above) when you write variables to a file you might not know in advance how much data is in the variable. Here’s a complete program that writes a file:

name = "Your name here"
file_handle = open('greeting.txt', 'w')
wrote = file_handle.write(f"Hello, my name is {name}\n")
print(f"I wrote {wrote} characters to the file")
file_handle.close()

Enter the complete program into the next cell:

[ ]:

Run the program and look inside of greeting.txt.
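To see that write() really doesn’t add a newline for you, here’s a short sketch that calls write() twice and reads the result back. The file name is invented and the file lives in a temporary folder so it won’t clutter your work.

```python
import os
import tempfile

# A throwaway output file (the name is made up for this sketch).
path = os.path.join(tempfile.mkdtemp(), 'demo_output.txt')

file_handle = open(path, 'w')
count_a = file_handle.write('no newline here, ')
count_b = file_handle.write('so this continues the same line\n')
file_handle.close()

# Read the file back to see what was actually stored.
file_handle = open(path)
contents = file_handle.read()
file_handle.close()

print(repr(contents))
print('write() returned', count_a, 'and', count_b)
```

Both pieces end up on a single line because only the second string contains \n.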

Closing the File Handle

If you open a file that you’ve written to but not yet closed, you may notice the contents aren’t there yet. When you write to a file the information you write is temporarily held in memory to improve the performance of your program. When your program is done reading or writing a file you have to close the file handle using the close function:

file_handle.close()

It’s essential that you always remember to close the files you’ve opened. A program can only hold a limited number of files open. If your code “leaks” file handles by forgetting about them, it’s possible that you will exhaust that number and the open() function will fail.
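Here’s a rough sketch of what closing does. It checks the size of the file on disk before and after close() using os.path.getsize, which isn’t covered in this lesson. The file name is invented, and the exact size you see before closing can vary from one system to another, so treat the first number as typical rather than guaranteed.

```python
import os
import tempfile

# A throwaway file in a temporary folder (name made up for this sketch).
path = os.path.join(tempfile.mkdtemp(), 'demo_buffer.txt')

file_handle = open(path, 'w')
file_handle.write('Hello file world!\n')

# The text may still be sitting in memory, so the file on disk
# is often still empty at this point.
size_before_close = os.path.getsize(path)

file_handle.close()  # closing pushes any held text out to the file
size_after_close = os.path.getsize(path)

print('Size before close:', size_before_close)
print('Size after close: ', size_after_close)
print('Is the handle closed?', file_handle.closed)
```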

Seeking in a File

When you read or write a file it’s like reading and writing a book: as you read (or write) you advance the place in the file. The seek() function is like turning back or forward to a particular page. The argument to the seek() function is the byte number in the file that you want to move to. Try the following:

file_handle = open('files/example.txt')
print("The first line is:", file_handle.readline())
file_handle.seek(0)
print("(again) The first line is:", file_handle.readline())
file_handle.close()
[ ]:

Notice that the first line is repeated. Do you understand why?

What happens when you change the ``0`` in ``seek`` to a different number?
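Here’s one possible experiment, using a tiny made-up file with plain English letters so that every character is exactly one byte. The file name and contents are assumptions for this sketch.

```python
import os
import tempfile

# A tiny throwaway file (contents made up for this sketch).
path = os.path.join(tempfile.mkdtemp(), 'demo_seek.txt')
setup = open(path, 'w')
setup.write('abcdef\nsecond\n')
setup.close()

file_handle = open(path)
file_handle.seek(3)                  # skip the first three bytes: a, b, c
from_three = file_handle.readline()  # reads the rest of the first line
file_handle.seek(0)                  # go back to the very beginning
from_start = file_handle.readline()  # reads the whole first line
file_handle.close()

print(repr(from_three))
print(repr(from_start))
```

Seeking into the middle of a line is allowed. readline() simply returns whatever is left of that line.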

Seeking and Lines

The seek function seeks to a byte number, not a line. Seeking lines is not as easy as it seems because a “line” can be any number of bytes, and the only way to know where lines are is to read the file. Try updating the code in the previous cell to seek to the second line. It takes some trial and error. If you want to read a particular line from a file it takes some syntax we haven’t covered yet. In case you’re interested, here’s an example:

file_name = "files/example.txt"
line_number = 3
with open(file_name) as fh:
    for _ in range(line_number):
        line = fh.readline()

print("The line is:", line)

Notice I didn’t use ``close``? Keep reading to find out why.

[ ]:
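If trial and error gets tedious, one alternative is to let the file handle tell you where a line starts. The sketch below uses the tell() function, which isn’t covered in this lesson, to remember the position of the second line and return to it later. The file name and contents are made up for the example.

```python
import os
import tempfile

# A tiny throwaway file (contents made up for this sketch).
path = os.path.join(tempfile.mkdtemp(), 'demo_lines.txt')
setup = open(path, 'w')
setup.write('first\nsecond\nthird\n')
setup.close()

file_handle = open(path)
file_handle.readline()                  # read past the first line
second_line_start = file_handle.tell()  # remember where line two begins
file_handle.readline()                  # read the second line once
file_handle.seek(second_line_start)     # jump back to the remembered place
second_again = file_handle.readline()   # read the second line again
file_handle.close()

print('The second line is:', repr(second_again))
```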

Appending a File

Appending a file means writing to the end of the file. Recall that opening a file for writing erases the contents of the file if it already exists. That’s not always a good idea. The open function has a mode that lets you append to the file without erasing its contents. Here’s how to do that using open():

file_handle = open('output.txt', 'a')
file_handle.write('Put this at the end\n')
file_handle.close()

Enter the program into the next cell:

[ ]:

Now take a look at output.txt. There’s a new line at the end of the file and the original contents are still in place.
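To compare the two modes side by side, here’s a sketch that writes a file, appends to it, and then opens it with 'w' again. It uses an invented file in a temporary folder so your output.txt is left alone.

```python
import os
import tempfile

# A throwaway file (name made up for this sketch).
path = os.path.join(tempfile.mkdtemp(), 'demo_append.txt')

# Start with one line.
file_handle = open(path, 'w')
file_handle.write('original line\n')
file_handle.close()

# 'a' keeps what is there and adds to the end.
file_handle = open(path, 'a')
file_handle.write('appended line\n')
file_handle.close()

file_handle = open(path)
after_append = file_handle.read()
file_handle.close()

# 'w' erases the old contents before writing.
file_handle = open(path, 'w')
file_handle.write('replacement line\n')
file_handle.close()

file_handle = open(path)
after_write = file_handle.read()
file_handle.close()

print(repr(after_append))
print(repr(after_write))
```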

Python 3’s Context Managers

Python 3 has a feature called a context manager. I’ll say more about context managers in a future lecture, but you can use them to help you remember to close a file. If you use open() as a context manager, the code that is indented under the with statement has access to the file. Once execution leaves the indented code the file handle is automatically closed.

Here’s code that reads the first line of a file:

with open('example.txt') as file_handle:
    first_line = file_handle.readline()

# The file is automatically closed!
print(first_line)
[ ]:

Here’s code that reads the entire contents of a file without a context manager:

file_handle = open('example.txt')
file_contents = file_handle.read()
file_handle.close() # Never forget!
print(file_contents)

Here’s the same code using the context manager:

with open('example.txt') as file_handle:
    file_contents = file_handle.read()
print(file_contents)

Try typing the context manager version into the next cell.

[ ]:

Notice the indentation? The indentation tells Python that the indented statements are inside the with block. This is an important concept that we will spend a lot more time on later in the course. For now, try running the code with and without the indentation. Does it work both ways?
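You can check for yourself that the file is closed when the indented code ends. This sketch looks at the closed attribute of the file handle, which isn’t covered in this lesson, once inside the block and once after it. The file is a made-up temporary one.

```python
import os
import tempfile

# A tiny throwaway file (contents made up for this sketch).
path = os.path.join(tempfile.mkdtemp(), 'demo_with.txt')
setup = open(path, 'w')
setup.write('first line\nsecond line\n')
setup.close()

with open(path) as file_handle:
    closed_inside = file_handle.closed   # still open inside the block
    first_line = file_handle.readline()

closed_outside = file_handle.closed      # closed once the block ends

print('Closed inside the block?', closed_inside)
print('Closed after the block? ', closed_outside)
print('First line:', first_line.strip())
```

The variable first_line is still usable after the block. Only the file handle is closed, not the data you already read.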

The Extra Newline (using strip)

Notice that when you use readline there’s an extra newline (\n) character? Re-run this example from earlier in the notebook:

file_handle = open('files/example.txt')
line1 = file_handle.readline()
line2 = file_handle.readline()
line3 = file_handle.readline()
line4 = file_handle.readline()
print("Lines:", line1, line2, line3, line4)

Notice that the output doesn’t line up? It can sometimes be annoying to have the extra newline when you read input from a file. The strip function in Python removes whitespace from the beginning and end of a string. A newline is considered whitespace. The strip function operates on a string. Update the code example to this:

file_handle = open('files/example.txt')
line1 = file_handle.readline().strip()
line2 = file_handle.readline().strip()
line3 = file_handle.readline().strip()
line4 = file_handle.readline().strip()
print("Lines:", line1, line2, line3, line4)
[ ]:

The strip function works on a string and returns a new string. If you want to strip an existing string and store the stripped version back into the same variable do it like this:

foo = "    This is foo     "
print(f'foo is "{foo}"')
foo = foo.strip()
print(f'foo is "{foo}"')

Try entering the example into the next cell:

[ ]:

You can also strip at the same time you print a string. For example:

foo = "    This is foo     "
print(f'foo is "{foo.strip()}"')
[ ]:

Remember the strip function. You’ll need it when you want to make your program’s output perfect.
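One thing to watch for: strip removes spaces at the start of a line too, which matters if a file uses indentation. If you only want to drop the newline, one option is the related rstrip function, which isn’t covered above. The sample string here is made up.

```python
# A line as it might come back from readline(), with leading spaces.
line = '    indented text\n'

stripped = line.strip()              # removes whitespace from both ends
newline_removed = line.rstrip('\n')  # removes only the newline at the end

print(f'strip gives  "{stripped}"')
print(f'rstrip gives "{newline_removed}"')
```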