IO Objects

Definition

An IO object represents an open file.

The location of that file may be one of:

  • On-disk
  • Standard input
  • Standard output
  • Standard error
  • In-memory buffer
  • Sockets
  • Pipes
  • etc…

IO objects are also referred to as “file objects”, “file-like objects” or “streams”.

There are 3 basic types of IO objects you can work with:

Text

Reads and writes string objects.

Bytes from the backing store are decoded into strings on read, and strings are encoded into bytes on write. Newlines are optionally translated.

Binary

Reads and writes bytes objects.

No encoding, decoding or newline translation is performed.

Raw

Also called unbuffered IO.

This is a low-level building block class. It is rarely needed and won’t be discussed further.

The io library contains the API definitions for the different types of IO objects.
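
To make the text/binary distinction concrete, here is a minimal sketch that uses the in-memory stream classes io.StringIO and io.BytesIO from the io library. The sample string is arbitrary and chosen only because it contains one non-ASCII character.

```python
import io

sample = 'caf\u00e9\n'   # 5 characters, one of them non-ASCII

# A text stream reads and writes str objects.
text_stream = io.StringIO()
text_stream.write(sample)
text_value = text_stream.getvalue()

# A binary stream reads and writes bytes objects, so the str
# has to be encoded explicitly before it can be written.
binary_stream = io.BytesIO()
binary_stream.write(sample.encode('utf-8'))
binary_value = binary_stream.getvalue()

print(type(text_value).__name__, len(text_value))      # characters
print(type(binary_value).__name__, len(binary_value))  # bytes
```

The two lengths differ because the non-ASCII character occupies two bytes in UTF-8.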

Python pre-initializes 3 text streams for you: sys.stdin, sys.stdout and sys.stderr.
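
As a small sketch of the pre-initialized streams in use, the snippet below writes to standard output and standard error through the sys module. The report() helper is a hypothetical name introduced only for this illustration.

```python
import sys

def report(message, stream=None):
    """Write message plus a newline to stream (default: sys.stdout)."""
    if stream is None:
        stream = sys.stdout
    # write() returns the number of characters written.
    return stream.write(message + '\n')

report('normal output')                      # goes to standard output
report('something went wrong', sys.stderr)   # goes to standard error
```

Because these are ordinary text streams, any function that accepts a text stream can be pointed at them or at an open file interchangeably.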

Common IO Operations

Stream Opening

The canonical way to open a file is with the built-in function open(). Some, but not all, of its arguments are described below:

open(file, mode='r', encoding=None, newline=None)

file is a string or bytes object giving the path name to the file to open.

mode is optional and is a sequence of 1 or more characters that specifies how the file is opened. Refer to Table 15.

Table 15: File Open Modes

Character   Meaning
‘r’         Open for reading (default)
‘w’         Open for writing, truncating the file first
‘x’         Like ‘w’ but fails if file already exists
‘a’         Open for appending to the end of the file
‘b’         Open in binary mode
‘t’         Open in text mode (default)
‘+’         Open for reading and writing (updating)
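
The following sketch exercises the ‘w’, ‘a’ and ‘x’ modes against a temporary file so their differences are visible. The file name modes.txt and the variable names are assumptions made for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'modes.txt')   # hypothetical file name

    with open(path, mode='w') as fh:    # create (or truncate) and write
        fh.write('first\n')

    with open(path, mode='a') as fh:    # append to the end of the file
        fh.write('second\n')

    with open(path, mode='r') as fh:
        after_append = fh.read()

    try:
        with open(path, mode='x'):      # exclusive creation
            exclusive_failed = False
    except FileExistsError:             # raised because the file already exists
        exclusive_failed = True

    with open(path, mode='w') as fh:    # truncates the old content first
        fh.write('replaced\n')

    with open(path) as fh:
        after_truncate = fh.read()

print(repr(after_append))
print(exclusive_failed)
print(repr(after_truncate))
```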

encoding names the encoding used to encode and decode the file (e.g. ‘utf-8’). The codecs module lists all the valid encodings you can use. If not given, the current locale encoding is used. Only used with text files.
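
A minimal sketch of what the encoding argument does: the same file is written in text mode, then read back in binary mode to show the stored bytes. The file name greeting.txt is an assumption made for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'greeting.txt')  # hypothetical file name

    # Text mode: the str is encoded to bytes using the named encoding.
    with open(path, mode='w', encoding='utf-8') as fh:
        fh.write('caf\u00e9')

    # Binary mode: look at the bytes that were actually stored.
    with open(path, mode='rb') as fh:
        raw = fh.read()

    # Text mode again: the bytes are decoded back into a str.
    with open(path, encoding='utf-8') as fh:
        text = fh.read()

print(raw)
print(len(text), len(raw))
```

Passing encoding explicitly, rather than relying on the locale default, keeps the result the same across machines.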

newline controls how line separators are handled. By default, universal newlines mode is enabled, which means:

  • On input, lines can end in ‘\n’, ‘\r’ or ‘\r\n’ and they will be translated into ‘\n’.
  • On output, any ‘\n’ is translated into the OS default line separator.

If this behavior is unsuitable for you, look into the newline argument to open().
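
To illustrate the input side of universal newlines, this sketch writes three different line endings as raw bytes and then reads them back twice: once with the default setting and once with newline='', which disables translation. The file name mixed.txt is an assumption made for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'mixed.txt')  # hypothetical file name

    # Binary mode, so the three line endings are stored exactly as given.
    with open(path, mode='wb') as fh:
        fh.write(b'one\r\ntwo\rthree\n')

    # Default (newline=None): every line ending is translated to '\n'.
    with open(path, encoding='ascii') as fh:
        translated = fh.read()

    # newline='': line endings are passed through untranslated.
    with open(path, encoding='ascii', newline='') as fh:
        untranslated = fh.read()

print(repr(translated))
print(repr(untranslated))
```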

Below are some typical calls to open():

  • To open a file and read text from it using universal newlines:

    >>> fh = open('foo.txt')
    
  • To open a file and write bytes into it:

    >>> fh = open('foo.bin', mode='bw')
    

Tip

Many file operations can raise exceptions (specifically OSError). Wrap file operations in a try construct so these exceptions don’t crash your script, and use a with construct so the file is always closed.
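
One way to combine the two constructs is sketched below. The read_text() helper and the missing path are hypothetical and exist only to trigger an OSError.

```python
def read_text(path):
    """Return the contents of path, or None if the file cannot be read."""
    try:
        # The with construct closes the file even if read() raises.
        with open(path, encoding='utf-8') as fh:
            return fh.read()
    except OSError as err:
        # FileNotFoundError, PermissionError, etc. are all subclasses of OSError.
        print('Could not read {}: {}'.format(path, err))
        return None

result = read_text('no-such-dir/no-such-file.txt')  # hypothetical missing path
print(result)
```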

Once the stream is open, you can get the name of it (i.e. the file name) from the name attribute:

>>> fh = open('foo.txt')
>>> fh.name
'foo.txt'

Stream Iterating

Text and binary streams support iterating over their lines using a for loop. The line separator for binary streams is always ‘\n’; for text streams it depends on the newline argument to open().

For example, assuming the text file foo.txt has the following content:

The first line.
The second line.
The third line.
And done!

You can iterate over the lines of the file:

>>> fh = open('foo.txt')
>>> for line in fh:
...     print(line, end='')
The first line.
The second line.
The third line.
And done!

If the binary file foo.bin has the following content:

\x10\x11\n\x12\x13

You can iterate over the lines of the file:

>>> fh = open('foo.bin', mode='br')
>>> for line in fh:
...     print(line)
b'\x10\x11\n'
b'\x12\x13'
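
Because a stream is an ordinary iterable, it also works with helpers such as enumerate(). The sketch below numbers the lines of the sample text; io.StringIO stands in for an open text file so the example is self-contained.

```python
import io

# io.StringIO stands in for open('foo.txt') here.
fh = io.StringIO('The first line.\nThe second line.\nThe third line.\nAnd done!')

numbered = []
for number, line in enumerate(fh, start=1):
    # Each line keeps its trailing '\n', except possibly the last one.
    numbered.append('{}: {}'.format(number, line.rstrip('\n')))

print('\n'.join(numbered))
```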

Stream Flushing

By default, streams opened using open() are buffered, so written data may not appear on disk immediately. To force the buffered data out, call flush() on the file object, as in:

>>> fh = open('bar.txt', mode='w')
>>> fh.write('Hello World')
11
>>> fh.flush()
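
The effect of buffering can be observed by reading the file through a second handle before and after the flush. This is a sketch only: whether the first read sees anything depends on the buffer size and the amount written, so treat the "before" value as typical rather than guaranteed.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'bar.txt')

    writer = open(path, mode='w')
    writer.write('Hello World')

    with open(path) as reader:
        before_flush = reader.read()   # typically '' while the data is still buffered

    writer.flush()                     # hand the buffered data to the operating system

    with open(path) as reader:
        after_flush = reader.read()

    writer.close()

print(repr(before_flush))
print(repr(after_flush))
```

Note that flush() passes the data to the operating system; if the data must reach the physical disk, os.fsync(writer.fileno()) can be called after the flush.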

Stream Closing

A stream remains open until you close it by calling close() on the file handle. This flushes and closes the stream.

You can call close() as many times as you want on a stream. However, calling any other operation on a closed stream raises a ValueError.

When using the try construct, put the call to close() in the finally section so it always gets called.

When using the with construct, close() is always called for you, even if there is an exception.

You can check the closed attribute to see if the stream is already closed.

For example:

>>> fh = open('bar.txt', mode='w')
>>> fh.write('Hello World')
11
>>> fh.close()
>>> fh.closed
True
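
The two closing patterns described above, and the ValueError raised by a closed stream, can be sketched as follows. The temporary directory is used only to keep the example self-contained.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'bar.txt')

    # try/finally: close() runs whether or not write() raises.
    fh = open(path, mode='w')
    try:
        fh.write('Hello World')
    finally:
        fh.close()
    fh.close()   # closing again is harmless

    # with: close() is called automatically when the block ends.
    with open(path) as fh2:
        content = fh2.read()

closed_after_with = fh2.closed

# Any other operation on a closed stream raises ValueError.
try:
    fh2.read()
except ValueError:
    raised_value_error = True
else:
    raised_value_error = False

print(content, closed_after_with, raised_value_error)
```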

Text Stream Operations

The text stream defines the following operations. For the examples that follow, assume the text file foo.txt has the following content:

The first line.
The second line.
The third line.
And done!

read(size)

Read and return at most size characters from the stream as a single str. If size is negative or None, reads until EOF.

>>> fh = open('foo.txt')
>>> fh.read()
'The first line.\nThe second line.\nThe third line.\nAnd done!'
>>> fh.seek(0)  # Go back to start of stream
0
>>> fh.read(6)
'The fi'
>>> fh.read(6)
'rst li'

readline(size=-1)

Read until newline or EOF and return a single str. If the stream is already at EOF, an empty string is returned. If size is specified, at most size characters will be read.

>>> fh = open('foo.txt')
>>> fh.readline()
'The first line.\n'
>>> fh.readline()
'The second line.\n'

Tip

It’s generally better to use a for loop and iterate over the lines than to use the readline() method.

readlines(hint=-1)

Read and return a list of lines from the stream. hint can be specified to control the number of lines read: no more lines will be read if the total size (in bytes/characters) of all lines so far exceeds hint.

>>> fh = open('foo.txt')
>>> fh.readlines()
['The first line.\n', 'The second line.\n', 'The third line.\n', 'And done!']
>>> fh.seek(0)        # Go back to start of stream
0
>>> fh.readlines(30)
['The first line.\n', 'The second line.\n']

Tip

It’s generally better to use a for loop and iterate over the lines than to use the readlines() method.

writable()

Return True if the stream supports writing.

>>> fh = open('bar.txt', mode='w')
>>> fh.writable()
True

write(s)

Write the string s to the stream and return the number of characters written.

>>> fh = open('bar.txt', mode='w')
>>> fh.write('Hello World')
11
>>> fh.close()

The text file bar.txt contains the following:

Hello World

writelines(lines)

Write a list of lines to the stream. Line separators are not added, so it is usual for each of the lines provided to have a line separator at the end.

>>> fh = open('bar.txt', mode='w')
>>> msg_lines = ['Hello World\n', 'Goodbye World\n']
>>> fh.writelines(msg_lines)
>>> fh.close()

The text file bar.txt contains the following:

Hello World
Goodbye World
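
When the lines you have do not already end in a separator, one option is to add it on the way in. The sketch below relies on writelines() accepting any iterable of strings, and uses io.StringIO in place of an open text file so that the result can be inspected directly.

```python
import io

messages = ['Hello World', 'Goodbye World']   # no line separators

fh = io.StringIO()   # stands in for open('bar.txt', mode='w')
# A generator expression appends '\n' to each line as it is written.
fh.writelines(message + '\n' for message in messages)
written = fh.getvalue()

print(repr(written))
```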

Text streams don’t support full random access. However, they do allow you to query the current position in the stream and return to the start of the stream.

seek(offset)

Change the stream position to the given offset. Don’t assume offset is in bytes or characters (which makes this unsuitable for true random access). Return the new absolute position as an opaque number.

>>> fh = open('foo.txt')
>>> fh.read(6)
'The fi'
>>> fh.read(6)
'rst li'
>>> fh.seek(0)  # Go back to start of stream
0
>>> fh.read(6)
'The fi'

tell()

Return the current stream position as an opaque number. The number does not usually represent a number of bytes in the underlying binary storage.

>>> fh = open('foo.txt')
>>> fh.read(6)
'The fi'
>>> fh.tell()
6
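
Since the value returned by tell() is opaque, the safe pattern is to treat it as a bookmark and only ever pass it back to seek(). A minimal sketch, assuming the same sample content written to a temporary file:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'foo.txt')
    with open(path, mode='w', encoding='utf-8') as fh:
        fh.write('The first line.\nThe second line.\nThe third line.\nAnd done!')

    with open(path, encoding='utf-8') as fh:
        fh.readline()                  # skip the first line
        bookmark = fh.tell()           # opaque value: do no arithmetic on it
        second = fh.readline()
        fh.readline()                  # read further ahead
        fh.seek(bookmark)              # return to the remembered position
        second_again = fh.readline()

print(repr(second))
print(second == second_again)
```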

Binary Stream Operations

The binary stream defines the following operations. For the examples that follow, assume the binary file foo.bin has the following content:

\x10\x11\x12\x13\n\x14\x15\x16\x17

peek([size])

Return bytes from the stream without advancing the position. At most one single read on the raw stream is done to satisfy the call. The number of bytes returned may be less or more than requested.

>>> fh = open('foo.bin', mode='br')
>>> fh.peek()
b'\x10\x11\x12\x13\n\x14\x15\x16\x17'
>>> fh.peek()
b'\x10\x11\x12\x13\n\x14\x15\x16\x17'
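
A common use of peek() is to inspect the start of a stream before deciding how to read it. In the sketch below, HEADER and has_header() are hypothetical names chosen for illustration; because peek() may return more or fewer bytes than requested, the result is sliced before comparing.

```python
import os
import tempfile

HEADER = b'\x10\x11'   # hypothetical two-byte signature, for illustration only

def has_header(fh):
    """Return True if the next bytes of the buffered binary stream fh are HEADER."""
    # Slice the result: peek() does not promise an exact length.
    return fh.peek(len(HEADER))[:len(HEADER)] == HEADER

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'foo.bin')
    with open(path, mode='wb') as fh:
        fh.write(b'\x10\x11\x12\x13\n\x14\x15\x16\x17')

    with open(path, mode='rb') as fh:
        matched = has_header(fh)
        position = fh.tell()       # unchanged by peek()
        first_two = fh.read(2)     # the peeked bytes are still there to read

print(matched, position, first_two)
```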

read([size])

Read and return up to size bytes. If size is not given or is negative, read until EOF, or until the read call would block in non-blocking mode.

>>> fh = open('foo.bin', mode='br')
>>> fh.read()
b'\x10\x11\x12\x13\n\x14\x15\x16\x17'
>>> fh.read()
b''
>>> fh.seek(0)  # Go back to start of stream
0
>>> fh.read(2)
b'\x10\x11'

write(b)

Write the bytes-like object, b, and return the number of bytes written.

>>> fh = open('bar.bin', mode='bw')
>>> fh.write(b'\x20\x21\x22\x23')
4
>>> fh.close()

seek(offset[, whence])

Change the stream position to the given byte offset. offset is interpreted relative to the position indicated by whence. The default value for whence is SEEK_SET. Values for whence are:

  • SEEK_SET or 0 – start of the stream (the default); offset should be zero or positive
  • SEEK_CUR or 1 – current stream position; offset may be negative
  • SEEK_END or 2 – end of the stream; offset is usually negative

Return the new absolute position.

>>> import io
>>> fh = open('foo.bin', mode='br')
>>> fh.seek(2)
2
>>> fh.read(2)
b'\x12\x13'
>>> fh.seek(3, io.SEEK_CUR)
7
>>> fh.read(2)
b'\x16\x17'
>>> fh.seek(-4, io.SEEK_END)
5
>>> fh.read(2)
b'\x14\x15'
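
The whence values combine naturally for tasks such as reading the tail of a file. The tail_bytes() helper below is a hypothetical name, and io.BytesIO stands in for a file opened in binary mode.

```python
import io

def tail_bytes(fh, count):
    """Return the last count bytes of a seekable binary stream."""
    size = fh.seek(0, io.SEEK_END)    # seeking to the end returns the stream size
    fh.seek(max(size - count, 0))     # SEEK_SET: absolute offset from the start
    return fh.read()

stream = io.BytesIO(b'\x10\x11\x12\x13\n\x14\x15\x16\x17')
last_three = tail_bytes(stream, 3)
everything = tail_bytes(stream, 100)  # asking for more than exists returns it all

print(last_three)
```

Computing the offset from the size, instead of calling seek(-count, io.SEEK_END) directly, avoids an error when count is larger than the stream.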

tell()

Return the current stream position.

>>> import io
>>> fh = open('foo.bin', mode='br')
>>> fh.tell()
0
>>> fh.seek(2, io.SEEK_CUR)
2
>>> fh.tell()
2
>>> fh.read(2)
b'\x12\x13'
>>> fh.tell()
4

Try it!

Try the following:

  • Create a file and write the following multi-lined message into it:

    "Hello World
     Viva la Pluto"
    
  • Add another line to the same file without erasing the old message:

    "supercalifragilisticexpialidocious"
    
  • Close the file.

  • Re-open the file in read mode and print the contents.