IO Objects¶
Definition¶
An IO object represents an open file.
The location of that file may be one of:
- On-disk
- Standard input
- Standard output
- Standard error
- In-memory buffer
- Sockets
- Pipes
- etc…
IO objects are also referred to as “file objects”, “file-like objects” or “streams”.
There are 3 basic types of IO objects you can work with:
- Text
Reads and writes string objects.
Bytes from the backing store are decoded into strings on read, and strings are encoded into bytes on write. Newlines are optionally translated.
- Binary
Reads and writes bytes objects.
No encoding, decoding or newline translation is performed.
- Raw
Also called unbuffered IO.
This is a low-level building block class. It is rarely needed and won’t be discussed further.
The io
library contains the API definitions for the different types of IO objects.
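As a rough illustration of the difference between text and binary streams, the sketch below uses the in-memory classes io.StringIO and io.BytesIO from the io library. The sample string is an arbitrary choice made for this example.

```python
import io

# Text stream: accepts and returns str objects.
text_stream = io.StringIO()
text_stream.write("caf\u00e9\n")
text_value = text_stream.getvalue()

# Binary stream: accepts and returns bytes objects.
binary_stream = io.BytesIO()
binary_stream.write("caf\u00e9\n".encode("utf-8"))
binary_value = binary_stream.getvalue()

print(repr(text_value))    # a str of 5 characters
print(repr(binary_value))  # a bytes object of 6 bytes (the accented letter takes 2 bytes in UTF-8)
```

The same text occupies a different number of units in each stream, which is why text streams need an encoding and binary streams do not.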
Python pre-initializes 3 text streams for you:
- sys.stdin: standard input stream
- sys.stdout: standard output stream
- sys.stderr: standard error stream
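A minimal sketch of using the pre-initialized streams follows. The messages are placeholders chosen for this example.

```python
import sys

# print() writes to sys.stdout by default; file= sends it elsewhere.
print("normal output")
print("something went wrong", file=sys.stderr)

# The standard streams are ordinary text IO objects, so write() works too.
written = sys.stdout.write("direct write\n")
```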
Common IO Operations¶
Stream Opening¶
The canonical way to open a file is with the built-in function open(). Some, but not all, of its arguments are described below:
open(file, mode='r', encoding=None, newline=None)¶
file is a string or bytes object giving the path name of the file to open.
mode is optional and is a sequence of 1 or more characters that specifies how the file is opened. Refer to Table 15.
Character | Meaning |
---|---|
‘r’ | Open for reading (default) |
‘w’ | Open for writing, truncating the file first |
‘x’ | Like ‘w’ but fails if file already exists |
‘a’ | Open for appending to the end of the file |
‘b’ | Open in binary mode |
‘t’ | Open in text mode (default) |
‘+’ | Open for reading and writing (updating) |
encoding names the encoding used to encode/decode the file (e.g. ‘utf-8’). The codecs module lists all the valid encodings you can use. If not given, the current locale encoding is used. Only used with text files.
newline controls how line separators are handled. By default universal newlines mode is enabled, which means:
- On input, lines can end in ‘\n’, ‘\r’ or ‘\r\n’ and they will be translated into ‘\n’.
- On output, any ‘\n’ is translated into the OS default line separator.
If this behavior is unsuitable for you, look into the newline argument to open().
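The effect of universal newlines mode can be sketched as follows. The scratch file name and its contents are assumptions made for this example; passing newline='' is one of the documented alternatives, which recognizes all line endings but leaves them untranslated.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "mixed.txt")  # hypothetical scratch file

# Write raw bytes so the file contains three different line endings.
with open(path, mode="wb") as fh:
    fh.write(b"one\r\ntwo\rthree\n")

# Default (universal newlines): every line ending is read back as '\n'.
with open(path, encoding="utf-8") as fh:
    translated = fh.read()

# newline='': line endings are recognized but returned untranslated.
with open(path, encoding="utf-8", newline="") as fh:
    untranslated = fh.read()

print(repr(translated))
print(repr(untranslated))
```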
Below are some typical calls to open():
To open a file and read text from it using universal newlines:
>>> fh = open('foo.txt')
To open a file and write bytes into it:
>>> fh = open('foo.bin', mode='bw')
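To see how the mode characters from the table interact, here is a small sketch that creates, appends to, and then truncates a scratch file. The file name is hypothetical and the file lives in a temporary directory.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes.txt")  # hypothetical scratch file

# 'x' creates the file and fails if it already exists.
with open(path, mode="x", encoding="utf-8") as fh:
    fh.write("first\n")

try:
    open(path, mode="x")  # the file exists now, so this raises
    exclusive_failed = False
except FileExistsError:
    exclusive_failed = True

# 'a' keeps the existing content and writes at the end.
with open(path, mode="a", encoding="utf-8") as fh:
    fh.write("second\n")
with open(path, mode="r", encoding="utf-8") as fh:
    after_append = fh.read()

# 'w' truncates the file before writing.
with open(path, mode="w", encoding="utf-8") as fh:
    fh.write("replaced\n")
with open(path, encoding="utf-8") as fh:
    after_write = fh.read()
```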
Tip
Many file operations can raise exceptions (specifically OSError). Wrap file operations in a try construct to handle these exceptions so they don’t crash your script, and use a with construct so the file is always closed.
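One way to combine the two constructs is sketched below. The helper name read_text and the path passed to it are hypothetical; the with block closes the file, and the try block handles the OSError.

```python
def read_text(path):
    """Return the file's text, or None if it cannot be opened or read."""
    try:
        with open(path, encoding="utf-8") as fh:
            return fh.read()
    except OSError as exc:
        # Covers FileNotFoundError, PermissionError and other OS-level failures.
        print(f"could not read {path}: {exc}")
        return None


result = read_text("no/such/dir/missing.txt")  # path assumed not to exist
```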
Once the stream is open, you can get its name (i.e. the file name) from the name attribute:
>>> fh = open('foo.txt')
>>> fh.name
'foo.txt'
Stream Iterating¶
Text and binary streams support iterating over their lines using a for loop. The line separator for binary streams is always ‘\n’; for text streams it depends on the newline argument to open().
For example, assuming the text file foo.txt
has the following content:
The first line.
The second line.
The third line.
And done!
You can iterate over the lines of the file:
>>> fh = open('foo.txt')
>>> for line in fh:
...     print(line, end='')
The first line.
The second line.
The third line.
And done!
If the binary file foo.bin has the following content:
\x10\x11\n\x12\x13
You can iterate over the lines of the file:
>>> fh = open('foo.bin', mode='br')
>>> for line in fh:
...     print(line)
b'\x10\x11\n'
b'\x12\x13'
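A common variation on line iteration is to number the lines while stripping the trailing separator. The sketch below first recreates the sample foo.txt in a temporary directory (an assumption made so the example is self-contained).

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "foo.txt")
with open(path, mode="w", encoding="utf-8") as fh:
    fh.write("The first line.\nThe second line.\nThe third line.\nAnd done!")

numbered = []
with open(path, encoding="utf-8") as fh:
    # Each line keeps its trailing '\n' (except possibly the last), so strip it.
    for number, line in enumerate(fh, start=1):
        numbered.append(f"{number}: {line.rstrip(chr(10))}")

for entry in numbered:
    print(entry)
```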
Stream Flushing¶
By default, streams opened using open() are buffered, so written data may not appear on disk immediately. To force buffered data to be written out, call flush() on the file object, as in:
>>> fh = open('bar.txt', mode='w')
>>> fh.write('Hello World')
11
>>> fh.flush()
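The effect of buffering can be observed by checking the file size before and after flush(). This is only a sketch: the size seen before the flush depends on the buffer size and platform, so it is printed rather than relied upon.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "bar.txt")  # hypothetical scratch file

fh = open(path, mode="w", encoding="utf-8")
fh.write("Hello World")
size_before = os.path.getsize(path)  # often 0: the data may still be buffered
fh.flush()
size_after = os.path.getsize(path)   # the 11 bytes have been handed to the OS
fh.close()

print(size_before, size_after)
```

Note that flush() hands the data to the operating system. If the data must reach the storage device itself, os.fsync(fh.fileno()) can be called after flush().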
Stream Closing¶
A stream remains open until you close it by calling close() on the file object. This will flush and close the stream.
You can call close() as many times as you want on a stream. However, calling any other operation on a closed stream raises a ValueError.
When using the try construct, put the call to close() in the finally section so it always gets called.
When using the with construct, close() is always called for you, even if there is an exception.
You can check the closed attribute to see if the stream is already closed.
For example:
>>> fh = open('bar.txt', mode='w')
>>> fh.write('Hello World')
11
>>> fh.close()
>>> fh.closed
True
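The closing rules described above can be sketched with a with construct. The scratch file name is an assumption made for this example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "bar.txt")  # hypothetical scratch file

with open(path, mode="w", encoding="utf-8") as fh:
    fh.write("Hello World")

# Leaving the with block closed the stream for us.
closed_after_with = fh.closed

# Any operation other than close() on a closed stream raises ValueError.
try:
    fh.write("more")
    error = None
except ValueError as exc:
    error = exc

fh.close()  # calling close() again is harmless
```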
Text Stream Operations¶
The text stream defines the following operations. For the examples that follow, assume the text file foo.txt has the following content:
The first line.
The second line.
The third line.
And done!
read(size=-1)¶
Read and return at most size characters from the stream as a single str. If size is negative or None, reads until EOF.
>>> fh = open('foo.txt')
>>> fh.read()
'The first line.\nThe second line.\nThe third line.\nAnd done!'
>>> fh.seek(0)  # Go back to start of stream
0
>>> fh.read(6)
'The fi'
>>> fh.read(6)
'rst li'
readline(size=-1)¶
Read until newline or EOF and return a single str. If the stream is already at EOF, an empty string is returned.
>>> fh = open('foo.txt')
>>> fh.readline()
'The first line.\n'
>>> fh.readline()
'The second line.\n'
Tip
It’s generally better to use a for loop and iterate over the lines than to use the readline() method.
readlines(hint=-1)¶
Read and return a list of lines from the stream. hint can be specified to control the number of lines read: no more lines will be read if the total size (in bytes/characters) of all lines so far exceeds hint.
>>> fh = open('foo.txt')
>>> fh.readlines()
['The first line.\n', 'The second line.\n', 'The third line.\n', 'And done!']
>>> fh.seek(0)  # Go back to start of stream
0
>>> fh.readlines(30)
['The first line.\n', 'The second line.\n']
Tip
It’s generally better to use a for loop and iterate over the lines than to use the readlines() method.
writable()¶
Return True if the stream supports writing.
>>> fh = open('bar.txt', mode='w')
>>> fh.writable()
True
write(s)¶
Write the string s to the stream and return the number of characters written.
>>> fh = open('bar.txt', mode='w')
>>> fh.write('Hello World')
11
>>> fh.close()
The text file bar.txt contains the following:
Hello World
writelines(lines)¶
Write a list of lines to the stream. Line separators are not added, so it is usual for each of the lines provided to have a line separator at the end.
>>> fh = open('bar.txt', mode='w')
>>> msg_lines = ['Hello World\n', 'Goodbye World\n']
>>> fh.writelines(msg_lines)
>>> fh.close()
The text file bar.txt contains the following:
Hello World
Goodbye World
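When the strings to be written do not already end in a line separator, one option is to add it while passing them to writelines(), as in this sketch. The list contents and scratch file name are assumptions made for this example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "bar.txt")  # hypothetical scratch file
messages = ["Hello World", "Goodbye World"]          # no trailing newlines

with open(path, mode="w", encoding="utf-8") as fh:
    # writelines() accepts any iterable of strings, including a generator.
    fh.writelines(message + "\n" for message in messages)

with open(path, encoding="utf-8") as fh:
    content = fh.read()

print(repr(content))
```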
Text streams don’t support full random access. However, they do allow you to query the current position in the stream and return to the start of the stream.
seek(offset)¶
Change the stream position to the given offset. Don’t assume offset is in bytes or characters (which makes this unsuitable for true random access). Return the new absolute position as an opaque number.
>>> fh = open('foo.txt')
>>> fh.read(6)
'The fi'
>>> fh.read(6)
'rst li'
>>> fh.seek(0)  # Go back to start of stream
0
>>> fh.read(6)
'The fi'
tell()¶
Return the current stream position as an opaque number. The number does not usually represent a number of bytes in the underlying binary storage.
>>> fh = open('foo.txt')
>>> fh.read(6)
'The fi'
>>> fh.tell()
6
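Because the positions are opaque, a safe pattern for text streams is to pass seek() only values previously returned by tell() (or 0). The sketch below bookmarks a position and returns to it; it recreates the sample foo.txt in a temporary directory so it is self-contained.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "foo.txt")
with open(path, mode="w", encoding="utf-8") as fh:
    fh.write("The first line.\nThe second line.\nThe third line.\nAnd done!")

with open(path, encoding="utf-8") as fh:
    first = fh.readline()
    bookmark = fh.tell()           # opaque position just after the first line
    second = fh.readline()
    fh.seek(bookmark)              # return to the saved position
    second_again = fh.readline()   # reads the same line a second time
```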
Binary Stream Operations¶
The binary stream defines the following operations. For the examples that follow, assume the binary file foo.bin has the following content:
\x10\x11\x12\x13\n\x14\x15\x16\x17
peek([size])¶
Return bytes from the stream without advancing the position. At most one single read on the raw stream is done to satisfy the call. The number of bytes returned may be less or more than requested.
>>> fh = open('foo.bin', mode='br')
>>> fh.peek()
b'\x10\x11\x12\x13\n\x14\x15\x16\x17'
>>> fh.peek()
b'\x10\x11\x12\x13\n\x14\x15\x16\x17'
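A typical use of peek() is to inspect the first few bytes of a stream before deciding how to read it. The sketch below wraps an in-memory buffer in io.BufferedReader to stand in for a file opened in binary mode; the helper name looks_like_png and the sample data are hypothetical.

```python
import io

PNG_MAGIC = b"\x89PNG"  # first four bytes of the PNG signature


def looks_like_png(stream):
    # peek() may return fewer or more bytes than requested, so slice the result.
    return stream.peek(4)[:4] == PNG_MAGIC


data = b"\x89PNG\r\n\x1a\n" + b"rest of file"
stream = io.BufferedReader(io.BytesIO(data))

is_png = looks_like_png(stream)
position = stream.tell()   # peek() did not advance the stream
everything = stream.read() # so a later read still sees all the bytes
```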
read([size])
Read and return up to size bytes. If size is not given or is negative, read until EOF or until the read call would block in non-blocking mode.
>>> fh = open('foo.bin', mode='br')
>>> fh.read()
b'\x10\x11\x12\x13\n\x14\x15\x16\x17'
>>> fh.read()
b''
>>> fh.seek(0)  # Go back to start of stream
0
>>> fh.read(2)
b'\x10\x11'
write(b)
Write the bytes-like object, b, and return the number of bytes written.
>>> fh = open('bar.bin', mode='bw')
>>> fh.write(b'\x20\x21\x22\x23')
4
>>> fh.close()
seek(offset[, whence])
Change the stream position to the given byte offset. offset is interpreted relative to the position indicated by whence. The default value for whence is SEEK_SET. Values for whence are:
- SEEK_SET or 0 – start of the stream (the default); offset should be zero or positive
- SEEK_CUR or 1 – current stream position; offset may be negative
- SEEK_END or 2 – end of the stream; offset is usually negative
Return the new absolute position.
>>> import io
>>> fh = open('foo.bin', mode='br')
>>> fh.seek(2)
2
>>> fh.read(2)
b'\x12\x13'
>>> fh.seek(3, io.SEEK_CUR)
7
>>> fh.read(2)
b'\x16\x17'
>>> fh.seek(-4, io.SEEK_END)
5
>>> fh.read(2)
b'\x14\x15'
tell()
Return the current stream position.
>>> import io
>>> fh = open('foo.bin', mode='br')
>>> fh.tell()
0
>>> fh.seek(2, io.SEEK_CUR)
2
>>> fh.tell()
2
>>> fh.read(2)
b'\x12\x13'
>>> fh.tell()
4
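Since binary offsets are plain byte counts, seek() makes random access to fixed-size records straightforward. The sketch below assumes a hypothetical layout of one little-endian 32-bit unsigned integer per record and uses an in-memory io.BytesIO in place of a file on disk.

```python
import io
import struct

RECORD = struct.Struct("<I")  # hypothetical layout: one little-endian uint32 per record

stream = io.BytesIO(b"".join(RECORD.pack(n) for n in (10, 20, 30, 40)))


def read_record(fh, index):
    """Return the record at the given zero-based index."""
    fh.seek(index * RECORD.size)  # whence defaults to SEEK_SET
    return RECORD.unpack(fh.read(RECORD.size))[0]


third = read_record(stream, 2)

# A negative offset from SEEK_END jumps straight to the last record.
stream.seek(-RECORD.size, io.SEEK_END)
last = RECORD.unpack(stream.read(RECORD.size))[0]
end_position = stream.tell()
```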
Try it!
Try the following:
Create a file and write the following multi-line message into it:
"Hello World Viva la Pluto"
Add another line to the same file without erasing the old message:
"supercalifragilisticexpialidocious"
Close the file.
Re-open the file in read mode and print the contents.