IO Objects
==========

.. toctree::
   :maxdepth: 1

Indices and tables
------------------

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`

Definition
----------

An IO object represents an open file. The location of that file may be one of:

* On-disk
* Standard input
* Standard output
* Standard error
* In-memory buffer
* Sockets
* Pipes
* etc.

IO objects are also referred to as "file objects", "file-like objects" or "streams".

There are three basic types of IO objects you can work with:

Text
   Reads and writes :py:class:`str` objects. Bytes from the backing store are decoded into strings on read, and strings are encoded into bytes on write. Newlines are optionally translated.

Binary
   Reads and writes :py:class:`bytes` objects. No encoding, decoding or newline translation is performed.

Raw
   Also called unbuffered IO. This is a low-level building block class. It is rarely needed and won't be discussed further.

The :py:mod:`io` module contains the API definitions for the different types of IO objects.

Python pre-initializes three text streams for you:

* :py:obj:`sys.stdin` - standard input stream
* :py:obj:`sys.stdout` - standard output stream
* :py:obj:`sys.stderr` - standard error stream

.. _section_heading-Common_IO_Operations:

Common IO Operations
--------------------

.. _section_heading-Stream_Opening:

Stream Opening
^^^^^^^^^^^^^^

The canonical way to open a file is with the built-in function :py:func:`open`. Some, but not all, of its arguments are described below:

.. function:: open(file, mode='r', encoding=None, newline=None)

   ``file`` is a string or bytes object giving the path name of the file to open.

   ``mode`` is an optional string of one or more characters that specifies how the file is opened. Refer to :numref:`table-File_Open_Modes` for the available characters; they can be combined, e.g. ``'rb'`` or ``'w+'``.

   .. _table-File_Open_Modes:

   .. table:: Table of File Open Modes

      +-----------+---------------------------------------------+
      | Character | Meaning                                     |
      +===========+=============================================+
      | 'r'       | Open for reading (default)                  |
      +-----------+---------------------------------------------+
      | 'w'       | Open for writing, truncating the file first |
      +-----------+---------------------------------------------+
      | 'x'       | Like 'w' but fails if file already exists   |
      +-----------+---------------------------------------------+
      | 'a'       | Open for appending to the end of the file   |
      +-----------+---------------------------------------------+
      | 'b'       | Open in binary mode                         |
      +-----------+---------------------------------------------+
      | 't'       | Open in text mode (default)                 |
      +-----------+---------------------------------------------+
      | '+'       | Open for reading and writing (updating)     |
      +-----------+---------------------------------------------+

   ``encoding`` is the name of the encoding used to decode and encode the file (e.g. ``'utf-8'``). See the :py:mod:`codecs` module for the list of supported encodings. If not given, a platform-dependent default (the current locale encoding) is used. It should only be used in text mode; passing it in binary mode raises :py:exc:`ValueError`.

   ``newline`` controls how line separators are handled. By default (``newline=None``), universal newlines mode is enabled, which means:

   * On input, lines can end in '\\n', '\\r' or '\\r\\n', and they are translated into '\\n'.
   * On output, any '\\n' is translated into the operating system's default line separator, :py:data:`os.linesep`.

   If this behavior is unsuitable, pass a different value. With ``newline=''``, all three line endings are recognized on input but returned untranslated, and nothing is translated on output. With '\\n', '\\r' or '\\r\\n', only that string ends a line on input, and any '\\n' written is translated to that string on output.

Below are some typical calls to :py:func:`open`:

* To open a file and read text from it using universal newlines:

  >>> fh = open('foo.txt')

* To open a file and write bytes into it:

  >>> fh = open('foo.bin', mode='wb')

.. tip::

   Many file operations can raise exceptions, specifically :py:exc:`OSError` and its subclasses such as :py:exc:`FileNotFoundError`. Use a :keyword:`try` statement to handle these exceptions so they don't crash your script, and a :keyword:`with` statement to make sure the file gets closed even when an exception is raised.

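
A minimal sketch of the advice in the tip above. The files live in a temporary directory, and the name ``missing.txt`` is hypothetical: it is never created, so opening it raises :py:exc:`FileNotFoundError`, which the :keyword:`try` statement catches as an :py:exc:`OSError`. The second part shows that a :keyword:`with` statement has already closed the file by the time the block ends.

.. code-block:: python

   import os
   import tempfile

   # Work in a throwaway directory so the example doesn't touch real files.
   workdir = tempfile.mkdtemp()
   missing = os.path.join(workdir, 'missing.txt')  # hypothetical name, never created

   # try/except handles the error instead of letting it crash the script.
   try:
       with open(missing, encoding='utf-8') as fh:
           contents = fh.read()
   except OSError as exc:  # FileNotFoundError is a subclass of OSError
       contents = None
       print('Could not open the file:', exc.strerror)

   # The with statement closes the file when the block exits.
   path = os.path.join(workdir, 'hello.txt')
   with open(path, mode='w', encoding='utf-8') as fh:
       fh.write('Hello World\n')
   print(fh.closed)  # True: the file was closed on leaving the block

Catching :py:exc:`OSError` rather than only :py:exc:`FileNotFoundError` also covers other failures, such as :py:exc:`PermissionError`.
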
Once the stream is open, you can get its name (i.e. the file name) from the :py:attr:`~io.TextIOWrapper.name` attribute:

>>> fh = open('foo.txt')
>>> fh.name
'foo.txt'

.. _section_heading-Stream_Iterating:

Stream Iterating
^^^^^^^^^^^^^^^^

Text and binary streams support iterating over their lines using a :keyword:`for` loop. The line separator for binary streams is always '\\n'; for text streams it depends on the ``newline`` argument to :py:func:`open`.

For example, assuming the text file ``foo.txt`` has the following content:

.. code-block:: text

   The first line.
   The second line.
   The third line.
   And done!

You can iterate over the lines of the file:

>>> fh = open('foo.txt')
>>> for line in fh:
...     print(line, end='')
The first line.
The second line.
The third line.
And done!

If the binary file ``foo.bin`` contains the following bytes (written as Python escape sequences):

.. code-block:: text

   \x10\x11\n\x12\x13

You can iterate over the lines of the binary file, each returned as a :py:class:`bytes` object:

>>> fh = open('foo.bin', mode='rb')
>>> for line in fh:
...     print(line)
b'\x10\x11\n'
b'\x12\x13'

.. _section_heading-Stream_Flushing:

Stream Flushing
^^^^^^^^^^^^^^^

By default, streams opened using :py:func:`open` are buffered, so written data may not reach the file immediately. To push the buffered data out to the operating system, call :py:meth:`~io.IOBase.flush` on the file object. (Making sure the data is physically written to disk additionally requires :py:func:`os.fsync`.) For example:

>>> fh = open('bar.txt', mode='w')
>>> fh.write('Hello World')
11
>>> fh.flush()

.. _section_heading-Stream_Closing:

Stream Closing
^^^^^^^^^^^^^^

A stream remains open until you close it by calling :py:meth:`~io.IOBase.close` on the file object. This flushes and then closes the stream. You can call :py:meth:`~io.IOBase.close` as many times as you want on a stream; calls after the first have no effect. However, most other operations on a closed stream, such as reading or writing, raise a :py:exc:`ValueError`.

When using a :keyword:`try` statement, put the call to :py:meth:`~io.IOBase.close` in the :keyword:`finally` clause so it always gets called. When using a :keyword:`with` statement, :py:meth:`~io.IOBase.close` is always called for you, even if an exception is raised.

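
The two closing patterns described above can be sketched as follows. The temporary file path is an assumption made so the example runs anywhere; the last part shows that a repeated :py:meth:`~io.IOBase.close` is harmless while a write to the closed stream raises :py:exc:`ValueError`.

.. code-block:: python

   import os
   import tempfile

   path = os.path.join(tempfile.mkdtemp(), 'bar.txt')

   # Pattern 1: try/finally. close() runs whether or not write() raises.
   # open() stays outside the try: if it fails, there is nothing to close.
   fh = open(path, mode='w', encoding='utf-8')
   try:
       fh.write('Hello World\n')
   finally:
       fh.close()

   # Pattern 2: with. close() is called automatically when the block exits.
   with open(path, mode='a', encoding='utf-8') as fh2:
       fh2.write('Goodbye World\n')

   fh.close()  # closing an already closed stream does nothing

   # Most other operations on a closed stream raise ValueError.
   try:
       fh.write('Too late\n')
       write_failed = False
   except ValueError:
       write_failed = True
   print(write_failed)  # True
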
You can check the :py:attr:`~io.IOBase.closed` attribute to see whether the stream is closed. For example:

>>> fh = open('bar.txt', mode='w')
>>> fh.write('Hello World')
11
>>> fh.close()
>>> fh.closed
True

.. _section_heading-Text_Stream_Operations:

Text Stream Operations
----------------------

Text streams define the following operations. For the examples that follow, assume the text file ``foo.txt`` has the following content:

.. code-block:: text

   The first line.
   The second line.
   The third line.
   And done!

.. function:: read(size=-1)

   Read and return at most ``size`` characters from the stream as a single :py:class:`str`. If ``size`` is negative or :py:obj:`None`, read until :py:term:`EOF`.

   >>> fh = open('foo.txt')
   >>> fh.read()
   'The first line.\nThe second line.\nThe third line.\nAnd done!'
   >>> fh.seek(0)  # Go back to start of stream
   0
   >>> fh.read(6)
   'The fi'
   >>> fh.read(6)
   'rst li'

.. function:: readline(size=-1)

   Read until a newline or :py:term:`EOF` and return a single :py:class:`str`. If the stream is already at :py:term:`EOF`, an empty string is returned. If ``size`` is specified, at most ``size`` characters are read.

   >>> fh = open('foo.txt')
   >>> fh.readline()
   'The first line.\n'
   >>> fh.readline()
   'The second line.\n'

.. tip::

   It's generally better to iterate over the lines with a :keyword:`for` loop than to call :py:meth:`~io.TextIOBase.readline` repeatedly.

.. function:: readlines(hint=-1)

   Read and return a list of lines from the stream. ``hint`` can be specified to control the number of lines read: no more lines are read once the total size (in **bytes/characters**) of all lines so far exceeds ``hint``.

   >>> fh = open('foo.txt')
   >>> fh.readlines()
   ['The first line.\n', 'The second line.\n', 'The third line.\n', 'And done!']
   >>> fh.seek(0)  # Go back to start of stream
   0
   >>> fh.readlines(30)
   ['The first line.\n', 'The second line.\n']

.. tip::

   It's generally better to iterate over the lines with a :keyword:`for` loop than to use :py:meth:`~io.IOBase.readlines`, which reads every remaining line into memory at once.

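
A sketch comparing the three ways of reading lines covered above. It uses :py:class:`io.StringIO`, an in-memory text stream, in place of ``foo.txt`` so that it runs without a file on disk; the stream's contents mirror that file.

.. code-block:: python

   import io

   # In-memory text stream holding the same text as foo.txt.
   fh = io.StringIO('The first line.\nThe second line.\nThe third line.\nAnd done!')

   # readline(): one line per call; an empty string signals EOF.
   by_readline = []
   while True:
       line = fh.readline()
       if not line:
           break
       by_readline.append(line)

   # readlines(): all remaining lines at once, held in a list.
   fh.seek(0)
   by_readlines = fh.readlines()

   # for loop: the stream hands out one line at a time.
   fh.seek(0)
   by_loop = [line for line in fh]

   print(by_loop == by_readlines == by_readline)  # True

The list comprehension collects the lines only so the three results can be compared; a plain ``for line in fh:`` loop processes one line at a time without building a list.
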
.. function:: writable()

   Return :py:obj:`True` if the stream supports writing.

   >>> fh = open('bar.txt', mode='w')
   >>> fh.writable()
   True

.. function:: write(s)

   Write the string ``s`` to the stream and return the number of characters written.

   >>> fh = open('bar.txt', mode='w')
   >>> fh.write('Hello World')
   11
   >>> fh.close()

   The text file ``bar.txt`` now contains the following:

   .. code-block:: text

      Hello World

.. function:: writelines(lines)

   Write a list of lines to the stream. Line separators are **not** added, so each of the lines provided usually ends with a line separator.

   >>> fh = open('bar.txt', mode='w')
   >>> msg_lines = ['Hello World\n', 'Goodbye World\n']
   >>> fh.writelines(msg_lines)
   >>> fh.close()

   The text file ``bar.txt`` now contains the following:

   .. code-block:: text

      Hello World
      Goodbye World

Text streams don't support full random access. However, they do allow you to query the current position in the stream with :py:meth:`~io.TextIOBase.tell`, and to return to that position, or to the start of the stream, with :py:meth:`~io.TextIOBase.seek`.

.. function:: seek(offset)

   Change the stream position to the given ``offset``, which should be either ``0`` (the start of the stream) or a value previously returned by :py:meth:`~io.TextIOBase.tell`. Don't assume ``offset`` counts bytes or characters; this is what makes text streams unsuitable for true random access. Return the new absolute position as an opaque number.

   >>> fh = open('foo.txt')
   >>> fh.read(6)
   'The fi'
   >>> fh.read(6)
   'rst li'
   >>> fh.seek(0)  # Go back to start of stream
   0
   >>> fh.read(6)
   'The fi'

.. function:: tell()

   Return the current stream position as an opaque number. The number does **not** necessarily represent a number of bytes in the underlying binary storage.

   >>> fh = open('foo.txt')
   >>> fh.read(6)
   'The fi'
   >>> fh.tell()
   6

.. _section_heading-Binary_Stream_Operations:

Binary Stream Operations
------------------------

Binary streams define the following operations. For the examples that follow, assume the binary file ``foo.bin`` contains the following bytes (written as Python escape sequences):

.. code-block:: text

   \x10\x11\x12\x13\n\x14\x15\x16\x17

.. function:: peek([size])

   Return bytes from the stream without advancing the position.
   At most one read on the raw stream is done to satisfy the call, so the number of bytes returned may be less or more than requested. This method is provided by buffered binary readers, such as the object :py:func:`open` returns in ``'rb'`` mode.

   >>> fh = open('foo.bin', mode='rb')
   >>> fh.peek()
   b'\x10\x11\x12\x13\n\x14\x15\x16\x17'
   >>> fh.peek()
   b'\x10\x11\x12\x13\n\x14\x15\x16\x17'

.. function:: read([size])

   Read and return up to ``size`` bytes. If ``size`` is omitted or negative, read until :py:term:`EOF`, or until a read would block in non-blocking mode. At :py:term:`EOF`, an empty bytes object is returned.

   >>> fh = open('foo.bin', mode='rb')
   >>> fh.read()
   b'\x10\x11\x12\x13\n\x14\x15\x16\x17'
   >>> fh.read()
   b''
   >>> fh.seek(0)  # Go back to start of stream
   0
   >>> fh.read(2)
   b'\x10\x11'

.. function:: write(b)

   Write the bytes-like object ``b`` and return the number of bytes written.

   >>> fh = open('bar.bin', mode='wb')
   >>> fh.write(b'\x20\x21\x22\x23')
   4
   >>> fh.close()

.. function:: seek(offset[, whence])

   Change the stream position to the given byte ``offset``. ``offset`` is interpreted relative to the position indicated by ``whence``, whose default value is :py:data:`io.SEEK_SET`. Values for ``whence`` are:

   * :py:data:`io.SEEK_SET` or ``0``: start of the stream (the default); ``offset`` should be zero or positive
   * :py:data:`io.SEEK_CUR` or ``1``: current stream position; ``offset`` may be negative
   * :py:data:`io.SEEK_END` or ``2``: end of the stream; ``offset`` is usually negative

   Return the new absolute position.

   >>> import io
   >>> fh = open('foo.bin', mode='rb')
   >>> fh.seek(2)
   2
   >>> fh.read(2)
   b'\x12\x13'
   >>> fh.seek(3, io.SEEK_CUR)
   7
   >>> fh.read(2)
   b'\x16\x17'
   >>> fh.seek(-4, io.SEEK_END)
   5
   >>> fh.read(2)
   b'\x14\x15'

.. function:: tell()

   Return the current stream position.

   >>> import io
   >>> fh = open('foo.bin', mode='rb')
   >>> fh.tell()
   0
   >>> fh.seek(2, io.SEEK_CUR)
   2
   >>> fh.tell()
   2
   >>> fh.read(2)
   b'\x12\x13'
   >>> fh.tell()
   4

.. admonition:: Try it!
   :class: TryIt

   Try the following:

   * Create a file and write the following multi-line message into it:

     .. code-block:: text

        "Hello World
        Viva la Pluto"

   * Add another line to the same file without erasing the old message:

     .. code-block:: text

        "supercalifragilisticexpialidocious"

   * Close the file.
   * Re-open the file in read mode and print the contents.

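
One possible solution, sketched under two assumptions: the file is named ``message.txt`` (a hypothetical name) inside a temporary directory, and the double quotes in the exercise only delimit the message rather than being part of it. Mode ``'a'`` is what lets the second write add to the file instead of truncating it.

.. code-block:: python

   import os
   import tempfile

   # Hypothetical file name in a throwaway directory.
   path = os.path.join(tempfile.mkdtemp(), 'message.txt')

   # Mode 'w' creates (or truncates) the file and writes the two-line message.
   with open(path, mode='w', encoding='utf-8') as fh:
       fh.write('Hello World\nViva la Pluto\n')

   # Mode 'a' appends, so the existing message is kept.
   with open(path, mode='a', encoding='utf-8') as fh:
       fh.write('supercalifragilisticexpialidocious\n')

   # Leaving each with block closed the file; re-open it for reading ('r' is the default).
   with open(path, encoding='utf-8') as fh:
       contents = fh.read()
   print(contents, end='')
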