Sequence Types - String
=======================


.. toctree::
   :maxdepth: 1


Indices and tables
------------------

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`


.. _section_heading-String:

String
------
**MUTABILITY:** Immutable

Text sequences are represented in Python3 by the :py:class:`str` datatype. The full Unicode character set (U+0000 to U+10FFFF) is supported by :py:class:`str` in Python3 (in Python2, ``str`` held bytes and Unicode text required the separate ``unicode`` type).

There is no separate char type for single characters. A single character is simply a string of length one.
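
Both points (immutability and the absence of a char type) can be checked directly. The following is a minimal sketch; the variable names are illustrative only.

```python
s = 'hello'

try:
    s[0] = 'J'          # item assignment is not allowed on str
except TypeError as exc:
    print('TypeError:', exc)

# Build a new string instead of modifying the old one.
t = 'J' + s[1:]
print(t)                # Jello
print(s)                # hello (unchanged)

# Indexing yields another str of length 1, not a separate char type.
first = s[0]
print(type(first).__name__, len(first))   # str 1
```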

.. note:: There is the :py:class:`str` class and there is also an importable module called :py:mod:`string`. While it is confusing, the definition and functionality of strings are defined in :py:class:`str`. The :py:mod:`string` module contains some additional constants and functions that can be used with strings but does not in itself define the string class.

String literals can be created in a variety of ways:

* Using single or double quotes (use the style opposite any embedded quotes). For example:

  >>> foo = "I often hear, 'Python is Fun', which is true."
  >>> foo
  "I often hear, 'Python is Fun', which is true."
  >>> foo = 'I often hear, "Python is Fun", which is true.'
  >>> foo
  'I often hear, "Python is Fun", which is true.'


* Using triple quotes (either ``"""`` or ``'''``), which has auto-line continuation (allowing you to break up a string across multiple lines without having to worry about closing and opening quotes) and which allows you to freely use both single and double quotes, simultaneously. For example:

   >>> foo = '''This is the 'first' line.
   ... This is the "second" line'''
   >>> foo
   'This is the \'first\' line.\nThis is the "second" line'

  If you don't want the new lines (``\n``) embedded in the string, put a ``\`` at the end of each line segment. For example:

  >>> foo = '''This is the 'first' line. \
  ... This is the "second" line'''
  >>> foo
  'This is the \'first\' line. This is the "second" line'


* Using adjacent single, double or triple quoted string literals **inside** parentheses. Python joins adjacent string literals into one string, and a new-line inside ``()`` does not end the statement. (Line breaks are also ignored inside ``[]`` and ``{}``, but those brackets build a list, set or dictionary that contains the string rather than a plain string.) For example:

   >>> foo = ('abcd'
   ...        'efgh')
   >>> foo
   'abcdefgh'

  If you need a new-line inside a string constructed this way, add a ``\n`` where appropriate. For example:

  >>> foo = ("Error number: 5\n"
  ...        "Error message: Can't open file")
  >>> foo
  "Error number: 5\nError message: Can't open file"

  .. tip:: This method of creating strings is VERY handy when you need to improve the readability of the source code or make it fit within a certain line length in a clean way.


* Using :py:func:`chr`, which returns the character given by the Unicode code point passed to it.

   >>> chr(65)
   'A'
   >>> chr(0x470)
   'Ѱ'

  .. tip:: Use the :py:func:`ord` function to convert a character into its Unicode code point (i.e. the opposite of :py:func:`chr`).


* Using the :py:class:`str` constructor, which returns the string version of the object passed to it. Used this way, it is a type conversion. For example:

   >>> foo = str(1234)
   >>> foo
   '1234'


* From a Bytes object (a sequence of integers in the range 0 to 255; covered later in this chapter), using the :py:func:`bytes.decode` method. For example:

  >>> my_bytes = b'0xDEADBEAF'
  >>> foo = bytes.decode(my_bytes)
  >>> foo
  '0xDEADBEAF'
  >>> type(foo)
  <class 'str'>

  .. note:: Bytes literals look like string literals but are preceded by the character ``b`` or ``B``. Bytes outside the printable ASCII range are displayed as ``\xhh`` escapes. For example:

     >>> 'Ѱ'.encode()
     b'\xd1\xb0'

  .. tip:: You can get bytes from a string object using the :py:func:`str.encode` method. For example:

     >>> my_str = '0xDEADBEAF'
     >>> foo = str.encode(my_str, encoding='utf-8')
     >>> foo
     b'0xDEADBEAF'
     >>> type(foo)
     <class 'bytes'>
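
The ways of creating a string listed above can be compared side by side. The following is a minimal sketch; all variable names are illustrative and not part of any API.

```python
# Each assignment below builds a str object in a different way.
quoted = 'He said, "hi"'                # single quotes around embedded double quotes
multi = '''line one
line two'''                             # triple quotes keep the embedded new-line
joined = ('abcd'
          'efgh')                       # adjacent literals are concatenated
from_code_point = chr(65)               # code point -> character
converted = str(1234)                   # type conversion
decoded = b'\xd1\xb0'.decode('utf-8')   # bytes -> str

print(multi.count('\n'))        # 1
print(joined)                   # abcdefgh
print(ord(from_code_point))     # 65
print(converted + '5')          # 12345
print(decoded)                  # Ѱ
```

Every one of these values is an ordinary :py:class:`str`, so all of the methods described later in this section apply to each of them.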


Python supports the typical set of escape sequences (character sequences that start with a ``\``) that you will find in other languages. Using escape sequences, you can insert Unicode characters in string literals:

.. _table-Python_String_Literal_Character_Escapes:

.. table:: Table of Python String Literal Character Escapes

   +-------------+------------------------------------------------------+
   | Escape      | Description                                          |
   +=============+======================================================+
   | \\xhh       | Insert the character with the given 8-bit hex value  |
   +-------------+------------------------------------------------------+
   | \\uhhhh     | Insert the character with the given 16-bit hex value |
   +-------------+------------------------------------------------------+
   | \\Uhhhhhhhh | Insert the character with the given 32-bit hex value |
   +-------------+------------------------------------------------------+
   | \\N{name}   | Insert the character with the given Unicode name     |
   +-------------+------------------------------------------------------+

For example:

   >>> foo = "\x41 \u27F0 \U0001F61B \N{arc}"
   >>> foo
   'A ⟰ 😛 ⌒'

If you want to prevent the parser from expanding escape sequences or tripping up on embedded backslashes that you want taken literally, you can create a raw string by placing an ``r`` before the opening quote that defines the string. For example:

   >>> foo = "a \x62 c \N{yin yang}"
   >>> foo
   'a b c ☯'
   >>> foo = r"a \x62 c \N{yin yang}"
   >>> foo
   'a \\x62 c \\N{yin yang}'

.. tip:: Raw strings are very useful when writing regular expressions, which often have many embedded escape sequences.
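
The effect of the ``r`` prefix on a regular expression can be seen with the ``\b`` sequence, which means "backspace" to the string parser but "word boundary" to the :py:mod:`re` module. This is a small sketch; the sample text is made up for illustration.

```python
import re

# Without the r prefix, "\b" becomes a backspace character (U+0008).
plain = "\bcat\b"
# With the r prefix, the backslash and the "b" are kept as two characters.
raw = r"\bcat\b"

print(len(plain))   # 5
print(len(raw))     # 7

text = "cat concat cat."
print(re.findall(raw, text))    # ['cat', 'cat']
print(re.findall(plain, text))  # []
```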

.. admonition:: Try it!
   :class: TryIt

   * Write a string of your choice using single and/or double quotes.

   * Try embedding strings inside that string, using the alternate style of quotes.

   * Try writing a long, multi-line string, using triple quotes. Use a ``\`` on some lines to see the effect.

   * Use parentheses to concatenate multiple strings into one.

   * Get the ASCII character that corresponds to the value of 72.

   * Get the value of the ASCII character "Z".

   * Explicitly convert an integer to a string.

   * Create a single string using character escape sequences from the values 0x48, 0x65, 0x6C, 0x6C and 0x6F. Then make it into a raw string. Notice the difference.


.. _section_heading-String_Specific_Methods:

String Specific Methods
^^^^^^^^^^^^^^^^^^^^^^^

The string class has a rich set of methods that relieve the programmer of common, mundane, and sometimes error-prone string operations. Below is a brief description of most of them. For full syntax information, refer to the Python string method `documentation <PythonStringMethods_>`_.


.. py:function:: str.capitalize()

   Return a copy of the string with its first character capitalized and the rest lowercased.

   >>> 'foo'.capitalize()
   'Foo'

.. py:function:: str.center(width[, fillchar])

   Return the string centered in a new string of length ``width``. Padding is done using the specified ``fillchar`` (default is an ASCII space). The original string is returned if ``width`` is less than or equal to its length.

   >>> 'foo'.center(5)
   ' foo '
   >>> 'foo'.center(5, '-')
   '-foo-'

.. py:function:: str.count(sub[, start[, end]])

   Return the number of non-overlapping occurrences of substring ``sub`` in the slice ``s[start:end]``. As with slicing, ``start`` is included and ``end`` is excluded.

   >>> 'ababbbabcd'.count('ab')
   3
   >>> 'ababbbabcd'.count('ab', 4)
   1
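
The phrase "non-overlapping" and the handling of the ``start`` and ``end`` arguments are easiest to see with a few small cases. The strings below are illustrative only.

```python
text = 'aaaa'

# Matches do not overlap: 'aa' is counted at index 0 and index 2 only.
print(text.count('aa'))        # 2

# start is inclusive and end is exclusive, as in slicing.
s = 'ababbbabcd'
print(s.count('ab', 0, 3))     # 1
print(s.count('ab', 0, 4))     # 2

# Counting in a range gives the same answer as counting in the slice.
print(s.count('ab', 2, 8) == s[2:8].count('ab'))   # True
```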

.. py:function:: str.encode(encoding="utf-8", errors="strict")

   Return an encoded version of the string as a :py:class:`bytes` object. Default ``encoding`` is 'utf-8'.

   >>> 'foo'.encode()
   b'foo'
   >>> 'Ѱ'.encode()
   b'\xd1\xb0'
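
The ``errors`` argument decides what happens when a character cannot be represented in the chosen encoding. The sketch below tries a few of the standard error handlers with the ASCII codec; the sample text is arbitrary.

```python
text = 'Ѱ = psi'

# The default handler, 'strict', raises an exception.
try:
    text.encode('ascii')
except UnicodeEncodeError as exc:
    print('strict failed:', exc.reason)

ignored = text.encode('ascii', errors='ignore')
replaced = text.encode('ascii', errors='replace')
escaped = text.encode('ascii', errors='backslashreplace')

print(ignored)     # b' = psi'
print(replaced)    # b'? = psi'
print(escaped)     # b'\\u0470 = psi'

# Encoding and then decoding with the same codec round-trips the text.
print(text.encode('utf-8').decode('utf-8') == text)   # True
```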


.. py:function:: str.endswith(suffix[, start[, end]])

   Return :py:obj:`True` if the string ends with the specified ``suffix``, otherwise return :py:obj:`False`. ``suffix`` can also be a tuple of suffixes to look for.

   >>> 'foo'.endswith(('oo', 'ar'))
   True
   >>> 'bar'.endswith(('oo', 'ar'))
   True
   >>> 'baz'.endswith(('oo', 'ar'))
   False


.. py:function:: str.expandtabs(tabsize=8)

   Return a copy of the string where all tab characters are replaced by one or more spaces, depending on the current column and the given ``tabsize``.

   >>> 'foo\tbar\tbaz'.expandtabs(4)
   'foo bar baz'
   >>> 'foo\tbar\tbaz'.expandtabs(6)
   'foo   bar   baz'


.. py:function:: str.find(sub[, start[, end]])

   Return the lowest **index** in the string where substring ``sub`` is found within the slice ``s[start:end]``. Return ``-1`` if ``sub`` is not found.

   >>> 'foobarbazbob'.find('ob')
   2

   .. tip:: Use the :keyword:`in` operator (described later) to determine if the sub-string exists at all in the string. Only use :py:meth:`str.find` to get the index. Membership testing is faster with the :keyword:`in` operator and it is more intuitive to read.


.. py:function:: str.format(*args, **kwargs)

   Return a copy of the string formatted according to the given arguments. Each ``{}`` replacement field is filled in from the positional or keyword arguments.

   >>> 'foo{0}{1}'.format('bar', 'baz')
   'foobarbaz'
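
Replacement fields can be numbered, named, or left empty, and each may carry a format specification after a colon. The following is a short sketch of the more common forms; the names and values are made up for illustration.

```python
auto = '{} + {} = {}'.format(1, 2, 1 + 2)            # fields filled in order
indexed = '{1}{0}'.format('bar', 'foo')              # fields chosen by position
named = '{name} is {age}'.format(name='Ann', age=30) # fields chosen by keyword
padded = '[{:>8.2f}]'.format(3.14159)                # right-aligned, 2 decimals
centered = '[{:^7}]'.format('foo')                   # centered in 7 columns
hexed = '{:#x}'.format(255)                          # hexadecimal with prefix

for result in (auto, indexed, named, padded, centered, hexed):
    print(result)
```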


.. py:function:: str.index(sub[, start[, end]])

   Like :py:meth:`str.find`, but raises :py:exc:`ValueError` when the substring is not found.

   >>> 'foobarbazbob'.index('cc')
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   ValueError: substring not found

   >>> 'foobarbazbob'.find('cc')
   -1
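
The optional ``start`` argument makes it possible to walk through every occurrence of a substring. The helper below, ``find_all``, is a hypothetical function written for this example and is not part of :py:class:`str`.

```python
def find_all(text, sub):
    """Return the start index of every non-overlapping occurrence of sub."""
    if not sub:
        raise ValueError('sub must not be empty')
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        # Resume the search just past the match that was found.
        start = text.find(sub, start + len(sub))
    return positions


print(find_all('foobarbazbob', 'ob'))   # [2, 10]
print(find_all('foobarbazbob', 'cc'))   # []

# When only membership matters, the in operator is enough.
print('ob' in 'foobarbazbob')           # True
```

The loop relies on :py:meth:`str.find` returning ``-1`` rather than raising, which is why it is used here instead of :py:meth:`str.index`.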

.. py:function:: str.isalnum()

   Return :py:obj:`True` if all characters in the string are alphanumeric and there is at least one character, :py:obj:`False` otherwise.

   >>> 'foo'.isalnum()
   True
   >>> 'foo3'.isalnum()
   True
   >>> '-foo3'.isalnum()
   False


.. py:function:: str.isalpha()

   Return :py:obj:`True` if all characters in the string are alphabetic and there is at least one character, :py:obj:`False` otherwise.

   >>> 'foo'.isalpha()
   True
   >>> 'foo3'.isalpha()
   False
   >>> '-foo3'.isalpha()
   False


.. py:function:: str.isdecimal()

   Return :py:obj:`True` if all characters in the string can be used to form a base 10 value and there is at least one character, :py:obj:`False` otherwise. For info on Unicode decimals vs digits, refer `here <NumeralsByNumericProperty_>`_.

   >>> '123'.isdecimal()
   True
   >>> '123٣'.isdecimal()
   True
   >>> '123٣①'.isdecimal()
   False


.. py:function:: str.isdigit()

   Return :py:obj:`True` if all characters in the string are digits (decimal characters and characters that need special handling) and there is at least one character, :py:obj:`False` otherwise. For info on Unicode decimals vs digits, refer `here <NumeralsByNumericProperty_>`_.

   >>> '123'.isdigit()
   True
   >>> '123٣'.isdigit()
   True
   >>> '123٣①'.isdigit()
   True


.. py:function:: str.isidentifier()

   Return :py:obj:`True` if the string is a valid identifier according to the language definition.

   >>> 'foo'.isidentifier()
   True
   >>> '0foo'.isidentifier()
   False


.. py:function:: str.islower()

   Return :py:obj:`True` if all cased characters in the string are lowercase and there is at least one cased character, :py:obj:`False` otherwise.

   >>> 'foo'.islower()
   True
   >>> 'Foo'.islower()
   False


.. py:function:: str.isnumeric()

   Return :py:obj:`True` if all characters in the string are numeric characters, and there is at least one character, :py:obj:`False` otherwise.

   >>> '123'.isnumeric()
   True
   >>> '123f'.isnumeric()
   False
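
The three numeric tests form a hierarchy: every decimal character is a digit, and every digit is numeric, but not the other way round. The sketch below tabulates a few sample characters; the selection is illustrative only.

```python
samples = ['7', '٣', '²', '①', '½', 'x']

# Map each character to (isdecimal, isdigit, isnumeric).
results = {ch: (ch.isdecimal(), ch.isdigit(), ch.isnumeric()) for ch in samples}

for ch, flags in results.items():
    print(repr(ch), flags)

# Characters that pass isdecimal() can be converted with int().
print(int('١٢٣'))   # 123
```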


.. py:function:: str.isprintable()

   Return :py:obj:`True` if all characters in the string are printable or the string is empty, :py:obj:`False` otherwise.

   >>> 'foo'.isprintable()
   True
   >>> 'foo\x01'.isprintable()
   False


.. py:function:: str.isspace()

   Return :py:obj:`True` if there are only whitespace characters in the string and there is at least one character, :py:obj:`False` otherwise.

   >>> '\t\n '.isspace()
   True
   >>> '\t\n -'.isspace()
   False


.. py:function:: str.istitle()

   Return :py:obj:`True` if the string is a titlecased string and there is at least one character, :py:obj:`False` otherwise.

   >>> 'Foo'.istitle()
   True
   >>> 'FOo'.istitle()
   False


.. py:function:: str.isupper()

   Return :py:obj:`True` if all cased characters in the string are uppercase and there is at least one cased character, :py:obj:`False` otherwise.

   >>> 'FOO'.isupper()
   True
   >>> 'FOo'.isupper()
   False


.. py:function:: str.join(iterable)

   Return a string which is the concatenation of the strings in ``iterable``. A :py:exc:`TypeError` will be raised if there are any non-string values in ``iterable``, including :py:class:`bytes` objects. The separator between elements is the string on which this method is called.

   >>> ''.join(['Foo', 'Bar'])
   'FooBar'
   >>> '-'.join(['Foo', 'Bar'])
   'Foo-Bar'
   >>> 'then'.join(['Foo', 'Bar'])
   'FoothenBar'
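
Because :py:meth:`str.join` accepts only strings, mixed data has to be converted first. A minimal sketch, using an arbitrary list of values:

```python
values = [1, 2.5, 'three']

# Joining non-string items raises TypeError.
try:
    ', '.join(values)
except TypeError as exc:
    print('join failed:', exc)

# Convert each item to str before joining.
joined = ', '.join(str(v) for v in values)
print(joined)   # 1, 2.5, three
```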


.. py:function:: str.ljust(width[, fillchar])

   Return the string left justified in a new string of length ``width``. Padding is done using the specified ``fillchar`` (default is an ASCII space). The original string is returned if ``width`` is less than or equal to its length.

   >>> 'foo'.ljust(5)
   'foo  '
   >>> 'foo'.ljust(5, '-')
   'foo--'


.. py:function:: str.lower()

   Return a copy of the string with all the cased characters converted to lowercase.

   >>> 'FOO'.lower()
   'foo'
   >>> 'Foo'.lower()
   'foo'


.. py:function:: str.lstrip([chars])

   Return a copy of the string with leading characters removed. The ``chars`` argument is a string specifying the set of characters to be removed. If omitted or :py:obj:`None`, the ``chars`` argument defaults to removing whitespace. The ``chars`` argument is not a prefix; rather, all combinations of its values are stripped.

   >>> '  foobarbaz  '.lstrip()
   'foobarbaz  '
   >>> '  foobarbaz  '.lstrip(' fo')
   'barbaz  '
   >>> '  foobarbaz  '.lstrip(' fob')
   'arbaz  '
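
Treating ``chars`` as a prefix is a common mistake. The sketch below contrasts :py:meth:`str.lstrip` with ``remove_prefix``, a hypothetical helper written for this example (Python 3.9 and later also provide a built-in ``str.removeprefix`` method).

```python
def remove_prefix(text, prefix):
    """Hypothetical helper: drop prefix once if text starts with it."""
    if prefix and text.startswith(prefix):
        return text[len(prefix):]
    return text


url = 'www.wwdomain.com'

# lstrip removes every leading 'w' and '.', not just the text 'www.'.
print(url.lstrip('w.'))             # domain.com

# The helper removes exactly one leading 'www.'.
print(remove_prefix(url, 'www.'))   # wwdomain.com
```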


.. py:staticmethod:: str.maketrans(x[, y[, z]])

   Used to create a translation **table** that maps characters to characters. The actual translation of a string using this table is done by the :py:meth:`str.translate` method. A small, by no means exhaustive example of this method is shown below:

   >>> tbl = str.maketrans("abc", "ghi")
   >>> "aabbcc 123 xyz".translate(tbl)
   'gghhii 123 xyz'
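
:py:meth:`str.maketrans` also accepts a third string of characters to delete, or a single dictionary argument. The following sketch shows both forms; the sample strings are illustrative only.

```python
# Three-argument form: characters in the third string are deleted.
tbl = str.maketrans('abc', 'ghi', '123')
deleted = 'aabbcc 123 xyz'.translate(tbl)
print(repr(deleted))    # 'gghhii  xyz'

# One-argument form: a dict mapping single characters to replacement
# strings, or to None to delete the character.
tbl = str.maketrans({'<': '&lt;', '>': '&gt;', '\n': None})
escaped = '<b>bold</b>\n'.translate(tbl)
print(escaped)          # &lt;b&gt;bold&lt;/b&gt;
```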


.. py:function:: str.partition(sep)

   Split the string at the first occurrence of ``sep``, and return a 3-tuple containing the part before the separator, the separator itself, and the part after the separator. If the separator is not found, return a 3-tuple containing the string itself, followed by two empty strings.

   >>> 'foo-bar-baz'.partition('-')
   ('foo', '-', 'bar-baz')


.. py:function:: str.replace(old, new[, count])

   Return a copy of the string with all occurrences of substring ``old`` replaced by ``new``. If the optional argument ``count`` is given, only the first ``count`` occurrences are replaced.

   >>> 'Foo McFoo lived in a Bar'.replace('Foo', 'Baz')
   'Baz McBaz lived in a Bar'
   >>> 'Foo McFoo lived in a Bar'.replace('Foo', 'Baz', 1)
   'Baz McFoo lived in a Bar'


.. py:function:: str.rfind(sub[, start[, end]])

   Return the highest index in the string where substring ``sub`` is found, such that ``sub`` is contained within ``s[start:end]``. Return ``-1`` if ``sub`` is not found.

   >>> 'foobarbazbob'.rfind('ob')
   10

   .. tip:: Use the :keyword:`in` operator (described later) to determine if the sub-string exists at all in the string. Only use :py:meth:`str.rfind` to get the index. Membership testing is faster with the :keyword:`in` operator and it is more intuitive to read.


.. py:function:: str.rindex(sub[, start[, end]])

   Like :py:meth:`str.rfind` but raises :py:exc:`ValueError` when the substring ``sub`` is not found.

   >>> 'foobarbazbob'.rindex('cc')
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   ValueError: substring not found

   >>> 'foobarbazbob'.rfind('cc')
   -1


.. py:function:: str.rpartition(sep)

   Split the string at the last occurrence of ``sep``, and return a 3-tuple containing the part before the separator, the separator itself, and the part after the separator. If the separator is not found, return a 3-tuple containing two empty strings, followed by the string itself.

   >>> 'foo-bar-baz'.rpartition('-')
   ('foo-bar', '-', 'baz')
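
Because both methods always return exactly three items, the result can be unpacked without checking its length. The sketch below parses a made-up ``key = value`` line and then takes a path apart.

```python
line = 'path = /usr/local/lib/archive.tar.gz'

# Split once at the first '=' and trim the surrounding spaces.
key, sep, value = line.partition('=')
key, value = key.strip(), value.strip()
print(key)          # path
print(value)        # /usr/local/lib/archive.tar.gz

# Split once at the last '/' and then at the last '.'.
directory, _, filename = value.rpartition('/')
stem, _, extension = filename.rpartition('.')
print(directory)    # /usr/local/lib
print(filename)     # archive.tar.gz
print(extension)    # gz

# When the separator is missing, the middle item is an empty string.
print('no separator here'.partition('='))   # ('no separator here', '', '')
```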


.. py:function:: str.rsplit(sep=None, maxsplit=-1)

   Return a list of the words in the string, using ``sep`` as the delimiter string. If ``maxsplit`` is given, at most ``maxsplit`` splits are done, the rightmost ones. If ``sep`` is not specified or :py:obj:`None`, any whitespace string is a separator.

   >>> 'foo-bar-baz'.rsplit('-')
   ['foo', 'bar', 'baz']
   >>> 'foo-bar-baz'.rsplit('-', 1)
   ['foo-bar', 'baz']


.. py:function:: str.rstrip([chars])

   Return a copy of the string with trailing characters removed. The ``chars`` argument is a string specifying the set of characters to be removed. If omitted or :py:obj:`None`, the ``chars`` argument defaults to removing whitespace. The ``chars`` argument is not a suffix; rather, all combinations of its values are stripped.

   >>> '  foobarbaz  '.rstrip()
   '  foobarbaz'
   >>> '  foobarbaz  '.rstrip(' za')
   '  foobarb'
   >>> '  foobarbaz  '.rstrip(' zba')
   '  foobar'


.. py:function:: str.split(sep=None, maxsplit=-1)

   Return a list of the words in the string, using ``sep`` as the delimiter string. If ``maxsplit`` is given, at most ``maxsplit`` splits are done.

   >>> 'foo-bar-baz'.split("-")
   ['foo', 'bar', 'baz']
   >>> 'foo--bar-baz'.split("-")
   ['foo', '', 'bar', 'baz']
   >>> 'foo-bar-baz'.split("-", 1)
   ['foo', 'bar-baz']

   If sep is not specified or is :py:obj:`None`, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace. Consequently, splitting an empty string or a string consisting of just whitespace with a :py:obj:`None` separator returns ``[]``.

   >>> 'foo bar baz'.split()
   ['foo', 'bar', 'baz']
   >>> 'foo  bar  baz'.split()
   ['foo', 'bar', 'baz']
   >>> 'foo  bar        baz'.split()
   ['foo', 'bar', 'baz']
   >>> '       foo  bar        baz      '.split()
   ['foo', 'bar', 'baz']


   .. tip:: The :py:meth:`str.split` method is that friend of yours who pays for your apartment, buys your food and introduces you to the person who becomes your spouse later in life. Get to know it :)
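
The difference between the two splitting algorithms shows up as soon as the input has repeated or surrounding spaces. A small sketch with made-up input:

```python
line = '  alpha   beta gamma '

# No separator: runs of whitespace count once, and the ends are trimmed.
words = line.split()
print(words)    # ['alpha', 'beta', 'gamma']

# Explicit ' ' separator: every single space splits, leaving empty strings.
pieces = line.split(' ')
print(pieces)   # ['', '', 'alpha', '', '', 'beta', 'gamma', '']

# maxsplit keeps the remainder of the string intact in the last item.
record = 'ERROR 42 disk almost full'
level, code, message = record.split(None, 2)
print(message)  # disk almost full
```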


.. py:function:: str.splitlines([keepends])

   Return a list of the lines in the string, breaking at line boundaries. Line breaks are not included in the resulting list unless ``keepends`` is given and :py:obj:`True`.

   >>> 'foo\nbar\rbaz\r\nbif\ffaf'.splitlines()
   ['foo', 'bar', 'baz', 'bif', 'faf']
   >>> 'foo\nbar\rbaz\r\nbif\ffaf'.splitlines(True)
   ['foo\n', 'bar\r', 'baz\r\n', 'bif\x0c', 'faf']
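
:py:meth:`str.splitlines` is not the same as splitting on ``'\n'``. The sketch below shows how the two differ for a trailing new-line, Windows-style line endings, and the empty string.

```python
text = 'first\nsecond\r\nthird\n'

by_lines = text.splitlines()
by_newline = text.split('\n')

print(by_lines)     # ['first', 'second', 'third']
print(by_newline)   # ['first', 'second\r', 'third', '']

# The empty string has no lines, but splitting it still yields one item.
print(''.splitlines())   # []
print(''.split('\n'))    # ['']
```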


.. py:function:: str.startswith(prefix[, start[, end]])

   Return :py:obj:`True` if the string starts with ``prefix``, otherwise return :py:obj:`False`. ``prefix`` can also be a tuple of prefixes to look for. With optional ``start``, test the string beginning at that position. With optional ``end``, stop comparing the string at that position.

   >>> 'foobarbaz'.startswith(('foo', 'baz'))
   True
   >>> 'barbazfoo'.startswith(('foo', 'baz'))
   False
   >>> 'bazfoobar'.startswith(('foo', 'baz'))
   True
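
A typical use of the tuple form is filtering names by prefix or suffix. The file names below are invented for the example.

```python
names = ['report.csv', 'notes.txt', 'data.CSV', 'image.png', '.hidden.txt']

# Lowercase first so that the suffix test ignores case.
text_like = [n for n in names if n.lower().endswith(('.csv', '.txt'))]
print(text_like)    # ['report.csv', 'notes.txt', 'data.CSV', '.hidden.txt']

visible = [n for n in text_like if not n.startswith('.')]
print(visible)      # ['report.csv', 'notes.txt', 'data.CSV']

# The optional start argument tests from a given position.
print('foobarbaz'.startswith('bar', 3))   # True

# The collection of candidates must be a tuple; a list raises TypeError.
try:
    'foo'.endswith(['oo', 'ar'])
except TypeError:
    print('a list is not accepted')
```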


.. py:function:: str.strip([chars])

   Return a copy of the string with the leading and trailing characters removed. The ``chars`` argument is a string specifying the set of characters to be removed. If omitted or :py:obj:`None`, the ``chars`` argument defaults to removing whitespace. The ``chars`` argument is not a prefix or suffix; rather, all combinations of its values are stripped. Characters are removed from the leading end until a character that is not in ``chars`` is reached, and the same is then done from the trailing end.

   >>> '  foobarbaz  '.strip()
   'foobarbaz'
   >>> 'foobarbaz  '.strip(' za')
   'foobarb'
   >>> '  foobarbaz  '.strip(' zba')
   'foobar'


.. py:function:: str.swapcase()

   Return a copy of the string with uppercase characters converted to lowercase and vice versa.

   >>> 'FooBarBaz'.swapcase()
   'fOObARbAZ'


.. py:function:: str.title()

   Return a titlecased version of the string where words start with an uppercase character and the remaining characters are lowercase.

   >>> 'foo bar baz'.title()
   'Foo Bar Baz'

   .. warning:: Unfortunately, this method treats any character that is not a letter as a word boundary, so it does not work well with apostrophes, for example.

      >>> "it's mine".title()
      "It'S Mine"

      Refer `here <StringTitleWorkaround_>`_ for discussion and workaround.
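
One possible workaround, along the lines of the approach in the Python documentation, uses a regular expression that treats an apostrophe followed by letters as part of the word. The function name ``titlecase`` is chosen for this sketch, and the pattern only handles ASCII letters.

```python
import re


def titlecase(text):
    """Capitalize each word, keeping an apostrophe suffix in the same word."""
    return re.sub(r"[A-Za-z]+('[A-Za-z]+)?",
                  lambda match: match.group(0).capitalize(),
                  text)


print("it's mine".title())      # It'S Mine
print(titlecase("it's mine"))   # It's Mine
```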


.. py:function:: str.translate(table)

   Return a copy of the string in which each character has been mapped through the given translation table. The table is generated using :py:meth:`str.maketrans`. A small, by no means exhaustive example of this method is shown below:

   >>> tbl = str.maketrans("abc", "ghi")
   >>> "aabbcc 123 xyz".translate(tbl)
   'gghhii 123 xyz'


.. py:function:: str.upper()

   Return a copy of the string with all the cased characters converted to uppercase.

   >>> 'foo'.upper()
   'FOO'


.. py:function:: str.zfill(width)

   Return a copy of the string left filled with ASCII ``'0'`` digits to make a string of length ``width``. A leading sign prefix (``'+'`` or ``'-'``) is handled by inserting the padding after the sign character rather than before. The original string is returned if ``width`` is less than or equal to ``len(s)``.

   >>> '649'.zfill(8)
   '00000649'
   >>> '-649'.zfill(8)
   '-0000649'
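
The sign handling is what separates :py:meth:`str.zfill` from plain padding with ``'0'``. The sketch below compares it with ``str.rjust`` and with the built-in :py:func:`format` function applied to an integer.

```python
zfilled = '-649'.zfill(8)
rjusted = '-649'.rjust(8, '0')
formatted = format(-649, '08d')

print(zfilled)      # -0000649
print(rjusted)      # 0000-649 (the sign ends up in the middle)
print(formatted)    # -0000649

# A string that is already long enough is returned unchanged.
print('12345'.zfill(3))   # 12345
```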


.. admonition:: Try it!
   :class: TryIt

   Take some time to experiment with the string specific methods that look interesting to you. Use the strings provided in the examples, or try your own!


.. _PythonStringMethods: https://docs.python.org/3.5/library/stdtypes.html#string-methods
.. _NumeralsByNumericProperty: https://en.wikipedia.org/wiki/Numerals_in_Unicode#Numerals_by_numeric_property

.. _StringTitleWorkaround: https://docs.python.org/3.5/library/stdtypes.html#str.title