Sequence Types - String

String

MUTABILITY: Immutable

Text sequences are represented in Python 3 by the str type. A str holds Unicode code points and supports the full Unicode range (U+0000 to U+10FFFF). In Python 2, by contrast, str was a byte string, and Unicode text needed the separate unicode type.

There is no separate char type for single characters; a single character is simply a string of length one.
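A minimal sketch of the two points above (MUTABILITY: Immutable, and no char type) follows. Indexing a string gives back another str of length one. Because strings are immutable, "changing" one really means building a new string. The variable names are only illustrative.

```python
word = "python"

# Indexing yields a str of length 1; there is no separate char type.
first = word[0]
print(type(first).__name__, len(first))  # str 1

# Strings are immutable: item assignment raises TypeError.
try:
    word[0] = "P"
except TypeError as exc:
    print("TypeError:", exc)

# Build a new string instead; the original is left unchanged.
capitalized = "P" + word[1:]
print(capitalized, word)  # Python python
```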

Note

There is the str class, and there is also an importable module called string. Although the names are confusingly similar, strings and their methods are defined by str. The string module only adds some constants (such as string.ascii_letters) and helper functions and classes for working with strings; it does not define the string type itself.

String literals can be created in a variety of ways:

Using single or double quotes (use the opposite style to any quotes embedded in the text). For example:

>>> foo = "I often hear, 'Python is Fun', which is true."
>>> foo
"I often hear, 'Python is Fun', which is true."
>>> foo = 'I often hear, "Python is Fun", which is true.'
>>> foo
'I often hear, "Python is Fun", which is true.'

Using triple quotes (either """ or '''). A triple-quoted string can span several lines without closing and reopening quotes on each one. Every line break is stored in the string as \n, and any indentation on the continuation lines becomes part of the string too. Triple-quoted strings can also freely contain both single and double quotes. For example:
```
>>> foo = '''This is the 'first' line.
... This is the "second" line'''
>>> foo
'This is the \'first\' line.\nThis is the "second" line'
```
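If you don’t want the newlines (\n) embedded in the string, end each line segment with a backslash (\). The backslash and the line break after it are both dropped, so add a space before the \ if you want the segments separated. For example:
```
>>> foo = '''This is the 'first' line.\
... This is the "second" line'''
>>> foo
'This is the \'first\' line.This is the "second" line'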
```
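Using adjacent string literals (single, double or triple quoted) inside parentheses. Python joins adjacent string literals into a single string. Inside parentheses a line break does not end the statement, so the pieces can sit on separate lines. Square brackets or braces would also allow the line break, but they would create a list or set containing the string rather than a string. For example: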
```
>>> foo = ('abcd'
...        'efgh')
>>> foo
'abcdefgh'
```
If you need a newline inside a string constructed this way, add a \n where appropriate. For example:
```
>>> foo = ("Error number: 5\n"
...        "Error message: Can't open file")
>>> foo
"Error number: 5\nError message: Can't open file"
```
Tip

This method of creating strings is very handy when you need to improve the readability of source code or keep lines within a certain length in a clean way.
Using chr(), which returns the character given by the Unicode code point passed to it.
```
>>> chr(65)
'A'
>>> chr(0x470)
'Ѱ'
```
Tip

Use the ord() function to convert a character into its Unicode code point (i.e. the inverse of chr()).
Using the str constructor, which returns the string version of the object passed to it. Used this way, it is a type conversion. For example:
```
>>> foo = str(1234)
>>> foo
'1234'
```
From a bytes object (a sequence of integers in the range 0 to 255; covered later in this chapter), using the bytes.decode() method. For example:
```
>>> my_bytes = b'0xDEADBEAF'
>>> foo = my_bytes.decode()
>>> foo
'0xDEADBEAF'
>>> type(foo)
<class 'str'>
```
Note

Bytes literals look like string literals but are prefixed with the character b or B. Bytes outside the printable ASCII range are displayed as \x hex escapes, one per byte. For example:
```
>>> 'Ѱ'.encode()
b'\xd1\xb0'
```
Tip

You can get bytes from a string object using the str.encode() method. For example:
```
>>> my_str = '0xDEADBEAF'
>>> foo = my_str.encode(encoding='utf-8')
>>> foo
b'0xDEADBEAF'
>>> type(foo)
<class 'bytes'>
```
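The conversions above fit together as round trips. Here is a short sketch, assuming UTF-8 as the encoding. It shows that ord() and chr() are inverses and that encode() and decode() undo each other. It also shows that len() counts code points for a str but bytes for a bytes object.

```python
text = "AѰ"

# ord() and chr() are inverses of each other.
codes = [ord(ch) for ch in text]          # [65, 1136]
rebuilt = "".join(chr(c) for c in codes)
assert rebuilt == text

# encode() turns a str into bytes; decode() reverses it (UTF-8 here).
data = text.encode("utf-8")               # b'A\xd1\xb0'
assert data.decode("utf-8") == text

# len() counts code points for str, but bytes for bytes.
print(len(text), len(data))               # 2 3
```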

Python supports the typical set of escape sequences (character sequences that start with a \) found in other languages, such as \n, \t and \\. Escape sequences can also insert any Unicode character into a string literal:

Table 2 Python String Literal Character Escapes
Escape	Description
\xhh	Character with the 8-bit hex value hh (exactly 2 hex digits)
\uhhhh	Character with the 16-bit hex value hhhh (exactly 4 hex digits)
\Uhhhhhhhh	Character with the 32-bit hex value hhhhhhhh (exactly 8 hex digits)
\N{name}	Character with the given Unicode name

For example:

>>> foo = "\x41 \u27F0 \U0001F61B \N{arc}"
>>> foo
'A ⟰ 😛 ⌒'
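Each escape in Table 2 is just another spelling of a code point. As a quick check, every entry in the list below denotes the same one-character string "A", which is U+0041 with the Unicode name LATIN CAPITAL LETTER A.

```python
# Four escape spellings of U+0041, plus chr(), all produce the same string.
spellings = ["\x41", "\u0041", "\U00000041", "\N{LATIN CAPITAL LETTER A}", chr(0x41)]
print(spellings)                         # ['A', 'A', 'A', 'A', 'A']
print(all(s == "A" for s in spellings))  # True

# \x takes exactly two hex digits, so characters above U+00FF need \u or \U.
psi = "\u0470"
print(psi == "\U00000470" == chr(0x470))  # True
```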

If you want to prevent the parser from expanding escape sequences, or from tripping up on embedded backslashes that you want taken literally, create a raw string by placing an r (or R) before the opening quote. For example:

>>> foo = "a \x62 c \N{yin yang}"
>>> foo
'a b c ☯'
>>> foo = r"a \x62 c \N{yin yang}"
>>> foo
'a \\x62 c \\N{yin yang}'

Tip

Raw strings are very useful when writing regular expressions, which often contain many backslashes.
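Here is a small illustration of the tip using the standard re module. The patterns and sample text are arbitrary. The sketch shows that a raw string and a doubled-backslash string produce the same pattern. It also shows how forgetting both turns \b (a regex word boundary) into a backspace character.

```python
import re

# In a normal string literal, each regex backslash must itself be escaped.
pattern_escaped = "\\d+"
# A raw string keeps the backslash as written, which reads more naturally.
pattern_raw = r"\d+"

print(pattern_escaped == pattern_raw)          # True
print(re.findall(pattern_raw, "a1 b22 c333"))  # ['1', '22', '333']

# Without r or doubling, "\b" is a backspace character, not a word boundary.
print("\b" == "\x08", r"\b" == "\\b")          # True True
print(re.findall(r"\bcat\b", "cat concat cat"))  # ['cat', 'cat']
```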

Try it!

Write a string of your choice using single and/or double quotes.
Try embedding strings inside that string, using the alternate style of quotes.
Try writing a long, multi-line string, using triple quotes. Use a \ on some lines to see the effect.
Use parentheses to concatenate multiple strings into one.
Get the ASCII character that corresponds to the value 72.
Get the numeric value (code point) of the ASCII character “Z”.
Explicitly convert an integer to a string.
Create a single string using character escape sequences from the values 0x48, 0x65, 0x6C, 0x6C and 0x6F. Then make it into a raw string. Notice the difference.

String Specific Methods

The str class has a rich set of methods that spare the programmer from writing common, mundane, and sometimes error-prone string operations by hand. Below is a brief description of most of them. For full syntax information, refer to the Python string method documentation.

str.capitalize()

Return a copy of the string with its first character capitalized and the rest lowercased.

>>> 'foo'.capitalize()
'Foo'

str.center(width[, fillchar])

Return the string centered in a string of length width. Padding is done using the specified fillchar (default is an ASCII space). The original string is returned if width is less than or equal to len(s).

>>> 'foo'.center(5)
' foo '
>>> 'foo'.center(5, '-')
'-foo-'

str.count(sub[, start[, end]])

Return the number of non-overlapping occurrences of substring sub within the slice s[start:end]. The optional start and end arguments are interpreted as in slice notation.

>>> 'ababbbabcd'.count('ab')
3
>>> 'ababbbabcd'.count('ab', 4)
1
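"Non-overlapping" and the slice-style start/end are easiest to see with a tiny example. The sketch below uses arbitrary sample strings. The overlapping count at the end is just one illustrative approach using str.startswith(); it is not something count() can do.

```python
s = "aaaa"

# count() skips past each match, so matches never overlap.
print(s.count("aa"))  # 2, not 3

# start and end behave like slice bounds.
text = "ababbbabcd"
print(text.count("ab", 4) == text[4:].count("ab"))    # True
print(text.count("ab", 0, 3), text[0:3].count("ab"))  # 1 1

# Counting overlapping matches needs another approach,
# e.g. testing every start index with startswith().
overlapping = sum(1 for i in range(len(s)) if s.startswith("aa", i))
print(overlapping)  # 3
```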

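str.encode(encoding="utf-8", errors="strict")

Return the string encoded as a bytes object. The default encoding is 'utf-8'. The errors argument controls what happens with characters the encoding cannot represent; the default, 'strict', raises a UnicodeEncodeError.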

>>> 'foo'.encode()
b'foo'
>>> 'Ѱ'.encode()
b'\xd1\xb0'
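The errors argument only matters when the chosen encoding cannot represent a character. This sketch encodes an arbitrary word containing "é" to ASCII under three of the standard error handlers, then to UTF-8 for comparison.

```python
word = "café"

# The default errors="strict" raises on characters the codec cannot represent.
try:
    word.encode("ascii")
except UnicodeEncodeError as exc:
    print("cannot encode:", exc.object[exc.start:exc.end])  # cannot encode: é

# Other error handlers substitute or drop the offending characters.
print(word.encode("ascii", errors="replace"))  # b'caf?'
print(word.encode("ascii", errors="ignore"))   # b'caf'
print(word.encode("utf-8"))                    # b'caf\xc3\xa9'
```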

str.endswith(suffix[, start[, end]])

Return True if the string ends with the specified suffix, otherwise return False. suffix can also be a tuple of suffixes to look for.

>>> 'foo'.endswith(('oo', 'ar'))
True
>>> 'bar'.endswith(('oo', 'ar'))
True
>>> 'baz'.endswith(('oo', 'ar'))
False

str.expandtabs(tabsize=8)

Return a copy of the string where all tab characters are replaced by one or more spaces, depending on the current column and the given tabsize.

>>> 'foo\tbar\tbaz'.expandtabs(4)
'foo bar baz'
>>> 'foo\tbar\tbaz'.expandtabs(6)
'foo   bar   baz'
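To make "depending on the current column" concrete, here is a hypothetical pure-Python expand_tabs() written only for illustration. It models the built-in's rules as follows:

- each tab pads to the next multiple of tabsize;
- \n and \r reset the column;
- a tabsize of 0 or less simply drops tabs.

The loop at the end checks it against str.expandtabs() on a few samples. It is a sketch for understanding, not a replacement for the built-in.

```python
def expand_tabs(s: str, tabsize: int = 8) -> str:
    """Hypothetical re-implementation of str.expandtabs(), for illustration."""
    out = []
    column = 0
    for ch in s:
        if ch == "\t":
            if tabsize > 0:
                # Pad up to the next tab stop (the next multiple of tabsize).
                pad = tabsize - column % tabsize
                out.append(" " * pad)
                column += pad
            # With tabsize <= 0 the tab is dropped.
        elif ch in "\r\n":
            # Line breaks reset the column count.
            out.append(ch)
            column = 0
        else:
            out.append(ch)
            column += 1
    return "".join(out)


for sample in ["foo\tbar\tbaz", "a\tbb\tccc", "x\ty\nab\tc"]:
    assert expand_tabs(sample, 4) == sample.expandtabs(4)

print(repr("a\tbb\tccc".expandtabs(4)))  # 'a   bb  ccc'
```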

str.find(sub[, start[, end]])

Return the lowest index in the string where substring sub is found within the slice s[start:end]. Return -1 if sub is not found.

>>> 'foobarbazbob'.find('ob')
2

Tip

Use the in operator (described later) to determine whether the substring exists in the string at all, and use str.find() only when you need the index. Membership testing with the in operator is more intuitive to read and typically faster.
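The tip matters partly because find() signals "not found" with -1 rather than a false value. This sketch reuses the sample string from the example above. It shows the classic bug of using find()'s result directly as a condition, and then the explicit comparison that avoids it.

```python
s = "foobarbazbob"

# Use `in` when you only need a yes/no answer.
print("bar" in s)     # True

# find() returns an index, or -1 when there is no match...
print(s.find("foo"))  # 0
print(s.find("cc"))   # -1

# ...so using its result directly as a condition is a bug:
# index 0 is falsy, while -1 is truthy.
if s.find("foo"):
    print("not printed, even though 'foo' is present")
if s.find("cc"):
    print("printed, even though 'cc' is absent")

# If you do need find(), compare against -1 explicitly.
if s.find("foo") != -1:
    print("'foo' found at index", s.find("foo"))
```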

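str.format(*args, **kwargs)

Return a copy of the string formatted according to the given arguments. Replacement fields are delimited by braces {} and refer to a positional argument by index or a keyword argument by name.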

>>> 'foo{0}{1}'.format('bar', 'baz')
'foobarbaz'
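Replacement fields can do more than the single example above. This sketch uses made-up values to show reordered positional fields and named fields. It also shows a few format specs after the colon: precision, right alignment, and centering with a fill character.

```python
# Positional fields can be reordered or reused by index.
print("{1}-{0}-{1}".format("a", "b"))                          # b-a-b

# Named fields read more clearly for longer templates.
print("{name} is {age} years old".format(name="Ada", age=36))  # Ada is 36 years old

# A format spec after ':' controls presentation.
print("{:.2f}".format(3.14159))   # 3.14
print("[{:>6}]".format("foo"))    # [   foo]
print("[{:-^7}]".format("foo"))   # [--foo--]
```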

str.index(sub[, start[, end]])

Like str.find(), but raises ValueError when the substring is not found.

>>> 'foobarbazbob'.index('cc')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: substring not found

>>> 'foobarbazbob'.find('cc')
-1

str.isalnum()

Return True if all characters in the string are alphanumeric and there is at least one character, False otherwise.

>>> 'foo'.isalnum()
True
>>> 'foo3'.isalnum()
True
>>> '-foo3'.isalnum()
False

str.isalpha()

Return True if all characters in the string are alphabetic and there is at least one character, False otherwise.

>>> 'foo'.isalpha()
True
>>> 'foo3'.isalpha()
False
>>> '-foo3'.isalpha()
False

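str.isdecimal()

Return True if all characters in the string are decimal characters (characters that can be used to form numbers in base 10) and there is at least one character, False otherwise. Decimal characters include the digits of many scripts, not just 0 to 9; the Arabic-Indic digit ٣ below is one example.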

>>> '123'.isdecimal()
True
>>> '123٣'.isdecimal()
True
>>> '123٣①'.isdecimal()
False

str.isdigit()

Return True if all characters in the string are digits and there is at least one character, False otherwise. Digits include all decimal characters plus digits that need special handling, such as superscript digits and the circled digit ① below.

>>> '123'.isdigit()
True
>>> '123٣'.isdigit()
True
>>> '123٣①'.isdigit()
True

str.isidentifier()

Return True if the string is a valid identifier according to the language definition.

>>> 'foo'.isidentifier()
True
>>> '0foo'.isidentifier()
False

str.islower()

Return True if all cased characters in the string are lowercase and there is at least one cased character, False otherwise.

>>> 'foo'.islower()
True
>>> 'Foo'.islower()
False

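str.isnumeric()

Return True if all characters in the string are numeric characters and there is at least one character, False otherwise. Numeric characters include all digits plus every character that has the Unicode numeric value property, such as vulgar fractions like ½.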

>>> '123'.isnumeric()
True
>>> '123f'.isnumeric()
False
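isdecimal(), isdigit() and isnumeric() only disagree on less common characters. The sketch below compares them on four sample characters:

- the ASCII digit 7;
- the Arabic-Indic digit ٣;
- a superscript two;
- a vulgar fraction.

It also shows that int() accepts a Unicode decimal digit but rejects a superscript. That is one practical reason to prefer isdecimal() when validating input for int().

```python
samples = ["7", "٣", "²", "½"]

for ch in samples:
    print(f"{ch!r}: decimal={ch.isdecimal()} digit={ch.isdigit()} numeric={ch.isnumeric()}")
# '7': decimal=True digit=True numeric=True
# '٣': decimal=True digit=True numeric=True
# '²': decimal=False digit=True numeric=True
# '½': decimal=False digit=False numeric=True

# int() parses Unicode decimal digits, but not superscripts or fractions.
print(int("٣"))  # 3
try:
    int("²")
except ValueError:
    print("int() rejects '²'")
```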

str.isprintable()

Return True if all characters in the string are printable or the string is empty, False otherwise.

>>> 'foo'.isprintable()
True
>>> 'foo\x01'.isprintable()
False

str.isspace()

Return True if there are only whitespace characters in the string and there is at least one character, False otherwise.

>>> '\t\n '.isspace()
True
>>> '\t\n -'.isspace()
False

str.istitle()

Return True if the string is a titlecased string and there is at least one character, False otherwise.

>>> 'Foo'.istitle()
True
>>> 'FOo'.istitle()
False

str.isupper()

Return True if all cased characters in the string are uppercase and there is at least one cased character, False otherwise.

>>> 'FOO'.isupper()
True
>>> 'FOo'.isupper()
False

str.join(iterable)

Return a string which is the concatenation of the strings in iterable. A TypeError will be raised if there are any non-string values in iterable, including bytes objects. The separator between elements is the string providing this method.

>>> ''.join(['Foo', 'Bar'])
'FooBar'
>>> '-'.join(['Foo', 'Bar'])
'Foo-Bar'
>>> 'then'.join(['Foo', 'Bar'])
'FoothenBar'
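Because join() raises TypeError on non-string items, a common pattern is to convert each item first. This is a short sketch with an arbitrary list of integers. It uses map(str, ...) or a generator expression, both standard Python.

```python
numbers = [1, 2, 3]

try:
    "-".join(numbers)
except TypeError as exc:
    print("TypeError:", exc)

# Convert each element to str first.
print("-".join(map(str, numbers)))             # 1-2-3
print(", ".join(str(n * n) for n in numbers))  # 1, 4, 9
```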

str.ljust(width[, fillchar])

Return the string left justified in a string of width characters wide. Padding is done using the specified fillchar (default is an ASCII space).

>>> 'foo'.ljust(5)
'foo  '
>>> 'foo'.ljust(5, '-')
'foo--'

str.lower()

Return a copy of the string with all the cased characters converted to lowercase.

>>> 'FOO'.lower()
'foo'
>>> 'Foo'.lower()
'foo'

str.lstrip([chars])

Return a copy of the string with leading characters removed. The chars argument is a string specifying the set of characters to be removed. If omitted or None, the chars argument defaults to removing whitespace. The chars argument is not a prefix; rather, all combinations of its values are stripped.

>>> '  foobarbaz  '.lstrip()
'foobarbaz  '
>>> '  foobarbaz  '.lstrip(' fo')
'barbaz  '
>>> '  foobarbaz  '.lstrip(' fob')
'arbaz  '
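Because chars is treated as a set and not as a prefix, lstrip() can remove more than intended. For removing an exact prefix once, startswith() plus slicing works in any Python 3 version. The helper name remove_prefix is hypothetical; Python 3.9 and later also provide a built-in str.removeprefix().

```python
def remove_prefix(s: str, prefix: str) -> str:
    """Hypothetical helper: drop `prefix` once if s starts with it."""
    return s[len(prefix):] if s.startswith(prefix) else s


name = "foofoo_bar"
print(name.lstrip("fo"))           # _bar  (every leading 'f' or 'o' is stripped)
print(remove_prefix(name, "foo"))  # foo_bar  (exactly one "foo" is removed)
print(remove_prefix(name, "bar"))  # foofoo_bar  (no match, unchanged)
```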

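static str.maketrans(x[, y[, z]])

Used to create a translation table that maps characters to replacement characters (or to strings, or to None for deletion). The actual translation of a string using this table is done by the str.translate() method. With two strings x and y of equal length, each character in x maps to the character at the same position in y; an optional third string z lists characters to delete. A small, by no means exhaustive example of this method is shown below: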

>>> tbl = str.maketrans("abc", "ghi")
>>> "aabbcc 123 xyz".translate(tbl)
'gghhii 123 xyz'
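The other argument forms of maketrans() are worth a quick look too. This sketch uses arbitrary sample text. It shows that the table is a dict keyed by code points, and that a third string deletes characters. It also shows that a single dict argument can map characters to multi-character strings or to None.

```python
# Two equal-length strings: position-by-position mapping, keyed by code point.
tbl = str.maketrans("abc", "ghi")
print(tbl == {97: 103, 98: 104, 99: 105})    # True

# A third string lists characters to delete.
strip_spaces = str.maketrans("abc", "xyz", " ")
print("a b c".translate(strip_spaces))       # xyz

# A single dict maps characters to strings (any length) or to None (delete).
cleanup = str.maketrans({"-": "_", "!": None, "&": " and "})
print("rock&roll-now!".translate(cleanup))   # rock and roll_now
```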

str.partition(sep)

Split the string at the first occurrence of sep, and return a 3-tuple containing the part before the separator, the separator itself, and the part after the separator. If the separator is not found, return a 3-tuple containing the string itself, followed by two empty strings.

>>> 'foo-bar-baz'.partition('-')
('foo', '-', 'bar-baz')

str.replace(old, new[, count])

Return a copy of the string with all occurrences of substring old replaced by new. If the optional argument count is given, only the first count occurrences are replaced.

>>> 'Foo McFoo lived in a Bar'.replace('Foo', 'Baz')
'Baz McBaz lived in a Bar'
>>> 'Foo McFoo lived in a Bar'.replace('Foo', 'Baz', 1)
'Baz McFoo lived in a Bar'

str.rfind(sub[, start[, end]])

Return the highest index in the string where substring sub is found, such that sub is contained within s[start:end]. Return -1 if sub is not found.

>>> 'foobarbazbob'.rfind('ob')
10

Tip

Use the in operator (described later) to determine whether the substring exists in the string at all, and use str.rfind() only when you need the index. Membership testing with the in operator is more intuitive to read and typically faster.

str.rindex(sub[, start[, end]])

Like str.rfind(), but raises ValueError when the substring sub is not found.

>>> 'foobarbazbob'.rindex('cc')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: substring not found

>>> 'foobarbazbob'.rfind('cc')
-1

str.rpartition(sep)

Split the string at the last occurrence of sep, and return a 3-tuple containing the part before the separator, the separator itself, and the part after the separator. If the separator is not found, return a 3-tuple containing two empty strings, followed by the string itself.

>>> 'foo-bar-baz'.rpartition('-')
('foo-bar', '-', 'baz')
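The "separator not found" behaviour of partition() and rpartition() is easy to overlook. This sketch uses made-up file-name strings. Because the result is always a 3-tuple, it always unpacks cleanly. The empty strings land on opposite sides for the two methods, and an empty middle element signals that no separator was found.

```python
path = "archive.tar.gz"

# The 3-tuple always unpacks cleanly, which makes these handy for parsing.
head, sep, tail = path.partition(".")
print(head, tail)                 # archive tar.gz
base, sep, ext = path.rpartition(".")
print(base, ext)                  # archive.tar gz

# When the separator is missing, the empty strings land on opposite sides.
print("README".partition("."))    # ('README', '', '')
print("README".rpartition("."))   # ('', '', 'README')

# An empty middle element means no separator was found.
key, sep, value = "no_equals_sign".partition("=")
print(bool(sep))                  # False
```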

str.rsplit(sep=None, maxsplit=-1)

Return a list of the words in the string, using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done, the rightmost ones. If sep is not specified or None, any whitespace string is a separator.

>>> 'foo-bar-baz'.rsplit('-')
['foo', 'bar', 'baz']
>>> 'foo-bar-baz'.rsplit('-', 1)
['foo-bar', 'baz']

str.rstrip([chars])

Return a copy of the string with trailing characters removed. The chars argument is a string specifying the set of characters to be removed. If omitted or None, the chars argument defaults to removing whitespace. The chars argument is not a suffix; rather, all combinations of its values are stripped.

>>> '  foobarbaz  '.rstrip()
'  foobarbaz'
>>> '  foobarbaz  '.rstrip(' za')
'  foobarb'
>>> '  foobarbaz  '.rstrip(' zba')
'  foobar'

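str.split(sep=None, maxsplit=-1)

Return a list of the words in the string, using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done. When sep is given, consecutive delimiters are not grouped together; they delimit empty strings instead, as the second example shows.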

>>> 'foo-bar-baz'.split("-")
['foo', 'bar', 'baz']
>>> 'foo--bar-baz'.split("-")
['foo', '', 'bar', 'baz']
>>> 'foo-bar-baz'.split("-", 1)
['foo', 'bar-baz']

If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace. Consequently, splitting an empty string or a string consisting of just whitespace with a None separator returns [].

>>> 'foo bar baz'.split()
['foo', 'bar', 'baz']
>>> 'foo  bar  baz'.split()
['foo', 'bar', 'baz']
>>> 'foo  bar        baz'.split()
['foo', 'bar', 'baz']
>>> '       foo  bar        baz      '.split()
['foo', 'bar', 'baz']
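The two splitting algorithms differ most clearly on the same input. This sketch uses an arbitrary sample with irregular spaces. It compares split() with split(" "), including the empty-string edge case described above.

```python
s = "  a  b "

# sep=None: whitespace runs collapse and leading/trailing whitespace is ignored.
print(s.split())       # ['a', 'b']

# Explicit separator: every occurrence splits, so empty strings appear.
print(s.split(" "))    # ['', '', 'a', '', 'b', '']

# The empty-string edge case differs in the same way.
print("".split())      # []
print("".split(","))   # ['']
```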

Tip

The split() method is that friend of yours who pays for your apartment, buys your food and introduces you to the person who becomes your spouse later in life. Get to know it :)

str.splitlines([keepends])

Return a list of the lines in the string, breaking at line boundaries. Line breaks are not included in the resulting list unless keepends is given and True.

>>> 'foo\nbar\rbaz\r\nbif\ffaf'.splitlines()
['foo', 'bar', 'baz', 'bif', 'faf']
>>> 'foo\nbar\rbaz\r\nbif\ffaf'.splitlines(True)
['foo\n', 'bar\r', 'baz\r\n', 'bif\x0c', 'faf']
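A common reason to prefer splitlines() over split("\n") is how it treats Windows line endings and a trailing newline. Here is a quick comparison on an arbitrary sample string.

```python
text = "first\r\nsecond\nthird\n"

# splitlines() understands \r\n and adds no trailing empty item.
print(text.splitlines())  # ['first', 'second', 'third']

# split("\n") leaves the \r behind and adds '' after the final newline.
print(text.split("\n"))   # ['first\r', 'second', 'third', '']
```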

str.startswith(prefix[, start[, end]])

Return True if the string starts with the prefix, otherwise return False. prefix can also be a tuple of prefixes to look for. With optional start, the test begins at that position. With optional end, comparison stops at that position.

>>> 'foobarbaz'.startswith(('foo', 'baz'))
True
>>> 'barbazfoo'.startswith(('foo', 'baz'))
False
>>> 'bazfoobar'.startswith(('foo', 'baz'))
True

str.strip([chars])

Return a copy of the string with the leading and trailing characters removed. The chars argument is a string specifying the set of characters to be removed. If omitted or None, the chars argument defaults to removing whitespace. The chars argument is not a prefix or suffix; rather, all combinations of its values are stripped. The outermost leading and trailing chars argument values are stripped from the string. Characters are removed from the leading end until reaching a string character that is not contained in the set of characters in chars. A similar action takes place on the trailing end.

>>> '  foobarbaz  '.strip()
'foobarbaz'
>>> 'foobarbaz  '.strip(' za')
'foobarb'
>>> '  foobarbaz  '.strip(' zba')
'foobar'

str.swapcase()

Return a copy of the string with uppercase characters converted to lowercase and vice versa.

>>> 'FooBarBaz'.swapcase()
'fOObARbAZ'

str.title()

Return a titlecased version of the string where words start with an uppercase character and the remaining characters are lowercase.

>>> 'foo bar baz'.title()
'Foo Bar Baz'

Warning

Unfortunately, this method does not handle apostrophes well, because it defines a word simply as a group of consecutive letters. For example:

>>> "it's mine".title()
"It'S Mine"

The Python documentation for str.title() discusses this limitation and shows a regular-expression workaround.
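Two common workarounds are sketched below:

- string.capwords() from the string module (mentioned in the Note at the start of this section). It splits on whitespace and capitalizes each word.
- A regular expression that keeps an apostrophe between letters inside the word. The titlecase() helper is hypothetical and follows the approach shown in the Python documentation for str.title(). As written, it only handles ASCII letters.

```python
import re
import string


def titlecase(s: str) -> str:
    """Hypothetical helper: capitalize words, keeping inner apostrophes in the word."""
    return re.sub(
        r"[A-Za-z]+('[A-Za-z]+)?",
        lambda mo: mo.group(0)[0].upper() + mo.group(0)[1:].lower(),
        s,
    )


print("it's mine".title())                   # It'S Mine
print(string.capwords("it's mine"))          # It's Mine
print(titlecase("they're bill's friends."))  # They're Bill's Friends.
```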

str.translate(table)

Return a copy of the string in which each character has been mapped through the given translation table. The table is usually generated with str.maketrans(). A small, by no means exhaustive example of this method is shown below:

>>> tbl = str.maketrans("abc", "ghi")
>>> "aabbcc 123 xyz".translate(tbl)
'gghhii 123 xyz'

str.upper()

Return a copy of the string with all the cased characters converted to uppercase.

>>> 'foo'.upper()
'FOO'

str.zfill(width)

Return a copy of the string padded on the left with ASCII '0' digits to make a string of length width. A leading sign prefix ('+' or '-') is handled by inserting the padding after the sign character rather than before. The original string is returned if width is less than or equal to len(s).

>>> '649'.zfill(8)
'00000649'
>>> '-649'.zfill(8)
'-0000649'

Try it!

Take some time to experiment with some of the string specific methods that look interesting to you. Use the strings provided in the examples, or try your own!