Sequence Types - String
String
MUTABILITY: Immutable
Text sequences are constructed in Python 3 using the str datatype. In Python 3, str supports the full Unicode character set (U+0000 to U+10FFFF). In Python 2, by contrast, str held bytes and Unicode text required the separate unicode type.
There is no char type for single characters. A single character is simply a string of length 1.
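Both points above can be seen in a few lines. Indexing a string returns another str of length 1, and because str is immutable, assigning to an index raises TypeError, so to "change" a character you build a new string. This is a minimal sketch; the variable names are illustrative only.

```python
word = "python"

# Indexing gives back a str of length 1; there is no separate char type.
first = word[0]
print(type(first), len(first))  # <class 'str'> 1

# Strings are immutable, so item assignment is rejected.
try:
    word[0] = "P"
except TypeError as exc:
    print("TypeError:", exc)

# Instead, build a new string from pieces of the old one.
capitalized = "P" + word[1:]
print(capitalized)  # Python
```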
Note
There is the str class, and there is also an importable module called string. While this is confusing, strings and their methods are defined by the str class. The string module provides some additional constants and helper functions that can be used with strings, but it does not itself define the string type.
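To make the distinction concrete, here is a short sketch using a few names that the string module really provides (ascii_lowercase, digits and capwords). The constants are ordinary str objects, and methods such as upper() still come from the str class itself.

```python
import string

# Constants provided by the string module (they are plain str objects).
print(string.ascii_lowercase)  # abcdefghijklmnopqrstuvwxyz
print(string.digits)           # 0123456789

# A helper function from the module that operates on a str.
print(string.capwords("hello   world"))  # Hello World

# The methods themselves live on the str class, not in the module.
print("hello".upper())              # HELLO
print(type(string.digits) is str)   # True
```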
String literals can be created in a variety of ways:
Using single or double quotes (use the style opposite to any embedded quotes). For example:
>>> foo = "I often hear, 'Python is Fun', which is true."
>>> foo
"I often hear, 'Python is Fun', which is true."
>>> foo = 'I often hear, "Python is Fun", which is true.'
>>> foo
'I often hear, "Python is Fun", which is true.'
Using triple quotes (either """ or '''). A triple-quoted string can span multiple lines without closing and reopening quotes (the line breaks become part of the string), and it lets you freely use both single and double quotes in the same string. For example:
>>> foo = '''This is the 'first' line.
... This is the "second" line'''
>>> foo
'This is the \'first\' line.\nThis is the "second" line'
If you don't want the newlines (\n) embedded in the string, put a \ at the end of each line segment. For example:
>>> foo = '''This is the 'first' line. \
... This is the "second" line'''
>>> foo
'This is the \'first\' line. This is the "second" line'
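Using adjacent single-, double- or triple-quoted string literals inside parentheses. Python implicitly joins lines inside parentheses (as well as square brackets and curly braces), so the line break does not end the statement, and adjacent string literals are concatenated into a single string. Only parentheses produce a plain string: square brackets or curly braces would produce a list or a set containing it. For example:
>>> foo = ('abcd'
...        'efgh')
>>> foo
'abcdefgh'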
If you need a newline inside a string constructed this way, add a \n where appropriate. For example:
>>> foo = ("Error number: 5\n"
...        "Error message: Can't open file")
>>> foo
"Error number: 5\nError message: Can't open file"
Tip
This method of creating strings is VERY handy when you need to improve the readability of the source code or make it fit within a certain line length in a clean way.
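The type of bracket matters for the result. Adjacent literals are always concatenated, and a line break is allowed inside any bracket type, but only parentheses leave you with a plain str. The sketch below, with illustrative names, also shows a common pitfall this feature causes: a missing comma in a list silently merges two items.

```python
# Parentheses: implicit line joining plus literal concatenation gives a str.
in_parens = ("abcd"
             "efgh")

# Square brackets and curly braces also allow line breaks, but they build
# a list or a set that contains the concatenated string.
in_brackets = ["abcd"
               "efgh"]
in_braces = {"abcd"
             "efgh"}

print(type(in_parens).__name__, in_parens)      # str abcdefgh
print(type(in_brackets).__name__, in_brackets)  # list ['abcdefgh']
print(type(in_braces).__name__, in_braces)      # set {'abcdefgh'}

# Pitfall: the missing comma after "red" merges two list items.
colors = ["red"
          "green", "blue"]
print(colors)  # ['redgreen', 'blue']
```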
Using chr(), which returns the character for the Unicode code point passed to it. For example:
>>> chr(65)
'A'
>>> chr(0x470)
'Ѱ'
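The built-in ord() is the inverse of chr(). It takes a one-character string and returns its integer code point. ord() is not otherwise introduced in this section, so the sketch below shows the round trip on a few sample code points chosen for illustration.

```python
# chr() maps a code point to a one-character string...
print(chr(65))       # A
print(chr(0x470))    # Ѱ

# ...and ord() maps a one-character string back to its code point.
print(ord("A"))         # 65
print(hex(ord("Ѱ")))    # 0x470

# The two functions are inverses for every valid code point.
for code_point in (0, 65, 0x470, 0x1F61B, 0x10FFFF):
    assert ord(chr(code_point)) == code_point
```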
Using the str constructor, which returns the string version of the object passed to it. Used this way, it is a type conversion. For example:
>>> foo = str(1234)
>>> foo
'1234'
From a bytes object (an immutable sequence of integers in the range 0 to 255; covered later in this chapter), using the bytes.decode() method. For example:
>>> my_bytes = b'0xDEADBEAF'
>>> foo = my_bytes.decode()
>>> foo
'0xDEADBEAF'
>>> type(foo)
<class 'str'>
Note
Bytes objects look like strings but are preceded by the character b or B. Byte values that are not printable ASCII characters are displayed as \xhh escapes. For example:
>>> 'Ѱ'.encode()
b'\xd1\xb0'
Tip
You can get a bytes object from a string using the str.encode() method. For example:
>>> my_str = '0xDEADBEAF'
>>> foo = my_str.encode(encoding='utf-8')
>>> foo
b'0xDEADBEAF'
>>> type(foo)
<class 'bytes'>
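Encoding and decoding are inverses as long as both sides agree on the codec. The errors argument controls what happens when a character cannot be represented (or bytes cannot be decoded): the default 'strict' raises an exception, while 'replace' substitutes a placeholder. Here is a sketch using the standard 'utf-8' and 'ascii' codecs on a sample string of our own choosing.

```python
text = "café Ѱ"

# Round trip through UTF-8: decoding the encoded bytes restores the string.
data = text.encode("utf-8")
print(data)                      # b'caf\xc3\xa9 \xd1\xb0'
assert data.decode("utf-8") == text

# ASCII cannot represent 'é' or 'Ѱ'; the default errors="strict" raises.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    print("encode failed:", exc.reason)   # ordinal not in range(128)

# errors="replace" substitutes '?' for each unencodable character instead.
print(text.encode("ascii", errors="replace"))  # b'caf? ?'

# Decoding bytes that are not valid in the chosen codec also fails.
try:
    b"\xff\xfe".decode("utf-8")
except UnicodeDecodeError as exc:
    print("decode failed:", exc.reason)   # invalid start byte
```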
Python supports the typical set of escape sequences (character sequences that start with a \) found in other languages. Using escape sequences, you can insert arbitrary Unicode characters into string literals:
Escape | Description |
---|---|
\xhh | Insert the character with the given 8-bit hex value |
\uhhhh | Insert the character with the given 16-bit hex value |
\Uhhhhhhhh | Insert the character with the given 32-bit hex value |
\N{name} | Insert the character with the given Unicode name |
For example:
>>> foo = "\x41 \u27F0 \U0001F61B \N{arc}"
>>> foo
'A ⟰ 😛 ⌒'
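One way to confirm which character an escape produced is the standard unicodedata module, which maps characters to their official Unicode names and back. The sketch below also shows that the four escape forms are different spellings of the same code point.

```python
import unicodedata

# \x, \u, \U and \N{} are four spellings of the same character.
spellings = ["\x41", "\u0041", "\U00000041", "\N{LATIN CAPITAL LETTER A}"]
print(all(s == "A" for s in spellings))  # True

# Look up the official names of the characters from the example above.
for ch in "\u27F0\U0001F61B\N{ARC}":
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
# U+27F0 UPWARDS QUADRUPLE ARROW
# U+1F61B FACE WITH STUCK-OUT TONGUE
# U+2312 ARC

# unicodedata.lookup() is the runtime equivalent of \N{...}.
print(unicodedata.lookup("YIN YANG"))  # ☯
```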
If you want to prevent the parser from expanding escape sequences, or from tripping up on embedded backslashes that you want taken literally, you can create a raw string by placing an r (or R) before the opening quote. For example:
>>> foo = "a \x62 c \N{yin yang}"
>>> foo
'a b c ☯'
>>> foo = r"a \x62 c \N{yin yang}"
>>> foo
'a \\x62 c \\N{yin yang}'
Tip
Raw strings are very useful when writing regular expressions, which often have many embedded escape sequences.
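A small sketch of why this matters for regular expressions, using the standard re module. In a raw string the backslash is kept as an ordinary character, so a pattern such as \b\d+\b reaches re unchanged. Without the r prefix, "\b" would be turned into the backspace character before re ever sees it. The sample sentence is illustrative.

```python
import re

# In a raw string the backslash is kept: r"\d" is two characters long.
print(len(r"\d"), len("\n"))   # 2 1

# A word-boundary/digit pattern written as a raw string.
pattern = r"\b\d+\b"
print(re.findall(pattern, "order 66 shipped 3 days ago"))  # ['66', '3']

# Without the r prefix, "\b" is the backspace character (\x08).
print("\b" == "\x08", r"\b" == "\\b")  # True True
```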
Try it!
- Write a string of your choice using single and/or double quotes.
- Try embedding strings inside that string, using the alternate style of quotes.
- Try writing a long, multi-line string using triple quotes. Use a \ at the end of some lines to see the effect.
- Use parentheses to concatenate multiple strings into one.
- Get the ASCII character that corresponds to the value of 72.
- Get the value of the ASCII character “Z”.
- Explicitly convert an integer to a string.
- Create a single string using character escape sequences from the values 0x48, 0x65, 0x6C, 0x6C and 0x6F. Then make it into a raw string. Notice the difference.
String Specific Methods
The str class has a rich set of methods that relieve the programmer of common, mundane, and sometimes error-prone string operations. Below is a brief description of most of them. For full syntax information, refer to the Python string method documentation.
str.capitalize()
Return a copy of the string with its first character capitalized and the rest lowercased.
>>> 'foo'.capitalize()
'Foo'
str.center(width[, fillchar])
Return the string centered in a string of length width. Padding is done using the specified fillchar (default is an ASCII space). The original string is returned if width is less than or equal to len(s).
>>> 'foo'.center(5)
' foo '
>>> 'foo'.center(5, '-')
'-foo-'
str.count(sub[, start[, end]])
Return the number of non-overlapping occurrences of substring sub within the slice s[start:end].
>>> 'ababbbabcd'.count('ab')
3
>>> 'ababbbabcd'.count('ab', 4)
1
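"Non-overlapping" is easy to miss: in 'aaaa', count('aa') finds only two matches, because each match consumes its characters. If you need overlapping occurrences, one approach is to call find() repeatedly and advance by a single position each time. count_overlapping below is a hypothetical helper written for illustration, not a str method.

```python
def count_overlapping(text: str, sub: str) -> int:
    """Count occurrences of sub in text, allowing matches to overlap.

    Hypothetical helper for illustration; str has no built-in equivalent.
    Assumes sub is non-empty.
    """
    count = 0
    start = text.find(sub)
    while start != -1:
        count += 1
        # Advance by one character so the next match may overlap this one.
        start = text.find(sub, start + 1)
    return count


print("aaaa".count("aa"))                     # 2 (non-overlapping)
print(count_overlapping("aaaa", "aa"))        # 3 (overlapping)
print(count_overlapping("ababbbabcd", "ab"))  # 3
```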
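str.encode(encoding='utf-8', errors='strict')
Return an encoded version of the string as a bytes object. The default encoding is 'utf-8'. The errors argument controls how encoding errors are handled; the default, 'strict', raises UnicodeEncodeError.
>>> 'foo'.encode()
b'foo'
>>> 'Ѱ'.encode()
b'\xd1\xb0'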
str.endswith(suffix[, start[, end]])
Return True if the string ends with the specified suffix, otherwise return False. suffix can also be a tuple of suffixes to look for. With optional start and end, the test is applied to the slice s[start:end].
>>> 'foo'.endswith(('oo', 'ar'))
True
>>> 'bar'.endswith(('oo', 'ar'))
True
>>> 'baz'.endswith(('oo', 'ar'))
False
str.expandtabs(tabsize=8)
Return a copy of the string where all tab characters are replaced by one or more spaces, depending on the current column and the given tabsize. Tab stops occur every tabsize characters.
>>> 'foo\tbar\tbaz'.expandtabs(4)
'foo bar baz'
>>> 'foo\tbar\tbaz'.expandtabs(6)
'foo   bar   baz'
str.find(sub[, start[, end]])
Return the lowest index in the string where substring sub is found within the slice s[start:end]. Return -1 if sub is not found.
>>> 'foobarbazbob'.find('ob')
2
Tip
Use the in operator (described later) to determine whether the substring exists at all in the string. Only use str.find() to get the index. Membership testing with the in operator is faster and more intuitive to read.
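A concrete reason to follow this tip: find() returns 0 when the substring sits at the very start, and 0 is falsy, while a miss returns -1, which is truthy. Using find() directly as a condition therefore inverts the logic in exactly the cases that are easy to overlook. A short sketch reusing the example string from above:

```python
s = "foobarbazbob"

# Membership test: clear, and it returns a bool.
print("bar" in s)   # True
print("cc" in s)    # False

# Pitfall: using find() as a condition. "foo" is found at index 0 (falsy),
# and a missing substring gives -1 (truthy), so the logic is inverted.
buggy_found_foo = bool(s.find("foo"))   # False, although "foo" is present
buggy_found_cc = bool(s.find("cc"))     # True, although "cc" is absent
print(buggy_found_foo, buggy_found_cc)  # False True

# Use find() only when you need the position, and compare against -1.
pos = s.find("baz")
if pos != -1:
    print("baz starts at index", pos)   # baz starts at index 6
```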
str.format(*args, **kwargs)
Return a copy of the string formatted according to the given arguments. Replacement fields are delimited by braces {} and refer to positional or keyword arguments.
>>> 'foo{0}{1}'.format('bar', 'baz')
'foobarbaz'
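Replacement fields can do more than the single example above. This is a small sketch of three common forms: reusing a positional index, naming fields with keyword arguments, and adding a format specification after a colon (precision, width and alignment). The argument values are illustrative.

```python
# Positional fields can be reused and reordered.
print("{0}{1}{0}".format("ab", "cd"))          # abcdab

# Named fields plus a format specification after the colon.
print("{name}: {value:.2f}".format(name="pi", value=3.14159))  # pi: 3.14

# Width and alignment: '>' right-aligns, '<' left-aligns, '^' centers.
print("[{:>6}]".format("foo"))   # [   foo]
print("[{:^7}]".format("foo"))   # [  foo  ]
```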
str.index(sub[, start[, end]])
Like str.find(), but raises ValueError when the substring is not found.
>>> 'foobarbazbob'.index('cc')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: substring not found
>>> 'foobarbazbob'.find('cc')
-1
str.isalnum()
Return True if all characters in the string are alphanumeric and there is at least one character, False otherwise.
>>> 'foo'.isalnum()
True
>>> 'foo3'.isalnum()
True
>>> '-foo3'.isalnum()
False
str.isalpha()
Return True if all characters in the string are alphabetic and there is at least one character, False otherwise.
>>> 'foo'.isalpha()
True
>>> 'foo3'.isalpha()
False
>>> '-foo3'.isalpha()
False
str.isdecimal()
Return True if all characters in the string can be used to form a base 10 value and there is at least one character, False otherwise. For info on Unicode decimals vs digits, refer here.
>>> '123'.isdecimal()
True
>>> '123٣'.isdecimal()
True
>>> '123٣①'.isdecimal()
False
str.isdigit()
Return True if all characters in the string are digits (decimal characters and characters that need special handling) and there is at least one character, False otherwise. For info on Unicode decimals vs digits, refer here.
>>> '123'.isdigit()
True
>>> '123٣'.isdigit()
True
>>> '123٣①'.isdigit()
True
str.isidentifier()
Return True if the string is a valid identifier according to the language definition.
>>> 'foo'.isidentifier()
True
>>> '0foo'.isidentifier()
False
str.islower()
Return True if all cased characters in the string are lowercase and there is at least one cased character, False otherwise.
>>> 'foo'.islower()
True
>>> 'Foo'.islower()
False
str.isnumeric()
Return True if all characters in the string are numeric characters and there is at least one character, False otherwise.
>>> '123'.isnumeric()
True
>>> '123f'.isnumeric()
False
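The three numeric predicates are nested: every decimal character is a digit, and every digit is numeric, but not the other way round. This sketch compares them on four sample characters of our own choosing: an ASCII digit, an Arabic-Indic digit, a circled digit and a vulgar fraction. None of the predicates accept a sign or a decimal point, so they cannot replace trying int() or float() on user input.

```python
samples = ["3", "\u0663", "\u2460", "\u00bd"]  # '3', '٣', '①', '½'

# Columns: character, isdecimal, isdigit, isnumeric.
for ch in samples:
    print(repr(ch), ch.isdecimal(), ch.isdigit(), ch.isnumeric())
# '3' True True True
# '٣' True True True
# '①' False True True
# '½' False False True

# Signs and decimal points are not numeric characters.
print("-5".isdecimal(), "1.5".isnumeric())  # False False
```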
str.isprintable()
Return True if all characters in the string are printable or the string is empty, False otherwise.
>>> 'foo'.isprintable()
True
>>> 'foo\x01'.isprintable()
False
str.isspace()
Return True if there are only whitespace characters in the string and there is at least one character, False otherwise.
>>> '\t\n '.isspace()
True
>>> '\t\n -'.isspace()
False
str.istitle()
Return True if the string is a titlecased string and there is at least one character, False otherwise.
>>> 'Foo'.istitle()
True
>>> 'FOo'.istitle()
False
str.isupper()
Return True if all cased characters in the string are uppercase and there is at least one cased character, False otherwise.
>>> 'FOO'.isupper()
True
>>> 'FOo'.isupper()
False
str.join(iterable)
Return a string which is the concatenation of the strings in iterable. A TypeError will be raised if there are any non-string values in iterable, including bytes objects. The separator between elements is the string providing this method.
>>> ''.join(['Foo', 'Bar'])
'FooBar'
>>> '-'.join(['Foo', 'Bar'])
'Foo-Bar'
>>> 'then'.join(['Foo', 'Bar'])
'FoothenBar'
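Because join() accepts only strings, non-string items must be converted first, for example with map(str, ...) or a generator expression. A commonly recommended idiom for building a string piece by piece is to collect the parts in a list and call join() once at the end, rather than using += in a loop. A sketch with illustrative data:

```python
numbers = [1, 2, 3]

# join() only accepts strings; integers raise TypeError.
try:
    "-".join(numbers)
except TypeError as exc:
    print("TypeError:", exc)

# Convert the items first, e.g. with map() or a generator expression.
print("-".join(map(str, numbers)))             # 1-2-3
print(", ".join(str(n * n) for n in numbers))  # 1, 4, 9

# Collect parts in a list, then join once at the end.
parts = []
for word in ["alpha", "beta", "gamma"]:
    parts.append(word.upper())
print(" ".join(parts))                         # ALPHA BETA GAMMA
```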
str.ljust(width[, fillchar])
Return the string left-justified in a string of length width. Padding is done using the specified fillchar (default is an ASCII space).
>>> 'foo'.ljust(5)
'foo  '
>>> 'foo'.ljust(5, '-')
'foo--'
str.lower()
Return a copy of the string with all the cased characters converted to lowercase.
>>> 'FOO'.lower()
'foo'
>>> 'Foo'.lower()
'foo'
str.lstrip([chars])
Return a copy of the string with leading characters removed. The chars argument is a string specifying the set of characters to be removed. If omitted or None, the chars argument defaults to removing whitespace. The chars argument is not a prefix; rather, all combinations of its values are stripped.
>>> ' foobarbaz '.lstrip()
'foobarbaz '
>>> ' foobarbaz '.lstrip(' fo')
'barbaz '
>>> ' foobarbaz '.lstrip(' fob')
'arbaz '
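Because chars is a set of characters rather than a prefix, lstrip() and rstrip() can remove more than intended. To remove an exact prefix or suffix, Python 3.9 and later provide str.removeprefix() and str.removesuffix(). Those methods are not covered in this section, so the sketch below assumes Python 3.9+. The sample strings are illustrative.

```python
name = "foofoobar"

# lstrip treats "foo" as the set {'f', 'o'} and keeps stripping
# until it meets a character outside that set.
print(name.lstrip("foo"))         # bar

# removeprefix (Python 3.9+) removes the exact prefix, at most once.
print(name.removeprefix("foo"))   # foobar

# The suffix counterparts behave the same way: rstrip eats the 't' of
# "report" too, because 't' is in the set {'.', 't', 'x'}.
print("report.txt.txt".rstrip(".txt"))         # repor
print("report.txt.txt".removesuffix(".txt"))   # report.txt
```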
static str.maketrans(x[, y[, z]])
Create a translation table that maps characters to characters. The actual translation of a string using this table is done by the str.translate() method. A small, by no means exhaustive, example of this method is shown below:
>>> tbl = str.maketrans("abc", "ghi")
>>> "aabbcc 123 xyz".translate(tbl)
'gghhii 123 xyz'
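str.maketrans() also has a three-argument form and a dictionary form. In the three-argument form, the third string lists characters to delete. In the dictionary form, keys are single characters (or code points) and values are replacement strings of any length, or None to delete. The tables and sample strings below are illustrative.

```python
# Two-string form: each character in the first string maps to the
# character at the same position in the second (lengths must match).
swap = str.maketrans("abc", "ghi")
print("aabbcc 123 xyz".translate(swap))          # gghhii 123 xyz

# Three-argument form: the third string lists characters to delete.
no_vowels = str.maketrans("", "", "aeiou")
print("translation table".translate(no_vowels))  # trnsltn tbl

# Dictionary form: None deletes a character.
leet = str.maketrans({"e": "3", "o": "0", "!": None})
print("hello world!".translate(leet))            # h3ll0 w0rld
```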
str.partition(sep)
Split the string at the first occurrence of sep, and return a 3-tuple containing the part before the separator, the separator itself, and the part after the separator. If the separator is not found, return a 3-tuple containing the string itself, followed by two empty strings.
>>> 'foo-bar-baz'.partition('-')
('foo', '-', 'bar-baz')
str.replace(old, new[, count])
Return a copy of the string with all occurrences of substring old replaced by new. If the optional argument count is given, only the first count occurrences are replaced.
>>> 'Foo McFoo lived in a Bar'.replace('Foo', 'Baz')
'Baz McBaz lived in a Bar'
>>> 'Foo McFoo lived in a Bar'.replace('Foo', 'Baz', 1)
'Baz McFoo lived in a Bar'
str.rfind(sub[, start[, end]])
Return the highest index in the string where substring sub is found, such that sub is contained within s[start:end]. Return -1 if sub is not found.
>>> 'foobarbazbob'.rfind('ob')
10
Tip
Use the in operator (described later) to determine whether the substring exists at all in the string. Only use str.rfind() to get the index. Membership testing with the in operator is faster and more intuitive to read.
str.rindex(sub[, start[, end]])
Like str.rfind(), but raises ValueError when the substring sub is not found.
>>> 'foobarbazbob'.rindex('cc')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: substring not found
>>> 'foobarbazbob'.rfind('cc')
-1
str.rpartition(sep)
Split the string at the last occurrence of sep, and return a 3-tuple containing the part before the separator, the separator itself, and the part after the separator. If the separator is not found, return a 3-tuple containing two empty strings, followed by the string itself.
>>> 'foo-bar-baz'.rpartition('-')
('foo-bar', '-', 'baz')
str.rsplit(sep=None, maxsplit=-1)
Return a list of the words in the string, using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done, the rightmost ones. If sep is not specified or None, any whitespace string is a separator.
>>> 'foo-bar-baz'.rsplit('-')
['foo', 'bar', 'baz']
>>> 'foo-bar-baz'.rsplit('-', 1)
['foo-bar', 'baz']
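The difference between split() and rsplit() only shows up when maxsplit limits the number of splits. A typical use is separating the last extension from a file name. This is a short sketch; the file name is illustrative.

```python
filename = "archive.tar.gz"

# split with maxsplit works from the left...
print(filename.split(".", 1))    # ['archive', 'tar.gz']

# ...while rsplit with maxsplit works from the right, which is handy
# for separating the last extension.
print(filename.rsplit(".", 1))   # ['archive.tar', 'gz']

# Without maxsplit, both give the same list here.
print(filename.split(".") == filename.rsplit("."))  # True
```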
str.rstrip([chars])
Return a copy of the string with trailing characters removed. The chars argument is a string specifying the set of characters to be removed. If omitted or None, the chars argument defaults to removing whitespace. The chars argument is not a suffix; rather, all combinations of its values are stripped.
>>> ' foobarbaz '.rstrip()
' foobarbaz'
>>> ' foobarbaz '.rstrip(' za')
' foobarb'
>>> ' foobarbaz '.rstrip(' zba')
' foobar'
str.split(sep=None, maxsplit=-1)
Return a list of the words in the string, using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done.
>>> 'foo-bar-baz'.split("-")
['foo', 'bar', 'baz']
>>> 'foo--bar-baz'.split("-")
['foo', '', 'bar', 'baz']
>>> 'foo-bar-baz'.split("-", 1)
['foo', 'bar-baz']
If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace. Consequently, splitting an empty string or a string consisting of just whitespace with a None separator returns [].
>>> 'foo bar baz'.split()
['foo', 'bar', 'baz']
>>> 'foo   bar   baz'.split()
['foo', 'bar', 'baz']
>>> 'foo\tbar\n baz'.split()
['foo', 'bar', 'baz']
>>> '  foo bar baz  '.split()
['foo', 'bar', 'baz']
Tip
The split() method is that friend of yours who pays for your apartment, buys your food and introduces you to the person who becomes your spouse later in life. Get to know it :)
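The two splitting algorithms described above give noticeably different results on the same input. With sep=None, runs of whitespace collapse into one separator. With an explicit sep=" ", every single space is a separator, so empty strings appear. The sample text is illustrative.

```python
text = "  foo   bar baz "

# sep=None: runs of whitespace act as one separator; no empty strings.
print(text.split())        # ['foo', 'bar', 'baz']

# sep=" ": every single space separates, so empty strings appear.
print(text.split(" "))     # ['', '', 'foo', '', '', 'bar', 'baz', '']

# Empty input: [] with sep=None, but [''] with an explicit separator.
print("".split())          # []
print("".split(","))       # ['']
```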
str.splitlines([keepends])
Return a list of the lines in the string, breaking at line boundaries. Line breaks are not included in the resulting list unless keepends is given and True.
>>> 'foo\nbar\rbaz\r\nbif\ffaf'.splitlines()
['foo', 'bar', 'baz', 'bif', 'faf']
>>> 'foo\nbar\rbaz\r\nbif\ffaf'.splitlines(True)
['foo\n', 'bar\r', 'baz\r\n', 'bif\x0c', 'faf']
str.startswith(prefix[, start[, end]])
Return True if the string starts with the prefix, otherwise return False. prefix can also be a tuple of prefixes to look for. With optional start, test the string beginning at that position. With optional end, stop comparing the string at that position.
>>> 'foobarbaz'.startswith(('foo', 'baz'))
True
>>> 'barbazfoo'.startswith(('foo', 'baz'))
False
>>> 'bazfoobar'.startswith(('foo', 'baz'))
True
str.strip([chars])
Return a copy of the string with leading and trailing characters removed. The chars argument is a string specifying the set of characters to be removed. If omitted or None, the chars argument defaults to removing whitespace. The chars argument is not a prefix or suffix; rather, all combinations of its values are stripped. Characters are removed from the leading end until reaching a character that is not in chars, and likewise from the trailing end.
>>> ' foobarbaz '.strip()
'foobarbaz'
>>> 'foobarbaz '.strip(' za')
'foobarb'
>>> ' foobarbaz '.strip(' zba')
'foobar'
str.swapcase()
Return a copy of the string with uppercase characters converted to lowercase and vice versa.
>>> 'FooBarBaz'.swapcase()
'fOObARbAZ'
str.title()
Return a titlecased version of the string where words start with an uppercase character and the remaining characters are lowercase.
>>> 'foo bar baz'.title()
'Foo Bar Baz'
Warning
Unfortunately, this method does not handle apostrophes well, because it treats any group of consecutive letters as a word. For example:
>>> "it's mine".title()
"It'S Mine"
Refer here for discussion and workaround.
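Two possible workarounds are sketched below. They are not necessarily the ones the linked discussion proposes. string.capwords() splits on whitespace only, so apostrophes are left alone, at the cost of collapsing runs of whitespace into single spaces. The titlecase() function is adapted from the regular-expression example in the Python documentation for str.title(). It treats an apostrophe followed by letters as part of the word, but it only recognizes ASCII letters.

```python
import re
import string

phrase = "it's mine"

print(phrase.title())             # It'S Mine
print(string.capwords(phrase))    # It's Mine


def titlecase(s: str) -> str:
    """Title-case s, treating an apostrophe plus letters as part of a word.

    ASCII letters only; adapted from the str.title() documentation.
    """
    return re.sub(r"[A-Za-z]+('[A-Za-z]+)?",
                  lambda mo: mo.group(0).capitalize(),
                  s)


print(titlecase("they're bill's friends."))  # They're Bill's Friends.
```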
str.translate(table)
Return a copy of the string in which each character has been mapped through the given translation table. The table is generated using str.maketrans(). A small, by no means exhaustive, example of this method is shown below:
>>> tbl = str.maketrans("abc", "ghi")
>>> "aabbcc 123 xyz".translate(tbl)
'gghhii 123 xyz'
str.upper()
Return a copy of the string with all the cased characters converted to uppercase.
>>> 'foo'.upper()
'FOO'
str.zfill(width)
Return a copy of the string left-filled with ASCII '0' digits to make a string of length width. A leading sign prefix ('+'/'-') is handled by inserting the padding after the sign character rather than before. The original string is returned if width is less than or equal to len(s).
>>> '649'.zfill(8)
'00000649'
>>> '-649'.zfill(8)
'-0000649'
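When you start from a number rather than a string, a format specification with a 0 flag and a width gives the same zero padding, sign handling included. zfill() works on any string, even non-numeric ones. This is a short comparison sketch using the values from the example above.

```python
# zfill pads with zeros after any leading sign.
print("649".zfill(8), "-649".zfill(8))    # 00000649 -0000649

# For numbers, a format specification gives the same result directly.
print(f"{649:08d}", f"{-649:08d}")        # 00000649 -0000649

# zfill works on any string, even a non-numeric one.
print("abc".zfill(5))                     # 00abc
```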
Try it!
Take some time to experiment with some of the string specific methods that look interesting to you. Use the strings provided in the examples, or, try your own!