Python string literals

I’ve been working on moving a codebase from Python 2 to 3 recently and decided to take a fresh look at string literals. In Python, string literals can be enclosed in a pair of single quotes ', a pair of double quotes ", or a pair of triple quotes, either """ or '''. To include a single quote in a single-quoted string, or a double quote in a double-quoted string, escape it with a backslash \. Triple-quoted strings can also contain literal line breaks. So this might look like any of the following:

print('Hello there! How are you?')
print('I\'m fine, how are you?')
print("Not bad. What's the newest thing you're working on?")
print("""Oh, I'm taking a look at something I've used for a while...

String literals.""")
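As a quick check that the quote style only changes how the literal is written, not the string you end up with, here is a minimal sketch (the variable names are just for illustration):

```python
# The same text written with two different quote styles.
single = 'I\'m fine, how are you?'
double = "I'm fine, how are you?"
print(single == double)  # True

# A line break typed inside triple quotes becomes a \n in the string.
triple = """first line
second line"""
print(triple == 'first line\nsecond line')  # True
```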

Easy enough. Using ' or " or """ will get you pretty far. But say you’re a Windows user and you try to open a local file from your C: drive.

import pandas as pd
pd.read_csv('C:\Users\matt\data.csv')
  File "<ipython-input-68-d5e41ba94fa6>", line 1
    pd.read_csv('C:\Users\matt\data.csv')
               ^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape

You’ll see a pretty cryptic error message. The issue here is the \U in the file path: Python reads it as the start of a \Uxxxxxxxx escape and then fails to find eight hex digits after it. So what’s really going on here? An escape sequence is a backslash \ followed by one or more characters that together stand for something else. There are a number of different escape sequences that can appear in a string. The standard C escape sequences are listed in the documentation, and include an escaped single quote \', escaped backslash \\, tab \t, linefeed \n, and octal \ooo and hex \xhh values. These can appear in any string or bytes literal and will be interpreted as their escaped value. I’ll ignore bytes literals (literals preceded by a b) in this post. For string literals only, there are three more special escape sequences.
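Before getting to those three, here is a small sketch of the standard escapes just listed. Each one collapses to a single character once the literal is parsed (the variable names are mine, chosen for illustration):

```python
# Each literal below contains exactly one character after parsing.
tab = '\t'
newline = '\n'
quote = '\''
backslash = '\\'
octal_a = '\101'  # octal 101 is decimal 65, the letter A
hex_a = '\x41'    # hex 41 is decimal 65, the letter A

for label, value in [('tab', tab), ('newline', newline), ('quote', quote),
                     ('backslash', backslash), ('octal', octal_a), ('hex', hex_a)]:
    # repr() shows the character in escaped form so it is visible when printed
    print(label, len(value), repr(value))

print(octal_a == hex_a == 'A')  # True
```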

First, \N{name} inserts the character with that name in the Unicode database. For example:

print("\N{PILCROW SIGN} \N{LATIN CAPITAL LETTER A WITH CIRCUMFLEX}")
¶ Â

You can also use 16-bit hex values \uxxxx (four hex digits) or 32-bit hex values \Uxxxxxxxx (eight hex digits). For example, to get the same values as above:

print("\U000000B6 \u00C2")
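To see that the three Unicode escape forms really are interchangeable, here is a short sketch using the standard library's unicodedata module. The variable names are illustrative, and the last character is one I picked only because it sits above U+FFFF, where the 8-digit form is the only hex escape that reaches it:

```python
import unicodedata

# Three ways of writing the same character, U+00B6.
by_name = "\N{PILCROW SIGN}"
by_u16 = "\u00B6"
by_u32 = "\U000000B6"

print(by_name == by_u16 == by_u32)  # True
print(unicodedata.name(by_u16))     # PILCROW SIGN
print(hex(ord(by_u32)))             # 0xb6

# Code points above U+FFFF do not fit in four hex digits, so they need \U.
beyond_bmp = "\U0001F40D"
print(len(beyond_bmp), hex(ord(beyond_bmp)))  # 1 0x1f40d
```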

Now let’s talk about string literal prefixes. In the grammar you see the following:

stringliteral   ::=  [stringprefix](shortstring | longstring)
stringprefix    ::=  "r" | "u" | "R" | "U" | "f" | "F"
                     | "fr" | "Fr" | "fR" | "FR" | "rf" | "rF" | "Rf" | "RF"

What does each of those string prefixes mean? Let’s go through them all.

Raw string literals

Raw string literals are preceded by an r or R, and the grammar shows that this can be combined with the f prefix, in either order and in any combination of case. (The u prefix cannot be combined with r in Python 3.) A raw string literal treats a backslash \ as a literal character instead of the start of an escape sequence, which solves the above problem of opening a file under C:\Users:

pd.read_csv(r'C:\Users\matt\data.csv')
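The raw literal is only one of several spellings of the same path. The sketch below compares it with doubled backslashes and forward slashes, and shows the rf combination from the grammar. The folder variable and the use of pathlib's PureWindowsPath are my own choices for illustration, not something the original example relies on:

```python
from pathlib import PureWindowsPath

raw = r'C:\Users\matt\data.csv'
escaped = 'C:\\Users\\matt\\data.csv'  # every backslash doubled
forward = 'C:/Users/matt/data.csv'     # no backslashes at all

print(raw == escaped)         # True
print(len(r'\n'), len('\n'))  # 2 1

# PureWindowsPath accepts either separator and compares the paths as equal.
print(PureWindowsPath(raw) == PureWindowsPath(forward))  # True

# r and f together (Python 3.6+): backslashes stay literal, {folder} is filled in.
folder = 'matt'
combined = rf'C:\Users\{folder}\data.csv'
print(combined == raw)        # True

# One caveat: a raw literal cannot end in a single backslash, so a trailing
# separator has to be added another way.
with_trailing = r'C:\Users' + '\\'
```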

Explicit Unicode string literals

In Python 3.3, PEP 414 added back support for Python 2’s explicit Unicode literal syntax. This was done so that Python 2 code that already supports Unicode would not need to be altered to run correctly on Python 3. Note that in Python 2, you had to prefix a string literal with a u or U in order to use the Unicode escape sequences shown above. With the prefix restored in Python 3, that code does not need to be changed, but the prefix is not required in Python 3 for Unicode escapes to work.

In Python 2:

>>> print "\u00c3"
\u00c3
>>> print u"\u00c3"
Ã

In Python 3, either works:

>>> print("\u00c3")
Ã

>>> print(u"\u00c3")
Ã
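Put another way, in Python 3 the prefix changes nothing about the resulting object. A minimal sketch, assuming Python 3.3 or later (variable names are illustrative):

```python
plain = "\u00c3"
prefixed = u"\u00c3"

# Both literals produce an equal value of the same type, str.
print(plain == prefixed)       # True
print(type(plain) is str)      # True
print(type(prefixed) is str)   # True
```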

Formatted string literals

Finally, the most exciting addition to string literals in Python 3.6 is the formatted string literal. When a string literal is prefixed by an f or F, replacement fields in the string designated by curly braces {} will be evaluated at run time. This makes for a very powerful and easy-to-read way to create strings.

Compare the following:

>>> name='Matt'
>>> hobby='gardening'
>>> print("My name is %s and my hobby is %s" % (name, hobby))
My name is Matt and my hobby is gardening
>>> print(f"My name is {name} and my hobby is {hobby}")
My name is Matt and my hobby is gardening

Which is more readable? There is a lot more to formatted string literals, and you can learn more by reading PEP 498. If your codebase is on Python 3.6 or above, you’ll probably want to make use of this option going forward for string formatting.
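As a taste of that "lot more", replacement fields can hold arbitrary expressions, format specifiers, and conversions. Here is a small sketch, assuming Python 3.6 or later; the hours variable is made up for illustration:

```python
name = 'Matt'
hobby = 'gardening'
hours = 3.5  # hypothetical value, only for this example

# Any expression can go inside the braces.
shout = f"{name.upper()} spends {hours * 2} hours on {hobby}"
print(shout)  # MATT spends 7.0 hours on gardening

# A format specifier follows a colon, as with str.format().
print(f"{hours:.2f}")  # 3.50

# !r applies repr() to the value before formatting.
print(f"{name!r}")  # 'Matt'

# Doubled braces produce literal braces.
print(f"{{not a field}}")  # {not a field}

# The f-string gives the same result as the str.format() spelling.
same = f"My name is {name}" == "My name is {}".format(name)
print(same)  # True
```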
