Python string literals

I’ve been working on moving a codebase from Python 2 to 3 recently and decided to take a fresh look at string literals. In Python, string literals can be enclosed by a pair of unescaped single quotes ', a pair of double quotes ", or a pair of triple quotes, either """ or '''. If you want to include a single quote in a single-quoted string or a double quote in a double-quoted string, escape it with a backslash \. Triple quotes also let a string span multiple lines, keeping the line breaks. So string literals might look like any of the following:

print('Hello there! How are you?')
print('I\'m fine, how are you?')
print("Not bad. What's the newest thing you're working on?")
print("""Oh, I'm taking a look at something I've used for a while...

String literals.""")
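A small sketch to confirm that the quoting style only changes how you type the string, not what you get back. The variable names are just for illustration:

```python
# The quoting style changes how you type a string, not what you get.
single = 'I\'m fine'
double = "I'm fine"
print(single == double)  # True

# Triple-quoted strings keep any line breaks typed inside them.
triple = """first line
second line"""
print(triple.splitlines())    # ['first line', 'second line']
print(type(triple).__name__)  # str
```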

Easy enough. Using ' or " or """ will get you pretty far. But say you’re a Windows user and you try to open a local file from your C: drive.

import pandas as pd
pd.read_csv('C:\Users\matt\data.csv')
  File "<ipython-input-68-d5e41ba94fa6>", line 1
    pd.read_csv('C:\Users\matt\data.csv')
               ^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape

You’ll see a pretty cryptic error message. The problem is the \U in the file path. So what’s really going on here? An escape sequence is a backslash \ followed by one or more characters that together stand for a single character. The documentation lists the escape sequences Python borrows from Standard C, including an escaped single quote \', an escaped backslash \\, tab \t, linefeed \n, and octal \ooo and hex \xhh values. These can appear in any string or bytes literal and are replaced by the character they represent. I’ll ignore bytes literals (literals prefixed with a b) in this post. String literals recognize three more escape sequences, and the \U in C:\Users is the start of one of them: it must be followed by exactly eight hex digits, and "sers\mat" is not hex, so Python can’t even parse the line.
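The sketch below exercises a few of these escapes, then reproduces the \U failure with the built-in compile() instead of pandas, which shows the error is raised while Python parses the literal, before read_csv is ever called. The helper name parse_error is made up for this example:

```python
# Each escape sequence in the source becomes exactly one character.
print(len('\t'), len('\n'), len('\\'), len('\''))  # 1 1 1 1

# Octal \ooo and hex \xhh escapes produce the character with that code point.
print('\101', '\x41')            # A A
print(ord('\101'), ord('\x41'))  # 65 65

# Unescaped Windows paths can go wrong silently: '\t' and '\n' below
# become a tab and a newline instead of raising an error.
oops = 'C:\temp\new'
safe = 'C:\\temp\\new'
print(len(oops), len(safe))  # 9 11

# '\U' is different: it needs 8 hex digits, so the path from the
# pandas example fails as soon as the source is compiled.
def parse_error(source):
    """Return the SyntaxError message if source fails to compile, else None."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as exc:
        return exc.msg
    return None

print(parse_error(r"path = 'C:\Users\matt\data.csv'") is not None)  # True
print(parse_error(r"path = 'C:\\Users\\matt\\data.csv'"))           # None
```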

First, \N{name} inserts the character with the given name in the Unicode database. For example:

print("\N{PILCROW SIGN} \N{LATIN CAPITAL LETTER A WITH CIRCUMFLEX}")
¶ Â

You can also use 16-bit hex values \uxxxx (four hex digits) or 32-bit hex values \Uxxxxxxxx (eight hex digits). For example, to get the same characters as above:

print("\U000000B6 \u00C2")
¶ Â
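To check that the three spellings really produce the same character, this sketch compares them and uses the standard library’s unicodedata module to go from character to name and back. The snake emoji at the end is just an arbitrary example of a character beyond the 16-bit range:

```python
import unicodedata

# Three string-only escapes that spell the same character.
by_name = "\N{PILCROW SIGN}"
by_16bit = "\u00B6"      # \u takes exactly 4 hex digits
by_32bit = "\U000000B6"  # \U takes exactly 8 hex digits
print(by_name == by_16bit == by_32bit)  # True

# unicodedata converts between characters and their Unicode names at run time.
print(unicodedata.name("\u00C2"))                     # LATIN CAPITAL LETTER A WITH CIRCUMFLEX
print(unicodedata.lookup("PILCROW SIGN") == by_name)  # True

# Code points above U+FFFF don't fit in \u, so they need the 8-digit \U form.
snake = "\U0001F40D"
print(len(snake), unicodedata.name(snake))  # 1 SNAKE
```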

Now let’s talk about string literal prefixes. The grammar in the Python language reference defines them as follows:

stringliteral   ::=  [stringprefix](shortstring | longstring)
stringprefix    ::=  "r" | "u" | "R" | "U" | "f" | "F"
                     | "fr" | "Fr" | "fR" | "FR" | "rf" | "rF" | "Rf" | "RF"
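One way to see this grammar in action is to ask the interpreter which prefixes it accepts. This sketch compiles a one-character literal with each prefix; accepts is a hypothetical helper name, and the near-miss prefixes in not_in_grammar are arbitrary examples:

```python
# Every prefix listed in the grammar above.
grammar_prefixes = ["r", "u", "R", "U", "f", "F",
                    "fr", "Fr", "fR", "FR", "rf", "rF", "Rf", "RF"]
# A few combinations the grammar leaves out: u only ever appears alone.
not_in_grammar = ["ur", "Ur", "uf", "fu"]

def accepts(prefix):
    """Return True if the interpreter compiles prefix + "'x'" as an expression."""
    try:
        compile(prefix + "'x'", "<prefix-check>", "eval")
    except SyntaxError:
        return False
    return True

print(all(accepts(p) for p in grammar_prefixes))  # True
print([p for p in not_in_grammar if accepts(p)])  # []
```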

What does each of these prefixes mean? Let’s go through them all.

Raw string literals

Raw string literals are prefixed with an r or R. As the grammar shows, r can be combined with f, in either order and any combination of case, but not with u. A raw string literal treats a backslash \ as a literal character instead of the start of an escape sequence, which solves the problem above of opening a file in C:\Users:

pd.read_csv(r'C:\Users\matt\data.csv')
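Here is a quick sketch comparing a raw string with its escaped equivalent and combining r with f. The name folder is just an example variable:

```python
# The raw string and the escaped string hold exactly the same characters.
raw_path = r'C:\Users\matt\data.csv'
escaped_path = 'C:\\Users\\matt\\data.csv'
print(raw_path == escaped_path)  # True

# In a raw string, \n is two characters: a backslash and an n.
print(len(r'\n'), len('\n'))  # 2 1

# r combines with f: braces are still replacement fields, and
# backslashes in the literal text stay literal. "folder" is an example name.
folder = r'C:\Users\matt'
print(rf'{folder}\data.csv' == raw_path)  # True

# Caveat: a raw string still can't end in a single backslash, so r'C:\'
# is a SyntaxError; r'C:\\' is valid but keeps both backslashes.
print(len(r'C:\\'))  # 4
```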

Explicit Unicode string literals

In Python 3.3, PEP 414 restored support for Python 2’s explicit Unicode literal syntax. This was done so that Python 2 code that already uses Unicode literals would not need to be altered to run correctly on Python 3. In Python 2, to use the Unicode escape sequences shown above, you had to prefix a string literal with a u or U. Accepting the prefix again in Python 3 means such code runs unchanged, but the prefix is not required in Python 3 for Unicode escapes to work.


In Python 2:

>>> print "\u00c3"
\u00c3
>>> print u"\u00c3"
Ã

In Python 3, either works:

>>> print("\u00c3")
Ã

>>> print(u"\u00c3")
Ã
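The same point as a check you can run under Python 3. The variable names are illustrative:

```python
# Under Python 3 the u prefix is accepted but changes nothing.
with_prefix = u"\u00c3"
without_prefix = "\u00c3"
print(with_prefix == without_prefix)  # True
print(type(with_prefix) is str)       # True
print(len(without_prefix))            # 1: the escape is a single character

# Python 3 does not accept the combined ur prefix, so any Python 2
# ur"..." literals still have to be rewritten by hand.
```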

Formatted string literals

Finally, the most exciting addition to string literals came in Python 3.6: the formatted string literal, or f-string. When a string literal is prefixed with an f or F, replacement fields enclosed in curly braces {} are evaluated at run time and their values are inserted into the string. This makes for a powerful and easy-to-read way to build strings.

Compare the following:

>>> name='Matt'
>>> hobby='gardening'
>>> print("My name is %s and my hobby is %s" % (name, hobby))
My name is Matt and my hobby is gardening
>>> print(f"My name is {name} and my hobby is {hobby}")
My name is Matt and my hobby is gardening

Which is more readable? There is a lot more to formatted string literals; you can learn more by reading PEP 498. If your codebase is on Python 3.6 or above, you’ll probably want to use f-strings for string formatting going forward.
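A few of those extras, sketched with the same name and hobby values as above plus a made-up hours value: arbitrary expressions, format specs, the !r conversion, and doubled braces for literal braces.

```python
name = 'Matt'
hobby = 'gardening'
hours = 3.5  # an example value, not from the text above

# Any expression can go inside the braces; it is evaluated when the line runs.
summary = f"{name.upper()} likes {hobby} ({len(hobby)} letters)"
print(summary)  # MATT likes gardening (9 letters)

# A format spec after a colon works like it does in str.format().
weekly = f"{hours:.2f} hours per week"
print(weekly)  # 3.50 hours per week

# The !r conversion applies repr() to the value first.
debug = f"name={name!r}"
print(debug)  # name='Matt'

# Doubled braces produce literal braces around the replacement field.
braced = f"{{{hobby}}}"
print(braced)  # {gardening}

# Because f-strings are evaluated each time they run, a loop gives fresh values.
lines = [f"week {week}: {hours * week:.1f} hours" for week in (1, 2)]
print(lines)  # ['week 1: 3.5 hours', 'week 2: 7.0 hours']
```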
