Jupytext – Jupyter notebooks as Markdown documents or Python scripts

Jupyter notebooks are a great way to interactively write Python code and include documentation, program output, and data visualization inline with the code that produced it. Many IDEs support Jupyter notebooks natively, and the Jupyter notebook server and JupyterLab environments are effective ways to write notebooks. But under the hood, a Jupyter notebook is just a JSON document, and the content of that document is often not very human readable. Because of this, it can produce messy diffs in your version control system. Jupytext is a Jupyter plugin that automatically saves Jupyter notebooks in a variety of human readable (and editable) outputs. It also allows for changes in these other documents to be synced back to the notebook file (the .ipynb file) itself.

Why would you want to use Jupytext?

There are several good reasons to consider using Jupytext. First, you may be struggling with properly doing version control in your notebooks. My article on version control describes the situation and gives some background and good solutions for this issue, but they may not be perfect for every situation. Using a specialized diff tool like nbdime will make the diffs easier to navigate, but in the end, the single notebook file (i.e. the .ipynb file) contains code, output, and metadata. All of these may change and pollute your diff and make versioning a challenge.
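
To make the problem concrete, here is a minimal sketch, using only the standard library, of how a notebook diff gets polluted even when the code does not change. The notebook structure is a hand-built approximation of the nbformat 4 layout, and make_notebook and notebook_diff are hypothetical helper names, not part of Jupyter or Jupytext.

```python
import difflib
import json


def make_notebook(execution_count, output_text):
    """Build a minimal nbformat-4-style notebook with a single code cell."""
    return {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": execution_count,
                "metadata": {},
                "outputs": [
                    {"name": "stdout", "output_type": "stream", "text": [output_text]}
                ],
                "source": ["print(sum(range(10)))"],
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


def notebook_diff(before, after):
    """Return the unified diff lines between two notebooks serialized as JSON."""
    a = json.dumps(before, indent=1, sort_keys=True).splitlines()
    b = json.dumps(after, indent=1, sort_keys=True).splitlines()
    return list(difflib.unified_diff(a, b, "before.ipynb", "after.ipynb", lineterm=""))


# Same code, same output; the cell was simply run again later in the session.
first_run = make_notebook(execution_count=1, output_text="45\n")
rerun = make_notebook(execution_count=7, output_text="45\n")

diff_lines = notebook_diff(first_run, rerun)
print("\n".join(diff_lines))
```

The source of the cell is identical in both versions, yet the diff is not empty because the execution counter changed. Real notebooks add image data and kernel metadata on top of this.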

A second reason to consider Jupytext is if you prefer to work outside the standard Jupyter notebook authoring environments. Maybe you are most comfortable writing code in an IDE like PyCharm or Visual Studio Code. Or perhaps you use a text editor like Vim or Emacs and prefer the full power of your favorite editor. Maybe you write and test bits of code in an IPython session and prefer that to a notebook where code cells can easily get run out of order. You also may want to work on notebooks in a terminal (maybe over an SSH connection) where you don’t have a web browser handy.

A third reason is to be able to work more effectively with notebooks and the notebook content, specifically the Python source code. For example, if source is stored in a more common format like a Python file, many tools are available to check code, including linters and formatters/beautifiers.

We’ll look at a few examples of how Jupytext supports these three scenarios.

Installation and Setup

Jupytext is easy to install with pip.

pip install jupytext --upgrade

Or, if you’re using Anaconda:

conda install jupytext -c conda-forge

You’ll most likely be using the Jupyter Notebook or Lab environment as well. If so, restart the Jupyter server so that the front end picks up the Jupytext extension.

Basic use with Notebook or Lab

The easiest way to see how Jupytext works is to start with a simple example. In the previous article on notebook version control, we used this notebook as an example. This is just a simple notebook that includes a plot using matplotlib. After you set up a Jupyter notebook (or JupyterLab) environment with matplotlib installed, you can open the notebook in Jupyter notebook (run jupyter notebook). When you do, you should see a Jupytext entry in the File menu. Check the values as shown below to sync your notebook into a Python file:

[Figure: Jupytext options in the File menu, with Autosave turned off and the percent format selected. Jupytext adds menu options; select as shown to follow the example.]

First, if you want to work mostly in a script or Markdown file (I’ll talk about all the formats in a bit), you should turn off the Jupyter Autosave feature. If you want to mostly work in Jupyter notebook and just check in the script file when you are done, you can leave Autosave enabled.

As soon as the notebook is paired with a script output, the file will be created in the same directory as the notebook. In my case, that means the file jupyter_git_example.py is created. It looks like this:

# ---
# jupyter:
#   jupytext:
#     formats: ipynb,py:percent
#     text_representation:
#       extension: .py
#       format_name: percent
#       format_version: '1.3'
#       jupytext_version: 1.13.0
#   kernelspec:
#     display_name: Python 3
#     language: python
#     name: python3
# ---

# %%
import matplotlib.pyplot as plt
plt.plot([x**2 for x in range(100)])

# %%

This format is called the percent format. The special comments (# %%) mark where each notebook cell begins.
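
As a rough illustration of how those markers map onto cells, the sketch below splits a percent-format script into cell sources. It is a deliberately simplified stand-in, not Jupytext's actual parser: split_percent_cells is a hypothetical helper that treats every line starting with # %% as a cell boundary and ignores everything before the first marker, including the YAML header.

```python
def split_percent_cells(text):
    """Split percent-format source into a list of cell source strings.

    Simplified: every line starting with '# %%' opens a new cell, and
    anything before the first marker (such as the YAML header) is ignored.
    """
    cells = []
    current = None  # lines of the cell being collected, or None before the first marker
    for line in text.splitlines():
        if line.startswith("# %%"):
            if current is not None:
                cells.append("\n".join(current).strip())
            current = []
        elif current is not None:
            current.append(line)
    if current is not None:
        cells.append("\n".join(current).strip())
    return cells


script = """\
# ---
# jupyter:
#   jupytext:
#     formats: ipynb,py:percent
# ---

# %%
import matplotlib.pyplot as plt
plt.plot([x**2 for x in range(100)])

# %%
"""

cells = split_percent_cells(script)
for number, source in enumerate(cells, start=1):
    print(f"cell {number}: {source!r}")
```

For the file shown above this yields two cells: the plotting code and an empty trailing cell, which matches the empty cell at the bottom of the notebook.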

Round trip

You should note a few things about this file. Jupytext will try to take the most recent version of either file and use it to generate the other. So, for example, if you update the notebook and then manually save it (since you turned off the Autosave feature), Jupytext will refresh the .py file. The opposite is also true: if you edit the .py file, Jupytext will update the matching cells in the notebook. Try it: make a small edit to the .py file in a text editor and save it (change the exponent in the plot from 2 to 0.5, for example). Then, in the notebook, click the Save icon. Jupyter will warn you that the file has changed on disk and give you three options:

  • Cancel – returns to what you were already looking at, which no longer matches what is saved on disk.
  • Reload – reloads the notebook with what is saved to disk (which now matches what was in the .py file).
  • Overwrite – saves the notebook as it appears in your browser over the newer version on disk.

In this case, you want to Reload from disk. The code in the cell will update to match your edits. However, reloading doesn’t execute that cell, so the output will still reflect x**2 instead of x**0.5. Your running Python session doesn’t update any variables either, since that code hasn’t been executed. You can re-execute the cell to pick up the changes in your running instance. The example above might seem confusing, but I think it demonstrates very effectively how to think about Jupytext usage scenarios.

Let’s consider the three usage scenarios in more detail.

Version control

First, if you are looking for an effective option for notebook version control, you can simply install Jupytext, pair the notebook with the output format you want to use, and check in the generated file with each commit. You’ll get clean diffs for history tracking.

In more complicated scenarios like branching and merging, you can merge the generated script or Markdown file first, then regenerate the notebook using Jupytext. Jupytext includes a command line utility, so updating files outside a notebook environment is easy.

jupytext --to notebook notebook.py  # generates notebook.ipynb from notebook.py, using comment markers

I’ll emphasize here that when you regenerate the .ipynb file, it will not contain any outputs. You still have to decide whether you want to check in the notebook file with outputs. If you do, you need to re-execute the notebook (for example, by using Jupyter notebook, or jupytext --execute, or papermill) before committing to version control.
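
To see why a regenerated notebook has no outputs, it helps to look at what a text file can and cannot supply. The sketch below is a hand-rolled approximation of the nbformat 4 layout using only the standard library; it is not how Jupytext builds notebooks, and notebook_from_sources and has_outputs are hypothetical helper names.

```python
import json


def notebook_from_sources(sources):
    """Wrap code strings in a minimal nbformat-4-style notebook dict."""
    cells = [
        {
            "cell_type": "code",
            "execution_count": None,  # the script carries no execution history
            "metadata": {},
            "outputs": [],            # nothing to show until the cell is run
            "source": source,
        }
        for source in sources
    ]
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


def has_outputs(notebook):
    """True if any code cell in the notebook carries stored outputs."""
    return any(
        cell.get("outputs")
        for cell in notebook["cells"]
        if cell["cell_type"] == "code"
    )


nb = notebook_from_sources(
    ["import matplotlib.pyplot as plt", "plt.plot([x**2 for x in range(100)])"]
)
ipynb_text = json.dumps(nb, indent=1)

print(has_outputs(nb))  # False
```

A script only contains cell sources, so the outputs list starts out empty for every cell. A check like has_outputs could also serve as a quick guard before committing, in either direction, depending on whether your team wants outputs in version control.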

Coding with other tools

The second reason to use Jupytext is to do your coding and editing in an IDE or text editor. In this case, your script or Markdown file will be the primary file you work with, and the notebook can be generated and executed, automatically or manually, as needed. Using this approach you get all the benefits of clean diffs, and if you prefer using your IDE or are more comfortable in a Markdown environment, you can still use the notebook format for distributing results to others. It’s the best of both worlds.

Code quality tools

The third area where Jupytext excels is in automating code checks and other QA tools. Since you can convert notebooks into regular Python code, you can automatically run linters like pylint or flake8 and formatters like black. If Python code hides in a notebook file, it is harder to verify that it meets your organization’s coding standards.
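
Even the standard library can check a paired script, which is a quick way to see what "regular Python code" buys you. The sketch below uses ast.parse as a stand-in for a real linter; syntax_problem is a hypothetical helper and the two sample scripts are made up for illustration. It also shows why the choice of script format matters: the percent format comments out Jupyter magics, so the file stays valid Python, while a bare magic line does not parse.

```python
import ast


def syntax_problem(source, filename="notebook.py"):
    """Return None if source parses as Python, else a short error description."""
    try:
        ast.parse(source, filename=filename)
    except SyntaxError as exc:
        return f"{filename}:{exc.lineno}: {exc.msg}"
    return None


# Magic commented out, as in the percent format.
commented_magic = "# %%\n# %matplotlib inline\nimport math\nprint(math.pi)\n"

# Magic left as-is, which plain Python tools cannot parse.
bare_magic = "# %%\n%matplotlib inline\nimport math\nprint(math.pi)\n"

print(syntax_problem(commented_magic))  # None
print(syntax_problem(bare_magic))       # a description of the syntax error
```

Tools like flake8 and pylint start from the same parse step, so a script that passes this check is one they can at least read.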

Jupytext’s documentation also describes integration with common pre-commit hooks using the pre-commit framework. You can ensure that every time notebook code is committed to git, it will be verified.

Jupytext supports a lot of formats, not just Markdown

The example above synced the notebook file to a Python source file, but there are many other format options.

There are multiple Markdown formats supported:

  • Jupytext Markdown – a simple Markdown format
  • R Markdown – the format in RStudio
  • MyST – Markedly Structured Text
  • Pandoc Markdown – for use with Pandoc, the universal file converter. It can also convert notebooks (like the one I used to write this article!).
  • Quarto – a scientific publishing system based on Pandoc

Jupytext also supports multiple types of script output, and multiple languages, not just Python. This allows for regular code files to generate notebooks. Jupytext parses special comments as instructions and then will generate separate notebook cells with metadata as specified in the script. There are pros and cons to using each format, and most of them support a full round trip conversion, as we discussed. Jupytext understands the following script formats:

  • light – a format created for the Jupytext project. Cell start and end markers are # + and # -
  • nomarker – a version of light, but with no markers at all. This format can’t be roundtripped.
  • percent – markers are put in code, with this format: # %% Optional title [cell type] key="value"
  • hydrogen – very similar to percent, but it doesn’t comment out Jupyter magics
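
The percent marker layout in the list above (# %% Optional title [cell type] key="value") can be sketched as a small formatting function. This is only an illustration of the layout described here, not Jupytext's own writer; percent_marker is a hypothetical helper, and encoding metadata values with json.dumps is an assumption made for the example.

```python
import json


def percent_marker(title="", cell_type="code", **metadata):
    """Build a percent-format cell marker line."""
    parts = ["# %%"]
    if title:
        parts.append(title)
    if cell_type != "code":  # code is the default, so it is left implicit
        parts.append(f"[{cell_type}]")
    for key, value in metadata.items():
        parts.append(f"{key}={json.dumps(value)}")
    return " ".join(parts)


print(percent_marker())
print(percent_marker("Notes", cell_type="markdown"))
print(percent_marker("Plot squares", tags=["figure"]))
```

Everything after # %% is optional, which is why the simplest marker is just the bare comment seen in the earlier example file.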

Possible issues

One of the main issues with adding Jupytext to your configuration is that it is one more piece of complexity. If you want to check in and version control completed notebooks with output, you now need to commit two files, not one. This may or may not be worth it for you, depending on your environment.

The other issue is that Jupytext is supported from the command line and the official Jupyter authoring tools, but not fully supported by all other IDEs, so if you’re using a different tool, you’ll have to be comfortable with doing the conversions on the command line. In almost all cases, I would say it’s worth learning how to do that if you plan on doing more work in Jupyter.

Last, as always, you need to be rigorous about ensuring your notebook output cells match the code that generated them. The best way to guarantee this is to execute the entire notebook after a kernel restart every time you update it and before committing it. You could automate this regeneration step, but long-running notebooks might make this seem onerous. Just be aware that Jupytext may update the notebook file without you noticing.
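
One cheap sanity check along these lines is to look at the execution counters stored in the .ipynb file. The sketch below assumes the nbformat 4 layout, where each code cell has an execution_count, and assumes that a notebook run top to bottom after a kernel restart numbers its non-empty code cells 1, 2, 3, and so on. executed_in_order is a hypothetical helper, and the sample notebooks are made up for illustration.

```python
def executed_in_order(notebook):
    """True if non-empty code cells carry execution counts 1, 2, 3, ... in order."""
    counts = [
        cell.get("execution_count")
        for cell in notebook["cells"]
        # "source" may be a string or a list of lines; join handles both.
        if cell["cell_type"] == "code" and "".join(cell["source"]).strip()
    ]
    return counts == list(range(1, len(counts) + 1))


def code_cell(source, execution_count):
    """Minimal code cell used to build the sample notebooks below."""
    return {"cell_type": "code", "source": source, "execution_count": execution_count}


fresh = {"cells": [code_cell("x = 1", 1), code_cell("print(x)", 2), code_cell("", None)]}
out_of_order = {"cells": [code_cell("x = 1", 3), code_cell("print(x)", 1)]}
never_run = {"cells": [code_cell("x = 1", None)]}

print(executed_in_order(fresh))         # True
print(executed_in_order(out_of_order))  # False
print(executed_in_order(never_run))     # False
```

This is a necessary condition rather than a guarantee: a cell that was edited after it ran, for instance by a Jupytext reload, still passes, so it complements a full re-execution instead of replacing it.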

Jupytext is a nice plugin that will be really useful for those who prefer working in Markdown or regular source files, and for those who practice using code validation tools.
