Adding type hints to a Python project and getting value from type checking

Recently I started writing Python for a machine learning project I am helping with. I remembered that Python has type hints, tried adding them to the project, and found that they were easy to introduce and immediately useful because I could get the benefits of static type checking.

When I first started writing type hints, though, I was not sure which documents I should read or how I should introduce type hints into a project. This article explains those points and shows one way to start writing types in a Python project.

Which Documents Should You Read?

If you have written another statically typed language before, these two documents should be enough to get started.

  • Understanding Typing
    • Part of the documentation for pyright, a type checker implementation. It summarizes the important points concisely.
  • typing - Support for type hints
    • The official Python documentation. It feels more like a reference manual than an easy tutorial, so I recommend reading it after Understanding Typing.

After reading those two, you should have a rough sense of Python type hints and may want to start writing them yourself. The sample code I used in an internal study session to check type behavior is here. It may not be exemplary code, but it is useful for observing behavior.


How to Introduce Type Hints

When adding type hints to a project, the first thing to choose is the type checker implementation. Python type hints are specified across several PEPs, and there are multiple tools that implement them. Major options include:

  • mypy
  • pytype
    • Google's implementation. It can also type-check Python 2.7.
  • pyre
    • Meta's implementation. It supports Python 3 and later.
  • pyright
    • Microsoft's implementation. It supports only Python 3 and later, is written in TypeScript, and runs on Node.js.

All of these can be used through the Language Server Protocol (LSP), so they work in LSP-compatible IDEs and editors, although I have not checked pytype myself. If supporting only Python 3 is acceptable, my personal recommendation is pyright. It is fast, and GitHub issues and PRs seem to get quick responses, perhaps because Microsoft employees are assigned to them. In VS Code, it is easy to use through the Pylance extension.

One Pylance trap is that type checking is off by default. Really. Pyright's default is basic, so this surprised me. You should change it in VS Code's settings.json. Otherwise, you may think "I installed Pylance, now I can write type hints, and there are no errors, so everything is fine", when in fact type checking is simply not running. That is exactly what happened to me at first.

  // settings.json
  // The default is currently "off" 🤣 Why?!
  "python.analysis.typeCheckingMode": "basic",
  // Type-check the entire workspace, not only open files.
  "python.analysis.diagnosticMode": "workspace"

Pylance includes pyright, which is open source, but Pylance itself is not open source because it includes other features as well. If you use something other than VS Code, you can use pyright directly instead of Pylance, so the absence of Pylance should not be a serious problem.

Adding pyright to a Project

If you only want to install pyright as a CLI, install it through npm and run the pyright command.

$ npm install --global pyright
$ pyright
No configuration file found.
pyproject.toml file found at C:\Users\hotch\src\github.com\....
Loading pyproject.toml file at C:\Users\hotch\src\github.com\...\pyproject.toml
Assuming Python platform Windows
No include entries specified; assuming C:\Users\hotch\src\github.com\...
Auto-excluding **/node_modules
Auto-excluding **/__pycache__
Auto-excluding .git
stubPath C:\Users\hotch\src\github.com\...\typings is not a valid directory.
Searching for source files
Found 62 source files
0 errors, 0 warnings, 0 infos
Completed in 2.591sec

If the project contains type errors, they are reported here. Running pyright -w starts watch mode, which keeps the process alive and re-checks whenever files change. In normal use, however, VS Code or another editor will run type checks through pyright, so there are not many occasions to invoke the command directly.
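As an illustration of the kind of error that gets reported, here is a minimal sketch. The function names are hypothetical and not taken from the project above. With typeCheckingMode set to basic, pyright is expected to flag the attribute access in shout_unsafe (rule reportOptionalMemberAccess), because first_word may return None.

```python
import re
from typing import Optional


def first_word(text: str) -> Optional[str]:
    """Return the first word in text, or None if there is none."""
    match = re.match(r"\s*(\w+)", text)
    return match.group(1) if match else None


def shout_unsafe(text: str) -> str:
    # first_word() may return None, so a type checker is expected to
    # report this attribute access. At runtime it raises AttributeError
    # when text contains no word.
    return first_word(text).upper()


def shout_safe(text: str) -> str:
    word = first_word(text)
    # The None check narrows Optional[str] to str for the code below.
    if word is None:
        return ""
    return word.upper()
```

Both functions behave the same for ordinary input, which is why this kind of bug tends to survive manual testing and only shows up on edge cases such as an empty string.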

You can apply project-wide settings by placing either pyproject.toml or pyrightconfig.json in the project root.

# pyproject.toml
# https://github.com/microsoft/pyright/blob/main/docs/configuration.md
[tool.pyright]
pythonVersion = "3.7"
typeCheckingMode = "basic"

Personally, I wanted to pin the pyright version and install it quickly with npm install, so I added a package.json (the npm package manifest) to the project.

{
  "name": "pyright-exec",
  "version": "1.0.0",
  "description": "",
  "main": "",
  "scripts": {
    "pyright": "pyright"
  },
  "author": "",
  "license": "",
  "dependencies": {
    "pyright": "^1.1.155"
  }
}

If you want to run it in CI, you can configure GitHub Actions like this example.

      - uses: actions/setup-node@v1
        with:
          node-version: 14.x
      - name: Install node dependencies
        run: npm install
      - name: Typecheck
        run: npm run pyright

At first I wondered whether it was strange to add pyright, which runs on Node.js, to a Python project. But the pyright package has zero dependencies, unlike many Node.js tools that pull in a large number of packages, so installation finishes almost immediately. That also makes it comfortable to use.

Python Versions and Type Hints

Once you start writing type hints in Python, the next thing you may hit is that available typing features differ by Python version. For example, list type hints behave differently depending on the version.

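# The three forms below are alternatives; pick one per file.
# py 3.9+: the built-in list can be subscripted directly
l: list[str] = []
# py 3.7+: the future import (placed at the top of the file) allows the same syntax in annotations
from __future__ import annotations
l: list[str] = []
# py 3.5+: use the typing module
# Note that this form is deprecated as of 3.9
from typing import List
l: List[str] = []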
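One caveat is worth a small sketch: the future import only postpones the evaluation of annotations. Anything evaluated at runtime, such as a type alias, still needs the typing module on 3.7 and 3.8. The names below are hypothetical.

```python
from __future__ import annotations  # must come before any other statement

from typing import List


def join_names(names: list[str]) -> str:
    # Annotations are stored as strings and never evaluated, so
    # list[str] is accepted here on 3.7 and 3.8 as well.
    return ", ".join(names)


# A type alias is an ordinary runtime expression, so the future import
# does not help. On 3.7 and 3.8 this still has to use typing.List.
Names = List[str]


def first_name(names: Names) -> str:
    return names[0]
```

Inspecting join_names.__annotations__ shows the string "list[str]" rather than a type object, which is why the syntax is harmless on older interpreters.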

Another commonly used type is TypedDict, which is supported from Python 3.8.

# py 3.8+
from typing import TypedDict

If the typing module in the Python version you need does not include a type you want, you can usually install typing_extensions and use its backport.

# For py 3.7 and earlier. Of course, this also works on 3.8 and later.
from typing_extensions import TypedDict
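If you would rather use the standard library wherever it is available, one possible pattern is to branch on sys.version_info, which type checkers understand. This is a sketch, not something the project above necessarily does, and Article and headline are hypothetical names.

```python
import sys
from typing import List

if sys.version_info >= (3, 8):
    from typing import TypedDict
else:
    # Needs the typing_extensions package installed on Python 3.7.
    from typing_extensions import TypedDict


class Article(TypedDict):
    title: str
    tags: List[str]


def headline(article: Article) -> str:
    # At runtime an Article is a plain dict; the key types are only
    # enforced by the type checker.
    return f"{article['title']} ({len(article['tags'])} tags)"
```

Importing from typing_extensions unconditionally is simpler and works on every version, so the branch is only worth it if you want the dependency to be needed on 3.7 alone.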

Until you get used to it, you need to check the reference documentation to see which Python version supports which typing feature. For a new project without constraints, using the newest possible Python version is best. But the runtime environment may be older. For example, as of July 2021, Google Colab uses Python 3.7. If your code must run on Colab, you need to write it so it works on 3.7. I initially wrote code for 3.8, found that it did not run on Colab, and had to rewrite it for 3.7.

Type Stubs: Providing Types from Separate Files

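Next, you may run into a library that does not provide type hints. In that case, you can supply the types yourself with type stub files (.pyi). Stub files were introduced in PEP 484, and PEP 561 defines how type information is packaged and distributed.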

In pyright, the default stubPath is ./typings, and .pyi type stub files placed under that directory are loaded. To see concrete examples of how to write .pyi files, it is useful to look at typeshed, which collects type stubs for the standard library and well-known packages. Tools such as pyright and mypy bundle a copy of typeshed's standard-library stubs, so you normally do not need to install those separately.

pyright also has a pyright --createstub packagename command that generates a type stub template, and Pylance can create stubs through its UI. See pyright's Type Stub Files documentation for details.
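To give a rough idea of what a hand-written stub looks like, here is a minimal sketch for a hypothetical untyped package called fastmath. The package, its functions, and the Window class are all made up for illustration. With the default stubPath, the file would live at typings/fastmath/__init__.pyi.

```python
# typings/fastmath/__init__.pyi
# Stub for a hypothetical untyped package "fastmath".
# A stub contains signatures only; every body is "...".
from typing import List, Optional

def mean(values: List[float]) -> float: ...

# "= ..." marks a parameter that has a default without stating its value.
def find_peak(values: List[float], threshold: float = ...) -> Optional[int]: ...

class Window:
    size: int
    def __init__(self, size: int) -> None: ...
    def push(self, value: float) -> None: ...
```

The stub is never executed. The type checker reads it in place of the package's source, so the signatures must be kept in sync with the real library by hand.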

Start Writing Type Hints

Writing type hints brings many benefits: better IDE completion, easier refactoring, improved development efficiency, more confidence from static type checks, and fewer runtime errors. Python's type hints can do far less than, for example, TypeScript's type system, and sometimes I wish I could manipulate types more expressively. But because of that limitation, most Python type annotations stay simple. That keeps the learning cost low and makes the types easy for most readers to understand.

The cost of introducing type hints into a project is also low, especially for a new project. If you are unsure whether to write type hints, I think it is worth trying them first.


Other Notes

As of July 2021, I help with machine learning projects as a software engineer at Nikkei Innovation Lab three to four days a week. I am not an employee; I help on a project basis. Most of the information in this article comes from that work, and this post is based on material I presented at an internal Nikkei study session.

Nikkei is an interesting environment for machine learning. It naturally has newspapers and other text suited to natural language processing, and it also has access to various large-scale datasets through the Nikkei electronic edition. If that sounds interesting, take a look at the hiring site. At the time of writing, Nikkei is also recruiting machine learning interns for a summer internship.
