Tree-sitter 101: A Leaf-to-Root Beginner's Guide


Welcome to the first installment of our Tree-sitter series! In this series, we will introduce you to the powerful world of the Tree-sitter parsing framework. Our focus, at least initially, is on helping you use existing grammars to build program analysis tools that can operate across various programming languages. We won’t dive into creating new grammars for entirely new programming languages just yet. So let’s start at the very beginning: installing Tree-sitter, setting up a grammar, and parsing a first file.

Installing Tree-sitter

The first step in your Tree-sitter journey is to get the Tree-sitter command-line program up and running. Here are a few methods you can choose from:

Native Package Manager

If you are using a platform that has Tree-sitter available in its native package manager, you can use it to install Tree-sitter. For example, on Arch Linux, you can use ‘pacman’:

$ sudo pacman -S tree-sitter
$ tree-sitter --version

Similarly, on macOS, you can use Homebrew:

$ brew install tree-sitter
$ tree-sitter --version

Precompiled Binary

If your platform doesn’t have Tree-sitter in its package manager or the available version is outdated, you can download a precompiled binary from the Tree-sitter releases page on GitHub.

Here’s how you can do it on 64-bit Linux (on other platforms, pick the matching file from the releases page):

$ curl -OL https://github.com/tree-sitter/tree-sitter/releases/download/v0.20.8/tree-sitter-linux-x64.gz
$ mkdir -p $HOME/bin
$ gunzip -c tree-sitter-linux-x64.gz > $HOME/bin/tree-sitter
$ chmod u+x $HOME/bin/tree-sitter
$ export PATH=$HOME/bin:$PATH
$ tree-sitter --version
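If you would rather script the unpacking step, for example in a setup script for your analysis tool, the decompress-and-mark-executable part of the commands above can be sketched in Python. This is a minimal sketch using only the standard library; the function name install_gz_binary is made up for illustration, and it assumes you have already downloaded the .gz file.

```python
import gzip
import shutil
import stat
from pathlib import Path


def install_gz_binary(archive: Path, dest: Path) -> Path:
    """Decompress a .gz archive to dest and mark the result executable.

    Hypothetical helper: it mirrors the mkdir / gunzip / chmod steps,
    but is not part of Tree-sitter itself.
    """
    # Same effect as `mkdir -p` on the destination directory.
    dest.parent.mkdir(parents=True, exist_ok=True)

    # Same effect as `gunzip -c archive > dest`: the archive is left in place.
    with gzip.open(archive, "rb") as src, open(dest, "wb") as out:
        shutil.copyfileobj(src, out)

    # Same effect as `chmod u+x dest`.
    dest.chmod(dest.stat().st_mode | stat.S_IXUSR)
    return dest
```

You would call it as install_gz_binary(Path("tree-sitter-linux-x64.gz"), Path.home() / "bin" / "tree-sitter"), and you still need $HOME/bin on your PATH afterwards.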

Note: The release version shown here may be out of date by the time you read this. Check the Tree-sitter releases page on GitHub for the latest version.

NPM Package

If you prefer using NPM, you can install the Tree-sitter command-line program via the ‘tree-sitter-cli’ package:

$ npm install tree-sitter-cli
$ npx tree-sitter --version

This option is particularly useful when you’re working on grammar development as it allows you to include Tree-sitter as part of a CI build.

Installing a Grammar

Now that you have the Tree-sitter program installed, it’s time to set up a grammar. By default, Tree-sitter doesn’t install any language grammars because it doesn’t know which languages you want to work with. Let’s install the Tree-sitter Python grammar as an example.

First, you need to generate a configuration file for Tree-sitter. This file tells Tree-sitter where to find language grammars. Run the following command to generate the configuration file:

$ tree-sitter init-config

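This creates a configuration file and prints where it was saved. Older releases used ~/.tree-sitter/config.json, while more recent ones typically write to ~/.config/tree-sitter/config.json on Linux, so go by the path the command reports. Open this file in your text editor. In the file, you’ll find a parser-directories section: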

{
  "parser-directories": [
    "/home/dcreager/github",
    "/home/dcreager/src",
    "/home/dcreager/source"
  ]
}

You can choose any directories you like to store your grammar definitions. Tree-sitter will look for subdirectories in these locations with names matching the pattern tree-sitter-[language], and it will automatically generate and compile those grammars when needed.
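To make the lookup rule concrete, here is a rough Python sketch of the directory scan described above. It is not Tree-sitter’s actual implementation: the function name find_grammars is hypothetical, and the "first match wins" behaviour for a language that appears in several directories is an assumption made for the example.

```python
import json
from pathlib import Path


def find_grammars(config_path: Path) -> dict[str, Path]:
    """Map language name -> grammar directory, based on parser-directories."""
    config = json.loads(config_path.read_text())
    found: dict[str, Path] = {}
    prefix = "tree-sitter-"

    for directory in config.get("parser-directories", []):
        root = Path(directory).expanduser()
        if not root.is_dir():
            continue  # a listed directory does not have to exist

        # Only direct subdirectories named tree-sitter-[language] count.
        for child in sorted(root.iterdir()):
            if child.is_dir() and child.name.startswith(prefix):
                language = child.name[len(prefix):]
                # Assumption for this sketch: the first directory wins.
                found.setdefault(language, child)

    return found
```

Running find_grammars on your own config file is a quick way to check which grammars a scan of this kind would pick up before you try to parse anything.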

To work with the Tree-sitter Python grammar, clone it into one of the directories listed in the config file. In this example, we’ll use ~/src:

$ mkdir -p ~/src
$ cd ~/src
$ git clone https://github.com/tree-sitter/tree-sitter-python

Parsing Some Code

With the Python grammar installed, you can now parse Python code using the Tree-sitter command-line tool. Let’s test it on an example Python script:

$ cat example.py
import utils

def add_four(x):
    return x + 4

print(add_four(5))

$ tree-sitter parse example.py

You should see a parse tree generated for the Python code. Tree-sitter breaks down the code into a structured tree, making it easier for you to analyze and manipulate the code’s syntax.
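If the idea of a "structured tree" is new to you, the toy model below may help. It builds a tiny tree by hand for the statement return x + 4 and prints it in the parenthesized S-expression style that parse trees are commonly shown in. The Node class is invented for this sketch, and the node names are only modelled on what a Python grammar might report, so treat them as assumptions; real tree-sitter parse output also carries extra detail such as source positions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A toy syntax-tree node: a kind plus an ordered list of children."""
    kind: str
    children: list["Node"] = field(default_factory=list)

    def to_sexp(self) -> str:
        """Render this node and its descendants as a nested S-expression."""
        if not self.children:
            return f"({self.kind})"
        inner = " ".join(child.to_sexp() for child in self.children)
        return f"({self.kind} {inner})"


# Hand-built tree for `return x + 4` (node names are illustrative).
tree = Node("return_statement", [
    Node("binary_operator", [Node("identifier"), Node("integer")]),
])

print(tree.to_sexp())
# prints: (return_statement (binary_operator (identifier) (integer)))
```

The point is the shape: every construct in the source becomes a node, and nesting in the output mirrors nesting in the code.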

Feel free to explore further by parsing example files from other languages. Just clone the corresponding language grammar into one of the directories listed in the configuration file and use the tree-sitter parse command to analyze the code.
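Since the goal of this series is to build analysis tools, you may eventually want to drive the command-line program from a script. Here is one minimal way to do that from Python with the standard library. The helper names parse_command and parse_file are hypothetical, and the sketch assumes the tree-sitter binary is on your PATH and that the matching grammar is installed as described above.

```python
import shutil
import subprocess


def parse_command(source_file: str, cli: str = "tree-sitter") -> list[str]:
    """Build the argument list for `tree-sitter parse <file>`."""
    return [cli, "parse", source_file]


def parse_file(source_file: str) -> str:
    """Run the CLI on one file and return whatever it prints to stdout."""
    if shutil.which("tree-sitter") is None:
        raise FileNotFoundError("tree-sitter was not found on PATH")

    # No check=True here: this sketch returns the output regardless of
    # the exit status and leaves error handling to the caller.
    result = subprocess.run(
        parse_command(source_file),
        capture_output=True,
        text=True,
    )
    return result.stdout
```

For example, print(parse_file("example.py")) should show the same tree you saw in the terminal.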

That’s it for our beginner’s guide to Tree-sitter. In upcoming posts in this series, we’ll delve deeper into using Tree-sitter for program analysis tasks across languages. So, stay tuned for more exciting Tree-sitter adventures!

