How to check only what changed in CI

Nicolai Antiferov
Jan 22, 2023


When you have a lot of code in a repository, as in a monorepo, it becomes very important to check in CI only the things that changed in the PR (pull request). Otherwise, CI will be slow and may break on changes that are not yours, which can ruin the developer experience (DevEx).

If you’re using something like super-linter with GitHub Actions, it is enough to set two parameters in the workflow:

      - name: Lint Code Base
        uses: github/super-linter@v4
        env:
          VALIDATE_ALL_CODEBASE: false
          DEFAULT_BRANCH: main # or something else, if default branch is not main

But don’t forget to fetch the full git history first, so that the reference to the default branch is available. Again, in the GitHub Actions case it would be:

      - name: Checkout Code
        uses: actions/checkout@v3
        with:
          fetch-depth: 0 # to fetch all history

If you have some checks written in bash, you can use something like git diff your-branch origin/main and work with its output.
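If you would rather stay in Python than bash, the same idea can be sketched with the standard library only. This is a rough illustration, not part of the original setup: the names parse_name_status and changed_files are made up here, and I assume the base ref is passed first so that statuses read from the branch's point of view (A means "added in your branch").

```python
import subprocess


def parse_name_status(output):
    """Parse `git diff --name-status` output into (status, path) tuples."""
    changes = []
    for line in output.splitlines():
        if not line.strip():
            continue
        # Fields are tab-separated: a status, then one path
        # (two paths for renames and copies; the last one is the new path).
        fields = line.split("\t")
        status, path = fields[0], fields[-1]
        # Renames/copies carry a similarity score (e.g. "R100"); keep the letter.
        changes.append((status[0], path))
    return changes


def changed_files(base="origin/main"):
    """Run git and return what changed between `base` and HEAD.

    Assumes `git` is on PATH and `base` has been fetched.
    """
    result = subprocess.run(
        ["git", "diff", "--name-status", base, "HEAD"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_name_status(result.stdout)
```

Keeping the parsing separate from the git call makes it easy to test the logic without a repository at hand.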

However, in this article I will describe how to use the GitPython library together with Invoke to do any custom checks you need. See my previous article about using Invoke for automation for details.

GitPython is basically a wrapper around the local git command. It requires Python >= 3.7 and Git >= 1.7.0 and can be installed via pip install GitPython. The example is based on this.

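from git import Repo
from invoke import task

@task()
def diff(c):
    """
    Show changed things in PR
    """
    # get_root(c) is a helper not shown here; it is expected to return the repo root path
    r = Repo(get_root(c))
    for i in r.index.diff("origin/main"):
        print(i)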

For two changed files, this prints:

$ inv diff
requirements.txt
=======================================================
lhs: 100644 | c549371f13e670bd14ca4d4ff6f69074fed205b5
rhs: 100644 | c454aa6f6b8475b97b5a0464ab5e7e457d3f6c58
tasks.py
=======================================================
lhs: 100644 | e842c4395a33753708e2d263c35df07d2a84bcea
rhs: 100644 | 2956caa051dacf04693f3246fe8b88536935f96f

Let’s make it look a bit fancier. We will also add a new file, test.tt, to demonstrate how the diff works. The code changes to this:

from git import Repo
from invoke import task

@task()
def diff(c):
    """
    Show changed things in PR
    """
    # Map git change-type letters to readable names
    change = {"A": "Added", "D": "Deleted", "M": "Modified"}

    r = Repo(get_root(c))
    print("Changed in this branch:")
    for i in r.index.diff("origin/main"):
        print(f"File: {i.a_path}, change: {change.get(i.change_type, i.change_type)}")

Output:

Changed in this branch:
File: requirements.txt, change: Modified
File: tasks.py, change: Modified
File: test.tt, change: Deleted

But why is test.tt shown as deleted if we added it? This comes from how the library computes the diff: repo.index.diff("origin/main") shows what changes when going from the index to origin/main, and in that direction a file that exists only in the index looks deleted.

To get what we want, we need to set the R (reverse) parameter to true, so the call looks like this: r.index.diff("origin/main", R=True). The output now looks as expected:

Changed in this branch:
File: requirements.txt, change: Modified
File: tasks.py, change: Modified
File: test.tt, change: Added
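Putting the pieces together, here is a minimal sketch of the reversed diff outside of an Invoke task. The function names branch_changes and describe_changes are my own for illustration, and I assume the repository can be found from the current directory; the PR mentioned in this article holds the actual code. GitPython is imported inside the function so the formatting helper can be used and tested without it.

```python
CHANGE_NAMES = {"A": "Added", "D": "Deleted", "M": "Modified", "R": "Renamed"}


def describe_changes(changes):
    """Format (path, change_type) pairs as readable lines.

    Unknown change types are printed as-is instead of failing.
    """
    return [
        f"File: {path}, change: {CHANGE_NAMES.get(kind, kind)}"
        for path, kind in changes
    ]


def branch_changes(repo_path=".", base="origin/main"):
    """Return (path, change_type) pairs from the branch's point of view.

    Assumes GitPython is installed and `base` has been fetched.
    """
    from git import Repo  # GitPython, imported lazily on purpose

    repo = Repo(repo_path, search_parent_directories=True)
    # R=True swaps both sides of the diff, so new files show up as "A".
    diffs = repo.index.diff(base, R=True)
    return [(d.b_path or d.a_path, d.change_type) for d in diffs]


if __name__ == "__main__":
    print("Changed in this branch:")
    for line in describe_changes(branch_changes()):
        print(line)
```

Returning plain tuples instead of Diff objects keeps the later checks independent of the git library.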

The full code can be found in this PR#9.

Now you can add any checks you need, depending on your code structure.

For example, if a file belonging to one Terraform state changed, add that state to a validation check. Or if a module changed, add all states that use this module to the tests.

The same applies to Ansible, Helm charts, etc.
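As a rough sketch of such a check, the function below maps changed paths to the Terraform states that should be validated. Everything here is hypothetical: the states/<name>/ and modules/<name>/ layout, the states_to_validate name, and the module_users mapping are assumptions for illustration, so adapt them to your own repository.

```python
from pathlib import PurePosixPath


def states_to_validate(changed_paths, module_users):
    """Return a sorted list of state names affected by the changed paths.

    changed_paths: repo-relative paths, e.g. "states/prod/main.tf"
    module_users:  hypothetical mapping of module name -> states using it
    """
    affected = set()
    for raw in changed_paths:
        parts = PurePosixPath(raw).parts
        if len(parts) < 3:
            continue  # not a file inside states/<name>/ or modules/<name>/
        top, name = parts[0], parts[1]
        if top == "states":
            # A file of the state itself changed: validate that state.
            affected.add(name)
        elif top == "modules":
            # A module changed: validate every state that uses it.
            affected.update(module_users.get(name, ()))
    return sorted(affected)
```

The list of changed paths can come from the diff shown earlier, and the result can be fed into whatever validation command you already run in CI.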
