dima

on software
Latest blog articles:

Go: Delayed file system events handling

Suppose you need to do something when some file system event occurs. For example, restart the web server when files change. Quite a common practice in development: recompile the project and restart it on the fly immediately after editing the source files.

This site is written in Go, and I recently decided to add similar hot-reloading for the markdown files with articles: make the web server notice a new file in its data directories and repopulate the internal in-memory storage without restarting itself. And I wanted to "listen" to the file system, not scan it every few seconds.

Fortunately, there is already a good listener, fsnotify, which can watch given directories. (Not recursively, though, but I don't have that many directories.)

The README gives a pretty clear example. I wrapped it in a Watcher() function and added my own channel, to which I send something once an event happens. I don't care about the type of the event, so I just always send 1. I also wrapped Watcher() in another function, Watch(), which executes the reload function passed to it on every FS change.

Something like this (error handling and some unimportant stuff intentionally left out):

func main() {
    dirs := []string{
        "/foo",
        "/bar",
    }
    go Watch(dirs, func() {
        reloadStuff()
    })
}

func Watch(dirs []string, reload func()) {
    ch := make(chan int)
    go Watcher(dirs, ch)

    // Execute the reload function on each
    // file system event
    for range ch {
        reload()
    }
}

func Watcher(dirs []string, ch chan int) {
    watcher, _ := fsnotify.NewWatcher()
    defer watcher.Close()

    done := make(chan bool)
    go func() {
        for {
            select {
            case <-watcher.Events:
                // Send a notification to the channel
                // on any event from the watcher
                ch <- 1
            case err := <-watcher.Errors:
                log.Println("error:", err)
            }
        }
    }()
    for _, dir := range dirs {
        watcher.Add(dir)
    }
    <-done
}

I ran some tests and found that more than one event can be triggered when a file changes (e.g. CHMOD and WRITE in sequence). And if multiple files change at once (git checkout, rsync, touch *.*), there will be even more events, and my hot-reload is triggered on each of them.

In fact, I only need to trigger it once if a lot of events arrive in a short period of time. That is, accumulate them, wait half a second, and if nothing else has come in, do the thing.

To my shame, I couldn't come up with a good solution on my own, but I noticed that CompileDaemon, which I use in development to recompile the source code, works exactly the way I want. The solution there is elegant (as elegant as it can be in Go), and it's based on time.After(): it starts a timer and sends the current time to the returned channel after the specified interval.

As a result, the Watch() function has become the following:

func Watch(dirs []string, reload func()) {
    ch := make(chan int)
    go Watcher(dirs, ch)

    // A function that returns the channel, which receives
    // the current time at the end of the specified time interval.
    createThreshold := func() <-chan time.Time {
        return time.After(500 * time.Millisecond)
    }

    // `threshold := createThreshold()` is also acceptable,
    // if you want to trigger reload() on the first run.
    // I don't need that, so a nil channel, which blocks
    // forever in select, is enough.
    var threshold <-chan time.Time
    for {
        select {
        case <-ch:
            // At each event, we simply recreate the threshold
            // so that `case <-threshold` is delayed for another 500ms.
            threshold = createThreshold()
        case <-threshold:
            // If nothing else comes into the `ch` channel within 500ms,
            // trigger the reload() and wait for the next burst of events.
            reload()
        }
    }
}

It works exactly as I wanted: now I can update all the data files via rsync without restarting the web server, and the changes are picked up within 500ms of the last event.

I invented PHP.

Docker Buildkit: the proper usage of --mount=type=cache

TL;DR The contents of directories mounted with --mount=type=cache are not stored in the docker image, so it makes sense to cache intermediate directories, rather than target ones.

In dockerfile:1.3 there is a feature for mounting file system directories during the build, which can be used to cache downloaded packages or compilation artifacts.

For example, the uwsgi package must be compiled every time it is installed, and at first glance it seems build time could be reduced by making the entire Python package directory cacheable:

# syntax=docker/dockerfile:1.3
FROM python:3.10

RUN mkdir /pip-packages

RUN --mount=type=cache,target=/pip-packages \
      pip install --target=/pip-packages uwsgi
> docker build -t pip-cache -f Dockerfile.pip .
# ...
[+] Building 14.6s (7/7) FINISHED

Looks like everything went well, but the target directory is empty:

> docker run -it --rm pip-cache ls -l /pip-packages
total 0

Something is definitely wrong. You could see that uWSGI was compiled and installed during the build. You can even verify it by adding ls to the build step:

RUN --mount=type=cache,target=/pip-packages \
      pip install --target=/pip-packages uwsgi \
      && ls -1 /pip-packages
> docker build -t pip-cache --progress=plain -f Dockerfile.pip .
<...>
#6 12.48 Successfully installed uwsgi-2.0.20
<...>
#6 12.91 __pycache__
#6 12.91 bin
#6 12.91 uWSGI-2.0.20.dist-info
#6 12.91 uwsgidecorators.py
#6 DONE 13.0s
<...>

Everything is in its place. But the final image is empty again:

> docker run -it --rm pip-cache ls -l /pip-packages
total 0

The thing is, the /pip-packages directory that is inside the image and the directory mounted with RUN --mount=type=cache,target=<dirname> are two completely different directories. Let's put something into this directory and track its contents during the build:

RUN mkdir /pip-packages \
    && touch /pip-packages/foo \
    && ls -1 /pip-packages

RUN --mount=type=cache,target=/pip-packages \
    ls -1 /pip-packages \
    && pip install --target=/pip-packages uwsgi \
    && ls -1 /pip-packages

RUN ls -1 /pip-packages
> docker build -t pip-cache --progress=plain -f Dockerfile.pip-track .
<...>
#5 [stage-0 2/4] RUN mkdir /pip-packages
      && touch /pip-packages/foo
      && ls -1 /pip-packages
#5 sha256:fb542<...>
#5 0.211 foo  👈1️⃣
#5 DONE 0.2s

#6 [stage-0 3/4] RUN --mount=type=cache,target=/pip-packages
      ls -1 /pip-packages
      && pip install --target=/pip-packages uwsgi
      && ls -1 /pip-packages
#6 sha256:10ed6<...>
#6 0.292 __pycache__            👈2️⃣
#6 0.292 bin
#6 0.292 uWSGI-2.0.20.dist-info
#6 0.292 uwsgidecorators.py
#6 2.802 Collecting uwsgi       🤔3️⃣
#6 3.189   Downloading uwsgi-2.0.20.tar.gz (804 kB)
#6 4.400 Building wheels for collected packages: uwsgi
<...>
#6 13.34 __pycache__            👈4️⃣
#6 13.34 bin
#6 13.34 uWSGI-2.0.20.dist-info
#6 13.34 uwsgidecorators.py
#6 DONE 13.4s

#7 [stage-0 4/4] RUN ls -1 /pip-packages
#7 sha256:fb6f4<...>
#7 0.227 foo  👈5️⃣
#7 DONE 0.2s
<...>
  • 1️⃣ file foo created successfully
  • 2️⃣ the directory with the results of the previous docker build was mounted, and there's no foo file
  • 3️⃣ uWSGI is downloaded, compiled and installed again
  • 4️⃣ an updated uWSGI package appeared in the directory
  • 5️⃣ only the file foo is left in the directory

This means that --mount=type=cache works only in the context of a single RUN instruction: it temporarily replaces the directory created inside the image with RUN mkdir /pip-packages and then reverts it back. On top of that, the caching turned out to be ineffective, because pip reinstalled uWSGI with a full compilation.

In this case, the right thing to cache is not the target directory but /root/.cache, where pip keeps all its artifacts:

RUN --mount=type=cache,target=/root/.cache \
    pip install --target=/pip-packages uwsgi
> docker build -t pip-cache -f Dockerfile.pip-right .
> docker run -it --rm pip-cache ls -1 /pip-packages
__pycache__
bin
uWSGI-2.0.20.dist-info
uwsgidecorators.py

Now everything is in place, the installed packages have not disappeared.

Let's check the effectiveness of caching by adding the requests package:

RUN --mount=type=cache,target=/root/.cache \
    pip install --target=/pip-packages uwsgi requests
                                                👆
> docker build -t pip-cache --progress=plain -f Dockerfile.pip-right .
<...>
#6 6.297 Collecting uwsgi
#6 6.297   Using cached uWSGI-<...>.whl  👈
#6 6.561 Collecting requests
#6 6.980   Downloading requests-2.27.1-py2.py3-none-any.whl (63 kB)
<...>

pip used the pre-built wheel file from /root/.cache and installed a ready-to-use package from it.

All sources are available on GitHub.

Fast commit and push

I want to share a shell function called gacp (Git Add, Commit and Push), which I came up with a few months ago and have been using almost every hour since:

# fish
function gacp
    git add .
    git commit -m "$argv"
    git push origin HEAD
end
# bash/zsh
function gacp() {
    git add .
    git commit -m "$*"
    git push origin HEAD
}

Usage example:

> gacp add some new crazy stuff
[master fb8dcc9] add some new crazy stuff
 <...>
Enumerating objects: 12, done.
 <...>
To github.com:foo/bar.git
   912c95d..fb8dcc9  master -> master

No more chains of && and no more quoting just to write a verbose message!