The Best Golang Task Runner
by James (Golang Project Structure Admin)
Okay, I’m biased — it’s my task runner — but hear me out.
This post will start by discussing what a task runner actually is and exploring how it works by building a very basic version of one in Go code. Then we’ll look at some example tasks that can be completed with the help of my own task-runner package.
What Is a Task Runner?
Computers are great at performing repetitive tasks that would be tedious — or impossible — for humans to do.
A task runner is a simple piece of software that helps to run these tasks at certain intervals (say, once a day or every hundred milliseconds).
This can either be a standalone program that runs quietly in the background (i.e. a daemon), or it can be code that is integrated into a larger codebase to add task-running functionality to a monolithic application.
How Does a Simple Task Runner Work?
Let’s look at some Go code, to see how an extremely simple task runner could be created:
package main
import (
"fmt"
"sync"
"time"
)
var (
taskFunctions []func()
taskFunctionsMutex sync.Mutex
)
func AddTask(function func()) {
defer taskFunctionsMutex.Unlock()
taskFunctionsMutex.Lock()
taskFunctions = append(taskFunctions, function)
}
func RunTasks() {
defer taskFunctionsMutex.Unlock()
taskFunctionsMutex.Lock()
for _, t := range taskFunctions {
t()
}
}
func init() {
AddTask(func() {
fmt.Println("ONE MINUTE HAS PASSED")
})
go func() {
for {
time.Sleep(time.Minute)
RunTasks()
}
}()
}
func main() {
select {}
}
We have declared the taskFunctions slice to store the handlers that will run whenever we want to complete the tasks. This variable is accessed only while taskFunctionsMutex is locked, so that there’s no risk of different goroutines trying to access the same memory at once.
The AddTask function appends a handler function to the taskFunctions slice, so that it can later be called automatically.
The job of the RunTasks function is simply to iterate through each of the handlers in the taskFunctions slice and call them in turn.
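One thing to be aware of in the version above: RunTasks holds the mutex while the handlers run, and a sync.Mutex cannot be locked twice by the same goroutine, so a handler that called AddTask would block forever. Here is a minimal sketch of one way around that, reusing the same names: copy the slice while holding the lock, then call the handlers after releasing it. It only illustrates the idea and is not a description of how the golang-tasks package is implemented.

```go
package main

import (
	"fmt"
	"sync"
)

var (
	taskFunctions      []func()
	taskFunctionsMutex sync.Mutex
)

// AddTask registers a handler. It is safe to call from any goroutine.
func AddTask(function func()) {
	taskFunctionsMutex.Lock()
	defer taskFunctionsMutex.Unlock()

	taskFunctions = append(taskFunctions, function)
}

// RunTasks copies the registered handlers while holding the lock and
// calls them only after releasing it, so a handler may call AddTask.
func RunTasks() {
	taskFunctionsMutex.Lock()
	snapshot := make([]func(), len(taskFunctions))
	copy(snapshot, taskFunctions)
	taskFunctionsMutex.Unlock()

	for _, t := range snapshot {
		t()
	}
}

func main() {
	AddTask(func() {
		fmt.Println("first task")
		AddTask(func() { fmt.Println("task added by another task") })
	})

	RunTasks() // runs only the first task
	RunTasks() // runs the first task again, plus the one it added earlier
}
```

The trade-off is that a handler added during a run is not called until the next run, because each run works from its own snapshot.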
In this example, we’ve just added a single task, within the init function, which prints out a message to the console. The main work of handling the tasks is done in the anonymous goroutine that is also started in the init function: it contains an infinite loop, and each iteration of the loop waits for a minute before calling the RunTasks function.
This ensures that the tasks will be run roughly once a minute. (Note that there could be much more than a minute between each iteration of the loop if our tasks took a long time to return, but that is very unlikely to apply in this case, since our single task isn’t very processor-intensive.)
Finally, we have used an empty select statement in the main function. This is simply to keep the program running indefinitely, so that you can use it as a daemon in the background that just performs a repetitive sequence of one or more tasks.
What’s So Great About My Task Runner?
My task runner is a Go package that you can import into your code whenever you need some way to perform useful but repetitive work.
If we use the package, we can easily reproduce the functionality shown in the previous example, without having to write the task-handling code ourselves (note that my package also includes other features to improve performance, efficiency and reliability):
package main
import (
"fmt"
"time"
tasks "github.com/theTardigrade/golang-tasks"
)
func init() {
tasks.Set(time.Minute, false, func(id *tasks.Identifier) {
fmt.Println("ONE MINUTE HAS PASSED")
})
}
func main() {
select {}
}
First of all, look at how I import my package: while the last segment of the GitHub URL is "golang-tasks" (in order to distinguish it from future task-management packages I may write in other languages), the package name used within the code is simply "tasks", which is why I use that single word to access it.
The first argument of the tasks.Set function is the interval that the task will occur at, in this case once every minute.
The second argument determines whether the task will run as soon as it’s set. So in this case the handler won’t be called until the first minute has passed, but if the second argument had been set to true, then the handler would have been called immediately and then called again in a minute’s time. In either case, the handler function will be called once every minute after it initially runs.
Finally, the third argument of the tasks.Set function is the handler function itself, which will get called at the registered interval. The id argument can be used to modify the operation of the task, but we’ll look at that in more detail when we see some examples later.
We use an empty select statement in the main function again to stop the program from exiting. It’s also possible, however, that you would want to run tasks within another long-running monolithic program like a web server. In that case, the server itself would stop the program from exiting and the tasks can simply be performed in the background.
Deleting Temporary Files
Many operating systems have a directory where temporary files can be stored. However, the OS often does not delete them when they’re no longer needed.
Yet there’s no need to worry, because it’s possible to create a simple Go program that will run this task:
package main
import (
"os"
"path/filepath"
"time"
tasks "github.com/theTardigrade/golang-tasks"
)
func RemoveTempFiles() error {
contents, err := filepath.Glob(filepath.Join(os.TempDir(), "*"))
if err != nil {
return err
}
for _, tempFilePath := range contents {
err = os.RemoveAll(tempFilePath)
if err != nil {
return err
}
}
return nil
}
func init() {
tasks.Set(time.Hour, true, func(id *tasks.Identifier) {
if err := RemoveTempFiles(); err != nil {
panic(err)
}
})
}
func main() {
select {}
}
The RemoveTempFiles function performs the file-removal work and it is simply called from the task handler. Creating a separate function helps to organize our code, but we could equally have performed the work within the task handler itself.
The RemoveTempFiles function makes use of the filepath.Glob function from the Go standard library: this returns a slice containing paths to all the files that match the given pattern (using the wildcard rules defined in an early version of Unix). Since we used the asterisk within the default temporary directory, we will match all files (and subdirectories) within that directory.
We can then simply iterate over all of the paths and delete the files using the os.RemoveAll function, which will remove any children in subdirectories, unlike the os.Remove function, which can only remove files or empty directories.
In other words, os.RemoveAll works like the Unix command rm -r, whereas os.Remove works like the same command without the recursive option set.
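The difference is easy to see in a small experiment. The sketch below builds a throwaway directory containing a subdirectory with one file in it, then tries both functions on the subdirectory. The removeDemo function and the file names are hypothetical, chosen only for this illustration.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// removeDemo reports whether os.Remove and os.RemoveAll each failed when
// asked to delete a directory that still contains a file.
func removeDemo() (removeFailed, removeAllFailed bool, err error) {
	dir, err := os.MkdirTemp("", "remove-demo-")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir) // clean up even if a step below fails

	child := filepath.Join(dir, "sub", "child.txt")
	if err := os.MkdirAll(filepath.Dir(child), 0o700); err != nil {
		return false, false, err
	}
	if err := os.WriteFile(child, []byte("x"), 0o600); err != nil {
		return false, false, err
	}

	target := filepath.Join(dir, "sub")
	removeFailed = os.Remove(target) != nil       // the directory is not empty
	removeAllFailed = os.RemoveAll(target) != nil // removes sub and child.txt
	return removeFailed, removeAllFailed, nil
}

func main() {
	removeFailed, removeAllFailed, err := removeDemo()
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	fmt.Println("os.Remove failed:", removeFailed)       // true
	fmt.Println("os.RemoveAll failed:", removeAllFailed) // false
}
```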
Also note that we have set the second argument of the tasks.Set function to true, which means that we want the task to run initially before waiting for an hour to run again. This is useful, because there may be some temporary files already waiting to be removed when the program first starts up.
Checking Whether a Variable Has Been Mutated
The example below counts the number of users that have visited a certain website or online application.
But it doesn’t matter exactly what is being counted, since we’re just looking at the general logic of mutating a variable and keeping track of the mutation, and the code could be adapted to a wide range of real-world situations.
package main
import (
"os"
"strconv"
"sync"
"time"
tasks "github.com/theTardigrade/golang-tasks"
)
var (
visitorCount int64
visitorCountMutated = true
visitorCountMutex sync.Mutex
)
func writeVisitorCountToFile() error {
defer visitorCountMutex.Unlock()
visitorCountMutex.Lock()
if visitorCountMutated {
err := os.WriteFile("visitor-count.txt", []byte(strconv.FormatInt(visitorCount, 10)), os.ModePerm)
if err != nil {
return err
}
visitorCountMutated = false
}
return nil
}
func incrementVisitorCount() {
defer visitorCountMutex.Unlock()
visitorCountMutex.Lock()
visitorCount++
visitorCountMutated = true
}
func init() {
tasks.Set(time.Minute*10, true, func(id *tasks.Identifier) {
if err := writeVisitorCountToFile(); err != nil {
panic(err)
}
})
}
func main() {
select {}
}
The visitorCount variable simply contains a running total of the number of visitors. The visitorCountMutex variable guards access to it, so that only one goroutine at a time can read or modify it.
The visitorCountMutated boolean variable tells us whether visitorCount has been modified since we last checked it; we set it to true initially so that the initial value will be saved to a file, just as the updated value will be whenever it gets mutated.
You can see that the writeVisitorCountToFile function is set to run once every ten minutes, as well as initially when the program first starts.
If the value of visitorCount has changed since the writeVisitorCountToFile function last ran, the number will be encoded as a UTF-8 string and written to a text file.
The incrementVisitorCount function isn’t called here, but it’s included to show that we would only modify the visitorCount variable through the use of this function, in order to ensure that the mutex is locked, but also to ensure that the visitorCountMutated variable gets set to true, if it isn’t already, so that we have a reliable way of knowing when to write the updated value to our file system.
We didn’t do it in the code example above, but we could have also handled the syscall.SIGTERM signal, ensuring that the writeVisitorCountToFile function would get called before the program exits, so that the most recent value of visitorCount should always be written to the text file, even if the program comes to a sudden end.
Pinging a Web Server to Test if It’s Online
If you’re running a website, you want it to be accessible to as many people as possible for as long as possible. In other words, you want to minimize downtime.
One recent estimate suggested that just one minute of downtime at Amazon, the world’s biggest online retailer, could cost the company almost $70,000. That’s a lot of money to lose, so executives and technicians will definitely want to know as soon as possible if the site goes down!
The example below shows how to run a task to check whether a web server can be accessed or not. I’ve set it to check whether this website — golangprojectstructure.com — is currently accessible:
package main
import (
"fmt"
"net"
"net/http"
"time"
tasks "github.com/theTardigrade/golang-tasks"
)
func pingWebServer(domainName string) error {
url := "https://" + domainName
req, err := http.NewRequest("GET", url, nil)
if err != nil {
return err
}
timeout := 10 * time.Second
transport := http.Transport{
// Transport.Dial is deprecated, so we set DialContext instead.
DialContext: (&net.Dialer{Timeout: timeout}).DialContext,
}
client := http.Client{
Transport: &transport,
Timeout: timeout,
}
resp, err := client.Do(req)
if err != nil {
return err
}
// Close the body so that the underlying connection is released.
defer resp.Body.Close()
if resp.StatusCode != 200 {
return fmt.Errorf("unexpected status code [%d]", resp.StatusCode)
}
return nil
}
func init() {
tasks.Set(time.Second*2+time.Millisecond*500, true, func(id *tasks.Identifier) {
if err := pingWebServer("golangprojectstructure.com"); err == nil {
return
}
id.Stop()
go func() {
// do something here to attempt to restore the web server
id.Unstop()
}()
})
}
func main() {
select {}
}
The pingWebServer function takes a domainName string that is used to create a URL used in the HTTP request. Most of the code within that function is boilerplate to create an http.Client that will time out if it doesn’t receive a response within ten seconds.
If the HTTP response status code does not equal 200 (which signifies success), an error is created with the unexpected status code and returned. Otherwise, the function returns nil.
So the task handler will return early if the site is accessible and the pingWebServer function returns no error. However, if it doesn’t appear to be accessible, we call the id.Stop method, which will ensure that the task doesn’t run again, since there’s no point checking that the web server is down if we already know that it is.
Then we start a new goroutine in order to try to fix the problem with the server, perhaps running various commands to restart and reset it, or simply sending an automated email to a technician, who can look into the problem. However, I’ve just included a comment where that code would go, since it would require different solutions in different situations.
When the server is fixed and the website is back online again, we then call the id.Unstop method, which does the opposite of the id.Stop method, precisely as the name suggests. In other words, the task will no longer be paused, and it will continue to check for downtime at regular intervals.
When we first call the tasks.Set function, our task is set to run every two and a half seconds (there are 1,000 milliseconds in a second, so adding 500 milliseconds onto 2 seconds gives us 2.5 seconds). The handler also runs initially as soon as the task is set, as in some of the earlier examples.
Minifying Source Code Whenever It Is Modified
One useful task that computers can perform is converting a file from one format to another or otherwise changing a file’s contents in some way.
If we’re working on a programming project, it can be useful to have a small program running in the background that automatically minifies scripts or stylesheets or markup, doing things such as removing whitespaces and shortening variable names.
In the example below, we are watching a single CSS file. As soon as it gets modified (for example, if we add a new property and save the file), the program will notice that the file has changed and automatically minify it, so we don’t have to worry about calling a minification program manually:
package main
import (
"log"
"os"
"time"
"github.com/tdewolff/minify"
"github.com/tdewolff/minify/css"
tasks "github.com/theTardigrade/golang-tasks"
)
const (
watchedFileName = "main.css"
watchedFileMinifiedName = "main.min.css"
)
var (
minifier *minify.M
watchedFileLastModTime time.Time
)
func WatchFile() error {
fileInfo, err := os.Stat(watchedFileName)
if err != nil {
return err
}
if watchedFileLastModTime.IsZero() {
watchedFileLastModTime = fileInfo.ModTime()
return nil
}
if fileInfo.ModTime().After(watchedFileLastModTime) {
fileContent, err := os.ReadFile(watchedFileName)
if err != nil {
return err
}
fileContent, err = minifier.Bytes("text/css", fileContent)
if err != nil {
return err
}
err = os.WriteFile(watchedFileMinifiedName, fileContent, fileInfo.Mode())
if err != nil {
return err
}
log.Println("WATCHED FILE HAS BEEN MINIFIED")
watchedFileLastModTime = fileInfo.ModTime()
}
return nil
}
func init() {
minifier = minify.New()
minifier.AddFunc("text/css", css.Minify)
tasks.Set(time.Second*2, true, func(id *tasks.Identifier) {
if err := WatchFile(); err != nil {
panic(err)
}
})
}
func main() {
select {}
}
Within the init function, we first set up the minifier variable, which relies on an external package, so that it’s ready to handle CSS files. Then we use the tasks.Set function to register our task handler, which calls the WatchFile function every two seconds.
The WatchFile function will also run when the program first starts, at which point the watchedFileLastModTime variable will be set to its zero value. So we use this opportunity simply to update this variable to hold the time that the file we’re watching ("main.css") was last modified.
Whenever we run the WatchFile function again, we will now be able to use the watchedFileLastModTime variable to check whether our file has been modified since we last checked. If it hasn’t, we simply return from the function. Otherwise, we know that we will have to minify the updated file, since its contents have changed.
(It’s worth mentioning that it is possible that the file’s modification time has changed without the file’s contents having been altered, but that’s unlikely, so we can simply assume that whenever the time has changed, the contents have also changed. Even if we’re wrong in this assumption, it just means that we’ll minify the file slightly more often than strictly necessary, which won’t pose a major problem.)
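If you did want to rule out that case, one option is to compare a hash of the file’s contents instead of its modification time. The sketch below uses crypto/sha256 from the standard library; contentWatcher is a hypothetical type, and it differs from WatchFile in that the first call always reports a change. The cost is that the whole file has to be read on every check.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// contentWatcher is a hypothetical helper that remembers a hash of the
// content it was shown last.
type contentWatcher struct {
	seen    bool
	lastSum [sha256.Size]byte
}

// Changed reports whether content differs from what was passed on the
// previous call. The first call always reports true.
func (w *contentWatcher) Changed(content []byte) bool {
	sum := sha256.Sum256(content)
	if w.seen && sum == w.lastSum {
		return false
	}

	w.seen, w.lastSum = true, sum
	return true
}

func main() {
	var w contentWatcher

	fmt.Println(w.Changed([]byte("a { color: red }")))  // true: first sighting
	fmt.Println(w.Changed([]byte("a { color: red }")))  // false: same bytes
	fmt.Println(w.Changed([]byte("a { color: blue }"))) // true: content differs
}
```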
We use the os.ReadFile and os.WriteFile functions to read our CSS file and save the minified version, since these functions abstract away a lot of the unnecessary complexity in file handling (such as declaring buffers and iterating through a read cycle).
Whenever we minify our file, we have to remember to update the watchedFileLastModTime variable with the file’s current modification time, so that we don’t perform the same minification multiple times in a row.
You could even apply the principle shown in the example above to your Go programming: you could write a Go program that watches all of the files within certain directories where you keep your Go code and it could automatically compile the code for you by calling the go build or go install commands when it notices that one of the files has been modified.
Clearing Old Data From a Cache
In this last example, we’re going to look at how a task can be set that clears data from a cache after a certain amount of time has expired:
package main
import (
"sync"
"time"
tasks "github.com/theTardigrade/golang-tasks"
)
var (
cache sync.Map
)
const (
cacheSetTimeMax = time.Hour * 8
)
type cacheDatum struct {
SetTime time.Time
Information []byte
}
func storeDatumInCache(key string, information []byte) {
cache.Store(key, &cacheDatum{
SetTime: time.Now(),
Information: information,
})
}
func clearOldDataFromCache() (mutated bool) {
cache.Range(func(key, value interface{}) bool {
if datum, ok := value.(*cacheDatum); ok {
if time.Since(datum.SetTime) > cacheSetTimeMax {
cache.Delete(key)
mutated = true
}
}
return true
})
return
}
func init() {
tasks.Set(time.Hour, false, func(id *tasks.Identifier) {
var nextInterval time.Duration
if mutated := clearOldDataFromCache(); mutated {
nextInterval = time.Minute * 15
} else {
nextInterval = time.Hour
}
if id.Interval() != nextInterval {
id.ChangeInterval(nextInterval)
}
})
}
func main() {
storeDatumInCache("first-datum", []byte("this is just some example information"))
storeDatumInCache("second-datum", []byte("here is some more example information"))
select {}
}
We use a sync.Map type as our cache, which is just like a native map, except it can handle concurrent reads and writes.
We also declare a cacheDatum type, which will hold the data that we store in our cache. The Information field is just a byte slice, so it could hold arbitrary data of any length. The SetTime field will hold the time that the datum was stored in the cache, so that we can calculate at repeated intervals how long it has been since the datum was initially set.
The cacheSetTimeMax constant holds the maximum duration that we want any datum to be kept for. If it’s been in the cache for longer than this, we will assume that it has expired and remove it, creating space for other data to be stored in its place.
The storeDatumInCache function is used, as the name suggests, to set a cacheDatum struct in the cache. The SetTime field is automatically populated with the current time and date, whereas the content of the Information field is provided as an argument to the function. There is also a string-typed key that can be used to identify the datum in the cache, if necessary.
The clearOldDataFromCache function, which is called in the task handler, simply iterates through all of the data that’s been set using the cache.Range method. If the duration between when a datum was initially set and the current time is more than cacheSetTimeMax, we delete that datum from the cache. We also set the return value of our function to true, to show that we’ve mutated the cache by deleting one or more entries.
We always return true in the callback that we pass to the cache.Range method. If we had returned false, it would mean that we wanted to stop iterating, but we do not do that, since we want to make sure that we go through all the data, checking whether any of the entries has expired.
Within the handler that we pass to the tasks.Set function, we modify the interval between calls of the current task by calling the id.ChangeInterval method. Note that we first check id.Interval, which returns the current interval duration, so that we only modify it if necessary.
If one or more items have been removed from the cache, the mutated variable, which stores the return value of the clearOldDataFromCache function, will be true. In that case, we only wait fifteen minutes before calling the handler again. On the other hand, if we haven’t found any data that needed to be removed from the cache, we assume that it’s safe to wait longer, so we set the interval to an hour.
We can see in the main function how we call the storeDatumInCache function two times, setting some example data in the cache. These will, of course, be removed from the cache, freeing up memory, if we keep the program running for more than eight hours.