Golang Project Structure

Tutorials, tips and tricks for writing and structuring code in Go (with additional content for other programming languages)

The Best Golang Task Runner

Okay, I’m biased — it’s my task runner — but hear me out.

This post will start by discussing what a task runner actually is and exploring how it works by building a very basic version of one in Go code. Then we’ll look at some example tasks that can be completed with the help of my own task-runner package.

What Is a Task Runner?

Computers are great at performing repetitive tasks that would be tedious — or impossible — for humans to do.

A task runner is a simple piece of software that helps to run these tasks at certain intervals (say, once a day or every hundred milliseconds).

This can either be a standalone program that runs quietly in the background (i.e. a daemon), or it can be code that is built into a larger codebase in order to add task-running functionality to a monolithic application.

How Does a Simple Task Runner Work?

Let’s look at some Go code, to see how an extremely simple task runner could be created:

package main

import (
	"fmt"
	"sync"
	"time"
)

var (
	taskFunctions      []func()
	taskFunctionsMutex sync.Mutex
)

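// AddTask registers a handler to be called on every run of the tasks.
func AddTask(function func()) {
	taskFunctionsMutex.Lock()
	defer taskFunctionsMutex.Unlock()

	taskFunctions = append(taskFunctions, function)
}

// RunTasks calls each registered handler in turn. The lock is held for the
// whole loop, so the slice can't be modified while we iterate over it.
func RunTasks() {
	taskFunctionsMutex.Lock()
	defer taskFunctionsMutex.Unlock()

	for _, t := range taskFunctions {
		t()
	}
}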

func init() {
	AddTask(func() {
		fmt.Println("ONE MINUTE HAS PASSED")
	})

	go func() {
		for {
			time.Sleep(time.Minute)
			RunTasks()
		}
	}()
}

func main() {
	select {}
}

We have declared the taskFunctions slice to store the handlers that will run whenever we want to complete the tasks. This variable is accessed only while the taskFunctionsMutex is locked, so that there’s no risk of different goroutines trying to access the same memory at once.

The AddTask function appends a handler function to the taskFunctions slice, so that it can later be called automatically.

The job of the RunTasks function is simply to iterate through each of the handlers in the taskFunctions slice and call them in turn.

In this example, we’ve just added a single task within the init function, which prints a message to the console. The main work of handling the tasks is done in the anonymous goroutine that is also started in the init function: it contains an infinite loop that waits for a minute, calls the RunTasks function and then continues on to the next iteration.

This ensures that the tasks will be run roughly once a minute. (Note that the gap between runs is really one minute plus however long the tasks take to return, so it could be much more than a minute if our tasks were slow. That is very unlikely to apply in this case, since our single task isn’t very processor-intensive.)

Finally, we have used an empty select statement in the main function. This is simply to keep the program running indefinitely, so that you can use it as a daemon in the background that just performs a repetitive sequence of one or more tasks.

What’s So Great About My Task Runner?

My task runner is a Go package that you can import into your code whenever you need some way to perform useful but repetitive work.

If we use the package, we can easily reproduce the functionality shown in the previous example, without having to write the task-handling code ourselves (note that my package also includes other features to improve performance, efficiency and reliability):

package main

import (
	"fmt"
	"time"

	tasks "github.com/theTardigrade/golang-tasks"
)

func init() {
	tasks.Set(time.Minute, false, func(id *tasks.Identifier) {
		fmt.Println("ONE MINUTE HAS PASSED")
	})
}

func main() {
	select {}
}

First of all, look at how I import my package: while the last segment of the GitHub URL is "golang-tasks" (in order to distinguish it from future task-management packages I may write in other languages), the package name used within the code is simply "tasks", which is why I use that single word to access it.

The first argument of the tasks.Set function is the interval at which the task will run: in this case, once every minute.

The second argument determines whether the task will run as soon as it’s set. So in this case the handler won’t be called until the first minute has passed, but if the second argument had been set to true, then the handler would have been called immediately and then called again in a minute’s time. In either case, the handler function will be called once every minute after it initially runs.

Finally, the third argument of the tasks.Set function is the handler function itself, which will get called at the registered interval. The id argument can be used to modify the operation of the task, but we’ll look at that in more detail when we see some examples later.

We use an empty select statement in the main function again to stop the program from exiting. It’s also possible, however, that you would want to run tasks within another long-running monolithic program like a web server. In that case, the server itself would stop the program from exiting and the tasks could simply run in the background.

Deleting Temporary Files

Many operating systems have a directory where temporary files can be stored. However, the OS often does not delete them when they’re no longer needed.

Removing old files from a computer’s hard drive is like cleaning a house. It can be a tedious task, but it should be performed regularly.

Yet there’s no need to worry, because it’s possible to create a simple Go program that will run this task:

package main

import (
	"os"
	"path/filepath"
	"time"

	tasks "github.com/theTardigrade/golang-tasks"
)

func RemoveTempFiles() error {
	contents, err := filepath.Glob(filepath.Join(os.TempDir(), "*"))
	if err != nil {
		return err
	}

	for _, tempFilePath := range contents {
		err = os.RemoveAll(tempFilePath)
		if err != nil {
			return err
		}
	}

	return nil
}

func init() {
	tasks.Set(time.Hour, true, func(id *tasks.Identifier) {
		if err := RemoveTempFiles(); err != nil {
			panic(err)
		}
	})
}

func main() {
	select {}
}

The RemoveTempFiles function performs the file-removal work and it is simply called from the task handler. Creating a separate function helps to organize our code, but we could equally have performed the work within the task handler itself.

The RemoveTempFiles function makes use of the filepath.Glob function from the Go standard library: this returns a slice containing the paths of all the files that match the given pattern (using the syntax of filepath.Match, which resembles the familiar Unix shell wildcards). Since our pattern is an asterisk inside the default temporary directory returned by os.TempDir, we will match every file and subdirectory directly within that directory.

We can then simply iterate over all of the paths and delete each one using the os.RemoveAll function, which removes a directory along with everything inside it, unlike the os.Remove function, which can only remove files or empty directories.

In other words, os.RemoveAll works much like the Unix command rm -rf (it also returns no error if the path doesn’t exist), whereas os.Remove is closer to plain rm, or rmdir in the case of an empty directory.

Also note that we have set the second argument of the tasks.Set function to true, which means that we want the task to run initially before waiting for an hour to run again. This is useful, because there may be some temporary files already waiting to be removed when the program first starts up.

Checking Whether a Variable Has Been Mutated

The example below counts the number of users who have visited a certain website or online application.

But it doesn’t matter exactly what is being counted, since we’re just looking at the general logic of mutating a variable and keeping track of the mutation, and the code could be adapted to a wide range of real-world situations.

package main

import (
	"os"
	"strconv"
	"sync"
	"time"

	tasks "github.com/theTardigrade/golang-tasks"
)

var (
	visitorCount        int64
	visitorCountMutated = true
	visitorCountMutex   sync.Mutex
)

func writeVisitorCountToFile() error {
	visitorCountMutex.Lock()
	defer visitorCountMutex.Unlock()

	// Only touch the file system if the count has changed since the last write.
	if visitorCountMutated {
		data := []byte(strconv.FormatInt(visitorCount, 10))

		// 0644: readable by everyone, writable only by the owner.
		err := os.WriteFile("visitor-count.txt", data, 0o644)
		if err != nil {
			return err
		}

		visitorCountMutated = false
	}

	return nil
}

func incrementVisitorCount() {
	visitorCountMutex.Lock()
	defer visitorCountMutex.Unlock()

	visitorCount++
	visitorCountMutated = true
}

func init() {
	tasks.Set(time.Minute*10, true, func(id *tasks.Identifier) {
		if err := writeVisitorCountToFile(); err != nil {
			panic(err)
		}
	})
}

func main() {
	select {}
}

The visitorCount variable simply holds a running total of the number of visitors. The visitorCountMutex variable guards access to it, so that only one goroutine at a time can read or modify it.

The visitorCountMutated boolean variable tells us whether it has been modified since we last checked it; we set it to true initially so that the initial value will be saved to a file, just as the updated value will be whenever it gets mutated.

You can see that the writeVisitorCountToFile function is set to run once every ten minutes, as well as initially when the program first starts.

If the value of visitorCount has changed since the writeVisitorCountToFile function last ran, the number will be encoded as a UTF-8 string and written to a text file.

The incrementVisitorCount function isn’t called here, but it’s included to show that we would only ever modify the visitorCount variable through this function. That guarantees the mutex is locked during the update, and it also sets visitorCountMutated to true, so that we always have a reliable way of knowing when the updated value needs to be written to the file system.

We didn’t do it in the code example above, but we could also have handled the syscall.SIGTERM signal, calling writeVisitorCountToFile before the program exits, so that the most recent value of visitorCount gets written to the text file even if the program is asked to shut down between scheduled runs. (No handler can run if the process is killed with SIGKILL, however.)

Pinging a Web Server to Test if It’s Online

If you’re running a website, you want it to be accessible to as many people as possible for as long as possible. In other words, you want to minimize downtime.

One recent estimate suggested that just one minute of downtime at Amazon, the world’s biggest online retailer, could cost the company almost $70,000. That’s a lot of money to lose, so executives and technicians will definitely want to know as soon as possible if the site goes down!

The example below shows how to run a task to check whether a web server can be accessed or not. I’ve set it to check whether this website — golangprojectstructure.com — is currently accessible:

package main

import (
	"fmt"
	"net"
	"net/http"
	"time"

	tasks "github.com/theTardigrade/golang-tasks"
)

func pingWebServer(domainName string) error {
	url := "https://" + domainName

	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return err
	}

	const timeout = 10 * time.Second

	client := http.Client{
		Transport: &http.Transport{
			// Give up if a TCP connection can't be established in time.
			DialContext: (&net.Dialer{Timeout: timeout}).DialContext,
			// Use a fresh connection for each check rather than keeping idle ones open.
			DisableKeepAlives: true,
		},
		// Give up if the whole request, including reading the response, takes too long.
		Timeout: timeout,
	}

	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	// Always close the body so that the connection's resources are released.
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status code [%d]", resp.StatusCode)
	}

	return nil
}

func init() {
	tasks.Set(time.Second*2+time.Millisecond*500, true, func(id *tasks.Identifier) {
		if err := pingWebServer("golangprojectstructure.com"); err == nil {
			return
		}

		id.Stop()

		go func() {
			// do something here to attempt to restore the web server

			id.Unstop()
		}()
	})
}

func main() {
	select {}
}

The pingWebServer function takes a domainName string that is used to build the URL for the HTTP request. Most of the code within that function is boilerplate to create an http.Client that will time out if it doesn’t receive a response within ten seconds.

If the HTTP response status code does not equal 200 (which signifies success), an error is created with the unexpected status code and returned. Otherwise, the function returns nil.

So the task handler will return early if the site is accessible and the pingWebServer function returns no error. However, if it doesn’t appear to be accessible, we call the id.Stop method, which pauses the task so that it won’t run again for the time being, since there’s no point checking whether the web server is down if we already know that it is.

Then we start a new goroutine in order to try and fix the problem with the server, perhaps running various commands to restart and reset it, or simply sending an automated email to a technician, who can look into the problem. However, I’ve just included a comment where that code would go, since it would require different solutions in different situations.

When the server is fixed and the website is back online again, we then call the id.Unstop method, which does the opposite of the id.Stop method — precisely as the name suggests. In other words, the task will no longer be paused, and it will continue to check for downtime at regular intervals.

When we first call the tasks.Set function, our task is set to run every two and a half seconds (there are 1,000 milliseconds in a second, so adding 500 milliseconds onto 2 seconds gives us 2.5 seconds). The handler also runs initially as soon as the task is set, as seen in the other examples.

Minifying Source Code Whenever It Is Modified

One useful task that computers can perform is converting a file from one format to another or otherwise changing a file’s contents in some way.

If we’re working on a programming project, it can be useful to have a small program running in the background that automatically minifies scripts, stylesheets or markup, doing things such as removing whitespace and shortening variable names.

In the example below, we are watching a single CSS file. As soon as it gets modified (for example, if we add a new property and save the file), the program will notice that the file has changed and automatically minify it, so we don’t have to worry about calling a minification program manually:

package main

import (
	"log"
	"os"
	"time"

	"github.com/tdewolff/minify"
	"github.com/tdewolff/minify/css"
	tasks "github.com/theTardigrade/golang-tasks"
)

const (
	watchedFileName         = "main.css"
	watchedFileMinifiedName = "main.min.css"
)

var (
	minifier               *minify.M
	watchedFileLastModTime time.Time
)

func WatchFile() error {
	fileInfo, err := os.Stat(watchedFileName)
	if err != nil {
		return err
	}

	if watchedFileLastModTime.IsZero() {
		watchedFileLastModTime = fileInfo.ModTime()
		return nil
	}

	if fileInfo.ModTime().After(watchedFileLastModTime) {
		fileContent, err := os.ReadFile(watchedFileName)
		if err != nil {
			return err
		}

		fileContent, err = minifier.Bytes("text/css", fileContent)
		if err != nil {
			return err
		}

		err = os.WriteFile(watchedFileMinifiedName, fileContent, fileInfo.Mode())
		if err != nil {
			return err
		}

		log.Println("WATCHED FILE HAS BEEN MINIFIED")

		watchedFileLastModTime = fileInfo.ModTime()
	}

	return nil
}

func init() {
	minifier = minify.New()
	minifier.AddFunc("text/css", css.Minify)

	tasks.Set(time.Second*2, true, func(id *tasks.Identifier) {
		if err := WatchFile(); err != nil {
			panic(err)
		}
	})
}

func main() {
	select {}
}

Within the init function, we first set up the minifier variable, which relies on an external package, so that it’s ready to handle CSS files. Then we use the tasks.Set function to register our task handler, which calls the WatchFile function every two seconds.

The WatchFile function will also run when the program first starts, at which point the watchedFileLastModTime variable will be set to its zero value. So we use this opportunity simply to update this variable to hold the time that the file we’re watching ("main.css") was last modified.

Whenever we run the WatchFile function again, we will now be able to use the watchedFileLastModTime variable to check whether our file has been modified since we last checked. If it hasn’t, we simply return from the function. Otherwise, we know that we will have to minify the updated file, since its contents have changed.

(It’s worth mentioning that it is possible that the file’s modification time has changed without the file’s contents having been altered, but that’s unlikely, so we can simply assume that whenever the time has changed, the contents have also changed. Even if we’re wrong in this assumption, it just means that we’ll minify the file slightly more often than strictly necessary, which won’t pose a major problem.)
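If the extra work ever did matter (say, if minification were expensive), one option is to compare a hash of the file’s contents as well as its modification time. In the sketch below, contentTracker is a hypothetical helper built on crypto/sha256 from the standard library; it isn’t part of the original program.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// contentTracker remembers a hash of the last content it saw, so callers can
// skip work when a file's modification time changed but its bytes did not.
type contentTracker struct {
	lastSum [sha256.Size]byte
	seen    bool
}

// changed reports whether data differs from the content passed on the
// previous call, and records data as the new baseline.
func (t *contentTracker) changed(data []byte) bool {
	sum := sha256.Sum256(data)
	if t.seen && sum == t.lastSum {
		return false
	}
	t.lastSum = sum
	t.seen = true
	return true
}

func main() {
	var tracker contentTracker

	fmt.Println(tracker.changed([]byte("a { color: red; }")))  // true (first content seen)
	fmt.Println(tracker.changed([]byte("a { color: red; }")))  // false (same bytes)
	fmt.Println(tracker.changed([]byte("a { color: blue; }"))) // true
}
```

Inside WatchFile, you could call tracker.changed(fileContent) right after os.ReadFile and, if it returns false, just update watchedFileLastModTime and return without minifying.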

We use the os.ReadFile and os.WriteFile functions to read our CSS file and save the minified version, since these functions abstract away a lot of the unnecessary complexity in file handling (such as declaring buffers and iterating through a read cycle).

Whenever we minify our file, we have to remember to update the watchedFileLastModTime variable with the file’s current modification time, so that we don’t perform the same minification repeatedly.

You could even apply the principle shown in the example above to your Go programming: a Go program could watch all of the files within the directories where you keep your Go code and automatically compile it for you, by calling the go build or go install command, whenever it notices that one of the files has been modified.

Clearing Old Data From a Cache

In this last example, we’re going to look at how a task can be set that clears data from a cache after a certain amount of time has expired:

package main

import (
	"sync"
	"time"

	tasks "github.com/theTardigrade/golang-tasks"
)

var (
	cache sync.Map
)

const (
	cacheSetTimeMax = time.Hour * 8
)

type cacheDatum struct {
	SetTime     time.Time
	Information []byte
}

func storeDatumInCache(key string, information []byte) {
	cache.Store(key, &cacheDatum{
		SetTime:     time.Now(),
		Information: information,
	})
}

func clearOldDataFromCache() (mutated bool) {
	cache.Range(func(key, value interface{}) bool {
		if datum, ok := value.(*cacheDatum); ok {
			if time.Since(datum.SetTime) > cacheSetTimeMax {
				cache.Delete(key)
				mutated = true
			}
		}

		return true
	})

	return
}

func init() {
	tasks.Set(time.Hour, false, func(id *tasks.Identifier) {
		var nextInterval time.Duration

		if mutated := clearOldDataFromCache(); mutated {
			nextInterval = time.Minute * 15
		} else {
			nextInterval = time.Hour
		}

		if id.Interval() != nextInterval {
			id.ChangeInterval(nextInterval)
		}
	})
}

func main() {
	storeDatumInCache("first-datum", []byte("this is just some example information"))
	storeDatumInCache("second-datum", []byte("here is some more example information"))

	select {}
}

We use a sync.Map as our cache, which works much like a native map, except that it is safe for concurrent use by multiple goroutines without any extra locking.

We also declare a cacheDatum type, which will hold the data that we store in our cache. The Information field is just a byte slice, so it could hold arbitrary data of any length. The SetTime field will hold the time that the datum was stored in the cache, so that we can calculate at repeated intervals how long it has been since the datum was initially set.

The cacheSetTimeMax constant holds the maximum duration that we want any datum to remain in the cache. If it’s been in the cache for longer than this, we will assume that it has expired and remove it, creating space for other data to be stored in its place.

The storeDatumInCache function is used, as the name suggests, to store a cacheDatum struct in the cache. The SetTime field is automatically populated with the current date and time, whereas the content of the Information field is provided as an argument to the function. There is also a string-typed key that can be used to identify the datum in the cache, if necessary.

The clearOldDataFromCache function, which is called in the task handler, simply iterates through all of the data that’s been set using the cache.Range method. If the duration between when a datum was initially set and the current time is more than cacheSetTimeMax, we delete that datum from the cache. We also set the return value of our function to true, to show that we’ve mutated the cache by deleting one or more entries.

We always return true in the callback that we pass to the cache.Range method. If we had returned false, it would mean that we wanted to stop iterating, but we do not do that, since we want to make sure that we go through all the data, checking whether any of the entries has expired.

Within the handler that we pass to the tasks.Set function, we modify the interval between calls of the current task by calling the id.ChangeInterval method. Note that we first check id.Interval, which returns the current interval duration, so that we only change it when necessary.

If one or more items have been removed from the cache, the mutated variable, which stores the return value of the clearOldDataFromCache function, will be true. In that case, we only wait fifteen minutes before calling the handler again. On the other hand, if we haven’t found any data that needed to be removed from the cache, there’s less reason to check frequently, so we set the interval to an hour.

We can see in the main function how we call the storeDatumInCache function twice, setting some example data in the cache. If we keep the program running for more than eight hours, these entries will, of course, be removed at the first check after they expire, freeing up memory.
