A Super Speed-up With Binary Search

Saturday, 22 January 2022, 17:31 PM

Language

unknown

by James (Golang Project Structure Admin)

In any programming language, there are two main algorithms used for finding a given value in an array: linear search and binary search.

(Just to note: in Go, we generally use slices rather than arrays, but I’ll use those two terms interchangeably in this post, just for the sake of simplicity, since everything that I discuss here will apply to either one of the two data structures.)

Inside the machine, arrays are stored within RAM, on circuit boards like the ones shown above. The more memory your computer has installed, the larger the size of the arrays that you’ll be able to create.

I’m going to compare the two algorithms, discussing how they’re implemented, and see why one — in most cases — performs so much better than the other.

Table of Contents

Why Is Linear Search So Slow?

The most obvious way to search for a value in an array is simply to iterate through every index, comparing the value at it to the target and stopping if there’s a match. This is what we mean by linear search, because we can think about it conceptually as though we’re moving through each of the elements in a straight line, from start to finish.

package main

import "fmt"

func main() {
	numbers := [...]int{5, 3, 99, -4, 2, -1, 9, 0, 6}
	const numberToFind = 0

	for i, n := range numbers {
		if n == numberToFind {
			fmt.Printf("Found number %d at index %d.\n", n, i)
			break
		}
	}
}

This is a very simple solution, but it’s not particularly efficient. In the worst-case scenario, we have to search the entire array to find our target. If it is actually located in the array but is at the very end, we’ll have to pass though n elements, where n is the length of the array, in order to discover this. We’d also have to travel the same distance if we found out that our target doesn’t, in fact, exist in the area.

In the best-case scenario, however, we’ll only ever need to look at a single element, since it could be that our target is the very first thing that we look at. It’s important to understand that this is the same for any searching algorithm. We could be lucky and find what we’re looking for straight away. But we shouldn’t rely on luck, because it isn’t always on our side.

The average number of elements that we’ll search to find our item, assuming that it is actually present, can, therefore, be calculated by taking the mean of the best-case and the word-case scenarios: in other words, we can use the simple formula (n+1) / 2. So if we have an array with 596 elements, we can expect to visit 299 elements before finding our target (the actual solution to the calculation is 298.5, but I have rounded it up).

Introducing Binary Search

But what if we don’t actually have to visit each element in the array in order to find the one that we’re actually looking for? So long as our array is already sorted (so the lower numbers come first and the higher numbers later), we can take advantage of this fact to skip large parts of the array that we know will not contain our target. This is what we do in the function below:

package main

import (
	"fmt"
	"sort"
)

func binarySearch(numbers []int, leftBound, rightBound, numberToFind int) int {
	if rightBound >= leftBound {
		midPoint := leftBound + (rightBound-leftBound)/2

		if numbers[midPoint] == numberToFind {
			return midPoint
		}

		if numbers[midPoint] > numberToFind {
			return binarySearch(numbers, leftBound, midPoint-1, numberToFind)
		}

		return binarySearch(numbers, midPoint+1, rightBound, numberToFind)
	}

	return -1
}

func main() {
	numbers := []int{5, 3, 99, -4, 2, -1, 9, 0, 6}
	const numberToFind = 0

	sort.Ints(numbers)
	fmt.Println(numbers)

	if i := binarySearch(numbers, 0, len(numbers)-1, numberToFind); i != -1 {
		fmt.Printf("Found number %d at index %d.\n", numberToFind, i)
	}
}

This algorithm is a little more complicated to get your head around at first, but, once you understand how it works, it’s really not too difficult — and its elegance is impressive.

You can see that we start by passing in inclusive bounds as arguments to the function (in other words, the first and last index). Then we calculate the midpoint between the two bounds. If our target element is found at that index, we’re lucky and we simply return the index straightaway.

However, fortune is fleeting, and most of us won’t be so lucky. In that case, we split the search area into two halves, checking the left side if the value is lower than the one at the midpoint (since it can’t logically be in the right side) and checking the right side in the opposite case.

Old Wine in New Bottles

If you haven’t quite got your head around binary search, try thinking about it in terms of something more familiar and tangible: a telephone directory, one of those big books that you used to have out on your desk whenever you wanted to find the contact details for a local plumber or tree surgeon.

All of the businesses in the telephone directory are stored in alphabetically order, so AT&T, the telecoms conglomerate, would be somewhere near the beginning of the book and Zara, the clothes company, would be listed towards the end.

If you open the book somewhere near the middle, you can glance down at the page and get an idea of what the listings are on that page, then you can decide whether what you’re looking for comes earlier or later alphabetically. If it comes earlier, you can grab the left half of the open book and open it again somewhere in the middle, or if it comes later, you can do the same with the right half.

You can repeat this process as many times as necessary until you find the page that contains the entry you’re searching for. All you need to do is divide the portion of the book that is still remaining to be searched into two equal halves and go to the page in the middle of either the left or right half, continuing to move through the book in this way until you find what you need or discover that it isn’t present.

Reading room in the library at Trinity College Dublin. — This library at Trinity College Dublin contains around 200,000 books, most of them containing hundreds of pages each. Just think how long it would take to search through them, if they weren’t well organized.

Once you think of it like that, it’s easy to see how this approach will be much faster than starting on page one of the book and turning every page in sequential order until you find what you’re looking for. The bigger the book, the more time you’ll save by using this method.

We instinctively use an algorithm that’s very similar to binary search when we use books like telephone directories, dictionaries or thesauruses that contain entries in sorted order. We just know that it would take too much time to do it linearly.

Comparing Cards

The logic used in the binary search algorithm can also be seen as similar to that used to play the American TV gameshow Card Sharks (or the British version Play Your Cards Right). We always start at the midpoint, but then we decide whether we should consider the group above or below the current one depending on whether the we think the result is higher or lower than the one at the midpoint.

In the TV show, which mimics a casino game, the contestant is shown a playing card, say a seven of clubs or a queen of diamonds, and then they have to guess whether a randomly selected card that will be revealed afterwards is going to be higher or lower in value than the current one (the ace ranks highest and the deuce, two, ranks lowest). If they get it right, they continue and are able to boost their score, giving them a chance to win prizes, but if they’re wrong, play is passed over to one of the other contestants.

Logo for the Card Sharks TV show, currently broadcast by ABC in the United States.
(The logo is shown here under the fair-use doctrine of copyright law, and no relationship between this website and any organisation involved with the gameshow exists or is implied.)

Of course, this involves less certainty than our algorithm, because the players have no idea what the next card is going to be: they only know that if they’ve drawn a three of clubs, then the probability of the next card being lower is much greater than that of it being higher, because there are many cards that are higher than a three, but there is only one value, the deuce, that is lower. So any sensible contestant would always shout “lower” at the host, giving themselves the best chance of success in moving on to the next round. It only gets tricky when the current card is somewhere in the middle, because the probabilities are more evenly matched, so the game becomes more of a gamble.

The binary search algorithm doesn’t rely on probability in that way, since we always know that the leftmost values are lower and that the rightmost values are higher, since we can assume that any input has already been sorted, but the process of choosing the next move based on whether we want to consider the higher or lower values is similar.

Less Recursive, More Iterative

Now that we’ve thought more about how the algorithm works, let’s return to our implementation of binary search. I’ve written on this blog before about how any recursive algorithm can be converted into an iterative one. Maybe it would be a good idea to do that with our binary search algorithm, so that we don’t waste resources growing the call stack unnecessarily.

The code below performs exactly the same functionality as the previous implementation of our binary search algorithm, but it simply mutates the value of the leftBound and rightBound variables rather than calling itself again with different values:

func binarySearch(numbers []int, leftBound, rightBound, numberToFind int) int {
	for leftBound <= rightBound {
		midPoint := leftBound + (rightBound-leftBound)/2

		if numbers[midPoint] == numberToFind {
			return midPoint
		}

		if numbers[midPoint] > numberToFind {
			rightBound = midPoint - 1
		} else {
			leftBound = midPoint + 1
		}
	}

	return -1
}

There’s nothing too complicated about that, but it’s a minor tweak that will allow us better to compare our linear and binary search algorithms side-by-side, since they now both use iterative code, rather than one of them relying on recursion, which introduces its own costs.

Big O Notation

If you’ve ever studied a formal computer science course, you make have come across Big O notation, which is a mathematical way of describing how the time a function takes to return grows as the size of its input, known as N, grows.

We don’t need to get into the serious mathematics here (in fact, you can skip this section altogether, if you have no interest in Big O notation), but suffice it to say that linear search can be described as an O(N) algorithm — pronounced “O of N” — whereas binary search is an O(log N) algorithm — pronounced “O of log N”.

This means that the time that linear search takes to complete is exactly proportional to the size of its input, N. However, binary search is proportional to the binary logarithm of N, which means that the running time is proportional to the mathematical power that the number 2 must be raised to in order to obtain the value N. That may be pretty complicated, if you’re not used to thinking with numbers, and I’d understand if my explanation so far hasn’t really helped you to get your head around it. So let’s look at a real world example of a function that runs in O(log N) time:

Running Time	Size of Input (N)
1 nanosecond	2 elements
2 nanoseconds	4 elements
3 nanoseconds	8 elements
4 nanoseconds	16 elements
5 nanoseconds	32 elements
6 nanoseconds	64 elements
8 nanoseconds	256 elements
11 nanoseconds	2,048 elements
15 nanoseconds	32,768 elements
20 nanoseconds	1,048,576 elements
40 nanoseconds	1,099,511,627,776 elements

You can see above that the amount of time taken by the function to return a result grows as the number of input elements grows. However, the former grows significantly less slowly than the latter, which means that an algorithm that has a time complexity of O(log N) is extremely efficient at handling large amounts of input data. This means that binary search really excels when you need to search through large arrays or slices.

Checking Our Assumptions by Running a Benchmark

Scientists have played a more prominent role in our lives throughout the recent COVID pandemic, and many of us have become amateur enthusiasts in interpreting scientific data, as we attempt to judge for ourselves what the best course of action may be to keep ourselves and others safe without incurring unnecessary costs.

The scientific method is all about testing our assumptions by performing experiments, and seeing if we can get results that support or challenge our initial beliefs, so we can gain a better picture of what’s going on with the world. So let’s do just that: let’s run a little experiment of our own.

I’ve told you that binary search is almost always faster than linear search, and I’ve provided some reasonable explanations for why this ought to be the case, but why should you take my word for it? Even if you trust me, it’s a good idea to check that what I say actually holds up in reality.

We can write some simple benchmarks that will allow us to test the speed of our two search algorithms, allowing us to compare and contrast the results. To have a fair test, we will, of course, have to pass exactly the same input to each algorithm. I have written some example code below that includes the two functions we can use to perform the searches:

package main

import (
	"fmt"
	"sort"
)

func linearSearch(numbers []int, numberToFind int) int {
	for i, n := range numbers {
		if n == numberToFind {
			return i
		}
	}

	return -1
}

func binarySearch(numbers []int, leftBound, rightBound, numberToFind int) int {
	for leftBound <= rightBound {
		midPoint := leftBound + (rightBound-leftBound)/2

		if numbers[midPoint] == numberToFind {
			return midPoint
		}

		if numbers[midPoint] > numberToFind {
			rightBound = midPoint - 1
		} else {
			leftBound = midPoint + 1
		}
	}

	return -1
}

func main() {
	numbers := []int{5, 3, 99, -4, 2, -1, 9, 0, 6}
	const numberToFind = 0

	sort.Ints(numbers)
	fmt.Println(numbers)

	if i := linearSearch(numbers, numberToFind); i > -1 {
		fmt.Printf("Found number %d at index %d using linear search.\n", numberToFind, i)
	}

	if i := binarySearch(numbers, 0, len(numbers)-1, numberToFind); i > -1 {
		fmt.Printf("Found number %d at index %d using binary search.\n", numberToFind, i)
	}
}

If the code in the previous example is stored in a file named main.go, you can create a file named main_test.go to store your benchmark code. If you have not already done so, you will also need to run go mod init, in order to create a module in the current directory, before you can run the benchmarks.

The benchmark functions below follow a set format recognized by the Go tool. The b.N variable is automatically updated to hold a reasonable number of iterations, so that the benchmarks don’t end too quickly or go on too long.

package main

import "testing"

var numbers = []int{5, 3, 99, -4, 2, -1, 9, 0, 6}

const numberToFind = 0

func BenchmarkBinarySearch(b *testing.B) {
	for i := 0; i < b.N; i++ {
		binarySearch(numbers, 0, len(numbers)-1, numberToFind)
	}
}

func BenchmarkLinearSearch(b *testing.B) {
	for i := 0; i < b.N; i++ {
		linearSearch(numbers, 0, len(numbers)-1, numberToFind)
	}
}

You can run the code by using the following command:

go test -bench=Search$

The bench option takes a regular expression: if it matches part or all of the name of one of the benchmark functions in any of the test files in the current directory, it will run them. We only have two benchmarks functions in a single file, but the command will match them, because BenchmarkBinarySearch and BenchmarkLinearSearch both contain the word Search. The dollar sign ensures that a match is made only when the word Search comes at the end of the function name, as a suffix.

Running the command should give output something that looks like this (although the actual result will depend on the specification and performance of your computer):

goos: windows
goarch: amd64
pkg: github.com/theTardigrade/golangProjectStructureExampleCode/binarySearch
cpu: Intel(R) Core(TM) i7-4600M CPU @ 2.90GHz
BenchmarkBinarySearch-4         189962356                8.060 ns/op
BenchmarkLinearSearch-4         172470762                7.297 ns/op
PASS
ok      github.com/theTardigrade/golangProjectStructureExampleCode/binarySearch 4.282s

Oh, but we have a problem! It seems like linear search is faster than our binary algorithm, which isn’t at all what we were expecting.

Have I been lying to you, overestimating its performance? Well, binary search is almost always faster than linear search, but there are some important exceptions. For example, when the array contains only a small number of elements, the extra logic performed by a binary search slows us down, because a linear search only has to run through the small array, which is a simpler operation.

Because the slice in our benchmarks only contains nine elements and the same slice is reused every time, we’re not getting a true picture of the performance gains that can be achieved by using binary search on much bigger datasets. Let’s do something to fix that.

Making the Benchmarks Use More Realistic Input

In the code below, we still use the same slice for each one of our tests. However, this time it is initialized with a million values, each one a consecutive number, so that the slice is automatically sorted, a necessary precondition of our binary search algorithm.

The more interesting difference here is that the target number we’re looking for is different on each iteration of the benchmark. We randomly select one of the values to be our target. To make our test fairer, we reseed the pseudo-random-number generator with the loop counter, which ensures that the same target will be set on each iteration of both benchmark functions.

package main

import (
	"math/rand"
	"testing"
)

const benchmarkNumbersLen = 1_000_000

var benchmarkNumbers = make([]int, benchmarkNumbersLen)

func init() {
	for i := 0; i < benchmarkNumbersLen; i++ {
		benchmarkNumbers[i] = i
	}
}

func benchmarkNumberToFind(i int) (numberToFind int) {
	rand.Seed(int64(i))

	index := rand.Intn(benchmarkNumbersLen)
	numberToFind = benchmarkNumbers[index]

	return
}

func BenchmarkBinarySearch(b *testing.B) {
	for i := 0; i < b.N; i++ {
		numberToFind := benchmarkNumberToFind(i)

		binarySearch(benchmarkNumbers, 0, benchmarkNumbersLen-1, numberToFind)
	}
}

func BenchmarkLinearSearch(b *testing.B) {
	for i := 0; i < b.N; i++ {
		numberToFind := benchmarkNumberToFind(i)

		linearSearch(benchmarkNumbers, numberToFind)
	}
}

We now get results that are much more like what we had initially expected:

goos: windows
goarch: amd64
pkg: github.com/theTardigrade/golangProjectStructureExampleCode/binarySearch
cpu: Intel(R) Core(TM) i7-4600M CPU @ 2.90GHz
BenchmarkBinarySearch-4           114207             10550 ns/op
BenchmarkLinearSearch-4             3464            354488 ns/op
PASS
ok      github.com/theTardigrade/golangProjectStructureExampleCode/binarySearch 4.282s

The binary algorithm is over thirty times faster than the linear one, now that we've given it a fairer test. That's an impressive performance gain, and it shows how important choosing the right algorithm is, when you need make your code run as fast as possible.

The cost of buying a new processor that can do thirty times the amount of work in the same amount of time as the one that you currently own would be huge, if it's even possible to buy such a supremely speedy processor at all, but there's very little economic cost involved in refactoring your code and simply swapping out a less efficient algorithm for one that's more efficient.

Golang Project Structure

Tips and tricks for writing and structuring Go code

A Super Speed-up With Binary Search

Language

by James (Golang Project Structure Admin)

Why Is Linear Search So Slow?

Introducing Binary Search

Old Wine in New Bottles

Comparing Cards

Less Recursive, More Iterative

Big O Notation

Checking Our Assumptions by Running a Benchmark

Making the Benchmarks Use More Realistic Input

Related

Tags

Leave a Reply Cancel reply

Why Is Linear Search So Slow?

Introducing Binary Search

Old Wine in New Bottles

Comparing Cards

Less Recursive, More Iterative

Big O Notation

Checking Our Assumptions by Running a Benchmark

Making the Benchmarks Use More Realistic Input

Share this:

Related

Tags

Leave a Reply Cancel reply