Discovering Whether One String Contains Another in Go

Sunday, 22 September 2024, 13:46

by James Smith (Golang Project Structure Admin)

When working with strings in Go, one common task is checking whether one string contains another.

While this may seem straightforward, Go offers several methods, each with varying levels of complexity depending on the needs of your application.

Whether you’re performing a case-sensitive substring search, looking for the index of a substring or diving into more advanced pattern matching with regular expressions, Go’s standard library provides a robust set of tools to help you achieve what you want.

Image: five pieces of string arranged in a circular ring, four of which each contain another ring of string. In Go, a string, which is just a sequence of bytes that usually holds UTF-8 text, can contain another string. This means that all of the characters that make up the smaller string are also found, side by side and in the same order, within the larger string.

In this guide, we’ll explore the most efficient ways to discover whether one string contains a substring, handling everything from simple searches to working with multi-byte Unicode characters.

Whether you’re new to Go or just refining your skills, this blog post will give you the knowledge to handle these string searches with confidence.

Using the Contains Function From the Standard Library

The most direct and idiomatic way to check if one string contains another in Go is to use the strings.Contains function.

This function is provided by the standard library’s "strings" package, and it is optimized for the task of determining whether a substring exists within a larger string.

Note that a substring can never be found within a string that is shorter than itself, because the shorter string simply doesn’t have enough bytes to hold it.

Anyway, here’s a practical example that uses the strings.Contains function:

package main

import (
    "fmt"
    "strings"
)

func main() {
    str := "Welcome to the Go programming language"
    substr := "Go"
    
    if strings.Contains(str, substr) {
        fmt.Println("The substring has been found.")
    } else {
        fmt.Println("The substring could not be found.")
    }
}

In the example above, the strings.Contains function checks whether the string "Go" exists within the larger string "Welcome to the Go programming language" — which, as we can see from a quick glance at the two strings, it does.

The function will return true if it finds the substring and false otherwise.

Since strings.Contains compares the underlying bytes of the two strings exactly, without any kind of normalization, it’s a case-sensitive operation.

That means the function will treat "Go" and "go" as different strings. Next we will look at how to perform a case-insensitive search for a substring.

Searching for a Substring Without Case Sensitivity

In many real-world scenarios, case sensitivity can be an issue. For instance, you may want to check if a string contains another string without caring about whether the characters are uppercase or lowercase.

However, the strings.Contains function does not provide a built-in way to perform case-insensitive searches.

Fortunately, Go makes it easy to handle this by providing functions for converting strings to lowercase or uppercase.

To perform a case-insensitive search, you simply need to convert both the main string and the substring to the same case using strings.ToLower or strings.ToUpper before calling the strings.Contains function.

Here’s an example:

package main

import (
    "fmt"
    "strings"
)

func main() {
    str := "Hello, Go programmers!"
    substr := "go"
    
    if strings.Contains(strings.ToLower(str), strings.ToLower(substr)) {
        fmt.Println("The substring was found (with a case-insensitive search).")
    } else {
        fmt.Println("The substring was not found.")
    }
}

In the example above, both the target string and the substring are converted to lowercase using strings.ToLower before passing them into the strings.Contains function. This ensures that the comparison is case-insensitive.

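However, we could equally have converted both the target string and the substring to uppercase. It’s important that both strings are in the same case, but, for ordinary ASCII text, it doesn’t matter what case that is. (A handful of non-ASCII characters don’t map one-to-one between uppercase and lowercase, so the two conversions can occasionally give different results for such text.)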

This technique is effective when you want to perform a search that ignores differences in capitalization, and it is often used in user-input validation where the adoption of letter case may vary widely between different users.

While the solution that we’ve just looked at does introduce an extra step by converting both of the strings to a uniform case, and each conversion may allocate a new string, strings.ToLower and strings.ToUpper are fast, so this approach remains efficient and practical for most applications.

For extremely long strings or when working with much larger numbers of strings, you may want to measure performance, but, in general, this method strikes a good balance between simplicity and functionality. It gets the job done.
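If you need this check in several places, it may be convenient to wrap it in a small helper. The sketch below assumes a hypothetical function name, containsIgnoreCase, which is not part of the standard library, and simply packages up the lowercase technique described above.

```go
package main

import (
	"fmt"
	"strings"
)

// containsIgnoreCase is a hypothetical helper (not a standard-library function).
// It lowercases both arguments and then performs an ordinary substring check.
func containsIgnoreCase(s, substr string) bool {
	return strings.Contains(strings.ToLower(s), strings.ToLower(substr))
}

func main() {
	inputs := []string{"GOLANG rocks", "I like golang", "Rust is nice too"}

	for _, input := range inputs {
		fmt.Printf("%q mentions golang: %t\n", input, containsIgnoreCase(input, "GoLang"))
	}
}
```

Keeping the conversion inside one function means that every caller handles letter case in the same way, which can be helpful when validating user input.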

Finding the Position of a Substring Using the Index Function

Sometimes it’s not enough just to know whether a substring exists in a string; you may also want to know where that substring appears.

And, as luck would have it, the Go standard library provides the strings.Index function for precisely this purpose.

Instead of returning a boolean value as the strings.Contains function does, the strings.Index function returns the position — i.e. the index — of the first occurrence of the substring within the string.

If the substring is found, the function returns the index of the first byte of the substring.

Otherwise, it returns -1 to signify that the substring has not been found. A negative number, of course, cannot be a valid index, which is why -1 is used to signify failure.

Here’s an example to illustrate the function’s use:

package main

import (
    "fmt"
    "strings"
)

func main() {
    str := "The quick brown fox jumps over the lazy dog."
    substr := "fox"
    
    index := strings.Index(str, substr)
    
    if index != -1 {
        fmt.Printf("The substring was found at index %d\n", index)
    } else {
        fmt.Println("The substring was not found.")
    }
}

In this example, the substring "fox" can be found within the larger string, and the index of its first appearance is returned.

If you run this code, it will output the position where "fox" starts within the string, which is index 16 (signifying the 17th character in the string, since the first index of any string is always zero).

Using strings.Index can be particularly useful when you need to manipulate or extract part of a string.

For instance, if you want to slice a string at the location where the substring appears, you can use the index returned by strings.Index to guide that operation.
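As a minimal sketch of that idea, the hypothetical helper below (the name splitAround is an assumption made for illustration) uses the index returned by strings.Index to slice out the text before and after the first occurrence of the substring.

```go
package main

import (
	"fmt"
	"strings"
)

// splitAround is a hypothetical helper that returns the text before and after
// the first occurrence of substr. If substr is missing, it returns s unchanged.
func splitAround(s, substr string) (before, after string, found bool) {
	i := strings.Index(s, substr)
	if i == -1 {
		return s, "", false
	}
	// i is a byte offset, so the text after the match starts at i+len(substr).
	return s[:i], s[i+len(substr):], true
}

func main() {
	before, after, found := splitAround("The quick brown fox jumps over the lazy dog.", "fox")

	fmt.Printf("%q\n", before) // "The quick brown "
	fmt.Printf("%q\n", after)  // " jumps over the lazy dog."
	fmt.Println(found)         // true
}
```

Since Go 1.18, the standard library's strings.Cut function performs this same split, so the helper is mainly useful for seeing how the index is applied.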

One limitation of the strings.Index function, however, is that it only returns the index of the first occurrence.

If the substring appears multiple times within the string, only the first instance is reported.

If you need to find all occurrences of the substring, then you’ll need to loop through the string or use a more advanced method, which we’ll explore next.
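The looping approach can be sketched as follows. The helper name allIndexes is hypothetical, and this version reports non-overlapping occurrences only, which is an assumption that may not suit every use case.

```go
package main

import (
	"fmt"
	"strings"
)

// allIndexes is a hypothetical helper that returns the byte index of every
// non-overlapping occurrence of substr within s.
func allIndexes(s, substr string) []int {
	var result []int
	if substr == "" {
		return result // avoid looping forever on an empty substring
	}

	offset := 0
	for {
		// Search only the part of the string that hasn't been examined yet.
		i := strings.Index(s[offset:], substr)
		if i == -1 {
			return result
		}
		result = append(result, offset+i)
		offset += i + len(substr)
	}
}

func main() {
	fmt.Println(allIndexes("the cat and the hat", "the")) // [0 12]
}
```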

Advanced String Matching with Regular Expressions

In some cases, a simple substring search might not be enough. You may want to search for patterns within a string rather than a specific substring.

For example, you could be interested in finding all email addresses in a piece of text or searching for words that match a particular format.

Regular expressions — which are sometimes known as “regex” — provide a powerful way to perform complex searches in strings, allowing you to match patterns rather than just fixed sequences of characters.

Go’s "regexp" package provides full support for regular expressions.

With regexp.MatchString, you can define a pattern and check if it exists within a string.

Here’s an example:

package main

import (
    "fmt"
    "regexp"
)

func main() {
    str := "My email is contact@example.com and my website is example.com."
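    // Inside a character class, | is a literal pipe character, so [A-Za-z] is used for the final part.
    pattern := `\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b`
    
    // MatchString returns an error if the pattern cannot be compiled.
    matched, err := regexp.MatchString(pattern, str)
    if err != nil {
        fmt.Println("Invalid pattern:", err)
        return
    }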
    
    if matched {
        fmt.Println("The string contains a valid email address!")
    } else {
        fmt.Println("No email address found.")
    }
}

In this example, the regular expression pattern is designed to match email addresses.

The regexp.MatchString function returns true if a match is found and false otherwise.

Of course, regular expressions allow for much more complex searches than simple substring checks.

With regex, you can search for patterns involving character classes, repetitions, optional characters and much more.

However, while regular expressions are powerful, they are also computationally more expensive than using the strings.Contains or strings.Index functions.

Therefore, regular expressions should be used only when you truly need their pattern-matching features, since they can introduce additional overhead, especially for large inputs.
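One way to reduce that overhead when the same pattern is used repeatedly is to compile it once with regexp.MustCompile and reuse the compiled value. The sketch below also uses FindAllString to extract every match rather than just reporting whether one exists. The names emailPattern and findEmails are hypothetical, and the pattern is a simplified approximation of an email address rather than a complete validator.

```go
package main

import (
	"fmt"
	"regexp"
)

// emailPattern is compiled once, when the program starts, and reused afterwards.
// The pattern is a rough approximation of an email address, for illustration only.
var emailPattern = regexp.MustCompile(`\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b`)

// findEmails is a hypothetical helper that returns every substring of text
// that matches emailPattern.
func findEmails(text string) []string {
	return emailPattern.FindAllString(text, -1)
}

func main() {
	text := "Write to alice@example.com or bob@example.org."

	for _, email := range findEmails(text) {
		fmt.Println(email)
	}
}
```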

Unicode and Multi-Byte Characters in Substring Searches

Go’s strings are sequences of bytes that conventionally hold UTF-8 encoded text (string literals in Go source code are always UTF-8). In UTF-8, each Unicode code point (or, in Go terminology, each rune) is encoded using between one and four bytes.

This can become especially important when dealing with non-ASCII characters, such as those used to write Chinese or Japanese, which need more than one byte each in UTF-8.

Go handles these multi-byte characters seamlessly in most string operations, including substring searches.

For example, consider the following string that contains both ASCII and non-ASCII characters:

package main

import (
    "fmt"
    "strings"
)

func main() {
    str := "Hello, 世界"
    substr := "世界"
    
    if strings.Contains(str, substr) {
        fmt.Println("The substring was found.")
    } else {
        fmt.Println("The substring was not found.")
    }
}

In this case, even though "世界" (which is the Chinese word for “world”) consists of multi-byte characters, the strings.Contains function works just as it would with an ASCII-only string.

Because of the way UTF-8 is designed, a byte-for-byte search for a valid UTF-8 substring can never match in the middle of a character, so you don’t need to worry about the underlying byte representation when performing basic operations like substring checks.

However, if you need to perform more complex manipulations, such as slicing a string that contains multi-byte characters, you may need to work with Go’s rune type.

A rune represents a single Unicode code point, allowing you to manipulate multi-byte characters safely.

Here’s an example of converting a string to a slice of runes:

package main

import "fmt"

func main() {
    str := "Hello, 世界"
    runes := []rune(str)
    
    fmt.Println(len(str))      // output: 13 (the number of bytes)
    fmt.Println(len(runes))    // output: 9 (the number of runes)
}

In this example, "世界" consists of two Unicode characters, but each character takes up three bytes in UTF-8 encoding. The seven ASCII characters in "Hello, " take up one byte each, which gives 13 bytes in total but only 9 runes.

When you convert the string into a slice of runes, the correct character count is returned, allowing you to manipulate the string without breaking multi-byte characters.

Learning How to Remove a Substring From a String

Now that we’ve looked at the various ways to find a substring within a larger string, I plan to write another blog post in the near future that will discuss some of the ways that substrings can be removed from a string — or replaced with other text.

UPDATE: The next post in this two-part series, which goes on to explore how to extract substrings, is now available to read here.
