LeetCode Group Anagrams Solution

Continuing some of our algorithm solutions, let’s review a new LeetCode challenge: Group Anagrams Solution

Given an array of strings strs, group the anagrams together. You can return the answer in any order.

Example 1:

Input: strs = [“eat”,”tea”,”tan”,”ate”,”nat”,”bat”]
Output: [[“bat”],[“nat”,”tan”],[“ate”,”eat”,”tea”]]
Explanation:

There is no string in strs that can be rearranged to form "bat".

The strings "nat" and "tan" are anagrams as they can be rearranged to form each other.

The strings "ate", "eat", and "tea" are anagrams as they can be rearranged to form each other.

Example 2:

Input: strs = [“”]
Output: [[“”]]
Example 3:

Input: strs = [“a”]
Output: [[“a”]]
Constraints:

1 <= strs.length <= 10⁴

0 <= strs[i].length <= 100

strs[i] consists of lowercase English letters.

We all faced the “is a valid anagram” where given two strings, we should return if they form an anagram or not. It’s pretty easy, right? But what about grouping all anagrams in a list/array of words? A little more complex. And what about doing it efficiently?

There are several ways of doing this. Perhaps one simple solution would be to sort the characters of each word in the array of words, then we would only have to count repetitions, right? Well, yeah, that’s one way of doing it. Straightforward.

However, sorting each word in the array would cause our solution to be at least O(n * m log m). n being the word count and m the max length of any given word. That’s because for each word we need to sort its characters, and the best sorting algorithms are O(m log m).

Can we do better?

Yes, we can. We’ll make use of one of our best friends: A dictionary (Hash Map).

But how? Well, we need to understand what an anagram is.

What are the requirements for an anagram? Well, two words need to have the same characters in the same amount. So “eat” and “tea” are valid anagrams because each character is present only once. But “bat” and “bath” are not. So we need to come up with a hash based on the character count of each word

def get_string_hash(string: str) -> str:
    letterCount = [0] * 26
  
    for c in string:
        intval = ord(c)
        letterCount[intval - ord('a')] += 1
  
    return "".join(
        [str(val) for val in letterCount]
    )

This simple hash function creates a list with 26 items, corresponding to the 26 characters in the alphabet, and then, taking the ASCII value of each character in the string parameter, we map it to the list and start counting. Finally, we convert the list to a string and return it.

After that, we only need to create another dictionary (hash map), and for each word in the input list, we generate its hash and search of it in our anagrams list. If there is not, we put it and assign an empty to a list, this new list will contain all the words corresponding to that hash. And then, we append the word to that hash key.

def groupAnagrams(strs: List[str]) -> List[List[str]]:
    anagrams = {}

    for word in strs:
        word_hash = get_string_hash(word)
        if not anagrams.get(word_hash):
            anagrams[word_hash] = []
        anagrams[word_hash].append(word)

    return list(anagrams.values())

And we are done! We just need a little extra space, and we can get a more efficient algorithm.

The complete solution:

class Solution:
    def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
        anagrams = {}

        for word in strs:
            word_hash = self.get_string_hash(word)
            if not anagrams.get(word_hash):
                anagrams[word_hash] = []
            anagrams[word_hash].append(word)

        return list(anagrams.values())
        
    def get_string_hash(self, string: str) -> str:
        letterCount = [0] * 26

        for c in string:
            intval = ord(c)
            letterCount[intval - ord('a')] += 1

        return "".join(
            [str(val) for val in letterCount]
        )