DSA 12: Hash Tables - Collisions [Part 2]

Collisions in Hash Tables explained with a suggestion solution
data structures
algorithms
Author

Tony Phung

Published

January 22, 2025

1. Setup

From previous post: Hash Tables - Part 1

thesaurus_arr = [None]*16 # create empty thesaurus
letters_dict = {chr(i): i - 96 for i in range(97, 97 + 26)}
print(f"letters_dict: \n{letters_dict}")
letters_dict: 
{'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5, 'f': 6, 'g': 7, 'h': 8, 'i': 9, 'j': 10, 'k': 11, 'l': 12, 'm': 13, 'n': 14, 'o': 15, 'p': 16, 'q': 17, 'r': 18, 's': 19, 't': 20, 'u': 21, 'v': 22, 'w': 23, 'x': 24, 'y': 25, 'z': 26}
def alphabet_multiplicative_hash_fn(input_word: str):
    # convert some string into hash_value by multiplying all letters value
    hash_value = 1
    for character in input_word:
    # dict_val = dict[char]
        letter_val = letters_dict[character]
        hash_value *= letter_val # cumulative product: char1 * char2 ...
    return hash_value
PretendDictMemoryList = [None]*16
print(f"PretendDictMemoryList: {PretendDictMemoryList}")
PretendDictMemoryList: [None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None]
print(f"Test getting 'd': {letters_dict['d']}") # 4

print(f"Test hash_fn('bad'): {alphabet_multiplicative_hash_fn('bad')}") # 2*1*4 = 8
print(f"Test hash_fn('cab'): {alphabet_multiplicative_hash_fn('cab')}") # 3*1*2 = 6

entry_pairs_list = [("bad","evil"),("cab","taxi"),("aaa","best")]
entry_pairs_list

for pair in entry_pairs_list:
    hash_value = alphabet_multiplicative_hash_fn(pair[0])
    PretendDictMemoryList[hash_value-1]=pair[1]
    # print(hash_value, PretendDictMemoryList)
    
thesaurus_app = PretendDictMemoryList

print(f"\nThesaurus app:\n{thesaurus_app}")
Test getting 'd': 4
Test hash_fn('bad'): 8
Test hash_fn('cab'): 6

Thesaurus app:
['best', None, None, None, None, 'taxi', None, 'evil', None, None, None, None, None, None, None, None]

2. Collisions (same hash values)

2.1 New Entry with Same Hash Value

Notice new entry ["dab","tap"] will result in a hash_value of 8:

print(f"Test hash_fn('dab'): {alphabet_multiplicative_hash_fn('dab')}") # 4*1*2 = 8
Test hash_fn('dab'): 8

2.2 A Collision Has Occured 💥!

Index 8 in memory already exists from a previous entry:

  • ["bad","evil"].

Index 8 will be over-ridden by the new entry:

  • from ["bad","evil"]
  • to ["dab","tap"]

The current approach is sub-optimal.

entry_pairs_list = [("dab","tap")]

for pair in entry_pairs_list:
    hash_value = alphabet_multiplicative_hash_fn(pair[0])
    PretendDictMemoryList[hash_value-1]=pair[1]
    
thesaurus_app = PretendDictMemoryList


print(f"\nThesaurus app:\n{thesaurus_app}")

Thesaurus app:
['best', None, None, None, None, 'taxi', None, 'tap', None, None, None, None, None, None, None, None]

2.3 Updated Solution: Sub-Arrays

Create an array with sub-arrays at the index

For the example: * create an array within index 8, * with each item of the array, * represented by the [input_word,synonym]

2.4 Search Time-Complexity: Previous versus Updated Solution

Previous Solution:

  • PretendDictMemoryList[8] = "taxi"
    • Thus, Search is \(O(1)\)

Updated Solution:

  • PretendDictMemoryList[8] = [["cab","taxi"],["dab","tap"]]
    • Search \(O(1)\) to get value at index (or hash_value)
    • Then, search array \(O(n)\) to find our cab or dab key
    • Thus, Total Search \(O(n+1) \to O(n)\)