DSA 11: Hash Tables - Hash Functions [Part 1]

Create hash tables, hash functions and replicate how the data is in memory
data structures
algorithms
Author

Tony Phung

Published

January 21, 2025

1. Exampe with Arrays

Imagine I opened a cafe called Tonys Cafe to sell my favourite foods.

Using my existing DSA knowledge (i.e. arrays), the menu may looking like something below.

That is, an array with sub-arrays representing the food item and prices. To find a specific food item:

  • If menu/array is unordered, it will take \(O(N)\) steps via \(LinearSearch()\)
  • If menu/array is ordered, it will take \(O(log_{2}N)\) steps via \(BinarySearch()\)

However I used a special data structure called hash table, to find our item:

  • It only takes \(O(1)\) steps!

1.1 Create Array with Sub-Arrays

menu_arr = [
    ["Deep-Fried Chicken Wings", 4.99],
    ["Malaysian Laksa", 4.99],
    ["Tony Pepperoni Pizza", 4.99],
    ["Cheesey Carbonara", 4.99],
]

menu_title = "Tony's Cafe Menu"
menu_price = "Price ($A)"
print(f"{menu_title:25} {menu_price:>10}")
print(f"{'-'*36}")

for menu_item in menu_arr:
    # menu_item_str = menu_item[0]+":"
    print(f"{menu_item[0]:25} {menu_item[1]:>10}")
Tony's Cafe Menu          Price ($A)
------------------------------------
Deep-Fried Chicken Wings        4.99
Malaysian Laksa                 4.99
Tony Pepperoni Pizza            4.99
Cheesey Carbonara               4.99

2. Example with Hash Table

2.1 Create Hash Table (aka Python Dictionary) example

menu_dct = {
    "Deep-Fried Chicken Wings": 4.99,
    "Malaysian Laksa": 4.99,
    "Tony Pepperoni Pizza": 4.99,
    "Cheesey Carbonara": 4.99
}
print(menu_dct) # print whole dct
print(type(menu_dct)) # get type
print(menu_dct.get("Tony Pepperoni Pizza")) # get food item price
print()
print(f"{menu_title:25} {menu_price:>10}") # print menu
print(f"{'-'*36}")
for k,v in menu_dct.items():
    print(f"{k:25} {v:>10}")
{'Deep-Fried Chicken Wings': 4.99, 'Malaysian Laksa': 4.99, 'Tony Pepperoni Pizza': 4.99, 'Cheesey Carbonara': 4.99}
<class 'dict'>
4.99

Tony's Cafe Menu          Price ($A)
------------------------------------
Deep-Fried Chicken Wings        4.99
Malaysian Laksa                 4.99
Tony Pepperoni Pizza            4.99
Cheesey Carbonara               4.99

3. Hashing with Hash Functions (HF)

Hashing is the process of taking characters and converting them to number.
Hashing the integer value (usually integer) created from hashing. Used as the index for the hash table Hash function is the code used to conduct hashing.
* Valid means the function converts the same string to the same number every single time.

3.1 HF: Addition Example 1

  • Assume A=1, B=2, C=3, D=4, etc…
  • Input is BAD
  • HF(BAD) will return: 1 + 2 + 4 = 7

3.2 HF: Multiplication Example 1

  • Assume A=1, B=2, C=3, D=4, etc…
  • Input is BAD
  • HF(BAD) will return: 1 * 2 * 4 = 8

3.3 HF: Multiplication Example 2

  • Assume A=1, B=2, C=3, D=4, etc…
  • Input is DAB
  • HF(DAB) will return: 4 * 1 * 2 = 8

Note: BAD and DAB returned the same index value of 8. This issue needs and will be address later.

4. Thesaurus App - Part 1

Imagine creating a thesaurus_app when given a input_word (or key) it return a single synonym (or value)

  • Create hash_values (or indices to our array):
    • by hashing our input_word via
    • our hash_function
  • Create a letter_dict dictionary from some input (representing real world data):
    • key (at index): hash_value
    • value: synonym
  • Retrieve synonym by:
    • by hashing our input_word via our hash_function
    • using dict[hash_value]

4.1 Create letters_dict

Each dict key is the letter and each dict value is the letter’s place in the alphabet

thesaurus_arr = [None]*16 # create empty thesaurus
letters_dict = {chr(i): i - 96 for i in range(97, 97 + 26)}
print(letters_dict)
{'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5, 'f': 6, 'g': 7, 'h': 8, 'i': 9, 'j': 10, 'k': 11, 'l': 12, 'm': 13, 'n': 14, 'o': 15, 'p': 16, 'q': 17, 'r': 18, 's': 19, 't': 20, 'u': 21, 'v': 22, 'w': 23, 'x': 24, 'y': 25, 'z': 26}

4.2 Test letters_dict

letters_dict["d"]
4

4.3 Create hash_function

Converts a string into a hash_value.

This hash_value will be the index in our thesaurus_dict.

def alphabet_multiplicative_hash_fn(input_word: str):
    # convert some string into hash_value by multiplying all letters value
    hash_value = 1
    for character in input_word:
    # dict_val = dict[char]
        letter_val = letters_dict[character]
        hash_value *= letter_val # cumulative product: char1 * char2 ...
    return hash_value

4.4 Test hash_function

print(alphabet_multiplicative_hash_fn("bad")) # 2*1*4 = 8
print(alphabet_multiplicative_hash_fn("cab")) # 3*1*2 = 6
8
6

4.5 Add entries into App (mock-up data)

Pretend the computer memorys storing our dictionary is represented by an array of 16 contiguous values like a list.

Add 3 entries:

  • thesaurus[​“bad”​] = ​“evil”​
  • thesaurus[​“cab”​] = ​“taxi”​
  • thesaurus[​“ace”​] = ​“star”​
PretendDictMemoryList = [None]*16
PretendDictMemoryList
[None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None]

4.6 Add Entry to Memory - Psuedo-Code

  • Get each pair (string,synonym) entry
  • Create hash_value for each pairs string
  • Add to memory at hash_value index: memory[hash_value]=synonym

4.7 Add Entry to Memory - Python

entry_pairs_list = [("bad","evil"),("cab","taxi"),("aaa","best")]
entry_pairs_list

for pair in entry_pairs_list:
    hash_value = alphabet_multiplicative_hash_fn(pair[0])
    PretendDictMemoryList[hash_value-1]=pair[1]
    # print(hash_value, PretendDictMemoryList)
    
thesaurus_app = PretendDictMemoryList

5. Next Up

I’ll add the feature to get our a synonym given a input_word and explore it’s time-complexity.

Also, how to deal with collisions. This is when are multiple entries at the same index of our dictionary. That is, the same hash value calculated for different words.