In this tutorial, we are going to learn how HashMap works internally.This is a very popular java interview question from the collection framework and been asked so many times to check candidates understanding on Map collection.
If you want video explanation then go through this :
Let’s suppose A hashmap is a storage place where we want to keep all of the data. So building upon this, we might want to add data, retrieve data or manipulate data, right? We’ll go through all of this by understanding the structure of the HashMap Java Class.
`public class java.util.HashMap<K, V> {}`
What does this mean? It says, that a HashMap (A storage place) will store the data in the form of a key-value pair where K is key and V is value. Why key-value, you’d ask? So this is because it makes the retrieval of data easier.
What a hashmap does is, it takes the data and store the data in a bucket and then label that bucket with the hash of the key. So when we want that data back, we just call a method and pass in that key. And the hashmap will again hash the key, looks up if it has a bucket labeled with the hash of the key.
I told you that the HashMap stores the data in key value pair and that it uses the power of both array and a linkedlist. That’s because it has an array and linked list implemented in it (Duh!).
The array is used to store the hash of the key and the linked list is used to store the data and the key and other stuff.
One thing to note here is that, the HashMap stores the object in the form of Entry Class. And that Entry class is in the form of a linked list.
An Entry class looks like this.
So, let’s suppose that each key can correspond to multiple values (Collision — explanation given later), these multiple values will be then stored in the Entry Object one by one and each entry object will be a LinkedList Node.
The array is used to store the hash of the key and the linked list is used to store the data and the key and other stuff.
What it’s saying is, to add data in form of key value pair where K can be anything like int, String, char or even an Object. And same goes for a value. And this method returns the old value present for that key. If there’s no value present for that key, this method will return null.
Let’s go into the method put.
Step 1: What it’s saying is, first check if it’s null or not. If it’s null, don’t take the hash of the key as it will produce an error and instead take the data and insert a linked list node at 0th index of the array and put the data and key as an Entry Object. Only one object can be stored at the 0th index.
If the key is not null, then take the hash of the key through the hash method overriden from the Object Class by the developer and then again take a hash through hash method implemented by the HashMap .
Step 2: Now this hash will be a long digit integer. The array size cannot be this huge. The default array size is 16. And it grows exponentially in the power of 2. So what we do is, we take the hash and the array size and through some algorithm, produce an index within the range of the size of the array.
Step 3: Then from the array it takes out each object (an Entry Object) one by one, and checks if the key matches and the hash matches. If it does, it adds the Entry object into linked list and return the old value.
If there is already a value present in the linked list for that key, it adds the new Entry object at the end of that node and it forms a chain like this.
How to prevent the dupicate Key entry? This is done by the equals() method. An equal method as used in the if condition in the above example. If the key is present it just adds the Entry object in the next node at that array location.
This method is saying that it gives you the Value (Entry Object), if you add the key in the get method. Let’s dive deeper into the method structure. It’s almost same as the put() method.
Step 1: Check if the key is null or not. If the key is null, then return the value kept at the 0th index of the array.
Step 2: If the key is not null, take the key and use the developer overridden hashCode() method if it’s there. Then take the hash of the output again.
Step 3: In the for loop, what you see is, you take out the whole linked list on that array location by passing the index (calculated from the indexFor() method)
Step 4: Now we know that Each entry object looks like this.
So it means each node of the linked list looks like this. Each node stores the value, the hash, the key and the address of the next Entry Object or the node in the chain.
So when in the for loop, we get the first entry object of the linked List at that array location. Then we match it’s Key and Value and the hash. If it matches we return the value or else we pick up the next node in the linked list. and do the same process again.
Step 4: If we find the value, we return it or else we return null.
All of these methods are used for manipulating data. I guess you understand how this works now. Still if you don’t, ask in the comment section.
Initial capacity is the array size. It’s default value is 16. And it grows in the power of 2.
Load Factor is the value that determines at which point the array needs to be resized. The default value is 0.75. This means if the array has been filled upto INITIAL_CAPACITY*LoadFactor, the array is resized and then it becomes of 32 in length (16*2). And that keeps on repeating.
If you want to set them explicitly, yourself, you can do it like this.
Map<String,String> hashMapWithCapacity=new HashMap<>(32);
Map<String,String> hashMapWithCapacityAndLF=new HashMap<>(32, 0.5f);
A high INITIAL_CAPACITY is good for large number of Entries but little iterations.
And a low INITIAL_CAPACITY is good for small number of entries but huge iterations.
As of java 8, when the Entries in a linked list reaches 8 (MIN_TREEIFY_CAPACITY;), it converts the linked list to a Balanced Tree . This improved the performance a million times.
Earlier, the get method had the complexity of O(n) in worst case scenarios. But that changed in Java 8. It became O(logn). This was because of the Balanced Tree which made the traversal faster.
Now even with a million entries, it takes only 10 nanosecond!
If you want video explanation then go through this :
Let’s suppose A hashmap is a storage place where we want to keep all of the data. So building upon this, we might want to add data, retrieve data or manipulate data, right? We’ll go through all of this by understanding the structure of the HashMap Java Class.
`public class java.util.HashMap<K, V> {}`
What does this mean? It says, that a HashMap (A storage place) will store the data in the form of a key-value pair where K is key and V is value. Why key-value, you’d ask? So this is because it makes the retrieval of data easier.
What a hashmap does is, it takes the data and store the data in a bucket and then label that bucket with the hash of the key. So when we want that data back, we just call a method and pass in that key. And the hashmap will again hash the key, looks up if it has a bucket labeled with the hash of the key.
The array is used to store the hash of the key and the linked list is used to store the data and the key and other stuff.
One thing to note here is that, the HashMap stores the object in the form of Entry Class. And that Entry class is in the form of a linked list.
An Entry class looks like this.
static class Entry<K,V> implements Map.Entry<K,V>{ final K key; V value; Entry<K,V> next; final int hash; ........}
So, let’s suppose that each key can correspond to multiple values (Collision — explanation given later), these multiple values will be then stored in the Entry Object one by one and each entry object will be a LinkedList Node.
Collisions in HashMap
A collision is a case when we try to add different keys with the same hash. To avoid returning back the value and saying that the HashMap can’t add this new value because it already contains this hash of the key, what it does is, it creates a linked list. And keeps on adding the values with the same hash but different key by creating new node in the linked list present at the array position calculated with the hash of that key. Get the idea with this picture.There are mainly 3 operations to perform:
- Adding Data
- Retrieving Data
- Manipulating Data
1. Adding Data
To add data in a HashMap, we use the put() method.
`public V put(K key, V value) {…}`
public V put(K key, V value) { if (key == null) return putForNullKey(value); int hash = hash(key.hashCode()); int i = indexFor(hash, table.length); for (Entry<K,V> e = table[i]; e != null; e = e.next) { Object k; if (e.hash == hash && ((k = e.key) == key || key.equals(k))) { V oldValue = e.value; e.value = value; e.recordAccess(this); return oldValue; } } return null; }
Step 1: What it’s saying is, first check if it’s null or not. If it’s null, don’t take the hash of the key as it will produce an error and instead take the data and insert a linked list node at 0th index of the array and put the data and key as an Entry Object. Only one object can be stored at the 0th index.
If the key is not null, then take the hash of the key through the hash method overriden from the Object Class by the developer and then again take a hash through hash method implemented by the HashMap .
Step 2: Now this hash will be a long digit integer. The array size cannot be this huge. The default array size is 16. And it grows exponentially in the power of 2. So what we do is, we take the hash and the array size and through some algorithm, produce an index within the range of the size of the array.
Step 3: Then from the array it takes out each object (an Entry Object) one by one, and checks if the key matches and the hash matches. If it does, it adds the Entry object into linked list and return the old value.
If there is already a value present in the linked list for that key, it adds the new Entry object at the end of that node and it forms a chain like this.
How to prevent the dupicate Key entry? This is done by the equals() method. An equal method as used in the if condition in the above example. If the key is present it just adds the Entry object in the next node at that array location.
2. Retrieving Data
For retrieval, HashMap uses a get() Method.
`public V get(K)`
public V get(Object key)
{
if (key == null)
return getForNullKey();
int hash = hash(key.hashCode());
for (Entry<K,V> e = table[indexFor(hash, table.length)];e != null;e = e.next)
{
Object k;
if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
return e.value;
}
return null;
}
Step 1: Check if the key is null or not. If the key is null, then return the value kept at the 0th index of the array.
Step 2: If the key is not null, take the key and use the developer overridden hashCode() method if it’s there. Then take the hash of the output again.
Step 3: In the for loop, what you see is, you take out the whole linked list on that array location by passing the index (calculated from the indexFor() method)
Step 4: Now we know that Each entry object looks like this.
static class Entry<K,V> implements Map.Entry<K,V>
{
final K key;
V value;
Entry<K,V> next;
final int hash;
........
}
3. Manipulating Data
`public V remove(K);
public void clear();
public boolean remove(K,V);
public boolean replace(K, V, V);
public V replace(K, V);`
public void clear();
public boolean remove(K,V);
public boolean replace(K, V, V);
public V replace(K, V);`
All of these methods are used for manipulating data. I guess you understand how this works now. Still if you don’t, ask in the comment section.
Performance of the HashMap
The performance of the hashmap is affected by the Initial Capacity and the Load Factor.Initial capacity is the array size. It’s default value is 16. And it grows in the power of 2.
Load Factor is the value that determines at which point the array needs to be resized. The default value is 0.75. This means if the array has been filled upto INITIAL_CAPACITY*LoadFactor, the array is resized and then it becomes of 32 in length (16*2). And that keeps on repeating.
Map<String,String> hashMapWithCapacity=new HashMap<>(32);
Map<String,String> hashMapWithCapacityAndLF=new HashMap<>(32, 0.5f);
A high INITIAL_CAPACITY is good for large number of Entries but little iterations.
And a low INITIAL_CAPACITY is good for small number of entries but huge iterations.
New In Java 8
As of java 8, when the Entries in a linked list reaches 8 (MIN_TREEIFY_CAPACITY;), it converts the linked list to a Balanced Tree . This improved the performance a million times.Earlier, the get method had the complexity of O(n) in worst case scenarios. But that changed in Java 8. It became O(logn). This was because of the Balanced Tree which made the traversal faster.
Now even with a million entries, it takes only 10 nanosecond!
Comments
Post a Comment