
HashMap in Java | How HashMap internally works | Performance of HashMap

In this tutorial, we are going to learn how HashMap works internally. This is a very popular Java interview question from the collections framework, and it is asked often to check a candidate's understanding of the Map collection.





Think of a HashMap as a storage place where we want to keep all of our data. Building on this, we might want to add data, retrieve data, or manipulate data. We'll go through all of this by looking at the structure of the HashMap Java class.


`public class java.util.HashMap<K, V> {}`


What does this mean? It says that a HashMap (a storage place) stores data as key-value pairs, where K is the key type and V is the value type. Why key-value, you'd ask? Because it makes retrieving the data easier.


What a HashMap does is take the data, store it in a bucket, and pick that bucket using the hash of the key. When we want the data back, we just call a method and pass in the key. The HashMap hashes the key again, goes to the bucket that the hash points to, and looks there for an entry with that key.
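To make the store-then-look-up idea concrete, here is a minimal sketch using the standard java.util.HashMap. The class name BasicLookupDemo and the country data are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class BasicLookupDemo {
    // Builds a small map of (hypothetical) country codes to country names.
    static Map<String, String> buildCountries() {
        Map<String, String> countries = new HashMap<>();
        countries.put("IN", "India");
        countries.put("JP", "Japan");
        return countries;
    }

    public static void main(String[] args) {
        Map<String, String> countries = buildCountries();
        System.out.println(countries.get("IN"));         // India
        System.out.println(countries.get("XX"));         // null, no entry for this key
        System.out.println(countries.containsKey("JP")); // true
    }
}
```

Note that get() returns null both when the key is missing and when the key is mapped to null, so containsKey() is the safer check for presence.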

The HashMap stores its data as key-value pairs, and it uses the strengths of both an array and a linked list. That's because it has an array and a linked list implemented in it.
The array is the bucket table: each slot holds a reference to the first node of a linked list (or null if the bucket is empty). The linked list nodes hold the key, the value, the cached hash of the key, and a link to the next node.
One thing to note here is that the HashMap stores each pair as an object of the Entry class, and each Entry is a node of a linked list.
The Entry class looks like this.

static class Entry<K,V> implements Map.Entry<K,V>
{
 final K key;
 V value;
 Entry<K,V> next;
 final int hash;
 ........
}


So, suppose several different keys end up in the same bucket (a collision, explained below). Each of those key-value pairs is stored in its own Entry object, and the Entry objects are chained together as linked list nodes.


Collisions in HashMap

A collision happens when we add different keys that have the same hash (or whose hashes map to the same array index). Instead of rejecting the new pair because that bucket is already occupied, the HashMap keeps a linked list at that array position. Every pair whose key hashes to that position, but whose key is different, gets its own node in that list.
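A small sketch of a collision, using the well-known fact that the strings "Aa" and "BB" have the same hashCode() in Java. The class name CollisionDemo is made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class CollisionDemo {
    // Puts two different keys that share the same hash code into one map.
    static Map<String, Integer> buildCollidingMap() {
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);
        return map;
    }

    public static void main(String[] args) {
        // Both strings hash to 2112, so they land in the same bucket.
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true

        Map<String, Integer> map = buildCollidingMap();
        System.out.println(map.size());    // 2, both entries are kept
        System.out.println(map.get("Aa")); // 1
        System.out.println(map.get("BB")); // 2
    }
}
```

Both pairs survive because the keys are not equal, even though their hashes are.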



There are mainly 3 operations to perform:

  1. Adding Data
  2. Retrieving Data
  3. Manipulating Data
Let’s go into detail of each one.

1. Adding Data

To add data in a HashMap, we use the put() method.
`public V put(K key, V value) {…}`


What it's saying is: add data as a key-value pair, where K can be any object type, such as String, Integer, Character, or your own class. Primitives like int and char cannot be used directly, but autoboxing converts them to their wrapper types. The same goes for the value. The method returns the old value stored for that key. If there was no value for that key, it returns null.
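Here is a short sketch of the return value of put(). The class name PutReturnDemo and the helper putTwice() are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class PutReturnDemo {
    // Returns { result of first put, result of second put, value stored afterwards }.
    static Integer[] putTwice() {
        Map<String, Integer> stock = new HashMap<>();
        Integer first = stock.put("apple", 10);  // no previous value for "apple"
        Integer second = stock.put("apple", 25); // replaces 10 with 25
        return new Integer[] { first, second, stock.get("apple") };
    }

    public static void main(String[] args) {
        Integer[] results = putTwice();
        System.out.println(results[0]); // null
        System.out.println(results[1]); // 10
        System.out.println(results[2]); // 25
    }
}
```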

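Let's go into the put() method. This is the implementation from before Java 8.
public V put(K key, V value) {
    // A null key has no hashCode(), so it is handled separately.
    if (key == null)
        return putForNullKey(value);
    // Re-hash the key's hashCode() and turn it into an array index.
    int hash = hash(key.hashCode());
    int i = indexFor(hash, table.length);
    // Walk the linked list in that bucket, looking for the same key.
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            // Key already present: overwrite the value, return the old one.
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    // Key not found: add a new Entry to the bucket.
    modCount++;
    addEntry(hash, key, value, i);
    return null;
}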


Step 1: First check whether the key is null. If it is, don't call hashCode() on it, because that would throw a NullPointerException. Instead, putForNullKey() stores the key and value as an Entry object in the bucket at index 0 of the array. A HashMap can hold only one null key, so putting a null key again replaces the old value.
If the key is not null, take its hash code from the hashCode() method (inherited from the Object class, or overridden by the developer), and then pass that through the hash() method implemented by the HashMap, which mixes the bits further.
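The null-key behaviour can be seen from the outside with a sketch like this one. The class name NullKeyDemo is made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class NullKeyDemo {
    // Puts the null key twice, plus one ordinary key with a null value.
    static Map<String, String> build() {
        Map<String, String> map = new HashMap<>();
        map.put(null, "first");
        map.put(null, "second"); // same (null) key, so the value is replaced
        map.put("k", null);      // null values are allowed as well
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = build();
        System.out.println(map.size());           // 2
        System.out.println(map.get(null));        // second
        System.out.println(map.containsKey("k")); // true
    }
}
```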

Step 2: This hash is a 32-bit int, so it can be a very large number. The array is not that big. The default array size is 16, and it doubles every time it grows, so it is always a power of 2. So indexFor() takes the hash and the array size and produces an index within the range of the array. Because the size is a power of 2, it does this with a bitwise AND: hash & (length - 1).
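The two helper steps can be sketched as standalone methods. The bodies below follow the pre-Java 8 HashMap source as I recall it, so treat the exact shift constants in hash() as an assumption. The class name BucketIndexDemo is made up for illustration.

```java
class BucketIndexDemo {
    // Supplemental hash (assumed to match the pre-Java 8 source): mixes the
    // higher bits into the lower ones so that weak hashCode() values still
    // spread across the buckets.
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Maps a hash to an array index. Only correct when length is a power of 2.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16; // default table size
        for (String key : new String[] { "apple", "banana", "cherry" }) {
            int index = indexFor(hash(key.hashCode()), length);
            System.out.println(key + " -> bucket " + index);
        }
        // Negative hashes are fine too: the AND keeps only the low bits.
        System.out.println(indexFor(-1, length)); // 15
    }
}
```

For a non-negative hash and a power-of-2 length, hash & (length - 1) gives the same result as hash % length, but it is cheaper and never negative.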

Step 3: Then it walks through the linked list at that array index, one Entry object at a time, and checks whether the hash matches and the key matches. If they do, it replaces the value in that Entry and returns the old value.
If no Entry in the list has that key, it creates a new Entry object and links it into the list at that array location. (The pre-Java 8 code puts the new Entry at the head of the list; Java 8 appends it at the end.) This is how the chain of entries in a bucket forms.


How does it prevent duplicate keys? This is done by the equals() method, as used in the if condition in the code above. If an equal key is already present, the HashMap just overwrites its value instead of adding a second Entry. A new Entry object is added at that array location only when no equal key is found.
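This is why a class used as a key should override both equals() and hashCode(). A minimal sketch follows; the EmployeeId class and the names in it are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class DuplicateKeyDemo {
    // Hypothetical key type: two ids are equal when dept and number match.
    static final class EmployeeId {
        private final String dept;
        private final int number;

        EmployeeId(String dept, int number) {
            this.dept = dept;
            this.number = number;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof EmployeeId)) return false;
            EmployeeId other = (EmployeeId) o;
            return number == other.number && dept.equals(other.dept);
        }

        @Override
        public int hashCode() {
            return Objects.hash(dept, number); // equal objects give equal hashes
        }
    }

    static Map<EmployeeId, String> build() {
        Map<EmployeeId, String> staff = new HashMap<>();
        staff.put(new EmployeeId("ENG", 7), "Asha");
        staff.put(new EmployeeId("ENG", 7), "Ravi"); // equal key: value replaced
        return staff;
    }

    public static void main(String[] args) {
        Map<EmployeeId, String> staff = build();
        System.out.println(staff.size());                        // 1
        System.out.println(staff.get(new EmployeeId("ENG", 7))); // Ravi
    }
}
```

Without the two overrides, the two EmployeeId objects would be treated as different keys and the map would end up with two entries.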

2. Retrieving Data

For retrieval, HashMap uses a get() Method.

`public V get(Object key)`

This method gives you the value stored for the key you pass in. Let's dive deeper into the method structure. It's almost the same as the put() method.
public V get(Object key) {
    if (key == null)
        return getForNullKey();
    int hash = hash(key.hashCode());
    for (Entry<K,V> e = table[indexFor(hash, table.length)]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
            return e.value;
    }
    return null;
}

Step 1: Check if the key is null or not. If the key is null, return the value of the null-key Entry, which is kept in the bucket at index 0 of the array.

Step 2: If the key is not null, call its hashCode() method (the developer's overridden version if there is one). Then pass the result through the HashMap's own hash() method.

Step 3: In the for loop, indexFor() calculates the array index, and the loop starts from the first Entry of the linked list at that array location.

Step 4: Now, we know that each Entry object looks like this.

static class Entry<K,V> implements Map.Entry<K,V>
{
 final K key;
 V value;
 Entry<K,V> next;
 final int hash;
 ........
}

So it means each node of the linked list looks like this. Each node stores the value, the hash, the key and the address of the next Entry Object or the node in the chain.

So in the for loop, we get the first Entry object of the linked list at that array location. Then we compare its hash and its key with the ones we are looking for (the value is not compared). If they match, we return the value; otherwise we pick up the next node in the linked list and do the same again.

Step 5: If we find the key, we return its value; otherwise we return null.
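To tie the put and get steps together, here is a toy chained hash map. It is a simplified sketch and not the JDK implementation: the class name SimpleHashMap is made up, the table is fixed at 16 buckets with no resizing, the supplemental hash is skipped, and the null key is simply given hash 0.

```java
class SimpleHashMap<K, V> {
    // One linked list node per key-value pair, like the Entry class above.
    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry<K, V> next;
        final int hash;

        Entry(int hash, K key, V value, Entry<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Entry<K, V>[] table = (Entry<K, V>[]) new Entry[16];
    private int size;

    private static int hash(Object key) {
        return key == null ? 0 : key.hashCode(); // null key goes to bucket 0
    }

    private int indexFor(int hash) {
        return hash & (table.length - 1); // table length is a power of 2
    }

    private static boolean sameKey(Object a, Object b) {
        return a == b || (a != null && a.equals(b));
    }

    public V put(K key, V value) {
        int hash = hash(key);
        int i = indexFor(hash);
        for (Entry<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == hash && sameKey(key, e.key)) {
                V oldValue = e.value; // key exists: overwrite and return old value
                e.value = value;
                return oldValue;
            }
        }
        table[i] = new Entry<>(hash, key, value, table[i]); // insert at the head
        size++;
        return null;
    }

    public V get(Object key) {
        int hash = hash(key);
        for (Entry<K, V> e = table[indexFor(hash)]; e != null; e = e.next) {
            if (e.hash == hash && sameKey(key, e.key)) {
                return e.value;
            }
        }
        return null; // no entry with that key
    }

    public int size() {
        return size;
    }

    public static void main(String[] args) {
        SimpleHashMap<String, Integer> map = new SimpleHashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2); // collides with "Aa", chained in the same bucket
        map.put("Aa", 3); // same key, value replaced
        System.out.println(map.get("Aa")); // 3
        System.out.println(map.get("BB")); // 2
        System.out.println(map.size());    // 2
    }
}
```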

3. Manipulating Data

`public V remove(Object key);
public void clear();
public boolean remove(Object key, Object value);
public boolean replace(K key, V oldValue, V newValue);
public V replace(K key, V value);`



All of these methods are used for manipulating data. They find the entry the same way put() and get() do, so I guess you understand how this works now. If you still don't, ask in the comment section.
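A quick sketch of these methods in use. The two-argument remove() and both replace() methods need Java 8 or later. The class name ManipulationDemo and the stock data are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class ManipulationDemo {
    // Applies a few edits to a small map and returns the result.
    static Map<String, Integer> applyEdits() {
        Map<String, Integer> stock = new HashMap<>();
        stock.put("apple", 10);
        stock.put("banana", 20);
        stock.put("cherry", 30);

        stock.remove("apple");           // removes the entry, returns 10
        stock.remove("banana", 99);      // value does not match: returns false
        stock.replace("banana", 25);     // returns the old value, 20
        stock.replace("cherry", 30, 35); // old value matches: returns true
        return stock;
    }

    public static void main(String[] args) {
        Map<String, Integer> stock = applyEdits();
        System.out.println(stock.containsKey("apple")); // false
        System.out.println(stock.get("banana"));        // 25
        System.out.println(stock.get("cherry"));        // 35

        stock.clear();
        System.out.println(stock.isEmpty());            // true
    }
}
```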

Performance of the HashMap

The performance of the hashmap is affected by the Initial Capacity and the Load Factor.

Initial capacity is the initial array size. Its default value is 16, and the array always grows by doubling, so the size stays a power of 2.

Load factor is the value that determines at which point the array needs to be resized. The default value is 0.75. This means that once the number of entries goes past capacity * load factor (16 * 0.75 = 12 with the defaults), the array is resized to 32 in length (16 * 2) and the entries are redistributed into it. And that keeps on repeating.
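The resize arithmetic can be sketched as a small simulation. This only models the rule described above (double the capacity whenever the entry count exceeds capacity * load factor); it does not inspect a real HashMap, and the class and method names are made up.

```java
class ResizeSimulation {
    // Returns the table capacity this simplified model predicts
    // after the given number of entries has been added.
    static int capacityAfter(int entries, int initialCapacity, float loadFactor) {
        int capacity = initialCapacity;
        while (entries > capacity * loadFactor) {
            capacity *= 2; // each resize doubles the array
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(capacityAfter(12, 16, 0.75f)); // 16, threshold of 12 not exceeded
        System.out.println(capacityAfter(13, 16, 0.75f)); // 32, the 13th entry triggers a resize
        System.out.println(capacityAfter(25, 16, 0.75f)); // 64
    }
}
```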

If you want to set them explicitly yourself, you can do it like this.

Map<String, String> hashMapWithCapacity = new HashMap<>(32);
Map<String, String> hashMapWithCapacityAndLF = new HashMap<>(32, 0.5f);

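A high initial capacity is good when there will be a large number of entries, because it avoids repeated resizing, but it makes iteration slower, since iterating visits every bucket.
And a low initial capacity is good for a small number of entries that will be iterated over a lot.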
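If you know roughly how many entries you will store, one common approach is to pick an initial capacity large enough that the resize threshold is never reached. A sketch of that calculation follows; the helper names are made up, and the default load factor of 0.75 is assumed.

```java
import java.util.HashMap;
import java.util.Map;

class PresizeDemo {
    // Smallest capacity whose threshold (capacity * loadFactor) covers the expected entries.
    static int initialCapacityFor(int expectedEntries, float loadFactor) {
        return (int) Math.ceil(expectedEntries / (double) loadFactor);
    }

    // Creates a map sized for the expected number of entries (default load factor assumed).
    static <K, V> Map<K, V> newMapFor(int expectedEntries) {
        return new HashMap<>(initialCapacityFor(expectedEntries, 0.75f));
    }

    public static void main(String[] args) {
        System.out.println(initialCapacityFor(12, 0.75f));  // 16
        System.out.println(initialCapacityFor(100, 0.75f)); // 134

        Map<Integer, String> users = newMapFor(100);
        for (int i = 0; i < 100; i++) {
            users.put(i, "user" + i);
        }
        System.out.println(users.size()); // 100
    }
}
```

HashMap rounds the requested capacity up to the next power of 2 internally, so asking for 134 gives a table of 256 buckets.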


New In Java 8

As of Java 8, when the linked list in a bucket grows past 8 entries (TREEIFY_THRESHOLD) and the array has at least 64 buckets (MIN_TREEIFY_CAPACITY), the HashMap converts that linked list into a balanced tree (a red-black tree). This greatly improves performance when many keys collide.
Earlier, the get method had a complexity of O(n) in the worst case. That changed in Java 8, where it became O(log n), because the balanced tree makes the traversal faster.

Now, even with a million entries stuck in a single bucket, a lookup needs only a few dozen comparisons instead of up to a million!
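To see a heavily colliding bucket in action, here is a sketch with a key class whose hashCode() is deliberately constant, so every entry lands in the same bucket. The CollidingKey class is hypothetical. It implements Comparable, which lets the Java 8 tree order the keys. The example makes no timing claims; it only shows that lookups still work.

```java
import java.util.HashMap;
import java.util.Map;

class TreeBinDemo {
    // Hypothetical key: every instance returns the same hash on purpose.
    static final class CollidingKey implements Comparable<CollidingKey> {
        final int id;

        CollidingKey(int id) {
            this.id = id;
        }

        @Override
        public int hashCode() {
            return 42; // worst case: all keys collide
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof CollidingKey && ((CollidingKey) o).id == id;
        }

        @Override
        public int compareTo(CollidingKey other) {
            return Integer.compare(id, other.id);
        }
    }

    // Builds a map in which all n keys share a single bucket.
    static Map<CollidingKey, Integer> build(int n) {
        Map<CollidingKey, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new CollidingKey(i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<CollidingKey, Integer> map = build(10_000);
        System.out.println(map.size());                        // 10000
        System.out.println(map.get(new CollidingKey(9_999)));  // 9999
        System.out.println(map.get(new CollidingKey(10_000))); // null
    }
}
```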

