Tuesday, October 12, 2010

Hashtable Vs Hashmap, Vector Vs ArrayList

A Map is a class that stores key-value pairs and provides a way to locate a value based on the key. The Hashtable class is a synchronized implementation of the Map interface. The HashMap is better in terms performance because the hashing algorithm it uses. The Hashtable is synchronized so performance is slightly worse.

As to the difference between a Vector and an ArrayList:

I. Synchronization

Vectors are synchronized. Any method that touches the Vector’s contents is thread safe. ArrayList, on the other hand, is unsynchronized, making them, therefore, not thread safe. With that difference in mind, using synchronization will incur a performance hit. So if you don’t need a thread-safe collection, use the ArrayList. Why pay the price of synchronization unnecessarily?

II. Data growth

Internally, both the ArrayList and Vector hold onto their contents using an Array. You need to keep this fact in mind while using either in your programs. When you insert an element into an ArrayList or a Vector, the object will need to expand its internal array if it runs out of room. A Vector defaults to doubling the size of its array, while the ArrayList increases its array size by 50 percent. Depending on how you use these classes, you could end up taking a large performance hit while adding new elements. It’s always best to set the object’s initial capacity to the largest capacity that your program will need. By carefully setting the capacity, you can avoid paying the penalty needed to resize the internal array later. If you don’t know how much data you’ll have, but you do know the rate at which it grows, Vector does possess a slight advantage since you can set the increment value.

III. Usage patterns

Both the ArrayList and Vector are good for retrieving elements from a specific position in the container or for adding and removing elements from the end of the container. All of these operations can be performed in constant time — O(1). However, adding and removing elements from any other position proves more expensive — linear to be exact: O(n-i), where n is the number of elements and i is the index of the element added or removed. These operations are more expensive because you have to shift all elements at index i and higher over by one element. So what does this all mean? It means that if you want to index elements or add and remove elements at the end of the array, use either a Vector or an ArrayList. If you want to do anything else to the contents, go find yourself another container class. For example, the LinkedList can add or remove an element at any position in constant time — O(1). However, indexing an element is a bit slower — O(i) where i is the index of the element. Traversing an ArrayList is also easier since you can simply use an index instead of having to create an iterator. The LinkedList also creates an internal object for each element inserted. So you have to be aware of the extra garbage being created.

Java performance tips – Part 1 : Vector, ArrayList & LinkedList

When, where & how to use Vector & ArrayList

Even though both Vector & ArrayList are used for the same purpose – as a dynamic collection of related objects – there is a right time and place to use them. As we all know there is a major difference: Vector is thread-safe but ArrayList is not. If there is no need of synchronization we can use ArrayList instead of Vector. And that will definitely be a performance booster.

When size matters

Internally both Vector & ArrayList uses arrays to hold the elements. Both supports an initialCapacity parameter which decides the initial size of the array. (I really don’t remember the default value). Once if the capacity becomes insufficient to accommodate new elements either of these collections need to expand their capacity. Both Vector & ArrayList do it differently. Vector doubles the current capacity & ArrayList increases the capacity by 50%. Whenever we create a Vector we can specify a capacityIncrement parameter; which is an integer using which the capacity will get incremented whenever the Vector overflows.

Its also advised that, if we know the maximum number of elements that gonna be inserted in to the collection, we should set it as the initialCapacity while creating the collection itself. But its not practically possible every time.

Element access & performance

In both Vector & ArrayList addition/deletion of elements at one end & accessing the elements at a particular index wont take much effort (just O(1) ). But adding or deleting an element at the i-th position will take some effort based on the value of i (O(n-i), where n is the size of the collection). Use of a LinkedList can help us here. B’ coz in a LinkedList addition/deletion of elements is pretty easy as it’s a matter of adjusting the pointers (just O(1)).

LinkedList is good, but…

The problem with LinkedList is that it will create an object for each element being added to the list. Means double the duty for the garbage collector. And as LinkedList is not using arrays to hold elements, it’s a bit painful for it to access elements at a particular index (O(i), where i the index).

So, programmers should decide when to use what based on the requirement.