I have to admit that I could not understand much from the chapters below.
a) Advanced Sorting Chapter
b) Ternary Search Trees Chapter
c) B-Trees Chapter
d) String Matching
e) Computational Geometry
Below are the points I wanted to share with readers from this book.
1) Fundamental development practices include:
a) Simplicity leads to better code.
b) Don’t optimize prematurely.
c) Interfaces promote flexibility of design.
d) All production code should be covered by automated unit and functional tests.
e) Assertions are a developer’s best friend.
2) The complexity of an algorithm is usually defined in terms of the order of magnitude of the number of operations required to perform a function. It is denoted by a capital O for order of (hence, big-O) followed by an expression representing the rate of growth relative to the size of the problem, which is denoted by the letter N.
O(1): Pronounced “order 1” and denoting a function that runs in constant time
O(N): Pronounced “order N” and denoting a function that runs in linear time
O(N²): Pronounced “order N squared” and denoting a function that runs in quadratic time
O(log N): Pronounced “order log N” and denoting a function that runs in logarithmic time
O(N log N): Pronounced “order N log N” and denoting a function that runs in time proportional to the size of the problem multiplied by its logarithm
O(N!): Pronounced “order N factorial” and denoting a function that runs in factorial time
3)
a) Constant Time: O(1)
You would be forgiven for assuming that a complexity of O(1) implies that an algorithm only ever takes one operation to perform its function. While this is certainly possible, O(1) actually means that an algorithm takes constant time to run; in other words, performance isn’t affected by the size of the problem. If you think this sounds a little too good to be true, then you’d be right.
Granted, many simple functions will run in O(1) time. Possibly the simplest example of constant time performance is addressing main memory in a computer, and by extension, array lookup. Locating an element in an array generally takes the same amount of time regardless of size.
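To illustrate the array-lookup example, here is a small sketch of my own (not the book's code): fetching an element by index is a single step regardless of how many elements the array holds.

```java
// Array lookup is O(1): fetching element i is a single address
// calculation, independent of the array's length.
public class ConstantTimeLookup {
    public static int lookup(int[] array, int i) {
        return array[i]; // one operation, whether the array has 3 or 1,000,000 elements
    }

    public static void main(String[] args) {
        int[] small = {10, 20, 30};
        int[] large = new int[1_000_000];
        large[999_999] = 42;
        System.out.println(lookup(small, 1));       // 20
        System.out.println(lookup(large, 999_999)); // 42
    }
}
```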
b) Linear Time: O(N)
An algorithm runs in O(N) if the number of operations required to perform a function is directly proportional to the number of items being processed.
One example of this might be waiting in a line at a supermarket. On average, it probably takes about the same amount of time to move each customer through the checkout: If it takes two minutes to process one customer’s items, it will probably take 2 × 10 = 20 minutes to process ten customers, and 2 × 40 = 80 minutes to process 40 customers. The important point is that no matter how many customers are waiting in line, the time taken to process each one remains about the same. Therefore, we can say that the processing time is directly proportional to the number of customers, and thus O(N).
c) Quadratic Time: O(N²)
Imagine a group of people are meeting each other for the first time and, in keeping with protocol, each person in the group must greet and shake hands with every other person once. If there were six people in the group, there would be a total of 5 + 4 + 3 + 2 + 1 = 15 handshakes.
What would happen if there were seven people in the group? There would be 6 + 5 + 4 + 3 + 2 + 1 = 21 handshakes. If there were eight people? That would work out to be 7 + 6 + ... + 2 + 1 = 28 handshakes. If there were nine people? You get the idea: Each time the size of the group grows by one, that extra person must shake hands with every other person.
The number of handshakes required for a group of size N turns out to be (N² – N) / 2. Because big-O notation disregards any constant factors (in this case, the division by 2) we're left with N² – N. As N becomes larger, subtracting N from N² has less and less of an overall effect, so we can safely ignore the subtraction, leaving us with a complexity of O(N²).
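The handshake count can be verified two ways in code (my own sketch, not from the book): the closed-form formula, and a nested loop that pairs every person with every later person. The nested loop is exactly the O(N²) behaviour described above.

```java
// Counts handshakes by the closed form (N^2 - N) / 2 and by a nested
// loop that pairs person i with every person j > i.
public class Handshakes {
    public static int closedForm(int n) {
        return (n * n - n) / 2;
    }

    public static int byLoop(int n) {
        int count = 0;
        for (int i = 0; i < n; ++i) {
            for (int j = i + 1; j < n; ++j) {
                ++count; // person i shakes hands with person j
            }
        }
        return count;
    }

    public static void main(String[] args) {
        for (int n = 6; n <= 9; ++n) {
            System.out.println(n + " people: " + closedForm(n) + " handshakes");
        }
    }
}
```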
Algorithms that run in quadratic time may be a computer programmer’s worst nightmare; any algorithm with a complexity of O(N²) is pretty much guaranteed to be useless for solving all but the smallest of problems.
d) Logarithmic Time: O(log N) and O(N log N)
The running time of a logarithmic algorithm increases with the log (in most cases, the log base 2) of the problem size. This means that even when the size of the input data set increases by a factor of a million, the run time will only increase by some factor of log₂(1,000,000) ≈ 20. An easy way to calculate the log base 2 of an integer is to work out the number of binary digits required to store the number. For example, the log base 2 of 300 is 9, as it takes 9 binary digits to represent the decimal number 300 (the binary representation is 100101100).
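The "count the binary digits" trick maps directly onto a one-liner in Java (my own sketch): `Integer.numberOfLeadingZeros` gives the bit length of a number, which is floor(log₂ n) + 1.

```java
// Computes the number of binary digits needed to store n, which the
// text uses as an easy approximation of log base 2.
public class Log2 {
    public static int bitLength(int n) {
        return 32 - Integer.numberOfLeadingZeros(n);
    }

    public static void main(String[] args) {
        System.out.println(bitLength(300));       // 9, as in the text
        System.out.println(bitLength(1_000_000)); // 20
    }
}
```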
Achieving logarithmic running times usually requires your algorithm to somehow discard large portions of the input data set.
O(N log N) is still better than O(N²) but not quite as good as O(N).
e) Factorial Time: O(N!)
Factorial running times grow explosively with the problem size. As a consequence, even more so than with O(N²), you’d better hope that your code isn’t O(N!).
4) A list is an ordered collection of elements supporting random access to each element, much like an array—you can query a list to get the value contained at any arbitrary element. Lists also preserve insertion order so that, assuming there are no intervening modifications, a given list will always return the same value for the same position.
5) A linked list, conversely, is a chain of elements in which each item has a reference (or link) to the next (and optionally previous) element. Double linking makes it possible to traverse the elements in either direction. It also makes insertion and deletion much simpler than it is for an array list.
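To see why insertion is so cheap, here is a minimal doubly linked node of my own (names are illustrative, not the book's): splicing a node in touches only a handful of references, with no shifting of other elements.

```java
// A minimal doubly linked node: insertAfter() splices a new node in by
// updating at most four references, regardless of the list's length.
public class DoublyLinkedNode {
    Object value;
    DoublyLinkedNode prev;
    DoublyLinkedNode next;

    DoublyLinkedNode(Object value) { this.value = value; }

    /** Inserts a new node holding v directly after this one. */
    public DoublyLinkedNode insertAfter(Object v) {
        DoublyLinkedNode node = new DoublyLinkedNode(v);
        node.prev = this;
        node.next = this.next;
        if (this.next != null) {
            this.next.prev = node;
        }
        this.next = node;
        return node;
    }

    public static void main(String[] args) {
        DoublyLinkedNode a = new DoublyLinkedNode("A");
        DoublyLinkedNode c = a.insertAfter("C");
        a.insertAfter("B"); // splice B between A and C
        System.out.println(a.next.value);      // B
        System.out.println(a.next.next.value); // C
        System.out.println(c.prev.value);      // B
    }
}
```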
6) Queues are often described in terms of producers and consumers. A producer is anything that stores data in a queue, while a consumer is anything that retrieves data from a queue.
Queues can be either bounded or unbounded. Bounded queues have limits placed on the number of items that can be held at any one time. These are especially useful when the amount of available memory is constrained—for example, in a device such as a router or even an in-memory message queue. Unbounded queues, conversely, are free to grow in size as the limits of the hardware allow.
Below are typical queue operations:
| Operation | Description |
|---|---|
| enqueue | Stores a value in the queue. The size of the queue will increase by one. |
| dequeue | Retrieves the value at the head of the queue. The size of the queue will decrease by one. Throws EmptyQueueException if there are no more items in the queue. |
| clear | Deletes all elements from the queue. The size of the queue will be reset to zero (0). |
| size | Obtains the number of elements in the queue. |
| isEmpty | Determines whether the queue is empty (size() == 0) or not. |
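The operations in the table map directly onto a small class. The sketch below is my own (backed by `java.util.LinkedList`, with the JDK's `NoSuchElementException` standing in for the book's `EmptyQueueException`), not the book's implementation.

```java
import java.util.LinkedList;
import java.util.NoSuchElementException;

// A sketch of the queue operations from the table above.
public class SimpleQueue {
    private final LinkedList<Object> items = new LinkedList<>();

    public void enqueue(Object value) { items.addLast(value); }

    public Object dequeue() {
        if (items.isEmpty()) {
            // The book throws EmptyQueueException; this is the JDK equivalent.
            throw new NoSuchElementException("queue is empty");
        }
        return items.removeFirst();
    }

    public void clear() { items.clear(); }
    public int size() { return items.size(); }
    public boolean isEmpty() { return items.isEmpty(); }

    public static void main(String[] args) {
        SimpleQueue q = new SimpleQueue();
        q.enqueue("first");
        q.enqueue("second");
        System.out.println(q.dequeue()); // first (FIFO order)
        System.out.println(q.size());    // 1
    }
}
```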
7) A linked list is perfectly suited for use with a queue, as it is capable of efficiently adding and removing elements from either end. Compare this to an array list, which you may recall incurs the overhead of continually moving elements as they are removed.
8) Instead, a blocking queue is one way to provide a thread-safe implementation, ensuring that all access to the data is correctly synchronized.
By encapsulating all this behind the Queue interface, you free the consumers of the queue from the intricacies and subtleties of thread synchronization. There are two options for creating the blocking queue: extend an existing queue implementation (so far only ListFifoQueue), or wrap the behavior around another queue. The first option would lock you into one specific queue implementation, so instead you should use the second option (wrap another queue) as it gives you the flexibility to turn any queue implementation into a blocking queue.
Below is one implementation of Blocking Queue.
/**
 * A {@link Thread}-safe {@link Queue} that blocks (waits) until values are
 * available to {@link #dequeue()}.
 */
public class BlockingQueue implements Queue {
    /** The lock object to use for synchronisation. */
    private final Object _mutex = new Object();

    /** The underlying queue. */
    private final Queue _queue;

    /** The maximum size of the queue. */
    private final int _maxSize;

    /**
     * Constructor. Creates an effectively unbounded blocking queue.
     *
     * @param queue The underlying queue.
     */
    public BlockingQueue(Queue queue) {
        this(queue, Integer.MAX_VALUE);
    }

    /**
     * Constructor.
     *
     * @param queue The underlying queue.
     * @param maxSize The maximum size of the queue.
     */
    public BlockingQueue(Queue queue, int maxSize) {
        assert queue != null : "queue can't be null";
        assert maxSize > 0 : "size can't be < 1";
        _queue = queue;
        _maxSize = maxSize;
    }

    public void enqueue(Object value) {
        synchronized (_mutex) {
            while (size() == _maxSize) {
                waitForNotification();
            }
            _queue.enqueue(value);
            _mutex.notifyAll();
        }
    }

    public Object dequeue() throws EmptyQueueException {
        synchronized (_mutex) {
            while (isEmpty()) {
                waitForNotification();
            }
            Object value = _queue.dequeue();
            _mutex.notifyAll();
            return value;
        }
    }

    public void clear() {
        synchronized (_mutex) {
            _queue.clear();
            _mutex.notifyAll();
        }
    }

    public int size() {
        synchronized (_mutex) {
            return _queue.size();
        }
    }

    public boolean isEmpty() {
        synchronized (_mutex) {
            return _queue.isEmpty();
        }
    }

    /**
     * Waits on the mutex until notified.
     */
    private void waitForNotification() {
        try {
            _mutex.wait();
        } catch (InterruptedException e) {
            // Ignore; the calling loop re-tests its condition.
        }
    }
}
9) Plates are usually stacked—you place the first one on the shelf and add to the top. If you need a plate, you remove the top one first. The newspapers at your local convenience store are stacked, as are the books on your desk that you’ve been meaning to read. Stacks can also be used to implement a simple Most-Recently-Used (MRU) cache and are often used for parsing programming languages.
Below are the typical operations of the Stack
| Operation | Description |
|---|---|
| push | Adds a value to the top of the stack. The size of the stack will increase by one. |
| pop | Deletes and returns the value at the top of the stack. The size of the stack will decrease by one. Throws EmptyStackException when there are no more elements on the stack. |
| size | Obtains the number of elements in the stack. |
| peek | Returns but does not delete the value at the top of the stack. Throws EmptyStackException when there are no elements on the stack. |
| isEmpty | Determines whether a stack is empty. Returns true if the stack is empty (size() == 0); otherwise, returns false. |
| clear | Deletes all elements from a stack. The size of the stack is reset to zero. |
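As with the queue, the table above maps onto a small class. This sketch is mine, not the book's; it is backed by a linked list and uses the JDK's own `java.util.EmptyStackException`.

```java
import java.util.EmptyStackException;
import java.util.LinkedList;

// A sketch of the stack operations from the table above (LIFO order).
public class SimpleStack {
    private final LinkedList<Object> items = new LinkedList<>();

    public void push(Object value) { items.addFirst(value); }

    public Object pop() {
        if (items.isEmpty()) throw new EmptyStackException();
        return items.removeFirst();
    }

    public Object peek() {
        if (items.isEmpty()) throw new EmptyStackException();
        return items.getFirst();
    }

    public int size() { return items.size(); }
    public boolean isEmpty() { return items.isEmpty(); }
    public void clear() { items.clear(); }

    public static void main(String[] args) {
        SimpleStack s = new SimpleStack();
        s.push("plate 1");
        s.push("plate 2");
        System.out.println(s.pop());  // plate 2 (last in, first out)
        System.out.println(s.peek()); // plate 1
    }
}
```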
10) Working with a Selection Sort
Imagine that you have a bookshelf filled with several books of varying sizes that are arranged haphazardly. Your mother is coming to visit, and to impress her with your housekeeping prowess, you decide to arrange the books neatly on the shelf in order from the tallest to the shortest.
You’d be unlikely to use bubble sort in this case, because all that swapping would be a waste of time. You’d be taking each book out and putting it back on the shelf many times, and that would take too long. In this example, the cost of moving the items is relatively large when measured against the cost of comparing items. A selection sort is a better choice here.
Start by scanning the shelf for the tallest book. Pull it out, as it needs to move to the far left of the shelf. Rather than move all the other books along the shelf to make room for it, just pull out the book that happens to be in the space where you want this one to go and swap them. Of course, the rest of the books will have to move a little because the books vary in thickness, but that won’t matter in this software implementation, so just ignore that little issue.
Leaving the largest books where they are, continue to scan the remaining books for the tallest among them, each time swapping it with the book that is just to the right of the already sorted books at the left end of the shelf. Each time you scan the shelf, you are selecting the next book in order and moving it into
its final sorted position. That’s why this algorithm is called selection sort.
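The bookshelf procedure above can be sketched in a few lines (my own version, sorting ascending rather than tallest-first for simplicity): each pass selects the next value in order and swaps it into its final position, so at most N - 1 swaps are ever performed.

```java
import java.util.Arrays;

// Selection sort: each pass selects the smallest remaining value and
// swaps it into its final position. Comparisons are O(N^2), but there
// is at most one swap per pass, unlike bubble sort's many swaps.
public class SelectionSort {
    public static void sort(int[] values) {
        for (int i = 0; i < values.length - 1; ++i) {
            int smallest = i;
            for (int j = i + 1; j < values.length; ++j) {
                if (values[j] < values[smallest]) {
                    smallest = j;
                }
            }
            int tmp = values[i];
            values[i] = values[smallest];
            values[smallest] = tmp;
        }
    }

    public static void main(String[] args) {
        int[] heights = {30, 10, 50, 20, 40};
        sort(heights);
        System.out.println(Arrays.toString(heights)); // [10, 20, 30, 40, 50]
    }
}
```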
11) Understanding Insertion Sort
Insertion sort is the algorithm very commonly used by people playing cards to sort the hand they have been dealt. Imagine you have been given five cards face down and you want to sort them according to the following rules:
Separate into suits in the following order: spades, clubs, diamonds, and hearts
Within each suit, sort in ascending order: Ace, 2, 3, . . . , 9, 10, jack, queen, king
I really liked the way authors implemented Insertion Sort using Linked List implementation.
Below is the tricky sort() implementation
/**
 * Sorts a list using the insertion sort algorithm.
 *
 * @param list The list to sort.
 * @return a new List containing the objects from the original List in sorted order.
 */
public List sort(List list) {
    assert list != null : "list cannot be null";
    final List result = new LinkedList();
    sort(list, result);
    return result;
}

void sort(List list, final List result) {
    assert list != null : "list can't be null";
    assert result != null : "result can't be null";
    Iterator it = list.iterator();
    for (it.first(); !it.isDone(); it.next()) {
        // Scan backwards from the end of the result for the first element
        // not larger than the current one; insert just after it.
        int slot = result.size();
        while (slot > 0) {
            if (_comparator.compare(it.current(), result.get(slot - 1)) >= 0) {
                break;
            }
            --slot;
        }
        result.insert(slot, it.current());
    }
}
12) A priority queue is a queue that supports accessing items from the largest to the smallest. The point is that the priority queue has a mechanism to determine which item is the largest (a comparator) and provides access to the largest item in the queue at any given time.
13) A heap is simply a binary tree structure in which each element is larger than (or equal to) its two children. This is known as the heap condition.
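The heap condition can be seen in a small array-backed max-heap (my own sketch, not the book's implementation): the value at index i is at least as large as its children at 2i + 1 and 2i + 2, and `add()` restores the condition by "sifting up".

```java
import java.util.ArrayList;
import java.util.List;

// A minimal array-backed max-heap. peek() returns the largest value in
// O(1); add() is O(log N) because the sift-up walks at most one root-to-
// leaf path.
public class MaxHeap {
    private final List<Integer> heap = new ArrayList<>();

    public void add(int value) {
        heap.add(value);
        int child = heap.size() - 1;
        while (child > 0) {
            int parent = (child - 1) / 2;
            if (heap.get(parent) >= heap.get(child)) break; // heap condition holds
            int tmp = heap.get(parent);                     // otherwise swap up
            heap.set(parent, heap.get(child));
            heap.set(child, tmp);
            child = parent;
        }
    }

    public int peek() { return heap.get(0); }

    public static void main(String[] args) {
        MaxHeap h = new MaxHeap();
        h.add(4); h.add(9); h.add(1); h.add(7);
        System.out.println(h.peek()); // 9
    }
}
```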
14) Binary searching is a technique for locating items in a sorted list. A binary search takes advantage of certain characteristics of sorted lists that a simple linear search doesn’t. Indeed, whereas a brute-force linear search runs in O(N) time, a binary search runs in O(log N), assuming the data to be searched is sorted.
The actual performance of binary searching is excellent as long as you are using a data structure (such as an array list) that supports fast, indexed-based lookup. When using a linked list, for example, although the number of comparisons remains the same as for an array list, the continual traversal of the list to find the next item for comparison imposes considerable time penalties.
15) Binary searching uses a divide-and-conquer approach to locating a search key and in doing so achieves an average number of comparisons that approaches O(log N).
Binary searching can be achieved efficiently using either recursion or iteration.
Binary searching works best when used with a data structure that supports fast, index-based lookup.
Binary insertion builds on binary searching to add items to a list while maintaining the sort order, using O(log N!) comparisons in total (equivalent to O(N log N)).
Binary insertion proves to be very efficient, performing as well as—and arguably better than— some of the sorting algorithms.
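The divide-and-conquer idea is easiest to see in code. Here is an iterative sketch of my own: every comparison halves the portion of the array still in play, which is where the O(log N) bound comes from.

```java
// Iterative binary search over a sorted array.
public class BinarySearch {
    /** Returns the index of key in sorted, or -1 if key is not present. */
    public static int search(int[] sorted, int key) {
        int low = 0;
        int high = sorted.length - 1;
        while (low <= high) {
            int mid = (low + high) >>> 1; // unsigned shift avoids int overflow
            if (sorted[mid] == key) return mid;
            if (sorted[mid] < key) {
                low = mid + 1;  // discard the lower half
            } else {
                high = mid - 1; // discard the upper half
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] data = {2, 5, 8, 12, 16, 23, 38};
        System.out.println(search(data, 23)); // 5
        System.out.println(search(data, 7));  // -1
    }
}
```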
16) For a balanced binary search tree, the height of the tree is O(log N).
17) Hashing is a technique that promises to achieve O(1) data lookup. This doesn’t mean that only one comparison will be made, but rather that the number of comparisons will remain the same, no matter how large the data set. Compare this with the O(N) search time of simple linked lists or even O(log N) for binary search trees, and hashing starts to look very attractive.
18) An example of a fairly simple yet effective hashing algorithm is the one used in the String class of the JDK. The algorithm itself is grounded in very sound mathematics.
The documentation says:
/**
 * Returns a hash code for this string. The hash code for a
 * String object is computed as
 *
 *     s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
 *
 * using int arithmetic, where s[i] is the ith character of the
 * string, n is the length of the string, and ^ indicates
 * exponentiation. (The hash value of the empty string is zero.)
 *
 * @return a hash code value for this object.
 */
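The documented polynomial can be computed by hand and checked against the JDK's own `String.hashCode()` (a small sketch of my own). Horner's rule turns the sum of powers of 31 into a single left-to-right pass.

```java
// Computes s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] via Horner's
// rule, using the same int arithmetic as the JDK.
public class StringHash {
    public static int hash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); ++i) {
            h = 31 * h + s.charAt(i); // multiply-then-add per character
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(hash("algorithms") == "algorithms".hashCode()); // true
        System.out.println(hash("")); // 0, as the documentation states
    }
}
```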
19) One way to decide when to resize the hash table is to monitor the load factor—the ratio of stored values to available buckets. The idea is to set some threshold load factor that, when reached, will trigger a resize. Again, a balance needs to be found between space and performance efficiency. Making the load factor too low leads to a lot of wasted space. If it is too high, then the size of the buckets increases, resulting in more collisions. A load factor of around 0.75 or 75% of capacity is a fairly good trade-off between space and time.
20) I liked the author's Hashtable implementation that uses buckets. Below is the implementation.
/**
 * A {@link Hashtable} that uses buckets.
 */
public class BucketingHashtable implements Hashtable {
    /** The threshold load factor after which resizing occurs. */
    private final float _loadFactor;

    /** The buckets for holding values. */
    private List[] _buckets;

    /** The number of values in the table. */
    private int _size;

    /**
     * Constructor.
     *
     * @param initialCapacity The initial number of buckets.
     * @param loadFactor The threshold load factor after which resizing occurs.
     */
    public BucketingHashtable(int initialCapacity, float loadFactor) {
        assert initialCapacity > 0 : "initialCapacity can't be < 1";
        assert loadFactor > 0 : "loadFactor can't be <= 0";
        _loadFactor = loadFactor;
        _buckets = new List[initialCapacity];
    }

    public void add(Object value) {
        List bucket = bucketFor(value);
        if (!bucket.contains(value)) {
            bucket.add(value);
            ++_size;
            maintainLoad();
        }
    }

    public boolean contains(Object value) {
        List bucket = _buckets[bucketIndexFor(value)];
        return bucket != null && bucket.contains(value);
    }

    public int size() {
        return _size;
    }

    /**
     * Obtains a bucket for a specified value.
     *
     * @param value The value for which a bucket is required.
     * @return The bucket.
     */
    private List bucketFor(Object value) {
        int bucketIndex = bucketIndexFor(value);
        List bucket = _buckets[bucketIndex];
        if (bucket == null) {
            bucket = new LinkedList();
            _buckets[bucketIndex] = bucket;
        }
        return bucket;
    }

    /**
     * Obtains the index to the bucket appropriate for a specified value.
     *
     * @param value The value for which the index is desired.
     * @return The bucket index.
     */
    private int bucketIndexFor(Object value) {
        assert value != null : "value can't be null";
        return Math.abs(value.hashCode() % _buckets.length);
    }

    /**
     * Resizes the table to satisfy the load factor requirements.
     */
    private void maintainLoad() {
        if (loadFactorExceeded()) {
            resize();
        }
    }

    /**
     * Determines if the load factor threshold has been exceeded.
     *
     * @return <code>true</code> if the threshold has been exceeded; otherwise <code>false</code>.
     */
    private boolean loadFactorExceeded() {
        return size() > _buckets.length * _loadFactor;
    }

    /**
     * Re-sizes the table.
     */
    private void resize() {
        BucketingHashtable copy = new BucketingHashtable(_buckets.length * 2, _loadFactor);
        for (int i = 0; i < _buckets.length; ++i) {
            if (_buckets[i] != null) {
                copy.addAll(_buckets[i].iterator());
            }
        }
        _buckets = copy._buckets;
    }

    /**
     * Adds all values from a bucket into the table.
     *
     * @param values The values.
     */
    private void addAll(Iterator values) {
        assert values != null : "values can't be null";
        for (values.first(); !values.isDone(); values.next()) {
            add(values.current());
        }
    }
}
21) A perfect hash function is one that causes no collisions; however, this is hard to achieve. Increasing the table size can reduce the number of collisions at the expense of wasted memory, as can using a prime number for the table size. Under heavy collisions, linear probing degenerates into a linear scan, much like searching a linked list.
22) A set is a pool of distinct, unordered values. The intersection of two sets is another set that contains only those values that are common to both, again remembering that a set can contain no duplicate values. The union of two sets is another set containing all the values from both, remembering of course that a set has no duplicates.
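Intersection and union can be demonstrated with the JDK's own `TreeSet` (a sketch of mine, not the book's set implementation): `retainAll()` keeps only the common values, and `addAll()` merges with duplicates silently dropped, because a set holds distinct values.

```java
import java.util.Set;
import java.util.TreeSet;

// Set intersection and union using TreeSet, which also gives the
// predictable (sorted) iteration order mentioned below.
public class SetOperations {
    public static Set<Integer> intersection(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.retainAll(b); // keep only values also in b
        return result;
    }

    public static Set<Integer> union(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.addAll(b); // duplicates are dropped automatically
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> a = new TreeSet<>(Set.of(1, 2, 3, 4));
        Set<Integer> b = new TreeSet<>(Set.of(3, 4, 5));
        System.out.println(intersection(a, b)); // [3, 4]
        System.out.println(union(a, b));        // [1, 2, 3, 4, 5]
    }
}
```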
23) Binary-search-tree-based sets provide O(log N) performance as well as predictable iteration order.
24) Maps store values associated with a key. Maps are also known as associative arrays, dictionaries, indexes, and lookup tables. In general, a map makes no guarantee of the iteration order. Three common map implementations are the list map, the hash map, and the tree map. Tree maps, being based on binary search trees, provide O(log N) performance as well as predictable iteration order.
25) Ternary search trees are specialized structures for storing and retrieving strings. However, unlike a binary search tree, a ternary search tree doesn’t hold the entire value in each node. Instead, a node holds a single letter from a word, together with three references (hence ternary): one to letters that sort lower, one to letters that sort higher, and one to a subtree of nodes containing the letters that follow it in the word.
Even in such a simple tree we can see that a ternary search tree performs far fewer individual character comparisons than an equivalent binary search tree. This is because words that share a common prefix are compressed: having traversed those common letters, you need never compare them again. Compare this with a binary search tree, in which you continually compare all the letters from each node.
In addition to being efficient at finding positive results, ternary search trees particularly excel at quickly discarding words that don’t exist in the tree. Whereas a binary search tree continues searching until it runs out of nodes at the leaves, a ternary search tree will terminate as soon as no matching prefix is found.
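A minimal ternary search tree makes the three-way branching concrete. This is my own sketch, not the book's code: each node holds one character and three links, and a failed prefix lookup falls off the tree early, which is the fast-rejection property described above.

```java
// A minimal ternary search tree supporting add() and contains().
public class TernarySearchTree {
    private static class Node {
        final char ch;
        Node smaller, larger, follow; // the three references
        boolean endOfWord;
        Node(char ch) { this.ch = ch; }
    }

    private Node root;

    public void add(String word) {
        root = add(root, word, 0);
    }

    private Node add(Node node, String word, int i) {
        char c = word.charAt(i);
        if (node == null) node = new Node(c);
        if (c < node.ch) node.smaller = add(node.smaller, word, i);
        else if (c > node.ch) node.larger = add(node.larger, word, i);
        else if (i < word.length() - 1) node.follow = add(node.follow, word, i + 1);
        else node.endOfWord = true;
        return node;
    }

    public boolean contains(String word) {
        Node node = root;
        int i = 0;
        while (node != null) {
            char c = word.charAt(i);
            if (c < node.ch) node = node.smaller;
            else if (c > node.ch) node = node.larger;
            else if (i == word.length() - 1) return node.endOfWord;
            else { node = node.follow; ++i; }
        }
        return false; // no matching prefix: the search terminates early
    }

    public static void main(String[] args) {
        TernarySearchTree tst = new TernarySearchTree();
        tst.add("cup"); tst.add("cut"); tst.add("cat");
        System.out.println(tst.contains("cut")); // true
        System.out.println(tst.contains("cu"));  // false (a prefix only)
    }
}
```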
26) While searching a balanced ternary search tree runs in O(N log M) comparisons, searching an unbalanced tree requires O(NM) comparisons. One rather novel way in which ternary search trees can be used is for crossword-style pattern matching: finding all the words that match a given pattern.
27) On string searching implementations:
A Brute-Force Algorithm
The simplest and most obvious solution is to perform a brute-force scan through the text. This algorithm is quite widely used and actually performs pretty well in most cases. It is also very easy to describe and code.
The brute-force algorithm is very straightforward and can thus be defined in a few simple steps. Imagine overlaying the text with the pattern, starting from the left-hand side and continuing to slide the pattern right one character until a match is found:
a. Start at the first (leftmost) character in the text.
b. Compare, from left-to-right, each character in the pattern to those in the text.
If all of the characters are the same, you have found a match.
Otherwise, if you have reached the end of the text, there can be no more matches, and you are done.
If neither of the preceding results occur, move the pattern along one character to the right and repeat from step b.
Here is one implementation of the brute-force algorithm:
public interface StringSearcher {
    public StringMatch search(CharSequence text, int from);
}

public class BruteForceStringSearcher implements StringSearcher {
    private final CharSequence _pattern;

    public BruteForceStringSearcher(CharSequence pattern) {
        assert pattern != null : "pattern can't be null";
        assert pattern.length() > 0 : "pattern can't be empty";
        _pattern = pattern;
    }

    public StringMatch search(CharSequence text, int from) {
        assert text != null : "text can't be null";
        assert from >= 0 : "from can't be < 0";
        int s = from;
        while (s <= text.length() - _pattern.length()) {
            int i = 0;
            while (i < _pattern.length() && _pattern.charAt(i) == text.charAt(s + i)) {
                ++i;
            }
            if (i == _pattern.length()) {
                return new StringMatch(_pattern, text, s);
            }
            ++s;
        }
        return null;
    }
}
public class BruteForceStringSearcherTest {
    public static void main(String[] args) {
        BruteForceStringSearcher searcher = new BruteForceStringSearcher("ring");
        StringMatch match = searcher.search("ttring", 0);
        System.out.println("Text: " + match.getText());
        System.out.println("Pattern: " + match.getPattern());
        System.out.println("Index: " + match.getIndex());
    }
}
28) The Boyer-Moore Algorithm
Although the brute-force approach works fairly well, you have seen that it is far from optimal. Even in the average case, there are numerous false starts and partial matches. However, with a few simple enhancements, you can do much better.
Two men—R. S. Boyer and J. S. Moore—came up with an algorithm that has become the basis for some of the fastest string searching algorithms currently available.
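The full Boyer-Moore algorithm uses two shift heuristics; the sketch below (my own, not the book's code) is the simplified Boyer-Moore-Horspool variant, which keeps only the bad-character shift. It still conveys the key idea: compare from the right end of the pattern and, on a mismatch, skip ahead by more than one position where possible.

```java
// Boyer-Moore-Horspool: precompute, for each character, how far the
// pattern may shift when that character ends the current window.
// Assumes 8-bit characters for the shift table.
public class HorspoolSearcher {
    private final String pattern;
    private final int[] shift = new int[256];

    public HorspoolSearcher(String pattern) {
        this.pattern = pattern;
        int m = pattern.length();
        for (int i = 0; i < shift.length; ++i) {
            shift[i] = m; // characters not in the pattern: skip the full length
        }
        for (int i = 0; i < m - 1; ++i) {
            shift[pattern.charAt(i)] = m - 1 - i;
        }
    }

    /** Returns the index of the first match at or after from, or -1. */
    public int search(String text, int from) {
        int m = pattern.length();
        int s = from;
        while (s <= text.length() - m) {
            int i = m - 1;
            while (i >= 0 && pattern.charAt(i) == text.charAt(s + i)) {
                --i; // compare right to left
            }
            if (i < 0) return s;
            s += shift[text.charAt(s + m - 1)]; // bad-character shift
        }
        return -1;
    }

    public static void main(String[] args) {
        HorspoolSearcher searcher = new HorspoolSearcher("ring");
        System.out.println(searcher.search("a string searching example", 0)); // 4
    }
}
```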
29) Optimization is an important part of software development, but not as important as you might think. We recommend that you take time to accumulate an awareness of the types of issues that affect performance in your particular programming environment, and keep them in mind as you code. For example, using a StringBuffer to build a long character sequence is preferable to multiple concatenations of String objects in Java, so you should do that as a matter of course.
Profiling is the process of measuring the behavior of a program. Java lends itself to profiling because support for it is built right into the virtual machine, as you’ll see later in this chapter. Profiling other languages varies in its difficulty but is still a very popular technique. Three major areas are measured when profiling a program: CPU usage, memory usage, and concurrency, or threading behavior.
A profiler will report statistics such as the following:
a) How many times a method was called
b) How much CPU time was consumed in a given method
c) How much CPU time was consumed by a method and all methods called by it
d) What proportion of the running time was spent in a particular method
These statistics enable you to identify which parts of your code will benefit from some optimization.Similarly for memory usage, a profiler will gather statistics on overall memory usage, object creation,and garbage collection, and provide you with information such as the following:
a) How many objects of each class were instantiated
b) How many instances of each class were garbage collected
c) How much memory was allocated to the virtual machine by the operating system at any given time (the heap size)
d) How much of the heap was free and how much was in use at a given time
30) The standard Sun Java virtual machine supports basic profiling out of the box. To determine whether your Java environment has this support, try the following command:
java -Xrunhprof:help
31) The free Java Memory Profiler provides a graphical view of the memory usage of your application, allowing you to quickly find the problem areas that you need to address.
More details can be found at the site below:
http://www.khelekore.org/jmp
About the Authors
Simon Harris started writing animated sprites on a Commodore 64 in primary school. After a break of many years, he taught himself 80x86 and IBM System/370 assembler and started working professionally. Since then he has moved from assembler to C, C++, and, of course, Java. He believes a fundamental understanding and appreciation of algorithms is essential to developing good software; and since starting his own company, RedHill Consulting, he has managed to make a living discussing and demonstrating software development practices and techniques to anyone who will listen.
In his more than 15 years of development experience, James Ross has ranged from building packaged products to large enterprise systems to research into compilers and languages. In recent years, he has become a code quality fanatic and agile methods specialist, particularly with test-driven development. He works as a consultant for ThoughtWorks, the world’s leading agile software development company. He is currently leading the development of a large J2EE project in the insurance industry in Melbourne, Australia. He lives with his wife and family in Melbourne.
Hope you enjoy reading this book.