CSC148H1: December 2013

Friday, December 6, 2013

Dec 6 2013

Well, this is the last post which I will be giving in this course, so I will really make an effort to write this well.
I have done a lot of practice during this week in terms of recursion. The focus of my study is mostly on the problems which I was unable to solve before midterm 2. Before this week, my Impression about recursion on trees is mostly based on the understanding of list concatenation. During this week, I have learned how to truly handle the combinatorial nature of trees by assigning variables to the recursive calls and perform functions on these calls. With this technique, I was able to solve many hard problems such as finding the max element in a Binary Tree and looking for a path to the minimum value. Here is a program which I have wrote that handles one of those problems

def max_val(BT):
    if not BT:
        return 0
    else:
        left_max = max_val(BT.left)
        right_max = max_val(BT.right)
    return max(BT.data, left_max, right_max)

As we can see, it does not involved any list comprehensions or concatenation but it is a very straight forward program that solves a problem that would otherwise be very difficult.

Now to sum up this course. In general, I have greatly benefitted from the material in this course. Especially the fact that although there is no text-book, there is a common theme that runs across the entire course. I loved the material about recursion and how it can be applied to solve a wide range of problems. Since this is the first computer science course ever in my life, I am rather satisfied with what I am able to achieve throughout this course. Although there are times which I find the course difficult, and there are times which I have to spend hours trying to solve a problem. But in return I can say that I have acquired a solid foundation in computer science. Comparing with the other courses which I have taken so far, this area is certainly a breath of fresh air.

Well that is pretty much it, I am certain that everyone have enjoyed this course as much as I did and I hope everyone have a successful exam on Tuesday.

Sunday, December 1, 2013

Regular Entry Dec 1 2013

This week's material is rather interesting due to the fact that it concerns a little bit on the hardware part of computer science. First, We have learned MapReduce, which is the practically the essence of the different sorting algorithms we have learned so far. As I have said in the assigned topics, the essence of O(n*logn) algorithms lies in the fact that they employ a recursive structure. This allows the partition of the domain of a particular function and then simultaneously perform computation to different partitions. The net result will be assembled together and the final result will be returned.

What I have described above is practically a mini version of MapReduce. MapReduce is a frame work for computing parallelizable problems across a large number of CPUs. The entire frame work consist of two steps. The Map step uses the master node and takes in the input data. The input problem is then split into sub problems and assigned to different sub nodes. The worker node may perform another separation and so on. The second step is the reducing step. This step consists of the collection of answers from different sub nodes and the combination of these answers to these outputs which will eventually yield the answer to the problem it originally intended to solve. The general theme of this framework is clearly very important in computer science due to the fact that a large portion of this course is practically recursion. And recursion is essentially the same concept on a smaller scale. The beauty of this concept is that it allows distributed simple problems to be solved while they are altogether too complicated to solve. As we can see from the sorting algorithms, a method that splits the original input and recursively apply the function are usually more efficient.

Another thing which we have learned is memorization. This is completely a new concept to me since it is similar to caching. The whole idea is that it may improve all the recursive functions I have programed so far by making the programs not repeating the calculations which have been performed already. This can further save time in real time since no calculations will be repeated, which makes a recursive function even more efficient.

Assigned Topics

This particular post will specifically focus on selection algorithms. I have previously touched upon the topic in the post regarding the big-O notation. This post will focus more on the methods which these algorithms employ and compare the advantages of each.

To begin, let us talk about the well known bubble sort. This particular sorting algorithm is a standard but an inefficient one. Each pass through a list makes one less comparison comparing to the previous one. Due to this nature, the bubble sort algorithm can be implemented using recursion. However due to the sheer number of switches which the algorithm has to make in the worst possible case, i.e the case which the list is completely reversed. This method is extremely inefficient, since the number of switches increases in a quadratic sense with respect to the length of the list.

The selection sort is an improvement of the bubble sort. It improves the bubble sort by selecting out only the largest element in the input and place it at the proper position each time it goes through a list. The number of comparisons which the method makes is the same as the bubble sort, hence yielding O(n^2) efficiency. However, it is much more efficient in terms of the number of switches which it makes during each pass of the element, thus saving us time and energy during the application of this method.

Now, let us shift our perspective a bit. Other than passing linearly through a list, we employ a divide and conquer strategy. To demonstrate this strategy, we look at the method of quicksort. The quicksort algorithm yields O(nlogn) efficiency by partitioning each list in 2 with respect to a pivot value and then recursively apply the same procedure to the subsequent lists generated. The division of the lists is that all items that are smaller than the pivot value goes to the left and all the values that are greater goes to the right. Same method will be repeated for the subsequent lists until the length of the list becomes 1. The correctness of this algorithm can be proven using induction. This particular method is much more efficient comparing to the previous one. This is due to the recursive nature of the algorithm allows the computer to solve multiple problems at the same time. This is a common theme in terms of the efficiency of different sorting algorithms. When a specific method allows us to only put one item in order, we are usually faced with two options. One is to actually go through the entire list sorting one at a time. This will usually yield a O(n^2) efficiency. However, if we are able to divide the list up and use recursion to simultaneously solve the problem on different partitions of the list.

To have something that goes along with this theme, let us look at merge sort. Similar to the quick sort, this algorithm yield a O(n*log n) efficiency. This is due to the same reason as above, there are long n number of ways to split a list in smaller lists of length 2. This method works by completely breaking down a large list into smaller lists of size one. The lists will be sorted with the neighbouring lists, and recursively, bigger lists will be sorted according to the same method and finally yields the sorted master list. Again, together with last algorithm, the beauty of this particular method lies with the beauty of recursion. Bigger problems are being divided down into smaller lists so that simultaneous application of the function to different partitions of the domain is allowed.

Let us again shift our attention a bit and talk about count sort. Count sort takes a slightly different approach comparing to the bubble sort. I put this sorting algorithm together with bubble sort due to their non recursive nature. Comparing to the bubble sort or insertion sort, this method is highly efficient for a non-recursive function. This is because this entire method is based on the key space of the input list. The bucket list is first generated during the first pass through the list. This list contains a range of possible value that is contained within the list. I say range of possible values due to the fact that in case of integers, we might want to generate a list that has values from the smallest to the largest value of the input. In case of the elements are a deck of cards, we might want a list of all possible representations of the cards. The second step of this method is counting the occurrence of each element in the bucket-list. The third step, which consist of only one pass through the bucket list, is to rebuild the sorted list from the bucket list. Since the bucket list will be produced sorted by default and it contains the number of occurrences of each element, the resulting list will be sorted. Note that the beauty of this sort is that really requires one pass through a list and thus the efficiency is linear O(n + k) with n being the number of elements in the input list and k being the number of elements in the bucket list. One disadvantage of this method is that it requires the initial generation of the bucket list, which is entirely dependent on the size of the key space. If the key space is large, then great cost will come with the application of this algorithm.