CSC148H1: November 2013

Sunday, November 24, 2013

This week we have covered 3 different topics: equality, hiding attributes and reduce. The topic this week which particularly interests me was the build in reduce function. In a typical computer science exercise, we may be asked to find the maximum element within a nested list. With the build in function max(), this is a very easy exercise. However, many times in practice, we may not have tasks as nice as this. For instance, we maybe asked to pick, out of a nested list of string, the string that contains the most vowels. Although still simple, it is a more complicated task comparing to the maximum problem. And in this case, we may utilize the reduce function in other to carry out the task. The code for this task is posted is the following:

def most_vowels1(x,y):
    return max([[sum([x.count(i) for i in 'aeiou']),x], [sum([y.count(i) for i in 'aeiou']),y]])[1]
def most_vowels(L:"iterable"):
    return None if L == [] else reduce(most_vowels1, [most_vowels(i) if isinstance(i, list) else i for i in L])

In my personal perspective, I think using the reduce function in conjunction with recursion can truly yield elegant solutions to a wide range of problems. In the above code, recursion within a list comprehension was used in other to deal with nested lists. The most_vowels1 is a helper function which helps to find the string with the highest occurrence of vowels out of only two inputs. The most vowels function implements the reduce function and greatly simplifies the code due to the built in algorithm. This approach can usually simplify a problem due to the fact that what is needed to be performed and compared among ALL elements of a nested list can be broken down to what is needed to be performed to only 2 elements. This make the coding much easier and the recursive nature of the final input much easier to handle.
Another thing to note is that the approach above in solving the vowel problem is rather generic. Compare the above code with the following code which sums all elements within a list:

def my_add(x,y):
    return x+y
def my_sum1(L:"iterable"):
    return 0 if L == [] else reduce(my_add,[my_sum1(i) if isinstance(i,list)  else i for i in L])

Notice the rather remarkable similarity to the first block of code. Especially in the second function, the syntax is almost exactly the same. This emphasizes the generality of the reduce function and the fact that it may be used for any arbitrary function which defined with respect to two inputs of the same type and returns data with the same type. Therefore , in other to use the reduce function to the fullest extend, this is something good to keep in mind.

Monday, November 18, 2013

November 18 2013

Today I will be talking about last week's test since it was the only event that occurred that week. In general, I found it to be a fair test. Since all of the questions are of the same kind, the test turned out to be surprisingly short. I have achieved the desired results with what I have coded during the test. However, my methods are not the most efficient there is. For instance, in the last question, instead of directly creating the linked list, I created a helper function that gives out a list and the main function converts the list into a linked list. This method works well if the amount of data which the program is handling is small. However, if we have a large tree and that the amount of data we have to deal with is large, this program will have memory overflow error. This is due to the fact that although there is no upper limit in terms of the number of elements may be contained within each list, there is a max in the storage capacity of a memory slot. Therefore, with only a small or average sized tree, my method will work well, otherwise it will cause exceptions.

Tuesday, November 12, 2013

Since the university is closed on Monday and Tuesday, today's post will focus on last week's material.
Last week's lecture covered mostly on big-O analysis. One thing that I have learned is that although big-O analysis gives a rough idea about a specific algorithm's performance, it cannot alone determine an algorithm's efficiency. For instance, the selection sort and the bubble sort are both O(n^2) in terms of the number of comparisons made through the passes. But due to the nature of the bubble sort, many more exchanges occur during each pass comparing to the selection sort. Taking this into consideration, the selection sort is a much more efficient algorithm although it does not appear so from the big-O analysis. Another thing that I have learned is that when performing tasks such as searching through a list, many times the input format of the list does not affect the results of a big-O analysis. This means that the worst case efficiency for many cases is the same. For example, when searching through a unsorted list or searching through a sorted list, the worst case scenario for these two input formats are the same. In both cases, a pass through a list is required and n comparisons are required in total. Therefore both search would yield O(n). If the best cases are considered, the sorted input would be more efficient since if the item is smaller than the first item in the list no pass is required. Thus many times if want to evaluate the efficiency of an algorithm, not only that we need to consider the worst case, the best cases also need to be considered. In order to choose the ideal method, one must evaluate the form of input, not just at one instance in time but also over an entire period of time. The average of the form of the input may then be obtained and could be used to make this decision.
In terms of the assignment, I have found the matching part of the regular expression to be rather challenging. The concept of regular expression deals with the combinatorial explosion of possibilities when the length of the string is enlarged. During the assignment, I have found out that the regular expression of a string is by no means unique. Therefore, when matching an regular expression to a string , decomposing the target string itself may result in ambiguities depending on the method that is used to decompose the string. Thus a better approach would be to start from the regular expression itself and generate the possibilities of the potential matches. When it is finished, the target string can be searched within the generated list of possibilities. Although inefficient for long strings, this method always work since it decodes the regular expression itself. With this approach, I was able to math the basic regular expression with their corresponding targets. However, due to time constraints, I was unable to finish the entire 2nd part. If time permits in the future, I will try to finish the assignment and hopefully come up with a more efficient approach to this problem.

My friend in CSC148 have talked about the Big-Theta Notation on his blog which can supplement my info that I have provided. Here is the link to his blog. http://athameemcsc148.wordpress.com/

Sunday, November 3, 2013

Nov 3rd 2013

This week's material is an extension of the material from the previous week. Using the concept from the previous week, we have learned a new ADT: binary search tree. By implementing binary search tree ADT, searching through can be much more efficient. This is due to the fact that depending on the specific data value with respect to the root, certain parts of the tree can be ignored. While this seems to be rather trivial as a student, but its effectiveness can be certainly appreciated when it comes to large data sets such as trees that exist in operating systems.

One thing that I really enjoyed over the past week is the introduction of the big O notation. I have personally really enjoyed this material due to my Mathematics background and that it really gives me a sneak peak at the theoretical aspects of computer science. The big O notation is a concept that is widely applied throughout mathematics. In essence, it captures the limiting behaviour of a specific function.
Formally speaking: f (x) -> O ( g (x) ) as x -> infinity <=> lim x->infinity sup|f(x)/g(x)| < infinity
where the sup notation denotes the least upper bound of a specific set. and | | denotes not necessarily the absolute value but the norm of R1. From this more concrete definition, we can have a better understanding of why T1(n)=(0.3365n2 + 0.17n + 0.32) s and T2(n)=(0.47n2 + 0.08n) s are both O ( n^2). By taking the limit of Ti(n)/n^2 as x -> infinity i = 1, 2 , by L'Hospital's Rule, T1(n)/n^2=0.3365 and T2(n) = 0.47. Since both of which are real numbers, we can say that n^2 describes the limiting behaviour of both functions within a certain constant and therefore both algorithms' efficiency are O(n^2).
In terms of the assignment, I have encountered a challenge this week. The challenge is how to create a recursive data type from a know linear data type such as a string. The second assignment requires us to initialize the tree expression for a regular expression string. But the creation of a tree requires a data type that is recursive in nature. Therefore, the approach that I have decided to take is that first, recover the tree list representation of the regular expression string. secondly, use a recursive algorithm to link the data as nodes. As it turns out, by using a recursive algorithm to peel off the brackets in the string one by one, one can really recover the treelist representation of a regex. From there, it should be very straight forward to link the lists together using the exsting definitions of nodes provided by the course document.