Tuesday 1 April 2014

Week 11 - Sorting and Efficiency

The idea of "sorting" as a commonly mentioned and ever-evolving topic in computer science has always struck me as interesting. What's interesting is that, unlike some functions and abstract data types, we encounter sorting in daily life. We sort things all the time. But how would you explain to someone exactly how you came up with a sorted set of data? That's harder to do.

There are actually many ways to explain it, because there are tons of sorting methods. The reason there are so many algorithms is that we always want the fastest run-time we can get. What's important to understand when comparing run-time analyses is that we do not really care about the run-time for small lists of items. We want to see how each algorithm performs as the number of items approaches infinity. Because of this, when looking at an algorithm's run-time equation, what matters is the term with the highest exponent; constants are not much of a concern. For example, an algorithm that takes 3n^2 + 5n + 2 steps is simply O(n^2). That's why we describe run-times using "big-O" notation.

One interesting thing about sorting that I didn't quite grasp at first is that some algorithms have different best-case and worst-case run-times and some don't. For example, selection sort will always have O(n^2) run-time. This is because selection sort looks through the entire remaining unsorted part for the biggest number and then moves that number to the back of the unsorted part on each pass, where it joins the sorted part. The list looks like [unsorted | sorted] as it moves along. So, even if you have a list of, say, [1, 2, 3, 4, 5, 6], selection sort will run through the list and find 6 as the biggest, then run through the remaining five items and find 5 as the biggest, and so on. Even though the list is already sorted, and you'd think that would be beneficial, selection sort still takes the same amount of time on a sorted list as it would on an unsorted one.
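
Just to convince myself, here is a rough sketch of the selection sort variant I described (moving the biggest item to the back on each pass); this is my own approximation, not anyone's official code:
def selection_sort(lst):
    # the slice lst[end:] is the sorted part: [unsorted | sorted]
    for end in range(len(lst), 1, -1):
        # scan the whole unsorted part for the biggest item
        biggest = 0
        for i in range(1, end):
            if lst[i] > lst[biggest]:
                biggest = i
        # move the biggest item to the back of the unsorted part
        lst[biggest], lst[end - 1] = lst[end - 1], lst[biggest]
Even on [1, 2, 3, 4, 5, 6] the inner loop still runs n - 1 times, then n - 2 times, and so on, which is exactly why it's O(n^2) no matter what the list looks like.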

Insertion sort, on the other hand, has a run-time that grows proportionally to the size of the list if the list is already sorted. Insertion sort works by keeping a sorted sub-list at the beginning of the list, unlike selection sort. The list looks like this as it moves along: [sorted | unsorted], with the sorted section growing and the unsorted section dwindling over time. It takes the first item after the sorted sub-list and places it where it belongs in the sub-list. When insertion sort sees [1, 2, 3, 4, 5, 6], then, it looks at 1, sees that it is in place, then it looks at 2 and sees that it is in place, then it looks at 3, and so on. So it just goes through checking each item. This O(n) run-time is the best-case scenario, while O(n^2) is its worst case, occurring when the list is in reverse order.
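
Again, a little sketch (my own rough version of insertion sort, not the lecture code) to show why a sorted list is the best case:
def insertion_sort(lst):
    # lst[:i] is the sorted part: [sorted | unsorted]
    for i in range(1, len(lst)):
        item = lst[i]
        j = i
        # walk the new item back until it sits where it belongs
        while j > 0 and lst[j - 1] > item:
            lst[j] = lst[j - 1]
            j = j - 1
        lst[j] = item
If lst is already sorted, the while loop never runs, so each item only costs a constant amount of work: O(n). If lst is in reverse order, every item has to be walked all the way back: O(n^2).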

Learning about sorting and efficiency also revealed to me why binary search trees are so useful. Searching one of these trees has O(log(n)) run-time in the best-case scenario. This is better than linear run-time, and it happens because each comparison splits the data into two parts based on whether the value is larger or smaller than the root. At first, I did not really understand why BSTs have O(n) run-time in the worst case. It then became apparent to me that this run-time occurs with a unary tree, when there is only one main branch. This makes sense since the tree basically resembles a linear list.
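
Here is a tiny sketch of searching a BST; the BSTNode class and bst_contains are just names I made up to illustrate the idea, not the code from class:
class BSTNode:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def bst_contains(node, value):
    # every comparison throws away one whole subtree,
    # so a balanced tree only takes about log(n) steps
    if node is None:
        return False
    if value == node.value:
        return True
    if value < node.value:
        return bst_contains(node.left, value)
    return bst_contains(node.right, value)
In the unary case, say every node only has a right child, nothing ever gets thrown away, so the search can end up visiting all n nodes. That's the O(n) worst case.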


Sunday 16 March 2014

Week 9 Post - Regexes and Assignment Two

Working on the second part of Assignment Two is frustrating.

For all four functions, coding for regular expressions of length 1 or length 2 is extremely easy to do. They don't have any permutations, they can only be one specific thing to be a valid regex, they're easy to match with strings, and building a tree from them is so easy it's silly.

Unfortunately, dealing with bar trees and dot trees is much more difficult. And, of course, it's recursive! Yay! It's not even understanding the recursion that gets me, though, because I can understand why it would be recursive and how the function needs to go over each child in the regex. It's structuring the code to make it recursive that gets me.

For example, part of the permutations function requires swapping values. This can be done by using:
# (this assumes s is a list of the regex's pieces; a plain string can't be changed in place)
j = s[i]
s[i] = s[i + 2]
s[i + 2] = j
Which is totally fine for regexes like '(1|0)' or '(0.e)', but then if you have a regex like '(1*|2*)' it doesn't really work, because the indexes get all funny. Or maybe it's not even about indexing. Maybe if is_regex returns True for the child, say 1*, then the new string becomes 1* and that gets swapped with 2*. The base case would be that if len(s) is 1 or 2, then nothing gets swapped at all.

Huh.

Another thing that I think would really, really, really help me with a lot of these functions is having a way to find the root of a regex. With a regex like '(((0|1)|(0.1)).(1*))' I can tell that the highest root is the '.' beside the (1*) child. How to tell my code this is something I'm unsure of. One way would be to use the brackets as an indicator, since that dot is the only operator sitting directly inside the outermost set of brackets.
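
A rough sketch of that bracket idea (completely untested against the assignment, just the concept): keep a counter of how deep inside brackets you are, and whatever operator sits at depth 1 is the root.
def root_index(s):
    # return the index of the '.' or '|' just inside the outermost
    # brackets, or None if there isn't one (e.g. '1' or '1*')
    depth = 0
    for i, ch in enumerate(s):
        if ch == '(':
            depth = depth + 1
        elif ch == ')':
            depth = depth - 1
        elif ch in '.|' and depth == 1:
            return i
    return None
With that index, s[1:i] would be the left child and s[i + 1:-1] the right child, which might make the recursive cases a lot less painful.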

Anyway, I'm going to be spending the next few days at the CSC Help Centre and office hours. Fun all around.

Saturday 8 March 2014

I've got the Exercise 3 blues...

Exercise 3 Part A is understandable. I get what I'm supposed to do, which is essentially the reverse of inorder and preorder. Unfortunately, I am at a roadblock with how to go about forming a tree.

It is obvious that the first piece of information to gather about the tree is the root. This can be done by getting the item at index 0 of preorder. So that should be the first element of the tree list. Next, you can find out which items are in the left subtree and which are in the right by finding that root's position in the inorder list.

For example, given preorder [10, 6, 8, 12, 15, 11] and inorder [8, 6, 12, 10, 11, 15], I can see that the root is 10 thanks to preorder. In inorder, it is clear that the left subtree has 6 as the root with 8 and 12 as children, and the right subtree has 15 as the root with 11 as its child. Thus, the resulting tree list would be:

[10, [6, [8, None, None], [12, None, None]], [15, [11, None, None], None]]

The order of numbers in this tree list is exactly the same as the order preorder gives them in. What is challenging is putting these numbers into tree list format, because a tree list's 2nd and 3rd elements are tree lists themselves unless they are None. And what about a tree with tons of children on one side? Something like [3, 8, 6, 12, 4, 10, 11, 15]? I am unsure of how to put that into tree list format.
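
Writing the idea out as a rough sketch (assuming the tree list format above and that no value appears twice) makes it feel a bit less impossible:
def build_tree(preorder, inorder):
    # returns a tree list [root, left, right], or None for an empty tree
    if not preorder:
        return None
    root = preorder[0]
    k = inorder.index(root)    # everything before index k is in the left subtree
    left = build_tree(preorder[1:1 + k], inorder[:k])
    right = build_tree(preorder[1 + k:], inorder[k + 1:])
    return [root, left, right]
On the example above this gives exactly the tree list I wrote out by hand, and because the recursion handles each side separately, a lopsided tree with tons of children on one side shouldn't need to be a special case at all.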

I feel like this exercise is going to take me as much time as the 2nd assignment, which is saddening.

Thursday 27 February 2014

Week 7 - Recursion

Recursion is a confusing little thing, but once you know how to use it, it's kind of beautiful. Here I will highlight a few of my favourite examples.

The first is simple. A function called 'sum_list' might say this:

return sum([sum_list(x) if isinstance(x, list) else x for x in L])
where L is the list being passed in.

When I first looked at this, I was completely confused and thought it was the stupidest thing on earth. The way I read it in my mind was "if x is a list then return the sum of the sum_list", which made absolutely no sense at first, because wasn't sum_list what we were trying to write in the first place? How was this function going to take the sum_list of something if sum_list was the function being created?
After some time I came to realize that the essential part of this function lies in the else statement. If x is a list, then sum_list will repeat itself, but now the items in that list are being looked at individually. So if an item inside the list isn't a list itself, sum_list just looks at the item and adds it to the sum. It's wonderful.
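
Dropping that one-liner into a full function, plus a quick call to convince myself it works:
def sum_list(L):
    # add up every number in L, no matter how deeply nested
    return sum([sum_list(x) if isinstance(x, list) else x for x in L])

sum_list([1, [2, 3], [4, [5]]])    # returns 15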

Another example of recursion is as follows (excuse the poor stylistic elements):
def remove3s(L: list):
    i = 0
    while i < len(L):
        if isinstance(L[i], int):
            if L[i] == 3:
                # deleting shifts everything over, so don't advance i
                del L[i]
                continue
        elif isinstance(L[i], list):
            # nested list: recurse into it
            remove3s(L[i])
        i = i + 1

This function strives to remove all 3s from a list L. It takes into consideration the possibility that L[I] might be another nested list, so it calls itself if that's the case. When it calls itself, it takes the nested list as the new L and goes through each element in the nested list.
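
A quick call to see it in action:
L = [3, [1, 3, [3]], 2, 3]
remove3s(L)
# L is now [[1, []], 2]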

Recursion is also useful in the class Tree:
def leaf_count(t):
    # t is a Tree; a Tree with no children is a leaf
    if len(t.children) == 0:
        return 1
    else:
        return sum(leaf_count(c) for c in t.children)

Something is a leaf of a Tree if it has no children. In Trees, though, it's not just as simple as there being parents and children, because children can have children who can have more children. In this case, because Trees nest in such a repetitive way, and each Tree contains more Trees, recursion is necessary. This function says that if a tree has no children, return 1, because this means there is one leaf. However, if a tree does have children, those children might be leaves themselves, so the function has to call itself on each child and count that child's leaves too.
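
To try it out, here's a bare-bones Tree (just enough of the class to have children; the real class from lecture has more to it):
class Tree:
    def __init__(self, value, children=None):
        self.value = value
        self.children = children if children else []

t = Tree(1, [Tree(2), Tree(3, [Tree(4), Tree(5)])])
leaf_count(t)    # returns 3, since the leaves are 2, 4 and 5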

And that pretty much sums up my little blurb on recursion!

Tuesday 18 February 2014

Reading Week

It's Reading Week (finally) so I have begun to study for the upcoming term test.

So far I have gone over the first three weeks of lectures, including articles and python examples. Mostly, my reaction has been, "Wow, why didn't I go over this sooner? Maybe then that exercise or that lab would not have seemed so perplexing". Basically, I did not practice enough when we were actually learning this stuff for the first time. Oops.

Some interesting things I have learned that probably everyone else already realized (don't laugh at me):
- creating a __str__ method in a class means you can call str(instance) and print(instance) and they will actually show what you want them to
- zip creates tuples pairing up the items at the same index in their respective sequences (I feel like this would have been useful in 108 instead of all those 'if s[i] == t[i]...')
- When creating __init__ methods for subclasses, you actually have to say 'Superclass.__init__(self)' (using the superclass's actual name) to get it to refer to the superclass. It won't automatically do this for you. (There's a little sketch of all three below.)
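
A little sketch combining all three; the Shape and Circle classes are just made-up examples, not anything from class:
class Shape:
    def __init__(self, name):
        self.name = name

class Circle(Shape):
    def __init__(self, name, radius):
        # the subclass has to call the superclass's __init__ itself
        Shape.__init__(self, name)
        self.radius = radius

    def __str__(self):
        # now str(c) and print(c) both show this
        return 'Circle ' + self.name + ' with radius ' + str(self.radius)

print(Circle('c1', 2.0))    # Circle c1 with radius 2.0
list(zip([1, 2, 3], ['a', 'b', 'c']))    # [(1, 'a'), (2, 'b'), (3, 'c')]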

Things I am still confused about:
- unittest. I looked at the Stack Tester example and noticed it's a little different from the unittest file my partner and I created in a lab a couple weeks ago.
- What it is that makes attributes of a class changeable by users and what makes them immutable
- How the __repr__ method calls itself recursively in a lot of the classes we created in class

I'm having one of those "the more I study the less I feel like I know" kind of days. Nevertheless, I plan on solving these issues by exploring the info on the python website and trying things out in Wing. Also, I'll probably head over to office hours when I'm back on campus.

Saturday 8 February 2014

Week 5 Post

This week we got some hints and explanations with regard to Assignment 1. We saw how the Tower of Hanoi game actually works when you execute it and how many steps you really need to get the task done. It was a big help because I would probably have been lost without that visual aid.
The lab was all about recursion. It was good in the sense that I was able to learn how to actually call functions within themselves. We were pretty much able to complete the lab on time too, which was a bonus.

Week 4 post

This week's lab was alright. Besides the really tedious math involved, it required use of subclasses and exceptions. I was happy about this because I remembered subclasses from 108, so there wasn't anything too new for me there. 
In class, we learned about recursion. At first I thought I got the list-comprehension part. The example given, which was about finding the sum total of nested lists, seemed easy enough to understand. The only part that really gave me a hard time was the idea that the nested lists all evaluate in one step. 
Later, though, I realized that recursion also involves the function calling itself, and that really confuses me still. I'm going to go over the examples from class again to understand it.