An additional opportunity to be hopelessly wrong

The Wikipedia Entry on BST

leave a comment »

I am looking at the page of June 15 2008, see BST. I’ll comment on the python code there. I am not sufficiently familiar with wikis and the social norms surrounding them to feel comfortable to edit directly. I will put a link to this on the Talk page once I am done though.

The first most obvious point is that there is a going back and forth between talking about trees where there are only keys, and trees where the keys are (only) ways of finding values associated with these keys. For the code this makes not much difference, but reading it like this can be a bit confusing. Especially since the words key and value do not have a fixed meaning throughout the article.

The search_binary_tree function written there has a certain asymmetry in if/else use. The following seems cleaner to me (of course not essentially different).

def search_binary_tree(node, key):
     if node is None:
         return None  # key not found
     if key < node.key:
         return search_binary_tree(node.left, key)
     if key > node.key:
         return search_binary_tree(node.right, key)
     # key is equal to node key
     return node.value

The next function binaray_tree_insert does not correspond to the explanation. In the explanation of BST different nodes with the same key are allowed. In the implementation given if you give a new value associated to a key, the old value will just be overwritten. Since the right subtree is to contain the values associated to greater or equal keys, the following would be more correct. Note that all that was needed was to remove two lines of code.

def binary_tree_insert(node, key, value):
     if node is None:
         return TreeNode(None, key, value, None)
     if key < node.key:
         return TreeNode(binary_tree_insert(node.left, key, value), node.key, node.value, node.right)
         return TreeNode(node.left, node.key, node.value, binary_tree_insert(node.right, key, value))

Then we get to deletion. Here in the python code suddenly the structure is enriched with a parent pointer. Also there are some print statements that are out of place. The parent pointer always needs to point to a real node there are many uses of self.parent.leftChild (rightChild) without checking if parent != None (which I think would be the reasonable choice for the root).

In findSuccessor the indentation of the return statement is wrong. More interesting is the interpretation by this code of successor in the case this item is the largest in the tree. Using the notation for trees in the post on Implementing a BST run this to the tree (1()(2()(3()(4()(5()()))))) while at the node corresponding to 5. Then the code iterates up the tree to fail at the root.

The function spliceOut only needs a correct interpretation. I am unfamiliar with the term “splice out” so this might very well be the obvious meaning. What it does is remove self from the tree. Its precondition is that self.leftChild == None or self.rightChild == None. One of the two needs to be trivial, or one of the subtrees will be removed while it might contain many nodes. Also note that the elif is equivalent to else.

For the function binary_tree_delete the first observation is that when it tries to call itself (I think that is the intention) it calls a member function delete_key instead. And then we see that the functions above are only called on input on which they work. spliceOut is only called on nodes found by findSuccessor which when it works finds nodes satisfying the precondition, and the function findSuccessor is only called on nodes having both children, on which it works fine (after fixing the indent on the return statement).

Seems to me in example code like this it might be better to avoid such “delicate” functions, it all works, but the preconditions are never mentioned. (it was fun figuring it out though)

The next bit of code I worked on understanding was the “Example of a Binary Search Tree in Python”. I do not understand what self.l is supposed to denote. It occurs twice in the code, once in __init__, one in add. The add function returns -1 if the key already exists (and in the code key and value are used as equivalents), this does not correspond to the idea in the intro that multiple nodes with the same key can exist.

Why is this not implemented as a recursive function? Same for search, also looks correct, but seems to be more naturally implemented as a recursive function.

Why does proceed have three arguments and only uses one? Also I do not have a good idea of what it is intending to do, but with my current understanding it requires delValue to not have both subtrees nontrivial. This is not true.

Lets run it on (12()(14(13)(15))), according to my understanding this deleteNode (14) will fail.

root.value != 14 so we enter the while True loop with self.p being the root. self.p.value < 14 then self.p.rchild = 14, so we execute self.p.rchild = self.proceed(self.p, self.p.rchild). So this call is equivalent (writing the trees as strings) to proceed( (12()(14(13)(15))), (12()(14(13)(15))), (14(13)(15))), and proceed will return the right subtree, which is (15). So the resultant tree is (12()(15)). Not the right result.

About these ads

Written by kasterma

June 15, 2008 at 6:45 pm

Posted in Uncategorized

Tagged with

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


Get every new post delivered to your Inbox.