Blaginations

An additional opportunity to be hopelessly wrong

PHP functions to help you get out of Unicode hell.

function bpad ($instr) {
  // left-pad the binary string with zeros to a full 8 bits
  return str_pad($instr, 8, "0", STR_PAD_LEFT);
}

function dpad ($instr) {
  return substr("      ", 0, 3 - strlen($instr)) . $instr;
}

function show($instr) {
  for ($i = 0; $i < strlen($instr); $i++) {
    $let = $instr[$i];
    $no = ord($let);
    echo $i . " : " . $let . " / " . dpad($no) . " / " . bpad(decbin($no)) . " / " . dechex($no).  "\n";
  }
}

Here is an example of use in the interactive PHP shell:

php > $var = "T" . pack("cc", 195, 171) . "st";
php > show($var);
0 : T /  84 / 01010100 / 54
1 : � / 195 / 11000011 / c3
2 : � / 171 / 10101011 / ab
3 : s / 115 / 01110011 / 73
4 : t / 116 / 01110100 / 74
php > show (utf8_decode($var));
0 : T /  84 / 01010100 / 54
1 : � / 235 / 11101011 / eb
2 : s / 115 / 01110011 / 73
3 : t / 116 / 01110100 / 74
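
For comparison, here is a minimal Python sketch of the same byte inspection. It is not part of the PHP code above; the function name show is simply reused, and the string is written with an escape so the example does not depend on the file's encoding. It prints the index, decimal, binary, and hex value of each byte, first for the UTF-8 encoding and then for the Latin-1 encoding that utf8_decode produces.

```python
def show(data):
    """Return one line per byte: index, decimal, 8-bit binary, hex."""
    return ["%d : %3d / %s / %s" % (i, b, format(b, "08b"), format(b, "02x"))
            for i, b in enumerate(data)]

text = "T\u00ebst"                  # "Tëst", with ë written as an escape

utf8 = text.encode("utf-8")         # ë becomes the two bytes c3 ab
latin1 = text.encode("latin-1")     # ë becomes the single byte eb

for line in show(utf8):
    print(line)
print()
for line in show(latin1):
    print(line)
```

The two-byte versus one-byte difference for the same character is what shows up at positions 1 and 2 in the session above.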

Written by kasterma

September 12, 2011 at 3:30 pm

Posted in Uncategorized

Computing the height of a Tkinter.Label on given text.

This is a first attempt at code that does this. It doesn’t yet take care of the fact that a Label takes whitespace into account, but it does look like a good start. The real work is in lin.count (); the other code is just enough to exercise lin.count ().

Read the rest of this entry »

Written by kasterma

September 6, 2011 at 4:08 pm

Posted in Uncategorized

Some fun with the Stanford GraphBase.

A fascinating conclusion: in this graph of five-letter English words, where two words are connected when they differ in a single letter, there is a component of size 4493, even though the 5757 words make up only 5757/26^5 = 4.8e-4 of all possible five-letter strings. (The second-largest component has size 24, and the maximum degree is 25.)
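
To make the construction concrete, here is a small Python sketch of such a word graph and its connected components. It is an illustration only: it does not use the Stanford GraphBase, the function names are made up for this example, and the six-word list is a hypothetical toy input rather than the real 5757-word list.

```python
from collections import defaultdict

def differ_by_one(a, b):
    """True if two equal-length words differ in exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

def components(words):
    """Return the connected components of the word graph as a list of sets."""
    neighbours = defaultdict(set)
    for i, a in enumerate(words):
        for b in words[i + 1:]:
            if differ_by_one(a, b):
                neighbours[a].add(b)
                neighbours[b].add(a)

    seen, result = set(), []
    for start in words:
        if start in seen:
            continue
        # depth-first search from a word that has not been visited yet
        stack, comp = [start], set()
        while stack:
            w = stack.pop()
            if w in comp:
                continue
            comp.add(w)
            stack.extend(neighbours[w] - comp)
        seen |= comp
        result.append(comp)
    return result

# Hypothetical toy list, not the GraphBase words.
WORDS = ["words", "wards", "cards", "cords", "lords", "zebra"]
sizes = sorted((len(c) for c in components(WORDS)), reverse=True)
print(sizes)

# Fraction of all 26^5 five-letter strings covered by 5757 words.
density = 5757 / 26 ** 5
print("%.2e" % density)
```

The pairwise comparison is quadratic in the number of words, which is fine for a few thousand words but is a design shortcut, not an efficient method.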

Read the rest of this entry »

Written by kasterma

September 1, 2011 at 7:30 pm

Posted in Uncategorized

AIDE on Ubuntu.

I had a link to a linux.com weekend project lying around for quite a while, and I finally decided to play with AIDE (Advanced Intrusion Detection Environment). I set up a config file as suggested in the article, but then got stuck on an error that I did not find very descriptive. After some searching I found the thread “Aide will not work” on the Ubuntu forums. It didn’t give me the solution directly, but it did make me think in the right direction: I had forgotten to add database file specifications to the configuration file (see my post in the thread mentioned above). Now aide is doing its first run, and over the coming days I can work on figuring out what the “best” configuration is for me (I should at least get a good view of which parts of my system change regularly and which do not).

UPDATE: when running aide my system became rather slow, so I would recommend running it with nice: sudo nice -n 19 aide --init --config=aide.kasterma.conf

Written by kasterma

August 2, 2011 at 11:19 am

Posted in Uncategorized

My computer lying to me.

The following might show that I don’t use Windows much, but it took me a while to figure out that it was lying to me.

[Screenshot: Total LIE!]

The location is in fact C:\Users\bart\Documents\

Written by kasterma

June 3, 2011 at 6:47 am

Posted in Uncategorized

Why the graph is linear.

In my previous post I indicated that I didn’t quite know why the graph was so linear, as I expected something to happen when pages had to be swapped more. I think I have an answer: the Linux scheduler (a description) schedules threads for a certain time quantum. According to this question on a “random” webpage, that quantum is about 100 ms. A processor running at about 2.7 GHz gets about 270 million ticks per quantum, so with a page size of 4 KB it has roughly 66,000 ticks to deal with each byte on a memory page if it is to finish the page in its quantum … that seems plenty. That is, a thread that is swapped out and later swapped in again is unlikely to need the same page, and the linearity really is thread administration overhead.
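
The back-of-the-envelope estimate can be checked with a few lines of Python. This is only a sketch of the arithmetic; the constant names are made up here, and the values are the approximate ones quoted in the post (4 KB is taken as 4096 bytes). The result lands in the tens of thousands of ticks per byte.

```python
CLOCK_HZ = 2.7e9        # processor speed quoted in the post
QUANTUM_S = 0.100       # scheduler quantum quoted in the post (about 100 ms)
PAGE_BYTES = 4 * 1024   # one 4 KB memory page

cycles_per_quantum = CLOCK_HZ * QUANTUM_S
cycles_per_byte = cycles_per_quantum / PAGE_BYTES

print("%.0f cycles per quantum" % cycles_per_quantum)
print("%.0f cycles per byte of a single page" % cycles_per_byte)
```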

Written by kasterma

May 11, 2011 at 12:28 pm

Posted in Uncategorized

Parallelizing merge sort doesn’t help, or does it?

Since merge sort is not a computationally heavy algorithm (lots of moving data, but not much computing with it), you might not expect much improvement from parallelizing it. I did the experiment and found that on my 8-core processor, dividing the work into 4 to 8 threads does speed things up considerably (by about a factor of 3). I believe this is because it allows more efficient use of the memory bus and the cache.

Then adding more and more threads leads to significant slowdown.

[Graph: Thread Data]

The linear slowdown shown in the graph continues at least up to 750 threads. I was expecting an additional slowdown (an increase in the slope of the line) once there were so many threads that their pages of memory would get swapped out between their active periods (4 KB per page, 8192 KB cache size, and at least 5 pages per thread [for the 3 different parts of the array it is sorting, code, and stack] mean this would happen before 410 threads). I am not quite sure yet why that does not happen.
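
The 410-thread figure follows from the numbers in the parenthesis; a short Python sketch of that arithmetic (constant names invented for this example):

```python
CACHE_KB = 8192         # cache size from the post
PAGE_KB = 4             # page size from the post
PAGES_PER_THREAD = 5    # lower bound from the post: 3 array parts, code, stack

pages_in_cache = CACHE_KB // PAGE_KB
max_threads = pages_in_cache / PAGES_PER_THREAD

print(pages_in_cache, "pages fit in the cache")
print("room for about %.1f threads" % max_threads)
```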

Code is again available in my github repository.

Written by kasterma

May 11, 2011 at 11:58 am

Posted in Uncategorized

Timing n VS n*log(n).

with 2 comments

Standard merge sort is O(n*log(n)). Just playing around with some code, I did some timing up to the size at which all the work of sorting an array of integers still fits in memory. When I first drew the results I was convinced I had made a mistake, but this was really just because I had never quite realized how “linear” n * log(n) looks at these sizes. Here is the graph (red dots are the measured times, blue is a line fitted to the next-to-last measured value, green is n * log(n) fitted to the next-to-last measured value).

[Graph: merge timing data.]

Note that I used the next-to-last measured value, since the last value is a bit of an outlier (I was watching the system monitor, and for some reason the process switched to another core). All code is available in the github repository.
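
A quick way to see why n * log(n) looks so linear at these sizes is to compare it with a straight line through one reference point. The Python sketch below does that under assumptions that are not from the post: the sizes are arbitrary powers of two, the reference point N_REF is chosen for illustration, and the line is taken to pass through the origin. The ratio of curve to line is log(n) / log(N_REF), which changes very slowly.

```python
import math

N_REF = 2 ** 24   # assumed fitting point, chosen only for illustration

def ratio_to_line(n, n_ref=N_REF):
    """n*log2(n) divided by the line through the origin and (n_ref, n_ref*log2(n_ref))."""
    curve = n * math.log2(n)
    line = n * math.log2(n_ref)   # slope log2(n_ref), through the origin
    return curve / line

ratios = {n: ratio_to_line(n) for n in (2 ** 20, 2 ** 22, 2 ** 24)}
for n, r in ratios.items():
    print("n = %9d  curve/line = %.3f" % (n, r))
```

Over a sixteen-fold range of n the curve stays within about 17% of the line, which is hard to distinguish by eye on a plot with measurement noise.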

Written by kasterma

May 10, 2011 at 4:11 pm

Posted in Uncategorized

Using TRAMP on Windows; first steps.

I decided to give TRAMP a spin on Windows. To get it working initially I needed to take the following steps:

  • Download plink.
  • Add the location of plink to the Windows path (the page on how to set the Windows path has a description; it didn’t quite work for me, but it got me close enough).

After doing this I now have a local Emacs editing a remote file. It remains to be seen whether it is quick enough (although I am connected over a very fast network, so I do not expect problems). The big advantage is that the local Emacs has a much better default color scheme than the default of Emacs over PuTTY.

Written by kasterma

May 9, 2011 at 12:22 pm

Posted in Uncategorized

Wolfram|Alpha has an API

The other day I was playing around a little with Wolfram|Alpha and found that it has an API. I was looking at population histograms at the time, so my first use of it was to get a better picture of which countries have a histogram very different from that of the Netherlands (as a marker I took whether the 0-4 age group is the largest of the population). The result can be seen in the image Google Maps Representation (clearly Google Maps misinterpreted at least two country names and drew the markers for those in the US).

The code for getting the string ‘Swaziland|Liberia|RepublicCongo|Lesotho|Tajikistan|Mauritania|EastTimor|Togo|Tonga|FrenchGuiana|Libya|Panama|CentralAfricanRepublic|EquatorialGuinea|GuineaBissau|Belize|SaintVincentGrenadines|Namibia|Micronesia|PapuaNewGuinea|Djibouti|SaoTomePrincipe|Reunion|Nicaragua|SolomonIslands|Vanuatu|Israel|Jordan|Gambia|Eritrea|WesternSahara|Honduras|Botswana|Paraguay|Comoros’, consisting of all countries where the 0-4 age group is the biggest, is mostly given below. I am not giving all of it because I think I need to keep my key private, and I don’t want to be the cause of tons of queries to their API (if you want, you can come up with your own complete list of countries; hint: Wolfram|Alpha).

First the script that queries Wolfram|Alpha and saves the info to a file.

""" wa.py

Setting up some basic code to get queries answered by Wolfram|Alpha.

Bart Kastermans, www.bartk.nl
"""

import time
import pickle
import urllib
import BeautifulSoup

APPID="GET-YOUR-OWN"
URL="http://api.wolframalpha.com/v2/query?input=%(query)s&appid=" + \
    APPID  + \
    "&podstate=AgeDistributionGrid:AgeDistributionData__Show details"

ALL_COUNTRIES = ["China", "India", "United States", "Indonesia", "Brazil", "Pakistan", "Bangladesh", "Nigeria", "Russia", "Japan", "Mexico", "Philippines", "Vietnam", "Ethiopia", "Germany", "Egypt", "Turkey", "Iran", "Thailand", "France", "Democratic Republic of the Congo", "United Kingdom", "Italy", "Myanmar", "South Africa"]

countryinfo = {}   # country name -> plaintext age data, or None if missing

for country in ALL_COUNTRIES:
    query = urllib.quote ("age distribution " + country)
    url_query = URL % {'query':query}

    print "current query:", url_query

    page = urllib.urlopen (url_query)
    cont = page.readlines ()

    # extract the info we want (apologies for magic constants)
    cont_parsed = BeautifulSoup.BeautifulStoneSoup (''.join (cont))
    data = cont_parsed.findAll (scanner='Data')
    if len (data) > 1:
        result = data [1].plaintext   # reuse the list found above
    else:
        result = None

    countryinfo [country] = result

    print "current result:"  # give some feedback something is happening
    print result
    print "Just got information for country:", country

    time.sleep (5)   # spread the queries over time

# save the information to file
COUNTRY_FILE = open ("country_info.txt", "w")
PICKLER = pickle.Pickler (COUNTRY_FILE)
PICKLER.dump (countryinfo)
COUNTRY_FILE.close ()
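
The query URL in the script above contains a literal space in “Show details”. As a hedged sketch, here is one way to build the same URL in Python 3 with urllib.parse so that every parameter is percent-encoded. The function name build_url and the BASE constant are made up for this example, APPID is the same placeholder as above, and no request is actually sent.

```python
from urllib.parse import quote, urlencode

APPID = "GET-YOUR-OWN"   # placeholder, as in the script above
BASE = "http://api.wolframalpha.com/v2/query"

def build_url(country):
    """Return the query URL for one country with all parameters percent-encoded."""
    params = {
        "input": "age distribution " + country,
        "appid": APPID,
        "podstate": "AgeDistributionGrid:AgeDistributionData__Show details",
    }
    # quote_via=quote encodes spaces as %20 rather than '+'
    return BASE + "?" + urlencode(params, quote_via=quote)

url = build_url("United States")
print(url)
```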

Then the script that does the parsing.


import pickle

def convert (x):
    """ anything that does not convert is set to 0 """
    try:
        return int (x)
    except (ValueError, TypeError):
        return 0

def parse_info (inp):
    """ certainly presumes the input is of right form """
    example2 = str(inp).split ("\n")
    example3 = example2 [1:-2]
    example4 = map (lambda x: x.split ("|"), example3)
    example5 = map (lambda x: [x[0], x[3].split (" ")[1]], example4)
    example6 = map (lambda x: [x[0], convert (x[1])], example5)
    return example6

COUNTRY_FILE = open ("country_info.txt")
UNPICKLER = pickle.Unpickler (COUNTRY_FILE)
country_info = UNPICKLER.load ()
COUNTRY_FILE.close ()

# remove countries with no info, "parse" the ones with
for country in country_info.keys ():
    if country_info [country] is None:
        del country_info [country]
    else:
        country_info [country] = parse_info (country_info [country])

def country_color (country):
    """ green means the 0-4 group is largest, red otherwise """
    val_least_key = country [0][1]
    color = "green"

    for item in country:
        if item [1] > val_least_key:
            color = "red"

    return color

# assign colors
colors = {}
for country in country_info.keys ():
    colors [country] = country_color (country_info [country])

# create the strings to paste into google staticmap
red = []
green = []
for country in colors.keys ():
    if colors [country] == 'red':
        red.append (country)
    else:
        green.append (country)

red_str = "|".join (red)
green_str = "|".join (green)
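
To show what the coloring step does without querying the API, here is a small Python 3 sketch of the same test on invented data. The country names and counts are hypothetical, and youngest_is_largest is an illustrative name; the rule mirrors country_color above, where a country is green when no age group is larger than the first (0-4) group.

```python
def youngest_is_largest(rows):
    """rows: list of [age_group, count]; the first row is the youngest group."""
    youngest = rows[0][1]
    return all(count <= youngest for _, count in rows)

# Invented numbers, purely for illustration.
sample = {
    "Exampleland": [["0-4", 120], ["5-9", 110], ["10-14", 95]],
    "Otherland":   [["0-4", 80],  ["5-9", 85],  ["10-14", 90]],
}

green = [c for c, rows in sample.items() if youngest_is_largest(rows)]
red = [c for c in sample if c not in green]

green_str = "|".join(green)
red_str = "|".join(red)
print("green:", green_str)
print("red:", red_str)
```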

Written by kasterma

April 8, 2011 at 8:56 am

Posted in Uncategorized
