Tuesday, August 2, 2011

Weasel

Dawkins' weasel is a toy model of evolution in which the goal is to write an algorithm that evolves the phrase "METHINKS IT IS LIKE A WEASEL" out of a random string of 28 characters via random mutation and non-random selection. The one constraint on the algorithm is that you're not allowed to "lock in" characters; i.e. if one of your strings happens to contain the right character in the right position, you can't exclude that character from the possibility of mutating.

I wrote a weasel program in R, in which the "evolution" proceeds as follows: you draw a random string of 28 characters; the string "breeds" n "offspring," but each character in each of the offspring strings has a probability m of mutating into any character of the alphabet; each mutated string gets a fitness score which is simply the number of same characters in the same position as in the target string; (one of the) strings with the highest fitness survives and breeds in the next generation while the rest are erased.

Below is a sample run of an R-weasel with the number of offspring = 100 an mutation rate = 0.05 (the number to the right of each string is its fitness score):

1 L Z F Z N Y O F N O Z E B I X N P X H B O P U M L A 1
2 L Z F Z N Y O F N O Z E B I N P X H K O P U I L A 2
3 L Z F Z N Y O F N O D E B I N P X H K O P U I L L 3
4 L Z F Z N Y O F N O D E B I L P X H K O P U I L L 4
5 L Z T Z N Y O F S O D E B I L P X H K O P U I L L 5
6 L Z T H N Y O F S O D E B I L P X H K O P U I L L 6
7 L Z T H N Y O F S O D E B I L P X H K O P U I L L 6
8 L Z T H N Y O F S O D E I I L P X H K O P U I L L 7
9 L Z T H N Y O F S O D E I S L P X H K O P U I L L 8
10 L E T H N Y O F S O D E I S L P X H K O P U I H L 9
11 L E T H N N O F S O Q E I S L P X H K O P U I H L 10
12 L E T H N N O S S O Q E I S L P X H K O P U I H L 11
13 L E T H N N O S S O Q E I S L P K I K O P U I H L 12
14 L E T H N N O S S O Q E I S L P K I A P U I H L 13
15 L E T H N N O S S A T E I S L P K I A P U I H L 14
16 L E T H N N O S S A T E I S L P K I A P U M E L 15
17 L E T H N N O S S A T X I S L P K I W P U M E L 16
18 L E T H N N O S S A T X I S L P K I W P U M E L 16
19 L E T H N N O S U A T X I S L P K I W P U M E L 16
20 L E T H N N O S U A T X I S L P K A W P U M E L 17
21 L E T H N N O S U B T X I S L P K A W P U M E L 17
22 L E T H I N O S U B T X I S L P K A W P U M E L 18
23 L E T H I N O S U N T X I S L P K A W P U M E L 18
24 L E T H I N O S U N T G I S L P K A W P U M E L 18
25 M E T H I N O S U N T G I S L P K A W P U M E L 19
26 M E T H I N K S U N T G I S L P K A W P U M E L 20
27 M E T H I N K S U I T G I S L P K A W P U M E L 21
28 M E T H I N K S I T G I S L P K A W P U M E L 22
29 M E T H I N K S I T U I S L I K A W P U G E L 23
30 M E T H I N K S I T U I S L I K E A W P U G E L 24
31 M E T H I N K S I T U I S L I K E A W P U G E L 24
32 M E T H I N K S I T U I S L I K E A W P U G E L 24
33 M E T H I N K S I T U I S L I K E A W P U G E L 24
34 M E T H I N K S I T U I S L I K E A W P U G E L 24
35 M E T H I N K S I T U I S L I K E A W P U G E L 24
36 M E T H I N K S I T U I S L I K E A W P U G E L 24
37 M E T H I N K S I T I S L I K E A W P U G E L 25
38 M E T H I N K S I T I S L I K E A W I U G E L 25
39 M E T H I N K S I T I S L I K E A W M A G E L 26
40 M E T H I N K S I T I S L I K E A W E A G E L 27
41 M E T H I N K S I T I S L I K E A W E A G E L 27
42 M E T H I N K S I T I S L I K E A W E A G E L 27
43 M E T H I N K S I T I S L I K E A W E A G E L 27
44 M E T H I N K S I T I S L I K E A W E A G E L 27
45 M E T H I N K S I T I S L I K E A W E A G E L 27
46 M E T H I N K S I T I S L I K E A W E A G E L 27
47 M E T H I N K S I T I S L I K E A W E A G E L 27
48 M E T H I N K S I T I S L I K E A W E A G E L 27
49 M E T H I N K S I T I S L I K E A W E A G E L 27
50 M E T H I N K S I T I S L I K E A W E A G E L 27
51 M E T H I N K S I T I S L I K E A W E A G E L 27
52 M E T H I N K S I T I S L I K E A W E A G E L 27
53 M E T H I N K S I T I S L I K E A W E A G E L 27
54 M E T H I N K S I T I S L I K E A W E A G E L 27
55 M E T H I N K S I T I S L I K E A W E A S E L 28
user system elapsed
0.09 0.00 0.10

This is not a typical run; it's on the shorter side (for these parameters the median length is something like 70). Note how fast it converges though; R can be quite fast f you do things in vectors and matrices rather than loops.

The next step is to allow the strings to mate and swap their genes.

R code for the weasel is below the fold.




# A Version of Dawkins' Weasel in R
# (c) Przemyslaw Nowaczyk 2011

score <- function(x,y) {sum(x==y)}

weasel <- function(phrase,no.kids,mutation.rate) {
  alphabet <- c("A","B","C","D","E","F","G","H","I","J","K","L","M","N",
                "O","P","Q","R","S","T","U","V","W","X","Y","Z"," ")
  split.phrase <- unlist(strsplit(phrase,""))
  new.phrase   <- sample(alphabet,size=nchar(phrase),replace=TRUE)
  distance     <- score(new.phrase,split.phrase)
  generation   <- 0

  while (distance < length(split.phrase)) {
    m.newph     <- as.matrix(new.phrase)
    offspring   <- m.newph[,rep(1,no.kids)]
    mutant.flag <- mat.or.vec(nrow(offspring),ncol(offspring))
    mutant.flag <- sample(c(0,1),prob=c((1-mutation.rate),mutation.rate),
                          replace=TRUE,size=(nrow(offspring)*ncol(offspring)))
    offspring[mutant.flag==1] <- 
      sample(alphabet,size=length(mutant.flag[mutant.flag==1]),replace=TRUE)
    scores <- apply(offspring,2,score,y=split.phrase)
    new.phrase <- offspring[,which(scores==max(scores))[1]]
    distance   <- score(new.phrase,split.phrase)
    generation <- generation + 1
    cat(generation,new.phrase,distance,"\n")
  }
}

# sample run with timing
system.time(weasel(phrase="METHINKS IT IS LIKE A WEASEL",no.kids=100,
                   mutation.rate=0.05))
Created by Pretty R at inside-R.org

No comments:

Post a Comment