Saturday, January 4, 2014

Introduction to advanced Norn genealogy using Social Network Analysis (SNA)

This post will be a quick taste of things to come.

I will show you where to look for Norn genealogy information in the game files, how to extract it using Python, and how to feed it to tools more powerful than the built-in "family tree" of the original games, so you can fully map out your whole Norn population.

We will only be scratching the surface of these vast topics here; each of them will get a dedicated article further down the road. For now, I'm trying to get as much new and varied information out as I can. I hope this will bring some new topics, tools and knowledge to the Creatures community.

Looking for family information

All relevant information is stored in the "Genetics" folder inside the game folder (this is true for C1, C2 and C3/DS).
This folder contains a ".gen" file for every single Norn that ever lived in or passed through your world.
Each .gen file matches a given Norn 1:1.
The name of this file is 4 characters long in C1 and C2, and longer in the subsequent games; it is called the Norn's "Moniker".

You can match in-game creatures to monikers thanks to the genetics page in the various games:

Finding a moniker in C1
Finding a moniker in C2

Finding a moniker in C3/DS

Each .gen file contains the whole genome of a given Creature. (Note that those files are never removed, so nothing prevents you from re-hatching any of your favorite long-departed Norns from their genome files! We will cover that voodoo in greater detail in an upcoming post.)

Among a creature's genes, the first one (called the header gene) contains information about its parents:

This information is easily extracted from the .gen files, as the mother and father monikers are always at a fixed offset from the beginning of the file.

The parents information inside a .gen file, always found at the same offset

All you have to do to redraw the full family tree of your game is to scan all the .gen files in the Genetics folder and extract each creature's moniker, father and mother information.
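As a minimal sketch of that fixed-offset read: the helper below pulls both parent monikers out of a single file. The name read_parents is made up for illustration, and the offsets (father at bytes 16 to 20, mother at bytes 20 to 24) are the C2 ones used by the script later in this post; other games may well differ.

```python
import os
import tempfile

def read_parents(gen_path):
    """Return (mother, father) monikers read from a C2 .gen file."""
    with open(gen_path, "rb") as f:
        header = f.read(24)
    # Assumed C2 offsets: father at bytes 16..20, mother at bytes 20..24
    dad = header[16:20].decode("latin-1")
    mum = header[20:24].decode("latin-1")
    return mum, dad

# Quick self-check on a fake file (made-up monikers, not real game data)
with tempfile.TemporaryDirectory() as tmp:
    fake = os.path.join(tmp, "KID1.gen")
    with open(fake, "wb") as f:
        f.write(b"\x00" * 16 + b"DAD1" + b"MUM1" + b"rest of the genome")
    print(read_parents(fake))
```

Reading only the first 24 bytes keeps the scan fast even on folders holding thousands of genomes.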

Matching monikers to Norn names

For now you'd be left with the structure of the tree, but all your nodes would be unreadable monikers that you could hardly match to your Norns' names.

So where do we find the moniker->Name information programmatically?

This will vary slightly among the games:

In C1, the information is available from the "The Register" file in the game directory

The file is a series of entries containing Norn monikers, names and parent monikers.
(Note that here monikers are expressed as the hexadecimal codes of the corresponding characters, so you will have to translate them to 4-character ASCII monikers. Here for example 464f5835 => FOX5. If you want to convert that easily you can try any online converter, but I'd rather assume that since you're reading this blog you're more programmatically inclined.)
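For the programmatically inclined, the conversion is a one-liner in Python 3. The function name hex_to_moniker is just an illustrative choice:

```python
def hex_to_moniker(hex_code):
    """Turn a hex string such as '464f5835' into its 4-character ASCII moniker."""
    return bytes.fromhex(hex_code).decode("ascii")

print(hex_to_moniker("464f5835"))  # FOX5
```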

In C2 things are easier: the information is found in the "History\GameLog" file inside the game folder:

The file contains the information in a much more practical format: one entry per line, with fields separated by "|", which makes for easier parsing. (Note that if you use the updated C2 version with the world switcher, you might need to parse the GameLog files of all your different worlds.)

The first field is the moniker, and the last one is the Creature's name if it begins with an "N" (as in _N_amed creature).
Other entries are not relevant to our current business, but a full explanation is available here.
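Here is a rough sketch of how one such line could be parsed, going only by the description above (moniker first, name last, "N" prefix on named creatures). The function name and the sample line are made up; real GameLog entries carry other fields that this ignores.

```python
def parse_gamelog_line(line):
    """Return (moniker, name) for a named creature entry, or None otherwise."""
    fields = line.rstrip("\r\n").split("|")
    if len(fields) < 2:
        return None
    moniker, last = fields[0], fields[-1]
    # Assumption: a leading "N" marks a named creature, the name follows it
    if len(last) > 1 and last.startswith("N"):
        return moniker, last[1:]
    return None

# Hypothetical entry, not copied from a real GameLog
print(parse_gamelog_line("1ABC|some other field|NAlice\n"))
```

Returning None for anything that doesn't look like a naming entry lets the caller skip the other event types without caring what they are.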

In C3 the format is much more involved and will be covered in an upcoming reverse-engineering article.

Extracting useful information

Now that we know where to look for the information, we need a practical way of exploiting it.
Here is a quick Python snippet that will parse all .gen files from a Creatures 2 game, and then generate two text files: one containing all moniker->Name matches, and one describing all family links between those creatures (why we make these will become apparent in the next section).

Put the following code in a .py file, throw it in your C2 "Genetics" directory, and run it. The two files (Norns.txt and links.txt) will appear in that same directory.

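import os

# Change this depending on your game's path:
game_dir = r"C:\Program Files\Creatures 2"

# The GameLog holds the moniker -> name matches
with open(os.path.join(game_dir, "History", "GameLog"), encoding="latin-1") as f:
    hist = f.readlines()

os.chdir(os.path.join(game_dir, "Genetics"))

norns = []  # one dict per creature: moniker, plus name when known
links = []  # (parent moniker, child moniker) pairs

# For each file in the directory:
for fil in os.listdir("."):
    # We're only interested in .gen files
    if not fil.lower().endswith(".gen"):
        continue
    with open(fil, "rb") as f:
        cnts = f.read(24)  # the parent monikers sit in the first 24 bytes
    me = fil[0:4]                        # The filename is the Norn's moniker
    mum = cnts[20:24].decode("latin-1")  # Mum moniker offset in a C2 gen file
    dad = cnts[16:20].decode("latin-1")  # Dad moniker offset in a C2 gen file
    links.append((mum, me))
    links.append((dad, me))

    # Now add the real name to our moniker list
    norn = {"moniker": me}
    for lig in hist:
        if me in lig:
            fields = lig.strip().split("|")
            # Named entries start with "N"; dropping that prefix is an assumption
            if len(fields) > 2 and fields[2].startswith("N"):
                norn["name"] = fields[2][1:]
    norns.append(norn)

# Header rows are an assumption: they are the column names Gephi looks for
with open("Norns.txt", "w", encoding="latin-1") as outnorns:
    outnorns.write("Id,Label\n")
    for norn in norns:
        if "name" in norn:
            outnorns.write(norn["moniker"] + "," + norn["name"] + "\n")

with open("links.txt", "w", encoding="latin-1") as outlinks:
    outlinks.write("Source,Target\n")
    for parent, child in links:
        outlinks.write(parent + "," + child + "\n")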

Note that this is just a very crude script, only meant to illustrate how you can quickly extract interesting information from the game files to build your own Albian exploration tools.
A more robust version, usable by anybody, is still in development and will be released soon.

Advanced visualisation tools

Here comes the fun part.

We will visualize the extracted information using Gephi, an open source dynamic graph visualisation program.
This tool allows us to use Social Network Analysis algorithms to map, sort and generally make sense of that bunch of data in a pretty way.
Basically, not only will it show a pretty and dynamic graph of our whole Norn population, but it will also perform automatic tasks on it, such as giving "clusters" of densely related Norns the same colour (so you can immediately identify well mixed-in or isolated Norn families in your graph), or laying out the family tree by genetic proximity.

Install Gephi (and also Java if you don't have it yet. Yes, I know... sorry about that...), then run it and choose "new project".

Go to the "Data Lab" page, and click "Import spreadsheet".

Pick the Norns.txt file and be sure to import it as a "Node table"; this describes all the nodes in the graph.

Then click next and finish.
Once again, click "Import spreadsheet", this time picking links.txt, and be sure to import it as an "Edge table". This is your data on the connections between the previously defined nodes.

Click next and finish.

You're back at the Data lab page.
Choose the "Fill column with a value" button at the bottom of the page and fill the "Weight" column with the value 1.
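If you'd rather not fill that column by hand, one possible alternative is to write the weight straight into the edge file. This is only a sketch, and it assumes Gephi picks up columns named Source, Target and Weight on import; the function name edges_csv and the sample monikers are made up.

```python
import csv
import io

def edges_csv(links):
    """Build edge-table CSV text with a constant weight of 1 for every link."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Assumed column names for Gephi's edge table import
    writer.writerow(["Source", "Target", "Weight"])
    for parent, child in links:
        writer.writerow([parent, child, 1])
    return out.getvalue()

# Made-up monikers, just to show the shape of the output
print(edges_csv([("MUM1", "KID1"), ("DAD1", "KID1")]))
```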

You can now go back to the main "Overview" page.
Click the "T" icon at the bottom of the screen to show labels. You should now see a messy bunch of nodes:

A basic representation of a short Wolfing run. Note how the Ettin and Grendel branches don't link to the main Norn gene pool. This will change if you crossbreed your creatures.

Your graph won't look that neat at first; you will have to play with the "spatialisation" (layout) settings in the left column (refer to the additional resources at the bottom of this article for more in-depth tutorials about using Gephi). The "Force Atlas" layouts (1 and 2) are a good thing to try first, followed by Fruchterman Reingold. (Those are all layout algorithms, organising your data according to different parameters. More detailed tutorials coming soon depending on feedback.)

Pretty, isn't it?
Note that when your mouse cursor hovers over a node (a Norn), Gephi highlights it and its neighbors, making it easy to track who is whose child or parent:

Here "Bob" is singled out, along with his parents (Amie & Alfred) and children. Notice the slight hue change, automatically coloring "similar" nodes and helping to distinguish the parents from the children.

It sure beats the default in-game representation of family trees, but wait, there's much more!
To make all of our hard work worth it, we can now run plenty of algorithms on that data.
Apart from all sorts of cute representations you could use (see the plugins page for more eye candy), you can also access some more possibilities.

On this graph, the label size is set to be proportional to a Norn's number of children. Looks like Coco's been busy, erm... spreading his genetic material!
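The number behind that label size is simply how many times a moniker shows up as a parent in the links. A minimal sketch in plain Python, assuming the links are (parent, child) pairs; the names and the helper count_children are made up for the example:

```python
from collections import Counter

def count_children(links):
    """Count how many children each parent has in (parent, child) pairs."""
    return Counter(parent for parent, _child in links)

# Made-up sample links
links = [("COCO", "KID1"), ("AMIE", "KID1"),
         ("COCO", "KID2"), ("BABY", "KID2")]
counts = count_children(links)
print(counts.most_common(1))  # [('COCO', 2)]
```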

This graph has been colored by "Clustering", basically putting closely related Norns in the same color groups. You can use that to identify families and the overall mixing level of your population (similar colors mean a lot of interconnections, differently colored clusters mean unrelated families). Note once again the Grendel and Ettin branches being colored in a contrasting way compared to the main Norn group.
See the "dead ends" on this graph? You can quickly notice that the initial "Adam, Amie, Baby" family on the bottom left didn't get much chance on the "Baby" side. Also notice that "Aron" on the upper right died without spawning any offspring. How sad. The purple family on the side of his sister "Aerie", on the other hand, quickly became dominant thanks to Coco's sexual hyperactivity.
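A much cruder cousin of that clustering can be computed without Gephi at all: the connected components of the parent/child graph, meaning groups of creatures that share no link whatsoever with the rest. This is not the algorithm Gephi uses for its coloring, only a sketch of the "unrelated families" idea, with a made-up function name and sample data.

```python
from collections import defaultdict

def families(links):
    """Group monikers into connected components of the parent/child graph."""
    neighbours = defaultdict(set)
    for parent, child in links:
        neighbours[parent].add(child)
        neighbours[child].add(parent)
    seen = set()
    groups = []
    for start in neighbours:
        if start in seen:
            continue
        # Depth-first walk collecting everyone reachable from this creature
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(neighbours[node] - seen)
        groups.append(group)
    return groups

# Made-up links: one Norn family and one unrelated Grendel pair
sample = [("ADAM", "BABY"), ("AMIE", "BABY"), ("GRN1", "GRN2")]
for group in families(sample):
    print(sorted(group))
```

Unlike the colored clusters, this only separates families that never interbred; a single crossbreed merges two groups into one.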

The nameless/isolated nodes on these graphs are the default parent monikers that are used to generate first-generation creature eggs from the hatchery.

This is another sample, from a C1 test world. The blue star on the upper left is made of all the genomes that are used to generate C1 eggs. It seems this test world only got external imported creatures, such as the purple mountain Norns in the orange cluster on the left.

Much more powerful analysis and visualisation results could be achieved thanks to the SNA approach. Many automatic computations and visualisations could be added with additional information (a Norn's sex, generation, time of birth/death, number of genetic mutations, diffs with the previous generation's genes...).

What about a concentric layout placing each node further and further from an origin node?

Here we are plotting the spread of the central "j254" genome. Notice how the close nodes are red and the ones further away are blue? (In terms of generations, that is.)

The concentric aspect is hard to see with so few nodes, but it gets much more meaningful on bigger datasets.
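The "distance in generations" behind such a layout can be sketched as a breadth-first walk down the family links, starting from the origin. This version only follows parent to child links (descendants), which is an assumption about what the plot measures; the function name and the sample monikers other than j254 are made up.

```python
from collections import defaultdict, deque

def generation_distance(links, origin):
    """Map each descendant of origin to its distance in generations."""
    children = defaultdict(list)
    for parent, child in links:
        children[parent].append(child)
    dist = {origin: 0}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for kid in children.get(node, ()):
            if kid not in dist:
                # First visit is the shortest path in a breadth-first walk
                dist[kid] = dist[node] + 1
                queue.append(kid)
    return dist

# Made-up links around the origin genome
sample = [("j254", "KID1"), ("j254", "KID2"), ("KID1", "GKID"), ("OTHR", "GKID")]
print(generation_distance(sample, "j254"))
```

Creatures that are not descended from the origin (like the other parent of a grandchild) are simply absent from the result.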

Organising data along an actual timeline is also possible, which allows drawing more realistic/traditional family trees.

This was only a quick primer on SNA, but much more interesting possibilities and visually stunning representations are waiting in the myriad of plugins available for Gephi. I hope this small preview sparked some interest in advanced Norn study possibilities.

Many other possibilities are offered by SNA techniques on all kinds of Norn data!

Closing words

Writing custom tools to exploit the game data leads to many interesting things.
In an upcoming article I will present a similar script to extract all chemical reactions happening inside the Norns.
We will then map the chemical reactions using similar techniques so we can visually inspect Norn biochemistry and know at a glance what every chemical does and what reactions it takes part in.

Using such tools will enable us to dive deep into the Creatures model and finally understand what happens inside the Norns in a practical way (no more painful browsing of undocumented genome files without even a search function!).

All of this also gets much more exciting with bigger datasets. Unfortunately, I only had this Wolfing run in a virgin world as a working base to make those screenshots.

Last but not least, Gephi supports a vast array of data importing techniques (of which we only used the most primitive one: the impractical but quick manual import of CSV files).
Think about what can be achieved once we get to stream game data to such graphs in real time.
This would certainly make for an interesting monitoring interface!

Additional SNA study material

Here is some additional material on Gephi and SNA, just so you can master all of it before we hit more intricate topics or just if you're curious about the topic.

An introduction tutorial to using Gephi.

A broad tutorial about Gephi, manipulating data, and getting meaningful representations.

A quick presentation about SNA, explaining what the main metrics are and how they're meaningful.

The Python NetworkX library, for easily performing calculations and SNA visualisations on your data (we will be using this one in upcoming articles).

An overview and samples of NetworkX possibilities.

The Python Graphviz library, used to graph your data.

1 comment:

  1. That is so awesome. Seriously, the graphs we learned in school were so boring. I love it when people can take big data and make it look so interesting and pretty. Can't wait to see what else you come up with.