The Shee's lost knowledge: Reversing the C1 .exp file format part 1 (General layout and extracting the genome)

After all these years, and despite many talented people having contributed Creatures tools or documented the game's various features, there still isn't any description of the .exp file format available anywhere as we near 2015.

.exp files are the format in which creatures are exported from and imported into the game.

Obviously, they contain just about anything one could need to know about a given Norn, and using them in custom tools could open up a lot of interesting options, such as exporting a critically ill Norn, performing offline surgery on the file, and reimporting it into the game.

This would allow curing a couple of otherwise fatal issues, since the in-game mechanics don't provide any means of removing a given chemical from a creature's bloodstream or reinvigorating degraded C2 organs.
(See the case of C2 Grendels being oversensitive to cyanide poisoning because they lack the chemical reaction genes that would allow them to lower their cyanide levels.)

In this series of articles, I will describe the process of reverse engineering the C1 (and later C2) .exp file format so any programmer out there can use the information to implement .exp file manipulating programs.

The first articles of the series will show the process followed so more people can learn "how it's done" and maybe later take on reversing other undocumented game file formats.
Hopefully, if enough information can be gathered about the format, a final article will sum everything up into the cleanest and most complete description possible.

For this first article, we will analyse the general file layout, and learn how to extract a working genome file from an .exp file.

Let's dive into binary DNA!

What do we expect to find?

Before diving head first into binary data, it might be useful to list what we expect to find in an .exp file.
This might allow us to make sense of bigger chunks of data by recognizing already known patterns.

Think about it.
When you export a Norn from a given game and then re-import it in another place, what information needs to be carried around?

- The first and most obvious part is that generic Norn info must be there: its moniker, name, owner information, genetic code, parentage...
- A snapshot of the Norn's internal state must also be kept (otherwise an imported Norn would have an empty chemical system and instantly die from a lack of glycogen).
- The same goes for the brain: its state must be saved somehow, along with all the Norn's experience and learned words.
- Sprite data? This is unlikely. Think of the case of the Purple Mountain Norns.
They come as a full-blown installer, and not as .exp files.
If you import a PMN into a virgin C1 game from an .exp file, it will not show the expected appearance but that of the default brown Norn. This suggests that sprite and .att data are external resources not included in the .exp.
(After all, you have to distribute .spr and .att files when you design a new breed, don't you?)
- The Norn's life stage, age...
- Pregnancy state is probably saved somewhere too (probably along with the baby's genome), as the game seems to remember that an exported and reimported Norn was pregnant.
- The photo album, however, doesn't seem to be stored in the Norn's .exp file, as its size doesn't increase with the number of pictures taken.

The overall file layout

If we just open the .exp file in a hexadecimal editor and randomly browse through it in search of meaningful information, a couple of things pop out.

We can find a few human-readable things, or things that look familiar:

A creature moniker

Looks like vocabulary status

This section looks like the pose genes section of .gen files, but lacks the gene structure

A huge blob of "gene" markers along with data, reminiscent of the .gen file format.

One easy early step in reversing any file format is looking for readable strings.
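A minimal sketch of such a string scan in Python, assuming that "readable" means a run of printable ASCII bytes. The function name find_strings, the minimum length of 5 and the sample bytes are illustrative choices, not taken from a real export:

```python
import re

def find_strings(data, min_len=5):
    """Yield (offset, text) for each run of at least min_len printable ASCII bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.start(), match.group().decode("ascii")

# Synthetic bytes standing in for a real .exp file (assumption, for illustration only)
sample = b"\x01\x00\x08\x00CGallery\x00\x00\x06\x00CBrain\xff\xfe\x07\x00CGenome\x00"
found = list(find_strings(sample))
for offset, text in found:
    print(hex(offset), text)
```

On a real file, the same function can be applied to the bytes returned by reading the export in binary mode.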

If we automate the process and combine it with the above information, we find a couple of promising strings:

"CGallery", "CBrain", "CBiochemistry" and "CGenome". Those really do look like markers defining sections inside the file. They also seem to give us important insight into what the data that follows might be.

Great! From now on, we will assume that a .exp file is made up of 5 sections delimited by those markers, and will consider each section individually.

(We know from previous knowledge of Creatures internals that few of its file formats contain lookup tables that would allow reading only the interesting sections, and that reading a file completely from start to end is often the only way to parse it. It then makes sense that, if the .exp file is an aggregate of other file formats or data structures, markers might be needed to tell apart chunks of data whose length cannot be predicted before parsing.)

From now on we will refer to the sections as follows:

Section 0: The header of the file, the few bytes before the first "CGallery" marker

Section 1: The CGallery section containing everything between the "CGallery" and "CBrain" markers

Section 2: The "CBrain" section, containing everything between the "CBrain" and the "CBiochemistry" markers

Section 3: The "CBiochemistry" section, containing everything between the "CBiochemistry" and the "CGenome" markers

Section 4: The "CGenome" section containing everything between the "CGenome" marker and the end of the file.

You can use the following python script to split an .exp file into those chunks for easier manipulation.
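A minimal sketch of such a splitter, assuming each marker occurs once and in the order listed above. The name split_exp and the sample bytes are made up for illustration, and the cut points fall on the marker strings themselves, so any bytes stored just before a marker stay at the end of the previous chunk:

```python
MARKERS = [b"CGallery", b"CBrain", b"CBiochemistry", b"CGenome"]

def split_exp(data):
    """Split raw .exp bytes into 5 chunks: the header, then one chunk per marker."""
    offsets = []
    start = 0
    for marker in MARKERS:
        pos = data.find(marker, start)
        if pos < 0:
            raise ValueError("marker %r not found" % marker)
        offsets.append(pos)
        start = pos + len(marker)
    bounds = [0] + offsets + [len(data)]
    return [data[a:b] for a, b in zip(bounds, bounds[1:])]

# Synthetic stand-in for a real export (assumption, not actual game data)
sample = b"HDR" + b"CGallery" + b"gg" + b"CBrain" + b"bb" + b"CBiochemistry" + b"cc" + b"CGenome" + b"dna"
chunks = split_exp(sample)
for index, chunk in enumerate(chunks):
    print("section", index, len(chunk), "bytes")
```

On a real export, the function can be fed the bytes read from the file in binary mode, and each chunk written to its own file.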

Making sense of Section 4: the "CGenome" section.

Let's start with an easy one.

If we look at where the "CGenome" string is, we can see that what follows strongly resembles a .gen file format:

The .exp file centered on the "CGenome" marker

Let's compare it with the corresponding .gen file for this creature:

Above, the .exp file, below, the .gen file for the same creature

We've got a perfect match!

It seems that this section of the file is made of a 22-byte header followed by a valid .gen file.

The 22-byte header preceding the .gen file in the "CGenome" section

If we measure the section beginning at the "gene" marker and ending at the end of the file, we get exactly 20,001 bytes (about 20 KB), which is the exact size of .gen files.

Hooray, we can now extract this part of the file, save it to a .gen file and import it into the Genkit to confirm that it is valid. (Indeed, it is also strictly identical to the "1UGB.gen" file for this Norn found in the Genetics directory.) That's one huge chunk out of the way!
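The extraction can be sketched as follows, assuming the genome is everything from the first "gene" marker after "CGenome" up to the end of the file. extract_genome is an illustrative name, and the sample bytes are synthetic (far shorter than a real 20,001-byte genome):

```python
def extract_genome(exp_data):
    """Return the bytes from the first b"gene" after the b"CGenome" marker to end of file."""
    section = exp_data.find(b"CGenome")
    if section < 0:
        raise ValueError("no CGenome marker found")
    start = exp_data.find(b"gene", section + len(b"CGenome"))
    if start < 0:
        raise ValueError("no gene marker found after CGenome")
    return exp_data[start:]

# Synthetic example: 22-byte header as observed in the article, then a tiny fake genome
header = b"\x07\x00CGenome\x21\x4e\x00\x001UGB\x02\x00\x00\x00\x04"
body = b"gene" + b"\x00" * 8 + b"gend"
genome = extract_genome(b"earlier sections" + header + body)
print(len(genome), "bytes extracted")

# To save a real genome, something like:
#     with open("1UGB.gen", "wb") as out:
#         out.write(genome)
```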

Making sense of the remaining "CGenome" header:

So let's try to decode the 22 remaining bytes so we can wrap up our understanding of section 4.

- We can see that the "CGenome" string is preceded by a 2-byte value reading "7".

7 is the length of the "CGenome" string, and we already know from our previous reverse engineering musings that this is the "CSTRING" format in which many strings are stored inside the game files (those strings are composed of a 1- or 2-word prefix specifying the length, followed by the string itself).

- We can also easily identify the string "1UGB", which is the Norn's moniker.
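Reading such a length-prefixed string can be sketched like this, assuming the variant seen here: a 2-byte length stored low byte first, followed by ASCII text. The helper name read_cstring is hypothetical, and other prefix sizes are not handled:

```python
import struct

def read_cstring(data, offset=0):
    """Read a string with a 2-byte little-endian length prefix (assumed layout).

    Returns (text, offset_of_the_byte_following_the_string).
    """
    (length,) = struct.unpack_from("<H", data, offset)
    start = offset + 2
    raw = data[start:start + length]
    if len(raw) != length:
        raise ValueError("string is truncated")
    return raw.decode("ascii"), start + length

# First bytes of the "CGenome" section as shown in the article
text, next_offset = read_cstring(b"\x07\x00CGenome\x21\x4e")
print(text, next_offset)
```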

That leaves us with only 9 unknown bytes to make sense of:

Colorized version of the "CGenome" section with explanation

What is that "214e" right after the "CGenome" marker?

Well, once the byte order is taken into account, translating this hexadecimal value to decimal gives us 20001. Sound familiar?

Yes, this is the exact length of the .gen file data that we measured earlier and that is stored later in this section.

(Just remember that a multi-byte value might be stored in reverse order in files: here "214e" really means "4e21".)
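The byte-order point can be checked in a couple of lines of Python. This only illustrates the arithmetic on the two bytes quoted above:

```python
raw = bytes.fromhex("214e")               # the two bytes as they appear in the file

little = int.from_bytes(raw, "little")    # low byte first: 0x4e21
big = int.from_bytes(raw, "big")          # high byte first: 0x214e

print(hex(little), little)
print(hex(big), big)
```

Only the low-byte-first reading matches the measured length of the .gen data.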

Length of the .gen section selected in a hex editor is shown to be 4e21

7 more bytes to go and we will fully understand that CGenome section.

So what is in those remaining 7 bytes?
There are a lot of "00"s in there.
This could either mean that the fields are longer than we thought (maybe the .gen section length is "21 4e 00 00"?)
or that these are just empty fields; we can't tell for sure for now.

Let's focus on the "02" value.
Maybe it has something to do with the creature's sex? Or its life stage?
Both would make good candidates for such a low value.

Let's export a couple creatures of known characteristics and see how this field varies.

Comparing the field between a female Norn (above) and a male one (below)

We guessed right: this field varies for creatures of different sex, and the enumeration (1 = male, 2 = female) is consistent with what we know of the game's internal representation of sexes (see the C1 CAOS documentation, for example).
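This kind of side-by-side comparison is easy to automate. Below is a minimal sketch, assuming both byte strings are aligned on the same starting point; diff_bytes is an illustrative name and the two samples are synthetic (two real creatures would also differ in their moniker bytes):

```python
def diff_bytes(a, b):
    """Return (offset, byte_in_a, byte_in_b) for every position where a and b differ.

    Only the common length is compared; extra trailing bytes are ignored.
    """
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]

# Synthetic 13-byte tails of two "CGenome" headers (the bytes after the marker string)
female = bytes.fromhex("214e0000") + b"1UGB" + bytes.fromhex("0200000002")
male = bytes.fromhex("214e0000") + b"1UGB" + bytes.fromhex("0100000000")

differences = diff_bytes(female, male)
for offset, x, y in differences:
    print("offset %d: %02x vs %02x" % (offset, x, y))
```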

We also note that in the following field of "00"s, the last byte before the "gene" word varied between our two sample creatures (it took 00 and 02 as values).

Given the sample creatures used for the test, my guess is that this field might be the creature's age.
Let's use the BORG to create a new Norn, export it, save it, forcefully age it, and export it again...
(Yeah, I know, I don't really like playing Frankenstein that way, but eh... for science, right?)
What do we get when we compare 4 exports of the same Norn, forcefully aged a couple of seconds apart?

The field varies sequentially and in the expected range consistent with the creature ageing

That's it!
The last byte before the first "gene" marker is probably the creature's age, then.
According to the Creatures documentation, ages 0-7 are interpreted as: baby, child, adolescent, youth, adult, old, senile, dead from old age.

We now know what most of the fields mean:

We only have 5 bytes left to fully understand that section.
Unfortunately, it's hard to figure out what zeroes might mean...
I went through my whole collection of C1 .exp files and found no variation in those values whatsoever.
I also tested whether they contained an indicator of Norn pregnancy, to no avail.

So let's just consider for now that those values are always 00, or that the surrounding values (.gen section length, age and sex) may instead be coded on more bytes than required.
The latter could be confirmed later by reverse engineering the game executable, but this is not worth the trouble for now.

Wrapping things up, the CGenome section structure:

So, in this first part we learned a couple of things.

- A .exp file is composed of 5 sections, separated by text delimiters ("CGallery", "CBrain", "CBiochemistry" and "CGenome").
- The last section ("CGenome") is composed of:
  - The "CGenome" marker, in CSTRING format.
  - 2 bytes specifying the length of the extractable .gen file.
  - Two "00" bytes.
  - The Norn's moniker (most likely an indication of what filename to give to the exported .gen file).
  - The Norn's sex, coded on one byte (1 = male, 2 = female).
  - 3 "00" bytes.
  - The Norn's life stage (0-7), on one byte.
  - A raw .gen file containing the Norn's genetic data, which can simply be cut out of the file and pasted into a .gen file. (The observed length of all .gen files seems to always be 20,001 bytes, padded with "00"s at the end. Padding seems optional, and the game doesn't complain about shorter .gen files stopping after the "gend" marker.)
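The layout above can be turned into a small parser. This is a sketch under the assumptions made in this article: 2-byte little-endian lengths, a 4-character moniker, and zero bytes treated as padding. All names (GenomeSection, parse_cgenome, SEXES, LIFE_STAGES) are illustrative, and the sample section is synthetic:

```python
import struct
from typing import NamedTuple

SEXES = {1: "male", 2: "female"}
LIFE_STAGES = ["baby", "child", "adolescent", "youth", "adult", "old", "senile", "dead"]

# gen length (2), two zero bytes (2), moniker (4), sex (1), three zero bytes (3), life stage (1)
FIELDS = "<HH4sB3sB"

class GenomeSection(NamedTuple):
    gen_length: int
    moniker: str
    sex: int
    life_stage: int
    genome: bytes

def parse_cgenome(section):
    """Parse a CGenome section that starts at the 2-byte length prefix of its marker."""
    (name_len,) = struct.unpack_from("<H", section, 0)
    if section[2:2 + name_len] != b"CGenome":
        raise ValueError("not a CGenome section")
    pos = 2 + name_len
    gen_length, _pad, moniker, sex, _zeros, stage = struct.unpack_from(FIELDS, section, pos)
    pos += struct.calcsize(FIELDS)
    genome = section[pos:pos + gen_length]
    return GenomeSection(gen_length, moniker.decode("ascii"), sex, stage, genome)

# Synthetic section: real genomes are 20,001 bytes long, this one is only 12
body = b"gene" + b"\x00" * 4 + b"gend"
sample = (struct.pack("<H", 7) + b"CGenome"
          + struct.pack(FIELDS, len(body), 0, b"1UGB", 2, b"\x00\x00\x00", 4)
          + body)
parsed = parse_cgenome(sample)
print(parsed.moniker, SEXES[parsed.sex], LIFE_STAGES[parsed.life_stage], parsed.gen_length)
```

If the zero bytes later turn out to belong to wider fields, only the FIELDS format string would need to change.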

That's it for part 1.
In the upcoming parts of this series we will try to make sense of the remaining sections, to the point where we can make something useful of those .exp files.


Sunday, November 30, 2014
