“Stage e№” – An Introductory Guide To Re^perverse-Engineering Visual Novels

I wrote this for one of my online friends back in 2010. Despite its age, even 16 years later it doesn’t look dated – IDA’s interface is just as horrible, WinAPI is just as stable and basic x86 assembler is no different from what it was 40 years ago. Reverse-engineering fundamentals have not changed either – and probably never will.

This 30-page guide is intended to give an all-round insight into how RCE (reverse code engineering) is performed. It requires no special knowledge beyond a general idea of how a computer works (but this does not mean it will be easy). It’s a big plus if you can program a little but you don’t have to be able to read assembly code – you’ll learn this and more as you progress through these pages…

…If your motivation is a real thing, that is. People often ask me to teach them how to hack a Visual Novel but so far I don’t know of anybody from that group who has completed this guide. In fact, I’m not even sure if those people have started it at all. Well, it’s their call… but maybe you are different? O_~

Chapters

Part 1 – The Bruteforce
1. Number Notations: LE & BE
Part 2 – IDA Pro & The Assembler
1. Assembler Crash-Course
Part 3 – Finding Clues and The Case
Part 4 – Getting to the Crash Point
Part 5 – The Final Act
1. The Smells of _gOOP
Epilogue
1. A Final Word on Encryption

Files

It’s dangerous to go alone. Take this! ©

Scenario Runner – our test subject, a tiny “Visual Novel simulator” written in Delphi 7
WinAPI help (WIN32.HLP is the main file) – a desk-top reference for a Win32 hacker; the alternative is online MSDN
Intel’s Assembler Instruction Set

Foreword

Before we begin, let me tell you a few words about how it all started for me…

I became interested in Visual Novels a few years before I became interested in hacking them for translation purposes – which happened around September 2008 when I suddenly realized that I could see through that gibberish coming out of OllyDbg, that same assembly code that had previously scared me witless. Later I set up a page with a dozen VN tools I’ve made. With this tutorial I want to give you a head start, something I clearly lacked back then – and also prove that RCE is hard, but not off-limits to mere mortals.

This is a “side note” block – optional information which may help you later. Instead of dumping too many “mandatory” details at once, I put extras into such blocks spread all over the text. They make this tutorial double as a comprehensive guide, practical for real-world tasks (where one needs a fairly broad view of things). Depending on your starting level, it may be better to skip some sections, get a general idea of what’s going on, then return and re-read them in detail (perhaps repeating this several times, paying special attention to side notes on later re-reads).

If you are stuck – drop a line in the comments. If you see a typo – select it and press Ctrl+Enter.

P.S. Special thanks to someone from Moscow who’s reported over 50 typos. That helped, comrade :)

Part 1 – The Bruteforce

First thing you need is the test script interpreter from here. Extract the EXE and DAT files somewhere but don’t look at the source code until we’re done (it’s written in Delphi 7).

Second thing you need is a hex editor. If you don’t have one I suggest 010 Editor (a trial will do for now) – I used to use WinHex but it sucks when working in Japanese locale, becoming virtually unusable. 010 Editor is an amazing powerhouse for all things related to binary data.

Now we’re all set for the first part.

So, do you see that runme.dat in the Scenario Runner’s directory? That’s a “scenario” that it runs. Launch the EXE file. The scenario is simple, as well as the interpreter itself – it outputs a string, then asks you a question and depending on your answer it will either ask you the same question again or output a message and exit.

Your job in this tutorial is to change the first string it outputs – that’s what we do when we need to translate a game. And the scenario should remain workable!

We’ll go brute force for a start. Open runme.dat in a hex editor (a “hexed” for short). You can clearly see some strings like “Make me laugh!” and others. What if we change one of them?

Let’s edit the first string making it “Do not cry…” – and as you will notice we have an extra “!” left from the old string so we’ll just go ahead and delete it (too bad for the string). The file will become 1 byte shorter.

Figure: editing the string, the “rectilinear” approach. Orange – edited bytes. Note that the final “!” is gone.

Save your changes and run the EXE. Oh-huh, we get some exception and also we see that it output some “☺” before dying – while it’s good to stay positive it’s not exactly what we hoped for :)

We probably forgot to update something, for example, a string length. Naturally, the app should know how long the string is. How would it do that?

I’m sure you can look around and find where the problem is on your own. Riiiight?

『な、なによ、ソレ？もじばけか？わからない！わたし、バカバカバカバカ～』¹

¹ “I do not understand it at all, please help me!”

…A Lesson-Ze® For Real №0bz

When you open runme.dat in a hexed you see exactly what Scenario Runner sees too, just in a human-readable form (yes, this is a human-readable form). A hexed is like a Windows Notepad for nerds that can handle characters that humans cannot but computers can (much like infrared cameras in the 3D world). It means that the program starts from the first byte (in the left-top corner) – that is, 06. Then it sees 00, then 0E, etc. In other words, it reads this file byte-by-byte. (We are talking about the above screenshot.)

Why do we call runme.dat a “script” or a “scenario”, as if it were the script of a stage performance? Because it is one, for Scenario Runner! If you remove some stuff from this file, change it or do something else, the program will alter its own execution depending on your changes (and, therefore, depending on the file’s contents). It’s just like giving stage performers the wrong sheet: they’ll screw everything up. Or, if you’re more subtle, they will be alright – but they will sing a different song. Or talk in a different language. See what I’m getting at? That’s what we want a VN to do – speak in English instead of Japanese.

So the program fetches one or more bytes from its “scenario file”, interprets them, advances its internal “pointer” or “cursor” (like we humans do when playing a piano or a guitar), reads more bytes, interprets them and so on until something tells it to stop (like the user hitting a close button). By interpreting here I mean that a byte with the hex value of, say, FF doesn’t mean anything by itself, universally – this value won’t make Windows burn a CD or MS Paint invade 4chan. However, when our particular program reads a byte of this value it thinks: “my scenario instructs me to execute command No. 255 – let me look up what it means…”. It checks with its internal table of opcodes (operation, or instruction codes) and finds that “No. 255” means “the routine to output the message on screen and wait for user input”. And you can’t write this “output the message…” directly in the scenario file because it’s a speculative concept, much like you can’t convey to another person an “intention” to “go get some pizza” without “converting” your intention into words which that other person interprets and converts back to “his” intention (if you are lucky or your diplomacy skill is over 9,000).

Now let’s go deeper. Say, a program sees the FF byte which means “output a message” – but the program itself does not contain those messages, does it? The only thing it has is its scenario file – if you slip another file with new messages but leave the program untouched then it will still work. So the message must be there. And if we are rogue enough to change the file correctly?..

Any scenario file (or any file for that matter) is a sequence of bytes (also called “data stream”, etc.). When a program reads that byte valued FF in hex it must then read the string to output. Technically, nothing prevents the program from reading it from the end of the scenario file, from another file or from some network location but programmers are sane people in general (not all, no!) and they usually choose to read the message after the instruction’s opcode (FF). In other words, first goes the opcode, then goes its string.

So we’re speaking about this structure:

    <INSTRUCTION> <STRING>

For example:

    <FF> <06 H e l l o !>
             1 2 3 4 5 6      ← character #; length = 6 characters

Or, in bytecode:

    <FF> <06 48 65 6C 6C 6F 21>
             1  2  3  4  5  6

Does it ring a bell? Look at the window of your hexed – the numbers are different but the structure is clearly similar: first we see some opcode (FF, here it makes the program output some message), then we see some… what? Since the program reads the file byte-by-byte from left to right (even Arabic programs don’t read files from right to left) it would see FF first and then it would need to read the string… but how long is that string? There has to be another “technical” byte besides the opcode – and there is, that 06, the string’s length; the program reads this byte and thinks: “the string is going to be 6 characters long, let me read this many bytes now”. And so after reading 06 it reads all those 48 65 6C 6C 6F 21 bytes that actually correspond to ASCII codes of characters that make up the string “Hello!”. (There are other ways to indicate string length but this one is used most often. Also, “reading a string” is an abstract concept, it doesn’t necessarily happen via an API function like ReadFile.)
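To make the “opcode, then length, then characters” idea tangible, here is a minimal Python sketch that reads the made-up `<FF> <06 H e l l o !>` record from the example above. The function name and the one-byte length are assumptions taken from that example only; the real runme.dat is laid out differently, as you will see shortly.

```python
# Hypothetical record layout from the example above:
# 1 opcode byte, 1 length byte, then that many ASCII characters.
OP_MESSAGE = 0xFF  # "output a message" in the made-up example, not a real runme.dat opcode


def read_instruction(data: bytes, pos: int):
    """Read one <opcode> <length> <chars> record and return (opcode, text, new_pos)."""
    opcode = data[pos]              # first goes the opcode...
    length = data[pos + 1]          # ...then the "technical" length byte...
    start = pos + 2
    text = data[start:start + length].decode("ascii")  # ...then the string itself
    return opcode, text, start + length                # the "cursor" moves past the record


sample = bytes.fromhex("FF 06 48 65 6C 6C 6F 21")
opcode, text, next_pos = read_instruction(sample, 0)
print(hex(opcode), text, next_pos)
```

Note how the function hands back the new cursor position: whatever comes next in the file starts right there, which is why a wrong length byte derails everything after it.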

Now, the problem of converting “speculative” (“human”) characters into “concrete” bytes is outside of the scope of this intro but you can finish this tutorial just knowing that hex values from 20 (32 in decimal) to 7E (126) represent all letters, digits and punctuation that an English speaker needs. For details see the ASCII table on Wikipedia and a side note in Part 5.

Finally, back to our problem. In the second panel of a hexed (on the right) you see not numbers but human characters and periods in place of non-printable characters (also called “control characters” – “infrared” using our earlier analogy with the 3D world). Text editors (Notepad, Word, etc.) display such characters as square boxes or question marks. Note that the space character is printable (its ASCII code is 32) and so it appears as an empty space but line break (10) and tabulation (9) are not and they appear as periods. So, you see the string “Make me laugh!”, right?
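If you wonder how the two panels relate, here is a rough Python sketch of what a hexed does when drawing them. The function names are mine, and the sample bytes are only a guess at a small piece of the file (the trailing line break and tab are made up to show the periods).

```python
def hex_panel(data: bytes) -> str:
    """Left panel: every byte as two hex digits."""
    return " ".join(f"{b:02X}" for b in data)


def text_panel(data: bytes) -> str:
    """Right panel: printable ASCII (0x20..0x7E) as-is, everything else as a period."""
    return "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in data)


# Hypothetical sample: two "technical" bytes, the string, then a line break and a tab.
chunk = bytes([0x0E, 0x00]) + b"Make me laugh!" + b"\n\t"
print(hex_panel(chunk))
print(text_panel(chunk))
```

The space inside the string survives as a space while the four non-printable bytes all collapse into periods, so the right panel alone cannot tell you which byte was which; the left panel can.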

Suppose we want to make the program output “Do not cry...” instead of that one (naturally, if this were a VN it’d say something like “日本にようこそ” and we’d want to replace it with “Prepare to kick some arse!”). How do we do it, you ask? Well, what’s the first thing that comes to your mind? Right, simply overwrite the above bytes (from “M” 4D to “!” 21) with the new string – as if we were editing it in Notepad. In case of 010 Editor, we put the cursor on the letter “M” in the second panel, make sure that OVR (not INS) is displayed in the bottom-right corner of the program’s statusbar (by clicking on it or pressing Insert on the keyboard) and start typing as if in Notepad. OVR means “overwrite mode” while INS means “insert mode” (we want to replace the text, not append to it).

When you finish typing by inserting the final period (of “cry...”) the cursor will have stopped before the exclamation mark that was left from the original string. What do we do about it? Long story short: hit Del and eradicate it.

Welcome to Stage _e№!

Here’s what we have got: 0E 00 44 6F 20 6E 6F 74 20 63 72 79 2E 2E 2E.

Those 0E 00 bytes look exactly like the string’s length, don’t they (blue selection on the above screenshot)? We need to do a quick conversion to check this supposition: in 010 Editor, hit F11 (Tools | Base Converter) and enter 0E in the hex field – it’s 14 in decimal notation, exactly the length of the original “Make me laugh!”. Actually, you could also do this by setting the caret before 0E and looking in the Inspector panel, under Unsigned Short.

You can also use the lightweight Notepad 2e for Windows which can not only convert the bases but also calculate entire expressions, encode strings, etc.

A Quick Intro to Number Representation in Memory and on Disk

One thing I found confusing in the beginning was that the bytes of a number are stored in reversed order. For example, if we have a 2-byte number and its value is 255 then it will look like FF 00 in the machine’s representation, not 00 FF as we humans would write it (with leading zeros – 000,…,255).

This notation is called Little-Endian or Intel’s byte order – and there’s also Big-Endian byte order which would be exactly 00 FF. However, Intel’s byte order is most widely used on the desktops. On the other hand, Big-Endian is used in network transfers so if you end up reversing an MMO’s protocol – you’ll be seeing it a lot ;)
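If you have Python at hand you can check both byte orders without a base converter. This is just an illustration using the standard `int.from_bytes` and `struct` facilities; the variable names are mine.

```python
import struct

raw = bytes.fromhex("0E 00")                 # the two bytes in front of our string

little = int.from_bytes(raw, "little")       # Intel order: low byte first
big = int.from_bytes(raw, "big")             # network order: high byte first
print(little, big)

# The same with struct: "<H" is a little-endian unsigned short, ">H" a big-endian one.
as_le = struct.pack("<H", 255)               # bytes FF 00
as_be = struct.pack(">H", 255)               # bytes 00 FF
print(as_le.hex(), as_be.hex())
```

Reading the same two bytes in the wrong order gives a wildly different number, which is usually a good hint that you have guessed the byte order wrong.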

Oh, and I forgot to tell you an important rule: before touching any files always make sure to back up the original versions in case you mess them up.

So now update that length byte (0E becomes 0D since “Do not cry...” is 13 characters long) and re-run the script. Wow, that’s cool, the script is working! Job done 8)
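The same manual edit can be scripted. Below is a small Python sketch that swaps one string for another and rewrites the 2-byte little-endian length in front of it. The helper name is hypothetical and the sample bytes only imitate the part of the file we have looked at so far; it touches the length prefix and nothing else.

```python
import struct


def replace_prefixed_string(data: bytes, old: bytes, new: bytes) -> bytes:
    """Swap `old` for `new` and rewrite the 2-byte little-endian length before it."""
    at = data.index(old)                     # raises ValueError if the string is absent
    if at < 2:
        raise ValueError("no room for a length prefix before the string")
    (stored,) = struct.unpack_from("<H", data, at - 2)
    if stored != len(old):
        raise ValueError("the two bytes before the string are not its length")
    return data[:at - 2] + struct.pack("<H", len(new)) + new + data[at + len(old):]


# Made-up sample: length prefix, the string, then two arbitrary trailing bytes.
original = bytes.fromhex("0E 00") + b"Make me laugh!" + b"\x01\x02"
patched = replace_prefixed_string(original, b"Make me laugh!", b"Do not cry...")
print(patched.hex(" ").upper())
```

On the real file you would read runme.dat with `open(..., "rb")`, run the function and write the result to a copy, keeping the original as a backup.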

…Well, not exactly so as it turns out after a bit of investigation – the script works normally only if you pick the second choice on the following branch, otherwise the program crashes. It says that the bytecode is corrupted… Oh noes, did it die?

I’ll now allow you to explore the script with the hexed so you can try to find where the trouble is. It’s fine if you can’t fix it yet (else you wouldn’t be needing this tutorial) but avoid blindly following in my footsteps – poking around on your own will accelerate your learning. When you’ve used up all your mental powers – carry on to the most fascinating part, and one that trips up most acolytes – debugging.

Part 2 – IDA Pro & The Assembler

A debugger is an incredible tool that allows us to read other people’s minds… even if figuratively. But, it does allow us to literally read a program’s code and memory which is a good compromise, eh?

For this part we need IDA Pro (version 6 is better if you can get it). Unpack it somewhere (it works without installation) and Open our ScenarioRunner.exe (IDA might show a “friendly start-up dialog” but being friendly doesn’t go well with being IDA so just skip it). After picking the EXE file IDA will display a dialog box about its file type; we don’t need it – hit Enter.

After a few seconds IDA will have finished disassembling the EXE. You’ll notice this – the bulb icon on the right of the third panel row from the top changes color from yellow to green.

IDA has a terrible interface when you see it the first time, even compared to Olly (which is not pretty). IDA both looks counter-intuitive and works counter-intuitively until you have gained some experience with it. Luckily, it should not take you more than a few days to understand the basic concepts and remember basic keyboard shortcuts.

That being said, be ready (1) to use the keyboard a lot, and (2) to remember many shortcuts – you simply can’t operate IDA using context menus alone because items in those menus show and hide chaotically in different contexts. Moreover, IDA’s main and context menus (and help files!) actually miss most commands so hotkeys are the only way to call them.

Oh, and also remember this: IDA has no “Undo” command. If you mess up your file and you don’t know how to repair it – you’ll have to reload it losing all changes since the last save… And saving is performed not by Ctrl+S but by Ctrl+W (which typically closes the tab or window in “normal” programs). Did I say IDA was counter-intuitive?..

So, we should be at public start right now. If for some reason you’re not there, hit G and type “start”, then hit Enter.

We’ll now jump straight into the action – let’s run the program from within IDA and see how it works. Unlike Olly, IDA has several debuggers but most of the time you will need just one – select Local Win32 Debugger from the menu and press F9 to run it. (In IDA Pro version 5 you don’t need to select it because Local Win32 Debugger is the only one it supports.)

Nothing extraordinary happened. The program runs as it does without a debugger – and that’s exactly what we want since we can peek at its inner mechanics while it’s running around peacefully… muhaha >:3

Beeル Break: A One-Page Intro to the Assembly Language

Here I present some very fundamental info, the bare minimum. You will learn the rest of asm as/if you keep going.

You might already know that any CPU has registers which act like memory slots (RAM) but are much faster. A “register” can hold a 32-bit value. The registers we will need for our purposes (omitting x86_64 registers and specialized registers like FPU’s):

EAX, EBX, EDX, ESI, EDI, EBP – general-purpose registers (exact use depends on a compiler). Consider them as simple variables (as in math – X, Y, etc.).
ECX – normally used as a counter in loops: for (ECX = 0; ECX < length; ECX++) { ... }
ESP – “Stack Pointer”, points to the top of a thread’s stack. A stack grows every time a function is called and shrinks every time a function returns. Functions put their own variables there (in addition to the registers). Stack values on addresses < (less than) ESP are unused (free).
EIP – “Instruction Pointer”. Points to the next instruction to be executed by the CPU.

Most registers can be directly changed by instructions like MOV (“move”) except EIP which is changed by J*, CALL, RET, etc.

All these registers are prefixed with “E” for a reason: each “E*” (“Extended”) register holds a 32-bit value. In fact, you can address registers ending in “X” (which probably also meant “eXtended” in even older times) as 16-bit and 8-bit: EAX → AX (16-bit, the lower half of EAX) → AL & AH (8-bit, the lower and higher halves of AX). If AL is “L” and AH is “H” then the layout of bits of AX (one letter – one bit) is like this: HHHHHHHH LLLLLLLL. Changing Low/High parts of a register doesn’t affect its other parts. Note that you can’t access the top 16 bits of “E*X” in this fashion.
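The relation between EAX, AX, AL and AH is plain bit masking, which the following Python sketch imitates. The helper names and the sample value are mine; a real CPU obviously doesn’t call functions for this.

```python
eax = 0x12345678

ax = eax & 0xFFFF           # lower 16 bits of EAX
al = eax & 0xFF             # lowest 8 bits ("L")
ah = (eax >> 8) & 0xFF      # next 8 bits ("H")
print(hex(ax), hex(al), hex(ah))


def set_al(eax: int, value: int) -> int:
    """Like MOV AL, value: only the lowest byte of EAX changes."""
    return (eax & 0xFFFFFF00) | (value & 0xFF)


def set_ah(eax: int, value: int) -> int:
    """Like MOV AH, value: only bits 8..15 of EAX change."""
    return (eax & 0xFFFF00FF) | ((value & 0xFF) << 8)


print(hex(set_al(eax, 0xAA)), hex(set_ah(eax, 0xBB)))
```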

Constructions like EDX:EAX create 64-bit “super-registers” (EDX holds the upper 32 bits and EAX the lower ones) but you will rarely see them – certainly not in this project and not in most VN engines.

Now for the basic assembly instructions:

MOV dst, src

Copies src into dst. MOV EAX, 2 is the same as EAX = 2; in C. Most operations that have the form of OP reg, otherReg (like this MOV) put the result back into reg thus modifying its content: reg = reg [op] otherReg;.

That said, some assemblers (notably on *nix, where the so-called AT&T syntax is common) render arguments in reversed order: MOV src, dst. Windows assemblers tend to use Intel’s notation so we won’t be concerned about this here.

MOV/LEA dst, [src]

[ ] around an operand cause that memory address to be read (like *(src) in C or (src)^ in Pascal). If EDX = 0x401FDE20 then MOV EAX, [EDX+4] sets EAX to whatever DWord is at 0x401FDE24.

LEA does the calculation only, no dereference so LEA EAX, [EDX+4] sets EAX to the number 0x401FDE24 (not to the content at that address). This instruction exists because in x86 asm you cannot really write MOV EAX, EDX+4 – you have to use [ ] to calculate expressions, and using them for MOV automatically implies dereferencing.
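Here is a toy Python model of the difference, reusing the address from the example above. The 16-byte “memory” and the DWord stored in it are invented for the illustration.

```python
import struct

BASE = 0x401FDE20
memory = bytearray(16)                          # pretend RAM starting at BASE
struct.pack_into("<I", memory, 4, 0xCAFEBABE)   # a made-up DWord stored at BASE+4


def read_dword(address: int) -> int:
    """Dereference: fetch the little-endian DWord stored at `address`."""
    return struct.unpack_from("<I", memory, address - BASE)[0]


edx = BASE
eax_after_mov = read_dword(edx + 4)   # MOV EAX, [EDX+4] -> the content at that address
eax_after_lea = edx + 4               # LEA EAX, [EDX+4] -> just the calculated address
print(hex(eax_after_mov), hex(eax_after_lea))
```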

PUSH src / POP dst

Puts src into / gets dst from the stack. A “stack” is simply a memory area which is aligned on a 4-byte boundary (i.e. each item in a stack is a DWord that starts on an address which can be divided by 4). The address of the last used stack value (“stack top”) is stored in ESP.

XOR reg, byKey

Bitwise XOR: reg = reg ^ byKey;. Has two notable uses: encryption and assigning zero to a register (as anything XOR’ed against itself is 0). The latter is because on the x86 platform it’s more space-efficient than MOV (XOR EAX, EAX takes 2 bytes while MOV EAX, 0 takes 5) and at least as fast. So you can think of XOR EAX, EAX as of MOV EAX, 0 or EAX = 0;.
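Both uses are easy to try out in Python. The single-byte key below is an arbitrary value picked for the demo and is not taken from any particular game; real engines may use longer keys or entirely different schemes.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with the same one-byte key; applying it twice restores the data."""
    return bytes(b ^ key for b in data)


eax = 0xDEADBEEF
zeroed = eax ^ eax                      # what XOR EAX, EAX leaves in EAX

secret = xor_bytes(b"Hello!", 0x5A)     # "encrypt"
restored = xor_bytes(secret, 0x5A)      # "decrypt" with the very same operation
print(zeroed, secret.hex(" "), restored)
```

That symmetry is why XOR “encryption” is so popular in game archives: one routine serves for both packing and unpacking.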

AND/OR reg, byReg

ADD/SUB/IMUL/IDIV reg, byReg

Other Boolean (bitwise) and integer arithmetic operators: reg = reg + byReg;.

TEST/CMP reg1, reg2

These two make asm’s conditions tick. It’s hard to explain in brief what exactly they do (CMP performs a subtraction and TEST a bitwise AND, both throwing the result away) but usually it’s enough to keep in mind that the outcome of running these instructions is put into the flag register (EFLAGS, which debuggers often display as EFL) which is then accessed by J*. E.g. if we did TEST EAX, EAX (or CMP EAX, 0) and our EAX was 0 then ZF (zero flag) is set, and if one of the following instructions is JZ addr then it will “jump”, while a JNZ addr in its place would be skipped over.

JMP/J* reg/addr

These make execution resume from another location. Used in conjunction with TEST and CMP. You can think of them as of MOV EIP, reg. JMP is an unconditional jump while its other forms (JNZ, JGE, etc. which I call J* here) are conditional jumps testing bits in the flag register, such as ZF (zero flag or zero bit).
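As a rough model of how a comparison and a conditional jump cooperate, consider this Python sketch. It tracks only the zero flag, the function names are mine and the two addresses are made up.

```python
MASK32 = 0xFFFFFFFF


def zf_after_cmp(a: int, b: int) -> bool:
    """CMP a, b: subtract, throw the result away, keep the flags (only ZF modelled)."""
    return ((a - b) & MASK32) == 0


def zf_after_test(a: int, b: int) -> bool:
    """TEST a, b: bitwise AND, throw the result away, keep the flags (only ZF modelled)."""
    return (a & b) == 0


def jz(zf: bool, target: int, fall_through: int) -> int:
    """JZ: continue at `target` if ZF is set, otherwise at the next instruction."""
    return target if zf else fall_through


def jnz(zf: bool, target: int, fall_through: int) -> int:
    """JNZ: the opposite condition."""
    return fall_through if zf else target


eax = 0
zf = zf_after_test(eax, eax)                 # TEST EAX, EAX
next_eip = jz(zf, 0x00401000, 0x00400F05)    # JZ 00401000
print(zf, hex(next_eip))
```

The real flag register has more bits (carry, sign, overflow and so on) which other J* forms look at, but the mechanics are the same: one instruction sets the flags, a later one reads them and picks the new EIP.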

CALL reg/addr / RET [bytesToPop]

As their names suggest, they have something to do with function calls. CALL can be thought of as a shortcut for PUSH EIP; JMP addr and RET – for POP EIP. RET can be sometimes written RETN.

Attention: RET xxx has nothing to do with the function’s return value as you might have thought. Functions usually (depending on their “calling convention”) return their result in EAX (note this!). xxx is the number of bytes to pop off the stack (“cleaning the stack” on return) after the return address has been taken from it. So it’s like: ADD ESP, bytesToPop (the stack grows towards lower addresses, so popping increases ESP) or like this loop with POP:

while (bytesToPop > 0) { POP tmp; bytesToPop -= 4; }

If you’re interested why this form exists and why it’s rarely used (particularly in C and C++) read about calling conventions – stdcall (as used in WinAPI) and those of Pascal and C (also see a side note later). But it’s not vital to know in this tutorial.
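To see PUSH, POP, CALL and RET working together, here is a toy stack in Python. It assumes a stack that grows towards lower addresses in 4-byte steps, as on 32-bit x86; the class, the starting ESP and the pushed values are all invented for the demo.

```python
class Stack:
    """A toy 32-bit stack: ESP points at the last pushed DWord and grows downwards."""

    def __init__(self, top: int = 0x0012FF00):   # arbitrary starting ESP
        self.esp = top
        self.mem = {}

    def push(self, value: int) -> None:
        self.esp -= 4                 # PUSH first moves ESP down by one DWord...
        self.mem[self.esp] = value    # ...then stores the value there

    def pop(self) -> int:
        value = self.mem[self.esp]
        self.esp += 4                 # POP frees the slot by moving ESP back up
        return value


def ret(stack: Stack, bytes_to_pop: int = 0) -> int:
    """RET n: pop the return address into EIP, then drop n more bytes of arguments."""
    eip = stack.pop()
    stack.esp += bytes_to_pop
    return eip


stack = Stack()
esp_before = stack.esp
stack.push(0x11)               # caller: PUSH second argument
stack.push(0x22)               # caller: PUSH first argument
stack.push(0x00413F98)         # CALL: pushes the address of the instruction after it
eip = ret(stack, 8)            # callee: RET 8 removes both arguments
print(hex(eip), stack.esp == esp_before)
```

After RET 8 the stack pointer is back where it was before the arguments were pushed, which is exactly what “the callee cleans the stack” means.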

Back To IDA

This should be enough to get you started. Now we can go back to the disassembly. I’ve uploaded some docs about Intel asm instructions so you can always consult them – they have every existing instruction, literally every one (yes, modern desktop CPUs are woefully complex).

So now we’re on public start. We don’t yet understand anything because everything has meaningless names like CALL sub_40276C – right, “sub_” is a prefix indicating a function (“subroutine”) but what does this name tell us? Nothing, that’s why we should start giving things proper names!

I’ll give you a pointer. Gray lines starting with “;” are comments and IDA (as well as Olly) puts some useful info into those areas. For example, we see this:

    MOV     EDX, offset aScenariorunner   ; "    * * * ScenarioRunner demo... "

Can you guess what it is? It’s a reference to a string (address of its first byte), and “aScenariorunner” is a name that IDA has auto-chosen for this string (if you go to Options | General | Strings you will see the Prefix field which you can change; default is “a”). This particular autogenerated name looks okay so we’ll leave it as it is.

Let’s think about the purpose of the code block where it’s used:

    CODE:00413F8E                 MOV     EDX, offset aScenariorunner
    CODE:00413F93                 CALL    sub_4049E4
    CODE:00413F98                 CALL    sub_4032E4
    CODE:00413F9D                 CALL    sub_40276C

We can make a deduction that one of these functions outputs a line to the console. Which one? Best guess is the first since it seems to accept EDX as an argument and because it’s closer to MOV than other CALLs. However, this might not be the case – maybe sub_4049E4 is only preparing something and its result is passed to a later CALL which is the actual “write-to-console” function.

So let’s find it out. What we need are breakpoints. A breakpoint is just a “point” at which normal program execution will “break” (pause) and the debugger will take control, allowing us to inspect the thing. Both in Olly and IDA breakpoints are set by F2; in Delphi – by F5.

Put a BP on the first function call (sub_4049E4). After you press F2 the line will be highlighted in red.

Now start the program by F9 and IDA will immediately pause on the BP we’ve just set. Look at the Scenario Runner’s console window – confirm that it’s still empty. Hit F8 now to “step over” the current instruction. Now the cursor is on the line with a call to sub_4032E4. Look at the console again – huh, it’s still empty! Then my guess was wrong. But no problem – press F8 again and yup, that second function was indeed the one we were looking for because a line was produced on screen. We must have found the output function! Isn’t that great?

Put the cursor on the second CALL, to anywhere inside the sub_4032E4 text and press N – this opens a rename dialog. Enter some meaningful name for the function, e.g. “WriteLn” as it is called in Delphi and press OK. By the way, I suggest you prepend the names you give with some symbol (I use “$”), so that (1) you can quickly distinguish the names you gave from the autogenerated ones (2) they appear on top of the name list when sorted.

So, I’ve named this function $WriteLn.

We should also take care of sub_4049E4 – although we don’t know what it does we still need to give it some name so when we see it next time we can at least remember that we have already encountered it in this context. Since we don’t know the call’s purpose let’s name it something like $IsCalledBeforeWriteLn – we can always rename it later.

If we don’t do this we’ll most likely end up in a situation when we’re lost in a mess of unnamed functions, although we might have seen many of them – we just don’t recognize cryptic names like sub_40FB68 as something that we already know. Your own names, even if they are just “SpookyGizmo” or “CalledLastNight” make navigating the boundless assembly code more doable and eventually might point you in the right direction.

Now it’s time to take care of the weird IDA Debug workspace that it creates by default. You can customize it as you like. Here’s mine, for example:

So, we have used one method of determining a subroutine’s purpose – by examining string(s) that it accepts. In fact, strings are like beacons for us reversers in the ocean of asm code – strings are what we see and what connect us to the original program source, which is mangled but thriving deep within the disassembly…

And we’ve also used another method – by setting a BP before a function, stepping over it and looking at what has changed after its execution. This doesn’t always work, especially in GUI apps, but it’s the shortest way if it does.

You can explore the functions on your own now and when you’re done we’ll begin to search for the actual interpreter’s loop which is like the Holy Grail for a VN hacker.

Part 3 – Finding Clues and The Case

Summon us and thou shall see the light for we are the Imported Ones…

True, strings are like beacons but there’s an even better thing – imported functions. They also connect us to the program’s source code, although in a more subtle way than strings because we don’t exactly see them on screen but rather feel them being used somewhere in the core, he-he >:3

The table of imported functions is the number one target for EXE “protectors” – programs implementing various tricks so that debuggers and disasms (which usually come packaged as a single tool) like IDA and Olly won’t see it… at least without some effort. However, our EXE is not protected (and most VNs are not) and so its import table is simply an array of DWords – pointers to each function’s first instruction (in kernel32.dll or elsewhere) and hence direct arguments for CALLs.

Functions can be imported from any DLL (a DLL is a program similar to EXE but instead of being run by a user it’s loaded into another program without the user’s interaction) – usually from system DLLs in C:\Windows and therefore they are identical for all programs (practically every Windows app ends up reading files via ReadFile).

However, a few VNs ship their own DLLs which contain all the game logic (including the interpreter’s loop that we’re seeking) while their “main” EXE file is importing functions from that DLL. In this case you’d see not ReadFile being imported in the EXE but something like LoadGameArchive and you’ll have to debug the DLL (which does import ReadFile) instead of debugging the EXE – but this is outside the scope of this tutorial (although the process is very similar).

Here is a practical example. Say, a program is drawing something on screen – some text. And this text just doesn’t look good when another language is used. Ｔｈｉｓ　ｉｓ　ａ　ｆｒｅｑｕｅｎｔ　ｉｓｓｕｅ　ｗｉｔｈ　Ｊａｐａｎｅｓｅ　ｇａｍｅｓ　ｓｉｎｃｅ　Ｊａｐａｎｅｓｅ　ｉｓ　ｕｓｉｎｇ　ｍｏｎｏｓｐａｃｅｄ　（ｓｑｕａｒｉｓｈ）　ｆｏｎｔｓ　ｔｈａｔ　ｆｏｒ　Ｗｅｓｔｅｒｎｅｒｓ　ｌｏｏｋｓ…　゛ｕｎｎａｔｕｒａｌ゛、　ａｔ　ｂｅｓｔ。

So we want to replace the standard font the program is using. We know that there’s a WinAPI function CreateFont which among other things accepts the name of the font to create. We search for it, change the name by patching the EXE file and voila – the game properly displays our texts in a neat font!

Exactly this patching was implemented in my localization of TrueWorld～真実のセカイ～ although there it is done on run-time and doesn’t involve modification of the EXE file on disk.

You will notice that almost every system function ends on either “A” or “W” (e.g. TextOutA and TextOutW). “A” stands for ASCII while “W” stands for Unicode (also called “Wide” because each symbol takes up 2 bytes instead of 1). ASCII strings take up less space than Unicode ones but they are only able to represent a limited set of characters which is a common PITA when localizing programs whose authors believed that the only natural language on Earth is Japanese… or English.

A Primer on Charsets. ASCII, Shift-JIS and “Wide” Unicode – Why Care?

Even though I’ve said that “A” stands for ASCII, it might have been true only in very early versions of Windows (Microsoft’s own documentation calls these the “ANSI” versions). Later the OS received support for languages other than English (which ASCII can’t represent) along with the setting in Control Panel | Region | Advanced | Language for Non-Unicode Programs which affects “A” functions (but not “W” which appeared even later, when the “A” legacy had already accumulated). For example, they will start using Windows-1251 charset if set to “Russian”. For Japanese a de-facto standard encoding is Shift-JIS which is widely used in VNs. (Note that here I’m using the terms “charset” and “encoding” interchangeably even though strictly speaking they mean somewhat different things.)

Some VNs use various forms of Unicode (like UTF-8 or UTF-16) or even homebrew encodings but this should be dealt with on a case-by-case basis. Of all cases, it’s easier to localize games that use Unicode, which is basically short for “Universal Encoding” and can represent symbols of virtually all languages existing on Earth (or that have existed in the past, like Old Slavonic). One quick way of determining if it’s the case with a particular game is to switch Language for Non-Unicode Programs to some exotic locale (e.g. Arabic), reboot and run the game (without AppLocale if under WinXP) – if it works fine then it’s Unicode-based but if it shows mojibake instead of kanji then it’s probably Shift-JIS and you are out of luck – the game will have to be patched.

Shift-JIS is easy to spot because each full-width character is encoded as a pair of bytes, the first of which is >= 0x81 (81-84 for full-width English, Cyrillic, punctuation and kana, 88 and higher for kanji), while plain ASCII characters remain single bytes. For example, “日本にようこそ” from the lesson for n00bz is represented like this:

    93 FA 96 7B 82 C9 82 E6 82 A4 82 B1 82 BB

I suggest Notepad 2e for converting between different encodings and their disk representations (“hex dumps”). Open a new window, switch File | Encoding | More… (F9), paste the hex dump of an encoded string (like one above), select it and press Ctrl+Alt+Shift+A (or call Edit | Encode | Hex To String). To do the inverse, i.e. obtain hex dump from a string, select that string in the text area and call String To Hex (Alt+Shift+A). Tip: if you don’t make any selection then these commands will work on the entire document.
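If you prefer scripting to a GUI, the same conversion takes a few lines of Python, whose standard library ships a `shift_jis` codec. This is only an alternative to the editor route above, not something the rest of the tutorial depends on.

```python
text = "日本にようこそ"

encoded = text.encode("shift_jis")                 # string -> bytes as stored on disk
print(encoded.hex(" ").upper())

# Every character here takes two bytes, so the lead bytes sit at even offsets.
lead_bytes = encoded[0::2]
print(all(b >= 0x81 for b in lead_bytes))

# And the inverse: hex dump -> string.
dump = "93 FA 96 7B 82 C9 82 E6 82 A4 82 B1 82 BB"
decoded = bytes.fromhex(dump).decode("shift_jis")
print(decoded)
```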

In Visual Studio you’d often use functions without any suffix even though such suffix-less functions don’t exist in system DLLs. This works because the suffix is automatically “appended” depending on the existence of a macro like UNICODE so that you can easily switch your program between the “A”/“W” versions.

That said, AFAIK in WinNT 5.0+ all functions ending in “A” are just wrappers for “W” since the core of this OS operates solely on Unicode (and thankfully so!). So if you want to “trap” a function – trap its “W” version and “A” will come as a bonus.

Alright, now let’s get more specific to our problem. We need to find the function that contains the interpreter’s loop – since generally a script interpreter has a loop which reads an instruction from a script, goes through a (potentially very large) switch..case block, “interprets” it and… well, we shall see what’s next once we locate that.

To give you an idea here’s a sample interpreter’s loop written in pseudo-code that understands 3 opcodes, each taking a single argument of type “string”:

    function RunScript(script) {
      pos = 0;
      while (StringLength(script) > pos) {
        switch (script[pos++]) {
          case 0:
            WriteConsole("New message: ", ReadStringFrom(script, pos));
            break;
          case 1:
            varName = ReadStringFrom(script, pos);
            SetVar(varName, Random());
            break;
          case 2:
            scriptName = ReadStringFrom(script, pos);
            RunScriptNamed(scriptName);
            break;
          default:
            throw new Exception("Unknown command's opcode.");
        }
      }
    }
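For those who want to poke at something runnable, here is the same loop as a small Python sketch. The byte layout is my assumption for illustration only (one opcode byte, then a 2-byte little-endian length, then the string); the helper names and the `op` builder are hypothetical too.

```python
import random


def read_string(script, pos):
    """Read a 2-byte little-endian length followed by that many ASCII bytes.
    Returns the string and the position right after it."""
    length = int.from_bytes(script[pos:pos + 2], "little")
    start = pos + 2
    return script[start:start + length].decode("ascii"), start + length


def run_script(script, scripts, variables, output):
    pos = 0
    while pos < len(script):
        opcode = script[pos]
        pos += 1
        if opcode not in (0, 1, 2):
            raise ValueError(f"Unknown command's opcode: {opcode:#04x}")
        arg, pos = read_string(script, pos)   # every opcode takes one string
        if opcode == 0:
            output.append("New message: " + arg)
        elif opcode == 1:
            variables[arg] = random.random()
        else:
            run_script(scripts[arg], scripts, variables, output)


def op(opcode, text):
    """Build one encoded instruction (only needed to create test scripts)."""
    data = text.encode("ascii")
    return bytes([opcode]) + len(data).to_bytes(2, "little") + data


scripts = {"sub": op(0, "inside sub")}
variables, output = {}, []
run_script(op(0, "hello") + op(1, "luck") + op(2, "sub"), scripts, variables, output)
print(output, variables)
```

Note how `read_string` hands back the new position: the loop must always know where the next opcode starts, and that is exactly the kind of bookkeeping a careless script edit can break.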

So, in short: I suggest that we find all calls to ReadFile, set BPs on them, run the program and watch for something to happen.

Open the Imports tab (View | Open subviews | Imports). We could find the function by looking through the list but a faster way is to type the first few characters of the function’s name (simply with that list focused). If you press F1 you’ll get some help on IDA’s lists; they have other handy features like searching by Alt/Ctrl+T.

Press Enter and IDA will transfer you to the tab with the code, to that function’s record in the import table… But we need code that is using it, not just that record so let’s press Ctrl+X… Wait, there’s only one place? That’s strange. Press Enter to go to that part.

Ah, so this is some kind of a wrapper function: JMP DS:__imp_ReadFile. In fact, Delphi tends to use them a lot while Visual Studio’s compiler usually calls the import directly, without such a wrapper. Anyway, we are dealing with Delphi here and we need the actual code so let’s find what refers to this function – again, press Ctrl+X. We got another two matches, great. Go to each of them and set a BP there.

Now, if you were paying attention then you must have noticed that there were in fact two ReadFile entries in the Imports list. This can be – and we’ll have to set BPs on all of them. For me that second entry is also a wrapper and Ctrl+X inside that wrapper reports 1 more call meaning that in total you must have 3 BPs set.

Now we have 3 BPs set. As a reminder, we are looking for a place hinting at some connection with runme.dat. Let’s roll! F9 – and we got the first client!

Among other arguments, ReadFile takes a file handle, a buffer and a number of bytes to read. The first is the best clue we can have since it’s a unique ID connected with a given file… but to match it to a real file on disk we’d need to observe CreateFile (which takes a file name and returns that unique ID – different on each run!). We could set more BPs on calls to CreateFile, observe when it is called with lpFileName set to …\runme.dat and note down somewhere the file handle it returns (by the way, do you remember that functions usually return their result in the EAX register?)…

Nah, going for CreateFile is too troublesome. We’d like to avoid doing more steps than necessary (since that’s something we can always make up for). First, let’s examine other clues – we have two more useful arguments: bytes to read and buffer. Well, try bytes to read, shall we?

With execution paused on the CALL ReadFile line, examine the series of PUSHes before it – each one is “passing” a parameter (in reverse order, so the PUSH closest to CALL is the left-most argument if you look at the function’s declaration in C or Pascal below). The Stack view (reopen it from the Debugger | Debugger windows menu if it’s not visible) highlights last “passed” parameter (on “stack top”) with a blue line (if you don’t see it then right-click anywhere in this view and call Jump to ESP). The line under the highlighted one is the previous parameter (i.e. the second from the left in a C/Pascal declaration). By matching PUSHes with these lines you can see which arguments were passed to ReadFile. (Later we’ll see how to use IDA’s hints for this.)

    BOOL ReadFile(HANDLE hFile,                /* top   */
                  LPVOID lpBuffer,             /* top-1 */
                  DWORD  nNumberOfBytesToRead, /* top-2 */
                  ...);

Stack structure before a call to ReadFile

An Optional Lesson on Wandering Stack Frames

This stuff is optional and you can get along without it quite well so feel free to skip this section.

A function may receive some parameters and it may also need some space for its local variables. Both may reside on the stack (in C programs this is so, in Delphi – not so much, it’s using registers instead which are faster to work with than stack memory). But as soon as you push something on the stack – ESP’s value is changed and if you used to refer to your first parameter as [ESP+4] – now you have to start referring to it as [ESP+8] and revert back to [ESP+4] when that “something” gets popped.

When a compiler builds a program it calculates precise stack offsets and uses them to determine which parameter is where in every given point during the function’s execution. It also tracks all cases when the stack is modified and adjusts the calculated offsets accordingly. This is a complicated job but a machine can handle it. However, sometimes it’s impossible even for a compiler to pre-calculate all offsets because whether a function POPs (or PUSHes) depends on some run-time conditions (e.g. user pressing a key) and the compiler isn’t a seer to know what will happen – in this case it employs “stack frames”.

Let’s say we have this code:

    PUSH EAX     ; let this be AFunc's first and only argument
    CALL AFunc
    NOP          ; the return point; NOP - "no operation", does nothing

    AFunc proc near
      ; this function simply copies its argument to ECX and returns
      MOV ECX, [ESP+04h]
      RET 4
    AFunc endp

Here’s what happens with ESP:

PUSH EAX

ESP is decreased by 4 (the stack grows towards lower addresses and values on the stack in x86 are each 32-bit, or 4 bytes) and the newly claimed piece of memory gets written (by PUSH) the current value of EAX (later it doesn’t change if EAX changes, of course). To refer to this newly pushed argument we could write: [ESP+0] – but only immediately after this PUSH.

CALL AFunc

Since CALL is a JMP that additionally PUSHes the return address (of the “return point” – NOP in our case) onto the stack then ESP is again shifted by 4 bytes so in order to access the first argument we could now write: [ESP+4] – and that’s exactly what AFunc does after the CALL.

MOV ECX, [ESP+04h]

AFunc refers to that argument we have pushed. The stack is unchanged.

RET 4

As I’ve mentioned in the asm intro the number after RET doesn’t mean what to return but how many bytes of arguments to pop off the stack when the function returns. In our example the function accepted 1 argument (there was one PUSH) so that needs to be removed from the stack. If it fails to do that the program will sooner or later crash – RET expects the return address to be on the stack’s top (CALL is just PUSH+JMP and RET is POP+JMP), so if a forgotten argument is still lying there when some RET executes, it will “return”, or “jump”, to a random address which is really that argument. Buffer overflow attacks exploit similar miscalculations on behalf of the program by “returning” to where the attacker can do nasty things.

To summarize, it’s critical that by the time the function is executing its RET, the value of ESP is identical to what it was on entry to that function, i.e. right after CALL has pushed the return address.

In our example RET 4 reads the return address from the stack (like POP, incrementing ESP by 4), then increments ESP by another 4 bytes to throw away the argument and finally sets EIP to that address so the execution continues after the original CALL (at the return point where NOP is executed).
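If you find it hard to follow ESP in your head, the walk-through can be replayed with a toy model. This is only a sketch: the `Stack` class, the starting ESP value and the return address are made up, and real memory is of course not a Python dict.

```python
class Stack:
    """A toy x86 stack: 32-bit slots keyed by address, growing downwards."""

    def __init__(self, esp=0x0012FF80):       # arbitrary starting address
        self.esp = esp
        self.memory = {}

    def push(self, value):
        self.esp -= 4
        self.memory[self.esp] = value

    def pop(self):
        value = self.memory[self.esp]
        self.esp += 4
        return value

    def read(self, offset):
        """What [ESP+offset] refers to right now."""
        return self.memory[self.esp + offset]


RETURN_POINT = 0x00401008    # hypothetical address of the NOP after the CALL
stack = Stack()
esp_at_start = stack.esp

stack.push(0xCAFE)           # PUSH EAX: the argument is at [ESP+0]
stack.push(RETURN_POINT)     # CALL AFunc: the argument moved to [ESP+4]
ecx = stack.read(4)          # MOV ECX, [ESP+04h]
eip = stack.pop()            # RET 4, step 1: pop the return address
stack.esp += 4               # RET 4, step 2: discard 4 bytes of arguments

print(hex(ecx), hex(eip), stack.esp == esp_at_start)
```

After the last line ESP is back where it was before the PUSH, which is what a balanced call must achieve.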

A more complex example:

    MOV EBX, 123
    PUSH EAX    ; arg #2
    PUSH EBX    ; arg #1
    CALL AFunc  ; before CALL happens: arg #1 = [ESP+0], arg #2 = [ESP+4]
    ; EBX here is still (again) 123

    AFunc proc near
      PUSH EBX  ; saving registers that this...
      MOV EBX, [ESP+0Ch]  ; arg #2
      PUSH EAX  ; ...function is using internally
      MOV EAX, [ESP+0Ch]  ; arg #1 - note that it's also [ESP+0Ch]

      ; ...here AFunc is doing something useful with EBX and EAX...

      POP EAX   ; it's of utter importance that values are...
      POP EBX   ; ...restored (popped) in proper order, i.e. in reverse
      RET 8
    AFunc endp

Notice how PUSHing in AFunc affects ESP offsets by which it refers to its arguments and how it restores original values of the registers that it has used before RETurning so that they’ll keep their values if the caller was using them too before calling AFunc.

Finally, you will encounter this construction in the beginning of almost every (C) function:

    PUSH EBP      ; the function's start ("prologue")
    MOV  EBP, ESP
    SUB  ESP, ...
    ; ...the function's actual code...
    MOV  ESP, EBP  ; the function's end ("epilogue")
    POP  EBP
    RET  ...

A function of this form is said to be creating a stack frame. This “frame” is used to save the original value of ESP (address of the top of the stack when the function was called) so the compiler doesn’t have to bother about shifting argument offsets after each PUSH instruction in the function’s body (like in the previous example) because it can always refer to its arguments and variables like so no matter what happened to the stack after the call:

    MOV EAX, [EBP+8]   ; first argument; EBP, not ESP anymore

A function’s local variables are also “pushed” on the stack but after the function was CALLed (unlike arguments which are pushed before). However, there are no PUSH instructions because reserving the space for these variables is done by just SUBing ESP. Such variables are referenced using a negative offset ([EBP+0] holds the saved EBP and [EBP+4] holds the return address):

    MOV EAX, [EBP-4]   ; first local variable
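To keep the offsets straight, here is a small sketch that lists what sits where relative to EBP once the prologue above has run. I’m assuming 4-byte arguments and locals and arguments pushed right to left; the function name is made up.

```python
def frame_layout(arg_count, local_count):
    """Return {offset from EBP: description} for a standard stack frame."""
    layout = {0: "saved EBP", 4: "return address"}
    for index in range(arg_count):
        layout[8 + 4 * index] = f"argument #{index + 1}"
    for index in range(local_count):
        layout[-4 * (index + 1)] = f"local variable #{index + 1}"
    return layout


layout = frame_layout(arg_count=2, local_count=2)
for offset in sorted(layout, reverse=True):
    sign = "+" if offset >= 0 else "-"
    print(f"[EBP{sign}{abs(offset):02X}h]  {layout[offset]}")
```

Arguments live above EBP, locals below it, and the two housekeeping values (saved EBP and return address) sit in between.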

My numbers are: hFile = 0x50 (doesn’t tell me much, and yours will be different) and nNumberOfBytesToRead = 0x0143. Let’s hit Shift+/ and open IDA’s calculator, which can be also used for base conversion (although 010 Editor’s or Notepad 2e’s are probably more convenient). Let’s enter 0x0143 – I see that’s 323 in decimal which I suspect is… Hurray! Check the size of runme.dat – it’s exactly 323 bytes. How handy, the program seems to read the entire file into memory.

If you’re getting 0x80, not 0x143 then you have missed the second import entry. Game over, try again!

To be honest, you’ll hardly ever land such a hit on the first call to ReadFile because (1) files usually have some kind of header which is dozens of bytes in size and which is read before the actual file’s content, (2) programs may read other unrelated files (like configuration) before reading scripts. Still, if you keep on skipping (F9) and watching ReadFile calls you may eventually find that a program reads some large chunk of data – and if you compare it with sizes of the known archive or scenario files you might determine that it differs only a little from one of them.
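Comparing a read size against a folder full of game files gets tedious by hand, so here is a hedged sketch of how it could be scripted. The function names, the tolerance and all file names except runme.dat are assumptions of mine.

```python
import os


def file_sizes(folder):
    """Map each file name in folder to its size in bytes."""
    return {entry.name: entry.stat().st_size
            for entry in os.scandir(folder) if entry.is_file()}


def closest_by_size(sizes, bytes_read, tolerance=64):
    """Names whose size is within tolerance bytes of bytes_read, nearest first."""
    hits = [(abs(size - bytes_read), name)
            for name, size in sizes.items()
            if abs(size - bytes_read) <= tolerance]
    return [name for _, name in sorted(hits)]


# Made-up sizes, except runme.dat which is 323 bytes in this tutorial.
sizes = {"runme.dat": 323, "config.ini": 80, "bgm.ogg": 512000}
print(closest_by_size(sizes, 0x143))
```

In a real session you would call `file_sizes()` on the game’s folder and feed it the nNumberOfBytesToRead value you see on the stack.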

And while we’re on it: there are several notations for writing non-decimal numbers. Hexadecimal ones that we’re using a lot can be prefixed: 0x143 or suffixed: 143h (although sometimes in this text I’m omitting both the prefix and the suffix if it’s clear that it’s a hex number, like here: FF). IDA is not very consistent and can either omit them always or expect only a prefixed form or a suffixed form depending on the context – sadly, this can only be learned by trial & error. Regardless of the format, both 0x143 and 143h mean the same decimal number 323 (decimals are written with no prefix or suffix).
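If you don’t have a calculator at hand, the notations can be converted with a few lines of Python. This is just an illustrative helper (the name is mine) that accepts the prefixed, suffixed and plain decimal forms described above.

```python
def parse_number(text):
    """Parse '0x143', '143h' or plain decimal '323' into an integer."""
    text = text.strip().lower()
    if text.startswith("0x"):
        return int(text[2:], 16)
    if text.endswith("h"):
        return int(text[:-1], 16)
    return int(text, 10)


for sample in ("0x143", "143h", "323"):
    print(sample, "=", parse_number(sample))
print(hex(323))
```

A bare hex number like FF can’t be told apart from a decimal one automatically, which is why this helper insists on a prefix or suffix.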

As for us, we got lucky – our test subject is naive, it’s reading the whole script into the buffer, right into our arms >:3

Now it’s time to track down what it’s gonna do with all that data it has just read.

The program is paused on the call to ReadFile. Before we let it continue, open a new tab with the view of the memory which is going to be filled with data (lpBuffer) after ReadFile returns. IDA offers multiple ways to do that:

Locate the expression with the address of that memory that was PUSHed (in our case it’s just the register ESI), right-click on it and call Jump in a new window, or set the cursor on it and hit Alt+Enter.
Alternatively, call Jump in a new hex window to get a view very similar to a hex editor. This may be more convenient in our case.
If you were to do a double click or press only Enter then it would navigate you to that address in the same code tab. If that happened then go back any time by pressing Esc. This is useful when exploring sub-functions.
Finally, you can locate the line with this parameter (lpBuffer) in the Stack view, right-click on it and call Follow in disassembly (but this will navigate the current tab instead of opening a new one).
…And you could also do that from the context menu of the ESI register in the General registers tab.

Did it? Good, now leave it opened, switch back to the code tab and press F8.

Now IDA is highlighting the line after ReadFile meaning that Windows has read whatever data the program has requested. Switch to the memory tab opened just before – now this is the memory area that has the contents of the file (prior to that it had some junk). Doesn’t it look like our precious scenario bytecode? Compare with what you see in 010 Editor… It sure does!

It’s just about time we use hardware breakpoints to see what the program is doing with this data.

There are two types of breakpoints: hardware (“hwBP”) and software (“swBP”). What we’ve used until now were swBPs and they are triggered when the CPU is about to execute an instruction (in fact, a swBP is simply an asm instruction – INT 03 which the debugger writes over the “real” instruction). HwBPs don’t rely on INT 03 – they are triggered when the CPU executes an instruction that accesses the memory on which any hwBP was previously set (which can happen more than once for a single BP).

We’d like to see what accesses our buffer. We can’t just skip a few instructions or examine subroutines “nearby” the ReadFile call and hope that what we are looking for is there because a program may read data in advance and do something useful to it “much” later (in terms of asm code executed). Hence we are “trapping” it just like a wild beast >:3

As explained earlier, it’s a bad idea to set a swBP over that buffer because that data is not executed by the CPU. If we set a swBP there, we will overwrite some byte with CC (the opcode of INT 03) but this “breakpoint” will never get triggered (because that memory area is never being executed), we’d have simply corrupted the string. So we need a hwBP for this task.
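Here is a toy model of what a debugger does when it sets a swBP, and of what goes wrong when the target is data. It’s a sketch only: a bytearray stands in for process memory and the class name is made up.

```python
INT3 = 0xCC   # opcode of the one-byte INT 3 instruction


class SoftBreakpoints:
    """Remembers the original bytes so that breakpoints can be removed again."""

    def __init__(self, memory):
        self.memory = memory      # a bytearray standing in for process memory
        self.saved = {}           # address -> original byte

    def set(self, address):
        if address not in self.saved:
            self.saved[address] = self.memory[address]
            self.memory[address] = INT3

    def clear(self, address):
        self.memory[address] = self.saved.pop(address)


# On code this works: the CPU runs into CC and the debugger takes over.
code = bytearray(b"\x55\x8B\xEC\x5D\xC3")   # PUSH EBP / MOV EBP,ESP / POP EBP / RET
breakpoints = SoftBreakpoints(code)
breakpoints.set(0)
patched = bytes(code)
breakpoints.clear(0)

# On data nothing ever executes the CC, so the string just got damaged.
buffer = bytearray(b"Do not cry...")
SoftBreakpoints(buffer).set(0)
print(patched.hex(), bytes(buffer))
```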

The number of swBPs is unlimited but x86 CPUs only have 4 debug address registers, so at most 4 hwBPs can be active at the same time.

SwBPs are triggered before the target instruction but hwBPs are triggered after or inside it (if it’s a complex command like REPE MOV* – you will see one later).

Finally, because memory layout usually changes when a program is restarted, it’s best to do as much as you can during a single debug session (I had many sessions lasting for days). But if you do restart then remember to re-set all memory-based BPs (like the one on the ReadFile’s buffer) on new addresses.

Assuming you have that buffer tab active, set the cursor somewhere inside the lpBuffer area (e.g. on its first byte), press F2 – a dialog will appear and IDA will likely have automatically checked the “hardware BP” flag for you. See that Mode is Read (“break when reading this value, not changing it”) and press OK.

We don’t need anything anymore from swBPs set on ReadFile calls so you could remove them – but I suggest only disabling them (from the context menu) so you can get back to them quickly if necessary. You can open the Breakpoints tab by Ctrl+Alt+B, by Debugger | Breakpoints | Breakpoint list or by a button on one of the (many…) toolbars.

Now press F9 and wait until something happens… Here we go – “Hardware breakpoint… has been triggered”. That’s nice, let’s see what we got here…

REPE MOVSD. Well, it might sound scary but it’s simply an asm instruction that copies a block of memory from one location to another. If you want more info consult the Intel docs and search for that instruction.

We can now undertake a challenge of setting BPs on every REPE instruction we get (that’s not the only one, I promise) until we hit something useful… or run out of hwBPs… but we’ll take another route – press (or even better – hold down) F8 until we find something of interest. This way (holding F8) we’ll go up the call tree towards the root (which is public start) because we won’t go inside new functions (we’re not holding F7) and will gradually RETurn from all subroutines. On the way we’ll need to be on the lookout for something of potential value.

…After five functions or so I got tired of this and I decided to press F9 again – maybe I’d find something in another part faster. Duh, IDA has displayed the same function again, just another branch. I guess I need to be more patient with F8 this time… (Yes, RCE involves a great deal of improvisation and uncertainty.)

Looks like we found a case statement’s graph!

After a dozen of returns I stumble upon a wide Graph that looks like a switch..case statement. See those boxes going from one root and then joining together on the bottom? If you think about it, that’s exactly how a case statement could be visualized (more on this below).

IDA has even identified this case for us by putting comments like “switch jump” all around the disassembled code. Olly can do this too but not as well.

Could it be the interpreter’s loop we’re looking for? Let’s check the comments and strings we have in this function. Hmm…

Well, so far the code doesn’t tell me much about its purpose. The only thing that looks interesting to me is a referenced string that IDA put in a comment that says “opcode %.2x” (let’s take a note on this – I wonder why we don’t see it in the console?).

I’ve got an idea: I’ll disable all BPs for now and set one at the beginning of this function… Actually I’ll set it at the case’s beginning – which must be here as suggested by IDA:

    JMP     off_41374A[EAX*4]   ; switch jump

A Word on case and Why it Doesn’t Work on Strings in C/C++ and Delphi

You might think that a case statement for a computer is exactly the same as a series of if statements – and thus you might wonder why a compiler complains that it can’t take a string as a case variable. Back then I also thought a case was the same as if+if+if+… but for compilers it’s totally different. Each case label is actually an index in the case’s “jump table” which just like the import table is simply an array of addresses (but an import CALLs while a case JMPs). And because a case statement accepts an integer (index in that array) there’s in fact no need to compare anything at all – just call JMP caseArray[caseValue] and you land directly on the right spot. It’s so much faster than ifs.

In the code fragment that we have stumbled upon, off_41374A is nothing else than the base address of that caseArray and EAX is its caseValue, which should be multiplied by 4 because every address in a 32-bit CPU is a DWord (in other words, 4 bytes in size). So the target address (after the jump) for the EAX of 1 (i.e. case label 1) is at position 41374A+4.
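The arithmetic is simple enough to check in Python. The first function uses the base address from the listing; the handlers in the second half are hypothetical and only show the same “index into an array of targets” idea in a high-level language.

```python
JUMP_TABLE_BASE = 0x41374A   # off_41374A from the listing above
ENTRY_SIZE = 4               # one 32-bit address per case label


def jump_table_slot(case_value):
    """Address of the table entry that JMP off_41374A[EAX*4] reads."""
    return JUMP_TABLE_BASE + case_value * ENTRY_SIZE


# The same trick in miniature: a list of handlers indexed by an integer.
def show_message():
    return "message"


def set_variable():
    return "variable"


def run_script():
    return "script"


HANDLERS = [show_message, set_variable, run_script]   # made-up case bodies


def dispatch(case_value):
    if not 0 <= case_value < len(HANDLERS):   # a range check guards the table
        return "default"
    return HANDLERS[case_value]()              # no comparisons, just an index


print(hex(jump_table_slot(1)), dispatch(2), dispatch(7))
```

A string can’t serve as `case_value` here for the same reason it can’t in C or Delphi: there is nothing to multiply by 4.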

Of course, interpreted languages like PHP and Ruby don’t have this limitation and for them a case is indeed little more than a bunch of ifs.

So put a BP on that JMP and hit F9… Let’s look at the EAX register (you can either look at it in General registers window or put a mouse pointer over EAX and wait until IDA shows a hint).

The Got-Its and Got-Chas of Memory Hints in IDA Pro

If you hover over off_41374A[EAX*4] as a statement (not over a separate string “EAX” somewhere in the code above or below) IDA will calculate the address (off_41374A + EAX * 4) and show a hint for that location (i.e. a piece of the case’s code branch after the jump would have been taken) rather than the numerical value of EAX (the tested value for the case).

Hints can also help you see a function’s parameters without referring to the Stack view as we did earlier. For example:

    MOV     EAX, [EBX+14h]  ; (1)
    PUSH    EAX             ; lpBuffer  (2)
    MOV     EAX, [EBX]
    PUSH    EAX             ; hFile
    CALL    ReadFile

If you have paused at the CALL then hovering over EAX will show you the value of hFile (because it was used in the closest PUSH). However, note that EAX was also used for pushing lpBuffer earlier but was overwritten so to see the value of lpBuffer you have to back-track from (2) to the instruction which set the value of EAX, which in this case is (1).

Hovering somewhere over EBX+14h gives you the address¹ of the address² of the buffer: the DWord at EBX+14h (this evaluates to 0041605C¹ in my case because EBX = 00416048) is an address (00416194² for me) of the first byte of lpBuffer (**(EBX+0x14) in C). If you double-click within EBX+14h, it will bring you to a line saying something like dd offset unk_416194 and only the second double-click on unk_416194 will take you to lpBuffer (i.e. to the first byte of the area changed by ReadFile).

So, PUSH EAX at (2) is actually PUSH [EBX+14h] where [ ] mean: “take a DWord at EBX+14h² and put it on stack” – and that DWord, as we’ve just discovered, is an address of the buffer, not the “buffer” itself (you can’t “push” a buffer since it’s a memory area of arbitrary size). In other words, ReadFile accepts a DWord which is an address of a memory area to write to (hence the name of its argument – lpBuffer, where “lp” stands for “long pointer”, a leftover from 16-bit Windows that today means just a pointer, i.e. a DWord in 32-bit code – so, “lpBuffer” = a “DWord” that is a “pointer to” a “Buffer”).

Sans those brackets, PUSH EBX+14h would push the address of the address (0041605C¹) – which is exactly what’s happening with lpNumberOfBytesRead a few lines earlier:

    LEA     EAX, [EBX+10h]
    PUSH    EAX             ; lpNumberOfBytesRead

Recall that LEA puts the result of the address calculation, not the value at that address (that’s done by MOV) and so LEA EAX, [EBX+10h] = MOV EAX, EBX+10h and therefore
PUSH EAX = PUSH EBX+10h.
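The difference between LEA and MOV with brackets is easy to model. This sketch uses the addresses from my session quoted above (yours will differ) and a dict in place of real memory; the two function names are stand-ins for the instructions.

```python
memory = {0x0041605C: 0x00416194}   # the DWord stored at EBX+14h
ebx = 0x00416048


def lea(address):
    """LEA reg, [address]: only computes the address, memory is not touched."""
    return address


def mov_from(address):
    """MOV reg, [address]: reads the DWord stored at that address."""
    return memory[address]


print(hex(lea(ebx + 0x14)))        # the address of the address
print(hex(mov_from(ebx + 0x14)))   # the address of the buffer itself
```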

You can hover for hints over many other places too – like over values in General registers thus avoiding having to open a tab for inspection. When a memory hint is visible you can use Wheel Up/Down to make it display less/more lines.

So for me IDA’s hint says that EAX = 0x01. This doesn’t tell us anything, probably yet. However, I have a strong conviction that this is the function we were looking for so before we go deeper™ let’s rename it – I called it “$InterpretInstruction”.

Now roll back up a little and review how EAX gets this value. What we see is this (try to guess what it does before reading on):

    CODE:00413735 CALL    sub_413AA0
    CODE:0041373A XOR     EAX, EAX
    CODE:0041373C MOV     AL, BL
    CODE:0041373E CMP     EAX, 6           ; switch 7 cases
    CODE:00413741 JA      short loc_4137B5 ; default

Firstly, it clears EAX by XOR’ing it against itself like MOV EAX, 0 (likely because it held the return value of the preceding CALL), then it sets EAX’s lowest byte (AL) to the value of BL (as you remember BL is the low byte of the 16-bit register BX which is in turn part of the 32-bit register EBX). We need to track how BL is set…
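Why bother clearing EAX first? Because MOV AL, BL only touches the lowest 8 bits and leaves the rest of EAX as it was. A quick sketch with made-up register values shows the difference:

```python
def low_byte(register):
    """BL is the lowest 8 bits of EBX (and AL of EAX)."""
    return register & 0xFF


def mov_al(eax, value):
    """MOV AL, value: replace the low byte, keep the upper 24 bits."""
    return (eax & 0xFFFFFF00) | (value & 0xFF)


eax = 0xDEADBEEF          # hypothetical leftover from the preceding CALL
ebx = 0x12345601          # hypothetical; only its low byte (01) matters

without_clearing = mov_al(eax, low_byte(ebx))
with_clearing = mov_al(eax ^ eax, low_byte(ebx))   # XOR EAX, EAX first

print(hex(without_clearing), hex(with_clearing))
```

Only the cleared variant gives a value that can safely be compared with 6 and used as a table index.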

Or do we? Let’s take a break and draw a deep breath.

What’s our goal with this quest? We wanted to change one line in the script but after we did that it started crashing. So we need to (or must, if we’re pressed by the group -_-’) find why.

This kind of self-control is important when reversing desktop programs – their code can be dozens of megabytes long, it’s all too easy to get swamped in there. We almost rushed into finding which function sets that BL – but that’s not necessary for us to know. We’re getting sidetracked! The thing we really need to know is what it does with that byte (that is now in AL/EAX), not to find which one of those zillion disassembled functions picked it from the bytecode stream (runme.dat).

So for now one part is done – we have supposedly found the interpreter’s case statement. We can verify this in a few different ways but in this tutorial I’ll show you how Olly shines with its marvellous BP logging facility (IDA has something similar but it involves Python which we really don’t want to do here).

Part 4 – Getting to the Crash Point

And then Olly the Mighty stood up and said: “I am the king of this hill!”.

Get the freeware OllyDbg and load ScenarioRunner.exe into it. That said, I am going to use v1.1 because v2.0 at the time of this writing (2010) still doesn’t have all the features of v1.1.

Since we’ve already analyzed quite a lot of code with IDA this task will be a piece of cake for us. Copy the address that IDA shows in the disassembly listing on the left of the JMP instruction (case statement’s start) – for me it’s 00413743. Hit Ctrl+G in Olly and paste it there.

Of course, Olly’s code listing is similar to IDA’s but because it doesn’t show the code as a graph we see that the case’s jump table is declared right under the JMP instruction (if you pressed Space in IDA it’d show it too).

    CMP EAX, 6                             ;  Switch (cases 0..6)
    JA  SHORT Scenario.004137B5            ;  <= jump for default case
    JMP DWORD PTR DS:[EAX*4+41374A]
    DD  Scenario.00413766                  ;  ** Switch table used at 00413743 **
    DD  Scenario.00413787
    DD  Scenario.00413790
    DD  Scenario.00413799
    DD  Scenario.004137A3
    DD  Scenario.004137AC
    DD  Scenario.004137C6
    LEA EDX, DWORD PTR SS:[EBP-10]         ;  Case 0 of switch 0041373E

Now we’re going to set a breakpoint that will log the, presumably, instruction codes (opcodes) that this function is passed. As we’ve already determined, the opcode is stored in EAX.

Select the line with JMP and press Shift+F4 (or right-click, then Breakpoints | Conditional log). Olly has support for complex conditions which is described in detail in its help file. I’ll only show you the basics – the interface is pretty much intuitive anyway (unlike IDA’s, kekeke).

There are 3 radio groups with 3 choices: when to Pause the program, when to Log expression result and when to Log function arguments. Each one can have a value of Never, On condition or Always. By combining these settings we can create very flexible breakpoints.

In our case we only need to log the value of the expression we enter so make your dialog look like one on the screenshot.

Let the program run freely now with F9 (by default when a program is run from the debugger, Olly will pause it at the entry point). The log is populated by lines similar to these:

    00413743   COND: Instruction code = 00000006
    00413743   COND: Instruction code = 00000000
    00413743   COND: Instruction code = 00000001

After that Scenario Runner’s console window presents us with its meaningful question followed by a crash. Perhaps it thinks it could stop us with that? Ha!

Get back to 010 Editor with the runme.dat file opened and carefully review the contents in the beginning:

    0000h: 06 00 0D 00 44 6F 20 6E 6F 74 20 63 72 79 2E 2E  ....Do not cry..
    0010h: 2E 01 29 00 59 6F 75 20 73 65 65 20 61 20 63 6C  ..).You see a cl
    0020h: 6F 75 64 2E 20 57 68 61 74 20 64 6F 20 79 6F 75  oud. What do you
    0030h: 20 74 68 69 6E 6B 20 61 62 6F 75 74 3F 02 21 00   think about?.!.
    0040h: 4E 6F 20 6A 6F 6B 65 2C 20 49 20 74 68 69 6E 6B  No joke, I think

I have highlighted the 4 bytes (06 00 01 02) whose purpose is unknown to us. Everything else is a part of some string – length bytes (see, their right-most bytes are all 00?) or characters (note how none of those are below 20 and above 7E? Check the ASCII table to understand why). Hmm, in fact… don’t these 4 bytes look familiar? Compare them with Olly’s log messages – they are exactly the same in the exact same order! MAH BOI, we’ve scored a real hit even though we still don’t know what that 02 byte is supposed to mean.
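To double-check the reading of this dump, here is a small parsing sketch. The layout is an assumption drawn only from these bytes: opcodes 00, 01 and 02 are followed by a 2-byte little-endian length and a string, while 06 stands alone. The names are mine, and I stop before opcode 02 because its string is cut off in the dump.

```python
# First 0x40 bytes of runme.dat, copied from the dump above.
DUMP = """
06 00 0D 00 44 6F 20 6E 6F 74 20 63 72 79 2E 2E
2E 01 29 00 59 6F 75 20 73 65 65 20 61 20 63 6C
6F 75 64 2E 20 57 68 61 74 20 64 6F 20 79 6F 75
20 74 68 69 6E 6B 20 61 62 6F 75 74 3F 02 21 00
"""

# Assumption: these opcodes are followed by a length-prefixed string.
STRING_OPCODES = {0x00, 0x01, 0x02}


def parse(data):
    """Split bytecode into (offset, opcode, text) tuples."""
    pos, result = 0, []
    while pos < len(data):
        offset, opcode = pos, data[pos]
        pos += 1
        text = None
        if opcode in STRING_OPCODES:
            length = int.from_bytes(data[pos:pos + 2], "little")
            text = data[pos + 2:pos + 2 + length].decode("ascii")
            pos += 2 + length
        result.append((offset, opcode, text))
    return result


data = bytes.fromhex("".join(DUMP.split()))
instructions = parse(data[:0x3D])   # everything before the 02 opcode
for offset, opcode, text in instructions:
    print(f"{offset:04X}h: opcode {opcode:02X} {text!r}")
```

The opcodes come out in the order 06, 00, 01, matching Olly’s log, and the next byte at 003Dh is the mysterious 02.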

To reiterate, this means that we’ve found the function that executes script instructions – the interpreter’s loop. You can even guess what the 06 opcode does and why it points to the instruction after the case statement – the answer might sound strange at first but don’t worry, strange is the norm :)

A Little Talk on How to Identify the Cause for a Crash

Crashes in game engines are the second most important thing that we hackers fight (first is badly written code that doesn’t allow localization “out of the box”).

We have two options now. First is delving right into complete decompilation of the script engine, understanding what each of its 7 instructions does. In our situation this is the best approach because all functions called from within our case seem to be very small (no, not this: very small). The brute force approach is straightforward, yields a guaranteed result and is mostly routine rather than creative (so less margin for error). It works as long as the amount of work is clearly understood. I would have chosen this option if this were not a demo project.

In real life, we often go bottom-up – that’s why the process is called Reverse Software Engineering after all. That’s what we will do here. Specifically, we’ll try to “trap” the function causing problems. I can say in advance that it won’t be hard in this project and that it’s harder in real-life engines but still easier than decompiling the entire thing because the amount of code is staggering.

Memory trapping is done with hwBPs but that’s easier said than done – naturally, you put a breakpoint on one location, the program copies that memory block into another location which you also need to trap (and trust me, this happens almost always – it already happened in our demo runner). You will need to stay organized and attentive to avoid getting lost in all these copy operations… and sometimes even do a barrel roll :^)

Close Olly, we’re getting back to IDA. Reactivate our ~~shenanigans~~ breakpoints on ReadFile, run, then set a hwBP inside the buffer which it reads our runme.dat into, then disable that new BP. In Olly you can disable a BP with Space in the Breakpoints window but in IDA you’ll have to use the context menu, for once.

Now we should wait until the program asks the question – but don’t choose anything when it does. Recall: when we answer “2” it exits correctly while if we choose “1” it crashes. Here goes the question… okay, now we can re-enable our hwBP to see what the app is going to do when we choose “1”. Gotcha, something’s been triggered. What a twist! It’s REPE MOVSD again – this must be our lucky charm :)

Hmm, let’s slide down a bit using F8… code, code, code… aah. Aah! What’s that? It says: “read str of len %d”. Interesting! Let’s slide down more… What? We’re already in $InterpretInstruction? So this part seems okay. What does it say on the console? Aha, it output a message that we’ll need to try that again. No crash yet. You just wait, machine >:3

$InterpretInstruction returned successfully, nothing got broken so the error doesn’t seem to be in this instruction. That’s not surprising, actually, because why should it break on a simple message output? We probably broke a jump instruction or something more complicated than that.

I’m thinking… What if we set a BP on that case statement in $InterpretInstruction? At least we will learn after which instructions the program breaks.

On my side, this function received the following opcodes: 05 then 00 then 04 – I found this out by simply setting a BP on the case statement and quickly pressing F9. One thing worth our attention is that after 04 IDA says that the program has raised an exception. Must be our target >:3

Whatever you answer to IDA’s question on how to handle the exception, the program is done for, so we’ll need to restart it – but we will continue for educational purposes (choose to pass the exception to the program). Whoa! It looked like the program’s normal execution got somehow transferred to another place in one instant. Well, we don’t really care unless it works… Uh? Come on, how could we end up in the default case if we did enter the 5th (04)? So the program actually doesn’t run sequentially?

Exceptions change normal program flow – execution starts jumping all around until enough functions have been “unwound” (forcefully returned from) to reach a point where the exception can be handled (that’s the try..except block). Eventually it might even reach the program’s entry point (although this rarely happens as most compilers set up their own exception handling routine that takes care of such uncaught exceptions). If the exception is caught, execution continues normally from after the try..except block. If it isn’t, “unwinding” of the entry point effectively results in “unwinding” of the entire program and the user sees this (in WinXP; later versions have different dialogs):

“This program has encountered an error and has to be closed. Would you like to send a report?”

That’s why you should always handle exceptions, even if only at the root level (in main()). Don’t forget about try..catch / try..except blocks!
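To see the unwinding in something friendlier than a debugger, here is a minimal Python sketch. The function names and the list of opcodes are made up for illustration; they only mimic the shape of the runner (a loop calling an interpreter with a case statement), not its actual code.

```python
# Hypothetical names; this mimics the shape of the runner, not its real code.
trace = []

def interpret_instruction(opcode):
    trace.append(f"enter interpret({opcode:#04x})")
    if opcode > 0x06:
        # The "default case": nothing can handle this value.
        raise ValueError(f"opcode {opcode:#04x}")
    trace.append(f"leave interpret({opcode:#04x})")  # skipped when we raise

def run_script(opcodes):
    for op in opcodes:
        interpret_instruction(op)
    trace.append("script finished")  # skipped if anything above raised

def main():
    try:
        run_script([0x05, 0x00, 0x04, 0x29])
    except ValueError as exc:
        # Both functions above were "unwound" to get here.
        trace.append(f"caught at the root: {exc}")

main()
print("\n".join(trace))
```

Note how neither “leave interpret(0x29)” nor “script finished” ever gets logged: execution left both functions in one go, which is the same “teleportation” we have just watched in IDA.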

Well, all of this is sure fascinating to know but it doesn’t help us understand why the script is crashing.

Let’s look at the bytecode, maybe it will reveal new secrets to us?

    0100h: 61 67 61 69 6E 21 20 3A 50 0A 04 12 00 00 00 00  again! :P.......
    0110h: 2F 00 57 6F 77 2C 20 74 68 61 74 27 73 20 73 75  /.Wow, that's su

I think we’ve got something here. Disregarding strings, we have some strange numbers (04, then 12 00 00 00 and 00) right after the opcode which writes the message after which the program raises that exception (I suggest we dub 04 the Crashing Opcode™ between ourselves). As for the 00 at the end – it must be another opcode, maybe for displaying a message?.. Because look, it has a string length going right after it, and then the message itself. That’s something you should investigate once we’re done. Meanwhile, in between 04 and 00 we have 12 00 00 00 – I wonder if it’s just a coincidence that it looks like a DWord? Need to check this out.
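If you want to test the DWord hunch outside of a hexed, a few lines of Python will do. This is only a sketch: the bytes are copied from the dump (starting at 010Ah), while the layout (one opcode byte followed by a little-endian DWord, then another opcode followed by a Word) is exactly the guess we are checking, not an established fact.

```python
import struct

# Bytes copied from the dump above, starting at 010Ah.
chunk = bytes.fromhex("04 12 00 00 00 00 2F 00")

# Guess under test: 1 opcode byte followed by a little-endian DWord.
opcode, operand = struct.unpack_from("<BI", chunk, 0)
print(f"opcode={opcode:02X} operand=0x{operand:X} ({operand})")

# Whatever follows would then start 5 bytes later: opcode byte + Word.
next_opcode, next_word = struct.unpack_from("<BH", chunk, 5)
print(f"next opcode={next_opcode:02X} word=0x{next_word:X} ({next_word})")
```

The operand comes out as 0x12 (18) and the Word after the next opcode as 0x2F (47), both small and plausible numbers, which is a good sign that we are slicing the bytes at the right boundaries.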

Go back to IDA, open the tab with the bytecode buffer and scroll down to where you can see the end of the “You’ll need to guess it again! :P” message. We see the same bytes that we saw in the hexed above. Let’s put a hwBP on each of the following bytes!.. Okay, maybe that’s overkill – we are only concerned about those 12 00 00 00 and in fact we can cover all of them with just one hwBP since a BP can span 1, 2 or 4 bytes (IDA supports other sizes but I personally wouldn’t advise using them; OllyDbg too supports just those 3 sizes).

Now as the guide is nearing the end you’ll have to work this out on your own. To recap, what you need to do is this: set a hwBP in the bytecode buffer immediately after ReadFile returns, track all copies of that buffer and in the end find a place where the program does something useful with that number (that is, not just copying). Easy, huh?

Of course, the hwBP should be set on that DWord (12 00 00 00) so that we land right on the instruction that has done something with this 32-bit number. I know you can do it, and it took me just a few jumps :)

What will definitely help you is reading those Intel docs on instructions you don’t know – like REPE MOVSD. Don’t worry if you can’t find exactly REPE MOVSD, some other similar instruction (REPE MOVS) will do too since the only thing you need to know is the purpose of that instruction and, in case it copies something somewhere, which registers hold the source and destination addresses.
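As a cheat sheet, here is a small Python model of what REP MOVSD does when the direction flag is clear: it copies ECX DWords from the address in ESI to the address in EDI, moving both pointers forward by 4 after each one. The function name and the flat bytearray standing in for memory are my own simplifications.

```python
def rep_movsd(memory, esi, edi, ecx):
    """Model of REP MOVSD with the direction flag clear.

    memory is a bytearray standing in for RAM; esi, edi and ecx are the
    register values on entry. Returns the register values on exit.
    """
    while ecx != 0:
        memory[edi:edi + 4] = memory[esi:esi + 4]  # copy one DWord
        esi += 4
        edi += 4
        ecx -= 1
    return esi, edi, ecx

ram = bytearray(32)
ram[0:8] = bytes.fromhex("04 12 00 00 00 00 2F 00")  # pretend ReadFile buffer
esi, edi, ecx = rep_movsd(ram, esi=0, edi=16, ecx=2)
print(ram[16:24].hex(), hex(esi), hex(edi), ecx)
```

So when a hwBP lands you on such an instruction, note EDI before the copy runs: your next hwBP goes at that destination plus the offset of the bytes you care about.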

You can do it! Go-go-go!

Part 5 – The Final Act

Two crashes for the price of one!

I assume you did your homework and found a place where that number gets involved in some obscure machinations:

    CODE:0041390F loc_41390F:
    CODE:0041390F XOR     ECX, ECX
    CODE:00413911 MOV     EDX, [ESP+0Ch+var_C]  ; <= here it got trapped
    CODE:00413914 MOV     EAX, [EBX+14h]
    CODE:00413917 MOV     EBX, [EAX]
    CODE:00413919 CALL    DWORD PTR [EBX+14h]
    CODE:0041391C JMP     SHORT loc_41392f

Now is the time for a few last tips. First, ~~without love it cannot be seen~~ take a look at EDX’s value – it’s 0x12, exactly that number we were trapping for (12 00 00 00). If MOV’s brackets confuse you then check here again.

I got curious and peeked into the memory at [ESP+0Ch+var_C] (hover over it or use G) – well, it looks nothing like our bytecode from runme.dat, that 0x12 is sitting there all alone and sad. However, it must have been copied from our bytecode because we put hardware breakpoints on places originating from that ReadFile’s buffer and as for why it ended up there… we don’t really care. What’s important is that it’s getting assigned to EDX and then something gets CALLed – we need to know what happens there.

Oh… ECXiting! That’s How Objects Smell

Previously we saw only CALLs to sub_XXX but here we see DWORD PTR [EBX+14h] – this means the target function is determined on run-time: it will jump to whatever address is stored in memory, at the location = current value in EBX + 0x14 (in pseudo-C: (*(EBX+0x14))()). Most of the time you meet this in OOP code where method calls are going through a virtual table of (overridden) methods, as is the case here if you look at Scenarios.pas.

In Visual Studio and some other C++ compilers, another indication of OOP is ECX being set to a memory location before each call, especially if the first DWord at that location is a pointer to (i.e. is an address of) a list of sub_XXXs – that’s a VT. If so then ECX is essentially pointing to the “object” to which the called method belongs (since an object/instance is nothing else than a block of memory containing the VT of that object’s class and values of the object’s properties). If you see 2 functions getting called with the same address in ECX then they are called on the same object; if ECX is different but the VT’s address is the same then those are two different instances (objects) of the same class.

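Delphi uses VTs in a similar fashion but it doesn’t pass Self in ECX: with its default register calling convention Self goes in EAX, which is why the listing above loads EAX right before the CALL.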
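If the idea of “an object is a block of memory starting with a pointer to a table of functions” feels abstract, here is a toy Python model of it. All addresses and function names are invented; only the two-step lookup (object → VT → method address) mirrors the MOV EBX, [EAX] / CALL DWORD PTR [EBX+14h] pair from the listing.

```python
# A toy model: "memory" maps addresses to DWords, "code" maps addresses to
# Python functions standing in for sub_XXX. All addresses are made up.
memory = {}
code = {}

def sub_a(self_ptr):
    return f"method A on object at {self_ptr:#x}"

def sub_b(self_ptr):
    return f"method B on object at {self_ptr:#x}"

code[0x401000] = sub_a
code[0x401020] = sub_b

VT = 0x500000                 # the class's virtual table
memory[VT + 0x10] = 0x401000  # slot at +10h
memory[VT + 0x14] = 0x401020  # slot at +14h

obj1, obj2 = 0x600000, 0x600040
memory[obj1] = VT             # first DWord of each instance points to the VT
memory[obj2] = VT

def call_virtual(obj, slot):
    vt = memory[obj]             # MOV EBX, [EAX]
    target = memory[vt + slot]   # the address stored at [EBX+slot]
    return code[target](obj)     # CALL DWORD PTR [EBX+slot]

print(call_virtual(obj1, 0x14))
print(call_virtual(obj2, 0x14))
```

Two different object addresses, one VT address, one and the same method: that’s two instances of one class.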

Good for us, the code seems well-written… okay, that might have been a bit biased… Anyway, the functions are small and this one looks particularly small – just step inside it (inside that CALL) with F7 to see for yourself. It consists of only 6 instructions including RET:

  sub_412278 proc near

    SUB     CX, 1
    JB      SHORT loc_412287
    ...
    loc_412287:
    MOV     [EAX+0Ch], EDX
    JMP     SHORT loc_412297
    ...
    loc_412297:
    MOV     EAX, [EAX+0Ch]
    RETN

  sub_412278 endp

The highlighted part ([EAX+0Ch]) is about as critical a needle in the haystack that is a game engine as the interpreter’s loop is: it’s the program’s variable that holds the current position in the bytecode (its “EIP”). This is a core thing that lets us see how the program interprets the bytecode, how it transitions through it, what exactly it does along the way. Ultimately it leads us to specific places where those values are used – because to access something inside the bytecode you need to use a pointer like this one, and that access operation must be quite close to the code that actually uses the accessed value (not merely copies it). Basically, you just put a hwBP on this variable and no longer care about jumping through hoops by trapping ReadFile buffers.

So let’s see what happens when we run the original, unmodified runme.dat. When it reaches 04 12 00 00 00, it jumps to the 18th byte (0x12 = 18) just like asm’s JMP changing EIP. There it reads opcode 01 followed by a string “You see a cloud…” (29 00 59 …).

However, our modified version has made the first string (“Make me laugh!”) shorter by 1 byte, so everything after it has moved 1 byte closer to the beginning. When 04 jumps, the 18th byte is no longer 01 but 29 – the first byte of the string’s length, not an opcode! So the runner tries to interpret 29 as an opcode and fails.
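The whole accident fits into a toy interpreter. Treat the sketch below as an illustration under assumptions, not as a reconstruction of Scenario Runner: the only things taken from our findings are that 04 is followed by a DWord to jump to and that strings carry a Word length in front of them. The meaning I gave to 01 and 06, the helper names and the script itself are invented.

```python
import struct

def msg(text):
    data = text.encode("ascii")
    return b"\x01" + struct.pack("<H", len(data)) + data

def jump(offset):
    return b"\x04" + struct.pack("<I", offset)

STOP = b"\x06"  # assumption: the real meaning of 06 is not known

def run(script, limit=20):
    pos, shown = 0, []          # pos plays the role of $ScenarioPos
    for _ in range(limit):
        opcode = script[pos]
        if opcode == 0x01:
            (length,) = struct.unpack_from("<H", script, pos + 1)
            shown.append(script[pos + 3:pos + 3 + length].decode("ascii"))
            pos += 3 + length
        elif opcode == 0x04:
            (pos,) = struct.unpack_from("<I", script, pos + 1)  # absolute
        elif opcode == 0x06:
            return shown
        else:
            raise ValueError(f"opcode {opcode:02x} at {pos:#x}")
    return shown

first = msg("Make me laugh!")
rest_after_jump = msg("never shown") + msg("You see a cloud") + STOP
target = len(first) + 5 + len(msg("never shown"))
original = first + jump(target) + rest_after_jump

# "Translate" the first line to something 1 byte shorter, keep the rest as is.
modified = msg("Make me laugh") + original[len(first):]

print(run(original))
try:
    run(modified)
except ValueError as exc:
    print("crash:", exc)
```

The original script shows both lines; the modified one lands one byte too far and dies on the low byte of a string length (0f here, 29 in the real runme.dat).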

Woot! With this knowledge we are able to produce a localization tool for this script engine.

One thing you can do after locating this pointer is to log the motions of the script engine through the bytecode. Open OllyDbg and put a Conditional log BP on the location of $ScenarioPos (obtained from IDA) like we already did and let the program run. The last position in the scenario (inside the runme.dat file) before the exception occurs turns out to be 0x12 (which also happens right after the 04 opcode is interpreted) meaning we’ve just confirmed that 12 00 00 00 we saw earlier is an “offset” and 04 is a jump instruction.

True, we could have guessed all of that by simply looking at the bytecode. We could have also figured out what kind of offset it was (relative to the current position or absolute from the beginning) without going through all this code hunting. However, in a real VN engine with a ton of code in front of you, you won’t always be able to decipher instructions so easily – scenario files can be megabytes in size, the engine may define a hundred valid opcodes (ISM SCRIPT of Sisters ~Natsu no Saigo no Hi~ defines 122) and so our guess that 31 32 33 00 is a DWord number valued 3,355,185 (read as little-endian) might be as good as that it’s 2 Bytes and a Word (0x31 – opcode, 0x32 – variable’s index, and 0x0033 – relative jump distance), or that it’s a null-terminated string (“123” in ASCII), or that 31 32 is a per-string encryption key while 33 00 is its length, etc.
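Here is that ambiguity spelled out in Python: the same 4 bytes decoded in the four ways listed above. The variable names are mine, and none of these readings is “the” right one; that’s the whole point.

```python
import struct

raw = bytes.fromhex("31 32 33 00")

# 1. One little-endian DWord.
as_dword = struct.unpack("<I", raw)[0]

# 2. Two Bytes and a Word: opcode, variable index, relative jump distance.
opcode, var_index, distance = struct.unpack("<BBH", raw)

# 3. A null-terminated ASCII string.
as_cstring = raw.split(b"\x00", 1)[0].decode("ascii")

# 4. A 2-byte key followed by a Word length.
key, length = raw[:2], struct.unpack_from("<H", raw, 2)[0]

print(as_dword)
print(hex(opcode), hex(var_index), hex(distance))
print(repr(as_cstring))
print(key.hex(), length)
```

All four decode without a hitch, so the bytes alone can’t tell you which one the engine means. Only the code that consumes them can.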

Word to Encryption!

This tutorial didn’t touch the subject of encryption but it’s only because it would be too much to put on your plate at once. If you are dealing with VNs then encryption (or, to be more technically correct, obfuscation) is everywhere because VN authors tend to be paranoid about someone stealing their CGs or (the horror!) translating their dialogues.

But here’s good news: 95% of the time they are using simplistic “wind encryption” which is “<data> XOR <key>” where key is stored somewhere within the EXE. XOR is very fast, symmetric (meaning double encryption is equivalent to decryption) and trivial to implement: CD xor 11 = DC – encryption, DC xor 11 = CD – decryption; CD is the original byte and 11 is the encryption key.
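A minimal sketch of such a scheme in Python, assuming the key simply repeats over the data (a common variant, but an assumption nonetheless; the 3-byte key below is made up):

```python
def xor_bytes(data, key):
    """XOR every byte of data with a repeating key."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

assert xor_bytes(b"\xCD", b"\x11") == b"\xDC"   # the example from the text

key = bytes.fromhex("11 22 33")                 # made-up key for the demo
plain = "季節は春。".encode("shift_jis")
cipher = xor_bytes(plain, key)
print(cipher.hex())
print(xor_bytes(cipher, key).decode("shift_jis"))
```

The same function both encrypts and decrypts, which is why a tool that can extract texts from such a file can usually put them back just as easily.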

Sadly (or fortunately, depending on which side of the game you are) it’s also easy to locate and undo – a hwBP is triggered, you notice a small loop (i.e. a J* to an earlier instruction in the same code block whose arrow IDA draws in a thick line) with one or more XOR instructions, step through it line-by-line, see how the data transforms and just reproduce it in your own code.

Engine authors try to make it harder by dynamically calculating the key (e.g. based on the file name or string offset). For a notable example see my PatchuCon decoder that’s using a multitude of XOR tables, running the same data over different XOR loops. Another notorious example is Planetarian which I’m not even sure anyone has made a compatible encryptor for (a very rare thing!). But there’s not much you can do with XOR to make it more, uhm, vandal-proof.

As for the remaining 5%, one example is WARC that bundles a separate EXE file within its .war archives that does the decoding in addition to the “general” encryption of .war by the main engine making debugging harder. But those extremes are pretty rare.

A reader has suggested Fatal Relations as an example of a simple protection that even a novice hacker can break. Why not check it out later?

Before we wrap up, let me quickly outline another way of getting through the “crash point”. Previously, we made a supposition that 12 00 00 00 was indeed an offset (which is a valid supposition you could have made when working with a real VN) and breakpoints helped us locate $ScenarioPos. This was a “forward” approach because we went from the start – from the moment the program obtained the scenario data with ReadFile we were scooping every place using it until arriving at sub_412278. But we could go from the end instead, back-tracking from the point where it was crashing:

The problem: our ScenarioRunner.exe is raising an exception on a modified scenario.
-9. It happens in the default case branch of $InterpretInstruction (we have already reliably identified this function) which means the case can’t handle that value as there is no separate branch for it.
-8. Since we’re sure that this is the interpreter’s case statement we conclude that this scripting engine can only handle opcodes 00-06 and this “opcode” that it received (0x29 in EAX which we confirm by putting a swBP at the beginning of the default branch) is not an instruction’s code at all.
-7. Thus we guess that it has read that opcode from a wrong position, which is the same as saying that we’ve shifted something when we’ve altered a string in the scenario before this position. Obvious stuff so far – but ~~who holds the key~~ how does it determine from where to read?
-6. We examine how EAX got its value (yes, this is exactly the step I have cautioned against in the beginning) by back-tracking from the case branch upwards – EAX ← AL ← BL ← EDX… step out of $InterpretInstruction – DL is getting set from some local variable for which the only way to receive a value is via the preceding CALL.
-5. We set a BP on that unassuming CALL, restart the program, check the variable’s value, step over by F8, check again and confirm that it’s changed. This means the opcode is coming from that CALL.
-4. Next we continue by F9, note down the opcodes this variable is receiving, observe the crash after it gets 04 (first) then 29 (second), restart, skip until it gets 04 again, press F9 just one more time – it breaks on that CALL but the variable hasn’t been filled with the following “opcode” yet (29). From here, we set a hwBP on the variable’s memory area (it’s a DWord) then chant a prayer and ~~pull the trigger~~ hit F8…
-3. Sure enough, we land in another form of REPE – REP MOVSB. What’s worth noting here is that before F8 the variable’s memory contained some junk but now it’s holding 29 which is exactly the “opcode” causing the crash. I feel with my gut that we’re excruciatingly close to the solution but where precisely did this value come from?
-2. The Stack view shows a few calls between the “parent” function (with our breakpoint) and this trapped one. From a quick glance we don’t see anything promising in the function with REP so we move up the stack, poke around instructions and registers on that level…
-1. …And claim that we’ve run out of the midnight oil and go to bed for we have just found variables holding not only the script pointer ($ScenarioPos) but also the pointer to the bytecode buffer itself (after undergoing all the copies!).

It’s hard to say which approach is better. There’s no silver bullet. Sometimes setting hwBPs on read buffers will bear fruit quickly (even though it’s done by hand and therefore is tedious). At other times you pin-point the issue faster if you go in reverse (from the last executed opcode) – for example, in our demo I reached the “-1” step within just two minutes. Try to do the same for practice – it’ll be easier than with a real VN because functions in this EXE are extremely small.

Ready to see the answer? It’s in sub_412240 (comments are mine):

    MOV     EDI, [EBX+0Ch]  ; [EBX+0Ch] - location of $ScenarioPos in memory.
    TEST    EDI, EDI        ; Is it <0?
    JL      SHORT loc_4     ; Yes - then jump. (It should always be >= 0.)
    ...
    MOV     EAX, [EBX+4]    ; The pointer to our bytecode buffer!
    ADD     EAX, EDI        ; = buffer's base address + current script position

Had these two variables appeared separately (like in our first “forward” approach), it would have been harder to recognize their purpose. However, with this last statement everything is coming together. The rest is trivial now that we have identified them.

Our “reverse” understanding does not always match the structure of the original code. For example, you might think that these two were regular variables – but if you check Scenario Runner’s sources, you’ll see that it has none of them. Instead, they are buried under several levels of inheritance inside a TMemoryStream object which is a standard Delphi abstraction over data sources (memory, disk, etc.). The runner itself doesn’t advance the pointer – the subroutine we’ve found is one of TMemoryStream’s virtual methods.
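If you have never used a stream class, Python’s io.BytesIO is a rough analogue and makes the point nicely: the caller never touches a “position” variable, yet one exists inside the object and moves on every read. This is only an analogy; the bytes are the ones from the dump (010Ah onwards) and I’m not claiming the runner’s source looks like this.

```python
import io
import struct

# Bytes from the dump (010Ah onwards) stand in for a whole script.
stream = io.BytesIO(bytes.fromhex("04 12 00 00 00 00 2F 00"))

opcode = stream.read(1)[0]          # reading advances the hidden position
print(opcode, stream.tell())        # 4 1

(operand,) = struct.unpack("<I", stream.read(4))
print(hex(operand), stream.tell())  # 0x12 5

stream.seek(operand)                # what a jump boils down to
print(stream.tell())                # 18
```

In the disassembly all you see of this is a couple of fields at fixed offsets from the object’s address, like the [EBX+4] and [EBX+0Ch] above.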

RCE is working with extreme uncertainty. You can never know how deep the rabbit hole is or even if you are on the right track – as we’ve seen, if you manage to change the texts the game may crash due to a mismatching jump address and if you fix that – it may crash again because the size of the script (which is often recorded in a header of a file) does not match the actual size after your modifications. This kind of job is very unpredictable and being healthily chaotic is more productive than trying to figure out every function on your way.

At last, we’re coming out from the fog… unless we are moving in the opposite direction!

Regardless of the way you have arrived at $ScenarioPos, it’s safe to say that the 04 opcode, the most critical line of which is this:

    MOV [EAX+0Ch], EDX    ; $ScenarioPos, 0x12

…is doing nothing other than setting $ScenarioPos to the value of the DWord that it has just read (in our case it’s 12 00 00 00 → 0x12). If we needed a final proof that this instruction performed an absolute jump (making the script continue execution from an arbitrary position in the bytecode) then we just got it.

And we’ve modified a string right before the position to where it jumps…

No surprise the program crashes – it doesn’t understand why it’s getting some weird 0x29 “opcode”…

Gimme just a second to fix it!

Switch to the hexed, Ctrl+G to 010Bh, change 12 to 11 and run the program. Muhaha, it now works like a charm, totally disregarding the modification of the first line! >:3
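The same fix done programmatically looks like this. It’s a sketch built around the one row of the dump we’ve seen (0100h); the zero padding in front of it is a placeholder for the part of runme.dat not shown here, and the constant names are mine.

```python
import struct

# Row 0100h of the dump; everything before it is zero padding here because
# only this row is known. With the real file you would read runme.dat.
row_0100 = bytes.fromhex("61 67 61 69 6E 21 20 3A 50 0A 04 12 00 00 00 00")
data = bytearray(0x100) + bytearray(row_0100)

JUMP_OPERAND = 0x10B            # offset of the DWord following opcode 04
SHIFT = -1                      # the first line became 1 byte shorter

(old,) = struct.unpack_from("<I", data, JUMP_OPERAND)
struct.pack_into("<I", data, JUMP_OPERAND, old + SHIFT)

print(hex(old), "->", hex(struct.unpack_from("<I", data, JUMP_OPERAND)[0]))
```

A real translation tool has to do this for every jump whose target lies beyond the edited string, adding the length difference (positive or negative) to each of them.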

Of course, game authors don’t write their scenarios in hexeds – they use regular text editors or even IDEs. Depending on the engine used, their text files may look more like a book or more like program code. In the end most engines compile this code to a binary form very similar to the one we have been exploring in this tutorial, or at least encrypt text files to protect from manipulations by outsiders.

Here’s an excerpt from a script written for a real game using NScripter (one of the oldest engines made back then for Nitro+ games which does no compilation, only encryption) – as you can see, instructions like bg (for displaying images) are mixed with the narrative which in itself contains special markers (@ and \ for pauses):

    *s1
    ;----------------------------------------------------------
    ;シーン１
    ;絵：桜
      bg white,10
      bg "sakura19.jpg",10
      bgm "windbird.mp3"
    ;----------------------------------------------------------

    　季節は春。@
    　穏やかな光の射す窓の外、時おり吹く強風に花びらが流れていく。@
    　打ち合わせの内容に耳を傾けながら、俺はすっかり窓の外に目を奪われていた。\

In contrast, the very popular KiriKiri/KAG is more on the “program code” side:

    property temporaryLayer
    {
      // ワークエリアとして一時的に使用できるレイヤを返す
      getter()
      {
        if(tempLayer === void)
        {
          tempLayer = new KAGLayer(this, primaryLayer);
          tempLayer.name = "一時ワークレイヤ";
        }
        return tempLayer;
      }
    }

For several VN engines like KiriKiri and Entis you can download technical documentation, tools that allow you to repack game archives, change texts, graphics, behaviour, etc. or even their source code. Some game makers thwart those tools by customizing these engines for their own games but hacking such an engine is still way easier because usually only the encryption scheme is new, everything else (archive format, etc.) is the same.

However, most VN engines are proprietary so we as users don’t have anything but a script file and a “scenario runner” (an EXE that can only execute that script’s commands). We don’t get tools to convert text-based scripts to binary-based files or vice-versa, we have no description of how the binary data is structured, what charset is used, how strings are encoded or if they are encrypted, nor do we get any means to detect when and what we’re messing up in the script – nothing like Visual Studio telling you that a semicolon is missing here or a function with that name doesn’t exist. Indeed, if we take a wrong turn the game starts to misbehave or crashes outright and a debugger is the only thing that can help us make sense of the situation.

As a result, hackers write and (usually) publish tools to de-compile scripts of proprietary engines, extract their data files, etc. – tools that duplicate functionality of private ones made by the engine’s original authors for their customers. Occasionally this leads to a situation when public tools “take over” – perhaps because the engine was abandoned by its authors or because somebody has written its open-source clone (read the story behind NScripter, ONScripter, ONScripter-EN and PONScripter and see this and this too).

Epilogue

You have been through this tutorial. If you managed to finish it and understand the backbone of what we were doing – then kudos! It means you have become part of the 31337… and also that you’re on your own now. You can enlarge your, uh, skillz by messing further with Scenario Runner which has a few hidden Easter eggs. Or, if you feel it’s not exciting enough then pick a VN (ideally one that hasn’t been translated yet) – but be warned that even a tiny VN has a much more complex engine than this demo.

Here’s more stuff you can do with ScenarioRunner.exe for practice:

Remember that suspicious-looking string saying “opcode %.2x”? Actually, if you look over the entries under the Names subview of IDA (click on the Name column to sort the list) you’ll find other interesting strings (they should have been also listed in the Strings subview, Shift+F12 – but IDA doesn’t catch Delphi and/or Unicode strings by default).

For example, aReadStrOfLenD – what is this for? Why do none of them show up in the console output? You can investigate this – maybe it will help you in tracking down crashes in Scenario Runner or give some insight into its core operations, who knows? (Spoiler: indeed it does if you manage to pull this trick with a VN like Haeleth did with RealLive.)
You can undertake a challenge of understanding every opcode function. We’ve already got the notion of the 04 opcode and I’m sure you’ve understood a few others but there are at least 3 of them left – and also that strange 06 case branch – what is it used for?
Try to translate (change) every other string in runme.dat – there are 6 lines in total, including 2 question strings. We’ve already “translated” the first line. There’s a surprise awaiting you when you modify one of the remaining lines – you’ll need to dig into the disassembly to solve it :)
Just for fun, try finding out quickly how to make ScenarioRunner.exe execute an arbitrary scenario file, with a name other than runme.dat.
Can you make your own custom scenario (or modify the default one) so that it would do things you want it to do? I made this runme.dat from scratch in a hexed – can you do something similar? It will likely require good knowledge of most opcodes – you will have some fun.
And the ultimate challenge – try writing a complete decompiler and compiler of Scenario Runner’s scenario files. Or at least try to make a translation tool like ones on my page (e.g. msdcomp) which can both extract texts (from a script file into a text file) and update texts (inside the script based on lines in a text file). Working out a finished solution like this one is bound to give you one or two level-ups.

But, you know… regardless of what you do from now on, remember just these two things, amirite?

Do it only for lulz
Lurk more

Thanks for reading!

~ 11 May 2010 & 25 February 2020
