With the GRE out of the way, I should be wrapping up my thesis and looking for universities in the US to apply to. But oh no. I had to hit one of those creative streaks that keep me up until 5:40am, when the silliest of bugs is right in front of my eyes and I just can’t see it. I’ve been doing a lot of other stuff lately, including creating a Speech Recognition tool for NBA 2K13 (a basketball video game) and a lot of reverse engineering via hex editing. I’ve created a lot of hex editing automation tools, most of them used to customize various aspects of NBA 2K13, such as editing the rosters or the values of the lighting shaders that affect how the players are lit and shadowed. To get an idea of what I call “programming as a hobby”, you can take a look here at all the tools I created for last year’s iteration of the game. It’s been just a few days since the game was released, and I’ve been hard at work, juggling finishing my thesis and programming those tools, working 10-14 hours a day in front of my computer.
So, I was asked by a user at the OperationSports forum, “what’s hex editing?” Since, as most of you know by now, I aspire to be a teacher, I love “101” questions like that. Here’s my answer:
Try opening a save of any game in WordPad. Can you discern any information about your game progress? No. That’s because most game files aren’t lines of text: developers don’t care about the game files being readable by humans, and sometimes they don’t even want that.
To understand why, let’s look at the difference between saving a number in its binary form and in its text form.
The simplest text encoding is ASCII. You may have heard of it. When you write text and the file is saved using ASCII, each character you’ve typed takes up 8 bits (or 1 byte) of your hard disk. So, if you write a text of 1,024 characters, you’re using up 1,024 bytes.
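To see the one-byte-per-character rule in action, here’s a tiny Python sketch. The sample string is just something I made up for illustration; any plain English text would behave the same way.

```python
# Encode a short piece of text as ASCII and count the bytes it takes up.
message = "hex editing"  # made-up sample text
encoded = message.encode("ascii")

print(len(message))       # 11 characters
print(len(encoded))       # 11 bytes, one per character
print(list(encoded[:3]))  # [104, 101, 120], the byte values for "h", "e", "x"
```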
Still, do we really need all those bits?
Using ASCII, every digit you type requires a full byte to be stored. Doesn’t sound like much, but it adds up.
However, if all you’re saving is numbers, what do you need letters A-Z and a-z for? Nothing. What do you need punctuation marks like a full stop or a comma for? Nothing. All you need is a way to save the digits. And there are just 10 of them: 0 to 9.
Here comes binary. So, the way computers work, the smallest amount of storage is a bit. It can be either 1, or 0. How many states is that? 2. So we can actually save anything that has just 2 different states in 1 bit. Consider wanting to keep a file of the state of all the light switches in your house.
You could write “on” or “off” as text. That means you’re using up 2 bytes for each light switch that is on, and 3 bytes for each light switch that is off. Seems like a waste, doesn’t it?
Working with binary allows you to say that since a light switch can be in either of 2 states, on or off, you can just use a single bit to represent it. That bit can be 1 if the light is on, 0 if the light is off. So simple. So you’ve gone from a maximum of 3 bytes, which is 24 bits, to a single bit. So you’ve saved 23 bits!
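Here’s a rough Python sketch of the light switch idea. The house with 8 switches and their states are made up for the example; the point is just to compare the size of the text version with the size of the one-bit-per-switch version.

```python
# Hypothetical house with 8 light switches (True = on, False = off).
switches = [True, False, True, True, False, False, True, False]

# Text version: write "on" or "off" for every switch.
as_text = "".join("on" if s else "off" for s in switches)
text_bytes = len(as_text.encode("ascii"))

# Binary version: one bit per switch, packed into a single number.
packed = 0
for s in switches:
    packed = (packed << 1) | int(s)
binary_bytes = len(packed.to_bytes(1, "big"))

print(text_bytes)             # 20 bytes (4 * 2 for "on" + 4 * 3 for "off")
print(binary_bytes)           # 1 byte holds all 8 switches
print(format(packed, "08b"))  # 10110010
```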
Let’s add some bits. Let’s say we have 4 bits. Now, consider that each bit can be either 0 or 1. So you can have these words:
0000 0001 0010 0011 0100 0101 0110 0111
1000 1001 1010 1011 1100 1101 1110 1111
That’s 16 different things in just 4 bits. You can save all numbers between 0 and 15 in just 4 bits, while saving them in text would require 1 byte for all single-digit numbers, and 2 bytes for all two-digit numbers.
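If you don’t feel like writing those 4-bit words out by hand, a couple of lines of Python can list them. This is only a quick sketch; the variable name is mine.

```python
# Every 4-bit word, from 0000 up to 1111, with the number it stands for.
words = [format(n, "04b") for n in range(16)]

for n, word in enumerate(words):
    print(word, "=", n)

print(len(words))  # 16
```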
Each time you add a bit, you get double the options.
So 4 bits gives us 16 different things, 5 bits gives us 32, 6 bits 64, and so on. 10 bits and we have 1,024 choices. 20 bits and we have 1,048,576. That’s a little over one million. Do you know how many bits you’d need to save a 7-digit number like that as text? 8 bits per digit * 7 digits = 56 bits. By using 20 bits instead of 56, you’re saving 36 bits, which is 4 and a half bytes!
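The arithmetic above is easy to double-check in Python. A small sketch, with helper names I made up; note that 20 bits cover the numbers 0 to 1,048,575, so I use the largest of those as the example value.

```python
def choices(bits):
    """How many different values fit in the given number of bits."""
    return 2 ** bits

def text_bits(number):
    """Bits needed to store the number as ASCII text, at 8 bits per digit."""
    return 8 * len(str(number))

largest = choices(20) - 1  # 1048575, the biggest number 20 bits can hold

print(choices(4), choices(5), choices(6))  # 16 32 64
print(choices(10))                         # 1024
print(choices(20))                         # 1048576
print(text_bits(largest))                  # 56
print(text_bits(largest) - 20)             # 36 bits saved
```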
This adds up to big savings, and hides information you don’t want the regular user to see, or at least makes it harder for them to see it.
Hex Editing allows us to edit stuff in binary. Hex is short for hexadecimal, coming from the Greek word “decaexi/δεκαέξι”, which means 16.
The hexadecimal system consists of 16 choices (duh). Numbers 0 to 9 and letters A to F. It is used to represent information stored in binary. Just grab any 4 bits, and they can be represented with a hex character. Grab 8 bits, and it’s 2 hex characters.
So hex editors read the bits of a file and translate them to hex characters, which you can then change in order to change the actual binary values.
And since 2K saves almost nothing in ASCII other than player names (it needs those letters then :P), we need to use hex editors in order to find the underlying numbers saved in binary, to be able to edit them.
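To get a feel for what a hex editor shows you, here’s a minimal Python sketch that prints a few bytes first as bits and then as hex characters. The three bytes are made up; a real editor does the same thing for every byte of the file.

```python
# Three made-up bytes, written out bit by bit.
data = bytes([0b01000000, 0b00000000, 0b11111111])

as_bits = " ".join(format(b, "08b") for b in data)
as_hex = " ".join(format(b, "02X") for b in data)

print(as_bits)  # 01000000 00000000 11111111
print(as_hex)   # 40 00 FF (every 8 bits become 2 hex characters)
```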
Consider the CF ID 1024. To get its binary representation, you keep dividing by 2 until the result reaches 0, writing down the remainder at each step (it’s the number after the + sign below).
1024 / 2 = 512 + 0
512 / 2 = 256 + 0
256 / 2 = 128 + 0
128 / 2 = 64 + 0
64 / 2 = 32 + 0
32 / 2 = 16 + 0
16 / 2 = 8 + 0
8 / 2 = 4 + 0
4 / 2 = 2 + 0
2 / 2 = 1 + 0
1 / 2 = 0 + 1
So, grab all those remainders, put them in the opposite order, and you get
10000000000
That’s 11 digits. Well, as we said, each hex character stands for 4 bits, so we need the number of bits to be a multiple of 4, which means padding with a 0 to reach 12. We’re going to add it to the left because it adds no value there. It’s the reason why 0482 = 482, but 4820 isn’t equal to 482. Get it?
So now we have
010000000000
Break this into groups of 4.
0100 0000 0000
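The steps so far (divide by 2, collect the remainders, reverse them, pad on the left, split into groups of 4) can be sketched in Python like this. The function names are my own, and Python has a built-in bin() that does the conversion for you; this version just mirrors the by-hand method.

```python
def to_binary(number):
    """Convert a non-negative integer to binary by repeated division by 2."""
    if number == 0:
        return "0"
    remainders = []
    while number > 0:
        number, remainder = divmod(number, 2)
        remainders.append(str(remainder))
    # The remainders come out last bit first, so reverse them.
    return "".join(reversed(remainders))

def pad_and_group(bits, size=4):
    """Pad with 0s on the left to a multiple of `size`, then split into groups."""
    padded_length = -(-len(bits) // size) * size  # round up to a multiple of size
    padded = bits.zfill(padded_length)
    return [padded[i:i + size] for i in range(0, len(padded), size)]

bits = to_binary(1024)
groups = pad_and_group(bits)

print(bits)              # 10000000000
print(" ".join(groups))  # 0100 0000 0000
```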
Now each one of these is a hex character. Which one is it? Well, when reading binary numbers, you have to understand each bit is a power of 2. Just like when I write 482, 4 is the hundreds, 8 is the tens, and 2 is the ones, so
4 * 100 + 8 * 10 + 2 * 1 = 482
4 * 10^2 + 8 * 10^1 + 2 * 10^0 = 482
Same goes for binary. You start at the right, and that’s the zeroth power of 2. Then to its left is the first power of 2. Then to its left the second, and to its left the third power of 2, like so.
0 * 2^3 + 1 * 2^2 + 0 * 2^1 + 0 * 2^0 =
0 * 8 + 1 * 4 + 0 * 2 + 0 * 1 =
0 + 4 + 0 + 0 = 4
The other two hex characters are both 0000 in binary, so it’s pretty easy to see they’re both 0 in hex as well.
Put all those together, and you get what?
400
So 400 in hex is 1024 in decimal.
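Here’s the same powers-of-2 reading written as a short Python sketch. The helper names are mine; the last line uses Python’s built-in int() with base 16 to confirm the result.

```python
HEX_DIGITS = "0123456789ABCDEF"

def group_value(group):
    """Add up the powers of 2 for a group of bits, starting from the right."""
    total = 0
    for power, bit in enumerate(reversed(group)):
        total += int(bit) * 2 ** power
    return total

def groups_to_hex(groups):
    """Turn each 4-bit group into its hex character and join them."""
    return "".join(HEX_DIGITS[group_value(g)] for g in groups)

print(group_value("0100"))                      # 4
print(groups_to_hex(["0100", "0000", "0000"]))  # 400
print(int("400", 16))                           # 1024
```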
And that’s what we do by hex editing. We find the hex values of things we know only in decimal, look them up in the roster, and try to decode it.
So if I know a player’s CF ID is 1024, I look for 400 in the file, and find places in the file that seem like the player entry. How do I know? I see other hex values that give me a hint, like maybe the hex representation of the player’s Portrait ID.
If I want to change that player’s CF to 1253, I convert that number to hex (4E5), and substitute 400 with 4E5 to get the desired result.
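To make that find-and-replace step concrete, here’s a rough Python sketch. Everything about the data is an assumption for illustration: the sample bytes are made up, and I’m pretending the ID sits in 2 whole bytes with the most significant byte first, which says nothing about how the actual roster file lays things out. The function names are mine too.

```python
def find_all(data, pattern):
    """Return every offset where `pattern` appears in `data`."""
    offsets = []
    start = data.find(pattern)
    while start != -1:
        offsets.append(start)
        start = data.find(pattern, start + 1)
    return offsets

def patch(data, offset, new_bytes):
    """Return a copy of `data` with `new_bytes` written at `offset`."""
    return data[:offset] + new_bytes + data[offset + len(new_bytes):]

# Assumption: the ID is stored in 2 bytes, big-endian.
old_id = (1024).to_bytes(2, "big")  # bytes 04 00
new_id = (1253).to_bytes(2, "big")  # bytes 04 E5

# Made-up file contents containing the old ID in two places.
sample = bytes.fromhex("ff 04 00 12 34 04 00 aa")

candidates = find_all(sample, old_id)
print(candidates)  # [1, 5]

# Pretend the surrounding bytes told us the first hit is the player entry.
edited = patch(sample, candidates[0], new_id)
print(edited.hex())  # ff04e512340400aa
```

Note that the search gives you more than one hit, which is exactly why you need those other hints to figure out which one is the player entry.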
And that, in a nutshell, is Hex Editing 101.