Very nice job on this, it beats mine to a pulp :(. However, the majority of cabs I've seen are not XML-based; instead they have a .000 file that describes them. In the interest of contributing something here, this is the structure of the .000 file to the best of my understanding:
All ids start at one.
The first 48 characters (bytes) are garbage as far as I could tell, probably the version of the program that created the file. I skip them.
The next section gives the number of each kind of entry. After the first 48 characters, the next character is the number of strings the cab uses. Skip one character, and the next is the number of directories. Skip one character, and it's the number of files. Skip one character, and it's the number of registry hives. Skip one character, and it's the number of registry keys. Skip one character, and it's the number of shortcuts.
I have no idea what the next 40 characters are, so I skip them.
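A minimal Java sketch of the header reading above. It assumes each "read one character, skip one" value is really a 16-bit little-endian number whose high byte is usually 0; reading both bytes gives the same counts as reading only the first one whenever a count is below 256. `CabHeaderSketch` and its members are hypothetical names, not taken from BinaryCabType.java, and the offsets (48 bytes, then six counts, then 40 bytes) come straight from the description.

```java
/**
 * Sketch of the .000 header as described in the post.
 * Assumption: each count is a 16-bit little-endian value; the post reads only
 * its low byte, which gives the same result whenever the count is below 256.
 */
final class CabHeaderSketch {
    static final int PREAMBLE_BYTES = 48; // skipped in the post (probably version info)
    static final int UNKNOWN_BYTES = 40;  // skipped in the post after the counts
    // Where the application name should start if the assumption above holds.
    static final int APP_NAME_OFFSET = PREAMBLE_BYTES + 6 * 2 + UNKNOWN_BYTES; // 100

    final int strings, directories, files, hives, registryKeys, shortcuts;

    private CabHeaderSketch(int[] c) {
        strings = c[0];
        directories = c[1];
        files = c[2];
        hives = c[3];
        registryKeys = c[4];
        shortcuts = c[5];
    }

    /** Unsigned 16-bit little-endian read at offset p. */
    static int u16(byte[] data, int p) {
        return (data[p] & 0xFF) | ((data[p + 1] & 0xFF) << 8);
    }

    /** Reads the six counts that follow the 48-byte preamble. */
    static CabHeaderSketch parse(byte[] data) {
        if (data.length < APP_NAME_OFFSET) {
            throw new IllegalArgumentException("file too short for the assumed header");
        }
        int[] counts = new int[6];
        for (int i = 0; i < counts.length; i++) {
            counts[i] = u16(data, PREAMBLE_BYTES + 2 * i);
        }
        return new CabHeaderSketch(counts);
    }
}
```

To try it on a real file, something like `CabHeaderSketch.parse(java.nio.file.Files.readAllBytes(java.nio.file.Path.of("setup.000")))` should do, where `setup.000` is a placeholder file name.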
Strings are terminated with a 0, obviously, so for these values I just read until I hit a 0:
The application name is immediately after the 40 characters I skipped.
The author's name comes after the application name (just read until you reach a 0; the next character is the first character of the author's name).
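Reading the two 0-terminated names can be sketched like this in Java. It assumes one byte per character (decoded as ISO-8859-1 here, which is a guess) and that reading starts at the offset you pass in; `CabNameReader` and `Terminated` are hypothetical names for illustration.

```java
import java.nio.charset.StandardCharsets;

/**
 * Sketch of reading the 0-terminated application and author names.
 * Assumption: single-byte text; the post only says "characters".
 */
final class CabNameReader {
    /** A decoded string plus the offset of the byte right after its terminating 0. */
    static final class Terminated {
        final String text;
        final int next;

        Terminated(String text, int next) {
            this.text = text;
            this.next = next;
        }
    }

    /** Reads bytes from p until a 0 (or the end of the data) and decodes them. */
    static Terminated read(byte[] data, int p) {
        int end = p;
        while (end < data.length && data[end] != 0) {
            end++;
        }
        String text = new String(data, p, end - p, StandardCharsets.ISO_8859_1);
        // Step over the terminator if there was one.
        return new Terminated(text, Math.min(end + 1, data.length));
    }
}
```

Call `read` once at the application-name offset, then again at the returned `next` offset to get the author.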
The next section is the unsupported platforms. These are stupid and annoying, but they're there. I skip over them by looping until I find two 0s in a row.
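The skip over the unsupported platforms is just a scan for two consecutive 0 bytes. This sketch starts at whatever offset you give it and returns the offset just past the pair; the description doesn't say whether the pointer should end on the second 0 or after it, so that choice is an assumption, as is the hypothetical name `skipUnsupportedPlatforms`.

```java
/** Sketch of the "loop until two 0s in a row" skip over the unsupported platforms. */
final class CabPlatformSkip {
    /**
     * Returns the offset just past the first pair of consecutive 0 bytes at or
     * after p, or data.length if there is no such pair.
     */
    static int skipUnsupportedPlatforms(byte[] data, int p) {
        for (int i = p; i + 1 < data.length; i++) {
            if (data[i] == 0 && data[i + 1] == 0) {
                return i + 2;
            }
        }
        return data.length;
    }
}
```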
Now the read pointer is at the list of strings. The format of each string entry is: the first character is the id of the string (1 up to the number of strings given earlier, since ids start at one), skip one, the next character is the length of the string, skip one, and the next character is the first character of the string. Read as many characters as the length says. Skip a character, and repeat for the number of strings. The second string (id 2, I guess, since ids start at one) is the InstallDir from the XML-based cab files.
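A sketch of the string table in Java. It assumes the id and length are 16-bit little-endian values (consistent with "read one, skip one") and that the length already counts the string's terminating 0, so the next entry begins right after `length` bytes. The description says to skip one more character, so if your files disagree, advance to `end + 1` instead. `CabStringTable` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Sketch of the string table. Assumed entry layout: u16 id, u16 length,
 * then `length` bytes of text, with trailing 0 bytes inside that length dropped.
 */
final class CabStringTable {
    static int u16(byte[] d, int p) {
        return (d[p] & 0xFF) | ((d[p + 1] & 0xFF) << 8);
    }

    /** Parses `count` entries starting at offset p into a map of id -> text. */
    static Map<Integer, String> parse(byte[] d, int p, int count) {
        Map<Integer, String> table = new LinkedHashMap<>();
        for (int i = 0; i < count; i++) {
            int id = u16(d, p);
            int length = u16(d, p + 2);
            int start = p + 4;
            int end = start + length;
            int textEnd = end;
            while (textEnd > start && d[textEnd - 1] == 0) {
                textEnd--; // drop the terminator(s) counted in the length
            }
            table.put(id, new String(d, start, textEnd - start, StandardCharsets.ISO_8859_1));
            p = end; // assumed: the next entry starts right after the text
        }
        return table;
    }
}
```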
Next comes the list of directories. I do some stuff here that I don't understand, so I may start saying incorrect things in this section.
If you skip a character after the end of the last string, the pointer will be at the id of the first directory. Skip a character after that, and it will be on the length of the directory string. And then skip another character.
Now the pointer will be on a string id; that string is the top-level directory name. Skip a character, and it will be on the second-level directory, and so on.
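The directory entries can be sketched the same way. The description is itself unsure about this section, so treat these as assumptions: every field is a 16-bit little-endian value, the length is in bytes, and a string id of 0 ends the path. `CabDirectoryList`, `Dir`, and `path` are hypothetical names, and the backslash is just the usual Windows CE path separator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Sketch of the directory list: id, length, then a run of string ids
 * (top-level name, second-level name, ...). Assumptions noted above.
 */
final class CabDirectoryList {
    static final class Dir {
        final int id;
        final List<Integer> parts;

        Dir(int id, List<Integer> parts) {
            this.id = id;
            this.parts = parts;
        }
    }

    static int u16(byte[] d, int p) {
        return (d[p] & 0xFF) | ((d[p + 1] & 0xFF) << 8);
    }

    /** Parses `count` directory entries starting at offset p. */
    static List<Dir> parse(byte[] d, int p, int count) {
        List<Dir> dirs = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            int id = u16(d, p);
            int lengthBytes = u16(d, p + 2);
            int start = p + 4;
            List<Integer> parts = new ArrayList<>();
            for (int q = start; q + 1 < start + lengthBytes; q += 2) {
                int stringId = u16(d, q);
                if (stringId == 0) {
                    break; // assumed terminator
                }
                parts.add(stringId);
            }
            dirs.add(new Dir(id, parts));
            p = start + lengthBytes;
        }
        return dirs;
    }

    /** Turns a directory's string ids into a backslash-separated path. */
    static String path(Dir dir, Map<Integer, String> strings) {
        List<String> names = new ArrayList<>();
        for (int id : dir.parts) {
            names.add(strings.getOrDefault(id, "<string " + id + ">"));
        }
        return String.join("\\", names);
    }
}
```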
Next is the file listing. If you skip a character after the last directory, the pointer will be on the id of the first file. If you skip a character after that, the pointer will be on the id of the directory the file goes into. If you skip seven characters after that, the pointer will be on the length of the file name. Read as many characters as that length says, and that's the proper filename for the file currently named 000 or whatever.
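One file entry, using the offsets from the description (id at +0, directory id at +2, name length at +10, name at +12). Assumed beyond the description: 16-bit little-endian fields, single-byte name text, a possible trailing 0 inside the length, and the next entry starting right after the name. `CabFileEntry` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;

/**
 * Sketch of one file-list entry. Bytes +4..+9 are skipped, as in the post.
 */
final class CabFileEntry {
    final int id;
    final int directoryId;
    final String name;
    final int next; // offset right after this entry's name bytes (assumed next entry)

    private CabFileEntry(int id, int directoryId, String name, int next) {
        this.id = id;
        this.directoryId = directoryId;
        this.name = name;
        this.next = next;
    }

    static int u16(byte[] d, int p) {
        return (d[p] & 0xFF) | ((d[p + 1] & 0xFF) << 8);
    }

    static CabFileEntry parse(byte[] d, int p) {
        int id = u16(d, p);
        int directoryId = u16(d, p + 2);
        int nameLength = u16(d, p + 10);
        int start = p + 12;
        int end = start + nameLength;
        int textEnd = end;
        while (textEnd > start && d[textEnd - 1] == 0) {
            textEnd--; // drop a trailing 0 if the length includes it
        }
        String name = new String(d, start, textEnd - start, StandardCharsets.ISO_8859_1);
        return new CabFileEntry(id, directoryId, name, end);
    }
}
```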
Now you get to read the hives, hooray. If you skip a character after the last file name, the pointer will be on the id of the hive. Skip a character, and the pointer will be on the id of the root. I only found the ids of two roots, so: if the root id is 1, it's HKEY_CLASSES_ROOT; otherwise it's HKEY_LOCAL_MACHINE. Yes, there are probably more, and that's a hole in my knowledge. After that it works like the directory names: read a character and it's the id of a string. Skip a character, and there's another string id. All the string ids combine with the root to form a registry path.
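The root mapping and path assembly for hives, as a small Java helper. It deliberately mirrors the rule above (1 is HKEY_CLASSES_ROOT, anything else HKEY_LOCAL_MACHINE) rather than guessing the missing roots, and it takes the hive's string ids as an already-parsed list because the exact bytes between the root id and the string ids aren't pinned down. `CabHivePath` is a hypothetical name.

```java
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

/** Sketch of turning a hive's root id and string ids into a registry path. */
final class CabHivePath {
    /** Only root id 1 is identified in the post; everything else is treated as HKLM. */
    static String rootName(int rootId) {
        return rootId == 1 ? "HKEY_CLASSES_ROOT" : "HKEY_LOCAL_MACHINE";
    }

    /** Joins the root and the looked-up strings with backslashes. */
    static String path(int rootId, List<Integer> stringIds, Map<Integer, String> strings) {
        StringJoiner joiner = new StringJoiner("\\");
        joiner.add(rootName(rootId));
        for (int id : stringIds) {
            joiner.add(strings.getOrDefault(id, "<string " + id + ">"));
        }
        return joiner.toString();
    }
}
```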
Now it's time to read the registry keys. This is funky; I'll write this part later because I don't understand it at the moment. EDIT: Actually, I really don't understand what I did. There's the hive id, then a couple of string ids that make up the path of the key, but otherwise I'm very confused. It's in the readRegistryKeys method in BinaryCabType.java if you want to look at how I did it.
And last and least, it's time to read the shortcuts. If you skip three characters after the last character of the last registry key, you'll be at the directory id of the shortcut. This directory id might be 0; if it is, the next character is the id of one of those %blah% things. If the directory id is not 0, skip three characters; otherwise, read the next character to get the real directory id and then skip two characters. The read pointer is now on the id of the target file (its name, I guess). Skip five characters after that, and you get the id of the shortcut name.
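One literal reading of the shortcut byte counts, in Java. Following the description step by step, both branches land on the same byte: skipping three after the directory id reaches p + 4, and so does reading p + 1 and then skipping two. The name id is then at p + 10. Every value is read as a single byte, the description doesn't say where the next shortcut starts, and `CabShortcutSketch` is a hypothetical name, so check this against a hex dump before relying on it.

```java
/**
 * Sketch of one shortcut entry, following the post's byte counts literally.
 * p is the offset of the shortcut's directory id byte.
 * Assumption: every value fits in the single byte the post reads.
 */
final class CabShortcutSketch {
    final int directoryId; // 0 means one of the %...% special folders is used
    final int specialId;   // id of that %...% folder, or -1 when directoryId != 0
    final int targetId;    // id of the target file
    final int nameId;      // string id of the shortcut name

    private CabShortcutSketch(int directoryId, int specialId, int targetId, int nameId) {
        this.directoryId = directoryId;
        this.specialId = specialId;
        this.targetId = targetId;
        this.nameId = nameId;
    }

    static CabShortcutSketch parse(byte[] d, int p) {
        int directoryId = d[p] & 0xFF;
        // If the directory id is 0, the post reads the very next byte as the special folder id.
        int specialId = directoryId == 0 ? d[p + 1] & 0xFF : -1;
        // Either branch ("skip three" or "read one, skip two") ends on byte p + 4.
        int targetId = d[p + 4] & 0xFF;
        // "Skip five characters after that": p + 5 .. p + 9, so the name id is at p + 10.
        int nameId = d[p + 10] & 0xFF;
        return new CabShortcutSketch(directoryId, specialId, targetId, nameId);
    }
}
```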
Supposedly, that's it.
I discovered all of this by staring at a hex editor for longer than it should have taken, so there's probably something important I didn't notice. I'm a bit fuzzy on the details because I released this a long time ago. My disaster program's source (Java) is available under Apache -> New Core OEMs -> Roastpork_OEMs on the FTP, if you want to see how badly I dealt with the non-XML cabs. Someone deleted everything else related to it, so that's the only thing there. And yes, I do like having one comma per sentence.
Now go, and make my program look like a damned useless caveman.
Last edited by roastpork; 11-19-2007 at 12:29 AM.
Reason: Oh the confusion.