Why are there so many conflicting sources on the anatomy of Windows PE files?

I was trying to do research on the Windows PE executable format, and every source I can find tells a different story about what sections are in a PE file. No two sources say the same thing. They all conflict. I finally had to go to the Microsoft website, because I think that would be the only reliable source at this point. Are there different versions of Windows PE or something? What is the reason for this discrepancy?

There are no truly 'conflicting' articles about what sections there are in a PE file. There may be a few, but they're most likely just plain wrong. In PE files the sections can come in a different order, have different names on different operating systems, etc. The whole structure of the PE file can change just based on how the programmer builds it. The main reason that things appear so different is that there's no fixed set or layout of sections. Of course the same markers, such as the PE signature, are always there, but they can be in completely different places from file to file.
I recommend you look at PE101, it's a website that explains the PE format in great detail:
https://code.google.com/p/corkami/wiki/PE101
Also, please remove the windows-pe tag from your question. It refers to Windows PE the operating system, not the PE file format.
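To see for yourself how much the section list varies between files, here is a minimal sketch that reads the section names straight from the headers. It assumes the layout from the Microsoft PE documentation (offset of the PE signature stored at 0x3C, a 20-byte COFF header, 40-byte section headers); the function name list_pe_sections is just made up for this example.

```python
import struct

def list_pe_sections(data):
    """Return the section names of a PE image held in memory (sketch)."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ (DOS) signature")
    # The 4-byte field at 0x3C (e_lfanew) holds the offset of the PE signature,
    # which is why the signature is not at the same place in every file.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    coff = pe_offset + 4  # COFF file header starts right after the signature
    (num_sections,) = struct.unpack_from("<H", data, coff + 2)
    (opt_header_size,) = struct.unpack_from("<H", data, coff + 16)
    # The section table follows the optional header; each entry is 40 bytes
    # and begins with an 8-byte, NUL-padded name.
    table = coff + 20 + opt_header_size
    names = []
    for i in range(num_sections):
        raw = data[table + i * 40:table + i * 40 + 8]
        names.append(raw.rstrip(b"\0").decode("ascii", errors="replace"))
    return names
```

Running it as list_pe_sections(open(path, "rb").read()) on a few executables built with different toolchains should show differing names and orders, which is probably what the sources you read were each describing.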

Related

How to extract licensing information from a bitbake recipe

I will keep it short. I have been handed a yocto repository and asked to audit it for the licences used by the build. My end goal is to:
List all the licenses used by the distro (i.e. licenses used by all the tools and utilities built with distro)
Get a copy of the license file
Get the URL on the internet, where that licence text can be found. (if someone else wants to compare it with what I have provided them)
Being the lazy "software engineer" that I am, I want to avoid doing this task by hand and just parse all the .bb files to extract all that information.
I have seen some recipes, which include headers, which in turn have the license information. It'd be nice to be able to follow the trail.
This project on GitHub looks promising. But might not get me exactly what I need.
I also have the entire source code and the license file text distributed with the source code. I should be able to write a simple script to achieve this, but the text of some licenses doesn't state the type of license itself.
Any pointers will be greatly appreciated.
First of all, you probably want the licenses used in your image, not your distro, as you can build all kinds of recipes within any distro, so what matters is only what you ship, which is your image. The way to find out the licenses used by software in an image is already described here, but your question differs a bit in that you also want the full license texts. That's also easy: it's all there in per-package directories in build/tmp/deploy/licenses.
As for your third subquestion, it's not that easy, because even something standard like GPLv2 has small variations from project to project: some have exceptions, some have "(c) $YEARS" written in a different way. So what the OpenEmbedded build system gives you is actually more reliable, as it's extracted from the source. What is possible is to provide the source code itself (via the archiver class) along with the license information; anyone really curious could cross-check sources and licenses that way.
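If you want a quick inventory of what ended up in that directory, something like the following would do. It is only a sketch that assumes the one-directory-per-package layout mentioned above; the function name collect_license_files is made up here, and the exact file names you get depend on your build.

```python
from pathlib import Path

def collect_license_files(licenses_dir):
    """Map each per-package directory name to the license files it contains."""
    result = {}
    root = Path(licenses_dir)
    for pkg_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Only plain files are listed; nested directories are ignored in this sketch.
        result[pkg_dir.name] = sorted(f.name for f in pkg_dir.iterdir() if f.is_file())
    return result
```

Calling collect_license_files("build/tmp/deploy/licenses") then gives you a dictionary you can dump to a report or compare against the list of packages in your image.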
You should be able to address 1) and 2) via https://www.yoctoproject.org/docs/1.8/dev-manual/dev-manual.html#maintaining-open-source-license-compliance-during-your-products-lifecycle .

Macromedia Director: Decompile EXECUTABLE File

Is it possible to extract the contents of a Director executable file?
If it is, what software can I use?
It depends on how deep you want to dig into the executable and what kind of data you need from the exe.
As with all exe files, you can analyse them on a low level. But I assume you want a high-level tool to get back the Director files that were used to create the exe, right?
For the exe file (the "projector", to use the appropriate Director wording), no such tool is known to me.
But very often the exe file is used together with files with extensions such as .dxr or .dir. Those are Director files. DXR files are protected. But by importing them into Director you can extract some of the cast members (graphics etc.) that are included. You need in-depth knowledge of Lingo to do so.
You might also find .cst or .cxt files. Those are cast files. They can hold media, scripts etc. too. CXT are the protected versions. For these files the same is true as for the DXR and DIR files.
All in all - it is not easy, and your chances of completely revealing all code and media are low. Most Director programmers use the protected files for distributing their programs. Those do not allow you to reveal all the data included.
This Python script extracts Macromedia / Adobe Director movies and casts
from Windows and Mac executables.
https://github.com/n0samu/director-files-extract
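Before running a full extractor, you can check whether a projector looks like it embeds any Director data at all. The sketch below is not taken from the linked script; it assumes (this is not stated anywhere in this thread) that Director movies and casts are stored in a RIFX container whose signature shows up as RIFX (big-endian) or XFIR (little-endian), and it simply reports where those four bytes occur. The function name is made up.

```python
def find_director_signatures(data, magics=(b"RIFX", b"XFIR")):
    """Return sorted (offset, signature) pairs for every candidate container start."""
    hits = []
    for magic in magics:
        pos = data.find(magic)
        while pos != -1:
            hits.append((pos, magic.decode("ascii")))
            pos = data.find(magic, pos + 1)
    return sorted(hits)
```

A hit is only a hint: the same four bytes can occur by coincidence, so treat the offsets as places to start looking in a hex editor rather than as proof.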

Structure of QuickTime's 'dref' atom 'alis' element

I need to rewrite a QuickTime reference movie, making it point to another set of files.
I'm working in a Windows environment, so I don't have access to the QuickTime API, and since the referenced files are inaccessible, I also can't use the COM interface to load the movie, because it can't resolve the referenced paths.
The documentation in the "QuickTime File Format Specification" says that the 'dref' atom can have a list of 'alis', 'url ' and 'rsrc' data references. In this case I need to parse the 'alis' elements. According to the reference, "Data reference is a Macintosh alias".
So far, I have not been able to find a declaration of the structure or any related information. Do you know the structure of an alias record? Where can I find detailed information about its structure?
Thank you a lot for your help!
The format is very similar to the sort of alias that you could generate in the Finder by right-clicking an item, and creating an alias to it.
Aside: When the QuickTime format was originally specified, Apple intelligently chose to incorporate a number of other standards and paradigms that were already being used extensively elsewhere in the OS. This is one of the reasons why QT is (or was) able to do really clever things like reference movies. Unfortunately, there's also now a lot of cruft left over from OS features that are no longer relevant (e.g. AppleShare). Back in its heyday, QuickTime was slick, especially compared to its competitors; today, it's vastly underappreciated due to the buggy Windows port, and the relatively low processing power of the desktop systems of its time.
Back on topic: unfortunately, the format for alias files is not an open/published standard, and there is precious little documentation on the topic on the 'net. There's one really old doc that deconstructs the alias format used in Mac OS Classic. Although the structure used in OS X is very similar, the alias files themselves tend to be much larger, as they contain numerous extra data strings at the end of the file that are not documented in the above-linked documentation.
Also, aliases created in the Finder do look a bit different from the ones contained within the dref atom, although I've never run through them bit-by-bit to deduce the actual differences. If you want to take a peek at those files, and have the OS X Developer Tools installed, you can run
setfile -a a [filename]
on a Finder-generated alias to strip the file of its alias-ness so that you can look at its contents in a hex editor (otherwise, the OS will just redirect you to the linked file - doh!). You can re-set the file's alias attribute, or arbitrarily designate any file as an alias by running
setfile -a A [filename]
Unfortunately, during my experiments, dumping the alis portion of a QT movie's dref atom has never seemed to generate an alias that Mac OS was able to interpret.
Fortunately (or not, as it was in my case), the functions that Mac OS allegedly uses to create/handle aliases are part of a public API called the Alias Manager, which is part of the very-low-level CoreServices framework. If you've got time to delve into this further, you can write some code to experiment with Mac OS's built-in alias-generating and interpreting capabilities.
Unfortunately, if you're dealing with an old/buggy file, you have no way of knowing if the file was actually generated by CoreServices' Alias Manager, or if that framework has changed/evolved/regressed since then. Because it's a closed format, 3rd-party developers who opt to not use the Alias Manager can only take guesses as to the format's "legal" structure.
You can use this Java program to see what is in the header, and extract data (it's a bit old, but may still work). What is more useful, though, is the author's thorough discussion of the QuickTime header.
But I think you may just be looking for the Apple documentation, currently found here.
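If all you need is to get at the raw 'alis' payloads without the QuickTime API, walking the atoms by hand is not hard. The following is a rough sketch based on the atom layout in the QuickTime File Format Specification (4-byte big-endian size, 4-byte type; inside 'dref', a version/flags word, an entry count, then one sized entry per data reference). It assumes the dref sits on the usual moov/trak/mdia/minf/dinf path, and it treats the alias record itself as opaque bytes, since that part is the undocumented bit. The function names are made up.

```python
import struct

# Container atoms to descend into on the way to 'dref' (assumed path).
CONTAINERS = {b"moov", b"trak", b"mdia", b"minf", b"dinf"}

def iter_atoms(data, start=0, end=None):
    """Yield (type, body_start, atom_end) for each atom in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, kind = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:      # a 64-bit extended size follows the type field
            (size,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif size == 0:    # atom runs to the end of the enclosing space
            size = end - pos
        if size < header or pos + size > end:
            break          # malformed atom, stop rather than guess
        yield kind, pos + header, pos + size
        pos += size

def find_data_refs(data, start=0, end=None):
    """Return (type, payload) for every data reference found in 'dref' atoms."""
    refs = []
    for kind, body, stop in iter_atoms(data, start, end):
        if kind in CONTAINERS:
            refs.extend(find_data_refs(data, body, stop))
        elif kind == b"dref":
            # Body: version (1 byte) + flags (3 bytes), then the entry count.
            (count,) = struct.unpack_from(">I", data, body + 4)
            for i, (ref_kind, ref_body, ref_stop) in enumerate(iter_atoms(data, body + 8, stop)):
                if i >= count:
                    break
                # Each entry also starts with version + flags; the rest is the payload.
                refs.append((ref_kind.decode("latin-1"), data[ref_body + 4:ref_stop]))
    return refs
```

Each ("alis", payload) pair it returns is the alias record as stored in the movie, which you can then dump to a file and compare against the old Mac OS Classic description mentioned above.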

How to decide on document file extension?

I'm writing a new document-based cross-platform chemistry application (Win, Mac, Unix), which saves files in its own format (no standard format exists for this field). I'm trying to decide on a file extension for the saved files. My questions are:
How important is it nowadays to stick to 3 characters?
Where can you check how much this file extension is already used? (Google helps, of course, but it does not tell me how much a given app is popular)
Do I really need to use a file-specific extension? My save format is gzip'ed XML, so I could name it .xml.gz, but I fear it would confuse beginning users (i.e. when you see it, it does not immediately "ring a bell").
Finally, do you have other important guidelines when choosing for your own programs?
PS: I tried to keep the right balance between "giving too little information" and "being too specific to be really useful to others". I'll happily provide more information in comments if the need arises.
FileInfo.com lists a lot of file extensions along with their own estimation of how much each one is used.
I suggest a unique extension (rather than xml.gz) so that the OS can identify the file type to users when looking at a file listing or whatever. 'Ringing a bell' is important, especially if you will have less sophisticated users.
I don't see any need to stick to 3 characters, but I wouldn't go bigger than 5 (I don't suppose I have a real reason for this, other than personal preference).
How important is it nowadays to stick to 3 characters?
It's not, unless you have to support older operating systems. All current OSes handle >3 char file extensions without any problems. Think of .html, .config, .resx, and I'm sure there are more.
Where can you check how much this file extension is already used?
check out FileExt.
Do I really need to use a file-specific extension? My save format is gzip'ed XML, so I could name it .xml.gz, but I fear it would confuse beginning users (i.e. when you see it, it does not immediately "ring a bell").
Remember that Windows (and Windows users) associate files with applications by extension, so using something too generic like .xml.gz may cause problems. You are probably better off coming up with something that is more specific to your file type or application. Users don't care whether your format is gzipped XML internally; they care about what is in the file. Think about abstraction layers: your users will think of it as a file containing chemistry info, not gzipped XML, so .chem is far more appropriate than .xml.gz.
Some suggestions of things to think about:
Obviously, don't clash with anything big - Don't use .doc, .xls, .exe, etc.
Don't clash with anything common in your industry domain that your user demographic is likely to have installed. For example, if you are writing a programming tool, don't use .cs or .cpp. You probably know your domain best, so write a list of all the apps you and your users are likely to have installed, and any of their competitors and avoid them.
Make sure your app includes the options to register and unregister the extension. Don't just automatically do it in the installation; make sure it's an option.
Remember Unix/Linux file systems are case sensitive (Mac file systems can be too, although by default they are not), so consider always sticking to all lower case by default.
Remember CD/DVD file naming rules are stricter, so don't use non-alphanumeric characters.
Finally, remember that most non-tech users are going to have file extensions turned off, so don't stress about it too much.
There is more info here.
Wikipedia has lists of files extensions here (by type) and here (alphabetical), and also some general information
Depends on the platform, but in general, not very important for newer Operating Systems. Check the documentation for the platforms you're targeting.
I'm not aware of better alternatives to Google. Hopefully someone else has a better suggestion for this one.
Not unless you have some reason to do so. Examples would be "I want to ensure that Windows always opens this program with my app". I'm not sure that your users need to be concerned with the extension anyway. The default configuration on Windows, for example, is to hide extensions for known file types. BUT if you have a compelling reason (such as allowing your program to easily identify files it should be able to handle, for example) then you could use the extension, or you could come up with something else.
I have only ever once written a program where I thought I needed to come up with my own extension. I used my initials. Then later I realized I didn't really need a special extension and reverted to ".xml". However, most extensions seem to be something that seems to mean something. (.doc for documents, etc.) so something meaningful is a good idea if you do need to go this route.
It sure depends on the OSes you want to support, but people have globally moved over the 3-characters extension limit these days: .html is well used for webpages, for example.
Of course, if you go to much longer extensions, people will stop visually recognizing them as file extensions, I think...
Barring your needing to be compatible with a specific OS that you know still has the three-letter limitation, no need to keep it to three characters. It may be useful to have a three-character version of it if you end up supporting those platforms.
The Wikipedia list of file formats is pretty good. Some mime mapping lists will list common extensions associated with those mappings. Ray already mentioned FileInfo.com.
It's a convenience thing; I'd probably go with your own but document the fact that they're just gzipped XML files conforming to a specific DTD and make it easy for users to use .xml.gz instead. Be sure that your software doesn't care about the extension, so that users could even choose their own if they wanted, although I'd tend to avoid encouraging them to by providing a reasonable default.
I'd go for typeability, clarity, uniqueness, and brevity -- in that order. For instance, .config is a lot easier to type than .q2z but it falls down on uniqueness. (I'm not suggesting it for your app; it's an example.) Similarly, .q2z is just a pain. :-) So for instance, .chemstuff is easy to type and probably not in wide use elsewhere. (Again, not a suggestion, just an example.)
Have it as document_name.app_name.xml.gz where document_name and app_name are variables, the latter some easily readable and recognisable short string of your application's title.
Modern systems are quite flexible, and there is absolutely no need to drag the 3-character extensions further along in time with us.
I agree that .xml.gz would confuse users; however, keep in mind that modern systems are moving towards recognizing files not based on extensions but by probing their headers and even contents instead. In fact, users often do not even see the extensions. For gzipped XML files, a system may decide to first unpack the file stream in memory, then find out it is a literal XML file, then take its 'xmlns' as the application identifier. However, such systems are not yet in widespread use. In any case, don't make the mistake of only opening files by extension - be smart and raise the bar - do exactly the above to find out if the file can be considered a document for your application.
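A minimal sketch of that probing, assuming the save format really is a single gzip stream containing one XML document. The namespace URI and the function name are placeholders invented for the example; substitute whatever your format actually declares.

```python
import gzip
import zlib
import xml.etree.ElementTree as ET

# Hypothetical namespace for the application's documents (placeholder value).
APP_NAMESPACE = "http://example.org/chemapp/1.0"

def looks_like_app_document(raw, namespace=APP_NAMESPACE):
    """Decide by content, not by extension, whether raw bytes are one of our documents."""
    if raw[:2] != b"\x1f\x8b":        # gzip magic number
        return False
    try:
        root = ET.fromstring(gzip.decompress(raw))
    except (OSError, EOFError, zlib.error, ET.ParseError):
        return False                  # not valid gzip, or not well-formed XML
    # ElementTree spells a namespaced tag as "{namespace}localname".
    return root.tag.startswith("{" + namespace + "}")
```

With a check like this in the open path, the file can be called .chem, .xml.gz, or anything the user likes, and the application still recognizes it. Note that it decompresses the whole file in memory, which is fine for a sketch but worth revisiting for very large documents.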

What file type starts with BOSS 7?

I am looking at some files generated in the early 90s. One of them seems to hold references to data packed in some binary format in a number of large files.
The first six bytes of the file are 0x42 0x4f 0x53 0x53 0x20 0x37 which spells BOSS 7.
My searches of various sources of file type information, including /usr/share/file/magic have not turned up anything. Does anyone know what software might have been used to generate files that start with these bytes? Any information on file layout would be great.
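For anyone who wants to check their own files against the same signature, here is a tiny sketch. The only fact it relies on is the six bytes quoted above; the names are made up.

```python
# The six leading bytes quoted above, which are the ASCII text "BOSS 7".
BOSS_MAGIC = bytes([0x42, 0x4F, 0x53, 0x53, 0x20, 0x37])

def has_boss_header(data):
    """True if the data starts with the 'BOSS 7' signature."""
    return data[:6] == BOSS_MAGIC
```

Reading the first six bytes of each file and passing them to has_boss_header is enough to sort the signature-bearing files from the large binary ones.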
It looks like the file might have been generated by VisualWorks Smalltalk:
[BOSS 7.5]
Contains the Binary Object Streaming Service, which supports efficient storage and
retrieval of objects, including code, to and from files.
Note that for code storage, the parcel system now supercedes BOSS.
I tried to load the file using the IDE provided at http://www.cincomsmalltalk.com/ and it generated a meaningful exception:
The identifier MediaCollectionDictionary has no binding
The file does contain:
MediaCollectionDictionary
MediaCollection*
CallMediaVehDict2
etc., which means that if I could now figure out what the rest of the files do and learn enough Smalltalk, I could disentangle this mess.
Of course, I am not sure if this analysis is correct. So, please if you have any other ideas, let me know. Thank you.
Much later: So, my initial assessment seems to be correct. I got some useful tips on comp.lang.smalltalk: http://groups.google.com/group/comp.lang.smalltalk/browse_thread/thread/5d55d857e2f80158#
Ask on comp.lang.smalltalk
Ask on the vwnc mailing list