Scripting Engines & Obfuscation - scripting

I'm trying to obfuscate a project that uses the Rhino engine. It has many scripts that call upon methods from classes and I've been having difficulties making it work.
When it obfuscates, it changes the method names and thus makes the scripts call on inexistent class methods.
The problem is, I could keep those classes from being obfuscated, but there are quite a number of them and I'd rather obfuscate them, for security purposes.
Is there any way I could make this work?

What obfuscator are you using? Looks like you're using a commercial one that does total obfuscation.
Obfuscation works in two modes:
1) Total obfuscation -- which means that you'll need to obfuscate ALL the source files. For things that you need to remain unchanged (so that you can call it from an outside script, say) you'll need to use your obfuscation software's "export" or "extern" or "prevent" lists. And you have to do that one by one. The good thing about this is that, if you throw in all your code, you don't have anything that you don't want to change (because there is, by definition, no outside code)
2) File obfuscation -- which means that the obfuscator will only change local variable names and optimize statements, but will not change public-facing names. Therefore, you "outside scripts" will continue to work. Most minifiers work in this mode, but the obfuscation value is very limited.
You have to pick among these two modes. They roughly correspond to the Closure Compiler's Simple and Advanced modes.
If you need to obfuscate your code to prevent reverse engineering, then you must use total obfuscation, in which case you'll just have to do the amount of work to prevent changing of unwanted names, or as I said, throw in all your code.

Related

Test-Automation using MetaProgramming

i want to learn test automation using meta programming.i googled it could not find any thing.can anybody suggest me some resources where can i get info about "how to use Meta Programming for making test automation easy"?
That's a broad topic and not a lot has been written about it, because of the "dark corners" of metaprogramming.
What do you mean by "metaprogramming"?
As background, I consider metaprogramming to be any activity in which a tool (which we call a "metaprogramming tool") is used to inspect or modify the application software to achieve some effect.
Many people consider "reflection" to be a kind of metaprogramming; other consider (C++-style) templates to be metaprogramming; some suggest aspect-oriented programming.
I sort of agree but think these are weak versions of what you want, because each has severe limits on what it can see or do to source code. What you really want is a metaprogramming tool that has access to everything in your source program (yes, comments too!) Such tools are called Program Transformation Systems (PTS); they work by parsing the source code and operating on the parsed representation of the program. (I happen to build one of these, see my bio). PTSes can then analyze the code accurate, and/or make reliable changes to the code and regenerate valid source with the changes. PS: a PTS can implement all those other metaprogramming techniques as special cases, so it is strictly more general.
Where can you use metaprogramming for testing?
There are at least 2 areas in which metaprogramming might play a role:
1) Collection of information from tests
2) Generation of tests
3) Avoidance of tests
Collection.
Collection of test results depends on the nature of tests. Many tests are focused on "is this white/black box functioning correctly"? Assuming the tests are written somehow, they have to have access to the box under test,
be able to invoke that box in a realistic ways, determine if the result is correct, and often tabulate the results to that post-testing quality assessments can be made.
Access is the first problem. The black box to be tested may not be easily accessible to a testing framework: driven by a UI event, in a non-public routine, buried deep inside another function where it hard to get at.
You may need metaprogramming to "temporarily" modify the program to provide access to the box that needs testing (e.g., change a Private method to Public so it can be called from outside). Such changes exist only for the duration of the test project; you throw the modified program away because nobody wants it for anything but the test results. Yes, you have to ensure that the code transformations applied to make things visible don't change the program functionality.
The second problem is exercising the targeted black box in a realistic environment. Each code module runs in a world in which it assumes data and the environment are "properly" configured. The test program can set up that world explicitly by making calls on lots of the program elements or using its own custom code; this is usually the bulk of a test routine, and this code is hard to write and fragile (the application under test keeps changing; so do its assumptions about the world). One might use metaprogramming to instrument the application to collect the environment under which a test might need to run, thus avoiding the problem of writing all the setup code.
Finally, one might want to record more than just "test failed/passed". Often it is useful to know exactly what code got tested ("test coverage"). One can instrument the application to collect what-got-executed data; here's how to do it for code blocks: http://www.semdesigns.com/Company/Publications/TestCoverage.pdf using a PTS. More sophisticated instrumentation might be used to capture information about which paths through the code have been executed. Uncovered code, and/or uncovered paths, show where tests have not been applied and you arguably know nothing about what the program does, let alone whether it is buggy in a straightforward way.
Generation of tests
Someone/thing has to produce tests; we've already discussed how to produce the set-up-the-environment part. What about the functional part?
Under the assumption that the program has been debugged (e.g, already tested by hand and fixed), one could use metaprogramming to instrument the code to capture the results of execution of a black box (e.g., instance execution post-conditions). By exercising the program, one can then produce (by definition) "correctly produces" results which can be transformed into a test. In this way, one might construct a huge variety of regression tests for an existing program; these will be valuable in verifying the further enhancements to the program don't break most of its functionality.
Often a function has qualitatively different behaviors on different ranges of input (e.g., for x<10, produced x+1, else produces x*x). Ideally one would like to provide a test for each qualitively different results (e.g, x<10, x>=10) which means one would like to partition the input ranges. Metaprogrammning can help here, too, by enumerating all (partial) paths through module, and providing the predicate that controls each path.
The separate predicates each represent the input space partition of interest.
Avoidance of Tests
One only tests code one does not trust (surely you aren't testing the JDK?) Any code consructed by a reliable method doesn't need tests (the JDK was constructed this way, or at least Oracle is happy to have you beleive it).
Metaprogramming can be used to automatically generate code from specifications or DSLs, in relaible ways. Such generated code is correct-by-construction (we can argue about what degree of rigour), and doesn't need tests. You might need to test that DSL expression achieves the functionaly you desired, but you don't have to worry about whether the generated code is right.

Converting Actionscript syntax to Objective C

I have a game I wrote in Actionscript 3 I'm looking to port to iOS. The game has about 9k LOC spread across 150 classes, most of the classes are for data models, state handling and level generation all of which should be easy to port.
However, the thought of rejiggering the syntax by hand across all these files is none too appealing. Are there tools that can help me speed up this process?
I'm not looking for a magical tool here, nor am I looking for a cross compiler, I just want some help converting my source files.
I don't know of a tool, but this is the way I'd try and attack your problem if there really is a lot of (simple) code to convert. I'm sure my suggestion is not that useful on parts of the code that are very flash-specific (all the DisplayObject stuff?) and also not that useful on lots of your logic. But it would be fun to build! :-)
Partial automatic conversion should be possible, especially if the objects are just 'data containers', watch out for bringing too much as3-idiom over to objective-c though, it might not always be a good fit.
Unless you want to create your own (semi) parser for as3 you'd need some sort of a parser, apparently FlexPMD has one (never used it), and there probably are others.
After getting your hands on a parser you have to find some way of suggesting to the system what parts could be converted automatically. You could try and add rules to the parser/generator script for the general case. For more specific cases I'd use custom metadata on the actual class/property/method, assuming a real as3 parser would correctly parse those.
Now part of your work will shift from hand-converting files to hand-annotating files, but that might be ok for you.
Have the parser parse your classes and define actions based on your metadata that will determine what kind of objective-c class to generate. If you get this working it could at least get you all your classes, their simple properties and method signatures (getting the body of the methods converted might be a bit too much to ask but you could include it as a comment so you'd have a nice reference while hand-translating).
PS: if you make this into a one way process be very sure you don't need to re-generate it later - it would be bad if you find out that you have been modifying the generated code and somehow need to re-generate all those classes -- that would mean you'll have to redo all your hard work!
I've started putting a tool together to take the edge off the menial aspects of this process.
I'm trying to figure out if there's enough interest to make it clean and stable enough to release for others to use. I may just do it anyway.
http://meanwhileatthelab.blogspot.com.au/2012/08/automating-process-of-converting-as3-to.html
It's so far saving me a lot of time while porting one of my fairly large games from AS3 to objc.
Check out the Sparrow Framework. It's purported to be designed with Actionscript developers in mind, recreating classes that sort of emulate display list and things like that. You'll have to dive into some "rejiggering" for sure no matter what you do if you don't want to use the CS5 packager.
http://www.sparrow-framework.org/
even if some solution exists, note that architectural logic is DIFFERENT, and many more other details.
Anyway even if posible, You will have a strange hybrid.
I am coming back from WWDC2012, and the message is (as always..) performance anf great user experience.
So You should rewrite using a different programming model.

How to Decide when to Implement a DLL?

At which point do you decide that some of your subroutines and common code should be placed in a class library or DLL? In one of my applications, I would like to share some of my common code between different projects (as we all know, it's a programming sin to duplicate code).
The vast majority of my code is all within a single project. I also have one small utility that's partitioned from the main executable that runs with elevated permissions for a sole purpose. The two items have, at most, three subroutines in common. Should these common subroutines be placed and called from a class library? How do you decide when to do this? When you have at least one shared subroutine? Twenty-plus lines of code?
I don't believe that this should be language specific or framework dependent, but if so, I'm using the .NET framework.
There's more ways to share code between applications than with a DLL. From what it sounds like, you're not talking about a lot of code, so it sounds like you don't need to worry about it too much.
In general, I use the following rule of thumb:
For trivial code duplication (a couple simple 1-2 line functions, that are easy to understand and debug) I'll just copy and paste the code.
For more complicated functions (a small library of stand-alone helper functions, contained in a file or two, which require a modest level of maintenance and debugging) I'll simply include the file in both projects (either by linking, or defining a subrepository, or something like that).
For more extensive code sharing (a group of interrelated classes, or a database communication layer, which is useful for multiple projects) I'll refactor them out into a standalone library, and package and distribute them using whatever's appropriate for whatever I'm programming in.
Because the complexity of managing your code increases by an order of magnitude for each step (when you're packaging DLLs for multiple projects you now need to think about versioning issues) you only want to move to the next level when you need to. It doesn't sound like you're feeling the pain of handling your common code yet, and if that's the case there's no real need.
If code is shared between multiple applications, then it has to reside in a DLL or class library.
For a larger application you might also want to break different subsystems of the application into separate libraries. That way each project can focus on one particular task. This simplifies the structure of your application and makes it easier to find any one piece of code. For example, you might have a GUI application with different DLL's (.NET projects) for:
Working with a specific Network protocol
Accessing common code, for example utility classes
Access to legacy code (via PInvoke)
etc...

Are code generators bad?

I use MyGeneration along with nHibernate to create the basic POCO objects and XML mapping files. I have heard some people say they think code generators are not a good idea. What is the current best thinking? Is it just that code generation is bad when it generates thousands of lines of not understandable code?
Code generated by a code-generator should not (as a generalisation) be used in a situation where it is subsequently edited by human intervention. Some systems such the wizards on various incarnations of Visual C++ generated code that the programmer was then expected to edit by hand. This was not popular as it required developers to pick apart the generated code, understand it and make modifications. It also meant that the generation process was one shot.
Generated code should live in separate files from other code in the system and only be generated from the generator. The generated code code should be clearly marked as such to indicate that people shouldn't modify it. I have had occasion to do quite a few code-generation systems of one sort or another and All of the code so generated has something like this in the preamble:
-- =============================================================
-- === Foobar Module ===========================================
-- =============================================================
--
-- === THIS IS GENERATED CODE. DO NOT EDIT. ===
--
-- =============================================================
Code Generation in Action is quite a good book on the subject.
Code generators are great, bad code is bad.
Most of the other responses on this page are along the lines of "No, because often the generated code is not very good."
This is a poor answer because:
1) Generators are tool like anything else - if you misuse them, dont blame the tool.
2) Developers tend to pride themselves on their ability to write great code one time, but you dont use code generators for one off projects.
We use a Code Generation system for persistence in all our Java projects and have thousands of generated classes in production.
As a manager I love them because:
1) Reliability: There are no significant remaining bugs in that code. It has been so exhaustively tested and refined over the years than when debugging I never worry about the persistence layer.
2) Standardisation: Every developers code is identical in this respect so there is much less for a guy to learn when picking up a new project from a coworker.
3) Evolution: If we find a better way to do things we can update the templates and update 1000's of classes quickly and consistently.
4) Revolution: If we switch to a different persistence system in the future then the fact that every single persistent class has an exactly identical API makes my job far easier.
5) Productivity: It is just a few clicks to build a persistent object system from metadata - this saves thousands of boring developer hours.
Code generation is like using a compiler - on an individual case basis you might be able to write better optimised assembly language, but over large numbers of projects you would rather have the compiler do it for you right?
We employ a simple trick to ensure that classes can always be regenerated without losing customisations: every generated class is abstract. Then the developer extends it with a concrete class, adds the custom business logic and overrides any base class methods he wants to differ from the standard. If there is a change in metadata he can regenerate the abstract class at any time, and if the new model breaks his concrete class the compiler will let him know.
The biggest problem I've had with code generators is during maintenance. If you modify the generated code and then make a change to your schema or template and try to regenerate you can have problems.
One problem is if the tool doesn't allow you to protect changes you've made to the modified code then your changes will be overwritten.
Another problem I've seen, particularly with code generators in RSA for web services, if you change the generated code too much the generator will complain that there is a mismatch and refuse to regenerate the code. This can happen for something as simple as changing the type of a variable. Then you are stuck generating the code to a different project and merging the results back into your original code.
Code generators can be a boon for productivity, but there are a few things to look for:
Let you work the way you want to work.
If you have to bend your non-generated code to fit around the generated code, then you should probably choose a different approach.
Run as part of your regular build.
The output should be generated to an intermediates directory, and not be checked in to source control. The input must be checked in to source control, however.
No install
Ideally, you check the tool in to source control, too. Making people install things when preparing a new build machine is bad news. For example, if you branch, you want to be able to version the tools with the code.
If you must, make a single script that will take a clean machine with a copy of the source tree, and configure the machine as required. Fully automated, please.
No editing output
You shouldn't have to edit the output. If the output isn't useful enough as-is, then the tool isn't working for you.
Also, the output should clearly state that it is a generated file & should not be edited.
Readable output
The output should be written & formatted well. You want to be able to open the output & read it without a lot of trouble.
#line
Many languages support something like a #line directive, which lets you map the contents of the output back to the input, for example when producing compiler error messages or when stepping in the debugger. This can be useful, but it can also be annoying unless done really well, so it's not a requirement.
My stance is that code generators are not bad, but MANY uses of them are.
If you are using a code generator for time savings that writes good code, then great, but often times it is not optimized, or adds a lot of overhead, in those cases I think it is bad.
Code generation might cause you some grief if you like to mix behaviour into your classes. An equally productive alternative might be attributes/annotations and runtime reflection.
Compilers are code generators, so they are not inherently bad unless you only like to program in raw machine code.
I believe however that code generators should always completely encapsulate the generated code. I.e. you should never have to modify the generated code by hand, any change should be done by modifying the input to the generator and regenerate the code.
If its a mainframe cobol code generator that Fran Tarkenton is trying to sell you then absolutely yes!
I've written a few code generators before - and to be honest they saved my butt more than once!
Once you have a clearly defined object - collection - user control design, you can use a code generator to build the basics for you, allowing your time as a developer to be used more effectively in building the complex stuff, after all, who really wants to write 300+ public property declarations and variable instatiations? I'd rather get stuck into the business logic than all the mindless repetitive tasks.
The mistake many people make when using code generation is to edit the generated code. If you keep in mind that if you feel like you need to edit the code, you actually need to be editing the code generation tool it's a boon to productivity. If you are constantly fighting the code that gets generated it's going to end up costing productivity.
The best code generators I've found are those that allow you to edit the templates that generate the code. I really like Codesmith for this reason, because it's template-based and the templates are easily editable. When you find there is a deficiency in the code that gets generated, you just edit the template and regenerate your code and you are forever good after that.
The other thing that I've found is that a lot of code generators aren't super easy to use with a source control system. The way we've gotten around this is to check in the templates rather than the code and the only thing we check into source control that is generated is a compiled version of the generated code (DLL files, mostly). This saves you a lot of grief because you only have to check in a few DLLs rather than possibly hundreds of generated files.
Our current project makes heavy use of a code generator. That means I've seen both the "obvious" benefits of generating code for the first time - no coder error, no typos, better adherence to a standard coding style - and, after a few months in maintenance mode, the unexpected downsides. Our code generator did, indeed, improve our codebase quality initially. We made sure that it was fully automated and integrated with our automated builds. However, I would say that:
(1) A code generator can be a crutch. We have several massive, ugly blobs of tough-to-maintain code in our system now, because at one point in the past it was easier to add twenty new classes to our code generation XML file, than it was to do proper analysis and class refactoring.
(2) Exceptions to the rule kill you. We use the code generator to create several hundred Screen and Business Object classes. Initially, we enforced a standard on what methods could appear in a class, but like all standards, we started making exceptions. Now, our code generation XML file is a massive monster, filled with special-case snippets of Java code that are inserted into select classes. It's nearly impossible to parse or understand.
(3) Since so much of our code is generated, using values from a database, it's proven difficult for developers to maintain a consistent code base on their individual workstations (since there can be multiple versions of the database). Debugging and tracing through the software is a lot harder, and newbies to the team take much longer to figure out the "flow" of the code, because of the extra abstraction and implicit relationships between classes. IDE's cannot pick up relationships between two classes that communicate via a code-generated class.
That's probably enough for now. I think Code Generators are great as part of a developer's individual toolkit; a set of scripts that write out your boilerplate code make starting a project a lot easier. But Code Generators do not make maintenance problems go away.
In certain (not many) cases they are useful. Such as if you want to generate classes based on lookup-type data in the database tables.
Code generation is bad when it makes programming more difficult (IE, poorly generated code, or a maintenance nightmare), but they are good when they make programming more efficient.
They probably don't always generate optimal code, but depending on your need, you might decide that developer manhours saved make up for a few minor issues.
All that said, my biggest gripe with ORM code generators is that maintenance the generated code can be a PITA if the schema changes.
Code generators are not bad, but sometimes they are used in situations when another solution exists (ie, instantiating a million objects when an array of objects would have been more suitable and accomplished in a few lines of code).
The other situation is when they are used incorrectly, or coded badly. Too many people swear off code generators because they've had bad experiences due to bugs, or their misunderstanding of how to correctly configure it.
But in and of themselves, code generators are not bad.
-Adam
They are like any other tool. Some give beter results than others, but it is up to the user to know when to use them or not. A hammer is a terrible tool if you are trying to screw in a screw.
This is one of those highly contentious issues. Personally, I think code generators are really bad due to the unoptimized crap code most of them put out.
However, the question is really one that only you can answer. In a lot of organizations, development time is more important than project execution speed or even maintainability.
We use code generators for generating data entity classes, database objects (like triggers, stored procs), service proxies etc. Anywhere you see lot of repititive code following a pattern and lot of manual work involved, code generators can help. But, you should not use it too much to the extend that maintainability is a pain. Some issues also arise if you want to regenerate them.
Tools like Visual Studio, Codesmith have their own templates for most of the common tasks and make this process easier. But, it is easy to roll out on your own.
It can really become an issue with maintainability when you have to come back and cant understand what is going on in the code. Therefore many times you have to weigh how important it is to get the project done fast compared to easy maintainability
maintainability <> easy or fast coding process
I use My Generation with Entity Spaces and I don't have any issues with it. If I have a schema change I just regenerate the classes and it all works out just fine.
They serve as a crutch that can disable your ability to maintain the program long-term.
The first C++ compilers were code generators that spit out C code (CFront).
I'm not sure if this is an argument for or against code generators.
I think that Mitchel has hit it on the head.
Code generation has its place. There are some circumstances where it's more effective to have the computer do the work for you!
It can give you the freedom to change your mind about the implementation of a particular component when the time cost of making the code changes is small. Of course, it is still probably important to understand the output the code generator, but not always.
We had an example on a project we just finished where a number of C++ apps needed to communicate with a C# app over named pipes. It was better for us to use small, simple, files that defined the messages and have all the classes and code generated for each side of the transaction. When a programmer was working on problem X, the last thing they needed was to worry about the implentation details of the messages and the inevitable cache hit that would entail.
This is a workflow question. ASP.NET is a code generator. The XAML parsing engine actually generates C# before it gets converted to MSIL. When a code generator becomes an external product like CodeSmith that is isolated from your development workflow, special care must be taken to keep your project in sync. For example, if the generated code is ORM output, and you make a change to the database schema, you will either have to either completely abandon the code generator or else take advantage of C#'s capacity to work with partial classes (which let you add members and functionality to an existing class without inheriting it).
I personally dislike the isolated / Alt-Tab nature of generator workflows; if the code generator is not part of my IDE then I feel like it's a kludge. Some code generators, such as Entity Spaces 2009 (not yet released), are more integrated than previous generations of generators.
I think the panacea to the purpose of code generators can be enjoyed in precompilation routines. C# and other .NET languages lack this, although ASP.NET enjoys it and that's why, say, SubSonic works so well for ASP.NET but not much else. SubSonic generates C# code at build-time just before the normal ASP.NET compilation kicks in.
Ask your tools vendor (i.e. Microsoft) to support pre-build routines more thoroughly, so that code generators can be integrated into the workflow of your solutions using metadata, rather than manually managed as externally outputted code files that have to be maintained in isolation.
Jon
The best application of a code generator is when the entire project is a model, and all the project's source code is generated from that model. I am not talking UML and related crap. In this case, the project model also contains custom code.
Then the only thing developers have to care about is the model. A simple architectural change may result in instant modification of thousands of source code lines. But everything remains in sync.
This is IMHO the best approach. Sound utopic? At least I know it's not ;) The near future will tell.
In a recent project we built our own code generator. We generated all the data base stuff, and all the base code for our view and view controller classes. Although the generator took several months to build (mostly because this was the first time we had done this, and we had a couple of false starts) it paid for itself the first time we ran it and generated the basic framework for the whole app in about ten minutes.
This was all in Java, but Ruby makes an excellent code-writing language particularly for small, one-off type projects.
The best thing was the consistency of the code and the project organization. In addition you kind of have to think the basic framework out ahead of time, which is always good.
Code generators are great assuming it is a good code generator. Especially working c++/java which is very verbose.

How do you organize code in embedded projects?

Highly embedded (limited code and ram size) projects pose unique challenges for code organization.
I have seen quite a few projects with no organization at all. (Mostly by hardware engineers who, in my experience are not typically concerned with non-functional aspects of code.)
However, I have been trying to organize my code accordingly:
hardware specific (drivers, initialization)
application specific (not likely to be reused)
reusable, hardware independent
For each module I try to keep the purpose to one of these three types.
Due to limited size of embedded projects and the emphasis on performance, it is often keep this organization.
For some context, my current project is a limited DSP application on a MSP430 with 8k flash and 256 bytes ram.
I've written and maintained multiple embedded products (30+ and counting) on a variety of target micros, including MSP430's. The "rules of thumb" I have been most successful with are:
Try to modularize generic concepts as much as possible (e.g. separate driver code from application code). -- It makes for easier maintenance and reuse/porting of a project to another target micro in the future.
DO NOT start by worrying about optimized code at the very beginning. Try to solve the domain's problem first and optimize second. -- Your target micro can handle a lot more "stuff" than you might expect.
Work to ensure readability. Although most embedded projects seem to have short development-cycles, the projects often live longer than you might expect and another developer will undoubtedly have to work with your code.
I've worked on 8-bit PIC processors with similar limitations.
One restriction you don't have is how many comments you make or what you choose to name your methods, variables, etc.. Take advantage. Speed and size constraints do sometimes trump organization, but you can always explain.
Another tip is to break up a logical source file into even more pieces than you need, then bind them by #includeing them in a compilation unit. This allows you to have lots of reusable code (even one routine per file) but combine in whatever order you need. This is useful e.g. when trying to meet compilation unit size restrictions, or to pick and choose which common subroutines you need on the next project.
I try to organize it as if I had unlimited RAM and ROM, and it usually works out fine. As mentioned elsewhere, do not try to optimize it until you absolutely need to.
If you can get a pin-compatible processor that has more resources, it's better to get it working on that, concentrating on good structure and layout, then optimize for size later when you understand the code better.
Except under exceptional circumstances (see note), the organisation of your code will have no impact on the final product. (contents of the code are obviously a different matter)
So with that in mind you should organise your code as you would any other project.
With that said, the following are fairly typical:
If this is a processor that you've worked on before, or will be working on in the future, you will usually want to keep a dedicated hardware abstraction layer that can be shared between projects in the future. Typically this module would contain items like routines for managing any uarts, timers etc.
Usually it's reasonable to maintain a set of platform specific code for initialisation and setup that performs all of the configuration and initialisation up to the point where your executive takes over and runs your application. It will also include platform specific hal routines.
The executive/application is probably maintained as a separate module. All of the hardware specific code should be hidden in the hal (as mentioned above).
By splitting your code up like this you also have the option of compiling and running your application as a simulation, on a completely different platform, just by replacing the hardware specific code with routines that mimic the hardware.
This can be good for unit testing and debugging and algorithmic problems you might have.
Exceptional circumstances as might be imposed by unusual compiler restrictions. eg. I've come across some compilers that expect all interrupt service routines to be compiled within a single object file.
I've worked with some sensors like the Tmote Sky, I too have seen poor organization, and I have to admit i have contributed to it. Anyway I'd say that some confusion has to be, because loading too much modules or too much part of program will be (imho) resource killing too, so try to be aware of a threshold between organization and usability on the low resources.
Obviously this don't mean let caos begin, but for example try to get a look on the organization of the tinyOS source code and applications, it's an idea on what I'm trying to say.
Although it is a bit painful, one organization technique that is somewhat common with embedded C libraries is to split every single function and variable into a separate C source file, and then aggregate the resulting collection of O files into a library file.
The motivation for doing this is that for most normal linkers the unit of linkage is an object, for every object you either get the whole object or none of it. Since there is a 1-1 relationship between C files and object files, putting each symbol in it's own C file gives each one it's own object. This in turn lets the linker pull in only that subset of functions and variables that are actually used.
This sort of game doesn't help at all for headers they can happily be left as single files.