overhead of exec() call? - cgi

I have a web script that is a simple wrapper around a perl program:
#!/bin/sh
source ~/.myinit  # pull in some configurations like [PRIVATE_DIR]
exec [PRIVATE_DIR]/myprog.pl
This is really just to keep the code better compartmentalized, since the main program (myprog.pl) runs on different machines with different configuration modes and it's cleaner to not have to update that for every installation. However, I would be willing to sacrifice cleanliness for efficiency.
So the question is, does this extra sh exec() call add any non-negligible overhead, considering it could be hit by the webserver quite frequently (we have thousands of users on at a time)? I ask because I know that people have gone to great lengths to embed programs into httpd to avoid having to make an extra fork/exec call. My assumption has been that this was due to the overhead of the program being called (e.g. mod_perl bypasses the extremely slow Perl startup), not the cost of the fork/exec itself. Would that be accurate?
I've tried to benchmark this but I can't get the resolution I need to see if it makes a difference. Any advice on that would also be appreciated.
Thanks!

Try a test?
Assuming you have a test environment separate from production
(and this is a little rough; real testers will be able to suggest how to improve on it):
Run your current two-level design in a tight loop, using the same parameters, for 100,000? 1,000,000? ... iterations.
Then 'fix' your code so the Perl program is called directly, with the same parameters as above, looping for the same count.
Capture performance stats for both runs. The difference between the two should be (roughly) the cost of the extra call.
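If shell-level timing doesn't give you the resolution you need, a small C harness can time the fork/exec cycle directly. This is only a sketch (the target path and iteration count are whatever you pass on the command line); run it once against the wrapper and once against myprog.pl and compare the per-run times:

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Time N fork+exec+wait cycles of a target program. */
int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s /path/to/target [iterations]\n", argv[0]);
        return 1;
    }
    long n = (argc > 2) ? atol(argv[2]) : 1000;
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == 0) {
            execl(argv[1], argv[1], (char *)NULL);
            _exit(127);                 /* only reached if exec failed */
        }
        waitpid(pid, NULL, 0);          /* reap the child before the next run */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%ld runs in %.3f s (%.3f ms per run)\n", n, secs, secs * 1000.0 / n);
    return 0;
}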
If this works out, I'd appreciate seeing the results on this:-)
(There are many more posts for tag=testing than tag=benchmark)
I hope this helps.
P.S. As you appear to be a new user: if you get an answer that helps you, please remember to mark it as accepted, or give it a + (or -) as a useful answer.

It's easy for a Perl program to take over environment variables from a shell script; see the answer to Have perl execute shellscript & take over env vars.
Using this, there is no need for exec at all, which should alleviate all your worries about exec overhead. :-)

RSpec errors when run en masse, but not individually

Unfortunately I don't have a specific question (or clues), but was hoping someone could point me in the right direction.
When I run all of my tests (rspec spec), two tests fail, both specifically related to Delayed Job.
When I run that spec file in isolation (rspec ./spec/controllers/xxx_controller_spec.rb), all the tests pass. Is this a common problem? What should I be looking for?
Thanks!
You are already mentioning it: isolation might be the solution. Usually I would guess that you have things in the database that are being changed and not cleaned up properly (or rather, that are not mocked properly).
In this case, though, I would suggest that, because the system is under quite a high workload, the delayed jobs are not being worked off fast enough. The challenge with all asynchronous tasks that should be tested is that you must not let the system actually run the delayed jobs; mock the calls and just make sure that the delayed jobs have been received.
Sadly, with no examples, I can hardly point out the missing mocks. But make sure that all calls to delay_jobs and similar receive the correct data, but do not actually create and run those jobs - your specs will be faster, too. Make sure you isolate the function under test and do not call external dependencies.

Can calls to printf() block?

How does printf() (or writes to stdout in general) work regarding flow control and buffering? Is printf() guaranteed to return in reasonable amount of time, or will it block until it was actually able to write something to stdout?
I'm thinking about a slow SSH connection or something like that. Can that actually slow down some program at a printf()? Will that occur immediately, or only after some buffer has been filled up (some internal buffer of stdout, or possibly even the TCP send buffer)? Does using SSH make some other difference, besides possibly increasing latency and decreasing output speed?
If so, how is that circumvented? Are threads a common solution to that problem, or is there some easier way to set a flag or do a call to switch stdout to "non-blocking mode"? Or do I have some basic facts about the workings of Unix I/O completely wrong and the question doesn't make any sense? :)
I'm asking mainly out of interest, but the issue has recently come up on two occasions while experimenting with C and the shell, and I wonder where and how it should be addressed in general:
When developing an application that potentially does lots of output to the terminal for debugging or information purposes, can slow terminals cause performance issues? Should I (as a programmer) avoid constructs like (for each x in y: print x) for large lists?
Or is that a problem that should best be handled by the application user instead of the developer, by redirecting the output to a file etc.? If so, are there any useful shell idioms to decouple program output from the actual execution, like mytool | cat (I'm not sure if that particular example would actually change anything, but cat could be replaced by something more sophisticated).
Yes, calls to printf() could block. One example is stdout connected to a pipe with a slow reader at the other end.
Whether this is an issue or how best to mitigate it really depends on the nature of the application, and why it's an issue in the first place.
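If you want to experiment with the "non-blocking mode" idea from the question: the blocking behaviour lives on the file descriptor, not on the FILE*, so you can flip O_NONBLOCK on the descriptor behind stdout with fcntl(). This is only a sketch of where the blocking happens, not a recommended fix, because stdio is not written to cope with EAGAIN and may silently lose output on a partial write:

#include <fcntl.h>
#include <stdio.h>

/* Put the descriptor behind stdout into non-blocking mode. */
int make_stdout_nonblocking(void)
{
    int flags = fcntl(fileno(stdout), F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fileno(stdout), F_SETFL, flags | O_NONBLOCK);
}

In practice the decoupling is usually done outside the program (redirect to a file, pipe through a buffering tool) or with a dedicated writer thread, rather than by making stdout non-blocking.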

General question: Adding new test code to embedded system

This may be off topic, but I am preparing for an exam in real-time systems, and I have been browsing the book and the Internet for an answer to a problem.
Basically, I wonder whether adding additional test code may change the real-time behavior of an embedded system, and also whether it may introduce new errors.
Does anyone know the answer to this, or can you refer me to some reading material on it?
Your question is too general, so I guess the default answer would be "it depends". But considering the possibilities as an exercise in logic and thought: yes, it surely can!
There are many schemes available to guarantee the 'real-timeness' of an embedded system. For example, one can have a pre-emptive, timer-based ISR to service the real-time task; in such a case, your test code might not affect the 'real-timeness'. But if the testing takes too long and the context switches are not pre-emptive, you could get into trouble.
But again, it depends on what you're testing and how you're testing. Your test code can possibly mess with the timers, interrupts, or the memory of the system. The ways to mess things up if you're not careful are endless.
Having an OS underneath will prevent some errors, but again, depending on how it works, you may be saved from bad 'test code'.
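One concrete way to see whether the test code matters is to bracket the real-time task with a cycle or tick counter and compare the worst case with and without the test code compiled in. In the sketch below, read_cycle_counter(), do_control_step() and run_self_test() are made-up names standing in for whatever your MCU and application actually provide:

#include <stdint.h>

extern uint32_t read_cycle_counter(void);   /* platform-specific free-running counter */
extern void do_control_step(void);          /* the real-time work */
extern void run_self_test(void);            /* the extra test code in question */

static uint32_t worst_case;

void rt_task(void)
{
    uint32_t start = read_cycle_counter();

    do_control_step();
#ifdef ENABLE_TEST_CODE
    run_self_test();
#endif

    uint32_t elapsed = read_cycle_counter() - start;
    if (elapsed > worst_case)
        worst_case = elapsed;               /* compare this against your deadline budget */
}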
Yes, when you add code (test, diagnostic, statistics) it may change the real-time behavior. Whether it actually changes the behavior depends on the design, the implementation, and the CPU power. You also have more lines of code, so the probability of errors may increase. But I wouldn't say "it will introduce errors", only that it can introduce errors.
Yes it can. See How can adding data to a segment in flash memory screw up a program's timing? for an example of how even adding non-executable code can adjust timing enough to screw up a system.
Yes, changing your code base can totally change its timing. Consider dumping some debug output to a serial port: it takes time to call that function and format the data, and if the function is synchronous, time to wait for the data to go out. This kind of thing definitely changes system timing behavior.

Executing remote script - Architecture

I want to make an application that executes a remote script. The user can create a script (probably a Lua script) and store it on the server. Then he can use an API to execute the script. I was thinking the API could be a web service.
So my questions are:
I need high performance to execute the script, so my first choice was a Lua script. Does someone have another suggestion?
Because I need high performance, I was wondering whether a web service is the best solution. Maybe I could create a TCP/IP Windows service that holds the users' requests. It is important to say that I will have many users executing scripts at the same time, so I will have a concurrency problem.
My scripts will query a database. I will use Tokyo Cabinet or Tokyo Tyrant. I think Tokyo Tyrant is the only solution because I will have many requests. For performance, do I need connection pooling? Is there any way to share variables between web service requests?
To build the web service or the Windows service, I was thinking of using C++.
Can someone help with these questions?
thanks
Lua is pretty high performance for a scripting language, especially if you use LuaJIT or something similar.
You speak of high performance; how much are we talking about? Say you have a very simple web service that executes scripts it receives via POST; the HTTP overhead is probably small compared to the Lua compile, environment setup, and execution time.
About the database I cannot tell you anything. There are many ways to do pooling, and this also depends on how you execute the Lua scripts. Are they running in a common environment? One per session? One per request?
C++ surely is a good choice for hosting Lua, because Lua fits in pretty well, though there are other good language bindings as well.
But keep in mind that your job is not over once you've sandboxed the scripts. User-submitted scripts can do a lot of other Bad Things(TM), intentionally or by mistake, like allocating a lot of memory or hogging the CPU. In Lua (and I think this is true of many, if not all, sandboxed environments) you cannot do much about this, except kill the offending instance or, if you disallowed coroutines in your sandbox, yield out of the offending coroutine and do something smarter.
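As one illustration of the CPU-hogging point: a host using the Lua C API can install a count hook that fires after a fixed number of VM instructions and aborts the script from inside the hook. A minimal sketch (the 5,000,000-instruction budget is an arbitrary number chosen for the example):

#include <stdio.h>
#include <lua.h>
#include <lauxlib.h>

/* Abort the running script the first time the hook fires. */
static void budget_hook(lua_State *L, lua_Debug *ar)
{
    (void)ar;
    luaL_error(L, "script exceeded its instruction budget");
}

int run_user_script(lua_State *L, const char *source)
{
    /* fire the hook after roughly 5 million VM instructions */
    lua_sethook(L, budget_hook, LUA_MASKCOUNT, 5000000);
    if (luaL_loadstring(L, source) != 0 || lua_pcall(L, 0, 0, 0) != 0) {
        fprintf(stderr, "script failed: %s\n", lua_tostring(L, -1));
        lua_pop(L, 1);
        return -1;
    }
    return 0;
}

Memory can be capped in a similar spirit with a custom allocator passed to lua_newstate, but killing the offending state is usually the simplest recovery either way.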

When/how frequently should I test?

As a novice developer who is getting into the rhythm of my first professional project, I'm trying to develop good habits as soon as possible. However, I've found that I often forget to test, put it off, or do a whole bunch of tests at the end of a build instead of one at a time.
My question is: what rhythm do you like to get into when working on large projects, and where does testing fit into it?
Well, if you want to follow the TDD guys, before you start to code ;)
I am very much in the same position as you. I want to get more into testing, but I am currently in a position where we are working to "get the code out" rather than "get the code out right", which scares the crap out of me. So I am slowly trying to integrate testing processes into my development cycle.
Currently, I test as I code, trying to bust the code as I write it. I do find it hard to get into the TDD mindset. It's taking time, but that is the way I would want to work.
EDIT:
I thought I should probably expand on this; this is my basic "working process":
1. Plan what I want from the code, possible object design, whatever.
2. Create my first class, adding a huge comment to the top outlining what my "vision" for the class is.
3. Outline the basic test scenarios. These will basically become the unit tests.
4. Create my first method, also writing a short comment explaining how it is expected to work.
5. Write an automated test to see if it does what I expect.
6. Repeat steps 4-6 for each method (note the automated tests are in a huge list that runs on F5).
I then create some beefy tests to emulate the class in the working environment, obviously fixing any issues.
If any new bugs come to light following this, I then go back and write a new test for them, make sure it fails (this also serves as a proof of concept for the bug), then fix it.
I hope that helps. Open to comments on how to improve this; as I said, it is a concern of mine.
Before you check the code in.
First and often.
If I'm creating some new functionality for the system, I'll initially define the interfaces and then write unit tests for those interfaces. To work out what tests to write, consider the API of the interface and the functionality it provides; get out a pen and paper and think for a while about potential error conditions or ways to prove that it is doing the correct job. If this is too difficult, it's likely that your API isn't good enough.
As regards the tests, see if you can avoid writing "integration" tests that touch more than one specific object, and keep them as "unit" tests.
Then create a default implementation of your interface (that does nothing, returns rubbish values but doesn't throw exceptions), plug it into the tests to make sure that the tests fail (this tests that your tests work! :) ). Then write in the functionality and re-run the tests.
This mechanism isn't perfect but will cover a lot of simple coding mistakes and provide you with an opportunity to run your new feature without having to plug it into the entire application.
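A minimal sketch of that pattern in C, using a struct of function pointers as the "interface" (the rate_limiter name and its behaviour are invented purely for illustration): the default implementation returns rubbish, so the test fails first, which proves the test itself works, and it should pass once a real implementation is plugged in.

#include <assert.h>
#include <stdio.h>

/* A tiny "interface": allow() returns 1 to allow a request, 0 to reject it. */
struct rate_limiter {
    int (*allow)(void *self, int user_id);
    void *self;
};

/* Default implementation: does nothing useful, returns a rubbish value,
 * never crashes. */
static int stub_allow(void *self, int user_id)
{
    (void)self; (void)user_id;
    return -42;
}

static void test_allow_returns_a_valid_verdict(struct rate_limiter *rl)
{
    int result = rl->allow(rl->self, 1);
    assert(result == 0 || result == 1);   /* fails against the stub, as it should */
    printf("test_allow_returns_a_valid_verdict passed\n");
}

int main(void)
{
    struct rate_limiter stub = { stub_allow, NULL };
    test_allow_returns_a_valid_verdict(&stub);
    return 0;
}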
Following this, you then need to test it in the main application in combination with the existing features.
This is where testing is more difficult and, if possible, should be partially outsourced to a good QA tester, as they'll have the knack of breaking things. Although it helps if you have those skills too.
Getting testing right is, to be honest, a knack that you have to pick up. My own experience comes from my own naive deployments and the subsequent bugs that were reported by the users when they used the software in anger.
At first, when this happened to me, I found it irritating that the user was intentionally trying to break my software, and I wanted to mark all the "bugs" down as "training issues". However, after reflecting on it I realised that it is our role (as developers) to make the application as simple and reliable to use as possible, even by idiots. It is our role to empower idiots, and that's why we get paid the dollar. Idiot handling.
To effectively test like this you have to get into the mindset of trying to break everything. Assume the mantle of a user that bashes the buttons and generally attempts to destroy your application in weird and wonderful ways.
Assume that if you don't find the flaws, they will be discovered in production, to your company's serious loss of face. Take full responsibility for all of these issues and curse yourself when a bug you are responsible (or even partly responsible) for is discovered in production.
If you do most of the above then you should start to produce much more robust code, however it is a bit of an art form and requires a lot of experience to be good at.
A good key to remember is
"Test early, test often and test again, when you think you are done"
When to test? When it's important that the code works correctly!
When hacking something together for myself, I test at the end. Bad practice, but these are usually small things that I'll use a few times and that's it.
On a larger project, I write tests before I write a class and I run the tests after every change to that class.
I test constantly. After I finish even a loop inside of a function, I run the program and hit a breakpoint at the top of the loop, then run through it. This is all just to make sure that the process is doing exactly what I want it to.
Then, once a function is finished, you test it in its entirety. You probably want to set a breakpoint just before the function is called, and check in your debugger that it works perfectly.
I guess I would say: "Test often."
I've only recently added unit testing to my regular work flow but I write unit tests:
to express the requirements for each new code module (right after I write the interface but before writing the implementation)
every time I think "it had better ... by the time I'm done"
when something breaks, to quantify the bug and prove that I've fixed it
when I write code which explicitly allocates or deallocates memory -- I loathe hunting for memory leaks...
I run the tests on most builds, and always before running the code.
Start with unit testing. Specifically, check out TDD, Test Driven Development. The concept behind TDD is you write the unit tests first, then write your code. If the test fails, you go back and re-work your code. If it passes, you move on to the next one.
I take a hybrid approach to TDD. I don't like to write tests against nothing, so I usually write some of the code first, then put the unit tests in. It's an iterative process, one which you're never really done with. You change the code, you run your tests. If there's any failures, fix and repeat.
The other sort of testing is integration testing, which comes along later in the process and might typically be done by a QA testing team. In any case, integration testing addresses the need to test the pieces as a whole. It's the working product you're concerned with testing. This one is more difficult to deal with because it usually involves automated testing tools (like Robot, for example).
Also, take a look at a product like CruiseControl.NET for continuous builds. CC.NET is nice because it will run your unit tests with each build, notifying you immediately of any failures.
We don't do TDD here (though some have advocated it), but our rule is that you're supposed to check your unit tests in with your changes. It doesn't always happen, but it's easy to go back and look at a specific changeset and see whether or not tests were written.
I find that if I wait until the end of writing some new feature to test, I forget many of the edge cases that I thought might break the feature. This is ok if you are doing things to learn for yourself, but in a professional environment, I find my flow to be the classic form of: Red, Green, Refactor.
Red: Write your test so that it fails. That way you know the test is asserting against the correct variable.
Green: Make your new test pass in the easiest way possible. If that means hard-coding it, that's ok. This is great for those that just want something to work right away.
Refactor: Now that your test passes, you can go back and change your code with confidence. Your new change broke your test? Great, your change had an implication you didn't realize, now your test is telling you.
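A tiny sketch of the red and green steps (is_leap_year() is an invented example, not anything from the question): the stub is deliberately wrong so the test goes red; then you make it pass in the easiest way, and finally refactor toward the real rule with the test watching your back.

#include <assert.h>

/* Feature under development. Red: this stub makes the test below fail.
 * Green: return the simplest thing that passes (even a hard-coded value).
 * Refactor: replace the hard-coding with the real leap-year rule. */
static int is_leap_year(int year)
{
    (void)year;
    return 0;
}

int main(void)
{
    assert(is_leap_year(2024) == 1);   /* red: fails against the stub above */
    assert(is_leap_year(2023) == 0);
    return 0;
}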
This rhythm has sped up my development over time because I basically have a compiled history of all the things I thought needed to be checked in order for a feature to work! This, in turn, leads to many other benefits that I won't get into here...
Lots of great answers here!
I try to test at the lowest level that makes sense:
If a single computation or conditional is difficult or complex, add test code while you're writing it and ensure each piece works. Comment out the test code when you're done, but leave it there to document how you tested the algorithm.
Test each function.
Exercise each branch at least once.
Exercise the boundary conditions -- input values at which the code changes its behavior -- to catch "off by one" errors (see the sketch after this list).
Test various combinations of valid and invalid inputs.
Look for situations that might break the code, and test them.
Test each module with the same strategy as above.
Test the body of code as a whole, to ensure the components interact properly. If you've been diligent about lower-level testing, this is essentially a "confidence test" to ensure nothing broke during assembly.
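As an example of the boundary-condition point above, tests for a hypothetical clamp() helper (both the function and the values are made up for illustration) sit exactly on and just beside the points where the behavior changes, which is where off-by-one bugs hide:

#include <assert.h>

/* Clamp value into the inclusive range [lo, hi]. */
static int clamp(int value, int lo, int hi)
{
    if (value < lo) return lo;
    if (value > hi) return hi;
    return value;
}

int main(void)
{
    assert(clamp(-1, 0, 10) == 0);    /* just below the lower bound */
    assert(clamp( 0, 0, 10) == 0);    /* exactly on the lower bound */
    assert(clamp(10, 0, 10) == 10);   /* exactly on the upper bound */
    assert(clamp(11, 0, 10) == 10);   /* just above the upper bound */
    return 0;
}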
Since most of my code is for embedded devices, I pay particular attention to robustness, interaction between various threads, tasks, and components, and unexpected use of resources: memory, CPU, filesystem space, etc.
In general, the earlier you encounter an error, the easier it is to isolate, identify, and fix it -- and the more time you get to spend creating, rather than chasing your tail.*
*I know, -1 for the gratuitous buffer-pointer reference!