Implementation differences between mod_python and mod_wsgi

I have a web application that was originally written with mod_python in mind.
I have been able to port it to mod_wsgi, but I worry that there might be some subtle differences in the way that global variables or other features are handled.
Most of the information that I have seen on mod_python and mod_wsgi is a comparison of the memory footprint, such as the comparison here:
https://modwsgi.readthedocs.org/en/latest/frequently-asked-questions/
Are there any implementation issues in the two Apache modules that will cause differences in behaviour between mod_python and mod_wsgi?

You will have to enumerate what you mean by 'other features'. As to global variables, it is explained in:
http://code.google.com/p/modwsgi/wiki/ProcessesAndThreading
That document was effectively a cut-and-paste, with some extra details, of what also applied to mod_python, so how to deal with global and cross-process data is more or less the same.
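To make that concrete, here is a minimal sketch of per-process global state in a WSGI application; the names (hit_counter, counter_lock) are illustrative, and the locking only matters when mod_wsgi is configured to serve requests on concurrent threads.

    import threading

    hit_counter = 0                  # per-process global; each Apache child/daemon process has its own copy
    counter_lock = threading.Lock()  # guards the global when requests run on concurrent threads

    def application(environ, start_response):
        global hit_counter
        with counter_lock:
            hit_counter += 1
            count = hit_counter
        body = ("Hits seen by this process: %d\n" % count).encode("utf-8")
        start_response("200 OK", [("Content-Type", "text/plain"),
                                  ("Content-Length", str(len(body)))])
        return [body]

The same caveats applied to module-level data under mod_python: each process keeps its own copy, and anything that must be shared across processes needs an external store (a database, memcached, the filesystem, etc.).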

Related

$r Apache request in CGI?

In trying to find a solution for another question (passing variables in cgi) I ran into the concept of apache_notes, which then led me to a series of modules on CPAN that use the mysterious $r variable (both of which I have run into before while looking other stuff up, but did not delve too deeply into). None of the modules really explain where $r comes from; they all just assume you have it readily available. From what I can understand, the variable has something to do with mod_perl (which I also know little to nothing about) and/or Apache handlers.
So my question is, is there a way to get that $r apache request variable in CGI script calls, without the use of mod_perl or apache handler definitions?
Also, if there is a way to access apache_notes in CGI, I think that would answer the aforementioned question, and an answer there for how to achieve this would be greatly appreciated!
No. It's the Apache (libapr) request object. If you're not embedded in Apache, there's no Apache request object.
But there's no use for notes in a CGI anyway; there isn't any other piece of code that the script could be sharing information with by hanging it off of the request object, so you can just use a variable instead.

Should .htaccess be versioned in source control?

If you follow the principle that an application should run "out of the box" when pulled out of version control, .htaccess should be included. Something doesn't feel right about that though, as it doesn't really feel part of the application. I'm conflicted, could someone put my mind at rest?
I typically do keep an application's .htaccess in source control, including the Apache configuration required for the application to run: rewrite rules that are not specific to the server it is running on, access to environment variables, and so on.
Think about it this way - if the .htaccess file contains rewrite rules used by your application, those are effectively functioning as part of the application routing and therefore are part of the application.
If you are packaging an application for others to download and use, you should probably include a skeletal .htaccess file which includes the rules needed to make the application run. If your application is only intended to run on your own server and you keep all its relevant Apache config in .htaccess other than VirtualHost configuration, I would say it does indeed belong in source control.
This is a common problem for us, too, and I know what you mean about 'not feeling right'.
However, for me the issue is not feeling that .htaccess is not part of the application, as bits of it clearly are. The issue is more that the file mixes application-specific code (routing, etc.) with installation-specific stuff (local URL rewrites, authentication/access rules, etc.). Ideally you would version control the application-specific rules but not the installation-specific ones, but of course this is not possible as both need to be in the same file.
I can think of four approaches to managing this. The optimal one will depend on your particular situation (and may vary from project to project):
Michael's second suggestion is the best option, if you control the deployments. Namely, keep application-specific code in the .htaccess file, under version control, and any installation-specific code in your main Apache VirtualHost directive. This gives full separation but is not a viable solution if you don't have direct access to the main Apache config or are distributing to third-parties.
Version control the application-specific elements of your .htaccess file, with clear comment markers, e.g. ### FOO APPLICATION SETTINGS - DO NOT CHANGE ### and ### ADD ANY ADDITIONAL LOCAL CONFIG BELOW THIS LINE ###, and do not version anything that is installation-specific. This is fine if the rules are simple enough not to cause conflicts, but it is not great if your skeletal file requires users to modify existing lines, as you will likely end up with merge conflicts down the line. It also runs the risk of unwanted edits getting into your repository if deployments are version-controlled (as is probably the case for dev versions, at least), and the risk of local changes being blown away by an upgrade if not (e.g. public zipfile distributions).
What we settled on as a good alternative is to version control a file called .htaccess.sample, which can contain both application-specific rules and (where relevant) suggestions for installation-specific rules that users may find useful. This is easy to maintain and deploy whether or not version control is being used, but it makes upgrades slightly harder as users will need to hand-merge any changes into their local .htaccess file. In our case upgrades are always done by developers, so this is not a big issue, assuming appropriate diff tools are available.
Same as #3, except you also provide a script to automatically apply any modifications. More work to set up, not necessarily more work to maintain, depending on how you implement it (e.g. if it just replaces a block of code between some markers), and possibly worthwhile if you are distributing to a wide user base. This is what WordPress does, for example.
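For illustration, here is a rough sketch of the kind of marker-replacement script option 4 describes. It is a minimal example under assumed conventions: the marker strings, rules and file names are made up, and a real deployment script would want a backup and some error handling.

    #!/usr/bin/env python
    # Sketch of option 4: rewrite only the application-managed block of an
    # existing .htaccess, between two marker comments, leaving any local
    # installation-specific rules untouched.  Markers and paths are illustrative.
    import re

    BEGIN = "### FOO APPLICATION SETTINGS - DO NOT CHANGE ###"
    END = "### END FOO APPLICATION SETTINGS ###"

    def apply_app_rules(htaccess_path, new_rules):
        with open(htaccess_path) as f:
            contents = f.read()
        block = "%s\n%s\n%s" % (BEGIN, new_rules.strip(), END)
        pattern = re.compile(re.escape(BEGIN) + r".*?" + re.escape(END), re.S)
        if pattern.search(contents):
            contents = pattern.sub(lambda _: block, contents)  # update the managed block in place
        else:
            contents = block + "\n\n" + contents               # or prepend it on first run
        with open(htaccess_path, "w") as f:
            f.write(contents)

    if __name__ == "__main__":
        apply_app_rules(".htaccess", "RewriteEngine On\nRewriteRule ^ index.php [L]")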

How to make XAMPP (Apache; lookups) faster on Windows 7?

When using XAMPP (1.7.5 Beta) under Windows 7 (Ultimate, version 6.1, build 7600), it takes several seconds before pages actually show up. During these seconds, the browser shows "Waiting for site.localhost.com..." and Apache (httpd.exe, version 2.2.17) has 99% CPU load.
I have already tried to speed things up in several ways:
Uncommented "Win32DisableAcceptEx" in xampp\apache\conf\extra\httpd-mpm.conf
Uncommented "EnableMMAP Off" and "EnableSendfile Off" in xampp\apache\conf\httpd.conf
Disabled all firewall and antivirus software (Windows Defender/Windows Firewall, Norton AntiVirus).
In the hosts file, commented out "::1 localhost" and uncommented "127.0.0.1 localhost".
Executed (via cmd): netsh; interface; portproxy; add v6tov4 listenport=80 connectport=80.
Even disabled IPv6 completely, by following these instructions.
The only place where "HostnameLookups" is set, is in xampp\apache\conf\httpd-default.conf, to: Off.
Tried PHP in CGI mode by commenting out (in httpd-xampp.conf): LoadFile "C:/xampp/php/php5ts.dll" and LoadModule php5_module modules/php5apache2_2.dll.
None of these possible solutions had any noticeable effect on the speed. Does Apache have difficulty trying to find the destination host ('gethostbyname')? What else could I try to speed things up?
Read over Magento's Optimization White Paper; although it mentions Enterprise, the same methodologies will and should be applied. Magento is by no means simplistic and can be very resource intensive. Like some others mentioned, I normally run within a virtual machine on a LAMP stack and have all my optimizations (both at the server/application level and on the Magento level) preset on a base install of Magento. Running an opcode cache like eAccelerator or APC can help improve load times. Keeping Magento's caching layers enabled can help as well, but it can cripple development if you forget it's enabled during development; however, there are lots of tools available that can clear the cache for you from a single command line, or a tool like Alan Storm's eCommerce Bug.
EDIT
Optimization Whitepaper link:
https://info2.magento.com/Optimizing_Magento_for_Peak_Performance.html
Also, with PHP 7 now including OPcache, enabling it with default settings (with date/time checks) along with AOE_ClassPathCache can help disk I/O performance.
If you are using an IDE with class lookups, keeping a local copy of the code base you are working on can greatly speed up indexing in IDEs like PhpStorm, NetBeans, etc. Atwix has a good article on Docker with Magento:
https://www.atwix.com/magento/docker-development-environment/
Some good tools for local Magento 1.x development:
https://github.com/magespecialist/mage-chrome-toolbar
https://github.com/EcomDev/EcomDev_LayoutCompiler.git
https://github.com/SchumacherFM/Magento-OpCache.git
https://github.com/netz98/n98-magerun
Use a connection profiler like Chrome's to see whether this is actually a lookup issue, or whether you are waiting for the site to return content. Since you tagged this question Magento, which is known for slowness before you optimize it, I'm guessing the latter.
Apache runs some very major sites on the internet, and they don't have several-second delays, so the answer to your question about Apache is most likely no. Furthermore, DNS lookup happens between your browser and a DNS server, not the target host. Once the request is sent to the target host, you wait for a rendered response from it.
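If you want to rule name resolution in or out before digging into the application itself, a quick sanity check is to time the lookup directly; a small sketch (the host name is the one from the question):

    # Times name resolution for a few hosts; if these come back in milliseconds,
    # the multi-second delay is in Apache/PHP, not in DNS.
    import socket
    import time

    for host in ("site.localhost.com", "localhost", "127.0.0.1"):
        start = time.time()
        try:
            result = socket.gethostbyname(host)
        except socket.gaierror as exc:
            result = "lookup failed: %s" % exc
        print("%-20s -> %-25s %.3f s" % (host, result, time.time() - start))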
Take a look at the several questions about optimizing Magento sites on SO and you should get some ideas on how to speed your site up.

What was the evolution of interaction paradigm between web server program and content provider program?

In my opinion, a web server is responsible for delivering content to the client. If it is static content like pictures or static HTML documents, the web server just delivers it as a byte stream directly. If it is dynamic content generated while processing the client's request, the web server does not generate the content itself but calls some external program to generate it.
AFAIK, dynamic content generation technologies of this kind include the following:
CGI
ISAPI
...
And from here, I noticed that:
...In IIS 7, modules replace ISAPI filters...
Are there any others? Could anyone help me complete the above list and elaborate on, or point to some links about, their evolution? I think it would be very helpful for understanding applications such as IIS, Tomcat, and Apache.
I once wrote a small CGI program, and though it serves as a content generator, it is still nothing but a normal standalone program. I call it normal because the CGI program has a main() entry point. But with more recent technologies like ASP.NET, I am not writing a complete program, only a class library. Why did such a radical change happen?
Many thanks.
Well, the biggest missing piece in your question is that you can have the web server generating the content dynamically as well. This is common with most platforms outside of PHP and Perl. You often put that web server behind Apache or nginx used as a proxy, but it doesn't "call an external program" in any reasonable sense; it forwards the HTTP request to the proxied server. This is mostly done so you can have multiple sites on the same server, and also so you can have Apache/nginx protect you against incorrect requests.
But sure, we can, for the sake of the question, say that "proxying" is a way to call an external program. :-)
Another way to "call the external program" is Pythons WSGI, where you do call a permanently running server. So again you don't start an external program, it's more like calling the module in ASP (although it's a separate program, not a module, but you don't start it with every request, you use an API).
The change from calling external programs as in CGI to calling modules as in ASP.NET, calling a long-running process via WSGI, or proxying to another web server happened because with CGI you have to start a new program for each request. The Perl/PHP interpreter needs to be loaded into memory, and all the modules it uses as well. This quickly becomes very heavy and process/memory intensive.
Therefore, to be able to use bigger systems that are permanently running, other techniques have been developed. Most of them are platform/language dependent, and the only one that is really platform independent is to make a complete web server and then use Apache/nginx as a proxy in front (in which case Apache/nginx strictly isn't necessary any more).
I hope this cleared things up a bit.
FastCGI and WSGI are two more interfaces content generators can use to talk to a web server -- the reason more recent interfaces aren't complete programs is that forking and executing things that expect to be executables is costly.
OTOH, writing your little generator in such a way that it doesn't leak anything between invocations is harder than having the liberty to just exit at the end (and rely on environment variables and command line arguments like a normal executable).
This is all for performance reasons, but then you have more complicated content generators and process management in the webservers.
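For contrast with the WSGI sketch above, here is a bare-bones Python CGI script; every request forks and executes the whole program, which is exactly the per-request cost described here:

    #!/usr/bin/env python
    # Bare-bones CGI: the web server starts this program for every request; it
    # reads its input from environment variables (and stdin for POST bodies),
    # writes a response to stdout, and then exits.
    import os
    import sys

    def main():
        sys.stdout.write("Content-Type: text/plain\r\n\r\n")
        sys.stdout.write("Hello from a fresh process (pid %d)\n" % os.getpid())
        sys.stdout.write("Request method: %s\n" % os.environ.get("REQUEST_METHOD", "GET"))

    if __name__ == "__main__":
        main()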

What is the purpose of the sub-interpreter API in CPython?

I'm unclear on why the sub-interpreter API exists and why it's used in modules such as the mod_wsgi apache module. Is it mainly used for creating a security sandbox for different applications running within the same process, or is it a way to allow concurrency with multiple threads? Maybe both? Are there other purposes?
I imagine the purpose is to create separate Python execution environments. For instance, mod_wsgi (the Apache Python module) hosts a single Python interpreter and then hosts multiple applications within sub-interpreters (in the default configuration).
Some key points from the documentation:
This is an (almost) totally separate environment for the execution of Python code. In particular, the new interpreter has separate, independent versions of all imported modules, including the fundamental modules __builtin__, __main__ and sys.
The table of loaded modules (sys.modules) and the module search path (sys.path) are also separate.
Because sub-interpreters (and the main interpreter) are part of the same process, the insulation between them isn’t perfect — for example, using low-level file operations like os.close() they can (accidentally or maliciously) affect each other’s open files.
Because of the way extensions are shared between (sub-)interpreters, some extensions may not work properly; this is especially likely when the extension makes use of (static) global variables, or when the extension manipulates its module’s dictionary after its initialization.
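To see that isolation in practice, here is a rough sketch using _xxsubinterpreters, the private module that wraps the C-level sub-interpreter API (Py_NewInterpreter) in CPython 3.8-3.12; it is not a stable interface, and mod_wsgi calls the C API directly, so treat this purely as an illustration.

    # Each sub-interpreter gets its own sys.modules, sys.path and module globals,
    # so changes made in one do not show up in the other.
    import _xxsubinterpreters as interpreters  # private module, CPython 3.8-3.12

    code = """
    import sys
    sys.path.append('/tmp/example')    # only affects this interpreter's sys.path
    print('modules loaded here:', len(sys.modules))
    """

    interp_a = interpreters.create()
    interp_b = interpreters.create()

    interpreters.run_string(interp_a, code)  # runs with its own isolated state
    interpreters.run_string(interp_b, code)

    interpreters.destroy(interp_a)
    interpreters.destroy(interp_b)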
As I understood it last, the idea was to be able to execute multiple applications as well as multiple copies of the same application within the same process.
This is a feature found in other scripting languages (e.g. Tcl), and is of particular use to GUI builders, web servers, etc.
It breaks in python because many extensions are not multiple-interpreter safe, so one interpreter's actions could affect the variables in another interpreter.