How can I rewrite URLs in XML with Apache 2.4?

Apache 2.4 includes mod_proxy_html, and that's great: it catches all kinds of URLs inside the HTML coming back from the server and fixes them. But I've got a Seam app that sometimes sends text/xml files back to the client with fully qualified URLs that also need to be rewritten, and mod_proxy_html doesn't fix them.
Apparently there used to be a mod_proxy_xml that existed separately from mod_proxy_html, but Apache didn't include it. Is there a way to configure mod_proxy_html to do the same thing? I need it to fix URLs in both the HTML and the XML files coming back from a server.
Follow-up:
I'm still fighting with this, and I've tried a few different solutions with no success. These include mod_substitute (which I must somehow be configuring incorrectly, because it never seems to substitute anything) and mod_proxy_html's force option, to try to make it process all files under a certain path.

This is an old question, but I just faced the same issue.
I tried mod_proxy_html and compiled mod_proxy_xml; nothing worked.
@JonLin's suggestion is spot on: it works with mod_sed.
The only catch is that mod_sed is documented to work inside <Directory> blocks.
If you declare a <Location> instead and use SetOutputFilter rather than AddOutputFilter (which maps filters to file extensions), it works beautifully.
The config that works is:
<Location "/">
SetOutputFilter Sed
OutputSed "s,http://internal:80,https://external.com,g"
</Location>
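Since the follow-up mentions mod_substitute never substituting anything, here is a minimal sketch of how it is commonly wired up for proxied XML. It reuses the http://internal:80 and https://external.com values from the sed example and assumes mod_proxy, mod_proxy_http, mod_headers, mod_filter and mod_substitute are loaded. A frequent reason the filter appears to do nothing is that the backend returns gzip-compressed bodies, which it cannot match against, so the sketch asks the backend for uncompressed responses. Treat it as an untested starting point rather than a known-good config.

```apache
<Location "/">
    ProxyPass        "http://internal:80/"
    ProxyPassReverse "http://internal:80/"

    # Ask the backend for uncompressed responses so the filter sees plain text
    RequestHeader unset Accept-Encoding

    # Run mod_substitute only on XML and HTML responses (AddOutputFilterByType is in mod_filter)
    AddOutputFilterByType SUBSTITUTE text/xml application/xml text/html

    # The "n" flag treats the pattern as a fixed string instead of a regex
    Substitute "s|http://internal:80|https://external.com|n"
</Location>
```

Restricting the filter by MIME type keeps it away from images and other binary responses, which is the main practical difference from the SetOutputFilter approach above.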

Related

Need to configure .htaccess, so multiple folders will act as if they are their own separate root folders - for the code running on them

For example:
mydomain.com/site1
mydomain.com/site2
I need to install an application in /site1 that will think it is in the root folder (in this case PHP, JS, CodeIgniter, but it could be anything).
So, for example, links/references to files such as "/file.jpg" (in code that is in the site1 folder, such as mydomain.com/site1/code.js) should really load from mydomain.com/site1/file.jpg.
Also, the code should not be able to see any folder below site1, so that is basically its root folder. The same would apply to site2, so the two are separate root folders.
I thought this would be a simple .htaccess file at mydomain.com/site1 with a redirect, or some kind of reverse proxy, but so far nothing I've tried has worked.
I can't find a single example of this, even on Stack Overflow.
Any ideas?
The easiest way to do this would be to create an additional VirtualHost for internal use, called internal1, whose DocumentRoot is, you guessed it, /var/www/mydomain.com/htdocs/site1, where the main site is in /var/www/mydomain.com/htdocs.
Then in mydomain.com you reverse proxy /site1 to internal1 (you'll have to add internal1 to /etc/hosts as an alias for localhost). The second request will have its DOCUMENT_ROOT pointing to site1, as requested (and its ServerName changed to internal1):
ProxyPass /site1/ http://internal1/
ProxyPassReverse /site1/ http://internal1/
(Keep the trailing slashes consistent: per the Apache docs, if the first argument ends with a /, the second should too, and vice versa.)
Now, accessing mydomain.com/site1/joe.html will trigger a second internal connection to internal1/joe.html, which will contain, say, 'src="/joe.jpg"'; and here's where ProxyPassReverse will come into play, rewriting this into 'src="mydomain.com/site1/joe.jpg"' so that everything works.
Correction
The above is not correct, thanks to @MrWhite for pointing this out. ProxyPassReverse is not enough, as it only rewrites headers. From the Apache documentation:
Only the HTTP response headers specifically mentioned above will be rewritten. Apache httpd will not rewrite other response headers, nor will it by default rewrite URL references inside HTML pages. This means that if the proxied content contains absolute URL references, they will bypass the proxy. To rewrite HTML content to match the proxy, you must load and enable mod_proxy_html.
(The method is dirty as all hell: every HTTP call incurs one extra connection and two rewrites, one going in and a larger one going out.)
Of course, if a link is built using e.g. JavaScript, the proxy code may well not recognize it as a link and leave it unchanged, perhaps with the "internal1" name inside somewhere, and the app will break.
However, @arkascha has the right of it: you should cure the cause, not the symptom. You may be able to change the apps' environment so that they run without trouble even in a subdirectory. Or you could try injecting <base href="https://example.com/site1/"> into the output HTML, although <base> only affects relative URLs, not root-relative ones such as "/file.jpg".
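Putting the internal-vhost idea together with the mod_proxy_html requirement from the correction, a minimal sketch might look like this. It assumes the paths from the answer, that internal1 resolves to 127.0.0.1 (for example via a "127.0.0.1 internal1" line in /etc/hosts), and that mod_proxy, mod_proxy_http, mod_headers and mod_proxy_html are loaded together with the stock ProxyHTMLLinks definitions (shipped in proxy-html.conf). It only rewrites links that appear in HTML markup, so the JavaScript caveat above still applies.

```apache
# Internal-only vhost: requests for "internal1" see site1 as their document root
<VirtualHost 127.0.0.1:80>
    ServerName   internal1
    DocumentRoot "/var/www/mydomain.com/htdocs/site1"
</VirtualHost>

# Public vhost
<VirtualHost *:80>
    ServerName   mydomain.com
    DocumentRoot "/var/www/mydomain.com/htdocs"

    # Forward /site1/ to the internal vhost; ProxyPassReverse fixes Location etc. headers
    ProxyPass        "/site1/" "http://internal1/"
    ProxyPassReverse "/site1/" "http://internal1/"

    <Location "/site1/">
        # mod_proxy_html cannot parse gzip'd HTML, so request it uncompressed
        RequestHeader unset Accept-Encoding
        ProxyHTMLEnable On
        # Absolute links to the internal host first, then root-relative ones;
        # by default only the first matching map is applied to each URL
        ProxyHTMLURLMap "http://internal1/" "/site1/"
        ProxyHTMLURLMap "/" "/site1/"
    </Location>
</VirtualHost>
```

One side effect of binding the internal vhost to 127.0.0.1 is that any request arriving on the loopback address (including local testing of mydomain.com) lands on internal1, so test the public site through its real address.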

Can mod_headers change headers generated by uWSGI?

I have a uWSGI service running behind an Apache front-end. The part of my Apache conf handling it looks like:
<Location /myapp>
SetHandler uwsgi-handler
uWSGISocket /var/run/uwsgi/myapp.sock
Allow from all
</Location>
I'd like to add a custom header to my app's responses. I know I can do that by adding some code to the app, but I would prefer to do it with mod_headers, by adding the following line to the Location block:
Header set Custom-Header "hello world"
It does not seem to work, although the mod_headers documentation states:
This directive can replace, merge or remove HTTP response headers. The header is modified just after the content handler and output filters are run, allowing outgoing headers to be modified.
What am I doing wrong, or misunderstanding?
As stated in the docs, mod_uwsgi is very raw and uses the 'assbackwards' mode unless you enable CGI mode. This mode (assbackwards) gives superior performance but breaks basically all filters. You should use mod_proxy_uwsgi (fully Apache-friendly) or let uWSGI do the hard work for you using its internal routing:
http://uwsgi-docs.readthedocs.org/en/latest/InternalRouting.html
(or the more invasive --add-header option)
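A minimal sketch of the mod_proxy_uwsgi route, reusing the socket path from the question. It assumes an httpd build that includes mod_proxy_uwsgi (bundled with httpd since 2.4.30; older setups used the module from the uWSGI source tree) plus mod_proxy and mod_headers. Because the response then flows through the normal proxy and filter chain, the Header directive should apply. The "localhost" host name after uwsgi:// is only a placeholder, and the path mapping may need adjusting depending on how the app expects SCRIPT_NAME/PATH_INFO.

```apache
<Location "/myapp">
    # Hand requests to uWSGI over its Unix socket via mod_proxy_uwsgi;
    # keeping /myapp in the target URL preserves the path the app saw before
    ProxyPass "unix:/var/run/uwsgi/myapp.sock|uwsgi://localhost/myapp"

    # Apache 2.4 access control (replaces the 2.2-style "Allow from all")
    Require all granted

    # Applied to the proxied response on its way out
    Header set Custom-Header "hello world"
</Location>
```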

Apache Reverse Proxy ReWrite

I have an Apache instance set up to reverse proxy an internal application. This works using mod_proxy, but images and other content are missing because of hard-coded paths in the application itself. I think I have two options:
mod_rewrite
mod_proxy_html
The basic problem is this:
External site: http://external.customer.com (Port 80)
Internal site: http://internal.supplier.com:8080/testcustomer
I need Apache to proxy the connection, but it must use the full URL (internal.supplier.com:8080/testcustomer) when talking to the internal server, and paths must be rewritten so that images etc. render on the end client.
Can anyone give me some guidance here? Any help would be much appreciated.
Thanks
That may be because the application uses absolute paths like src="/app/favicon.jpg" and src="/app/icons/smiley.jpg" instead of relative paths like src="favicon.jpg".
This can be solved with mod_proxy_html, which parses the proxied HTML and rewrites the links in it.
Load proxy_html_module with LoadModule in your httpd.conf, then add one of the following directives:
ProxyHTMLEnable On
OR
SetOutputFilter proxy-html
mod_proxy_html has libxml2 and libxml2-devel as prerequisites; you can install them through yum.
If you share your configuration file, maybe we can help more.
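For the specific hosts in the question, a minimal sketch could look like the following. Module file paths vary by distribution, and the Include path assumes the stock 2.4 layout, where conf/extra/proxy-html.conf holds the ProxyHTMLLinks definitions that tell mod_proxy_html which attributes contain URLs; without them ProxyHTMLURLMap has nothing to act on. mod_xml2enc is loaded because mod_proxy_html in 2.4 relies on it for character-set handling.

```apache
# Module paths as in a stock httpd layout; adjust for your distribution
LoadModule proxy_module      modules/mod_proxy.so
LoadModule proxy_http_module modules/mod_proxy_http.so
LoadModule headers_module    modules/mod_headers.so
LoadModule xml2enc_module    modules/mod_xml2enc.so
LoadModule proxy_html_module modules/mod_proxy_html.so

# ProxyHTMLLinks/ProxyHTMLEvents definitions (location assumed)
Include conf/extra/proxy-html.conf

<VirtualHost *:80>
    ServerName external.customer.com

    # Map the whole external site onto the customer path on the internal server
    ProxyPass        "/" "http://internal.supplier.com:8080/testcustomer/"
    ProxyPassReverse "/" "http://internal.supplier.com:8080/testcustomer/"

    # mod_proxy_html can only parse uncompressed HTML
    RequestHeader unset Accept-Encoding

    ProxyHTMLEnable On
    # Fully qualified internal links first, then root-relative ones
    ProxyHTMLURLMap "http://internal.supplier.com:8080/testcustomer/" "/"
    ProxyHTMLURLMap "/testcustomer/" "/"
</VirtualHost>
```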

pages are displaying plain text instead of html

I am hosting multiple sites on the same server and using an http-vhosts file to specify virtual host info for them. It is working great. The problem is that I changed the way Movable Type creates entries: I want them to have no file extension, so an entry is now domain.com/entry/15 instead of domain.com/entry/15.html. Since I took out the .html, I'm assuming Apache doesn't know what to do, so it is serving the page as plain text. How can I fix this? I added this to a VirtualHost block:
DefaultType text/html
I also added it to httpd.conf, hoping it would fix it globally for all my sites. I restarted Apache and still have the same problem. Any ideas?
Is it possible that this is a content negotiation problem? In a few cases I've seen Apache try to determine what sort of file is being requested by looking at the first few bytes of the file being served.
I have seen problems like this solved by commenting out mod_negotiation in httpd.conf and restarting. See the mod_negotiation documentation for more details.
I just solved the same issue by disabling MIME magic in httpd.conf (some files would display as HTML and some as text for no apparent reason).
Edit /etc/httpd/conf/httpd.conf.
Comment out the lines for the MIME magic module:
MIMEMagicFile /usr/share/magic.mime
MIMEMagicFile conf/magic
Restart Apache and clear your browser cache
That seems correct. Could you share a link to one of those pages, or the HTTP headers returned for it? Also, please share the whole Apache config block where you placed that directive, for context.
If you create a .htaccess file in your site's root directory with this content:
DefaultType text/html
the issue is fixed immediately.
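Note that DefaultType only has this effect up to Apache 2.2; according to the 2.4 core documentation the directive is disabled there and accepts only the value none. A sketch that should work in either version is to force the type for extensionless files with mod_mime's ForceType, for example in the same .htaccess. This assumes AllowOverride permits FileInfo for that directory and that every extensionless file under it really is HTML.

```apache
# Treat any file whose name contains no dot (e.g. entry/15) as HTML
<FilesMatch "^[^.]+$">
    ForceType text/html
</FilesMatch>
```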

apache mod_proxy_html on Ubuntu ProxyHTMLEnable not working

I'm trying to use mod_proxy_html on Ubuntu, which I installed with apt-get. The module loads properly and all the ProxyHTML* directives work except the one that matters most. When I put "ProxyHTMLEnable on" in my apache2.conf or vhost conf files, Apache complains that it's an invalid directive and that I must have misspelled it. Is anyone else having this issue on Ubuntu, and what can be done to fix it?
Have you tried leaving out "ProxyHTMLEnable on" entirely? I think that directive is new and not in the version Ubuntu ships.
Put "SetOutputFilter proxy-html" in its place instead.
While this isn't necessarily specific to the question, I figured I'd throw this out there for anyone else getting here from the Google super-highway.
I tried just removing ProxyHTMLEnable On and adding SetOutputFilter proxy-html, but it still wasn't working for me. The "gotcha" in my case was that the content mod_proxy_html was trying to process was compressed.
Using SetOutputFilter INFLATE;proxy-html;DEFLATE instead of SetOutputFilter proxy-html did it for me (this obviously means more processing).
This site explains it much better than I can: http://wiki.uniformserver.com/index.php/Reverse_Proxy_Server_2:_mod_proxy_html_2#Cause_and_Solution_3
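A sketch of the filter chain described in this answer, using a hypothetical backend at http://backend.example/ mounted under /app/. It assumes mod_proxy, mod_proxy_http, mod_deflate and mod_proxy_html are loaded and that ProxyHTMLLinks definitions are in place. An alternative is to unset Accept-Encoding on the proxied request, which avoids the inflate/deflate round trip at the cost of uncompressed traffic between Apache and the backend.

```apache
<Location "/app/">
    # Hypothetical backend and path; adjust to your setup
    ProxyPass        "http://backend.example/"
    ProxyPassReverse "http://backend.example/"

    # Older mod_proxy_html builds without ProxyHTMLEnable: insert the filter directly.
    # INFLATE and DEFLATE come from mod_deflate: unpack a gzip'd backend response,
    # rewrite the links, then compress the result again for clients that accept it
    SetOutputFilter INFLATE;proxy-html;DEFLATE
    ProxyHTMLURLMap "http://backend.example/" "/app/"
</Location>
```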