To speed up a MediaWiki site whose content uses a lot of templates, but is otherwise essentially static once the templates have done their job, I'd like to set up a Squid server
see
https://www.mediawiki.org/wiki/Manual:PurgeList.php
and
https://www.mediawiki.org/wiki/Manual:Squid_caching
and then fill the Squid server's cache "automatically" with a script that hits every page of the MediaWiki via wget/curl calls. My expectation is that after this procedure every single page would be in the Squid cache (if I make it big enough) and every subsequent access would be served by Squid.
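The kind of warm-up script I have in mind would be something like the following sketch (the wiki URL is a placeholder, the JSON scraping is deliberately crude, and pagination via apcontinue is omitted for brevity):
# Sketch: list all pages via the MediaWiki API, then fetch each one
# so that Squid caches the rendered HTML. Assumes bash and that the
# wiki lives at http://XXXXXX/mediawiki (placeholder).
BASE="http://XXXXXX/mediawiki"
wget -qO- "$BASE/api.php?action=query&list=allpages&aplimit=max&format=json" |
  grep -o '"title":"[^"]*"' | sed 's/^"title":"//;s/"$//' |
  while read -r title; do
    curl -s -o /dev/null "$BASE/index.php?title=${title// /_}"
  done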
How would I get this working?
E.g.:
How do I check my configuration?
How would I find out how much memory is needed?
How could I check that the pages are in the squid3 cache?
What I tried so far
I started out by finding out how to install squid using:
https://wiki.ubuntuusers.de/squid
and
https://www.mediawiki.org/wiki/Manual:Squid_caching
I figured out my IP address xx.xxx.xxx.xxx (not disclosed here)
via ifconfig eth0
in /etc/squid3/squid.conf I put
http_port xx.xxx.xxx.xxx:80 transparent vhost defaultsite=XXXXXX
cache_peer 127.0.0.1 parent 80 3130 originserver
acl manager proto cache_object
acl localhost src 127.0.0.1/32
# Allow access to the web ports
acl web_ports port 80
http_access allow web_ports
# Allow cachemgr access from localhost only for maintenance purposes
http_access allow manager localhost
http_access deny manager
# Allow cache purge requests from MediaWiki/localhost only
acl purge method PURGE
http_access allow purge localhost
http_access deny purge
# And finally deny all other access to this proxy
http_access deny all
Then I configured my apache2 server
# /etc/apache2/sites-enabled/000-default.conf
Listen 127.0.0.1:80
I added
$wgUseSquid = true;
$wgSquidServers = array('xx.xxx.xxx.xxx');
$wgSquidServersNoPurge = array('127.0.0.1');
to my LocalSettings.php
Then I restarted apache2 and started squid3 with
service squid3 restart
and did a first access attempt with
wget --cache=off -r http://XXXXXX/mediawiki
the result is:
Resolving XXXXXXX (XXXXXXX)... xx.xxx.xxx.xxx
Connecting to XXXXXXX (XXXXXXX)|xx.xxx.xx.xxx|:80... failed: Connection refused.
Assuming Apache 2.x.
While not Squid related, you can achieve this using just Apache modules. Have a look at mod_cache here: https://httpd.apache.org/docs/2.2/mod/mod_cache.html
You can simply add this to your Apache configuration and ask Apache to do disk caching of rendered content.
You need to ensure your content has appropriate cache expiry information in the resulting PHP response, MediaWiki should take care of this for you.
Adding such a cache layer may not have the desired outcome: this layer does not know when a page has changed, so cache invalidation is difficult, and it should only be used for genuinely static content.
Ubuntu:
a2enmod cache cache_disk
Apache configuration:
CacheRoot /var/cache/apache2/mod_disk_cache
CacheEnable disk /
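To sanity-check that disk caching is active, one approach (a sketch; the page URL is an assumption, the path matches the CacheRoot above) is to request a page twice and look for cache entries appearing on disk:
# Reload Apache, hit a page twice, then look for cache files on disk.
sudo systemctl reload apache2
curl -s -o /dev/null http://localhost/index.php/Main_Page
curl -s -o /dev/null http://localhost/index.php/Main_Page
sudo find /var/cache/apache2/mod_disk_cache -type f | head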
I would not recommend pre-filling your cache by accessing every page. This will only cause dormant (not frequently used) pages to take up valuable space / memory. If you still wish to do this, you may look at wget:
Description from: http://www.linuxjournal.com/content/downloading-entire-web-site-wget
$ wget \
--recursive \
--no-clobber \
--page-requisites \
--html-extension \
--convert-links \
--restrict-file-names=windows \
--domains website.org \
--no-parent \
www.website.org/tutorials/html/
This command downloads the Web site www.website.org/tutorials/html/.
The options are:
--recursive: download the entire Web site.
--domains website.org: don't follow links outside website.org.
--no-parent: don't follow links outside the directory tutorials/html/.
--page-requisites: get all the elements that compose the page (images, CSS and so on).
--html-extension: save files with the .html extension.
--convert-links: convert links so that they work locally, off-line.
--restrict-file-names=windows: modify filenames so that they will work in Windows as well.
--no-clobber: don't overwrite any existing files (used in case the download is interrupted and resumed).
A better option: Memcached
MediaWiki also supports the use of Memcached as a very fast in-memory caching service for data and templates only. This is a less blunt instrument than a site-wide cache like Squid or Apache mod_cache. MediaWiki manages Memcached itself, so any changes are immediately reflected in the cache store, meaning your content will always be valid.
Please see the installation instructions at MediaWiki here: https://www.mediawiki.org/wiki/Memcached
My recommendation is not to use Apache mod_cache or Squid for this task, and instead to install Memcached and configure MediaWiki to use it.
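A minimal LocalSettings.php sketch for this (standard MediaWiki settings; it assumes Memcached is running locally on its default port 11211):
$wgMainCacheType = CACHE_MEMCACHED;
$wgParserCacheType = CACHE_MEMCACHED;
$wgMemCachedServers = array( '127.0.0.1:11211' );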
Related
I have a node.js environment deployed using AWS Elastic Beanstalk on an Apache server. I have run a PCI scan on the environment and I'm getting 2 failures:
Apache ServerTokens Information Disclosure
Web Server HTTP Header Information Disclosure
Naturally I'm thinking I need to update the httpd.conf file with the following:
ServerSignature Off
ServerTokens Prod
However, given the nature of Elastic Beanstalk and Elastic Load Balancers, as soon as the environment scales, adds new servers, reboots etc the instance config will be overwritten.
I have also tried putting the following into an .htaccess file:
RewriteEngine On
RewriteCond %{HTTP:X-Forwarded-Proto} =http
RewriteRule .* https://%{HTTP:Host}%{REQUEST_URI} [L,R=permanent]
# Security hardening for PCI
Options -Indexes
ServerSignature Off
# Disallow iframe usage outside of loylap.com for PCI security scan
Header set X-Frame-Options SAMEORIGIN
On the Node.js side I use the "helmet" package to apply some security measures, and the "express-force-https" package to ensure the application enforces HTTPS. However, these only take effect after the Express application has been initiated and after the redirect.
I have Elastic Load Balancer listeners set up for both HTTP (port 80) and HTTPS (port 443), however the HTTP requests are immediately routed to HTTPS.
When I run the following curl command:
curl -I https://myenvironment.com --head
I get an acceptable response with the following line:
Server: Apache
However when I run the same request on the http endpoint (i.e. before redirects etc):
curl -I http://myenvironment.com --head
I get a response that discloses more information about my server than it should, and hence the PCI failure:
Server: Apache/2.4.34 (Amazon)
How can I force my environment to restrict the http header response on HTTP as well as HTTPS?
Credit to @stdunbar for leading me to the correct solution here using ebextensions.
The solution worked for me as follows:
Create a file in the project root called .ebextensions/01_server_hardening.config
Add the following content to the file:
files:
  "/etc/httpd/conf.d/03_server_hardening.conf":
    mode: "000644"
    owner: root
    group: root
    content: |
      ServerSignature Off
      ServerTokens Prod

container_commands:
  01_reload_httpd:
    command: "sudo service httpd reload"
(Note: the indentation is important in this YAML file - 2 spaces rather than tabs in the above code).
During Elastic Beanstalk deployment, that will create a new conf file in the /etc/httpd/conf.d folder, which Elastic Beanstalk includes by default to extend the httpd.conf settings.
The content turns off ServerSignature and sets ServerTokens to Prod, satisfying the PCI requirement.
Running the container command forces an httpd reload (for this particular version of Amazon Linux; Ubuntu and other versions would require their own standard reload command).
After deploying the new commands to my EB environment, my curl commands run as expected on HTTP and HTTPS.
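To verify after deployment, the probes from the question can be re-run; both endpoints should now return the reduced header:
# Both should now show just "Server: Apache" with no version/OS details.
curl -sI http://myenvironment.com | grep -i '^Server:'
curl -sI https://myenvironment.com | grep -i '^Server:'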
An easier and better solution exists now.
The folder /etc/httpd/conf.d/elasticbeanstalk is deleted when the built-in application server is restarted (e.g. when using EB with built-in Tomcat). Since .ebextensions are not re-run, the above solution stops working.
This is only the case when the application server is restarted (through e.g. Lambda or the Elastic Beanstalk web-console). If the EC2 instance is restarted this is not an issue.
The solution is to place a .conf file in a sub-folder of .ebextensions:
.ebextensions/
  httpd/
    conf.d/
      name_of_your_choosing.conf
The content of the file is the same as the content block of the .ebextensions above, e.g.
ServerSignature Off
ServerTokens Prod
This solution will survive a restart of the application server and is much easier to create and manage.
You will ultimately need to implement some ebextensions to have this change applied to each of your Beanstalk instances. This is a mechanism that allows you to create one or more files that are run during the initialization of the Beanstalk environment. I have an older one that I have not tested in your exact situation, but it does the HTTP->HTTPS rewrite like you're showing. It was used with the Tomcat Elastic Beanstalk type; different environments may use different configurations. Mine looks like:
files:
  "/tmp/00_application.conf":
    mode: "000644"
    owner: root
    group: root
    content: |
      <VirtualHost *:80>
        RewriteEngine On
        RewriteCond %{HTTP:X-Forwarded-Proto} !https
        RewriteRule ^(.*)$ https://%{HTTP_HOST}$1 [R,L]
      </VirtualHost>

container_commands:
  01_enable_rewrite:
    command: "echo 'LoadModule rewrite_module modules/mod_rewrite.so' >> /etc/httpd/conf/httpd.conf"
  02_cp_application_conf:
    command: "cp /tmp/00_application.conf /etc/httpd/conf.d/elasticbeanstalk/00_application.conf"
Again, this is a bit older and has not been tested for your exact use case but hopefully it can get you started.
This will need to be packaged with your deployment - e.g. a .jar or .war in Java, or a .zip in other environments. Take a look at the documentation link to learn more about deployments.
The configuration file path has changed a little since AWS introduced Amazon Linux 2:
.ebextensions/
.platform/
  httpd/
    conf.d/
      whateverFilenameyouwant.conf
In .platform/httpd/conf.d/whateverFilenameyouwant.conf, add the two lines below:
ServerSignature Off
ServerTokens Prod
The above is for Apache. Since AWS uses nginx by default for the reverse proxy, replace httpd with nginx in the path and it should work.
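For nginx, the equivalent would be something like the following sketch (the filename is an assumption; server_tokens is a standard nginx directive):
# .platform/nginx/conf.d/hardening.conf (hypothetical filename)
# Hide the nginx version in the Server header and on error pages.
server_tokens off;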
I set up a controller host with rabbitmq-server running. From the nova host, I see that the nova-conductor cannot be reached. I checked on the controller host and I see the following in the logs :
access to vhost 'None' refused for user 'openstack'
I have the following configuration settings for rabbitmq on the controller host:
rabbitmqctl list_users
Listing users ...
guest [administrator]
openstack []
When I list permissions, I see openstack can access all resources:
list_permissions
Listing permissions in vhost "/" ...
guest .* .* .*
openstack .* .* .*
I am able to authenticate with the rabbitmq-server - just unable to access the / vhost. For debugging, I would like to set things up so that any client can access any resource (i.e. turn off access control altogether). Is that possible?
Thanks
This is caused by a relatively new change in either Kombu or oslo.messaging. Previously, if a virtual_host was not provided, it would default to /. This is no longer the case.
For it to work, your transport_url needs at the very least to have one / at the end:
transport_url = rabbit://stackrabbit:secretrabbit@127.0.0.1:5672/
As a reference, you can take a look at devstack here, for example.
The actual upstream fix for the issue is available here.
access to vhost 'None' refused for user 'openstack'
Something is trying to access a vhost named None, which doesn't exist. The default vhost is named /. Since None is a keyword in Python I suspect there is an application bug or mis-configuration somewhere.
It's not possible to disable access control so I suggest creating a well-known user and password to use.
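For example, a sketch of creating such a user with full rights on the default vhost (the password is a placeholder):
# Create a dedicated user and grant configure/write/read on vhost "/".
rabbitmqctl add_user openstack CHANGE_ME
rabbitmqctl set_permissions -p / openstack ".*" ".*" ".*"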
You can also send HTTP requests to the management API to get/set user permissions. You have to specify the vhost name as part of the URL, and since the default vhost is '/' you have to encode it as '%2F', i.e. http://<ip>:<port>/api/vhosts/%2F/permissions
e.g.
curl -i -u guest:guest -H "content-type:application/json" -X GET http://localhost:15672/api/permissions/%2F/guest
You can find a full list of api options by going to http://<ip>:<port>/api/index.html
I am not sure how, or if, this can be done. I have a home network and would like to reach a computer on it (not the server) from a remote location. I have Apache on my server. Example: the network computer I would like to reach has IP 152.254.1.33. Is there a way to add this IP to the Apache root directory? I have tried adding a shortcut within the root directory, but it only works on the home network, not via a remote connection.
I need some clarification here on what you are trying to accomplish: are you trying to access the Apache website from outside the local network?
If that is the case, Apache is automatically set to listen on all network interfaces; you can check this in your virtual host configuration in the sites-enabled directory of your Apache installation.
You should see something like <VirtualHost *:80> in the 000-default.conf.
You can test if apache is serving pages up correctly using the command
curl 127.0.0.1
You should see the HTML of the page being served.
If this is the case, then it's likely the firewall on your machine/router or your ISP is blocking the required ports. You can allow Apache through the firewall on Ubuntu using sudo ufw allow "Apache Full"
If you give me some more info in comments we can probably work this out.
I want to set up Apache and GlassFish on an Ubuntu 16.04 server.
I have installed
apache2
libapache2-mod-jk
glassfish
The following are the steps I have followed
Configuring the MPM module
Set MaxRequestWorkers to 400 in /etc/apache2/mods-available/mpm_event.conf
Configuring the JK Module
<IfModule mod_jk.c>
JkWorkersFile /usr/share/glassfish4/glassfish/domains/<domain-name>/config/workers.properties
JkLogFile /var/log/apache2/mod_jk.log
JkLogLevel error
JkLogStampFormat "[%a %b %d %H:%M:%S %Y] "
JkOptions +ForwardKeySize +ForwardURICompat -ForwardDirectories
JkRequestLogFormat "%w %V %T"
JkMountCopy all
</IfModule>
JkMount /myapp/* ajp13
<Location "/myapp/WEB-INF/">
require all denied
</Location>
Create a workers.properties file in your GlassFish domain's config directory
worker.list=ajp13
worker.ajp13.type=ajp13
worker.ajp13.host=localhost
worker.ajp13.port=8009
# load balancing only: worker.ajp13.lbfactor=50
worker.ajp13.connection_pool_size=10
worker.ajp13.connection_pool_timeout=600
worker.ajp13.socket_keepalive=False
worker.ajp13.socket_timeout=30
Create the JK listener in GlassFish using these commands
asadmin create-http-listener --listenerport 8009 --listeneraddress 0.0.0.0 --defaultvs server jk-listener
asadmin set server-config.network-config.network-listeners.network-listener.jk-listener.jk-enabled=true
Then I restarted the GlassFish domain successfully, but when I try to restart apache2 with sudo /etc/init.d/apache2 restart I get the error below:
[....] Restarting apache2 (via systemctl): apache2.serviceJob for apache2.service failed because the control process exited with error code. See "systemctl status apache2.service" and "journalctl -xe" for details.
failed!
This error occurs when I edit the file jk.conf located under /etc/apache2/mods-available/jk.conf
Where am I going wrong? Is there a complete guide to accomplishing this? Finally, the newer apache2 doesn't have the file httpd.conf, and all the tutorials all over the Internet rely on this file. Thanks in advance.
Since your objective is just to forward requests from Apache to GlassFish, not to load-balance requests from Apache across multiple GlassFish servers, I would recommend avoiding mod_jk. You can certainly achieve your goal with it, but if you are new to the concepts involved, you will find it difficult to understand and maintain.
Instead you can use mod_proxy and, optionally, mod_proxy_ajp.
First, a definition:
AJP vs HTTP
AJP is a protocol like HTTP, but binary rather than text-based. It has no secure/insecure variants like HTTPS/HTTP since it is normally used behind a firewall, and it performs much better than HTTP in these scenarios. When you mark any GlassFish network listener as jk-enabled, you are enabling AJP communication rather than HTTP.
You've installed Apache via the Ubuntu apache2 package, which has its own structure for configuration that differs from the layout you would get if you downloaded and unzipped Apache yourself. This has advantages, but we need to understand the Apache configuration file before getting to that.
Apache Configuration
Generally, you will see internet guides refer to httpd.conf as the configuration file to edit. This is just the default "parent" configuration file. In Debian/Ubuntu systems (and their derivations, like Linux Mint), the file to look for is apache2.conf.
This file is read, and its directives applied, from top to bottom, so if you have set the same property to two different values, the second will apply. (More accurately, they will both apply but the first will only apply until the second setting is read).
This file can also specifically "include" files and folders (where any *.conf file in an included folder will be included). These will be read in and merged with the main configuration at the point where the "include" statement is written. So the very last line in the main configuration file (if it is not specifying another file) will be the last line of configuration to be set, no matter what.
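For example, on Debian/Ubuntu the tail of a stock apache2.conf pulls in the split configuration in this order:
# From a stock Debian/Ubuntu apache2.conf; these are read last, in order.
IncludeOptional conf-enabled/*.conf
IncludeOptional sites-enabled/*.conf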
Debian config layout
I would highly recommend you read the opening comment in the apache2.conf file, since it will tell you all you need to know about the layout. Suffice it to say that keeping all the config in one file is very painful to maintain. The Debian package separates configuration into three categories:
sites
Sites are single configuration files for a website or web project. This could be anything: PHP, static HTML or a Java EE application deployed to an app server like GlassFish.
mods
Modules are subdivided into *.load files which load the actual libraries needed to run them, and *.conf files which have global configuration for the modules. Note that this configuration applies to every site that uses the module, so it is best to put any site/app specific module configuration in the appropriate site.conf file
conf
These files are just for any other general configuration which fits into a nice group. This could be SSL configuration like keystore and truststore locations.
When you look at the directory structure, you will see that each of these has two folders: *-available and *-enabled. This is because the Debian Apache package comes with six helper tools: a2ensite and a2dissite; a2enmod and a2dismod; a2enconf and a2disconf. The idea is that you follow these rules:
Never directly edit the apache2.conf file
Only ever add or change files in the *-available folders
Use the helper tools to enable or disable sites/modules/conf files.
Answer
So to (finally) answer your question, I would do the following steps:
Enable mod_proxy_ajp
a2enmod proxy_ajp
Create a new myApp.conf in sites-available. You can copy the default one, which is a good example. Assuming you just want to forward all requests to GlassFish, you can use the default VirtualHost setting of <VirtualHost *:80>, which will process a request for any hostname on port 80. Use port 443 if you want to add HTTPS.
Add ProxyPass and ProxyPassReverse directives pointing to the location of your server. If Apache and GlassFish are on the same server, it is likely you will want to use ajp://localhost:8009 (the port of the jk-listener created earlier).
ProxyPass / ajp://host_name:0000
ProxyPassReverse / ajp://host_name:0000
Note: This assumes you are using AJP. If that causes you problems, switch to HTTP by changing ajp to http above and disabling the jk-listener in GlassFish.
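Putting those directives together, a minimal sites-available/myApp.conf might look like this sketch (the ServerName is an assumption):
<VirtualHost *:80>
    ServerName myapp.example.org
    # Forward everything to the GlassFish jk-listener on port 8009
    ProxyPass        / ajp://localhost:8009/
    ProxyPassReverse / ajp://localhost:8009/
</VirtualHost>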
Once you have completed your myApp.conf configuration, remember to disable the default site:
a2dissite 000-default.conf
And enable your new site:
a2ensite myApp.conf
Those commands will appropriately modify the main apache2.conf and create the appropriate links in the sites-enabled folder.
That should be all you need. Now, everything that points to your hostname after the root / of the URL will be forwarded to the root context / of GlassFish.
I've installed Subversion and Apache on my PC. I can access my repository using the following URL:
http://localhost/svn/repos/
Now I want other members of my group to access the project files I've put in my repository. As it's my first time using SVN, I looked for solutions and I think I'm a bit lost.
I read about port forwarding, so I opened my router interface, went to the NAT/PAT section of my router configuration, and added a new rule with the following characteristics:
Application: svn
External port:3690
Internal port:80
protocol : TCP
equipment: myPC
and checked the option "Active". But I think I'm missing something.
I read in an article that to verify that remote access is working, I have to go to
svn://83.200.108.71
But it doesn't work: "unable to connect..".
Can someone please help me?
Wait... You can access your repository via http://? Why not let others access the repository using http://?
Don't do anything with your router. Don't muck with ports. Apache httpd is serving your repository just fine off of Port 80. Tell your users to simply access your repository via http://<machineName>/svn/repos. That's all there is to it.
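For example, a user on another machine could check out the project with:
svn checkout http://<machineName>/svn/repos my-project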
svn:// is a completely different protocol than http://. Port 3690 just happens to be the default port of svn://, but that doesn't mean if you reroute your http:// protocol there, everything will work.
Most of the time, people who first use Subversion set up the svnserve server instead of Apache httpd because it's easier than using Apache httpd. Here's how you set up a repository to use svn://:
$ svnadmin create my_repos                # Create the repository
$ vi my_repos/conf/svnserve.conf          # Need to uncomment the 'password-db = passwd' line
$ vi my_repos/conf/passwd                 # Need to set up user accounts
$ svnserve -r my_repos -d                 # Run svnserve as a daemon
And that's it. Now your users can access the repository via svn://<machineName>.
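A user would then check out with something like:
svn checkout svn://<machineName>/ my-working-copy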
Although svnserve is simpler and easier than Apache (and faster), there are many reasons to use Apache httpd over svnserve:
Port 80 is unlikely to be blocked by the network, while port 3690 may be blocked
You can let Apache httpd use LDAP for authentication (which can also allow Windows Active Directory authentication)
Apache httpd can service multiple repositories while svnserve can only service a single repository on port 3690.