If I have a web application running on a Yaws web server, how would I count the number of hits from users to my site?
I have tried a rudimentary method: counting the number of lines in my site's .access file in the Yaws logs, like this:
$ cat PATH_TO_YAWS_LOGS/www.my_site.com.access | wc -l
Point me to a better way of finding out how many hits I have received so far on my site running on top of Yaws.
You're in luck: Yaws uses the Common Log Format, so any analytics software that supports Apache access logs should work (for example Webalizer).
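If you just want quick numbers without installing an analyzer, a hedged shell sketch like the one below works on any Common Log Format file. The path is the placeholder from your question, so adjust it to your system, and note that it only covers the current (unrotated) log:
ACCESS_LOG=PATH_TO_YAWS_LOGS/www.my_site.com.access

# total hits: one log line per request
wc -l < "$ACCESS_LOG"

# unique client addresses: field 1 of the Common Log Format
awk '{print $1}' "$ACCESS_LOG" | sort -u | wc -l

# hits per day: field 4 starts with "[dd/Mon/yyyy:", so take characters 2-12
awk '{print substr($4, 2, 11)}' "$ACCESS_LOG" | sort | uniq -c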
Is it somehow possible to see how many times a specific file was requested from the server, or set up a system to see it in the future?
In my case, it is a JavaScript file on my server that is included on many different websites.
My setup is using LAMP.
You can see the requests in the Apache server's access log.
Simply scan for occurrences of requests to that file by hand, or write a small script.
So that JavaScript file is fetched via HTTP requests going to your Apache server.
That server keeps an access log (its location is configurable).
You then need to scan the access log to find every access. You can use something like
grep yourfile.js /var/log/www/access.log | wc -l
(to count occurrences)
where /var/log/www/access.log is your access log file.
BTW, many log files are managed by logrotate. Details are system-specific. Take that into account too.
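If logrotate has already rotated (and possibly compressed) older logs, something like the following counts matches across the whole set; the log path and file name here are assumptions, so adjust them to your setup:
# zgrep reads plain and gzip-compressed files alike and prints file:count,
# so sum the counts over the current and all rotated access logs
zgrep -c 'yourfile.js' /var/log/www/access.log* | awk -F: '{sum += $2} END {print sum}'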
For a few days now, backing up Google Sites using google-sites-liberation has stopped working.
The call
java -cp google-sites-liberation.jar com.google.sites.liberation.export.Main -d "$DOMAIN" -w wiki -u "$USER" -p "$PASSWORD" -f "$DIR/" 2>&1
which always worked before now fails with:
May 29, 2015 1:48:23 PM com.google.sites.liberation.export.Main doMain
SEVERE: Invalid User Credentials!
Exception in thread "main" java.lang.RuntimeException: com.google.gdata.util.AuthenticationException: Error authenticating (check service name)
at com.google.sites.liberation.export.Main.doMain(Main.java:89)
at com.google.sites.liberation.export.Main.main(Main.java:97)
Caused by: com.google.gdata.util.AuthenticationException: Error authenticating (check service name)
at com.google.gdata.client.GoogleAuthTokenFactory.getAuthException(GoogleAuthTokenFactory.java:614)
at com.google.gdata.client.GoogleAuthTokenFactory.getAuthToken(GoogleAuthTokenFactory.java:490)
at com.google.gdata.client.GoogleAuthTokenFactory.setUserCredentials(GoogleAuthTokenFactory.java:336)
at com.google.gdata.client.GoogleService.setUserCredentials(GoogleService.java:362)
at com.google.gdata.client.GoogleService.setUserCredentials(GoogleService.java:317)
at com.google.gdata.client.GoogleService.setUserCredentials(GoogleService.java:301)
at com.google.sites.liberation.export.Main.doMain(Main.java:79)
... 1 more
I checked the credentials; they are correct. However, it is the main account's password, which probably falls under Google's stricter security settings now.
I searched for a solution but only stumbled over old suggestions whose fixes are no longer available today. I also did not find a way to add a user/password application login to the account used to back up the wiki.
Does anybody have a pointer on how to fix this and make backups of the Google Site work again?
Any answer is good that offers a solution for backing up a site:
Use some other fully^2 automated tool that does the job of copying an entire site to a directory or an archive format, for example .tar.bz2.
Change google-sites-liberation so that it uses another authentication method than the one given in the docs, which are a couple of years old now. I did not manage to find one.
Note that the account used for backup must not have full Google Apps for Domains administrator access, as this is crucial.
Please, no external vendor links unless they are from Google. The data of the site(s) must not be shared with a third party, only Google and me.
Note that the process must be fully^2 automated, but I would like to have it even fully^4 automated:
fully^1, because it must run at regular intervals.
fully^2, because it must start without any user intervention whatsoever (some people define "fully automated" as starting something manually so that it then runs by itself, while "automated" means having a script which may still ask for some additional input)
fully^3, because it should not require user intervention to get the process started (like entering something like a Google Authenticator token) at the first run (even if it later runs fully^2 automated)
fully^4, because I want to be able to set up the process for several thousand sites in an automated, noninteractive way, where the process that prepares the setup runs on a host which is offline (so the setup can be uploaded to the fully^3 automated system without any additional manual setup steps, for example using IPoAC. YKWIM).
It is not much of a problem if it is only fully^2 automated, as I only want to back up my little single site (only a few thousand pages with attachments). However, I am curious how to get it fully^4 automated, because automating everything (including, but not limited to, the Universe) was my motivation for getting into the computer business several decades ago.
Thanks.
Links:
https://code.google.com/p/google-sites-liberation/ a bit dated code to retrieve sites
https://www.google.com/settings/takeout does not include google apps for domain sites
http://blog.famzah.net/2014/08/06/authentication-for-google-sites-liberation/ the account setting noted there is no longer available
I was unable to find any suitable link on how to implement a Google Apps for Domains backup with another tool; all the result pages I looked at (several!) seem to be exclusively from third-party vendors of more or less unknown trustworthiness. So perhaps I am simply unable to come up with the right search query on this matter.
Update 2015-06-23:
My scripts run every day and they report when something goes wrong, but not when they work as intended. So I missed that it suddenly worked again for a few days. But today it failed again:
2015-05-27 to 2015-06-11 (15 days) authentication failure
2015-06-12 to 2015-06-22 (11 days) it works again
2015-06-23 (today) authentication failure again
I have no idea why it suddenly worked for 11 days. I'll probably update this question again on the next ok-to-fail transition. ;)
Google now uses OAuth2 instead of account/password authentication.
I fixed the GUI version:
https://github.com/sih4sing5hong5/google-sites-liberation
But I have no idea how to handle OAuth2 in automated scripts.
I developed a console script in Python which exports Google Sites:
https://github.com/famzah/google-sites-backup
This works with automated scripts. It needs more testing but functions properly for my sites.
Because of the nature of OAuth2, the first time you ever start the script, you will need to obtain a token manually by visiting a web page. There is no other way. Once you've done this, the Python script caches the authentication token and the backup works in a completely non-interactive mode. It is a decision by Google when this cached token expires.
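Once that first interactive run has cached the token, scheduling the backup is an ordinary cron job. A minimal sketch, where the checkout location, the log path, and the placeholder for the actual export command are all assumptions (the exact invocation is described in the google-sites-backup README):
# crontab entry for the backup user (edit with "crontab -e"); paths are assumptions
# m  h  dom mon dow  command
30 2 * * * cd /opt/google-sites-backup && python <export script and its arguments from the README> >> /var/log/google-sites-backup.log 2>&1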
I have a dedicated server that runs a few lightweight game servers. The server is already running Apache. However, I am cheap, the server hardware is not exactly robust, and not all the servers we use run concurrently. I want to be able to generate a web page, say /stats, that has some info like:
Game 1: Online <uptime>
Game 2: Offline
...etc
I'm certain I could run a cron job that logs ps + grep output to a file and then parse that file for server information, but I'm looking for a more dynamic option that checks as the page is generated.
You have at least a few options (other people may have additional suggestions beyond what is listed here):
Cron a shell script to generate a stats.html or stats.txt (see the sketch at the end of this answer)
PHP's shell_exec (could run ps |grep... for example) or exec
PHP's variety of posix functions may help (http://php.net/manual/en/ref.posix.php)
If you have Perl available, there may be a few options there as well
My suggestion is to evaluate shell_exec or exec before any of the others.
If you need additional assistance please post what you have tried and the results.
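If the static approach turns out to be good enough, a hedged sketch of the first option above could look like this; the output path, the game process names, and the cron schedule are all assumptions:
#!/bin/sh
# game-stats.sh: regenerate a static stats page from a cron job.
# The output path and the process names below are assumptions; adjust them.
OUT=/var/www/html/stats.html
{
  echo "<html><body><pre>"
  for proc in game1d game2d; do                  # hypothetical game server process names
    pid=$(pgrep -ox "$proc")                     # oldest PID matching the exact name, if any
    if [ -n "$pid" ]; then
      up=$(ps -o etimes= -p "$pid" | tr -d ' ')  # elapsed run time in seconds
      echo "$proc: Online (up ${up}s)"
    else
      echo "$proc: Offline"
    fi
  done
  echo "</pre></body></html>"
} > "$OUT"
# refresh every 5 minutes with a crontab entry like:  */5 * * * * /usr/local/bin/game-stats.sh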
I am using name-based hosting with 2 IPs on my EC2 instance. I use Route 53 for my DNS.
I can load the sites fine, and I tested from 20 locations (friends and VPN), all fine. Then I go to one house and one office that are having the problem (one on Cox and one on CenturyLink) and the page won't load.
When I visit the page it just says 'waiting'. If I switch those machines to use Google DNS (8.8.8.8 / 8.8.4.4), it loads fine.
tail -f on the access log shows the initial connection from that IP, status 200.
No other hits from the user: no images, .css, etc. are pulled from the server, just that initial line.
When I try to request a specific image, Chrome's logs show 200 success, but only 1 KB.
If I put up a test.html with plain text in it, i.e. "Hello", it loads fine.
Could this be something with my EC2 / name-based setup? Since it loads for 95% of all people, I'm stuck.
CentOS.
Thanks.
After thorough testing with an AWS rep, no cause could be found: connections were intermittent, and I could not FTP or SSH in from the locations where the URL did not come up.
Fix: stop and start the instance. This launches the instance on new hardware, and it resolved the issue. Worked great.
Make sure you have an AMI or at least a Snapshot.
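For reference, a hedged sketch of the same image-then-stop/start cycle using the AWS CLI; the instance ID is a placeholder, and remember that a stop/start (unlike a reboot) moves the instance to new hardware and changes its public IP unless you use an Elastic IP:
# optional: create an AMI first (this reboots the instance unless --no-reboot is added)
aws ec2 create-image --instance-id i-0123456789abcdef0 --name "pre-restart-backup"
# stop, wait, then start; the instance comes back up on different hardware
aws ec2 stop-instances --instance-ids i-0123456789abcdef0
aws ec2 wait instance-stopped --instance-ids i-0123456789abcdef0
aws ec2 start-instances --instance-ids i-0123456789abcdef0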
Is there a way to detect whether a site is on a Content Delivery Network and, if yes, can we tell which service they are using?
One method achievable from the command line is the 'host' command with the -a flag set, to see the DNS records, e.g.
host -a www.visitbritain.com
Returns:
www.visitbritain.com. 0 IN CNAME d18sjq5nyxcof4.cloudfront.net.
Here the CNAME entry tells us that the site is using CloudFront as its CDN.
Just take a look at the URLs of the images (and other media) on the site.
Reverse-look-up the IPs of the hostnames you see there and you will see who owns them.
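A rough sketch of that lookup chain from the command line; the media hostname and the IP here are made-up examples, and the useful whois fields differ between registries:
host static.media-example.com                        # resolve the media hostname (hypothetical)
host 203.0.113.10                                    # reverse-lookup one of the returned IPs
whois 203.0.113.10 | grep -i -E 'orgname|netname'    # see which organisation owns the address block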
I built this little tool to identify the CDN used by a site or a domain, feel free to try it.
The URL: http://www.whatsmycdn.com/
You might also be able to tell from the HTTP headers of the media if the URL doesn't give it away. For example, media served by SimpleCDN has Server: SimpleCDN 5.6a4 in its headers.
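curl makes it easy to look at just those headers; the URL below is a placeholder, and the header names grepped for are only common examples (CloudFront, Fastly, Cloudflare and others each expose different ones):
# fetch only the response headers of a media URL and look for CDN hints
curl -sI https://example.com/static/logo.png | grep -i -E 'server|via|x-cache|x-served-by|cf-ray'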
CDN Planet now have their CDN Finder tool on GitHub:
http://www.cdnplanet.com/blog/better-cdn-finder/ The tool installs on the command line and lets you feed in host names and check whether they use a CDN.
If the website is using GCP's CDN, you can simply check it using curl:
curl -I https://<site url>
In the response you will find headers like the following:
x-goog-metageneration: 2
x-goog-stored-content-encoding: identity
x-goog-stored-content-length: 17393
x-goog-meta-object-id: 11602
x-goog-meta-source-id: 013dea516b21eedfd422a05b96e2c3e4
x-goog-meta-file-hash: cf3690283997e18819b224c6c094f26c
Yes, you can find out with:
host -a www.website.com
Apart from the excellent answers already posted here, which include some direct methods that may or may not work for every website out there, there is also an indirect way to see whether a CDN is in place, especially if it is your own website and you want to know whether you are getting what you are paying for!
The promise of a CDN is that connections from your users are terminated closer to them, so they see less TCP/TLS connection-establishment overhead, and that static content is cached closer to them, so it loads faster and puts less strain on your origin servers.
To verify this, you can measure site load times across the globe and see whether all users get similar load times. No, you don't have to have a machine everywhere in the world to do that! Someone has already done it for you.
Head to https://prober.tech/ and enter the URL you wish to test for load times.
Because that site itself is behind Cloudflare's CDN, you can put its own link in the test box and use it as a baseline!
More information on using the tool can be found on the site itself.