Trivial goaccess log parsing not working - apache

I'm trying to set up goaccess to analyse some apache output which is highly customised. I didn't fancy my chances writing a .goaccessrc file straight off, so decided to simplify the log (in a text editor) and start slowly. However, I can't even get this trivial example to work. I've also tried some examples from SO that are marked as 'Answered', but I'm still getting the rather terse 'Nothing valid to process' message.
Here's a line from my simplified log file:
2014-05-14 06:26:18 "GET / HTTP/1.1" 200 37.157.246.146
and here's my .goaccessrc:
date_format %Y-%m-%d %H:%M:%S
log_format %d "%r" %s %h
I'm sure the .goaccessrc file is in the right place and being read, because if I remove it, I get the Log Format Configuration window when running goaccess. I'm sure it's something trivial, but I just can't see it. Here's the full output of my recent terminal session:
[root@dev ~]# cat .goaccessrc
date_format %Y-%m-%d %H:%M:%S
log_format %d "%r" %s %h
[root@dev ~]# cat /var/log/apache2/simple.log
2014-05-14 06:26:18 "GET / HTTP/1.1" 200 37.157.246.146
[root@dev ~]# goaccess -f /var/log/apache2/simple.log
GoAccess - version 0.7.1 - Apr 18 2014 21:28:20
An error has occurred
Error occured at: goaccess.c - render_screens - 456
Message: Nothing valid to process.

OK, see here for the full answer. It basically boils down to this. All parsing seems to be driven by log_format, and the token separator is the space character. So in the example above, the first %d placeholder in log_format matches up to the end of 2014-05-14 and then stops. The next token ("%r") then fails when it finds the beginning of the time portion.
Solution to the above is:
date_format %Y-%m-%d
log_format %d %^ "%r" %s %h
which matches the date (only, not time), then ignores everything up to the first " character, then matches the request URL and then finally the status and host address.
Note it seems that unless the date and time are a single token (no whitespace), you can't match the time portion successfully.
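The tokenization issue can be seen concretely with a rough Python sketch of the space-delimited matching (illustrative only; GoAccess's real parser is written in C and works differently in detail):

```python
line = '2014-05-14 06:26:18 "GET / HTTP/1.1" 200 37.157.246.146'

# Each log_format specifier consumes (roughly) one space-delimited token,
# so a date with embedded whitespace spans two tokens.
tokens = line.split(' ', 2)
assert tokens[0] == '2014-05-14'  # %d matches this token and stops
assert tokens[1] == '06:26:18'    # "%r" then trips over the time portion
```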

Related

zsh declare PROMPT using multiple lines

I would like to declare my ZSH prompt using multiple lines and comments, something like:
PROMPT="
%n # username
#
%m # hostname
\ # space
%~ # directory
$
\ # space
"
(e.g. something like perl regex's "ignore whitespace mode")
I could swear I used to do something like this, but cannot find those old files any longer. I have searched for variations of "zsh declare prompt across multiple lines" but haven't quite found it.
I know that I can use \ for line continuation, but then we end up with newlines and whitespaces.
edit: Maybe I am misremembering about comments - here is an example without comments.
Not exactly what you are looking for, but you don't need to define PROMPT in a single assignment:
PROMPT="%n" # username
PROMPT+="#%m" # #hostname
PROMPT+=" %~" # directory
PROMPT+="$ "
Probably closer to what you wanted is the ability to join the elements of an array:
prompt_components=(
%n # username
" " # space
%m # hostname
" " # space
"%~" # directory
"$"
)
PROMPT=${(j::)prompt_components}
Or, you could let the j flag add the space delimiters, rather than putting them in the array:
# This is slightly different from the above, as it will put a space
# between the directory and the $ (which IMO would look better).
# I leave it as an exercise to figure out how to prevent that.
prompt_components=(
"%n#%m" # username#hostname
"$~" # directory
"$"
)
PROMPT=${(j: :)prompt_components}

Getting oh-my-zsh 'history' to display command date and time

The .zshrc has the following lines:
# Uncomment the following line if you want to change the command execution time
# stamp shown in the history command output.
# You can set one of the optional three formats:
# "mm/dd/yyyy"|"dd.mm.yyyy"|"yyyy-mm-dd"
# or set a custom format using the strftime function format specifications,
# see 'man strftime' for details.
# HIST_STAMPS="mm/dd/yyyy"
But uncommenting that line and running history still doesn't show timestamps.
The .zshrc comment text is misleading.
Use:
HIST_STAMPS="%d/%m/%y %T"
to show day, month, year and time respectively.
I'm running zsh 5.7.1 (x86_64-apple-darwin19.0) with omz.
HIST_STAMPS="mm/dd/yyyy" now works as intended.
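Since HIST_STAMPS accepts strftime specifiers, a quick Python check shows what that format renders (using %H:%M:%S, which is what %T abbreviates; purely illustrative):

```python
import time

# Render the Unix epoch in the day/month/year-plus-time layout
stamp = time.strftime('%d/%m/%y %H:%M:%S', time.gmtime(0))
assert stamp == '01/01/70 00:00:00'
```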

apache mod_headers Date: Header

Folks,
Need to convert the following request header to a different format:
RequestHeader set Date "%{TIME_WDAY}e"
The %t variable looks like:
t=1367272677754275
Would like the Date= to look like:
Date: Tue, 27 Mar 2007 19:44:46 +0000
How is this done?
Thanks!
You cannot do that with the documented functionality of mod_headers. This module only supports the following variables (from the doc):
%t The time the request was received in Universal Coordinated Time since the epoch (Jan. 1, 1970) measured in microseconds. The value is preceded by t=.
%D The time from when the request was received to the time the headers are sent on the wire. This is a measure of the duration of the request. The value is preceded by D=. The value is measured in microseconds.
%{FOOBAR}e The contents of the environment variable FOOBAR.
%{FOOBAR}s The contents of the SSL environment variable FOOBAR, if mod_ssl is enabled.
Unless you continually want to set an environment variable to your current date and pull it in using mod_env, I suggest you use mod_rewrite.
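If you do end up scripting the conversion yourself, the arithmetic is simple. Here is a Python sketch turning the microsecond %t value from the question into the RFC 1123 form a Date: header uses (the variable names are mine, not mod_headers'):

```python
import time

t_us = 1367272677754275                # the t=... value, microseconds since epoch
t_s = t_us // 1_000_000                # truncate to whole seconds
# HTTP Date headers are always expressed in GMT
date_header = time.strftime('%a, %d %b %Y %H:%M:%S +0000', time.gmtime(t_s))
assert date_header == 'Mon, 29 Apr 2013 21:57:57 +0000'
```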
The correct answer here is a mod_headers.c patch adding the additional authentication information required by AWS and GCS.

Awstats - LogFormat doesn't match the Amazon S3 log file contents

I'm trying to set up AWStats to parse Amazon S3 log files, but it keeps saying the log doesn't match the LogFormat. Below are the configuration and log content:
LogFormat="%other %extra1 %time1 %host %logname %other %method %url
%otherquot %code %extra2 %bytesd %other %extra3 %extra4 %refererquot
%uaquot %other"
0dfbd34f831f30a30832ff62edcb8a93158c056f27cebd6b746e35309d19039c looxcie-data1 [18/Dec/2011:04:30:15 +0000] 75.101.241.228 arn:aws:iam::062105025988:user/s3-user E938CC6E4B848BEA REST.GET.BUCKET - "GET /?delimiter=/&prefix=data/prod/looxciemp4/0/20/&max-keys=1000 HTTP/1.1" 200 - 672 - 44 41 "-" "-" -
Then I execute the command and get following result:
root@test:/usr/local/awstats/wwwroot/cgi-bin# perl awstats.pl -update -config=www.awstats.apache.com
Create/Update database for config "/etc/awstats/awstats.www.awstats.apache.com.conf" by AWStats version 7.0 (build 1.971)
From data in log file "/var/log/httpd/access.log"...
Phase 1 : First bypass old records, searching new record...
Searching new records from beginning of log file...
Jumped lines in file: 0
Parsed lines in file: 1
Found 0 dropped records,
Found 0 comments,
Found 0 blank records,
Found 1 corrupted records,
Found 0 old records,
Found 0 new qualified records.
Can anyone help to figure it out?
===========================================
I found that the format "%logname" cannot match a name like
arn:aws:iam::062105025988:user/s3-user
It is weird, but "%lognamequot" is able to match "arn:aws:iam::062105025988:user/s3-user"; this is the cause of the problem. But our log files do include lognames like arn:aws:iam::062105025988:user/s3-user. Can anyone help figure out why it doesn't match?
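I don't have the AWStats internals at hand, but the shape of the failure can be shown with two illustrative regexes (these are stand-ins for a strict vs. a permissive field pattern, not AWStats's actual ones):

```python
import re

field = 'arn:aws:iam::062105025988:user/s3-user'
# A strict "word characters only" logname pattern rejects ':' and '/'
assert re.fullmatch(r'\w+', field) is None
# A permissive "anything but whitespace" pattern accepts it
assert re.fullmatch(r'\S+', field) is not None
```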

Processing apache logs quickly

I'm currently running an awk script to process a large (8.1GB) access-log file, and it's taking forever to finish. In 20 minutes, it wrote 14MB of the (1000 +- 500)MB I expect it to write, and I wonder if I can process it much faster somehow.
Here is the awk script:
#!/bin/bash
awk '{t=$4" "$5; gsub("[\[\]\/]"," ",t); sub(":"," ",t);printf("%s,",$1);system("date -d \""t"\" +%s");}' $1
EDIT:
For non-awkers, the script reads each line, gets the date information, modifies it to a format the utility date recognizes and calls it to represent the date as the number of seconds since 1970, finally returning it as a line of a .csv file, along with the IP.
Example input: 189.5.56.113 - - [22/Jan/2010:05:54:55 +0100] "GET (...)"
Returned output: 189.5.56.113,1264110895
@OP, your script is slow mainly due to the excessive calls to the external date command for every line of the file, and it's a big file as well (in the GB range). If you have gawk, use its internal mktime() function to do the date-to-epoch-seconds conversion:
awk 'BEGIN{
  m=split("Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec",d,"|")
  for(o=1;o<=m;o++){
    date[d[o]]=sprintf("%02d",o)
  }
}
{
  gsub(/\[/,"",$4); gsub(":","/",$4); gsub(/\]/,"",$5)
  n=split($4, DATE,"/")
  day=DATE[1]; mth=DATE[2]; year=DATE[3]
  hr=DATE[4]; min=DATE[5]; sec=DATE[6]
  MKTIME=mktime(year" "date[mth]" "day" "hr" "min" "sec)
  print $1, MKTIME
}' file
output
$ more file
189.5.56.113 - - [22/Jan/2010:05:54:55 +0100] "GET (...)"
$ ./shell.sh
189.5.56.113 1264110895
If you really really need it to be faster, you can do what I did. I rewrote an Apache log file analyzer using Ragel. Ragel allows you to mix regular expressions with C code. The regular expressions get transformed into very efficient C code and then compiled. Unfortunately, this requires that you are very comfortable writing code in C. I no longer have this analyzer. It processed 1 GB of Apache access logs in 1 or 2 seconds.
You may have limited success removing unnecessary printfs from your awk statement and replacing them with something simpler.
If you are using gawk, you can massage your date and time into a format that mktime (a gawk function) understands. It will give you the same timestamp you're using now and save you the overhead of repeated system() calls.
This little Python script handles ~400MB worth of copies of your example line in about 3 minutes on my machine, producing ~200MB of output (keep in mind your sample line was quite short, so that's a handicap):
import time

src = open('x.log', 'r')
dest = open('x.csv', 'w')
for line in src:
    ip = line[:line.index(' ')]
    date = line[line.index('[') + 1:line.index(']') - 6]
    t = time.mktime(time.strptime(date, '%d/%b/%Y:%X'))
    dest.write(ip)
    dest.write(',')
    dest.write(str(int(t)))
    dest.write('\n')
src.close()
dest.close()
A minor problem is that it doesn't handle timezones (strptime() problem), but you could either hardcode that or add a little extra to take care of it.
But to be honest, something as simple as that should be just as easy to rewrite in C.
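On the timezone point: one way to account for the offset by hand, sketched in Python (this assumes the usual +HHMM suffix Apache writes):

```python
import calendar
import time

raw = '22/Jan/2010:05:54:55 +0100'
stamp, offset = raw.rsplit(' ', 1)
# Parse the local timestamp as if it were UTC, then remove the offset
t = calendar.timegm(time.strptime(stamp, '%d/%b/%Y:%H:%M:%S'))
sign = -1 if offset[0] == '-' else 1
t -= sign * (int(offset[1:3]) * 3600 + int(offset[3:5]) * 60)
assert t == 1264136095  # 2010-01-22 04:54:55 UTC
```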
gawk '{
  dt=substr($4,2,20);          # "22/Jan/2010:05:54:55"
  gsub(/\//," ",dt);           # "22 Jan 2010:05:54:55"
  sub(/:/," ",dt);             # "22 Jan 2010 05:54:55"
  "date -d \""dt"\" +%s" | getline ts;
  print $1, ts
}' yourfile