Need to get specific data block from a scan report - awk

I completed an nmap scan on a large-ish network and now I am trying to organize the data.
The report I have is the result of:
nmap -A -p 0-65535 -iL [filename] -oX [filename]
So what I am trying to do now is to extract the findings for each IP address that I scanned. I found another post here where the solution was to use awk:
awk 'BEGIN {RS="< host ";} /^starttime/ {print RS $0;}' [filename]
This didn't work for me: instead of stopping after the first block it ran right through the report. I realize, of course, that this is because '< host ' and 'starttime' appear in the output for every IP address in the range.
Is there any way for me to run through the nmap report, extract the scan report for each IP address, and save it in a separate file? A for loop will be required to do this, of course... once extracting one block and writing it to a file is figured out, that can be expanded using the for loop (I think)...
Or does anyone, from experience or sheer inspiration, have a more refined solution/suggestion?
Any help in the matter will be greatly appreciated.

Don't use awk to parse XML data. Nmap's XML output format is well-documented and there are parsers for it in Python (Ndiff also installs as a Python 2 library and has a parser built-in), Ruby, Perl, or you can use a number of command-line XML parsers.
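For instance, with a command-line XML processor such as xmlstarlet (assuming it is installed and that the scan was saved as scan.xml - both of those are just placeholders), a short shell loop can split the report into one file per host. This is only a sketch, not a polished solution:
for ip in $(xmlstarlet sel -t -m '/nmaprun/host' -v 'address/@addr' -n scan.xml); do
    # copy the whole <host> element for this address into its own file
    xmlstarlet sel -t -c "/nmaprun/host[address/@addr='$ip']" scan.xml > "host_$ip.xml"
done
This reads the XML structurally instead of guessing at record separators, so it keeps working even if nmap's text layout changes.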

Related

Buffering output with AWK

I have an input file which consists of three parts:
inputFirst
inputMiddle
inputLast
Currently I have an AWK script which with this input creates an output file which consists of two parts:
outputFirst
outputLast
where outputFirst and outputLast is generated (on the fly) from inputFirst and inputLast respectively. However, to calculate the outputMiddle part (which is only one line) I need to scan the entire input, so I store it in a variable. The problem is that the value of this variable should go in between outputFirst and outputLast in the output file.
Is there a way to solve this using a single portable AWK script that takes no arguments? Is there a portable way to create temporary files in an AWK script or should I store the output from outputFirst and outputLast in two variables? I suspect that using variables will be quite inefficient for large files.
All versions of AWK (since at least 1985) can do basic I/O redirection to files or pipelines, just like the shell can, as well as run external commands without I/O redirection.
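As a quick illustration (the filenames and commands here are only examples), plain awk can redirect to a file, pipe to a command, and run a command directly:
awk 'BEGIN {
    print "goes to a file" > "out.txt"   # redirect output to a file
    print "b\na" | "sort"                # pipe output to an external command
    close("sort")
    system("date")                       # run an external command without redirection
}'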
So, there are any number of ways to approach your problem and solve it without having to read the entire input file into memory. The best solution will depend on exactly what you're trying to do and what constraints you must honour.
A simple approach to the more precise example problem you describe in your comment above might go something like this: in the BEGIN clause, form two unique filenames with rand() and define your variables. Then read and sum the first 50 numbers from standard input while also writing them to a temporary file, and continue by reading and summing the next 50 numbers while writing them to a second file. Finally, in an END clause, use a loop to read the first temporary file with getline and write it to standard output, print the total sum, read the second temporary file the same way and write it to standard output, and call system("rm " file1 " " file2) to remove the temporary files.
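A minimal sketch of that recipe, assuming the numbers arrive on standard input (e.g. from seq 100) and using placeholder block sizes and temporary file names:
awk 'BEGIN { srand(); f1 = "block1." rand(); f2 = "block2." rand() }
NR <= 50             { sum += $1; print > f1 }    # first block: sum and spool
NR > 50 && NR <= 100 { sum += $1; print > f2 }    # second block: sum and spool
END {
    close(f1); close(f2)
    while ((getline line < f1) > 0) print line    # replay the first block
    print sum                                     # the computed "middle" line
    while ((getline line < f2) > 0) print line    # replay the second block
    system("rm " f1 " " f2)                       # remove the temporary files
}'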
If the output file is not too large (whatever that is), saving outputLast in a variable is quite reasonable. The first part, outputFirst, can (as described) be generated on the fly. I tried this approach and it worked fine.
Print the "first" output while processing the file, then write the remainder to a temporary file until you have written the middle.
Here is a self-contained shell script which processes its input files and writes to standard output.
#!/bin/sh
t=$(mktemp -t middle.XXXXXXXXX) || exit 127
trap 'rm -f "$t"' EXIT
trap 'exit 126' HUP INT TERM
awk -v temp="$t" "NR<500000 { print n+1 }
{ s+=$1 }
NR>=500000 { print n+1 >>temp
END { print s }' "$#"
cat "$t"
For illustration purposes, I used really big line numbers. I'm afraid your question is still too vague to allow a more specific answer, but perhaps this can help you find the right direction.

How to redirect output of a running process to a file in Linux Shell

I am experimenting a bit with the airmon-ng script in Linux. I want to redirect the output of the process "airodump-ng mon0" to a file. I can see the live output on the screen. The peculiarity of this process is that it won't stop executing (it is actually a scan for APs and clients, and it keeps scanning) unless we use Ctrl+C.
Whenever I try
airodump-ng mon0 > file.txt
I don't get the output in the file.
My primary assumption is that the shell writes it to the file only after the execution completes. But in the above case I stopped the execution myself (as the execution never completes on its own).
So, to generalize, I can't seem to redirect the output of a running process to a file. How can I do that?
Or is there an alternative way to stop the execution of the process (for example after 5 seconds) and redirect the output it has produced so far to a file?
A process may send output to standard output or standard error to get it to the terminal. Generally, the former is for information and the latter for errors, but in some cases, a process may mix them up.
I'm supposing that in your case, the standard error is being used. To get both of these to the output file, you can use:
airodump-ng mon0 > file.txt 2>&1
This sends standard output to file.txt and reroutes file descriptor 2 (standard error) into file descriptor 1 (standard output) so that it also goes to the file.
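If you also want the capture to stop on its own, the coreutils timeout command can wrap the call (the 5-second limit below is just an example); note that because airodump-ng keeps redrawing the screen, the captured file may contain terminal control sequences:
timeout 5 airodump-ng mon0 > file.txt 2>&1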

How to manipulate how getline sees lines

I have some code on openVMS where getline doesn't split the lines the same way as VMS editors for example.
Is there some way to manipulate how getline return lines?
It worked well with files FTPed over, but it doesn't work with some other files - I think they are RMS fixed-length records, with a lot of binary zeroes in them.
I am using ifstream.getline(buffer, maxsize), but it can be any getline.
This may be an issue with the RMS record attributes, notably a missing implied newline.
Check out $ help set file /attr
Look for the various RAT options.
How did this file come to be?
If you would like further help, then please provide details on the file with 'issues'.
Attach, or include, the output from
$ DIRECTORY/FULL x.x
and if possible
$ DUMP/RECORD=COUNT=3/WID=80 x.x
as well as
$ DUMP/BLOCK=COUNT=1/WID=80 x.x
Hope this helps,
Hein

correct way to write to the same file from multiple processes awk

The title says it all.
I have 4 awk processes logging to the same file, and output seems fine, not mangled, but I'm not sure that just redirecting print output like this: print "xxx" >> file in every process is the right way to do it.
There are many similar questions around the site, but this one is particularly about awk and a pragmatic, code-correct way to approach the problem.
EDIT
Sorry folks, of course I wasn't "just redirecting" like I wrote, I was appending.
No, it is not safe.
In awk, print "foo" > "file" opens the file and overwrites its content, and the redirection stays open until the end of the script.
That is, if your 4 awk processes start writing to the same file at different times, they will overwrite each other's results.
To reproduce it, you could start two (or more) awk like this:
awk '{while(++i<9){system("sleep 2");print "p1">"file"}}' <<<"" &
awk '{while(++i<9){system("sleep 2");print "p2">"file"}}' <<<"" &
If you monitor the content of file at the same time, you will see that in the end there are not exactly 8 "p1" and 8 "p2" entries.
Using >> avoids losing entries, but the order of entries coming from the 4 processes can still get mixed up.
EDIT
Ok, the > was a typo.
I don't know why you really need 4 processes writing into the same file. As I said, with >> the entries won't get lost (if your awk scripts work correctly). However, personally I wouldn't do it this way. If I had to have 4 processes, I would write to different files; I don't know your requirements, I'm just speaking in general.
Outputting to different files makes testing and debugging easier - imagine when one of your processes has a problem and you want to track it down, etc.
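For example, a sketch of that per-process-file idea (the input files, log names and the -v variable are placeholders), merging the pieces afterwards if one combined file is really needed:
awk -v logfile="out.p1" '{ print "p1: " $0 >> logfile }' input1 &
awk -v logfile="out.p2" '{ print "p2: " $0 >> logfile }' input2 &
wait
cat out.p1 out.p2 > combined.log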
I think using the operating system's print command is safe, as it appends the string you provide as the log entry to the file's write buffer. The system then manages the actual writing of the data to disc; if another process wants to use the same file, the system will see that the resource is already claimed, wait for the first process to finish, and only then allow the second process to write to the buffer.

Apache grep big log file

I need to parse Apache log file to look for specific suspicious patterns (like SQL injections).
For example I'm looking for id='%20or%201=1;
I am using grep to check the log file for this pattern (and others), and because these logs are huge it takes a long time.
Here is my command:
grep 'id=' Apache.log | egrep "' or|'%20"
Is there a better or faster method or command I can use to make the search faster?
For starters, you don't need to pipe your grep output to egrep. egrep provides a superset of grep's regular expression parsing, so you can just do this:
egrep "id='( or|%20)'" apache.log
Calling egrep is identical to calling grep -E.
That may get you a little performance increase. If you can look for fixed strings rather than regular expressions, that might also help. You can tell grep to look for a fixed string with the -F option:
grep -F "id='%20or" apache.log
But using fixed strings you lose a lot of flexibility.
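If the strings you search for really are fixed, one way to keep a little flexibility is to put several of them in a pattern file and search in a single pass (the file names here are only an example):
printf '%s\n' "id='%20or" "id=' or" > patterns.txt
grep -F -f patterns.txt apache.log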
I assume most of your time is spent getting the data from disk (CPU usage is not maxed out). In that case you can't optimize the query itself. You could try to log only the interesting lines to a separate file, though...
Are you looking for grep -E "id=(' or|'%20)" apache.log ?