numerical comparison to less or equal with awk - awk

I'm trying to build a script (not a command-line one-liner) in awk that will pull a list of users on a Linux system and save them to a file.
I have most of it working, but I can't figure out how to filter for users that are not system users, i.e. those with an ID over 1000. However, when I built the code and ran it, it returned an empty file. I'm saving it to a file on the command line.
Any advice here would be fantastic, as I have been pulling my hair out trying to figure out why this isn't working. The code I currently have is this:
#! /usr/bin/awk -f
BEGIN { FS=":" }
/$3<=1000/ { print "Username :",$1,"User ID :",$3}

The problem is the /.../ delimiters: awk treats $3<=1000 inside slashes as a regular expression, not a numeric comparison, so the pattern never matches and the output file is empty. Write the condition without the slashes:
$3<=1000 {print "Username :",$1,"User ID :",$3}
or
{if($3<=1000) {print "Username :",$1,"User ID :",$3}}
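For completeness, the whole script would then look like this (a sketch; users.awk and users.txt are just illustrative names, and you would use $3>=1000 instead if you want the regular, non-system accounts the question describes):
#! /usr/bin/awk -f
# /etc/passwd fields are colon-separated: name:passwd:UID:...
BEGIN { FS = ":" }
$3 <= 1000 { print "Username :", $1, "User ID :", $3 }
and you would run it as:
./users.awk /etc/passwd > users.txt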

Related

Linux parsing space delimited log files

I need to parse apache access log files which have 16 space-delimited columns, that is,
xyz abc ... ... home?querystring
I need to count the total number of hits for each page in that file, that is, the total number of home page hits, ignoring the querystring.
For some lines the url is in column 16, and for others it is in column 14 or 15. Hence I need to parse each line in reverse order (get the last column, ignore the query string of the last column, aggregate page hits).
I am new to linux and shell scripting. How do I approach this? Do I have to look into awk or shell scripting? Can you give a small code sample that would perform such a task?
ANSWER: a perl one-liner solved the problem:
perl -lane | scalar array
Well, for starters, if you are only interested in working on columns 14-16, I would start by running
cut -d\  -f14-16 <input_file.log> | awk '{ one = match($1,/www/)
two = match($2,/www/)
three = match($3,/www/)
if (one)
print $1
else if (two)
print $2
else if (three)
print $3 }'
Note: there are two spaces after the d\ (the first, escaped, becomes the delimiter; the second separates the arguments).
You can then pretty easily just count up the urls that you see. I also think this would be solved a lot more easily using a few lines of python or perl.
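For example, one way to do the counting from here (a sketch, not from the original answer; access.log is an assumed filename) is to strip the query string in awk and let sort and uniq do the tallying:
cut -d' ' -f14-16 access.log | awk '{
for (i = 1; i <= 3; i++)       # check each of the three candidate columns
if (match($i, /www/)) {        # same www test as above
sub(/\?.*/, "", $i)            # drop the query string
print $i
break
}
}' | sort | uniq -c | sort -rn # count identical urls, most hits first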
You can read input line by line using the bash read command:
while read my_variable; do
echo "The text is: $my_variable"
done
To get input from a specific file, use the input redirect <:
while read my_variable; do
echo "The text is: $my_variable"
done < my_logfile
Now, to get the last column, you can use the ${var##* } construction. For example, if the variable my_var is the string some_file_name, then ${my_var##*_} is the same string, but with everything before (and including) the last _ deleted.
We come up with:
while read line; do
echo "The last column is: ${line##* }"
done < my_logfile
If you want to echo it to another file, use the >> redirect:
while read line; do
echo "The last column is: ${line##* }" >> another_file
done < my_logfile
Now, to take away the querystring, you can use the same technique:
while read line; do
last_column="${line##* }"
url="${last_column%%\?*}"
echo "The last column without querystring is: $url" >> another_file
done < my_logfile
This time, we have %%\?* instead of ##*\? because we want to delete what's after the first ?, instead of what's before the last one. (Note that I have escaped the character ?, which is special to bash.) You can read all about it in the bash manual, under parameter expansion.
I didn't understand where to get the page hits from, but I think the main idea is there.
EDIT: Now the code works. I had forgotten the do bash keyword. Also, we need to use >> instead of > so as not to overwrite another_file every time we echo to it; by using >>, we append to the file. I have also corrected the %% instead of ##.
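To get the page-hit totals from here, one option (a sketch; it assumes bash 4+ for associative arrays, and the same my_logfile) is to tally the urls in an array as you read:
declare -A hits                          # associative array: url -> count
while read line; do
last_column="${line##* }"
url="${last_column%%\?*}"
hits[$url]=$(( ${hits[$url]:-0} + 1 ))
done < my_logfile
for url in "${!hits[@]}"; do             # print each url with its total
echo "$url: ${hits[$url]}"
done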
It's hard to say without a few lines of concrete sample input and expected output, but it sounds like all you need is:
awk -F'[ ?]' '{sum[$(NF-1)]++} END{for (url in sum) print url, sum[url]}' file
For example:
$ cat file
xyz abc ... ... http://www.google.com?querystring
xyz abc ... ... some other http://www.google.com?querystring1
xyz abc ... some stuff we ignore http://yahoo.com?querystring1
$
$ awk -F'[ ?]' '{sum[$(NF-1)]++} END{for (url in sum) print url, sum[url]}' file
http://www.google.com 2
http://yahoo.com 1

cat, grep & awk - both while read line & while read file in 1 loop?

Hi,
Thanks to a lot of searching on stackoverflow (great resource!) over the last couple of days, I succeeded in this, and even solved a follow-up issue where the output doubled its lines every time I ran the command, thanks to an awk command which removes duplicate lines.
I'm pretty far in my search, but am missing 1 option.
Using both MacosX and linux by the way.
What I'm trying to do is parse through my notes (all plain text .md files), searching for words/tags in a text file (called greplist.txt), and parsing matched lines in separate text files with the same name as the searchword/tag (eg #computer.md).
A selection of the contents of greplist.txt:
#home
#computer
#Next
#Waiting
example contents of 2 .md files:
school.md:
* find lost schoolbooks #home
* do homework #computer
fun.md:
* play videogame #computer
With this terminal command (which works great, but is not perfect yet):
$ cat greplist.txt | while read line; do grep -h "$line" *.md >> $line.md.tmp; mv $line.md.tmp $line.md; awk '!x[$0]++' < $line.md > $line.md.tmp && mv $line.md.tmp $line.md ;done
Results
The result for #computer.md :
* do homework #computer
* play videogame #computer
And #home.md would look like this
* find lost schoolbooks #home
So far so great! I'm already really, really happy with this. Especially since, with the added moving/renaming of the files, I can also add extra tasks/lines to the #tag .md files by hand and have them kept in the file without being overwritten the next time I run the command. Awesomecakes!
Now the only thing I miss is that, in the output of the #tag .md files, each task should also be followed by the source filename (without extension) in double brackets, so that nvalt can use it as an internal link.
So the desired output of example #computer.md would become:
* do homework #computer [[school]]
* play videogame #computer [[fun]]
I tried playing around with -l and -H in the grep command instead of -h, but the output just gets messy somehow. (I haven't even tried adding the brackets yet!)
Another thing I tried was this, but it doesn't seem to do anything. It does, however, illustrate what I'm trying to accomplish:
$ cat greplist.txt | while read line; do grep -h "$line" *.md | while read filename; do echo "$filename" >> $line.md.tmp; mv $line.md.tmp $line.md; awk '!x[$0]++' < $line.md > $line.md.tmp && mv $line.md.tmp $line.md ;done
So the million Zimbabwean dollar question is: how do I do this? I tried and tried, but this is above my skill level at the moment. Very eager to find out the solution!
Thanks in advance.
Daniel Dennis de Wit
The outlined solution seems like a fairly long-winded way to write the code. This script uses sed to write an awk script, and then runs awk so that it reads its program from standard input and applies it to all the '.md' files whose names don't start with a #.
sed 's!.*!/&/ { name=FILENAME; sub(/\\.md$/, "", name); printf "%s [[%s]]\\n", $0, name > "&.md" }!' greplist.txt |
awk -f - [!#]*.md
The version of awk on Mac OS X will read its program from standard input; so will GNU awk. So the technique used here, of writing the program to a pipe and reading the program from the pipe, works with those versions. If the worst comes to the worst, you'll have to save the output of sed into a temporary file, have awk read the program from the temporary file, and then remove the temporary file. It would be straightforward to replace the sed with awk, so you'd have one awk process writing an awk program and a second awk process executing the program.
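That temporary-file fallback might look like this (a sketch; prog.awk is just an illustrative name for the temporary file):
sed 's!.*!/&/ { name=FILENAME; sub(/\\.md$/, "", name); printf "%s [[%s]]\\n", $0, name > "&.md" }!' greplist.txt > prog.awk
awk -f prog.awk [!#]*.md
rm -f prog.awk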
The generated awk code looks like:
/#home/ { name=FILENAME; sub(/\.md$/, "", name); printf "%s [[%s]]\n", $0, name > "#home.md" }
/#computer/ { name=FILENAME; sub(/\.md$/, "", name); printf "%s [[%s]]\n", $0, name > "#computer.md" }
/#Next/ { name=FILENAME; sub(/\.md$/, "", name); printf "%s [[%s]]\n", $0, name > "#Next.md" }
/#Waiting/ { name=FILENAME; sub(/\.md$/, "", name); printf "%s [[%s]]\n", $0, name > "#Waiting.md" }
The use of ! in the sed script is simply the choice of a character that doesn't appear in the generated script. Determining the basename of the file on each line is not 'efficient'; if your files are big enough, you can add a line such as:
{ if (FILENAME != oldname) { name = FILENAME; sub(/\.md$/, "", name); oldname = FILENAME } }
to the start of the awk script (how many ways can you think of to do that?). You can then drop the per-line setting of name.
Do not attempt to run the program on the #topic.md files; it leads to confusion.
Try this one:
grep -f greplist.txt *.md | awk ' match($0, /(.*).md:(.*)(#.*)/, vars) { print vars[2], "[[" vars[1] "]]" >> vars[3]".md.out"} '
What it does:
grep outputs the lines of the .md files that match the patterns in greplist.txt, each prefixed with its filename:
fun.md:* play videogame #computer
school.md:* find lost schoolbooks #home
school.md:* do homework #computer
finally, awk moves the file name to the back in the format you want and appends each line to the corresponding #tag.md.out file:
* play videogame #computer [[fun]]
* find lost schoolbooks #home [[school]]
* do homework #computer [[school]]
I added the .out on the file name so that the next time you execute the command it will not include the #* files.
Note that I'm not sure if the awk script will work on the Mac OS X awk.
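If the three-argument match() turns out not to be available (it is a gawk extension), a portable sketch using sub() instead, assuming the same filename:line output from grep and the same .md.out naming, might look like:
grep -f greplist.txt *.md | awk '{
file = $0; sub(/\.md:.*/, "", file)        # filename without the .md suffix
line = $0; sub(/^[^:]*:/, "", line)        # the matched line itself
tag = line; sub(/.*#/, "#", tag); sub(/[ \t].*/, "", tag)   # the #tag on the line
print line " [[" file "]]" >> (tag ".md.out")
}'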

awk: setting environment variables directly from within an awk script

First post here, but I've been a lurker for ages. I have googled for ages, but can't find what I want (many ambiguous topic subjects which don't deliver what the topic suggests they do...). Not new to awk or scripting, just a little rusty :)
I'm trying to write an awk script which will set shell env values as it runs, for another bash script to pick up and use later on. I cannot simply use stdout from awk to report the value I want set (i.e. export whatever=`awk cmd here`), as that's already directed to a 'results file' which the awk script is creating (plus I have more than one variable to export in the final code anyway).
As an example test script, to demo my issue:
echo $MYSCRIPT_RESULT # returns nothing, not currently set
echo | awk -f scriptfile.awk # do whatever, setting MYSCRIPT_RESULT as we go
echo $MYSCRIPT_RESULT # desired: returns the env value set in scriptfile.awk
Within scriptfile.awk, I have tried (without success)
1) building and executing an ad hoc string directly:
{
cmdline="export MYSCRIPT_RESULT=1"
cmdline
}
2) using the system function:
{
cmdline="export MYSCRIPT_RESULT=1"
system(cmdline)
}
... but these do not work. I suspect that these two commands create a subshell within the shell awk is executing from and do what I ask (proven by touching files as a test), but once the cmd/system calls have completed, the subshell dies, unfortunately taking whatever I have set with it. So my env setting changes don't stick from the perspective of awk's caller.
So my question is: how do you actually set env variables within awk directly, so that a calling process can access these env values after awk execution has completed? Is it actually possible?
Other than the ad hoc/system ways above, which I have proven fail for me, I cannot see how this could be done (other than writing the values to a 'random' file somewhere to be picked up and read by the calling script, which imo is a little dirty anyway). Hence, help!
all ideas/suggestions/comments welcomed!
You cannot change the environment of your parent process. If
MYSCRIPT_RESULT=$(awk stuff)
is unacceptable, what you are asking cannot be done.
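If the objection is that stdout is already tied up writing the results file, note that awk can write that file itself via print > "file", leaving stdout free to carry the value (a sketch; results.txt and inputfile are assumed names):
MYSCRIPT_RESULT=$(awk '{ print > "results.txt" } END { print 1 }' inputfile)
echo "$MYSCRIPT_RESULT"   # prints: 1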
You can also use something like what is described in
Set variable in current shell from awk:
unset var
var=99
declare $( echo "foobar" | awk '/foo/ {tmp="17"} END {print "var="tmp}' )
echo "var=$var"
var=
The awk END clause is essential; otherwise, if there are no matches to the pattern, declare dumps the current environment to stdout and doesn't change the content of your variable.
Multiple values can be set by separating them with spaces.
declare a=1 b=2
echo -e "a=$a\nb=$b"
NOTE: declare is bash only, for other shells, use eval with the same syntax.
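For example, the eval form of the same trick (a sketch) is:
eval "$( echo "foobar" | awk '/foo/ {tmp="17"} END {print "var="tmp}' )"
echo "var=$var"   # var=17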
You can do this, but it's a bit of a kludge. Since awk does not allow redirection to a file descriptor, you can use a fifo or a regular file:
$ mkfifo fifo
$ echo MYSCRIPT_RESULT=1 | awk '{ print > "fifo" }' &
$ IFS== read var value < fifo
$ eval export $var=$value
It's not really necessary to split the var and value; you could just as easily have awk print the "export" and just eval the output directly.
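That simpler variant, evaluating awk's output directly, might look like this (a sketch reusing the same fifo):
$ echo MYSCRIPT_RESULT=1 | awk '{ print "export " $0 > "fifo" }' &
$ eval "$(cat fifo)"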
I found a good answer: encapsulate everything in a subshell!
The command declare works as below:
#Creates 3 variables
declare var1=1 var2=2 var3=3
ex1:
#Exactly the same as above
$(awk 'BEGIN{var="declare "}{var=var"var1=1 var2=2 var3=3"}END{print var}')
I found some really interesting uses for this technique. In the next example I have several partitions with labels. I create variables using the labels as variable names and the device names as variable values.
ex2:
#Partition data
lsblk -o NAME,LABEL
NAME LABEL
sda
├─sda1
├─sda2
├─sda5 System
├─sda6 Data
└─sda7 Arch
#Creates a subshell to execute the text
$(\
#Pipe lsblk to awk
lsblk -o NAME,LABEL | awk \
#Initiate the variable with the text for the declare command
'BEGIN{txt="declare "}'\
#Filters devices with labels Arch or Data
'/Data|Arch/'\
#Concatenate txt with itself plus text for the variables(name and value)
#substr eliminates the special characters before the device name
'{txt=txt$2"="substr($1,3)" "}'\
#AWK prints the text and the subshell execute as a command
'END{print txt}'\
)
The end result of this is 2 variables: Data with value sda6 and Arch with value sda7.
The same example in a single line:
$(lsblk -o NAME,LABEL | awk 'BEGIN{txt="declare "}/Data|Arch/{txt=txt$2"="substr($1,3)" "}END{print txt}')
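#Once the command has run, the variables are set in the current shell
echo "Arch is on $Arch, Data is on $Data"
#Prints: Arch is on sda7, Data is on sda6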

Read from more files using awk, how?

I would like to read several input files with awk. For every file in my folder starting with ftp_dst_, I want to run this little awk script:
for i in ftp_dst_*;
do
gawk -v a="$a" -v b="$b" -v fa="$fa" -v fb="$fb" -v max="$max" '
BEGIN{
FS=" ";
OFS="\t";
}
{
if ($8 == "nrecvdatabytes_")
{
b=a;
a=$1;
if (b!=0)
{
fa=a-b;
if (fa>max && fa!=0)
{
max=fa;
}
}
}
}
END{
print "lol";
#print flowid, max;
}
'./ftp_dst_*
done
So now ftp_dst_5, ftp_dst_6, and ftp_dst_7 are in the folder, so I should get 3 lines with lol on the command line. Of course this "print lol" is only a test; I want to get 3 values from the 3 files.
So how can I read from all these files using awk?
By using a glob in the argument, all the files are taken together as if they were one file. Without the shell for loop, you would get output one time. Since you have the for loop, you should be getting the output three times. Part of your problem may be that you need a space after the closing single quote, or you may need to change the argument to "$i", as Karl Nordström suggested, if you want each file to be considered separately.
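A sketch of the loop with that fix applied, condensing the program and passing "$i" (note the space after the closing quote):
for i in ftp_dst_*; do
  gawk '
  BEGIN { FS = " "; OFS = "\t" }
  $8 == "nrecvdatabytes_" {
    b = a
    a = $1
    if (b != 0) {
      fa = a - b
      if (fa > max && fa != 0) max = fa
    }
  }
  # one line per file: the file name and its largest gap
  END { print FILENAME, max }
  ' "$i"
done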

printing previous field in AWK

I think awk will be the solution to my problem. My tools are limited because I'm using busybox on ESXi 4.0u1. I have a log file from a VM backup program (ghettoVCB). I need to scan this file for the expression
"Failed to clone disk : There is not enough space on the file system for the selected operation"
In my file, this is around line 43. The previous field (in awk vocab) holds the VM name that I want to print to an output text file. In my example the VM name is TEST12-RH4-AtlassianTest.
awk 'RS=""
/There is not enough space/ {
print $17
} '
print $17 is hard-coded, and I don't want this. I want to find the field that is one less than the first field on the line returned by the regex above. Any suggestions are appreciated.
[Awk Input File]
Update (Optimized version)
awk 'NR==1{print $NF}' RS="Failed to clone" input-awk.txt
Proof of Concept
$ awk 'NR==1{print $NF}' RS="Failed to clone" input-awk.txt
TEST12-RH4-AtlassianTest
Update 2 (Uber optimized version)
Technically, the following would be the uber optimized version but it leaves too much chance for false hits on the record separator, although it works for your sample input.
awk 'NR<2{print $NF}' RS="Fa" input-awk.txt
Update 3 (Ultimate mega-kill optimized version)
I wouldn't use this in production code, but it just goes to show you there is always a way to make it simpler. If somebody can beat this for code golf purposes, I'd certainly like to see it!
awk '!a++,$0=$NF' RS="Fa" input-awk.txt
Original
Assuming your VM name is always the last field in the record you want to print, this works:
awk '/not enough space/{split(pre,a);print a[pNF]}{pre=$0;pNF=NF}' input-awk.txt
So couldn't you use something like this, saving the candidate field on every line and printing it when the pattern matches?
awk '{ if (/not enough space/) print foo; foo = $17 }' input-awk.txt