I would like to process several input files with awk: I want to run this little awk script on every file in my folder whose name starts with ftp_dst_.
for i in ftp_dst_*;
do
gawk -v a="$a" -v b="$b" -v fa="$fa" -v fb="$fb" -v max="$max" '
BEGIN{
FS=" ";
OFS="\t";
}
{
if ($8 == "nrecvdatabytes_")
{
b=a;
a=$1;
if (b!=0)
{
fa=a-b;
if (fa>max && fa!=0)
{
max=fa;
}
}
}
}
END{
print "lol";
#print flowid, max;
}
'./ftp_dst_*
done
Right now ftp_dst_5, ftp_dst_6 and ftp_dst_7 are in the folder, so I should get 3 lines with lol on the command line. Of course this "print lol" is only a test; what I really want is 3 values, one from each of the 3 files.
So how can I read from all these files using awk?
Because the argument is a glob, awk receives all the matching files in one invocation and treats them as if they were one file. Without the shell for loop, you would get the output once. Since you have the for loop, you should be getting the output three times, each time computed over all three files. Part of your problem may be that you need a space after the closing single quote: without it, ./ftp_dst_* becomes part of the same shell word as the script text. If you want each file to be considered separately, change the argument to "$i", as Karl Nordström suggested.
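To make the per-file variant concrete, here is a minimal sketch under made-up assumptions: the two sample files, their two-column layout and the marker word "mark" are invented for illustration (the real data tests $8 == "nrecvdatabytes_"). Each loop iteration passes "$i" to its own awk run, so END fires once per file and prints that file's largest gap.

```shell
#!/bin/sh
# Hypothetical sample data: timestamp in column 1, marker in column 2.
dir=$(mktemp -d)
printf '1 mark\n4 mark\n6 mark\n' > "$dir/ftp_dst_5"
printf '10 mark\n11 mark\n'       > "$dir/ftp_dst_6"

per_file=$(
  for i in "$dir"/ftp_dst_*; do
    # "$i" (not the glob) is the input, so each awk run sees exactly one file.
    awk '
      $2 == "mark" {
        if (seen && $1 - prev > max) max = $1 - prev   # largest gap so far
        prev = $1; seen = 1
      }
      END { print max + 0 }                            # one line per file
    ' "$i"
  done
)
printf '%s\n' "$per_file"
rm -rf "$dir"
```

A single awk invocation over the glob would also work if the per-file state were reset on FNR == 1, but the loop keeps the awk program closest to the original.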
Is it possible to make an awk program run on a file without giving it
the name of the file when you run the program?
I tried to do this, but for some reason it does not work. Can someone please explain
why it does not work and how I can make it work?
#!/bin/awk -f
BEGIN{
ARGV[1]="books"
}
{
split($0,a,":")
if(a[4]!="-"){
n[a[2]]=n[a[2]]+1
print $0","n[a[2]]
}
}
END{
for(x in n){
print n[x]
}
}
In this program I tried to make it run on a file named books, but it still does not work.
As well as adding an element to ARGV, you need to increment ARGC:
#!/bin/awk -f
BEGIN{
ARGV[1]="books"
++ARGC
}
From the GNU AWK manual:
Storing additional elements [in ARGV] and incrementing ARGC causes additional files to be read.
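As a quick check of that behaviour, the sketch below sets both ARGV[1] and ARGC in BEGIN and passes no file on the command line. The sample records in "books" are invented for illustration; only the file name comes from the question.

```shell
#!/bin/sh
# Hypothetical input: colon-separated records in a file called "books".
dir=$(mktemp -d)
printf 'id1:tolkien:hobbit:1937\nid2:tolkien:draft:-\nid3:adams:guide:1979\n' > "$dir/books"

count=$(
  cd "$dir" &&
  awk -F: '
    BEGIN { ARGV[1] = "books"; ARGC = 2 }   # add the file AND bump ARGC
    $4 != "-" { n++ }                       # count records with a real 4th field
    END { print n + 0 }
  '
)
printf '%s\n' "$count"
rm -rf "$dir"
```

Setting ARGC = 2 explicitly has the same effect as ++ARGC when the script is started with no arguments; without it, awk would wait for standard input instead of opening books.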
I have several .seq files containing text.
I want to get a single text file containing :
name_of_the_seq_file1
contents of file 1
name_of_the_seq_file2
contents of file 2
name_of_the_seq_file3
contents of file 3
...
All the files are on the same directory.
Is this possible with awk or something similar? Thanks!
If there can be empty files then you need:
with GNU awk:
awk 'BEGINFILE{print FILENAME}1' *.seq
with other awks (untested):
awk '
FNR==1 {
for (++ARGIND;ARGV[ARGIND]!=FILENAME;ARGIND++) {
print ARGV[ARGIND]
}
print FILENAME
}
{ print }
END {
for (++ARGIND;ARGIND in ARGV;ARGIND++) {
print ARGV[ARGIND]
}
}' *.seq
You can use the following command:
awk 'FNR==1{print FILENAME}1' *.seq
FNR is the record number (which is the line number by default) within the current input file. Each time awk starts to handle another file, FNR==1 is true for its first line, and in that case the current filename gets printed through {print FILENAME}.
The trailing 1 is an awk idiom. It always evaluates to true, which makes awk print all lines of input.
Note:
The above solution works only as long as you have no empty files in that folder. Check Ed Morton's great answer which points this out.
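The empty-file caveat is easy to reproduce. In the sketch below (file names and contents are made up), b.seq is empty, so the FNR==1 rule never fires for it and its name is missing from the awk output. The plain shell loop next to it is a swapped-in alternative, not something from the answers above; it prints every name because it does not depend on the file having a first line.

```shell
#!/bin/sh
# Hypothetical sample files; b.seq is deliberately empty.
dir=$(mktemp -d)
printf 'AAA\n' > "$dir/a.seq"
: > "$dir/b.seq"
printf 'CCC\n' > "$dir/c.seq"

# awk only sees a file name when the file yields at least one record.
with_awk=$(cd "$dir" && awk 'FNR == 1 { print FILENAME } 1' *.seq)

# A shell loop prints the name first, then the (possibly empty) contents.
with_loop=$(cd "$dir" && for f in *.seq; do printf '%s\n' "$f"; cat "$f"; done)

printf '%s\n---\n%s\n' "$with_awk" "$with_loop"
rm -rf "$dir"
```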
perl -lpe 'print $ARGV if $. == 1; close(ARGV) if eof' *.seq
$. is the line number
$ARGV is the name of the current file
close(ARGV) if eof resets the line number at the end of each file
Hi,
Thanks to a lot of searching on stackoverflow (great resource!) over the last couple of days I got most of this working, and I even solved a follow-up issue: the output lines were duplicated every time I ran the command, which an awk command that removes duplicate lines took care of.
I'm pretty far along in my search, but I'm missing one feature.
I'm using both Mac OS X and Linux, by the way.
What I'm trying to do is parse through my notes (all plain text .md files), search for the words/tags listed in a text file (called greplist.txt), and write the matching lines to separate text files named after the search word/tag (e.g. #computer.md).
A selection of the contents of greplist.txt:
#home
#computer
#Next
#Waiting
example contents of 2 .md files:
school.md:
* find lost schoolbooks #home
* do homework #computer
fun.md
* play videogame #computer
With this terminal command (that works great, but not perfect yet)
$ cat greplist.txt | while read line; do grep -h "$line" *.md >> $line.md.tmp; mv $line.md.tmp $line.md; awk '!x[$0]++' < $line.md > $line.md.tmp && mv $line.md.tmp $line.md ;done
Results
The result for #computer.md :
* do homework #computer
* play videogame #computer
And #home.md would look like this
* find lost schoolbooks #home
So far so great! I'm already really happy with this. Especially since I added the moving/renaming of the files, I can also add extra tasks/lines to the # tag .md files by hand, and they are kept instead of being overwritten the next time I run the command. Awesomecakes!
The only thing I'm missing is that each task in the # tag .md files should be followed by the name of the file it came from (without extension), in double brackets, so that nvalt can use it as an internal link.
So the desired output of example #computer.md would become:
* do homework #computer [[school]]
* play videogame #computer [[fun]]
I tried playing around with -l and -H in the grep command instead of -h, but the output just gets messy somehow. (I haven't even tried adding the brackets yet!)
Another thing I tried was this, but it doesn't seem to do anything. It probably does illustrate what I'm trying to accomplish, though.
$ cat greplist.txt | while read line; do grep -h "$line" *.md | while read filename; do echo "$filename" >> $line.md.tmp; mv $line.md.tmp $line.md; awk '!x[$0]++' < $line.md > $line.md.tmp && mv $line.md.tmp $line.md ;done
So the million Zimbabwean dollar question is: how do I do this? I tried and tried, but this is above my skill level at the moment. I'm very eager to find out the solution!
Thanks in advance.
Daniel Dennis de Wit
The outline solution seems like a fairly long-winded way to write the code. This script uses sed to write an awk script and then runs awk so that it reads its program from standard input and applies it to all the '.md' files that don't start with a #.
sed 's!.*!/&/ { name=FILENAME; sub(/\\.md$/, "", name); printf "%s [[%s]]\\n", $0, name > "&.md" }!' greplist.txt |
awk -f - [!#]*.md
The version of awk on Mac OS X will read its program from standard input; so will GNU awk. So, the technique it uses of writing the program on a pipe and reading the program from a pipe works with those versions. If the worst comes to the worst, you'll have to save the output of sed into a temporary file, have awk read the program from the temporary file, and then remove the temporary file. It would be straight-forward to replace the sed with awk, so you'd have one awk process writing an awk program and a second awk process executing the program.
The generated awk code looks like:
/#home/ { name=FILENAME; sub(/\.md$/, "", name); printf "%s [[%s]]\n", $0, name > "#home.md" }
/#computer/ { name=FILENAME; sub(/\.md$/, "", name); printf "%s [[%s]]\n", $0, name > "#computer.md" }
/#Next/ { name=FILENAME; sub(/\.md$/, "", name); printf "%s [[%s]]\n", $0, name > "#Next.md" }
/#Waiting/ { name=FILENAME; sub(/\.md$/, "", name); printf "%s [[%s]]\n", $0, name > "#Waiting.md" }
The use of ! in the sed script is simply the choice of a character that doesn't appear in the generated script. Determining the basename of the file on each line is not 'efficient'; if your files are big enough, you can add a line such as:
{ if (FILENAME != oldname) { name = FILENAME; sub(/\.md$/, "", name); oldname = FILENAME } }
to the start of the awk script (how many ways can you think of to do that?). You can then drop the per-line setting of name.
Do not attempt to run the program on the #topic.md files; it leads to confusion.
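For an awk that cannot read its program from standard input, the temporary-file fallback mentioned above can be sketched as follows. The sed command is the one from this answer; the sample notes are the ones from the question, and the scratch directory and the variable name prog are assumptions made for the demonstration.

```shell
#!/bin/sh
# Scratch directory with the sample notes from the question.
dir=$(mktemp -d)
result=$(
  cd "$dir" || exit 1
  printf '#home\n#computer\n' > greplist.txt
  printf '* find lost schoolbooks #home\n* do homework #computer\n' > school.md
  printf '* play videogame #computer\n' > fun.md

  # Write the generated awk program to a temporary file instead of a pipe.
  prog=$(mktemp)
  sed 's!.*!/&/ { name=FILENAME; sub(/\\.md$/, "", name); printf "%s [[%s]]\\n", $0, name > "&.md" }!' greplist.txt > "$prog"

  # Run it on every .md file that does not start with '#', then clean up.
  awk -f "$prog" [!#]*.md
  rm -f "$prog"

  cat '#computer.md'
)
printf '%s\n' "$result"
rm -rf "$dir"
```

Because the generated program uses > rather than >>, each tag file is rewritten from scratch on every run, which also avoids the duplicate lines the question had to remove afterwards.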
Try this one:
grep -f greplist.txt *.md | awk ' match($0, /(.*)\.md:(.*)(#.*)/, vars) { print vars[2] vars[3], "[[" vars[1] "]]" >> (vars[3] ".md.out") } '
What it does:
grep outputs the lines of the .md files that match the patterns in greplist.txt, each prefixed with the name of the file it came from:
fun.md:* play videogame #computer
school.md:* find lost schoolbooks #home
school.md:* do homework #computer
Finally, awk moves the file name to the end of the line in the format you want and appends each line to the corresponding #*.md.out file:
* play videogame #computer [[fun]]
* find lost schoolbooks #home [[school]]
* do homework #computer [[school]]
I added the .out on the file name so that the next time you execute the command it will not include the #* files.
Note that the three-argument form of match() is a GNU awk extension, so this will not work with the awk that ships with Mac OS X unless gawk is installed there.
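A variant that sticks to POSIX awk features could look like the sketch below. It is not the code from this answer: it replaces the three-argument match() with index(), substr() and the two-argument match(), and it assumes that file names contain no colon and that the tag is the last word on the line. The -H option (always print the file name) is supported by GNU and BSD grep but is not in POSIX.

```shell
#!/bin/sh
# Scratch directory with the sample notes from the question.
dir=$(mktemp -d)
result=$(
  cd "$dir" || exit 1
  printf '#home\n#computer\n' > greplist.txt
  printf '* find lost schoolbooks #home\n* do homework #computer\n' > school.md
  printf '* play videogame #computer\n' > fun.md

  grep -H -f greplist.txt [!#]*.md |
  awk '{
    i = index($0, ":")                    # grep prints "file:line"
    file = substr($0, 1, i - 1)
    line = substr($0, i + 1)
    sub(/\.md$/, "", file)                # strip the extension
    if (match(line, /#[^ ]+$/)) {         # assumes the tag ends the line
      tag = substr(line, RSTART, RLENGTH)
      print line " [[" file "]]" >> (tag ".md.out")
    }
  }'

  cat '#computer.md.out'
)
printf '%s\n' "$result"
rm -rf "$dir"
```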
awk 'BEGIN{OFS=","} FNR == 1
{if (NR > 1) {print fn,fnr,nl}
fn=FILENAME; fnr = 1; nl = 0}
{fnr = FNR}
/ERROR/ && FILENAME ~ /\.gz$/ {nl++}
{
cmd="gunzip -cd " FILENAME
cmd; close(cmd)
}
END {print fn,fnr,nl}
' /tmp/appscraps/* > /tmp/test.txt
The above scans all files in a given directory and prints the file name, the number of lines in each file and the number of lines containing 'ERROR'.
I'm now trying to make the script execute a command if any of the files it reads isn't a regular file, i.e. if the file is a gzip file, run a particular command.
Above is my attempt to include the gunzip command on my own. Unfortunately, it isn't working. Also, I cannot gunzip all the files in the directory beforehand, because not all files in the directory will be gzip files; some will be regular files.
So I need the script to treat any .gz file it finds differently, so that it can read it, count and print the number of lines in it, and the number of lines matching the supplied pattern (just as it would if the file had been a regular file).
Any help?
This part of your script makes no sense:
{if (NR > 1) {print fn,fnr,nl}
fn=FILENAME; fnr = 1; nl = 0}
{fnr = FNR}
/ERROR/ && FILENAME ~ /\.gz$/ {nl++}
Let me restructure it a bit and comment it so it's clearer what it does:
{ # for every line of every input file, do the following:
# If this is the 2nd or subsequent line, print the values of these variables:
if (NR > 1) {
print fn,fnr,nl
}
fn = FILENAME # set fn to FILENAME. Since this will occur for the first line of
# every file, this is the value fn will have when printed above,
# so why not just get rid of fn and print FILENAME?
fnr = 1 # set fnr to 1. This is immediately over-written below by
# setting it to FNR so this is pointless.
nl = 0
}
{ # for every line of every input file, also do the following
# (note the unnecessary "}" then "{" above):
fnr = FNR # set fnr to FNR. Since this will occur for the first line of
# every file, this is the value fnr will have when printed above,
# so why not just get rid of fnr and print FNR-1?
}
/ERROR/ && FILENAME ~ /\.gz$/ {
nl++ # increment the value of nl. Since nl is always set to zero above,
# this will only ever set it to 1, so why not just set it to 1?
# I suspect the real intent is to NOT set it to zero above.
}
You also have the code above testing for a file name that ends in ".gz" but then you're running gunzip on every file in the very next block.
Beyond that, just call gunzip from shell as everyone else also suggested. awk is a tool for parsing text, it's not an environment from which to call other tools - that's what a shell is for.
For example, assuming your comment (prints the file name, number of lines in each file and number of lines found containing 'ERROR') accurately describes what you want your awk script to do, and assuming it makes sense to test for the word "ERROR" directly in a ".gz" file using awk:
for file in /tmp/appscraps/*.gz
do
awk -v OFS=',' '/ERROR/{nl++} END{print FILENAME, NR+0, nl+0}' "$file"
gunzip -cd "$file"
done > /tmp/test.txt
Much clearer and simpler, isn't it?
If it doesn't make sense to test for the word ERROR directly in a ".gz" file, then you can do this instead:
for file in /tmp/appscraps/*.gz
do
zcat "$file" | awk -v file="$file" -v OFS=',' '/ERROR/{nl++} END{print file, NR+0, nl+0}'
gunzip -cd "$file"
done > /tmp/test.txt
To handle gz and non-gz files as you've now described in your comment below:
for file in /tmp/appscraps/*
do
case $file in
*.gz ) cmd="zcat" ;;
* ) cmd="cat" ;;
esac
"$cmd" "$file" |
awk -v file="$file" -v OFS=',' '/ERROR/{nl++} END{print file, NR+0, nl+0}'
done > /tmp/test.txt
I left out the gunzip since you don't need it as far as I can tell from your stated requirements. If I'm wrong, explain what you need it for.
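The mixed gz/non-gz loop can be tried end to end with the sketch below. The two sample files are invented, gzip -cd stands in for zcat (the two are equivalent with GNU gzip), and the report prints the base name rather than the full path so the result does not depend on the scratch directory.

```shell
#!/bin/sh
# Hypothetical sample data: one plain log and one gzipped log.
dir=$(mktemp -d)
printf 'ok\nERROR one\nERROR two\n' > "$dir/app.log"
printf 'ERROR x\nfine\n' | gzip -c > "$dir/old.log.gz"

report=$(
  for file in "$dir"/*; do
    # Pick the reader by extension, then feed the text to one awk program.
    case $file in
      *.gz) gzip -cd "$file" ;;
      *)    cat "$file" ;;
    esac |
    awk -v file="${file##*/}" -v OFS=',' '/ERROR/ { nl++ } END { print file, NR + 0, nl + 0 }'
  done
)
printf '%s\n' "$report"
rm -rf "$dir"
```

Each output line has the form name,lines,errors, matching the three values the original script was meant to print.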
I think it could be simpler than that.
With shell expansion you already have the file name (hence you can print it).
So you can do a loop over all the files, and for each do the following:
print the file name
zgrep -c ERROR "$file" (this outputs the number of lines containing 'ERROR')
zcat -f "$file" | wc -l (this outputs the number of lines)
zgrep works on both plain text files and gzipped ones; with GNU gzip, zcat needs -f to pass plain text files through unchanged.
Assuming you don't have any spaces in the paths/filenames:
for f in /tmp/appscraps/*
do
n_lines=$(zcat -f "$f"|wc -l)
n_errors=$(zgrep -c ERROR "$f")
echo "$f $n_lines $n_errors"
done
This is untested but it should work.
You can execute the following command for each file:
gunzip -t FILENAME; echo $?
It prints the exit code: 0 for a valid gzip file, or 1 for a corrupt file or a file that is not gzip at all. You can then test that exit code in an if statement to choose the required processing.
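A minimal sketch of that test inside a loop is shown below. The sample files are made up, gzip -t is used in place of gunzip -t (the same check with GNU gzip), and the file is fed on standard input so the result does not depend on its suffix.

```shell
#!/bin/sh
# Hypothetical sample data: one gzipped file and one plain text file.
dir=$(mktemp -d)
printf 'plain text\n' > "$dir/plain.log"
printf 'packed\n' | gzip -c > "$dir/packed.gz"

kinds=$(
  for f in "$dir"/*; do
    # gzip -t exits 0 only when the data is a valid gzip stream.
    if gzip -t < "$f" 2>/dev/null; then
      echo "${f##*/} gzip"
    else
      echo "${f##*/} other"
    fi
  done
)
printf '%s\n' "$kinds"
rm -rf "$dir"
```

Testing the exit status directly in if avoids the separate echo $? step, and it classifies files by content rather than by name.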
I have a script which reads every line of a file and prints output based on certain matches:
function tohyphen (o) {
split (o,a,"to[-_]")
split (a[2],b,"-")
if (b[1] ~ / /) { k=""; p=""; }
else { k=b[1]; p=b[2] }
if (p ~ / /) { p="" }
return k
}
print k, "is present in" , FILENAME
What I need to do is check whether the value of k is present in about 60 other files and print the names of the files that contain it, ignoring the file it was originally read from. I'm currently doing this with grep, but calling grep so many times drives the CPU usage up. Is there a way I can do this within the awk script itself?
You can try something like this with gnu awk.
gawk '/pattern to search/ { print FILENAME; nextfile }' *.files
You can replace your pipeline grep "$k" *.cfg | grep "something1" | grep "something2" | cut -d -f2,3,4 with the following single awk script:
awk -v k="$k" '$0~k&&/something1/&&/something2/{print $2,$3,$4}' *.cfg
You mention printing the filename in your question, in this case:
awk -v k="$k" '$0~k&&/something1/&&/something2/{print FILENAME;nextfile}' *.cfg
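Neither command above skips the file that k was originally read from. One way to add that is sketched below, with several assumptions: the sample .cfg files, the value of k and the variable name skip are invented, index() is used for a literal match instead of the regex match $0~k, and a seen[] array replaces nextfile so the program also runs on awks without it (at the cost of reading each file to the end).

```shell
#!/bin/sh
# Hypothetical sample data: k was extracted from source.cfg.
dir=$(mktemp -d)
printf 'route to-alpha-1\n' > "$dir/source.cfg"
printf 'uses alpha here\nalpha again\n' > "$dir/a.cfg"
printf 'nothing relevant\n' > "$dir/b.cfg"
k=alpha

hits=$(
  cd "$dir" &&
  awk -v k="$k" -v skip="source.cfg" '
    FILENAME == skip { next }                 # ignore the file k came from
    index($0, k) && !seen[FILENAME]++ {       # first literal match per file
      print k, "is present in", FILENAME
    }
  ' *.cfg
)
printf '%s\n' "$hits"
rm -rf "$dir"
```

A single awk process reads all the files once, which is the point of replacing the repeated grep calls.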