Loop through file to count field number - awk

I have a bash script to add users from a .txt file.
It is really simple:
name firstname uid gid
Space-separated values.
I want to check with awk that each row contains 4 fields. If yes, I want to return 1; if not, return 0.
file=/my/file.txt
awk=$(awk -F' ' '{res = (NF != 4) ? 0 : 1; print res}' "$file")
echo $awk
Right now, awk returns 1 for each row, but I want it to return 1 or 0 at the end, not for each line in the file.

On UNIX, a command returns 0 in case of success and non-zero in case of an error. So it makes more sense to return 0 when all records have 4 fields and 1 when not all records have 4 fields.
To achieve that, use exit:
awk 'NF!=4{exit 1}' file
FYI: awk will exit with 0 by default.
If you want to use it in a shell conditional:
#!/bin/bash
if ! awk 'NF!=4{exit 1}' file; then
    echo "file is invalid"
fi
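And if you still want the printed 1/0 from your original attempt (1 for a valid file, 0 otherwise), you can translate the exit status back:
awk 'NF!=4{exit 1}' file && echo 1 || echo 0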
PS: -F' ' in your example is superfluous because ' ' is the default field delimiter.

You can use:
awk 'res = NF!=4{exit} END{exit !res}' file
This will exit with 1 if all rows have 4 columns, otherwise it will exit with 0 (matching the 1-for-valid convention the question asked for).
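Explained:
$ awk 'res = NF!=4 { # the pattern is an assignment: res becomes 1 on a bad row,
                     # and its value also decides whether the action runs
         exit        # stop reading as soon as a bad row is seen
       }
       # all rows good: res stays 0, exit status 1; bad row seen: exit status 0
       END{ exit !res }' file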

Subtle changes to your script would do:
result=$(awk -F' ' 'BEGIN{flag=1}NF!=4{flag=0;exit}END{print flag}' "$file")
[ ${result:-0} -eq 0 ] && echo "Problematic entries found in file"
The approach:
Set the flag to 1, hoping that every record contains 4 fields.
Check whether the record actually contains 4 fields; if not, set the flag to zero and exit.
The exit skips the rest of the input and goes straight to the END rule.
Print the flag and store it in result.
Check result and proceed with the appropriate course of action.
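A quick demonstration on a made-up two-line file (the second line has only 3 fields, so the flag drops to 0):
$ printf 'a b c d\ne f g\n' > sample.txt
$ awk -F' ' 'BEGIN{flag=1}NF!=4{flag=0;exit}END{print flag}' sample.txt
0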

Related

Filter a file, keeping only lines with all 0

I need to extract from a file the rows that have "0" in all the data columns.
Example:
        seq_1  seq_2  seq_3
data_0      0      0      1
data_1      0      1      4
data_2      0      0      0
data_3      6      0      2
From the example, I need a new file with just the data_2 row, because all of its numbers are "0".
I tried using grep and awk but I don't know how to filter on columns $2 to $4 only.
$ awk 'FNR>1{for(i=2;i<=NF;i++)if($i!=0)next}1' file
Explained:
$ awk 'FNR>1 {                 # process all data records
           for(i=2;i<=NF;i++)  # loop over all data fields
               if($i!=0)       # once a non-0 field is found
                   next        # on to the next record
       }1' file                # output the header and all-0 records
Very poorly formatted output, as the sample data is in some kind of table format, which it probably is not IRL:
seq_1 seq_2 seq_3
data_2 0 0 0
With awk you can rely on field string representation:
$ awk 'NR>1 && $2$3$4=="000"' test.txt > result.txt
Using sed, find lines matching a pattern of one or more spaces followed by a 0, three times, and print the line if found.
sed -nr '/\s+0\s+0\s+0/'p file.txt > new_file.txt
Or with awk, if columns 2, 3 and 4 are equal to a 0, print the line.
awk '{if ($2=="0" && $3=="0" && $4=="0"){print $0}}' file.txt > new_file.txt
EDIT: I ran the time command on these a bunch of times and the awk version is generally faster. Could add up if you are searching a large file. Of course your mileage may vary!
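Note that the sed pattern above is unanchored, so a line such as x 0 0 0 5 would also match. A stricter variant that anchors the three zeros to the end of the line (a sketch, relying on the same GNU sed extensions as above):
sed -nr '/^\S+(\s+0){3}\s*$/p' file.txt > new_file.txt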

awk to get result of multiple lines in one sentence with if statement

I am new to awk and I was wondering if I could get a single result from an if operation in awk.
Example:
cat example.txt:
0
0
0
0
0
awk '{ if ($1==0) print "all zeros"; else print "there is 1"}' example.txt
result:
all zeros
all zeros
all zeros
all zeros
all zeros
I would like to get only one all zeros as the answer, or a TRUE. Is this a case where I should use an awk function to return something? Thanks
Try it this way (written and tested with the shown samples):
awk '$0==0{count++} END{if(count==FNR){print "TRUE"}}' Input_file
OR
awk '$0==0{count++} END{if(count==FNR){print "All lines are zeroes"}}' Input_file
OR, to print a message when one or more non-zero lines are found:
awk '$0==0{count++} END{if(count==FNR){print "TRUE"} else{print "There is non-zero line(s) found."}}' Input_file
Explanation: Adding detailed explanation for above.
awk '                 ##Starting awk program from here.
$0==0{                ##Checking condition if current line is zero then do following.
  count++             ##Increasing count with 1 here.
}
END{                  ##Starting END block of this program from here.
  if(count==FNR){     ##Checking condition if count equals total number of lines of file.
    print "TRUE"      ##If above condition is TRUE then print TRUE here.
  }
}
' Input_file          ##Mentioning Input_file name here.
Here is an alternative using gnu-awk and a regexp RS:
awk -v RS='^(0\r?\n)+$' '{print (NF ? "there is 1" : "all zeros")}' file
all zeros
If the whole file is a run of 0 lines, it is all consumed as the record separator and the one record left is empty (NF==0); otherwise the record keeps content and NF > 0.
I would do it the following way using GNU AWK. Let file.txt content be
0
0
0
0
0
then
awk '{nonzero = nonzero || $1!="0"}END{print nonzero?"has not zero":"all zeros"}' file.txt
output
all zeros
Explanation: I am using nonzero to store whether a non-zero value was already seen (value 1) or not (value 0). A variable that was never set has the value 0 in a numeric context in awk, so I do not need to declare nonzero=0 in a BEGIN section. I harness ||, the logical or, which might be described as follows:
if you did not see a non-zero earlier and do not see one now, there is no non-zero element so far (0 || 0 == 0)
if you did not see a non-zero earlier but do see one now, there is a non-zero element (0 || 1 == 1)
if you did see a non-zero earlier and do not see one now, there is still a non-zero element (1 || 0 == 1)
if you did see a non-zero earlier and do see one now, there is a non-zero element (1 || 1 == 1)
After processing all lines, in the END section I print either has not zero or all zeros, depending on the value of nonzero, harnessing the ternary operator.
(tested in gawk 4.2.1)
Another:
$ awk '$0{exit v=1}END{printf "All %szeroes\n",(v?"not ":"")}' file
Output with sample data:
All zeroes
Alternative output:
All not zeroes
Explained:
$ awk '
  $0 {           # if the record evaluates to non-zero
      exit v=1   # jump to END with "parameter" 1
  }              # no point continuing once a non-zero is seen
  END {
      printf "All %szeroes\n",(v?"not ":"")  # if "parameter" v was set, output "not"
  }' file
The condition examining $0 could of course be something more specific (like $0!="0") but it's sufficient for this purpose. exit v=1 sets the variable v to 1 but also exits the program once a non-zero value of $0 has been found, jumping to END where the value of v is examined. The program then finishes with exit code 1. If that is not acceptable, you need to exit from END explicitly with exit 0.
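If you want the message but always a 0 exit status, a minimal variant with that explicit exit added:
$ awk '$0{exit v=1}END{printf "All %szeroes\n",(v?"not ":""); exit 0}' file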

awk check for fields in CSV file meeting certain criteria

I'm trying to write a simple file sanity-check script. I have a directory with a dozen CSV files containing id,edname,firstname,lastname,suffix,email.
I'd like to write an awk script to check that the first field contains a number and is not empty, that fields 3, 4 and 6 are not empty, and that each line contains exactly 6 fields, no more, no less. If all of these conditions are true, nothing happens; if any of them fails, rename the file to .bad. Here is what I have done so far.
for f in *.csv; do
    awk -F, '{ exit (NF ==6 ? 0:1) }' "$f" && echo mv "$f" "${f}.bad"
done
The actual answer can be found in, e.g., section 6.3.2.2 Comparison Operators of the GNU Awk online documentation:
You can use
x != y True if x is not equal to y
to check that a field is not empty.
You can use
x ~ y True if the string x matches the regexp denoted by y
to check whether it matches a certain pattern.
Your awk script, extended accordingly:
{ exit (NF==6 && $1~/^[1-9][0-9]*$/ && $3!="" && $4!="" && $6!="") ? 0 : 1 }
A small demonstration:
$ cat >good.txt <<'EOF'
> 1,edname,firstname,lastname,suffix,email
> 2,edname,firstname,lastname,suffix,email
> EOF
$ cat >bad_nr_fields.txt <<'EOF'
> 1,edname,firstname,lastname,suffix
> EOF
$ cat >bad_id.txt <<'EOF'
> A,edname,firstname,lastname,suffix,email
> EOF
$ cat >bad_firstname.txt << 'EOF'
> 1,edname,,lastname,suffix,email
> EOF
$ for FILE in good.txt bad_nr_fields.txt bad_id.txt bad_firstname.txt; do
> echo $FILE":"
> if awk -F, '{ exit (NF==6 && $1~/^[1-9][0-9]*$/ && $3!="" && $4!="" && $6!="") ? 0 : 1 }' "$FILE"; then echo "good"
> else echo "bad"
> fi
> done
good.txt:
good
bad_nr_fields.txt:
bad
bad_id.txt:
bad
bad_firstname.txt:
bad
$
Of course, I don't know which specific syntax your id number has to match. In my case, I used an anchored pattern for decimal integers that may not start with '0' (this excludes the number '0' as well).
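One caveat: a bare { exit ... } rule runs on the first record and exits immediately, so only the first line of each file is actually validated (which happens to be enough for the single-line demo files above). A sketch that checks every line and performs the rename the question asked for; note that the original loop renamed on success, while the requirement is to rename on failure:
#!/bin/bash
# Rename every CSV with at least one invalid line to <name>.bad.
for f in *.csv; do
    if ! awk -F, 'NF!=6 || $1!~/^[1-9][0-9]*$/ || $3=="" || $4=="" || $6=="" { exit 1 }' "$f"
    then
        mv -- "$f" "${f}.bad"
    fi
done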

awk: change a column to the file name

I want to replace the 5th column in the file with the file name, using awk.
I tried this:
#!/bin/bash
For i in ls; do
awk '{$5 = "$i"; print}' $i > $i_edit
But I don't know why I cannot run it. Do you have any idea where my error is?
It doesn't like my first line.
My file is like this inside.
The name of my file is Balteni_SV_ed2_MT_2016_D_P10_G+C_-4040m.
Maybe I have to remove the first and last line? But I have a lot of files like this. I don't want to overwrite them, just edit them, and the separator is a space.
PROFILE Balteni_SV_ed2_M TYPE 3 unspecified m m
363923.46104 372500.00000 0 4040.000 Balteni_SV_ed2_MT_20 unspecified
363780.87963 372530.87963 0 4040.000 Balteni_SV_ed2_MT_20 unspecified
363750.00000 372535.75894 0 4040.000 Balteni_SV_ed2_MT_20 unspecified
EOD
I would like something like this:
PROFILE Balteni_SV_ed2_M TYPE 3 unspecified m m
363923.46104 372500.00000 0 4040.000 Balteni_SV_ed2_MT_2016_D_P10_G+C_-4040m unspecified
363780.87963 372530.87963 0 4040.000 Balteni_SV_ed2_MT_2016_D_P10_G+C_-4040m unspecified
363750.00000 372535.75894 0 4040.000 Balteni_SV_ed2_MT_2016_D_P10_G+C_-4040m unspecified
EOD
Here is another way:
for f in *; do awk '{$5=FILENAME}1' "$f" > "$f"_edited; done
To skip the first line add an NR>1 qualifier, and to skip the last line you can check the field count, e.g.
for f in *; do awk 'NR>1 && NF>4 {$5=FILENAME}1' "$f" > "$f"_edited; done
All you actually need is:
awk 'FNR==1{close(f); f=FILENAME"_edit"} {$5=FILENAME; print > f}' *
and with your input to not modify the first and last lines:
awk 'FNR==1{close(f); f=FILENAME"_edit"} FNR>1 && NF>1{$5=FILENAME} {print > f}' *

How to check that two strings of a file match the strings of another file using AWK?

I have 2 huge files and I need to count how many entries of file 1 exist in file 2.
File 1 contains two ids per line, source and destination, like below:
11111111111111|22222222222222
33333333333333|44444444444444
55555555555555|66666666666666
11111111111111|44444444444444
77777777777777|22222222222222
44444444444444|00000000000000
12121212121212|77777777777777
01010101010101|01230123012301
77777777777777|97697697697697
66666666666666|12121212121212
File 2 contains the valid id list, which will be used to filter file 1:
11111111111111
22222222222222
44444444444444
77777777777777
00000000000000
88888888888888
66666666666666
99999999999999
12121212121212
01010101010101
What I am struggling to achieve is a way to count how many lines of file 1 consist entirely of ids from file 2: a line is counted only when both numbers on it exist in file 2.
For example:
11111111111111|22222222222222 will be counted because both ids exist in file 2; the same goes for 77777777777777|22222222222222.
33333333333333|44444444444444 will not be counted because 33333333333333 does not exist in file 2; likewise 55555555555555|66666666666666, whose first id is missing from file 2.
So for the sample above the count should be 6, and printing that count is enough; neither file needs to be edited.
awk -F'|' 'FNR == NR { seen[$0] = 1; next }
seen[$1] && seen[$2] { ++count }
END { print count }' file2 file1
Explanation:
1) FNR == NR (the record number within the current file equals the overall record number) is only true for the first input file, which is file2 (the order of arguments matters!). Thus for every line of file2, we record the number in seen.
2) For the remaining lines (file1, given second on the command line), if the |-separated fields (-F'|') number 1 and 2 have both been seen (in file2), we increment count by one.
3) In the END block, print the count.
Caveat: every unique number in file2 is loaded into memory, but this also makes it fast, instead of having to read through file2 over and over again.
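Run against the sample data above, it should print the expected total:
$ awk -F'|' 'FNR == NR { seen[$0] = 1; next }
> seen[$1] && seen[$2] { ++count }
> END { print count }' file2 file1
6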
Don't know how to do it in awk but if you are open to a quick-and-dirty bash script that someone can help make efficient, you could try this:
searcher.sh
-------------
#!/bin/bash
file1="$1"
file2="$2"
# split by pipe
while IFS='|' read -ra line; do
    # look for the 1st item in file2; if found, look for the 2nd item too
    # (-x matches whole lines only, -F treats the id as a literal string)
    grep -qxF -- "${line[0]}" "$file2"
    if [ $? -eq 0 ]; then
        grep -qxF -- "${line[1]}" "$file2"
        if [ $? -eq 0 ]; then
            # print the line since both items were found in file2
            echo "${line[0]}|${line[1]}"
        fi
    fi
done < "$file1"
Usage
------
bash searcher.sh file1 file2
Result using your example
--------------------------
$ time bash searcher.sh file1 file2
11111111111111|22222222222222
11111111111111|44444444444444
77777777777777|22222222222222
44444444444444|00000000000000
12121212121212|77777777777777
66666666666666|12121212121212
real 0m1.453s
user 0m0.423s
sys 0m0.627s
That's really slow on my old PC: each line of file 1 launches up to two full grep scans of file 2, whereas the awk answer above reads each file just once.