awk to add one to column based on condition - awk

Trying to add an awk condition that will add one to $2 if $3 - $2 is greater than one. The code below has a syntax error on the if line. Thank you :).
file
2 21002880 21002881
17 3476163 3476186
11 108292759 108292760
2 218661210 218661265
2 21002865 21002866
desired
2 21002880 21002881
17 3476164 3476186
11 108292759 108292760
2 218661211 218661265
2 21002865 21002866
awk
awk 'BEGIN {FS=OFS="\t"} {sum+=$3-$2} # define FS, OFS, and sum
if((sum > 1)) { # condition check
print $1,$2+1,$3 # print desired output
next # goto next line
}
}1' file.txt

Another short one:
awk -v OFS="\t" '{$2+=($3-$2>1)}1' file
Output:
2 21002880 21002881
17 3476164 3476186
11 108292759 108292760
2 218661211 218661265
2 21002865 21002866
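This works because a comparison such as ($3-$2>1) evaluates to 1 when true and 0 when false, so it can be added straight to $2. A longhand equivalent, shown just for illustration:
awk 'BEGIN{FS=OFS="\t"} {if ($3-$2 > 1) $2+=1} 1' file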

I missed a { and needed sum=$3-$2
awk 'BEGIN {FS=OFS="\t"}  # define FS and OFS
{sum=$3-$2}               # sum is the difference $3-$2
{
  if (sum > 1) {          # condition check
    print $1, $2+1, $3    # print desired output
    next                  # go to next line
  }
}1' file.txt

Related

Count rows and columns for multiple CSV files and make new file

I have multiple large comma separated CSV files in a directory. But, as a toy example:
one.csv has 3 rows, 2 columns
two.csv has 4 rows 5 columns
This is what the files look like -
# one.csv
a b
1 1 3
2 2 2
3 3 1
# two.csv
c d e f g
1 4 1 1 4 1
2 3 2 2 3 2
3 2 3 3 2 3
4 1 4 4 1 4
The goal is to make a new .txt or .csv that gives the rows and columns for each:
one 3 2
two 4 5
To get the rows and columns (and dump it into a file) for a single file
$ awk -F "," '{print NF}' *.csv | sort | uniq -c > dims.txt
But I'm not understanding the syntax to get counts for multiple files.
What I've tried
$ awk '{for (i=1; i<=2; i++) -F "," '{print NF}' *.csv$i | sort | uniq -c}'
With any awk, you could try the following program.
awk '
FNR==1{
  if(cols && rows){
    print file,rows,cols
  }
  rows=cols=file=""
  file=FILENAME
  sub(/\..*/,"",file)
  cols=NF
  next
}
{
  rows=(FNR-1)
}
END{
  if(cols && rows){
    print file,rows,cols
  }
}
' one.csv two.csv
Explanation: a detailed explanation of the above solution.
awk '                     ##Starting awk program from here.
FNR==1{                   ##Checking if this is the first line of the current file; if so, do the following.
  if(cols && rows){       ##Checking if cols AND rows are NOT NULL; if so, do the following.
    print file,rows,cols  ##Printing file, rows and cols variables here.
  }
  rows=cols=file=""       ##Nullifying rows, cols and file here.
  file=FILENAME           ##Setting FILENAME value to file here.
  sub(/\..*/,"",file)     ##Removing everything from the dot to the end of the value in file.
  cols=NF                 ##Setting NF value to cols here.
  next                    ##next will skip all further statements from here.
}
{
  rows=(FNR-1)            ##Setting FNR-1 value to rows here.
}
END{                      ##Starting END block of this program from here.
  if(cols && rows){       ##Checking if cols AND rows are NOT NULL; if so, do the following.
    print file,rows,cols  ##Printing file, rows and cols variables here.
  }
}
' one.csv two.csv         ##Mentioning Input_file names here.
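Run against the toy files above (whitespace-separated, as displayed), this should print:
one 3 2
two 4 5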
Using GNU awk you can do this in a single awk command:
awk -F, 'ENDFILE {
print gensub(/\.[^.]+$/, "", "1", FILENAME), FNR-1, NF-1
}' one.csv two.csv > dims.txt
cat dims.txt
one 3 2
two 4 5
You will need to iterate over all CSVs, printing the name and the dimensions for each file:
for i in *.csv; do awk -F "," 'END{print FILENAME, NR, NF}' "$i"; done > dims.txt
If you want to avoid awk, you can also do it with wc -l for lines and grep -o "CSV-separator" | wc -l for fields.
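A minimal sketch of that approach, assuming comma-separated files with no quoted fields containing commas (the field count is the separator count of the first line plus one; rows here includes the header line, so subtract one if needed):
for f in *.csv; do
  rows=$(wc -l < "$f")
  cols=$(( $(head -n 1 "$f" | grep -o ',' | wc -l) + 1 ))
  printf '%s %s %s\n' "${f%.csv}" "$rows" "$cols"
done > dims.txt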
I would harness GNU AWK's ENDFILE for this task as follows. Let the content of one.csv be
1,3
2,2
3,1
and two.csv be
4,1,1,4,1
3,2,2,3,2
2,3,3,2,3
1,4,4,1,4
then
awk 'BEGIN{FS=","}ENDFILE{print FILENAME, FNR, NF}' one.csv two.csv
output
one.csv 3 2
two.csv 4 5
Explanation: ENDFILE is executed after processing each file. I set FS to , assuming that fields are ,-separated and there is no , inside a field. FILENAME, FNR and NF are built-in AWK variables: FNR is the number of the current row in the file, i.e. in ENDFILE the number of the last row, and NF is the number of fields (again, of the last row). If you have files with headers, use FNR-1; if you have rows prepended with a row number, use NF-1.
edit: changed NR to FNR
Without GNU awk you can use the shell plus POSIX awk this way:
for fn in *.csv; do
  cols=$(awk '{print NF; exit}' "$fn")
  rows=$(awk 'END{print NR-1}' "$fn")
  printf "%s %s %s\n" "${fn%.csv}" "$rows" "$cols"
done
Prints:
one 3 2
two 4 5
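As a further sketch, the two awk calls per file could be folded into a single pass (same assumptions as above: a header row and whitespace-separated fields):
for fn in *.csv; do
  awk 'NR==1{cols=NF} END{name=FILENAME; sub(/\.csv$/,"",name); print name, NR-1, cols}' "$fn"
done > dims.txt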

selecting columns in awk discarding corresponding header

How do I properly select columns in awk after some processing? My file here:
cat foo
A;B;C
9;6;7
8;5;4
1;2;3
I want to add a first column with line numbers and then extract some columns of the result. For the example let's get the new first (line numbers) and third columns. This way:
awk -F';' 'FNR==1{print "linenumber;"$0;next} {print FNR-1,$1,$3}' foo
gives me this unexpected output:
linenumber;A;B;C
1 9 7
2 8 4
3 1 3
but expected is (note B is now the third column as we added linenumber as first):
linenumber;B
1;6
2;5
3;2
To get your expected output, use:
$ awk 'BEGIN {
  FS=OFS=";"
}
{
  print (FNR==1?"linenumber":FNR-1),$2
}' file
Output:
linenumber;B
1;6
2;5
3;2
To add a column with line number and extract first and last columns, use:
$ awk 'BEGIN {
  FS=OFS=";"
}
{
  print (FNR==1?"linenumber":FNR-1),$1,$NF
}' file
Output this time:
linenumber;A;C
1;9;7
2;8;4
3;1;3
Why do you print $0 (the complete record) in your header? And, if you want only two columns in your output, why do you print three (FNR-1, $1 and $3)? Finally, the reason why your output field separators are spaces instead of the expected ; is simply that you did not specify the output field separator (OFS). You can do this with a command line variable assignment (OFS=\;), as shown in the second and third versions below, but also with the -v option (-v OFS=\;) or in a BEGIN block (BEGIN {OFS=";"}), as you wish (there are differences between these 3 methods but they don't matter here).
[EDIT]: see a generic solution at the end.
If the field you want to keep is the second of the input file (the B column), try:
$ awk -F\; 'FNR==1 {print "linenumber;" $2; next} {print FNR-1 ";" $2}' foo
linenumber;B
1;6
2;5
3;2
or
$ awk -F\; 'FNR==1 {print "linenumber",$2; next} {print FNR-1,$2}' OFS=\; foo
linenumber;B
1;6
2;5
3;2
Note that, as long as you don't need to keep the first field of the input file ($1), you might as well overwrite it with the line number:
$ awk -F\; '{$1=FNR==1?"linenumber":FNR-1; print $1,$2}' OFS=\; foo
linenumber;B
1;6
2;5
3;2
Finally, here is a more generic solution to which you can pass the list of indexes of the columns of the input file you want to print (1 and 3 in this example):
$ awk -F\; -v cols='1;3' '
BEGIN { OFS = ";"; n = split(cols, c); }
{ printf("%s", FNR == 1 ? "linenumber" : FNR - 1);
  for(i = 1; i <= n; i++) printf("%s", OFS $(c[i]));
  printf("\n");
}' foo
linenumber;A;C
1;9;7
2;8;4
3;1;3
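For instance, running the same program with -v cols='2' should give exactly the output the question asked for:
linenumber;B
1;6
2;5
3;2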

Filter logs with awk for last 100 lines

I can filter the last 500 lines using tail and grep:
tail --line 500 my_log | grep "ERROR"
What is the equivalent command using awk? How can I add the number of lines to the command below?
awk '/ERROR/' my_log
awk doesn't know about the end of a file until it switches to reading the next file, but you can read the file twice: the first time to find the end, the second to treat the lines that are in scope. You could also keep the last X lines in a buffer, but that is a bit heavy on memory consumption and processing. Notice that the file needs to be mentioned twice at the end of the command for this to work.
awk 'FNR==NR{LL=NR-500;next};FNR>=LL && /ERROR/{ print FNR":"$0}' my_log my_log
With explanation:
awk '# first reading
FNR==NR{
  # the last line in scope is this minus 500
  LL=NR-500
  # go to next line (of this file)
  next
}
# on the second read (the section above filtered it out on the first),
# if the line number is at or after LL AND ERROR is in the line content, print it
FNR >= LL && /ERROR/ { print FNR ":" $0 }
' my_log my_log
On GNU sed:
sed '$-500,$ {/ERROR/ p}' my_log
As you had no sample data to test with, I'll show with just numbers using seq 1 10. This one stores the last n records and prints them out at the end:
$ seq 1 10 |
awk -v n=3 '{a[++c]=$0;delete a[c-n]}END{for(i=c-n+1;i<=c;i++)print a[i]}'
8
9
10
If you want to filter the data, add for example /ERROR/ before {a[++c]=$0; ...}. Note that this keeps the last n matching lines rather than the matches within the last n lines; see the sketch after the explanation below.
Explained:
awk -v n=3 '{ # set wanted amount of records
a[++c]=$0 # hash to a
delete a[c-n] # delete the ones outside of the window
}
END { # in the end
for(i=c-n+1;i<=c;i++) # in order
print a[i] # output records
}'
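To mirror tail --line 500 my_log | grep "ERROR" exactly (take the window first, then filter), a minimal sketch that moves the match test into the END loop instead:
awk -v n=500 '{a[++c]=$0; delete a[c-n]}
END{for(i=(c>n?c-n+1:1); i<=c; i++) if(a[i] ~ /ERROR/) print a[i]}' my_log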
Could you please try the following:
tac Input_file | awk 'FNR<=100 && /ERROR/' | tac
In case you want to add the line number to each line in the awk command, then try the following:
awk '/ERROR/{print FNR,$0}' Input_file

awk to move the last line of a file above the previous line

In the awk below I am trying to move only the last line to the position above it. The problem is that since my input file varies (it does not always have 4 lines like below), I cannot use i=3 every time and cannot seem to fix it. Thank you :).
file
this is line 1
this is line 2
this is line 3
this is line 4
desired output
this is line 1
this is line 2
this is line 4
this is line 3
awk (seems like the last line is being moved, but to line 2)
awk '
{lines[NR]=$0}
END{
  print lines[1], lines[NR];
  for (i=3; i<NR; i++) {print lines[i]}
}
' OFS=$'\n' file
this is line 1
this is line 4
this is line 3
$ seq 4 | awk 'NR>2{print p2} {p2=p1; p1=$0} END{print p1 ORS p2}'
1
2
4
3
$ seq 7 | awk 'NR>2{print p2} {p2=p1; p1=$0} END{print p1 ORS p2}'
1
2
3
4
5
7
6
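In these one-liners p1 always holds the previous line and p2 the one before that, so printing lags two lines behind the input and the END block can emit the final two lines in swapped order.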
Try the following awk once:
awk '{a[FNR]=$0} END{for(i=1;i<=FNR-2;i++){print a[i]};print a[FNR] ORS a[FNR-1]}' Input_file
Explanation: Creating an array named a with index FNR (the current line number) and the current line as its value. Then, in the END section of awk, starting a for loop from i=1 to i<=FNR-2. Why till FNR-2? Because you need to swap only the last 2 lines here. Once it has printed all those lines, it simply prints a[FNR] (which is the last line) and then a[FNR-1], joined with ORS (to print a newline between them).
2nd solution: counting the number of lines in the Input_file and putting it into an awk variable.
awk -v lines="$(wc -l < Input_file)" 'FNR==(lines-1){val=$0;next} FNR==lines{print $0 ORS val;next} 1' Input_file
You nearly had it. You just have to change the order.
awk '
{lines[NR]=$0}
END{
  for (i=1; i<NR-1; i++) {print lines[i]}
  print lines[NR];
  print lines[NR-1];
}
' OFS=$'\n' file
I'd reverse the file, swap the first two lines, then re-reverse the file
tac file | awk 'NR==1 {getline line2; print line2} 1' | tac

How to subtract a constant number from a column

Is there a way to subtract the smallest value from all the values of a column? I need to subtract the first number in the 1st column from all other numbers in that column.
I wrote this script, but it's not giving the right result:
$ awk '{$1 = $1 - 1280449530}' file
file
1280449530 452
1280449531 2434
1280449531 2681
1280449531 2946
1280449531 1626
1280449532 3217
1280449532 4764
1280449532 4501
1280449532 3372
1280449533 4129
1280449533 6937
1280449533 6423
1280449533 4818
1280449534 4850
1280449534 8980
1280449534 8078
1280449534 6788
1280449535 5587
1280449535 10879
1280449535 9920
1280449535 8146
1280449536 6324
1280449536 12860
1280449536 11612
What you have essentially works, you're just not outputting it. This will output what you want:
awk '{print ($1 - 1280449530) " " $2}' file
You can also be slightly cleverer and not hardcode the shift amount:
awk '{
  if(NR == 1) {
    shift = $1
  }
  print ($1 - shift) " " $2
}' file
You were on the right track:
awk '{$1 = $1 - 1280449530; print}' file
Here is a simplified version of Michael's second example:
awk 'NR == 1 {origin = $1} {$1 = $1 - origin; print}' file
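If the file is not sorted, so the smallest value is not necessarily on the first line, a two-pass sketch can find the true minimum first (note the file is named twice):
awk 'NR==FNR{if(FNR==1 || $1<min) min=$1; next} {$1=$1-min; print}' file file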
bash shell script:
#!/bin/bash
exec 4<"file"
read col1 col2 <&4
while read -r n1 n2 <&4
do
  echo $((n1 - col1))
  # echo "scale=2;$n1 - $col1" | bc # dealing with decimals..
done
exec 4<&-
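Note that this script prints only the adjusted first column and consumes the first row purely as the baseline, so that row itself is not echoed; to keep the second column too, use echo "$((n1 - col1)) $n2" instead.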
In vim you can select the column with Ctrl-V (blockwise visual mode), go to the bottom of the file with G, then press e to extend the selection to the end of the numbers. You may then type a count such as 56 followed by Ctrl-A; this will add 56 to every number in the column (Ctrl-X would subtract it instead).