AWK simple line replacement


How do I get AWK to correctly replace each value in a data file with the string "greater" (than the previous value) or "smaller" (than the previous value)?
I'm trying:
awk '{if ($1>prev); print ($1="greater"); prev="smaller"}' arraydatafile
arraydatafile file:
2
7
6
1
7
3
Desired output:
smaller
greater ##### because 7 is greater than previous which is 2 ..
smaller ##### because 6 is smaller than previous which is 7 ..
smaller ##### because 1 is smaller than previous which is 6 ..
greater ##### etc etc
smaller
I get garbled output instead.
Extremely grateful for your insight on this.
UPDATE:
Here is a new data file to clarify the task at hand:
arraydatafile:
2
7
6
6
1
7
3
desired output
smaller
greater
smaller
equal ##### I will remove it later; it would be nice if this could be done right from the script or, preferably, a one-liner
smaller
greater
smaller
How do I get a one-liner that does this in AWK? It just compares each value with the previous one and reports how they relate. I'll then delete the "equal" output lines with another one-liner rather than complicate this little script, to keep the current task simple.
The script replace.awk:
#!/bin/awk -f
NR>1 {
if($1==p){
# Skip identical lines
next
}
if($1>p){
print "smaller"
}else{
print "greater"
}
}
# Store previous value
{p=$1}
run on the arraydatafile
2
7
6
6
1
7
3
yields the following
root#debian:/home/user/Documents/# awk -f replace.awk arraydatafile
smaller
greater
greater
smaller
greater
How do I get the desired output instead?

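Like this (the two labels were swapped in your script; nothing else needs to change):
NR>1 {
    if($1==p){
        # Skip identical lines
        next
    }
    if($1>p){
        # Current value is larger than the previous one
        print "greater"
    }else{
        print "smaller"
    }
}
# Store previous value
{p=$1}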
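
If the first line and the "equal" lines should also be reported, as in the desired output of the question, the same idea could be sketched as follows. This is only a sketch: labelling the first line "smaller" is an assumption taken from the desired output (the first value has nothing to compare against), and the scratch file stands in for arraydatafile.

```shell
# Sample data from the question's update, written to a scratch file.
data=$(mktemp)
printf '%s\n' 2 7 6 6 1 7 3 > "$data"

# Assumption: the first line has no previous value and is labelled "smaller"
# only because the desired output in the question shows it that way.
out=$(awk '
NR == 1 { print "smaller" }
NR > 1  {
    if ($1 > p)       print "greater"
    else if ($1 < p)  print "smaller"
    else              print "equal"
}
{ p = $1 }   # remember the current value for the next line
' "$data")

printf '%s\n' "$out"
rm -f "$data"
```

The "equal" lines can then be dropped afterwards, or the else branch can simply be removed.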

Related

extract specific row with numbers over N

I have a dataframe like this
1 3 MAPQ=0;CT=3to5;SRMAPQ=60
2 34 MAPQ=60;CT=3to5;SRMAPQ=67
4 56 MAPQ=67;CT=3to5;SRMAPQ=50
5 7 MAPQ=44;CT=3to5;SRMAPQ=61
Using awk (or another tool),
I want to extract only the rows with SRMAPQ over 60.
This means the output is
2 34 MAPQ=60;CT=3to5;SRMAPQ=67
5 7 MAPQ=44;CT=3to5;SRMAPQ=61
Update: the SRMAPQ entry can be anywhere in the line, for example:
MAPQ=44;CT=3to5;SRMAPQ=61;DT=3to5

You don't have to extract the value out of SRMAPQ separately and do the comparison. If the format is fixed like above, just use = as the field separator and access the last field using $NF
awk -F= '$NF > 60' file
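Or if SRMAPQ can occur anywhere in the line (as updated in the comments), use a generic approach. Here v is reset on every line, so a line without SRMAPQ cannot reuse an old value, and + 0 makes the comparison numeric rather than string-based:
awk '{ v = 0 } match($0, /SRMAPQ=[0-9]+/){ l = length("SRMAPQ="); v = substr($0, RSTART+l, RLENGTH-l) + 0 } v > 60' file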

I would use GNU AWK in the following way. Let the content of file.txt be
1 3 MAPQ=0;CT=3to5;SRMAPQ=60
2 34 MAPQ=60;CT=3to5;SRMAPQ=67;SOMETHING=2
4 56 MAPQ=67;CT=3to5;SRMAPQ=50
5 7 MAPQ=44;CT=3to5;SRMAPQ=61
then
awk 'BEGIN{FS="SRMAPQ="}$2+0>60' file.txt
output
2 34 MAPQ=60;CT=3to5;SRMAPQ=67;SOMETHING=2
5 7 MAPQ=44;CT=3to5;SRMAPQ=61
Note: I added SOMETHING to test whether it works when SRMAPQ is not last. Explanation: I set FS to SRMAPQ=, so what is before it becomes the first field ($1) and what is after it becomes the second field ($2). In the 2nd line $2 is 67;SOMETHING=2; adding 0 makes AWK convert the longest prefix that constitutes a number, in this case 67, so the comparison is numeric. The other lines have just numbers. Disclaimer: this solution assumes that every field except the last has a trailing ;. If that does not hold, please test my solution fully before use.
(tested in gawk 4.2.1)
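
For comparison, a sketch that does not depend on where SRMAPQ sits could walk the ;-separated key=value pairs directly. It assumes the key=value list is always the last whitespace-separated field, as in the samples; the last input row (SRMAPQ=100) is a made-up line added only to show that the comparison is numeric.

```shell
# Sample rows from the answer above, plus one hypothetical row (SRMAPQ=100)
# added for illustration only.
data=$(mktemp)
cat > "$data" <<'EOF'
1 3 MAPQ=0;CT=3to5;SRMAPQ=60
2 34 MAPQ=60;CT=3to5;SRMAPQ=67;SOMETHING=2
4 56 MAPQ=67;CT=3to5;SRMAPQ=50
5 7 MAPQ=44;CT=3to5;SRMAPQ=61
6 8 MAPQ=10;CT=3to5;SRMAPQ=100
EOF

out=$(awk '
{
    n = split($NF, pair, ";")            # assumption: the list is the last field
    for (i = 1; i <= n; i++)
        if (pair[i] ~ /^SRMAPQ=/) {
            v = substr(pair[i], 8) + 0   # "+ 0" forces a numeric comparison
            if (v > 60) print
            break
        }
}' "$data")

printf '%s\n' "$out"
rm -f "$data"
```

Without the + 0, the result of substr is a string, and 100 would compare as smaller than 60.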

In a CSV file, subtotal 2 columns based on a third one, using AWK in KSH

Disclaimers:
1) English is my second language, so please forgive any grammatical horrors you may find. I am pretty confident you will be able to understand what I need despite these.
2) I have found several examples in this site that address questions/problems similar to mine, though I was unfortunately not able to figure out the modifications that would need to be introduced to fit my needs.
The "Problem":
I have a CSV file that looks like this:
c1,c2,c3,c4,c5,134.6,,c8,c9,SERVER1,c11
c1,c2,c3,c4,c5,0,,c8,c9,SERVER1,c11
c1,c2,c3,c4,c5,0.18,,c8,c9,SERVER2,c11
c1,c2,c3,c4,c5,0,,c8,c9,SERVER2,c11
c1,c2,c3,c4,c5,416.09,,c8,c9,SERVER3,c11
c1,c2,c3,c4,c5,0,,c8,c9,SERVER3,c11
c1,c2,c3,c4,c5,12.1,,c8,c9,SERVER3,c11
c1,c2,c3,c4,c5,480.64,,c8,c9,SERVER4,c11
c1,c2,c3,c4,c5,,83.65,c8,c9,SERVER5,c11
c1,c2,c3,c4,c5,,253.15,c8,c9,SERVER6,c11
c1,c2,c3,c4,c5,,18.84,c8,c9,SERVER7,c11
c1,c2,c3,c4,c5,,8.12,c8,c9,SERVER7,c11
c1,c2,c3,c4,c5,,22.45,c8,c9,SERVER7,c11
c1,c2,c3,c4,c5,,117.81,c8,c9,SERVER8,c11
c1,c2,c3,c4,c5,,96.34,c8,c9,SERVER9,c11
Complementary facts:
1) File has 11 columns.
2) The data in columns 1, 2, 3, 4, 5, 8, 9 and 11 is irrelevant in this case. In other words, I will only work with columns 6, 7 and 10.
3) Column 10 will be typically alphanumeric strings (server names), though it may contain also "-" and/or "_".
4) Columns 6 and 7 will have exclusively numbers, with up to two decimal places (A possible value is 0). Only one of the two will have data per line, never both.
What I need as an output:
- A single occurrence of every string in column 10 (as column 1), then the sum (subtotal) of its values in column 6 (as column 2) and last, the sum (subtotal) of its values in column 7 (as column 3).
- If the total for a field is "0" the field must be left empty, but still must exist (its respective comma has to be printed).
- **Note** that the strings in column 10 will be already alphabetically sorted, so there is no need to do that part of the processing with AWK.
Output sample, using the sample above as an input:
SERVER1,134.6,,
SERVER2,0.18,,
SERVER3,428.19,,
SERVER4,480.64,,
SERVER5,,83.65
SERVER6,,253.15
SERVER7,,26.96
I've already found within these pages not one, but two AWK one-liners that PARTIALLY accomplish what I need:
awk -F "," 'NR==1{last=$10; sum=0;}{if (last != $10) {print last "," sum; last=$10; sum=0;} sum += $6;}END{print last "," sum;}' inputfile
awk -F, '{a[$10]+=$6;}END{for(i in a)print i","a[i];}' inputfile
My "problems" in both cases are the same:
- Subtotals of 0 are printed.
- I can only handle the sum of one column at a time. Whenever I try to add the second one, I get either a syntax error or it does simply not print the third column at all.
Thanks in advance for your support people!
Regards,
Martín

something like this?
$ awk 'BEGIN{FS=OFS=","}
{s6[$10]+=$6; s7[$10]+=$7}
END{for(k in s6) print k,(s6[k]?s6[k]:""),(s7[k]?s7[k]:"")}' file | sort
SERVER1,134.6,
SERVER2,0.18,
SERVER3,428.19,
SERVER4,480.64,
SERVER5,,83.65
SERVER6,,253.15
SERVER7,,49.41
SERVER8,,117.81
SERVER9,,96.34
Note that your treatment of commas is not consistent: you're adding an extra one when the last field is empty (count the commas).

Your posted expected output doesn't seem to match your posted sample input so we're guessing but this might be what you're looking for:
$ cat tst.awk
BEGIN { FS=OFS="," }
$10 != prev {
    if (NR > 1) {
        print prev, sum6, sum7
    }
    sum6 = sum7 = ""
    prev = $10
}
$6 { sum6 += $6 }
$7 { sum7 += $7 }
END { print prev, sum6, sum7 }
$ awk -f tst.awk file
SERVER1,134.6,
SERVER2,0.18,
SERVER3,428.19,
SERVER4,480.64,
SERVER5,,83.65
SERVER6,,253.15
SERVER7,,49.41
SERVER8,,117.81
SERVER9,,96.34

Awk: Append output to new field in existing file

Is there a way to print the output of an awk script to an existing file as a new field every time?
Hi!
I'm very new at awk (so my terminology might not be correct, sorry about that!) and I'm trying to print the output of a script that will operate on several hundred files to the same file, in different fields.
For example, my data files have this structure:
#File1
1
Values, 2, Hanna
20
15
Values, 2, Josh
30
56
Values, 2, Anna
50
70
#File2
2
Values, 2, Hanna
45
60
Values, 2, Josh
98
63
Values, 2, Anna
10
56
I have several of these files, which are divided by numbered month, with the same names, but different values. I want files that are named by the name of the person, and the values in fields by month, like so:
#Hanna
20 45
15 60
#Josh
30 98
56 63
#Anna
50 10
70 56
In my script, I search for the word "values", and determine which records to print (based on the number after "value"). This works fine. Then I want to print these values. It works fine for one file, with the command:
print $0 > name # the variable name is set to $3 of the matching row
This creates three files correctly named "Hanna", "Josh" and "Anna", with their values. However, I would like to run the script for all my data files and append the values to a single "Hanna" file etc., each in a new field.
So what I'm looking for is something like print $0 > $month name, read as "print the record to the field corresponding to the month".
I have tried to find a solution, but most solutions either just paste temporary files together or append the values after the existing ones (so that they all are in field 1). I want to avoid the temporary files and have them in different fields (so that I get a kind of matrix-structure).
Thank you in advance!

Try the following. I have not checked all permutations and combinations and only considered the samples in your post. Also, the Josh column in your expected output does not look consistent (or please let us know if there are more conditions for it). Let me know how it goes.
awk 'FNR==NR{if($0 ~ /^Values/){Q=$NF;B[$NF]=$NF;i="";next};A[Q,++i]=$0;next} /^Values/{V=$NF;print "#"B[V];i="";next} B[V]{print A[V,++i],$0}' file1 file2
EDIT: Adding a non-one liner form of solution too.
awk 'FNR==NR{
    if($0 ~ /^Values/){
        Q=$NF;
        B[$NF]=$NF;
        i="";
        next
    };
    A[Q,++i]=$0;
    next
}
/^Values/{
    V=$NF;
    print "#"B[V];
    i="";
    next
}
B[V]{
    print A[V,++i],$0
}
' file1 file2
EDIT2: Adding explanation too now for same.
awk 'FNR==NR{ ###Checking condition FNR==NR, which is TRUE only while the first file, file1, is being read. FNR and NR both count input lines; the only difference is that FNR is RESET whenever the next Input_file starts, while NR keeps increasing until all the Input_files are read.
if($0 ~ /^Values/){ ###Checking here if any line starts from string Values if yes then perform following operations.
Q=$NF; ###Creating a variable named Q whose value is the last field of the line.
B[$NF]=$NF;###Creating an array named B whose index is $NF(last field of the line) and value is same too.
i=""; ###Making variable i value to NULL now.
next ###using next here, it is built-in keyword for awk and it will skip all further statements now.
};
A[Q,++i]=$0; ###Creating an array named A whose index is Q and variable i with increasing value with 1 to it, each time it comes on this statement.
next ###Using next will skip all further statements now.
}
/^Values/{ ###All statements from here will be executed when second file named file2 is being read. So I am checking here if a line starts from string Values then do following.
V=$NF; ###create variable V whose value is $NF of current line.
print "#"B[V]; ###printing the string # then value of array B whose index is variable V.
i=""; ###Nullifying the variable i value here.
next ###next will skip all the further statements now.
}
B[V]{ ###Checking here if array B with index V is having a value in it, then perform following on it too.
print A[V,++i],$0 ###printing the value of array A whose index is variable V and variable i increasing value with 1 and current line.
}
' file1 file2 ###Mentioning the Input_files here named file1 and file2.

AWK - find min value of each row with arbitrary size

I have a file with the lines as:
5 3 6 4 2 3 5
1 4 3 2 6 5 8
..
I want to get the min on each line, so for example with the input given above, I should get:
min of first line: 2
min of second line: 1
..
How can I use awk to do this for any arbitrary number of columns in each line?

If you don't mind the output using digits instead of words you can use this one liner:
$ awk '{m=$1;for(i=1;i<=NF;i++)if($i<m)m=$i;print "min of line",NR":",m}' file
min of line 1: 2
min of line 2: 1
If you really do want to count in ordinal numbers:
BEGIN {
split("first second third fourth",count," ")
}
{
min=$1
for(i=1;i<=NF;i++)
if($i<min)
min=$i
print "min of",count[NR],"line: \t",min
}
Save this to script.awk and run like:
$ awk -f script.awk file
min of first line: 2
min of second line: 1
Obviously this will only work for files with up to 4 lines, but you can extend the list of ordinal numbers to the maximum you think you will need. You should be able to find a list online pretty easily.

Your problem is pretty simple. For each line, keep a variable min and apply the usual C-like minimum search: set min to the first field, then compare it with each following field until you reach the last field of the line, replacing min whenever a smaller field is found. The number of fields in the line is available in the built-in variable NF, so it's just a matter of writing a for loop. Once the loop has finished for the line, min holds the minimum element and you can print it.
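
The description above could be sketched like this, using the two sample lines from the question written to a scratch file. The variable names min and i are illustrative.

```shell
# The two sample lines from the question.
data=$(mktemp)
printf '%s\n' '5 3 6 4 2 3 5' '1 4 3 2 6 5 8' > "$data"

out=$(awk '
{
    min = $1                      # start with the first field of this line
    for (i = 2; i <= NF; i++)     # NF is the number of fields on the line
        if ($i < min) min = $i
    print "min of line " NR ": " min
}' "$data")

printf '%s\n' "$out"
rm -f "$data"
```

Because min is reassigned at the start of every line, no initialisation in a BEGIN block is needed.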

Find values greater than or equal to 0.021 and set them to 0 using awk

I have a large file with three sections. Part of it only has 3 columns, then it moves to the image section and it has 5 columns, then the rotation section which has 7 columns (I'm interested in changing the rotation section only).
I'm trying to get awk to produce a file that changes all negative values to 0 and all positive numbers greater than 0.2 to 0.
The values that I'm concerned with are only in column 7 and must be in the line containing the word ROTATION.
Here is my attempt.
awk BEGIN '/ROTATION/ {if (function abs($7) > = 0.021) $7=0; print}' awktest.tlt > awktest1.tlt
I need awk to keep the rest of the data in there as well and not just produce the change from X to 0.
As far as executing it I am using awk -f fix.awk awktest.tlt

This will do it:
awk 'NF==7&&/ROTATION/{if($7<0||$7>0.2)$7=0}1' file
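
Spelled out over several lines, the same logic could look like the sketch below. The input rows are made up for illustration, since the question does not show the real file; only the rule (7 fields, line contains ROTATION, column 7 negative or above 0.2) comes from the question and the answer above.

```shell
# Hypothetical sample input; the real .tlt layout is not shown in the question.
data=$(mktemp)
cat > "$data" <<'EOF'
HEADER a b
IMAGE 1 2 3 4
ROTATION 1 2 3 4 5 -0.5
ROTATION 1 2 3 4 5 0.1
ROTATION 1 2 3 4 5 0.35
EOF

out=$(awk '
NF == 7 && /ROTATION/ {
    # Negative values and values above 0.2 in column 7 become 0.
    if ($7 < 0 || $7 > 0.2) $7 = 0
}
{ print }    # print every line, changed or not
' "$data")

printf '%s\n' "$out"
rm -f "$data"
```

Assigning to $7 makes awk rebuild the line with single spaces between fields, so any original column alignment on the changed lines is lost.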
