My question
I have one file:
344 0
465 1
729 2
777 3
676 4
862 5
766 0
937 1
980 2
837 3
936 5
I need to compare each pair (zero with zero, one with one, and so on): if a value in column two exists twice, subtract the matching column-one values (766-344, 937-465, and so on); if it exists only once, like the fourth value (4 appears one time), do nothing. The output:
422
472
251
060
074
I also need to add an index, for example:
1 422
2 472
3 251
4 060
5 074
Finally, I need to add this as part of a Tcl script, or as a function of a Tcl program.
I have a Tcl script containing awk commands like this:
set awkCBR0 {
{
if ($1 == "r" && $6 == 280) {
print $2, i >> "cbr0.q";
i +=1 ;
}
}
}
exec rm -f cbr0.q
exec touch cbr0.q
exec awk $awkCBR0 cbr.trq
Thanks.
Try this:
awk 'a[$2]{printf "%d %03d\n",++x,$1-a[$2];next}{a[$2]=$1}' file
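Spelled out, the same program reads as below (note it keys on column two, and assumes the column-one values are non-zero, since a zero would make the a[$2] test fail):
awk '
a[$2] { printf "%d %03d\n", ++x, $1 - a[$2]; next }  # key seen before: print index and zero-padded difference
      { a[$2] = $1 }                                 # first sighting: remember column 1 for this key
' file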
Output:
$ awk 'a[$2]{printf "%d %03d\n",++x,$1-a[$2];next}{a[$2]=$1}' file
1 422
2 472
3 251
4 060
5 074
I will leave it to you to add it to your Tcl script.
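For example, following the exec pattern already in your script, a minimal sketch might look like this (pairs.out and file are placeholder names, so adjust them to your actual trace and output files):
set awkPairs {
    a[$2] { printf "%d %03d\n", ++x, $1 - a[$2]; next }
    { a[$2] = $1 }
}
exec rm -f pairs.out
exec awk $awkPairs file > pairs.out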
I want to obtain an array named ids containing all of the values from a string passed as a variable:
426
425
422
415
405
397
349
310
254
167
0
I thought this code would work:
awk -v branches="426;425;422;415;405;397;349;310;254;167;0" 'BEGIN { split( branches, ids, ";" ); for (id in ids){print id}}'
However, it gives me:
1
2
3
4
5
6
7
8
9
10
11
And if I take it out of the BEGIN block, it just stops there and outputs nothing...
If you want the values to be the keys of the array, you need one more step:
$ awk -v branches="426;425;422;415;405;397;349;310;254;167;0" '
BEGIN {n=split(branches,idV,";");
for(i=1;i<=n;i++) ids[idV[i]];
for(id in ids) print id}'
0
167
254
310
349
397
405
415
422
425
426
Note that for..in will not visit the keys in insertion order; this behaves more like a hash set than an array.
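If you also need the values in their original order, iterate over the split array by index instead of with for..in:
$ awk -v branches="426;425;422;415;405;397;349;310;254;167;0" '
BEGIN {n=split(branches,idV,";");
for(i=1;i<=n;i++) print idV[i]}'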
Here is the Perl version:
echo "426;425;422;415;405;397;349;310;254;167;0" | perl -ne ' chomp;print(join("\n",sort(split(/;/))))'
0
167
254
310
349
397
405
415
422
425
426
chomp removes the trailing newline from the input line.
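Note that Perl's sort defaults to string comparison, which happens to coincide with numeric order for these values; to sort numerically in general, supply a comparator:
echo "426;425;422;415;405;397;349;310;254;167;0" | perl -ne ' chomp;print(join("\n",sort { $a <=> $b } split(/;/)))'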
I have a long text file and I need to do computations on a table contained in it, so I am trying to restrict the area and print only the table I need. The area I care about looks like this:
Sums of squares of residuals for separate curves, including only individual weights
Curve No. of obs. Sum of squares
1 82 0.20971070
2 7200 13659.50038631
3 7443 15389.87972458
4 5843 10510.37305696
5 290 49918.40634886
6 1376 49974.57509390
7 694 8340.44771461
8 545 2476.43037281
9 349 1425.69687357
1111 1111 0101110 01110 11001 01111 11110 0 1 1 0.100D-02
UNWEIGHTED OBSERVATIONAL EQUATIONS
No. Curve Input Param. Correction Output Param. Standard Deviation
9 0 39.6398000000 0.0796573846 39.7194573846 0.6864389887
I tried this, but the whole file is printed:
/Curve/ { in_f_format=0; next }
/UNWEIGHTED/ { in_f_format=1; next }
{print}
Desired output:
1 82 0.20971070
2 7200 13659.50038631
3 7443 15389.87972458
4 5843 10510.37305696
5 290 49918.40634886
6 1376 49974.57509390
7 694 8340.44771461
8 545 2476.43037281
9 349 1425.69687357
Update: according to your desired output, you can use this:
awk '/Curve/ { in_f_format=1; next } /^[[:space:]]*$/ { in_f_format=0; next } in_f_format'
If you only want the content between the two patterns, changing your code to this would work:
/Curve/ { in_f_format=1; next }
/UNWEIGHTED/ { in_f_format=0; next }
in_f_format {print}
The expressions before the blocks are conditions (patterns); when a condition evaluates to true, the block after it is executed.
A block without a condition is executed for every line (unless skipped by next or similar).
Additionally, a condition without a block has an implied {print}, which is why the bare in_f_format works here.
For example, with a file containing the content you provided:
$ awk '/Curve/ { in_f_format=1; next } /UNWEIGHTED/ { in_f_format=0; next } in_f_format' file
1 82 0.20971070
2 7200 13659.50038631
3 7443 15389.87972458
4 5843 10510.37305696
5 290 49918.40634886
6 1376 49974.57509390
7 694 8340.44771461
8 545 2476.43037281
9 349 1425.69687357
1111 1111 0101110 01110 11001 01111 11110 0 1 1 0.100D-02
Another example, printing from the Curve title line up to (but not including) the first empty line:
$ awk '/Curve/ { in_f_format=1; } /^[[:space:]]*$/ { in_f_format=0; next } in_f_format' file
Curve No. of obs. Sum of squares
1 82 0.20971070
2 7200 13659.50038631
3 7443 15389.87972458
4 5843 10510.37305696
5 290 49918.40634886
6 1376 49974.57509390
7 694 8340.44771461
8 545 2476.43037281
9 349 1425.69687357
Unassigned variables default to 0 or the empty string, which evaluates to false.
The [[:space:]]* allows for lines that contain only whitespace characters; if you want a strictly empty line, use /^$/, where ^ matches the beginning of the line and $ matches the end.
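For example, a line containing a single space matches the whitespace version but not /^$/:
$ printf 'a\n \nb\n' | awk '/^$/{print NR": empty"} /^[[:space:]]*$/{print NR": blank"}'
2: blank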
I want to perform two different sort-and-count operations on a file, based on each line's content.
1. I need to take the first column of a .tsv file.
I would like to group the lines that start with three digits, keeping only the first three digits; for everything else, just sort and count occurrences of the whole value in the first column.
Sample data:
687/878 9
890987 4
01a 55
1b 8743917
890a 34
abcdee 987
dfeqfe fkdjald
890897 34213
6878853 834
32fasd 53891
abcdee 8794371
abd 873
Result:
687 2
890 3
01a 1
1b 1
32fasd 1
abd 1
dfeqfe 1
abcdee 2
I would also appreciate a solution that takes into account a sample input like:
687/878 9
890987 4
01a 55
1b 8743917
890a 34
abcdee 987
dfeqfe 545
890897 34213
6878853 834
(632)fasd 53891
(88)abcdee 8794371
abd 873
So the first column may contain all kinds of characters, like (, ), #, and '.
The output should have two columns: the first with the extracted values and the second with their counts, again preferably in TSV format.
So I need to extract all values that start with ^\d\d\d, then sort and count unique values by those first three digits; in a second pass, do the same for each line that does not start with three digits, but this time keep the whole column value and sort and count by it.
What I have tried: | sort | uniq -c | sort -nr for the lines that do start with ^\d\d\d, and the same for those that do not match the regex. But is there a more elegant way using either sed or awk?
$ cat tst.awk
BEGIN { FS=OFS="\t" }
{
    # group digit-prefixed keys by their first 3 characters,
    # count everything else by the whole first field
    cnt[/^[0-9]{3}/ ? substr($1,1,3) : $1]++
}
END {
    for (key in cnt) {
        # emit two helper sort keys first: a 0/1 flag (digit-prefixed
        # keys sort first) and the count, then the real output columns
        print (key !~ /^[0-9]{3}/), cnt[key], key, cnt[key]
    }
}
$ awk -f tst.awk file | sort -k1,2n | cut -f3-
687 1
890 2
abcdee 1
You can try Perl:
$ cat nefijaka.txt
687 878 9
890987 4
890a 34
abcdee 987
$ perl -lne ' /^(\d{3})|(\S+)/; $x=$1?$1:$2; $kv{$x}++; END { print "$_\t$kv{$_}" for (sort keys %kv) } ' nefijaka.txt
687 1
890 2
abcdee 1
$
You can pipe it to sort to get the values sorted:
$ perl -lne ' /^(\d{3})|(\S+)/; $x=$1?$1:$2; $kv{$x}++; END { print "$_\t$kv{$_}" for (sort keys %kv) } ' nefijaka.txt | sort -k2 -nr
890 2
abcdee 1
687 1
EDIT1:
$ cat nefijaka.txt2
687 878 9
890987 4
890a 34
abcdee 987
a word and then 23
$ perl -lne ' /^(\d{3})|(.+?\t)/; $x=$1?$1:$2; $x=~s/\t//g; $kv{$x}++; END { print "$_\t$kv{$_}" for (sort keys %kv) } ' nefijaka.txt2
687 1
890 2
a word and then 1
abcdee 1
$
I have repeating data as follows:
....
4 4 4 66 79 169 150 0 40928 40938 40923 40921 40789 40000 40498
5 4 3 16 22 247 0 40168 40911 40944 40205 40000 40562
6 4 4 17 154 93 309 0 40930 40919 40903 40917 40852 40000 40419
7 3 2 233 311 0 40936 40932 40874 40000 40807
....
This data is made up of 115 data blocks, and each data block has 4000 lines in that format.
Here, I want to put two new lines (the number of lines per data block, 4000, and an empty line) at the beginning of each data block, so it looks like:
4000
1 4 4 244 263 704 952 0 40936 40930 40934 40921 40820 40000 40570
2 4 4 215 172 305 33 0 40945 40942 40937 40580 40687 40000 40410
3 4 4 344 279 377 1945 0 40933 40915 40907 40921 40839 40000 40437
4 4 4 66 79 169 150 0 40928 40938 40923 40921 40789 40000 40498
...
3999 2 2 4079 4081 0 40873 40873 40746 40000 40634
4000 1 1 4080 0 40873 40923 40000 40345
4000
1 4 4 244 263 704 952 0 40936 40930 40934 40921 40820 40000 40570
2 4 4 215 172 305 33 0 40945 40942 40937 40580 40687 40000 40410
3 4 4 344 279 377 1945 0 40933 40915 40907 40921 40839 40000 40437
4 4 4 66 79 169 150 0 40928 40938 40923 40921 40789 40000 40498
...
Can I do this with awk or any other Unix command?
My solution is more general, since the blocks can be of unequal length, as long as the first-field counter restarts to mark the beginning of a new block:
% cat mark_blocks
# first field dropped: a new block starts, so flush the buffered one
$1 < count { print count; print "";
             for (i=1; i<=count; i++) print l[i]; }
# executed for each line: buffer it and track the running line count
{ l[$1] = $0; count = $1 }
# flush the final block at end of input
END { print count; print "";
      for (i=1; i<=count; i++) print l[i]; }
% awk -f mark_blocks your_data > marked_data
The way it works is simple: awk accumulates lines in memory and prints the header lines plus the accumulated data when it reaches a new block or EOF. The (modest) trick is that the output action must take place before the usual per-line bookkeeping.
A simple awk one-liner can do the job:
awk 'NR%4000==1{print "4000\n"} {print $0}' file
What it does:
print $0 prints every line.
NR%4000==1 is true on line 1 and then every 4000 lines after that (4001, 8001, ...), i.e., at the start of each block. When it is true, print "4000\n" outputs 4000 followed by an empty line, which gives the two new lines.
NR is the number of records, which is effectively the number of lines read so far.
A simple test, inserting the header every 5 lines:
awk 'NR%5==1{print "4000\n"} {print $0}'
Output:
4000

1
2
3
4
5
4000

6
7
8
9
10
4000

11
12
13
14
15
4000

16
17
18
19
20
You can do it all in bash:
cat "$FILE" | (
  let countmax=4000; let count=countmax
  while read lin; do
    if [ "$count" -eq "$countmax" ]; then let count=0; echo -e "$countmax\n"; fi
    echo "$lin"; let count=count+1
  done
)
Here we assume you are reading this data from $FILE. All we do is read from the file and pipe it into the little bash subshell.
The script reads lines one by one (with while read lin) and increments the counter count for each line. At the start, and whenever count reaches countmax (set to 4000), it prints the two lines you asked for.
I have two files, and in each file I have two columns. I need to match the first value of column two of file1 with each value from column two of file2; if they are equal, I need to subtract the matched column-one values from each other. After that, I need to combine column one from the two files into one file, with the two columns adjacent to each other.
If the two values do not match, do nothing.
file1
344 0
465 1
729 2
777 3
676 4
862 5
file2
766 0
937 1
980 2
237 3
736 5
Example:
422
208
251
I don't understand how you came up with your given output. This might help you:
$ join -j 2 file1 file2
0 344 766
1 465 937
2 729 980
3 777 237
5 862 736
But for where to go from there, you need to provide more details.
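One thing to keep in mind: join requires both inputs to be sorted on the join field. Your sample files already are, but in the general case sort them first:
sort -k2,2 file1 > file1.sorted
sort -k2,2 file2 > file2.sorted
join -j 2 file1.sorted file2.sorted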
To expand a bit on @glenn jackman's answer... perhaps this:
join -j 2 file1 file2 | awk '{if ($3>$2) print $3-$2; else print $2-$3}'
That will print the absolute difference between the two column-1 values when there is a match on column 2. But it doesn't match your expected output, which, aside from the first value, doesn't seem to have any obvious relation to your inputs or your question...
Maybe this can help:
awk '
BEGIN { printf "%s\t%s\t%s\n","File1","File2","Difference(f2-f1)" }
# NR==FNR is true only while reading the first file: store column 1, keyed by column 2
NR==FNR { a[$2]=$1; next }
# second file: print the stored value, the current value, and their difference
{ printf "%d\t%d\t%d\n",a[$2],$1,$1-a[$2] }' file1 file2
Output:
File1 File2 Difference(f2-f1)
344 766 422
465 937 472
729 980 251
777 237 -540
862 736 -126
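If a key can occur in file2 with no partner in file1 (your "do nothing" case), guard the lookup so unmatched keys are skipped; a small variation on the above:
awk '
NR==FNR { a[$2]=$1; next }   # remember file1 column 1, keyed by column 2
$2 in a { print $1-a[$2] }   # file2: print the difference only when the key matched
' file1 file2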