awk split returns index

I want to obtain an array named ids, containing all those values from a string passed as a variable:
426
425
422
415
405
397
349
310
254
167
0
I found this code, which should work:
awk -v branches="426;425;422;415;405;397;349;310;254;167;0" 'BEGIN { split( branches, ids, ";" ); for (id in ids){print id}}'
However, it gives me:
1
2
3
4
5
6
7
8
9
10
11
And if I move it out of the BEGIN block, it just sits there and outputs nothing...

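split does put the values into the array, but under the indices 1 to n, and for (id in ids) loops over the indices, not the values; print ids[id] inside the loop to get the values. If you want the values to be the keys of the array, you need one more step: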
$ awk -v branches="426;425;422;415;405;397;349;310;254;167;0" '
BEGIN {n=split(branches,idV,";");
for(i=1;i<=n;i++) ids[idV[i]];
for(id in ids) print id}'
0
167
254
310
349
397
405
415
422
425
426
Note that the values will not come out in insertion order: for (id in ids) visits the keys in an unspecified order. The array behaves more like a hash set than an ordered array.
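If you want the values themselves in their original order, loop over the numeric indices that split returns instead of using for-in. A minimal sketch, where the function name print_ids is only for illustration:

```shell
# Hypothetical helper: print each ;-separated value of $1 in its original order.
print_ids() {
  awk -v branches="$1" 'BEGIN {
    n = split(branches, ids, ";")   # ids[1..n] hold the values, n is their count
    for (i = 1; i <= n; i++)        # a numeric loop keeps the input order
      print ids[i]
  }'
}
```

For example, print_ids "426;425;422;415;405;397;349;310;254;167;0" prints 426 first and 0 last. No input file is read because the program only has a BEGIN block.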

Here is the perl version:
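echo "426;425;422;415;405;397;349;310;254;167;0" | perl -ne 'chomp; print join("\n", sort { $a <=> $b } split /;/), "\n"'
0
167
254
310
349
397
405
415
422
425
426
chomp removes the trailing newline from the input line; sort { $a <=> $b } sorts numerically (a plain sort compares the values as strings), and the final "\n" ends the output with a newline.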


How to insert two lines for every data frame using awk?

I have repeating data as follows:
....
4 4 4 66 79 169 150 0 40928 40938 40923 40921 40789 40000 40498
5 4 3 16 22 247 0 40168 40911 40944 40205 40000 40562
6 4 4 17 154 93 309 0 40930 40919 40903 40917 40852 40000 40419
7 3 2 233 311 0 40936 40932 40874 40000 40807
....
The data is made up of 115 blocks, and each block has 4000 lines in that format.
I want to put two new lines (the number of lines per block, 4000, and an empty line) at the beginning of each block, so that it looks like this:
4000
1 4 4 244 263 704 952 0 40936 40930 40934 40921 40820 40000 40570
2 4 4 215 172 305 33 0 40945 40942 40937 40580 40687 40000 40410
3 4 4 344 279 377 1945 0 40933 40915 40907 40921 40839 40000 40437
4 4 4 66 79 169 150 0 40928 40938 40923 40921 40789 40000 40498
...
3999 2 2 4079 4081 0 40873 40873 40746 40000 40634
4000 1 1 4080 0 40873 40923 40000 40345
4000
1 4 4 244 263 704 952 0 40936 40930 40934 40921 40820 40000 40570
2 4 4 215 172 305 33 0 40945 40942 40937 40580 40687 40000 40410
3 4 4 344 279 377 1945 0 40933 40915 40907 40921 40839 40000 40437
4 4 4 66 79 169 150 0 40928 40938 40923 40921 40789 40000 40498
...
Can I do this with awk or any other unix command?
My solution is more general, since the blocks can be of unequal length, as long as the counter in the first field restarts to mark the beginning of a new block.
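% cat mark_blocks
# a first field smaller than the last counter means a new block has started:
# print the header lines and the accumulated previous block
$1<count { print count; print "";
           for(i=1;i<=count;i++) print l[i]; }
# executed for each line: store it under its counter
{ l[$1] = $0; count=$1}
# at end of input, flush the last block
END { print count; print "";
      for(i=1;i<=count;i++) print l[i]; }
% awk -f mark_blocks your_data > marked_data
%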
The working is simple: awk accumulates the lines of a block in memory and prints the header lines and the accumulated data when it reaches a new block or EOF.
The (modest) trick is that the output rule must come before the rule executed for each line, so the previous block is printed before the first line of the new block overwrites l[1] and count.
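To see that blocks of unequal length are handled, the same rules can be run on a tiny input. This sketch inlines the mark_blocks program into a shell function (the function name is only for illustration) so that no separate file is needed:

```shell
# Same rules as the mark_blocks file above, wrapped in a shell function.
mark_blocks() {
  awk '
    # a first field smaller than the last counter: a new block has started,
    # so print the header lines and the stored previous block
    $1 < count {
      print count; print ""
      for (i = 1; i <= count; i++) print l[i]
    }
    # every line: store it under its counter
    { l[$1] = $0; count = $1 }
    # end of input: flush the last block
    END {
      print count; print ""
      for (i = 1; i <= count; i++) print l[i]
    }
  '
}
```

For example, printf '1 a\n2 b\n3 c\n1 d\n2 e\n' | mark_blocks prints a block headed by 3 followed by a block headed by 2, each header followed by an empty line.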
A simple awk one-liner does the job:
awk 'NR%4000==1{print "4000\n"} {print $0}' file
What it does:
print $0 prints every line.
NR%4000==1 is true for lines 1, 4001, 8001, ..., i.e. the first line of each block. For those lines it first prints "4000\n": the 4000, the embedded newline \n and the newline that print adds itself give the two new lines (4000 and an empty line).
NR is the number of records, which is effectively the number of lines read so far.
A simple test, inserting 4000 before every 5 lines:
seq 20 | awk 'NR%5==1{print "4000\n"} {print $0}'
Output:
4000

1
2
3
4
5
4000

6
7
8
9
10
4000

11
12
13
14
15
4000

16
17
18
19
20
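The block size can also be passed in with -v instead of being hard-coded. Testing (NR - 1) % n == 0 selects the same lines as NR%n==1 for n of 2 or more, and it also works for n = 1, where NR%1 is always 0. A sketch; mark_every is a hypothetical name:

```shell
# Hypothetical helper: print n and an empty line before every block of n lines.
mark_every() {
  awk -v n="$1" '
    (NR - 1) % n == 0 { print n; print "" }   # first line of a block: 1, n+1, 2n+1, ...
    { print }                                 # then the line itself
  '
}
```

Usage for the question's data would be mark_every 4000 < file.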
You can also do it all in bash:
cat "$FILE" | ( countmax=4000; count=$countmax; while IFS= read -r lin; do if [ "$count" -eq "$countmax" ]; then count=0; printf '%s\n\n' "$countmax"; fi; printf '%s\n' "$lin"; count=$((count+1)); done )
Here we assume the data is in the file named by $FILE. We read that file and pipe it into the little bash script inside the parentheses.
The script reads the lines one by one with the while read loop and increments the counter count for each line. At the start, and whenever count reaches countmax (set to 4000), it prints the two lines you asked for and resets count to 0.

join the contents of files into a new file

I have some text files as shown below. I would like to join the contents of these files into one.
file A
>AXC
145
146
147
>SDF
1
8
67
>FGH
file B
>AXC
>SDF
12
65
>FGH
123
156
190
Desired output
new file
>AXC
145
146
147
>SDF
1
8
67
12
65
>FGH
123
156
190
Your help would be appreciated!
awk '
/^>/ { key=$0; if (!seen[key]++) keys[++numKeys] = key; next }
{ vals[key] = vals[key] ORS $0 }
END{ for (keyNr=1;keyNr<=numKeys;keyNr++) {key = keys[keyNr]; print key vals[key]} }
' fileA fileB
>AXC
145
146
147
>SDF
1
8
67
12
65
>FGH
123
156
190
If you really want the leading white space added to the ">SDF" values from fileA, tell us why that's the case for that one but not ">AXC" so we can code an appropriate solution.
A bit shorter than Ed's answer:
awk '/^>/{a=$0;next}{x[a]=x[a]$0"\n"}END{for(i in x)printf"%s\n%s",i,x[i]}' fileA fileB
Blocks will be printed in an unspecified order.
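RS=">" separates records at each > character.
OFS="\n" puts each number on its own line when $0 is rebuilt after $1="".
a[i]=a[i] $0 appends the record's remaining fields to the array element keyed by the first field (the block name).
rt=RT saves the text that terminated the current record (RT is a gawk extension), so the > can be prepended to the key of the next record.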
$ awk 'BEGIN{ RS=">"; OFS="\n" }
{i=rt $1; $1=""; a[i]=a[i] $0; rt=RT; next}
END { for (i in a) {print i a[i] }}' d6 d5
>SDF
12
65
1
8
67
>FGH
123
156
190
>AXC
145
146
147
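Because RT is specific to gawk, in other awks rt stays empty and the keys lose their >. A sketch that should work in POSIX awk prepends the > literally instead and skips the empty record before the first >; merge_blocks is a hypothetical name, and the blocks still come out in an unspecified order:

```shell
# Hypothetical helper: merge ">NAME" blocks from the given files without RT.
merge_blocks() {
  awk '
    BEGIN { RS = ">"; OFS = "\n" }   # one record per block; rebuilt fields go on separate lines
    NF {                             # skip the empty record before the first >
      i = ">" $1                     # put back the > consumed by RS
      $1 = ""                        # drop the name; $0 becomes "\n" value "\n" value ...
      a[i] = a[i] $0
    }
    END { for (i in a) print i a[i] }
  ' "$@"
}
```

Usage: merge_blocks fileA fileB. A block that has no values in one file (like >AXC in file B) simply contributes nothing.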

compare between two columns and subtract them

I have one file:
344 0
465 1
729 2
777 3
676 4
862 5
766 0
937 1
980 2
837 3
936 5
I need to compare the rows in pairs by their column-two value (zero with zero, one with one, and so on). If the value exists twice (each value of column two should appear two times), subtract the column-one values: 766-344, 937-465, and so on. If it does not, like the fourth value, do nothing (4 appears only once, so nothing is done). The output:
422
472
251
060
074
I also need to add an index, for example:
1 422
2 472
3 251
4 060
5 074
Finally, I need to add this code as part of a Tcl script, or as a function of a Tcl program.
I have a Tcl script that contains awk code like this:
set awkCBR0 {
{
if ($1 == "r" && $6 == 280) {
print $2, i >> "cbr0.q";
i +=1 ;
}
}
}
exec rm -f cbr0.q
exec touch cbr0.q
exec awk $awkCBR0 cbr.trq
Thanks
Try this:
awk '($2 in a){printf "%d %03d\n",++x,$1-a[$2];next}{a[$2]=$1}' file
Output:
1 422
2 472
3 251
4 060
5 074
I'll leave adding it to the Tcl function to you.
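Testing whether the key exists with ($2 in a), rather than testing the stored value a[$2], matters when a first-column value is 0: a[$2] would then be false and the pair would be skipped. A sketch wrapping the one-liner in a hypothetical shell function pair_diffs, which reads the files given as arguments or standard input:

```shell
# Hypothetical wrapper: pair rows by column 2 and print "index difference"
# for the second row of each pair (differences zero-padded to 3 digits).
pair_diffs() {
  awk '
    ($2 in a) { printf "%d %03d\n", ++x, $1 - a[$2]; next }   # second row with this key
    { a[$2] = $1 }                                            # first row: remember column 1
  ' "$@"
}
```

With a pair such as "0 7" followed by "5 7", pair_diffs prints 1 005, whereas a test on a[$2] alone would print nothing.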

i need to match the values of columns [duplicate]

This question already has answers here:
how to match two lines and subtract them [closed]
(3 answers)
Match two columns and put them in one file
(2 answers)
Closed 8 years ago.
I have two files, each with two columns. I need to match each value of column two of file1 with the values of column two of file2; when they are equal, I need to subtract the matching column-one values from each other. After that, I need to combine column one from the two files into one file, with the two columns adjacent to each other.
If the two values do not match, do nothing.
file1
344 0
465 1
729 2
777 3
676 4
862 5
file2
766 0
937 1
980 2
237 3
736 5
Example output:
422
208
251
I don't understand how you come up with your given output. This might help you:
$ join -j 2 file1 file2
0 344 766
1 465 937
2 729 980
3 777 237
5 862 736
But where you go from there, you need to provide more details.
To expand a bit on glenn jackman's answer, perhaps this:
join -j 2 file1 file2 | awk '{if ($3>$2) print $3-$2; else print $2-$3}'
That will print the absolute difference between the two column-1 values whenever the column-2 values match. But it doesn't match your expected output, which, aside from the first value, doesn't seem to relate to your inputs in any way that matches your question...
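join needs both files sorted on the join field. As an alternative, the same absolute differences can be computed in awk alone by loading file1 into an array keyed on column 2. A sketch; abs_diffs is a hypothetical name, and the output follows the line order of file2:

```shell
# Hypothetical helper: absolute difference of column 1 for rows whose
# column 2 appears in both files; unmatched keys are skipped.
abs_diffs() {
  awk '
    NR == FNR { a[$2] = $1; next }   # first file: column 2 -> column 1
    ($2 in a) {                      # second file: only keys present in file1
      d = $1 - a[$2]
      print (d < 0 ? -d : d)         # absolute difference
    }
  ' "$1" "$2"
}
```

Usage: abs_diffs file1 file2.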
Maybe this can help:
awk '
BEGIN { printf "%s\t%s\t%s\n","File1","File2","Difference(f2-f1)" }
NR==FNR { a[$2]=$1; next }
($2 in a) { printf "%d\t%d\t%d\n",a[$2],$1,$1-a[$2] }' file1 file2
Output:
File1 File2 Difference(f2-f1)
344 766 422
465 937 472
729 980 251
777 237 -540
862 736 -126

Count and sum column list

Not 100% sure how to do this. What I have does not add up.
awk -F, '{array[$1]+=$2} END { for (i in array) {print i array[i] }}' gaaa
Here is an example of gaaa:
acic 4
acgic 56
acpdc 183
acic 1677
acpicvp
acsis 23
hidr 4
hidr 1133
aggr 24
Desired result would be:
acic 1681
acgic 56
acpdc 183
acpicvp
acsis 23
hidr 1137
aggr 24
You have set the field separator to a comma but there is no comma in your data. You want:
$ awk '{array[$1]+=$2}END{for (i in array) print i,array[i]}' gaaa
acsis 23
aggr 24
acgic 56
acpdc 183
hidr 1137
acpicvp 0
acic 1681
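for (i in array) prints the keys in an unspecified order, as the output above shows. If you want them in the order they first appear, as in your desired result, record that order separately. A sketch; sum_in_order is a hypothetical name, and a key with no second column (acpicvp) is still printed with 0:

```shell
# Hypothetical helper: sum column 2 per key in column 1, in first-seen order.
sum_in_order() {
  awk '
    !($1 in sum) { order[++n] = $1 }   # remember each new key once
    { sum[$1] += $2 }                  # a missing $2 adds 0
    END { for (i = 1; i <= n; i++) print order[i], sum[order[i]] }
  ' "$@"
}
```

Usage: sum_in_order gaaa. The first rule must come before the summing rule, because the summing rule is what creates the key in sum.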