Need help editing an awk script to print the minimum within a range - awk

Thanks to Origineil for his help in modifying the awk script. The script works perfectly when the interval is a whole number, but if I use an interval of less than one, such as 0.2, it gives the wrong output. I have a file "sss" containing this data:
H34 5.0856 5.45563
H39 5.0857 5.45573
H26 6.4822 6.81033
H30 6.4822 6.81033
H32 6.4823 6.81043
H40 6.4824 6.81053
H33 7.6729 7.96531
H27 7.673 7.96541
H31 7.6731 7.96551
H38 7.6731 7.96551
H29 8.5384 8.80485
H28 8.5387 8.80514
H35 8.5387 8.80514
H37 8.5387 8.80514
H41 9.9078 10.1332
H36 9.9087 10.134
If I then run the awk command
awk '!e{e=$2+0.2;} $2-e>0{print "Range " ++i , c " entries. min: " min " max: " max ; e+=0.2; c=0; min=""} {if(!min)min=$2; c++; max=$2} END{print "Range " ++i , c " entries. min: " min " max: " max} ' sss
It splits values into separate ranges even though the difference between them is less than the 0.2 specified in the script:
Range 1 2 entries. min: 5.0856 max: 5.0857
Range 2 1 entries. min: 6.4822 max: 6.4822
Range 3 1 entries. min: 6.4822 max: 6.4822
Range 4 1 entries. min: 6.4823 max: 6.4823
Range 5 1 entries. min: 6.4824 max: 6.4824
Range 6 1 entries. min: 7.6729 max: 7.6729
Range 7 1 entries. min: 7.673 max: 7.673
Range 8 1 entries. min: 7.6731 max: 7.6731
Range 9 1 entries. min: 7.6731 max: 7.6731
Range 10 1 entries. min: 8.5384 max: 8.5384
Range 11 1 entries. min: 8.5387 max: 8.5387
Range 12 1 entries. min: 8.5387 max: 8.5387
Range 13 1 entries. min: 8.5387 max: 8.5387
Range 14 1 entries. min: 9.9078 max: 9.9078
Range 15 1 entries. min: 9.9087 max: 9.9087
Can somebody help me out on this?
Thanks in advance.

By minimum and maximum I assume you mean the first and last entry seen within the range.
For the provided input I changed 18 to 17 so that not all "max" values were also the upper bounds of the range.
Script:
awk '!e{e=$1+4;} $1-e>0{print "Range " ++i , c " entries. min: " min " max: " max ; e+=4; c=0; min=""} {if(!min)min=$1; c++; max=$1} END{print "Range " ++i , c " entries. min: " min " max: " max} ' file
I introduced two variables min and max to track the entries.
Output:
Range 1 4 entries. min: 2 max: 6
Range 2 3 entries. min: 7 max: 10
Range 3 3 entries. min: 12 max: 14
Range 4 2 entries. min: 16 max: 17
Range 5 1 entries. min: 19 max: 19
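The single-entry ranges in the question appear because e only advances by one interval per boundary crossing, so after a gap wider than the interval every following line starts a new range. Below is a minimal sketch of one way to adapt the script for fractional intervals, assuming each range should cover the values within 0.2 of the first value seen after a gap (the w variable and the comparison against "" are my additions, not part of the original answer):
awk -v w=0.2 '
  !e { e = $2 + w }                              # first line: set the end of the first range
  $2 - e > 0 {                                   # value falls past the current range end
      print "Range " ++i, c " entries. min: " min " max: " max
      e = $2 + w                                 # re-anchor the next range at this value
      c = 0; min = ""
  }
  { if (min == "") min = $2; c++; max = $2 }     # compare against "" so a value of 0 still counts
  END { print "Range " ++i, c " entries. min: " min " max: " max }
' sss
On the sample data this should group the 16 lines into five ranges of 2, 4, 4, 4 and 2 entries. If the ranges should instead stay anchored to the very first value and simply skip over empty bins, replace e = $2 + w with while ($2 - e > 0) e += w.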

Related

Fetching millions of records is going too slow

I'm trying to copy 60 million records into another table with a clustered index, using OFFSET/FETCH, but after 20 million records it gets too slow. I don't know what I should do. Can anybody help me, please? Here is my time log:
1000000 Million Min: 1
2000000 Million Min: 0
3000000 Million Min: 0
4000000 Million Min: 2
5000000 Million Min: 2
6000000 Million Min: 1
7000000 Million Min: 0
8000000 Million Min: 1
9000000 Million Min: 0
10000000 Million Min: 1
11000000 Million Min: 0
12000000 Million Min: 1
13000000 Million Min: 1
14000000 Million Min: 0
15000000 Million Min: 1
16000000 Million Min: 1
17000000 Million Min: 1
18000000 Million Min: 0
19000000 Million Min: 1
20000000 Million Min: 3
21000000 Million Min: 3
22000000 Million Min: 4
23000000 Million Min: 5
24000000 Million Min: 4
25000000 Million Min: 4
26000000 Million Min: 4
27000000 Million Min: 4
28000000 Million Min: 5
29000000 Million Min: 5
30000000 Million Min: 5
31000000 Million Min: 6
32000000 Million Min: 7
33000000 Million Min: 7
34000000 Million Min: 8
35000000 Million Min: 8
36000000 Million Min: 9
37000000 Million Min: 8
38000000 Million Min: 10
39000000 Million Min: 10
40000000 Million Min: 11
41000000 Million Min: 11
42000000 Million Min: 11
43000000 Million Min: 12
44000000 Million Min: 11
45000000 Million Min: 12
46000000 Million Min: 12
47000000 Million Min: 14
48000000 Million Min: 13
49000000 Million Min: 13
50000000 Million Min: 14
51000000 Million Min: 15
52000000 Million Min: 14
53000000 Million Min: 16
54000000 Million Min: 18
55000000 Million Min: 18
56000000 Million Min: 20
57000000 Million Min: 19
58000000 Million Min: 21
59000000 Million Min: 19
declare
    @RecNo int
    , @RecCount int
    , @RecordST nvarchar(max)
    , @str_date datetime
    , @end_date datetime;

Set @RecNo = 0
select @RecCount = 1000000

While 1 = 1
Begin
    set @str_date = getdate();
    Insert Into dbo.test2(
        ap_id
        ,lipt_id
        ,li_cntr
    )
    select
        ap_id
        ,lipt_id
        ,li_cntr
    from dbo.test
    order by ap_id, lipt_id, li_cntr
    offset @RecNo rows
    fetch next @RecCount rows only;

    if @@ROWCOUNT = 0
        break

    Set @RecNo += 1000000;
    set @end_date = GETDATE();
    set @RecordST = cast(@RecNo as nvarchar(max)) + ' Million Min: ' + cast(DATEDIFF(MINUTE, @str_date, @end_date) as nvarchar(max))
    RAISERROR(@RecordST, 0, 0) WITH NOWAIT
end
First of all, you need to drop all constraints (unique keys, the primary key, and so on); they are the bottleneck for every insert into an existing table.
Secondly, if you are inserting far more records into the table than it currently holds, you may get better performance from a SELECT INTO statement instead of INSERT. But remember that SELECT INTO creates a new table, so you would need to work out how to append the records that were there before.
Last but not least, you can use a loop and insert the records in batches of 1M.

High Redis latency in AWS (ElastiCache)

I'm trying to determine the cause of some high latency I'm seeing on my ElastiCache Redis node (cache.m3.medium). I gathered some data using the redis-cli latency test, running it from an EC2 instance in the same region/availability-zone as the ElastiCache node.
I see that the latency is quite good on average (~.5ms), but that there are some pretty high outliers. I don't believe that the outliers are due to network latency, as network ping tests between two EC2 instances don't exhibit these high spikes.
The Redis node is not under any load, and the metrics seem to look fine.
My questions are:
What might be causing the high max latencies?
Are these max latencies expected?
What other steps/tests/tools would you use to further diagnose the issue?
user#my-ec2-instance:~/redis-3.2.8$ ./src/redis-cli -h redis-host --latency-history -i 1
min: 0, max: 12, avg: 0.45 (96 samples) -- 1.01 seconds range
min: 0, max: 1, avg: 0.33 (96 samples) -- 1.00 seconds range
min: 0, max: 3, avg: 0.33 (96 samples) -- 1.01 seconds range
min: 0, max: 2, avg: 0.29 (96 samples) -- 1.01 seconds range
min: 0, max: 2, avg: 0.26 (96 samples) -- 1.01 seconds range
min: 0, max: 1, avg: 0.34 (96 samples) -- 1.00 seconds range
min: 0, max: 4, avg: 0.34 (96 samples) -- 1.01 seconds range
min: 0, max: 1, avg: 0.26 (96 samples) -- 1.00 seconds range
min: 0, max: 5, avg: 0.33 (96 samples) -- 1.01 seconds range
min: 0, max: 1, avg: 0.31 (96 samples) -- 1.00 seconds range
min: 0, max: 1, avg: 0.33 (96 samples) -- 1.00 seconds range
min: 0, max: 1, avg: 0.28 (96 samples) -- 1.00 seconds range
min: 0, max: 1, avg: 0.30 (96 samples) -- 1.00 seconds range
min: 0, max: 4, avg: 0.35 (96 samples) -- 1.01 seconds range
min: 0, max: 15, avg: 0.52 (95 samples) -- 1.01 seconds range
min: 0, max: 4, avg: 0.48 (94 samples) -- 1.00 seconds range
min: 0, max: 2, avg: 0.54 (94 samples) -- 1.00 seconds range
min: 0, max: 1, avg: 0.38 (96 samples) -- 1.01 seconds range
min: 0, max: 8, avg: 0.55 (94 samples) -- 1.00 seconds range
I ran tests with several different node types, and found that bigger nodes performed much better. I'm using the cache.m3.xlarge type, which has provided more consistent network latency.

obtain averages of field 2 after grouping by field 1 with awk

I have a file with two fields containing numbers that I have sorted numerically based on field 1. The numbers in field 1 range from 1 to 200000 and the numbers in field 2 between 0 and 1. I want to obtain averages for both field 1 and field 2 in batches (based on rows).
Here is an example input when specifying batches of 4 rows:
1 0.12
1 0.34
2 0.45
2 0.40
50 0.60
301 0.12
899 0.13
1003 0.14
1300 0.56
1699 0.43
2100 0.25
2500 0.56
The output would be:
1.5 0.327
563.25 0.247
1899.75 0.45
Here you go:
awk -v n=4 '{s1 += $1; s2 += $2; if (++i % n == 0) { print s1/n, s2/n; s1=s2=0; } }'
Explanation:
Initialize n=4, the size of the batches
Collect the sums: sum of 1st column in s1, the 2nd in s2
Increment counter i by 1 (default initial value is 0, no need to set it)
If i is divisible by n with no remainder, then we print the averages, and reset the sum variables
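One caveat: if the row count is not a multiple of n, the trailing rows are read but never printed. Here is a sketch of a variant that also averages the final partial batch over its actual size (that behaviour is my assumption, the question does not specify it); like the one-liner above, it reads the data from stdin:
awk -v n=4 '
  { s1 += $1; s2 += $2; i++ }                        # accumulate both columns, count rows
  i % n == 0 { print s1/n, s2/n; s1 = s2 = 0 }       # full batch: print the two averages, reset
  END { if (i % n) print s1/(i % n), s2/(i % n) }    # leftover rows: average over their count
'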

How to sum up every 10 lines and calculate average using AWK?

I have a file containing N*10 lines, each line consisting of a number. I need to sum up every 10 lines and then print out an average for every such group. I know it's doable in awk, I just don't know how.
Try something like this:
$ cat input
1
2
3
4
5
6
2.5
3.5
4
$ awk '{sum+=$1} (NR%3)==0{print sum/3; sum=0;}' input
2
5
3.33333
(Adapt for 10-line blocks, obviously.)
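For the 10-line case asked about, the same pattern can be parameterised so the group size appears only once (a sketch; n is just a variable name I'm introducing):
awk -v n=10 '
  { sum += $1 }                            # accumulate the current group
  NR % n == 0 { print sum/n; sum = 0 }     # every n lines: print the average and reset
' input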
Maybe something like this -
[jaypal:~/Temp] seq 20 > test.file
[jaypal:~/Temp] awk '
{sum+=$1}
(NR%10==0){avg=sum/10;print $1"\nTotal: "sum "\tAverage: "avg;sum=0;next}1' test.file
1
2
3
4
5
6
7
8
9
10
Total: 55 Average: 5.5
11
12
13
14
15
16
17
18
19
20
Total: 155 Average: 15.5
If you don't want all lines to be printed then the following would work.
[jaypal:~/Temp] awk '
{sum+=$1}
(NR%10==0){avg=sum/10;print "Total: "sum "\tAverage: "avg;sum=0;next}' test.file
Total: 55 Average: 5.5
Total: 155 Average: 15.5

Sum of every N lines ; awk

I have a file containing data in a single column. I have to find the sum of every 4 lines and print that sum.
That is, I have to compute the sum of lines 0-3, the sum of lines 4-7, the sum of lines 8-11, and so on.
awk '{s+=$1}NR%4==0{print s;s=0}' file
If your file has leftover lines (a count that is not a multiple of 4):
$ cat file
1
2
3
4
5
6
7
8
9
10
$ awk '{s+=$1}NR%4==0{print s;t+=s;s=0}END{print "total: ",t;if(s) print "left: " s}' file
10
26
total: 36
left: 19
$ cat file
1
2
3
4
5
6
7
8
$ awk '{subtotal+=$1} NR % 4 == 0 { print "subtotal", subtotal; total+=subtotal; subtotal=0} END {print "TOTAL", total}' file
subtotal 10
subtotal 26
TOTAL 36
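If the line count is not a multiple of 4, the version above never prints the last partial subtotal and leaves it out of TOTAL. A possible extension that folds the remainder in (whether that is wanted is an assumption on my part):
awk '
  { subtotal += $1 }                                                    # accumulate the current group
  NR % 4 == 0 { print "subtotal", subtotal; total += subtotal; subtotal = 0 }
  END {
      if (NR % 4) { print "subtotal", subtotal; total += subtotal }     # leftover lines at the end
      print "TOTAL", total
  }' file
With the 10-line file from the first answer this prints subtotals of 10, 26 and 19 and a TOTAL of 55; with the 8-line file it reproduces the output above.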