High Redis latency in AWS (ElastiCache) - redis

I'm trying to determine the cause of some high latency I'm seeing on my ElastiCache Redis node (cache.m3.medium). I gathered some data using the redis-cli latency test, running it from an EC2 instance in the same region/availability-zone as the ElastiCache node.
The latency is quite good on average (~0.5 ms), but there are some pretty high outliers. I don't believe the outliers are due to network latency, as ping tests between two EC2 instances don't exhibit these spikes.
The Redis node is not under any load, and the metrics seem to look fine.
My questions are:
What might be causing the high max latencies?
Are these max latencies expected?
What other steps/tests/tools would you use to further diagnose the issue?
user#my-ec2-instance:~/redis-3.2.8$ ./src/redis-cli -h redis-host --latency-history -i 1
min: 0, max: 12, avg: 0.45 (96 samples) -- 1.01 seconds range
min: 0, max: 1, avg: 0.33 (96 samples) -- 1.00 seconds range
min: 0, max: 3, avg: 0.33 (96 samples) -- 1.01 seconds range
min: 0, max: 2, avg: 0.29 (96 samples) -- 1.01 seconds range
min: 0, max: 2, avg: 0.26 (96 samples) -- 1.01 seconds range
min: 0, max: 1, avg: 0.34 (96 samples) -- 1.00 seconds range
min: 0, max: 4, avg: 0.34 (96 samples) -- 1.01 seconds range
min: 0, max: 1, avg: 0.26 (96 samples) -- 1.00 seconds range
min: 0, max: 5, avg: 0.33 (96 samples) -- 1.01 seconds range
min: 0, max: 1, avg: 0.31 (96 samples) -- 1.00 seconds range
min: 0, max: 1, avg: 0.33 (96 samples) -- 1.00 seconds range
min: 0, max: 1, avg: 0.28 (96 samples) -- 1.00 seconds range
min: 0, max: 1, avg: 0.30 (96 samples) -- 1.00 seconds range
min: 0, max: 4, avg: 0.35 (96 samples) -- 1.01 seconds range
min: 0, max: 15, avg: 0.52 (95 samples) -- 1.01 seconds range
min: 0, max: 4, avg: 0.48 (94 samples) -- 1.00 seconds range
min: 0, max: 2, avg: 0.54 (94 samples) -- 1.00 seconds range
min: 0, max: 1, avg: 0.38 (96 samples) -- 1.01 seconds range
min: 0, max: 8, avg: 0.55 (94 samples) -- 1.00 seconds range

I ran tests with several different node types, and found that bigger nodes performed much better. I'm using the cache.m3.xlarge type, which has provided more consistent network latency.
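A couple of additional checks could help isolate where the spikes come from (these assume the same redis-cli 3.2.8 build shown above; SLOWLOG availability on the ElastiCache endpoint is an assumption):

# Measure the intrinsic latency of the EC2 client host itself for 100 seconds.
# This runs entirely locally (no server involved), so it isolates client-side scheduling jitter.
./src/redis-cli --intrinsic-latency 100

# Ask the Redis server for its slowest recent commands, to rule out expensive commands on the node.
./src/redis-cli -h redis-host SLOWLOG GET 10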

Related

How can I group a continuous column (0-1) into equal sizes? Scala spark

I have a dataframe column that I want to split into equal-size buckets. The values in this column are floats between 0 and 1. The data is skewed, so most values fall in the 0.90s or at exactly 1.
Bucket 10: All 1's (the size of this bucket will be different from 2-9 and 1)
Bucket 2-9: Any values > 0 and < 1 (equal sized)
Bucket 1: All 0's (the size of this bucket will be different from 2-9 and 10)
Example:

continuous_number_col   Bucket
0.001                   2
0.95                    9
1                       10
0                       1
This is how it should look when I groupBy("Bucket").
The counts for buckets 1 and 10 aren't significant here; they just end up in their own buckets.
The count of 75 will also differ in practice; I'm just using it as an example.
Bucket   Count   Values
1        1000    0
2        75      0.01 - 0.50
3        75      0.51 - 0.63
4        75      0.64 - 0.71
5        75      0.72 - 0.83
6        75      0.84 - 0.89
7        75      0.90 - 0.92
8        75      0.93 - 0.95
9        75      0.95 - 0.99
10       2000    1
I've tried using the QuantileDiscretizer() function like this:
val df = {
  rawDf
    // Taking 1's and 0's out for the moment
    .filter(col("continuous_number_col") =!= 1 && col("continuous_number_col") =!= 0)
}

val discretizer = new QuantileDiscretizer()
  .setInputCol("continuous_number_col")
  .setOutputCol("bucket_result")
  .setNumBuckets(8)

val result = discretizer.fit(df).transform(df)
However, this gives me the following, which is not the equal bucketing I'm after:
bucket_result   count
7.0             20845806
6.0             21096698
5.0             21538813
4.0             21222511
3.0             21193393
2.0             21413413
1.0             21032666
0.0             21681424
Hopefully this gives enough context to what I'm trying to do. Thanks in advance.
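One rough sketch of the 1 / 2-9 / 10 layout described above (untested; it assumes Spark 2.3+ for unionByName, reuses rawDf and continuous_number_col from the question, and the quantile edges come from QuantileDiscretizer, so the middle buckets will only be approximately equal):

import org.apache.spark.ml.feature.QuantileDiscretizer
import org.apache.spark.sql.functions._

// Discretize only the values strictly between 0 and 1 into 8 quantile buckets (0.0 .. 7.0),
// then shift them up so they become buckets 2 .. 9.
val interior = rawDf
  .filter(col("continuous_number_col") =!= 1 && col("continuous_number_col") =!= 0)

val discretizer = new QuantileDiscretizer()
  .setInputCol("continuous_number_col")
  .setOutputCol("bucket_result")
  .setNumBuckets(8)

val interiorBucketed = discretizer
  .fit(interior)
  .transform(interior)
  .withColumn("bucket_result", col("bucket_result") + 2)

// Send the 0's to bucket 1 and the 1's to bucket 10, then union everything back together.
val endpoints = rawDf
  .filter(col("continuous_number_col") === 0 || col("continuous_number_col") === 1)
  .withColumn("bucket_result",
    when(col("continuous_number_col") === 0, lit(1.0)).otherwise(lit(10.0)))

val bucketed = interiorBucketed.unionByName(endpoints)

// bucketed.groupBy("bucket_result").count().orderBy("bucket_result").show()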

Fetching millions of records going too slow

I'm trying to copy 60 million records into another table with a clustered index, fetching in batches, but after 20 million records it gets very slow. I don't know what I should do. Can anybody help me, please? Here are my timings.
1000000 Million Min: 1
2000000 Million Min: 0
3000000 Million Min: 0
4000000 Million Min: 2
5000000 Million Min: 2
6000000 Million Min: 1
7000000 Million Min: 0
8000000 Million Min: 1
9000000 Million Min: 0
10000000 Million Min: 1
11000000 Million Min: 0
12000000 Million Min: 1
13000000 Million Min: 1
14000000 Million Min: 0
15000000 Million Min: 1
16000000 Million Min: 1
17000000 Million Min: 1
18000000 Million Min: 0
19000000 Million Min: 1
20000000 Million Min: 3
21000000 Million Min: 3
22000000 Million Min: 4
23000000 Million Min: 5
24000000 Million Min: 4
25000000 Million Min: 4
26000000 Million Min: 4
27000000 Million Min: 4
28000000 Million Min: 5
29000000 Million Min: 5
30000000 Million Min: 5
31000000 Million Min: 6
32000000 Million Min: 7
33000000 Million Min: 7
34000000 Million Min: 8
35000000 Million Min: 8
36000000 Million Min: 9
37000000 Million Min: 8
38000000 Million Min: 10
39000000 Million Min: 10
40000000 Million Min: 11
41000000 Million Min: 11
42000000 Million Min: 11
43000000 Million Min: 12
44000000 Million Min: 11
45000000 Million Min: 12
46000000 Million Min: 12
47000000 Million Min: 14
48000000 Million Min: 13
49000000 Million Min: 13
50000000 Million Min: 14
51000000 Million Min: 15
52000000 Million Min: 14
53000000 Million Min: 16
54000000 Million Min: 18
55000000 Million Min: 18
56000000 Million Min: 20
57000000 Million Min: 19
58000000 Million Min: 21
59000000 Million Min: 19
declare
      @RecNo Int
    , @RecCount Int
    , @RecordST nvarchar(max)
    , @str_date datetime
    , @end_date datetime;

Set @RecNo = 0
select @RecCount = 1000000

While 1 = 1
Begin
    set @str_date = getdate();

    Insert Into dbo.test2 (
          ap_id
        , lipt_id
        , li_cntr
    )
    select
          ap_id
        , lipt_id
        , li_cntr
    from dbo.test
    order by ap_id, lipt_id, li_cntr
    offset @RecNo rows
    fetch next @RecCount rows only;

    if @@ROWCOUNT = 0
        break

    Set @RecNo += 1000000;
    set @end_date = GETDATE();
    set @RecordST = cast(@RecNo as nvarchar(max)) + ' Million Min:' + cast(DATEDIFF(MINUTE, @str_date, @end_date) as nvarchar(max))
    RAISERROR(@RecordST, 0, 0) WITH NOWAIT
end
First of all, you should drop all constraints (unique, primary key, etc.) on the target table; they are a bottleneck for every insert into an existing table.
Secondly, if you are inserting far more records than the table already holds, you may get better performance from a SELECT INTO statement instead of INSERT. But remember that SELECT INTO creates a new table, so you would need to think about how to append the records that were there before.
Last but not least, you can use a loop and insert the records in batches of 1 million.
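A minimal sketch of that SELECT INTO variant (assuming SQL Server and the three columns from the question; dbo.test2_copy is a hypothetical name, since SELECT INTO has to create the target table itself):

-- Bulk copy into a brand-new table; indexes and constraints can be added afterwards.
select ap_id, lipt_id, li_cntr
into dbo.test2_copy
from dbo.test;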

Comparing two columns in two files using awk with duplicates

File 1
A4gnt 0 0 0 0.3343
Aaas 2.79 2.54 1.098 0.1456
Aacs 0.94 0.88 1.063 0.6997
Aadac 0 0 0 0.3343
Aadacl2 0 0 0 0.3343
Aadat 0.01 0 1.723 0.7222
Aadat 0.06 0.03 1.585 0.2233
Aaed1 0.28 0.24 1.14 0.5337
Aaed1 1.24 1.27 0.976 0.9271
Aaed1 15.91 13.54 1.175 0.163
Aagab 1.46 1.14 1.285 0.3751
Aagab 6.12 6.3 0.972 0.6569
Aak1 0.02 0.01 1.716 0.528
Aak1 0.1 0.19 0.561 0.159
Aak1 0.14 0.19 0.756 0.5297
Aak1 0.16 0.18 0.907 0.6726
Aak1 0.21 0 0 0.066
Aak1 0.26 0.27 0.967 0.9657
Aak1 0.54 1.65 0.325 0.001
Aamdc 0.04 0 15.461 0.0875
Aamdc 1.03 1.01 1.019 0.8817
Aamdc 1.27 1.26 1.01 0.9285
Aamdc 7.21 6.94 1.039 0.7611
Aamp 0.06 0.05 1.056 0.9136
Aamp 0.11 0.11 1.044 0.9227
Aamp 0.12 0.13 0.875 0.7584
Aamp 0.22 0.2 1.072 0.7609
File 2
Adar
Ak3
Alox15b
Ampd2
Ampd3
Ankrd17
Apaf1
Aplp1
Arih1
Atg14
Aurkb
Bcl2l14
Bmp2
Brms1l
Cactin
Camta2
Cav1
Ccr5
Chfr
Clock
Cnot1
Crebrf
Crtc3
Csnk2b
Cul3
Cx3cl1
Dnaja3
Dnmt1
Dtl
Ednra
Eef1e1
Esr1
Ezr
Fam162a
Fas
Fbxo30
Fgr
Flcn
Foxp3
Frzb
Fzd6
Gdf3
Hey2
Hnf4
The desired output: wherever the first column matches in both files, print all the columns of the first file (including duplicates).
I've tried
awk 'NR==FNR{a[$1]=$2"\t"$3"\t"$4"\t"$5; next} { if($1 in a) { print $0,a[$1] } }' File2 File1 > output
But for some reason I'm only getting a few hits. Does anyone know why?
Read the second file first and store its 1st-column values as keys of the array arr; then read the first file, and if the 1st column of file1 exists as a key in arr (built from file2), print the current row/record from file1.
awk 'FNR==NR{arr[$1];next}$1 in arr' file2 file1
Advantage:
with a[$1]=$2"\t"$3"\t"$4"\t"$5; next, any later line with the same key overwrites the previously stored value,
but with arr[$1];next we store just the key, and $1 in arr handles duplicate records correctly even when they exist.
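To illustrate the overwriting point with a hypothetical example (the real File 2 only has one column): if file2 were

Aadat foo
Aadat bar

then

awk 'NR==FNR{a[$1]=$2; next} $1 in a {print $0, a[$1]}' file2 file1

would print bar for every Aadat row of file1, because the second file2 line overwrites the first in a[$1]; with arr[$1] the key is merely marked as present, so duplicates in either file do no harm.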

Need help editing an awk script to print the minimum within a range

Thanks to Origineil for his help in modifying the awk script. The awk script works perfectly when the interval is a whole number, but if I use an interval of less than one, such as 0.2, it gives the wrong output. I have a file "sss" containing this data:
H34 5.0856 5.45563
H39 5.0857 5.45573
H26 6.4822 6.81033
H30 6.4822 6.81033
H32 6.4823 6.81043
H40 6.4824 6.81053
H33 7.6729 7.96531
H27 7.673 7.96541
H31 7.6731 7.96551
H38 7.6731 7.96551
H29 8.5384 8.80485
H28 8.5387 8.80514
H35 8.5387 8.80514
H37 8.5387 8.80514
H41 9.9078 10.1332
H36 9.9087 10.134
If I then run the awk command
awk '!e{e=$2+0.2;} $2-e>0{print "Range " ++i , c " entries. min: " min " max: " max ; e+=0.2; c=0; min=""} {if(!min)min=$2; c++; max=$2} END{print "Range " ++i , c " entries. min: " min " max: " max} ' ss
It gives the following output, in which values whose differences are well under the 0.2 specified in the script still end up in separate ranges:
Range 1 2 entries. min: 5.0856 max: 5.0857
Range 2 1 entries. min: 6.4822 max: 6.4822
Range 3 1 entries. min: 6.4822 max: 6.4822
Range 4 1 entries. min: 6.4823 max: 6.4823
Range 5 1 entries. min: 6.4824 max: 6.4824
Range 6 1 entries. min: 7.6729 max: 7.6729
Range 7 1 entries. min: 7.673 max: 7.673
Range 8 1 entries. min: 7.6731 max: 7.6731
Range 9 1 entries. min: 7.6731 max: 7.6731
Range 10 1 entries. min: 8.5384 max: 8.5384
Range 11 1 entries. min: 8.5387 max: 8.5387
Range 12 1 entries. min: 8.5387 max: 8.5387
Range 13 1 entries. min: 8.5387 max: 8.5387
Range 14 1 entries. min: 9.9078 max: 9.9078
Range 15 1 entries. min: 9.9087 max: 9.9087
Can somebody help me out on this?
Thanks in advance.
By minimum and maximum I assume you mean the first and last entry seen within the range.
For the provided input I changed 18 to 17 so that not all "max" values were also the upper bounds of the range.
Script:
awk '!e{e=$1+4;} $1-e>0{print "Range " ++i , c " entries. min: " min " max: " max ; e+=4; c=0; min=""} {if(!min)min=$1; c++; max=$1} END{print "Range " ++i , c " entries. min: " min " max: " max} ' file
I introduced two variables min and max to track the entries.
Output:
Range 1 4 entries. min: 2 max: 6
Range 2 3 entries. min: 7 max: 10
Range 3 3 entries. min: 12 max: 14
Range 4 2 entries. min: 16 max: 17
Range 5 1 entries. min: 19 max: 19
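A possible adaptation for the 0.2 case (a sketch only, assuming the intent is that each new range should restart at the first value that falls outside the previous range; the only change from the 0.2 command in the question is replacing e+=0.2 with e=$2+0.2):

awk '!e{e=$2+0.2;} $2-e>0{print "Range " ++i , c " entries. min: " min " max: " max ; e=$2+0.2; c=0; min=""} {if(!min)min=$2; c++; max=$2} END{print "Range " ++i , c " entries. min: " min " max: " max} ' sss

With the "sss" data this would give five ranges: the two 5.085x values, the four 6.482x values, the four 7.67x values, the four 8.538x values, and the two 9.90xx values.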

SQL linear interpolation based on lookup table

I need to build linear interpolation into an SQL query, using a joined table containing lookup values (more like lookup thresholds, in fact). As I am relatively new to SQL scripting, I have searched for example code to point me in the right direction, but most of the SQL scripts I came across interpolate between dates and timestamps, and I couldn't relate those to my situation.
Basically, I have a main data table with many rows of decimal values in a single column, for example:
Main_Value
0.33
0.12
0.56
0.42
0.1
Now, I need to yield interpolated data points for each of the rows above, based on a joined lookup table with 6 rows, containing non-linear threshold values and the associated linear normalized values:
Threshold_Level Normalized_Value
0 0
0.15 20
0.45 40
0.60 60
0.85 80
1 100
So, for example, if the value in the Main_Value column is 0.45, the query should look up its position at (or between) the nearest Threshold_Level values and interpolate based on the adjacent Normalized_Value column (which would yield a value of 40 in this example).
I really would be grateful for any insight into building a SQL query around this, especially as it has been hard to track down any SQL examples of linear interpolation using a joined table.
It has been pointed out that I could use some sort of rounding, so I have included a more detailed table below. I would like the SQL query to find, for each Main_Value (from the first table above), the row where it falls between the Threshold_Min and Threshold_Max values below, and return the corresponding 'Normalized_%' value:
Threshold_Min Threshold_Max Normalized_%
0.00 0.15 0
0.15 0.18 5
0.18 0.22 10
0.22 0.25 15
0.25 0.28 20
0.28 0.32 25
0.32 0.35 30
0.35 0.38 35
0.38 0.42 40
0.42 0.45 45
0.45 0.60 50
0.60 0.63 55
0.63 0.66 60
0.66 0.68 65
0.68 0.71 70
0.71 0.74 75
0.74 0.77 80
0.77 0.79 85
0.79 0.82 90
0.82 0.85 95
0.85 1.00 100
For example, if the value from the Main_Value table is 0.52, it falls between Threshold_Min 0.45 and Threshold_Max 0.60, so the Normalized_% returned is 50%. The problem is that the Threshold_Min and Threshold_Max values are not evenly spaced. Could anyone point me in the direction of how to script this?
Assuming that for each Main_Value you want the Normalized_Value of the nearest Threshold_Level that is lower than or equal to it, you can do it like this:
select t1.Main_Value, max(t2.Normalized_Value) as Normalized_Value
from #t1 t1
inner join #t2 t2 on t1.Main_Value >= t2.Threshold_Level
group by t1.Main_Value
Replace #t1 and #t2 with the correct table names.
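If true linear interpolation between adjacent Threshold_Level rows is wanted (rather than the nearest-lower lookup above), a sketch along these lines could work, assuming SQL Server (for OUTER APPLY) and the same #t1/#t2 tables:

select
    t1.Main_Value,
    case
        when hi.Threshold_Level is null then lo.Normalized_Value  -- at or above the top threshold
        else lo.Normalized_Value
             + (t1.Main_Value - lo.Threshold_Level)
               / (hi.Threshold_Level - lo.Threshold_Level)
               * (hi.Normalized_Value - lo.Normalized_Value)
    end as Interpolated_Value
from #t1 t1
outer apply (select top 1 Threshold_Level, Normalized_Value
             from #t2
             where Threshold_Level <= t1.Main_Value
             order by Threshold_Level desc) lo
outer apply (select top 1 Threshold_Level, Normalized_Value
             from #t2
             where Threshold_Level > t1.Main_Value
             order by Threshold_Level asc) hi;

For a Main_Value of 0.33 this returns 32 (interpolating between 20 at 0.15 and 40 at 0.45), and for 0.45 it returns exactly 40, as in the question's example.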