Fetching millions of records is going too slow - SQL

I'm trying to copy 60 million records into another table that has a clustered index, fetching in batches. After 20 million records it gets too slow, and I don't know what I should do. Can anybody help me, please? Here is my timing log:
1000000 Million Min: 1
2000000 Million Min: 0
3000000 Million Min: 0
4000000 Million Min: 2
5000000 Million Min: 2
6000000 Million Min: 1
7000000 Million Min: 0
8000000 Million Min: 1
9000000 Million Min: 0
10000000 Million Min: 1
11000000 Million Min: 0
12000000 Million Min: 1
13000000 Million Min: 1
14000000 Million Min: 0
15000000 Million Min: 1
16000000 Million Min: 1
17000000 Million Min: 1
18000000 Million Min: 0
19000000 Million Min: 1
20000000 Million Min: 3
21000000 Million Min: 3
22000000 Million Min: 4
23000000 Million Min: 5
24000000 Million Min: 4
25000000 Million Min: 4
26000000 Million Min: 4
27000000 Million Min: 4
28000000 Million Min: 5
29000000 Million Min: 5
30000000 Million Min: 5
31000000 Million Min: 6
32000000 Million Min: 7
33000000 Million Min: 7
34000000 Million Min: 8
35000000 Million Min: 8
36000000 Million Min: 9
37000000 Million Min: 8
38000000 Million Min: 10
39000000 Million Min: 10
40000000 Million Min: 11
41000000 Million Min: 11
42000000 Million Min: 11
43000000 Million Min: 12
44000000 Million Min: 11
45000000 Million Min: 12
46000000 Million Min: 12
47000000 Million Min: 14
48000000 Million Min: 13
49000000 Million Min: 13
50000000 Million Min: 14
51000000 Million Min: 15
52000000 Million Min: 14
53000000 Million Min: 16
54000000 Million Min: 18
55000000 Million Min: 18
56000000 Million Min: 20
57000000 Million Min: 19
58000000 Million Min: 21
59000000 Million Min: 19
declare
    @RecNo int
    , @RecCount int
    , @RecordST nvarchar(max)
    , @str_date datetime
    , @end_date datetime;

set @RecNo = 0;
select @RecCount = 1000000;

while 1 = 1
begin
    set @str_date = getdate();

    -- copy the next 1M rows using OFFSET/FETCH
    insert into dbo.test2 (
        ap_id
        , lipt_id
        , li_cntr
    )
    select
        ap_id
        , lipt_id
        , li_cntr
    from dbo.test
    order by ap_id, lipt_id, li_cntr
    offset @RecNo rows
    fetch next @RecCount rows only;

    if @@ROWCOUNT = 0
        break;

    set @RecNo += 1000000;
    set @end_date = getdate();

    -- log elapsed minutes for this batch
    set @RecordST = cast(@RecNo as nvarchar(max)) + ' Million Min:'
        + cast(datediff(minute, @str_date, @end_date) as nvarchar(max));
    raiserror(@RecordST, 0, 0) with nowait;
end

First of all, you should drop constraints such as unique keys and the primary key before the load; they are a bottleneck for every insert into an existing table.
Secondly, if you are inserting far more records than the table currently holds, you may get better performance from a SELECT INTO statement instead of INSERT. But remember that SELECT INTO creates a new table, so you would need to think about how to append the records that were already there.
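As a rough illustration, here is a minimal SELECT INTO sketch, assuming the column list from the question; dbo.test2_copy is a hypothetical name for the new table that SELECT INTO creates:
-- Minimal sketch: SELECT INTO builds the target as a brand-new table in one pass.
-- dbo.test2_copy is a hypothetical name; the clustered index would be created afterwards.
select
    ap_id
    , lipt_id
    , li_cntr
into dbo.test2_copy
from dbo.test;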
Last but not least, you can use a loop and insert the records in batches of 1M.
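Note that the OFFSET/FETCH loop in the question also gets slower as the offset grows, because every batch has to read past all of the rows it already skipped. A hedged alternative is to drive the batches off the clustered key itself, so each batch seeks to roughly where the previous one stopped. A minimal sketch, assuming the combination (ap_id, lipt_id, li_cntr) is unique, that both tables share that clustered key, and that the columns are integers:
declare
    @BatchSize int = 1000000
    , @last_ap_id int        -- assumed int; adjust to the real column types
    , @last_lipt_id int
    , @last_li_cntr int;

while 1 = 1
begin
    insert into dbo.test2 (ap_id, lipt_id, li_cntr)
    select top (@BatchSize)
        ap_id, lipt_id, li_cntr
    from dbo.test
    where @last_ap_id is null    -- first batch: no lower bound yet
       or ap_id > @last_ap_id
       or (ap_id = @last_ap_id and lipt_id > @last_lipt_id)
       or (ap_id = @last_ap_id and lipt_id = @last_lipt_id and li_cntr > @last_li_cntr)
    order by ap_id, lipt_id, li_cntr;

    if @@ROWCOUNT = 0
        break;

    -- remember the last key copied so the next batch can start right after it
    select top (1)
        @last_ap_id = ap_id
        , @last_lipt_id = lipt_id
        , @last_li_cntr = li_cntr
    from dbo.test2
    order by ap_id desc, lipt_id desc, li_cntr desc;
end
Whether the WHERE clause actually gets an index seek depends on the plan, but unlike OFFSET the loop never has to re-read rows it has already copied.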

Related

Sorting Pandas data frame with groupby and conditions

I'm trying to sort a data frame based on groups meeting conditions. I'm getting a syntax error from the way I'm sorting the groups, and I'm losing the initial order of the data frame before attempting the above.
This is the order of sorting that I'm trying to achieve:
1) Sort on the First and Test columns.
2) For Test==1 groups, sort on Secondary, then by the Final column.
   For Test==0 groups, sort on the Final column only.
import pandas as pd

df = pd.DataFrame({"First":[100,100,100,100,100,100,200,200,200,200,200],"Test":[1,1,1,0,0,0,0,1,1,1,0],"Secondary":[.1,.1,.1,.2,.2,.3,.3,.3,.3,.4,.4],"Final":[1.1,2.2,3.3,4.4,5.5,6.6,7.7,8.8,9.9,10.10,11.11]})

def sorter(x):
    if x["Test"]==1:
        x.sort_values(['Secondary','Final'], inplace=True)
    else:
        x = x.sort_values('Final', inplace=True)

df = df.sort_values(["First","Test"], ascending=[False, False]).reset_index(drop=True)
df.groupby(['First','Test']).apply(lambda x: sorter(x))
df
Expected result:
First Test Secondary Final
200 1 0.4 10.1
200 1 0.3* 9.9*
200 1 0.3* 8.8*
200 0 0.4 11.11*
200 0 0.3 7.7*
100 1 0.5 2.2
100 1 0.1* 3.3*
100 1 0.1* 1.1*
100 0 0.3 6.6*
100 0 0.2 5.5*
100 0 0.2 4.4*
You can try this sorting in descending order. With respect to the sequence you gave, the order of sorting will change; will it work for you?
df = pd.DataFrame({"First":[100,100,100,100,100,100,200,200,200,200,200],"Test":[1,1,1,0,0,0,0,1,1,1,0],"Secondary":[.1,.5,.1,.9,.4,.1,.3,.3,.3,.4,.4],"Final":[1.1,2.2,3.3,4.4,5.5,6.6,7.7,8.8,9.9,10.10,11.11]})
df = df.groupby(['First','Test']).apply(
    lambda x: x.sort_values(['First','Test','Secondary','Final'], ascending=False)
              if x.iloc[0]['Test']==1
              else x.sort_values(['First','Test','Final'], ascending=False)
).reset_index(drop=True)
df.sort_values(['First','Test'], ascending=[True,False])
Out:
Final First Secondary Test
3 2.20 100 0.5 1
4 3.30 100 0.1 1
5 1.10 100 0.1 1
0 6.60 100 0.1 0
1 5.50 100 0.4 0
2 4.40 100 0.9 0
8 10.10 200 0.4 1
9 9.90 200 0.3 1
10 8.80 200 0.3 1
6 11.11 200 0.4 0
7 7.70 200 0.3 0
The trick was to sort subsets separately and replace the values in the original df.
This came up in other solutions to pandas sorting problems.
import pandas as pd
df=pd.DataFrame({"First":[100,100,100,100,100,100,200,200,200,200,200],"Test":[1,1,1,0,0,0,0,1,1,1,0],"Secondary":[.1,.5,.1,.9,.4,.1,.3,.3,.3,.4,.4],"Final":[1.1,2.2,3.3,4.4,5.5,6.6,7.7,8.8,9.9,10.10,11.11]})
df.sort_values(['First','Test','Secondary','Final'],ascending=False, inplace=True)
index_subset=df[df["Test"]==0].index
sorted_subset=df[df["Test"]==0].sort_values(['First','Final'],ascending=False)
df.loc[index_subset,:]=sorted_subset.values
print(df)

High Redis latency in AWS (ElastiCache)

I'm trying to determine the cause of some high latency I'm seeing on my ElastiCache Redis node (cache.m3.medium). I gathered some data using the redis-cli latency test, running it from an EC2 instance in the same region/availability-zone as the ElastiCache node.
I see that the latency is quite good on average (~.5ms), but that there are some pretty high outliers. I don't believe that the outliers are due to network latency, as network ping tests between two EC2 instances don't exhibit these high spikes.
The Redis node is not under any load, and the metrics seem to look fine.
My questions are:
What might be causing the high max latencies?
Are these max latencies expected?
What other steps/tests/tools would you use to further diagnose the issue?
user#my-ec2-instance:~/redis-3.2.8$ ./src/redis-cli -h redis-host --latency-history -i 1
min: 0, max: 12, avg: 0.45 (96 samples) -- 1.01 seconds range
min: 0, max: 1, avg: 0.33 (96 samples) -- 1.00 seconds range
min: 0, max: 3, avg: 0.33 (96 samples) -- 1.01 seconds range
min: 0, max: 2, avg: 0.29 (96 samples) -- 1.01 seconds range
min: 0, max: 2, avg: 0.26 (96 samples) -- 1.01 seconds range
min: 0, max: 1, avg: 0.34 (96 samples) -- 1.00 seconds range
min: 0, max: 4, avg: 0.34 (96 samples) -- 1.01 seconds range
min: 0, max: 1, avg: 0.26 (96 samples) -- 1.00 seconds range
min: 0, max: 5, avg: 0.33 (96 samples) -- 1.01 seconds range
min: 0, max: 1, avg: 0.31 (96 samples) -- 1.00 seconds range
min: 0, max: 1, avg: 0.33 (96 samples) -- 1.00 seconds range
min: 0, max: 1, avg: 0.28 (96 samples) -- 1.00 seconds range
min: 0, max: 1, avg: 0.30 (96 samples) -- 1.00 seconds range
min: 0, max: 4, avg: 0.35 (96 samples) -- 1.01 seconds range
min: 0, max: 15, avg: 0.52 (95 samples) -- 1.01 seconds range
min: 0, max: 4, avg: 0.48 (94 samples) -- 1.00 seconds range
min: 0, max: 2, avg: 0.54 (94 samples) -- 1.00 seconds range
min: 0, max: 1, avg: 0.38 (96 samples) -- 1.01 seconds range
min: 0, max: 8, avg: 0.55 (94 samples) -- 1.00 seconds range
I ran tests with several different node types, and found that bigger nodes performed much better. I'm using the cache.m3.xlarge type, which has provided more consistent network latency.

Insert data from another table on specific rows

I successfully inserted data from another table, but the data went in as new rows. How do I specify a condition so that the data is inserted into particular existing rows? Most of the query's conditions come from the other (old) table.
INSERT INTO machine_types( machine_id)
SELECT DISTINCT machine_id
FROM machine_events
WHERE platform_id = '70ZOvy' AND cpu = 0.25 AND memory = 0.2498
The machine_types table has a machine_type column, and I want the data to go to the rows where machine_type = 1.
machine_types table:
machine_id // has null value now
cpu
memory
platform
machine_type
machine_events:
machine_id
cpu
memory
platform
The question is: how do I write the SQL query to insert the data into the records that have machine_type = 1?
Note: the machine_types table currently has only 10 records, and 126 new records are going to be inserted. All of these records should go to machine_type = 1.
Machine_types table:
machine_type platform cpu memory machine_id
1 HofLG 0.25 0.2498 NULL
2 HofLG 0.5 0.03085 NULL
3 HofLG 0.5 0.06158 NULL
4 HofLG 0.5 0.1241 NULL
Machine_events table:
time machine_id platform_id cpu memory machine_type
0 5 HofLG 0.25 0.2493 1
0 6 HofLG 0.5 0.03085 1
0 7 HofLG 0.5 0.2493 2
0 10 HofLG 0.5 0.2493 2
The NULLs in the first table should be replaced with the machine_id values from the second table, matched on machine_type.
machine_types after machine_id is updated:
machine_type platform cpu memory machine_id
1 HofLG 0.25 0.2498 5
1 HofLG 0.5 0.03085 6
2 HofLG 0.5 0.03085 NULL
3 HofLG 0.5 0.06158 NULL
4 HofLG 0.5 0.1241 NULL
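One possible reading, given the expected output above: insert the matching machine_events rows as new machine_types rows and carry machine_type = 1 as a literal, instead of supplying only machine_id. This is only a hedged sketch using the filter from the original query; the column names are the ones shown in the tables above:
-- Sketch: supply machine_type = 1 explicitly so the new rows land in that category.
INSERT INTO machine_types (machine_type, machine_id, cpu, memory, platform)
SELECT DISTINCT
       1,             -- the target category for these rows
       machine_id,
       cpu,
       memory,
       platform_id
FROM machine_events
WHERE platform_id = '70ZOvy' AND cpu = 0.25 AND memory = 0.2498;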

Need help editing an awk script to print the minimum within a range

Thanks to Origineil for his help in modifying the awk script. The awk script works perfectly if the interval is a whole number, but if I use an interval of less than one, like 0.2, it gives the wrong output. I have a file "sss" containing this data:
H34 5.0856 5.45563
H39 5.0857 5.45573
H26 6.4822 6.81033
H30 6.4822 6.81033
H32 6.4823 6.81043
H40 6.4824 6.81053
H33 7.6729 7.96531
H27 7.673 7.96541
H31 7.6731 7.96551
H38 7.6731 7.96551
H29 8.5384 8.80485
H28 8.5387 8.80514
H35 8.5387 8.80514
H37 8.5387 8.80514
H41 9.9078 10.1332
H36 9.9087 10.134
If I then run the awk command
awk '!e{e=$2+0.2;} $2-e>0{print "Range " ++i , c " entries. min: " min " max: " max ; e+=0.2; c=0; min=""} {if(!min)min=$2; c++; max=$2} END{print "Range " ++i , c " entries. min: " min " max: " max} ' sss
It gives output in which values are split into separate ranges even though the difference between them is less than the 0.2 indicated in the script:
Range 1 2 entries. min: 5.0856 max: 5.0857
Range 2 1 entries. min: 6.4822 max: 6.4822
Range 3 1 entries. min: 6.4822 max: 6.4822
Range 4 1 entries. min: 6.4823 max: 6.4823
Range 5 1 entries. min: 6.4824 max: 6.4824
Range 6 1 entries. min: 7.6729 max: 7.6729
Range 7 1 entries. min: 7.673 max: 7.673
Range 8 1 entries. min: 7.6731 max: 7.6731
Range 9 1 entries. min: 7.6731 max: 7.6731
Range 10 1 entries. min: 8.5384 max: 8.5384
Range 11 1 entries. min: 8.5387 max: 8.5387
Range 12 1 entries. min: 8.5387 max: 8.5387
Range 13 1 entries. min: 8.5387 max: 8.5387
Range 14 1 entries. min: 9.9078 max: 9.9078
Range 15 1 entries. min: 9.9087 max: 9.9087
Can somebody help me out on this?
Thanks in advance.
By minimum and maximum I assume you mean the first and last entry seen within the range.
For the provided input I changed 18 to 17 so that not all "max" values were also the upper bounds of the range.
Script:
awk '!e{e=$1+4;} $1-e>0{print "Range " ++i , c " entries. min: " min " max: " max ; e+=4; c=0; min=""} {if(!min)min=$1; c++; max=$1} END{print "Range " ++i , c " entries. min: " min " max: " max} ' file
I introduced two variables min and max to track the entries.
Output:
Range 1 4 entries. min: 2 max: 6
Range 2 3 entries. min: 7 max: 10
Range 3 3 entries. min: 12 max: 14
Range 4 2 entries. min: 16 max: 17
Range 5 1 entries. min: 19 max: 19

Manipulating SQL query to create data categories that return maximum values

I have a table that looks like this at the moment:
Day Limit Price
1 52 0.3
1 4 70
1 44 200
1 9 0.01
1 0 0.03
1 0 0.03
2 52 0.4
2 10 70
2 44 200
2 5 0.01
2 0 0.55
2 2 50
Is there a way I can use SQL to pivot this result into a table with different categories for Price, selecting the maximum Limit value for each category?
Day 0-10 10-100 100+
1 52 4 44
2 52 10 44
You can use CASE and MAX:
SELECT Day,
       MAX(CASE WHEN Price BETWEEN 0 AND 10 THEN Limit ELSE 0 END) AS ZeroToTen,
       MAX(CASE WHEN Price BETWEEN 10 AND 100 THEN Limit ELSE 0 END) AS TenToHundred,
       MAX(CASE WHEN Price > 100 THEN Limit ELSE 0 END) AS HundredPlus
FROM YourTable
GROUP BY Day
BTW, if you're using MySQL, add backticks around Limit, since LIMIT is a reserved keyword; a sketch follows below.
Good luck.
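For illustration, a hedged sketch of how that MySQL variant might look, using the same assumed table and column names (YourTable, Price, Limit) as above:
-- Sketch for MySQL: `Limit` is backtick-quoted because LIMIT is a reserved word.
SELECT Day,
       MAX(CASE WHEN Price BETWEEN 0 AND 10 THEN `Limit` ELSE 0 END) AS ZeroToTen,
       MAX(CASE WHEN Price BETWEEN 10 AND 100 THEN `Limit` ELSE 0 END) AS TenToHundred,
       MAX(CASE WHEN Price > 100 THEN `Limit` ELSE 0 END) AS HundredPlus
FROM YourTable
GROUP BY Day;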