Getting more data while converting data int to float and doing division and Multiplying with int? - sql

I have three columns as shown in below tableA
Student Day Shifts
129 11 4
91 9 6
166 19 8
164 26 12
146 11 6
147 16 8
201 8 3
164 4 2
186 8 6
165 7 4
171 10 4
104 5 4
1834 134 67
I am writing a tvf to calculate Value of Points generated for Students as below
ALTER function Statagic(
#StartDate date
)
RETURNS TABLE
AS
RETURN
(
with src as
( select
Division=case when Shifts=0 then 0 else cast(Day as float)/cast(Shifts as float) end,*
from TableA
)
,tgt as
(select *,Points=Student*Division from src
)
select * from tgt)
When i execute above tvf(select * from Statagic('3/16/2014'))
My output is below
129 11 4 2.75 354.75
91 9 6 1.5 136.5
166 19 8 2.375 394.25
164 26 12 2.16666666666667 355.333333333333
146 11 6 1.83333333333333 267.666666666667
147 16 8 2 294
201 8 3 2.66666666666667 536
164 4 2 2 328
186 8 6 1.33333333333333 248
165 7 4 1.75 288.75
171 10 4 2.5 427.5
104 5 4 1.25 130
1834 134 67 2 3668
Note :
If you see the last row for three columns in the table is the total of rest column.So when you see the last row in the Output of TVF for last two columns when i am adding i am not getting same data i am getting more.
Guys please help me i am struggling to fix this bug i tried in all ways but i am unable to fix it.
select 354.75+136.5+394.25+355.333333333333+267.666666666667+294+536+328+248+288.75+427.5+130=3760.750000000000
3668 is not euql to 3760.75(I am getting more 100 value)

Related

Preparing SQL Data for a CDF Plot

i have the following SQL that I would like to use to plot cumulative distribution plot but i can't seem to get the data right.
Sample Data:
token_Length
Frequency
1
6436
2
7489
3
3724
4
2440
5
667
6
396
7
264
8
215
9
117
10
90
11
61
12
29
13
69
15
40
18
45
How do i prepare this data to create a CDF plot in looker?
So that it looks like
token_Length
Frequency
cume_dist
1
6436
0.291459107
2
7489
0.630604112
3
3724
0.799248256
4
2440
0.909745494
5
667
0.939951091
6
396
0.95788425
7
264
0.969839688
8
215
0.979576125
9
117
0.984874558
10
90
0.988950276
11
61
0.991712707
12
29
0.993025994
13
69
0.996150711
15
40
0.997962141
18
45
1
I have tried a measure as follows:
measure: cume_dist {
type: number
sql: cume_dist() over (order by ${token_length} ASC);;
}
This generates SQL as:
SELECT
token_length,
COUNT(*) AS "count",
cume_dist() over (order by (token_length) ASC) AS "cume_dist"
FROM string_facts

Group repeating pattern in pandas Dataframe

so i have a Dataframe that has a repeating Number Series that i want to group like this:
Number Pattern
Value
Desired Group
Value.1
1
723
1
Max of Group
2
400
1
Max of Group
8
235
1
Max of Group
5
387
2
Max of Group
7
911
2
Max of Group
3
365
3
Max of Group
4
270
3
Max of Group
5
194
3
Max of Group
7
452
3
Max of Group
100
716
4
Max of Group
104
69
4
Max of Group
2
846
5
Max of Group
3
474
5
Max of Group
4
524
5
Max of Group
So essentially the number pattern is always monotonly increasing.
Any Ideas?
You can compare Number Pattern by 1 with cumulative sum by Series.cumsum and then is used GroupBy.transform with max:
df['Desired Group'] = df['Number Pattern'].eq(1).cumsum()
df['Value.1'] = df.groupby('Desired Group')['Value'].transform('max')
print (df)
Number Pattern Value Desired Group Value.1
0 1 723 1 723
1 2 400 1 723
2 3 235 1 723
3 1 387 2 911
4 2 911 2 911
5 1 365 3 452
6 2 270 3 452
7 3 194 3 452
8 4 452 3 452
9 1 716 4 716
10 2 69 4 716
11 1 846 5 846
12 2 474 5 846
13 3 524 5 846
For monotically increasing use:
df['Desired Group'] = (~df['Number Pattern'].diff().gt(0)).cumsum()

How to find the row associated with the min/max of a column?

So basically I have some simple SQL code that looks like the following;
SELECT
[Column1]
,[Column2]
,[Column3]
,[Column4]
,MIN([Column5]) AS maxColumn5
,MAX([Column6]) AS minColumn6
,SUM([Column7]) AS sumColumn7
,SUM([Column8]) AS sumColumn8
,SUM([Column9]) AS sumColumn9
FROM
[tableName]
GROUP BY
[Column1]
,[Column2]
,[Column3]
,[Column4]
What I am trying to do is also find the column either 'Column1', 'Column2', or 'Column3' that corresponds to the MIN([Column6]) and then the column that corresponds to MAX([Column8]).
The output should be exactly the same except there will be an extra 2 column at the end specifying which one the min and max are associated with.
I think there is a simple problem in your question, as Col1,Col2,Col3 that correspond to the max or min, are displayed directly, in other words you have them as you are grouping by Col1,Col2,Col3 & Col4.
As you did not provide some data, I will set some random data to prove my point.
Lets create a memory table similar to yours with 9 columns and fill it with random data for col6-8 with 10 rows for example, you can use the below:-
Declare #data Table(
Column1 int,Column2 int,Column3 int,Column4 int,Column5 int,Column6 int,Column7 int,Column8 int,Column9 int
)
declare #index int=5
while(#index>0)
begin
insert into #data values(1,2,3,4,RAND()*1000,RAND()*1000,RAND()*1000,RAND()*1000,RAND()*1000)
insert into #data values(5,6,7,8,RAND()*1000,RAND()*1000,RAND()*1000,RAND()*1000,RAND()*1000)
set #index=#index-1
end
we can see the data with the below
select * from #data order BY [Column1],[Column2],[Column3],[Column4]
Column1 Column2 Column3 Column4 Column5 Column6 Column7 Column8 Column9
1 2 3 4 669 203 278 364 577
1 2 3 4 389 316 290 548 661
1 2 3 4 835 555 942 985 604
1 2 3 4 477 743 580 305 414
1 2 3 4 431 296 471 150 352
1 2 3 4 346 220 573 941 633
1 2 3 4 392 450 652 978 883
1 2 3 4 235 479 751 136 978
1 2 3 4 906 183 141 915 783
1 2 3 4 329 342 682 977 870
5 6 7 8 218 740 41 299 816
5 6 7 8 800 630 674 888 799
5 6 7 8 27 307 446 743 345
5 6 7 8 501 928 824 592 691
5 6 7 8 439 624 260 757 547
5 6 7 8 287 610 287 708 652
5 6 7 8 441 711 433 642 343
5 6 7 8 751 928 237 53 535
5 6 7 8 594 768 708 173 33
5 6 7 8 352 703 943 867 661
now lets see the result of your grouping that you provided without any change
Col1 Col2 Col3 Col4 minCol5 maxCol6 maxCol8 sumCol7 sumCol8 sumCol9
1 2 3 4 235 743 985 5360 6299 6755
5 6 7 8 27 928 888 4853 5722 5422
so if we go back to your question, what is the value of Col1,Col2,Col3 for the maxCol6, well for each maxCol6 you have the values of Col1,Col2,Col3 & even Col4.
so what are the values for Col1,Col2,Col3 for maxCol16 that is 928, well they are 5,6 & 7.
ok, now lets say you want the record key that have that maxCol6, that is easy too, we would add an identity col as ID as below:-
Declare #data Table(
ID int identity(1,1), Column1 int,Column2 int,Column3 int,Column4 int,Column5 int,Column6 int,Column7 int,Column8 int,Column9 int
)
declare #index int=10
while(#index>0)
begin
insert into #data values(1,2,3,4,RAND()*1000,RAND()*1000,RAND()*1000,RAND()*1000,RAND()*1000)
insert into #data values(5,6,7,8,RAND()*1000,RAND()*1000,RAND()*1000,RAND()*1000,RAND()*1000)
set #index=#index-1
end
select * from #data order BY [Column1],[Column2],[Column3],[Column4]
;with agg as (
SELECT
[Column1]
,[Column2]
,[Column3]
,[Column4]
,MIN([Column5]) AS minColumn5
,MAX([Column6]) AS maxColumn6
,MAX([Column8]) AS maxColumn8
,SUM([Column7]) AS sumColumn7
,SUM([Column8]) AS sumColumn8
,SUM([Column9]) AS sumColumn9
FROM
#data [tableName]
GROUP BY
[Column1]
,[Column2]
,[Column3]
,[Column4]
)
--select * from agg order BY [Column1],[Column2],[Column3],[Column4]
select agg.*,maxCol6.ID [MaxCol6Seq],maxCol8.ID [MaxCol8Seq] from agg
inner join #data maxCol6
on agg.Column1=maxCol6.Column1
and agg.Column2=maxCol6.Column2
and agg.Column3=maxCol6.Column3
and agg.Column4=maxCol6.Column4
and agg.maxColumn6=maxCol6.Column6
inner join #data maxCol8
on agg.Column1=maxCol8.Column1
and agg.Column2=maxCol8.Column2
and agg.Column3=maxCol8.Column3
and agg.Column4=maxCol8.Column4
and agg.maxColumn8=maxCol8.Column8
As this is a new run for this set of data , below:-
ID Column1 Column2 Column3 Column4 Column5 Column6 Column7 Column8 Column9
1 1 2 3 4 201 848 993 50 304
3 1 2 3 4 497 207 644 399 104
5 1 2 3 4 445 321 822 151 185
7 1 2 3 4 611 402 620 61 543
9 1 2 3 4 460 409 182 915 211
11 1 2 3 4 886 804 180 213 282
13 1 2 3 4 614 709 932 806 162
15 1 2 3 4 795 752 110 474 463
17 1 2 3 4 737 545 77 648 727
19 1 2 3 4 788 862 266 464 851
20 5 6 7 8 218 561 943 572 54
18 5 6 7 8 741 621 610 214 536
16 5 6 7 8 579 248 374 693 761
14 5 6 7 8 866 415 198 528 657
12 5 6 7 8 905 947 500 50 387
10 5 6 7 8 492 860 948 299 220
8 5 6 7 8 861 328 727 40 327
6 5 6 7 8 435 534 707 769 777
4 5 6 7 8 587 68 45 184 614
2 5 6 7 8 189 24 289 121 772
The result is as below:-
C1 C2 C3 C4 minC5 maxC6 maxC8 sumC7 sumC8 sumC9 MaxCol6Seq MaxCol8Seq
1 2 3 4 201 862 915 4826 4181 3832 19 9
5 6 7 8 189 947 769 5341 3470 5105 12 6
Hope this helps.
If you just want a flag on each row specifying whether the value is the overall maximum or minimum, you can use window functions and CASE:
SELECT [Column1], [Column2], [Column3], [Column4],
MAX([Column5]) AS maxColumn5,
MIN([Column6]) AS minColumn6,
SUM([Column7]) AS sumColumn7,
SUM([Column8]) AS sumColumn8,
SUM([Column9]) AS sumColumn9,
(CASE WHEN MIN([Column6]) = MIN(MIN([Column6])) OVER () THEN 1 ELSE 0 END) as is_min_column6,
(CASE WHEN MAX([Column7]) = MAX(MAX([Column7])) OVER () THEN 1 ELSE 0 END) as is_max_column7
FROM [tableName]
GROUP BY [Column1], [Column2], [Column3], [Column4]

Oracle - Group By Creating Duplicate Rows

I have a query that looks like this:
select nvl(trim(a.code), 'Blanks') as Ward, count(b.apcasekey) as UNSP, count(c.apcasekey) as GRAPH,
count(d.apcasekey) as "ANI/PIG",
(count(b.apcasekey) + count(c.apcasekey) + count(d.apcasekey)) as "TOTAL ACTIVE",
count(a.apcasekey) as "TOTAL OPEN" from (etc...)
group by a.code
order by Ward
The reason I have nvl(trim(a.code), 'Blanks') as Ward is that sometimes a.code is a blank string, sometimes it's a null.
The problem is that when I use the Group By statement, I can't use Ward or I get the error
Ward: Invalid Identifier
I can only use a.code so I get 2 rows for 'Blanks', as per below
1 Blanks 7 0 0 7 7
2 Blanks 23 1 1 25 30
3 W01 75 4 0 79 91
4 W02 62 1 0 63 72
5 W03 140 2 0 142 162
6 W04 6 1 0 7 7
7 W05 46 0 1 47 48
8 W06 322 46 1 369 425
9 W07 91 0 1 92 108
10 W08 93 2 0 95 104
11 W09 28 1 0 29 30
12 W10 25 0 0 25 28
What I need, is for the row with 'Blanks' to combined into 1 row. Little help?
Thanks.
You can not use the alias in the GROUP BY, but you can use the expression that builds the value:
GROUP BY nvl(trim(a.code), 'Blanks')

how to update duplicate rows in a column to a new values

I will explain my problem briefly
have duplicate rino like below (actually this rino is the serial number in front end)
chqid rino branchid
----- ---- --------
876 6 2
14 6 2
18 10 2
828 10 2
829 11 2
19 11 2
830 12 2
20 12 2
78 40 2
1092 40 2
1094 41 2
79 41 2
413 43 2
1103 43 2
82 44 2
1104 44 2
1105 45 2
83 45 2
91 46 2
1106 46 2
here in my case I don't want to delete these duplicate rino instead of that I planned to update the rino having max date(column not specified in the above sample actually a date column is there) to the next rino number
what exactly I meant is :
if I sort out the above result according to the max(date) I will get
chqid rino branchid
----- ---- --------
876 6 2
828 10 2
19 11 2
830 12 2
1092 40 2
79 41 2
413 43 2
82 44 2
83 45 2
1106 46 2
(NOTE : total number of duplicate rows are 10 in branchid=2)
the last entered rino in the table for branchid=2 is 245
So I just want to update the 10 rows(Of column rino) with numbers starting from 246 to 255( just added 245+10 like this select lastno+ generate_series(1,10) nos from tab where cola=4 and branchid = 2 and vrid=20;)
Expected Output:
chqid rino branchid
----- ---- --------
876 246 2
828 247 2
19 248 2
830 249 2
1092 250 2
79 251 2
413 252 2
82 253 2
83 254 2
1106 255 2
using postgresql
Finally I found a solution, am using dynamic-sql to solve my issue
do
$$
declare
arow record;
begin
for arow in
select chqid,rino,branchid from (
select chqid,rino::int ,vrid,branchid ,row_number()over (partition by rino::int ) rn
from tab
where vrid =20
and branchid = 2)t
where rn >1
loop
execute format('
update tab
set rino=(select max(rino::int)+1 from gtab19 where acyrid=4 and branchid = 2 and vrid=20)
where chqid=%s
',arow.chqid);
end loop;
end;
$$;