How to find the row associated with the min/max of a column? - sql

So basically I have some simple SQL code that looks like the following;
SELECT
[Column1]
,[Column2]
,[Column3]
,[Column4]
,MIN([Column5]) AS maxColumn5
,MAX([Column6]) AS minColumn6
,SUM([Column7]) AS sumColumn7
,SUM([Column8]) AS sumColumn8
,SUM([Column9]) AS sumColumn9
FROM
[tableName]
GROUP BY
[Column1]
,[Column2]
,[Column3]
,[Column4]
What I am trying to do is also find the column either 'Column1', 'Column2', or 'Column3' that corresponds to the MIN([Column6]) and then the column that corresponds to MAX([Column8]).
The output should be exactly the same except there will be an extra 2 column at the end specifying which one the min and max are associated with.

I think there is a simple problem in your question, as Col1,Col2,Col3 that correspond to the max or min, are displayed directly, in other words you have them as you are grouping by Col1,Col2,Col3 & Col4.
As you did not provide some data, I will set some random data to prove my point.
Lets create a memory table similar to yours with 9 columns and fill it with random data for col6-8 with 10 rows for example, you can use the below:-
Declare #data Table(
Column1 int,Column2 int,Column3 int,Column4 int,Column5 int,Column6 int,Column7 int,Column8 int,Column9 int
)
declare #index int=5
while(#index>0)
begin
insert into #data values(1,2,3,4,RAND()*1000,RAND()*1000,RAND()*1000,RAND()*1000,RAND()*1000)
insert into #data values(5,6,7,8,RAND()*1000,RAND()*1000,RAND()*1000,RAND()*1000,RAND()*1000)
set #index=#index-1
end
we can see the data with the below
select * from #data order BY [Column1],[Column2],[Column3],[Column4]
Column1 Column2 Column3 Column4 Column5 Column6 Column7 Column8 Column9
1 2 3 4 669 203 278 364 577
1 2 3 4 389 316 290 548 661
1 2 3 4 835 555 942 985 604
1 2 3 4 477 743 580 305 414
1 2 3 4 431 296 471 150 352
1 2 3 4 346 220 573 941 633
1 2 3 4 392 450 652 978 883
1 2 3 4 235 479 751 136 978
1 2 3 4 906 183 141 915 783
1 2 3 4 329 342 682 977 870
5 6 7 8 218 740 41 299 816
5 6 7 8 800 630 674 888 799
5 6 7 8 27 307 446 743 345
5 6 7 8 501 928 824 592 691
5 6 7 8 439 624 260 757 547
5 6 7 8 287 610 287 708 652
5 6 7 8 441 711 433 642 343
5 6 7 8 751 928 237 53 535
5 6 7 8 594 768 708 173 33
5 6 7 8 352 703 943 867 661
now lets see the result of your grouping that you provided without any change
Col1 Col2 Col3 Col4 minCol5 maxCol6 maxCol8 sumCol7 sumCol8 sumCol9
1 2 3 4 235 743 985 5360 6299 6755
5 6 7 8 27 928 888 4853 5722 5422
so if we go back to your question, what is the value of Col1,Col2,Col3 for the maxCol6, well for each maxCol6 you have the values of Col1,Col2,Col3 & even Col4.
so what are the values for Col1,Col2,Col3 for maxCol16 that is 928, well they are 5,6 & 7.
ok, now lets say you want the record key that have that maxCol6, that is easy too, we would add an identity col as ID as below:-
Declare #data Table(
ID int identity(1,1), Column1 int,Column2 int,Column3 int,Column4 int,Column5 int,Column6 int,Column7 int,Column8 int,Column9 int
)
declare #index int=10
while(#index>0)
begin
insert into #data values(1,2,3,4,RAND()*1000,RAND()*1000,RAND()*1000,RAND()*1000,RAND()*1000)
insert into #data values(5,6,7,8,RAND()*1000,RAND()*1000,RAND()*1000,RAND()*1000,RAND()*1000)
set #index=#index-1
end
select * from #data order BY [Column1],[Column2],[Column3],[Column4]
;with agg as (
SELECT
[Column1]
,[Column2]
,[Column3]
,[Column4]
,MIN([Column5]) AS minColumn5
,MAX([Column6]) AS maxColumn6
,MAX([Column8]) AS maxColumn8
,SUM([Column7]) AS sumColumn7
,SUM([Column8]) AS sumColumn8
,SUM([Column9]) AS sumColumn9
FROM
#data [tableName]
GROUP BY
[Column1]
,[Column2]
,[Column3]
,[Column4]
)
--select * from agg order BY [Column1],[Column2],[Column3],[Column4]
select agg.*,maxCol6.ID [MaxCol6Seq],maxCol8.ID [MaxCol8Seq] from agg
inner join #data maxCol6
on agg.Column1=maxCol6.Column1
and agg.Column2=maxCol6.Column2
and agg.Column3=maxCol6.Column3
and agg.Column4=maxCol6.Column4
and agg.maxColumn6=maxCol6.Column6
inner join #data maxCol8
on agg.Column1=maxCol8.Column1
and agg.Column2=maxCol8.Column2
and agg.Column3=maxCol8.Column3
and agg.Column4=maxCol8.Column4
and agg.maxColumn8=maxCol8.Column8
As this is a new run for this set of data , below:-
ID Column1 Column2 Column3 Column4 Column5 Column6 Column7 Column8 Column9
1 1 2 3 4 201 848 993 50 304
3 1 2 3 4 497 207 644 399 104
5 1 2 3 4 445 321 822 151 185
7 1 2 3 4 611 402 620 61 543
9 1 2 3 4 460 409 182 915 211
11 1 2 3 4 886 804 180 213 282
13 1 2 3 4 614 709 932 806 162
15 1 2 3 4 795 752 110 474 463
17 1 2 3 4 737 545 77 648 727
19 1 2 3 4 788 862 266 464 851
20 5 6 7 8 218 561 943 572 54
18 5 6 7 8 741 621 610 214 536
16 5 6 7 8 579 248 374 693 761
14 5 6 7 8 866 415 198 528 657
12 5 6 7 8 905 947 500 50 387
10 5 6 7 8 492 860 948 299 220
8 5 6 7 8 861 328 727 40 327
6 5 6 7 8 435 534 707 769 777
4 5 6 7 8 587 68 45 184 614
2 5 6 7 8 189 24 289 121 772
The result is as below:-
C1 C2 C3 C4 minC5 maxC6 maxC8 sumC7 sumC8 sumC9 MaxCol6Seq MaxCol8Seq
1 2 3 4 201 862 915 4826 4181 3832 19 9
5 6 7 8 189 947 769 5341 3470 5105 12 6
Hope this helps.

If you just want a flag on each row specifying whether the value is the overall maximum or minimum, you can use window functions and CASE:
SELECT [Column1], [Column2], [Column3], [Column4],
MAX([Column5]) AS maxColumn5,
MIN([Column6]) AS minColumn6,
SUM([Column7]) AS sumColumn7,
SUM([Column8]) AS sumColumn8,
SUM([Column9]) AS sumColumn9,
(CASE WHEN MIN([Column6]) = MIN(MIN([Column6])) OVER () THEN 1 ELSE 0 END) as is_min_column6,
(CASE WHEN MAX([Column7]) = MAX(MAX([Column7])) OVER () THEN 1 ELSE 0 END) as is_max_column7
FROM [tableName]
GROUP BY [Column1], [Column2], [Column3], [Column4]

Related

Group repeating pattern in pandas Dataframe

so i have a Dataframe that has a repeating Number Series that i want to group like this:
Number Pattern
Value
Desired Group
Value.1
1
723
1
Max of Group
2
400
1
Max of Group
8
235
1
Max of Group
5
387
2
Max of Group
7
911
2
Max of Group
3
365
3
Max of Group
4
270
3
Max of Group
5
194
3
Max of Group
7
452
3
Max of Group
100
716
4
Max of Group
104
69
4
Max of Group
2
846
5
Max of Group
3
474
5
Max of Group
4
524
5
Max of Group
So essentially the number pattern is always monotonly increasing.
Any Ideas?
You can compare Number Pattern by 1 with cumulative sum by Series.cumsum and then is used GroupBy.transform with max:
df['Desired Group'] = df['Number Pattern'].eq(1).cumsum()
df['Value.1'] = df.groupby('Desired Group')['Value'].transform('max')
print (df)
Number Pattern Value Desired Group Value.1
0 1 723 1 723
1 2 400 1 723
2 3 235 1 723
3 1 387 2 911
4 2 911 2 911
5 1 365 3 452
6 2 270 3 452
7 3 194 3 452
8 4 452 3 452
9 1 716 4 716
10 2 69 4 716
11 1 846 5 846
12 2 474 5 846
13 3 524 5 846
For monotically increasing use:
df['Desired Group'] = (~df['Number Pattern'].diff().gt(0)).cumsum()

How to remove unwanted values in data when reading csv file

Reading Pina_Indian_Diabities.csv some of the values are strings, something like this
+AC0-5.4128147485
734 2
735 4
736 0
737 8
738 +AC0-5.4128147485
739 1
740 NaN
741 3
742 1
743 9
744 13
745 12
746 1
747 1
like in row 738, there re such values in other rows and columns as well.
How can I drop them?

Getting more data while converting data int to float and doing division and Multiplying with int?

I have three columns as shown in below tableA
Student Day Shifts
129 11 4
91 9 6
166 19 8
164 26 12
146 11 6
147 16 8
201 8 3
164 4 2
186 8 6
165 7 4
171 10 4
104 5 4
1834 134 67
I am writing a tvf to calculate Value of Points generated for Students as below
ALTER function Statagic(
#StartDate date
)
RETURNS TABLE
AS
RETURN
(
with src as
( select
Division=case when Shifts=0 then 0 else cast(Day as float)/cast(Shifts as float) end,*
from TableA
)
,tgt as
(select *,Points=Student*Division from src
)
select * from tgt)
When i execute above tvf(select * from Statagic('3/16/2014'))
My output is below
129 11 4 2.75 354.75
91 9 6 1.5 136.5
166 19 8 2.375 394.25
164 26 12 2.16666666666667 355.333333333333
146 11 6 1.83333333333333 267.666666666667
147 16 8 2 294
201 8 3 2.66666666666667 536
164 4 2 2 328
186 8 6 1.33333333333333 248
165 7 4 1.75 288.75
171 10 4 2.5 427.5
104 5 4 1.25 130
1834 134 67 2 3668
Note :
If you see the last row for three columns in the table is the total of rest column.So when you see the last row in the Output of TVF for last two columns when i am adding i am not getting same data i am getting more.
Guys please help me i am struggling to fix this bug i tried in all ways but i am unable to fix it.
select 354.75+136.5+394.25+355.333333333333+267.666666666667+294+536+328+248+288.75+427.5+130=3760.750000000000
3668 is not euql to 3760.75(I am getting more 100 value)

HSQLDB query to replace a null value with a value derived from another record

This is a small excerpt from a much larger table, call it LOG:
RN EID FID FRID TID TFAID
1 364 509 7045 null 7452
2 364 509 7045 7452 null
3 364 509 7045 7457 null
4 375 512 4525 5442 5241
5 375 513 4525 5863 5241
6 375 515 4525 2542 5241
7 576 621 5632 null 5452
8 576 621 5632 2595 null
9 672 622 5632 null 5966
10 672 622 5632 2635 null
I would like a query that will replace the null in the 'TFAID' column with the value from the 'TFAID' column from the 'FID' column that matches.
Desired output would therefore be:
RN EID FID FRID TID TFAID
1 364 509 7045 null 7452
2 364 509 7045 7452 7452
3 364 509 7045 7457 7452
4 375 512 4525 5442 5241
5 375 513 4525 5863 5241
6 375 515 4525 2542 5241
7 576 621 5632 null 5452
8 576 621 5632 2595 5452
9 672 622 5632 null 5966
10 672 622 5632 2635 5966
I know that something like
SELECT RN,
EID,
FID,
FRID,
TID,
(COALESCE TFAID, {insert clever code here}) AS TFAID
FROM LOG
is what I need, but I can't for the life of me come up with the clever bit of SQL that will fill in the proper TFAID.
HSQLDB supports SQL features that can be used as alternatives. These features are not supported by some other databases.
CREATE TABLE LOG (RN INT, EID INT, FID INT, FRID INT, TID INT, TFAID INT);
-- using LATERAL
SELECT l.RN, l.EID, l.FID, l.FRID, l.TID,
COALESCE(l.TFAID, f.TFAID) AS TFAID
FROM LOG l , LATERAL (SELECT MAX(TFAID) AS TFAID FROM LOG f WHERE f.FID = l.FID) f
-- using scalar subquery
SELECT l.RN, l.EID, l.FID, l.FRID, l.TID,
COALESCE(l.TFAID, (SELECT MAX(TFAID) AS TFAID FROM LOG f WHERE f.FID = l.FID)) AS TFAID
FROM LOG l
Here is one approach. This aggregates the log to get the value and then joins the result in:
SELECT l.RN, l.EID, l.FID, l.FRID, l.TID,
COALESCE(l.TFAID, f.TFAID) AS TFAID
FROM LOG l join
(select fid, max(tfaid) as tfaid
from log
group by fid
) f
on l.fid = f.fid;
There may be other approaches that are more efficient. However, HSQL doesn't implement all SQL features.

SQL Query: How to pull counts of two coulmns from respective tables

Given two tables:
1st Table Name: FACETS_Business_NPI_Provider
Buss_ID NPI Bussiness_Desc
11 222 Eleven 222
12 223 Twelve 223
13 224 Thirteen 224
14 225 Fourteen 225
11 226 Eleven 226
12 227 Tweleve 227
12 228 Tweleve 228
2nd Table : FACETS_PROVIDERs_Practitioners
NPI PRAC_NO PROV_NAME PRAC_NAME
222 943 P222 PR943
222 942 P222 PR942
223 931 P223 PR931
224 932 P224 PR932
224 933 P224 PR933
226 950 P226 PR950
227 951 P227 PR951
228 952 P228 PR952
228 953 P228 PR953
With below query I'm getting following results whereas it is expected to have the provider counts from table FACETS_Business_NPI_Provider (i.e. 3 instead of 4 for Buss_Id 12 and 2 instead of 3 for Buss_Id 11, etc).
SELECT BP.Buss_ID,
COUNT(BP.NPI) PROVIDER_COUNT,
COUNT(PP.PRAC_NO)PRACTITIONER_COUNT
FROM FACETS_Business_NPI_Provider BP
LEFT JOIN FACETS_PROVIDERs_Practitioners PP
ON PP.NOI=BP.NPI
group by BP.Buss_ID
Buss_ID PROVIDER_COUNT PRACTITIONER_COUNT
11 3 3
12 4 4
13 2 2
14 1 0
If I understood it correctly, you might want to add a DISTINCT clause to the columns.
Here is an SQL Fiddle, which we can probably use to discuss further.
http://sqlfiddle.com/#!2/d9a0e6/3