HSQLDB query to replace a null value with a value derived from another record

This is a small excerpt from a much larger table, call it LOG:
RN EID FID FRID TID TFAID
1 364 509 7045 null 7452
2 364 509 7045 7452 null
3 364 509 7045 7457 null
4 375 512 4525 5442 5241
5 375 513 4525 5863 5241
6 375 515 4525 2542 5241
7 576 621 5632 null 5452
8 576 621 5632 2595 null
9 672 622 5632 null 5966
10 672 622 5632 2635 null
I would like a query that replaces each null in the 'TFAID' column with the non-null 'TFAID' value from another row that has the same 'FID'.
Desired output would therefore be:
RN EID FID FRID TID TFAID
1 364 509 7045 null 7452
2 364 509 7045 7452 7452
3 364 509 7045 7457 7452
4 375 512 4525 5442 5241
5 375 513 4525 5863 5241
6 375 515 4525 2542 5241
7 576 621 5632 null 5452
8 576 621 5632 2595 5452
9 672 622 5632 null 5966
10 672 622 5632 2635 5966
I know that something like
SELECT RN,
EID,
FID,
FRID,
TID,
COALESCE(TFAID, {insert clever code here}) AS TFAID
FROM LOG
is what I need, but I can't for the life of me come up with the clever bit of SQL that will fill in the proper TFAID.

HSQLDB supports two SQL features that can be used as alternatives to joining an aggregated derived table: LATERAL derived tables and correlated scalar subqueries. Not every other database supports these features.
CREATE TABLE LOG (RN INT, EID INT, FID INT, FRID INT, TID INT, TFAID INT);
-- using LATERAL
SELECT l.RN, l.EID, l.FID, l.FRID, l.TID,
COALESCE(l.TFAID, f.TFAID) AS TFAID
FROM LOG l, LATERAL (SELECT MAX(x.TFAID) AS TFAID FROM LOG x WHERE x.FID = l.FID) f;
-- using scalar subquery
SELECT l.RN, l.EID, l.FID, l.FRID, l.TID,
COALESCE(l.TFAID, (SELECT MAX(x.TFAID) FROM LOG x WHERE x.FID = l.FID)) AS TFAID
FROM LOG l;
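The scalar-subquery version is portable enough to check outside HSQLDB. The sketch below runs it against an in-memory SQLite database through Python's built-in sqlite3 module, as an assumed stand-in engine (SQLite's dialect differs from HSQLDB's), using the sample rows from the question. MAX ignores NULLs, so it returns the one non-null TFAID per FID.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for HSQLDB (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE LOG (RN INT, EID INT, FID INT, FRID INT, TID INT, TFAID INT)")
rows = [
    (1, 364, 509, 7045, None, 7452),
    (2, 364, 509, 7045, 7452, None),
    (3, 364, 509, 7045, 7457, None),
    (4, 375, 512, 4525, 5442, 5241),
    (5, 375, 513, 4525, 5863, 5241),
    (6, 375, 515, 4525, 2542, 5241),
    (7, 576, 621, 5632, None, 5452),
    (8, 576, 621, 5632, 2595, None),
    (9, 672, 622, 5632, None, 5966),
    (10, 672, 622, 5632, 2635, None),
]
conn.executemany("INSERT INTO LOG VALUES (?, ?, ?, ?, ?, ?)", rows)

# Correlated scalar subquery: a NULL TFAID is replaced by the largest
# non-null TFAID among rows with the same FID.
query = """
SELECT l.RN, l.EID, l.FID, l.FRID, l.TID,
       COALESCE(l.TFAID,
                (SELECT MAX(x.TFAID) FROM LOG x WHERE x.FID = l.FID)) AS TFAID
FROM LOG l
ORDER BY l.RN
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```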

Here is one approach. It aggregates LOG by FID to get the value and then joins the result back in:
SELECT l.RN, l.EID, l.FID, l.FRID, l.TID,
COALESCE(l.TFAID, f.TFAID) AS TFAID
FROM LOG l join
(select fid, max(tfaid) as tfaid
from log
group by fid
) f
on l.fid = f.fid;
There may be other approaches that are more efficient. However, HSQLDB doesn't implement all SQL features.
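One such alternative, in engines that support window functions, is to compute the per-FID maximum with MAX(...) OVER (PARTITION BY FID) instead of a join. Whether your HSQLDB version accepts this is not confirmed here; the sketch below is an assumption-labeled illustration that runs it on SQLite (window functions need SQLite 3.25 or later) via Python's sqlite3, with the question's sample rows.

```python
import sqlite3

# SQLite stand-in (assumption); requires SQLite >= 3.25 for window functions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE LOG (RN INT, EID INT, FID INT, FRID INT, TID INT, TFAID INT)")
conn.executemany("INSERT INTO LOG VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 364, 509, 7045, None, 7452), (2, 364, 509, 7045, 7452, None),
    (3, 364, 509, 7045, 7457, None), (4, 375, 512, 4525, 5442, 5241),
    (5, 375, 513, 4525, 5863, 5241), (6, 375, 515, 4525, 2542, 5241),
    (7, 576, 621, 5632, None, 5452), (8, 576, 621, 5632, 2595, None),
    (9, 672, 622, 5632, None, 5966), (10, 672, 622, 5632, 2635, None),
])

# The window MAX is evaluated per FID partition without collapsing rows,
# so no join back to an aggregated derived table is needed.
query = """
SELECT RN, EID, FID, FRID, TID,
       COALESCE(TFAID, MAX(TFAID) OVER (PARTITION BY FID)) AS TFAID
FROM LOG
ORDER BY RN
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

This scans LOG once instead of joining it to an aggregate of itself; whether that is actually faster depends on the engine.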


Count number of unique occurrences of a key value corresponding to each ID column

I have a table in DB2 as below:
Key ID SubID
Abc123 576 10
Abc123 576 12
Abc124 576 13
Abc125 577 14
Abc126 578 15
Abc127 578 16
Abc128 578 17
I want to create an additional Count column that counts the number of distinct Key values for each ID. The output should be as below:
Key ID SubID Count
Abc123 576 10 2
Abc123 576 12 2
Abc124 576 13 2
Abc125 577 14 1
Abc126 578 15 3
Abc127 578 16 3
Abc128 578 17 3
I tried the following:
select Key, ID, SubId ,
count(Key) over (partition by Key) as count
from table
Appreciate any help!
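You cannot use a window function with the DISTINCT qualifier, so COUNT(DISTINCT Key) OVER (...) is not an option. (Note also that the count should be partitioned by ID, not by Key.) Instead, you can use a correlated scalar subquery to count the distinct keys for each ID.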
For example:
select t.*,
(select count(distinct x.key) from t x where x.id = t.id) as cnt
from t
Result:
KEY ID SUBID CNT
------- ---- ------ ---
Abc123 576 10 2
Abc123 576 12 2
Abc124 576 13 2
Abc125 577 14 1
Abc126 578 15 3
Abc127 578 16 3
Abc128 578 17 3
See running example at db<>fiddle.
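The same correlated subquery can be tried locally. This sketch uses Python's sqlite3 with an in-memory SQLite database as an assumed stand-in for DB2, loaded with the sample rows from the question; the table name t follows the answer.

```python
import sqlite3

# SQLite stand-in for DB2 (assumption); table name t as in the answer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (key TEXT, id INT, subid INT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    ("Abc123", 576, 10), ("Abc123", 576, 12), ("Abc124", 576, 13),
    ("Abc125", 577, 14), ("Abc126", 578, 15), ("Abc127", 578, 16),
    ("Abc128", 578, 17),
])

# For each row, count the distinct keys among all rows with the same id.
query = """
SELECT t.key, t.id, t.subid,
       (SELECT COUNT(DISTINCT x.key) FROM t x WHERE x.id = t.id) AS cnt
FROM t
ORDER BY t.subid
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```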

Get nearest date column value from another table in SQL Server

I have two tables, A and B:
Table A
PstngDate WorkingDayOutput
12/1/2020 221
12/3/2020 327
12/4/2020 509
12/5/2020 418
12/7/2020 390
12/8/2020 431
12/9/2020 244
12/10/2020 246
12/11/2020 314
12/12/2020 301
12/14/2020 411
12/15/2020 530
12/16/2020 554
12/17/2020 300
12/18/2020 375
12/23/2020 402
12/24/2020 302
12/25/2020 269
12/26/2020 382
12/28/2020 608
Table B
PstngDate HolidayOutput isWorkingDay
12/2/2020 20 0
12/6/2020 24 0
12/13/2020 31 0
12/19/2020 82 0
12/22/2020 507 0
12/27/2020 537 0
Expected output:
PstngDate WorkingDayOutput HolidayOutput
12/1/2020 221 20
12/3/2020 327
12/4/2020 509
12/5/2020 418 24
12/7/2020 390
12/8/2020 431
12/9/2020 244
12/10/2020 246
12/11/2020 314
12/12/2020 301 31
12/14/2020 411
12/15/2020 530
12/16/2020 554
12/17/2020 300
12/18/2020 375 589
12/23/2020 402
12/24/2020 302
12/25/2020 269
12/26/2020 382 537
12/28/2020 608
I want to join TableB to TableA on the nearest earlier date: each holiday's output should be attached to the latest working day before it. As the expected output shows, the 12/18 row's HolidayOutput is the sum of the 12/19 and 12/22 rows of Table B.
"I want to join TableB to TableA with nearest lesser date column"
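This sounds like a lateral join. For each row of A, sum the rows of B that fall after it but before the next date in A:
select a.*, h.HolidayOutput
from a
outer apply (
select sum(b.HolidayOutput) as HolidayOutput
from b
where b.PstngDate > a.PstngDate
and not exists (select 1 from a a2
where a2.PstngDate > a.PstngDate
and a2.PstngDate < b.PstngDate)
) h
order by a.PstngDate;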
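You can also use a join with ROW_NUMBER: pair each holiday with every earlier working day, keep only the nearest one, then aggregate and left join back to A:
Select a.PstngDate, a.WorkingDayOutput, h.HolidayOutput
From tablea a
Left join (
Select PstngDate, sum(HolidayOutput) as HolidayOutput
From (Select a2.PstngDate, b.HolidayOutput,
Row_number() over (partition by b.PstngDate order by a2.PstngDate desc) as rn
From tablea a2 join tableb b On a2.PstngDate < b.PstngDate) t
Where rn = 1
Group by PstngDate
) h On h.PstngDate = a.PstngDate
Order by a.PstngDate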
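The "nearest earlier working day" rule can be checked against the expected output. SQLite has no OUTER APPLY, so this sketch expresses the same per-row logic as a correlated scalar subquery, run through Python's sqlite3. It assumes dates are stored as ISO-8601 text (the question shows M/D/YYYY) so that string comparison follows date order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (PstngDate TEXT, WorkingDayOutput INT)")
conn.execute("CREATE TABLE b (PstngDate TEXT, HolidayOutput INT, isWorkingDay INT)")

# Sample data from the question; ISO-8601 text dates are an assumption.
a_rows = [(1, 221), (3, 327), (4, 509), (5, 418), (7, 390), (8, 431),
          (9, 244), (10, 246), (11, 314), (12, 301), (14, 411), (15, 530),
          (16, 554), (17, 300), (18, 375), (23, 402), (24, 302), (25, 269),
          (26, 382), (28, 608)]
b_rows = [(2, 20), (6, 24), (13, 31), (19, 82), (22, 507), (27, 537)]
conn.executemany("INSERT INTO a VALUES (?, ?)",
                 [(f"2020-12-{d:02d}", v) for d, v in a_rows])
conn.executemany("INSERT INTO b VALUES (?, ?, 0)",
                 [(f"2020-12-{d:02d}", v) for d, v in b_rows])

# For each working day, sum the holidays after it with no other working
# day in between, i.e. whose nearest earlier working day is this one.
query = """
SELECT a.PstngDate, a.WorkingDayOutput,
       (SELECT SUM(b.HolidayOutput)
        FROM b
        WHERE b.PstngDate > a.PstngDate
          AND NOT EXISTS (SELECT 1 FROM a AS a2
                          WHERE a2.PstngDate > a.PstngDate
                            AND a2.PstngDate < b.PstngDate)) AS HolidayOutput
FROM a
ORDER BY a.PstngDate
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Working days with no following holiday get NULL (None in Python), which matches the blank cells in the expected output.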

How to find the row associated with the min/max of a column?

So basically I have some simple SQL code that looks like the following:
SELECT
[Column1]
,[Column2]
,[Column3]
,[Column4]
,MIN([Column5]) AS minColumn5
,MAX([Column6]) AS maxColumn6
,SUM([Column7]) AS sumColumn7
,SUM([Column8]) AS sumColumn8
,SUM([Column9]) AS sumColumn9
FROM
[tableName]
GROUP BY
[Column1]
,[Column2]
,[Column3]
,[Column4]
What I am also trying to do is find the 'Column1', 'Column2', or 'Column3' value that corresponds to MIN([Column6]), and likewise the one that corresponds to MAX([Column8]).
The output should be exactly the same, except with two extra columns at the end specifying which one the min and the max are associated with.
I think there is a simple misunderstanding in your question: the Column1, Column2, and Column3 values that correspond to each max or min are already displayed, because you are grouping by Column1, Column2, Column3, and Column4.
Since you did not provide any data, I will generate some random data to prove my point.
Let's create a table variable similar to yours with 9 columns and fill Column5 to Column9 with random values, 10 rows in total:
Declare @data Table(
Column1 int,Column2 int,Column3 int,Column4 int,Column5 int,Column6 int,Column7 int,Column8 int,Column9 int
)
declare @index int=5
while(@index>0)
begin
insert into @data values(1,2,3,4,RAND()*1000,RAND()*1000,RAND()*1000,RAND()*1000,RAND()*1000)
insert into @data values(5,6,7,8,RAND()*1000,RAND()*1000,RAND()*1000,RAND()*1000,RAND()*1000)
set @index=@index-1
end
We can see the data with:
select * from #data order BY [Column1],[Column2],[Column3],[Column4]
Column1 Column2 Column3 Column4 Column5 Column6 Column7 Column8 Column9
1 2 3 4 669 203 278 364 577
1 2 3 4 389 316 290 548 661
1 2 3 4 835 555 942 985 604
1 2 3 4 477 743 580 305 414
1 2 3 4 431 296 471 150 352
1 2 3 4 346 220 573 941 633
1 2 3 4 392 450 652 978 883
1 2 3 4 235 479 751 136 978
1 2 3 4 906 183 141 915 783
1 2 3 4 329 342 682 977 870
5 6 7 8 218 740 41 299 816
5 6 7 8 800 630 674 888 799
5 6 7 8 27 307 446 743 345
5 6 7 8 501 928 824 592 691
5 6 7 8 439 624 260 757 547
5 6 7 8 287 610 287 708 652
5 6 7 8 441 711 433 642 343
5 6 7 8 751 928 237 53 535
5 6 7 8 594 768 708 173 33
5 6 7 8 352 703 943 867 661
Now let's see the result of your grouping, with MAX([Column8]) added:
Col1 Col2 Col3 Col4 minCol5 maxCol6 maxCol8 sumCol7 sumCol8 sumCol9
1 2 3 4 235 743 985 5360 6299 6755
5 6 7 8 27 928 888 4853 5722 5422
So, going back to your question: what are the values of Column1, Column2, and Column3 for maxCol6? Each row of the grouped result already shows them, along with Column4.
For example, the Column1, Column2, and Column3 values for the maxCol6 of 928 are 5, 6, and 7.
Now, let's say you want the key of the record that has that maxCol6. That is easy too: add an identity column as ID, then join the aggregates back to the table:
Declare @data Table(
ID int identity(1,1), Column1 int,Column2 int,Column3 int,Column4 int,Column5 int,Column6 int,Column7 int,Column8 int,Column9 int
)
declare @index int=10
while(@index>0)
begin
insert into @data values(1,2,3,4,RAND()*1000,RAND()*1000,RAND()*1000,RAND()*1000,RAND()*1000)
insert into @data values(5,6,7,8,RAND()*1000,RAND()*1000,RAND()*1000,RAND()*1000,RAND()*1000)
set @index=@index-1
end
select * from @data order BY [Column1],[Column2],[Column3],[Column4]
select * from #data order BY [Column1],[Column2],[Column3],[Column4]
;with agg as (
SELECT
[Column1]
,[Column2]
,[Column3]
,[Column4]
,MIN([Column5]) AS minColumn5
,MAX([Column6]) AS maxColumn6
,MAX([Column8]) AS maxColumn8
,SUM([Column7]) AS sumColumn7
,SUM([Column8]) AS sumColumn8
,SUM([Column9]) AS sumColumn9
FROM
@data [tableName]
GROUP BY
[Column1]
,[Column2]
,[Column3]
,[Column4]
)
--select * from agg order BY [Column1],[Column2],[Column3],[Column4]
select agg.*,maxCol6.ID [MaxCol6Seq],maxCol8.ID [MaxCol8Seq] from agg
inner join @data maxCol6
on agg.Column1=maxCol6.Column1
and agg.Column2=maxCol6.Column2
and agg.Column3=maxCol6.Column3
and agg.Column4=maxCol6.Column4
and agg.maxColumn6=maxCol6.Column6
inner join @data maxCol8
on agg.Column1=maxCol8.Column1
and agg.Column2=maxCol8.Column2
and agg.Column3=maxCol8.Column3
and agg.Column4=maxCol8.Column4
and agg.maxColumn8=maxCol8.Column8
Since this is a new run, here is the new set of data:
ID Column1 Column2 Column3 Column4 Column5 Column6 Column7 Column8 Column9
1 1 2 3 4 201 848 993 50 304
3 1 2 3 4 497 207 644 399 104
5 1 2 3 4 445 321 822 151 185
7 1 2 3 4 611 402 620 61 543
9 1 2 3 4 460 409 182 915 211
11 1 2 3 4 886 804 180 213 282
13 1 2 3 4 614 709 932 806 162
15 1 2 3 4 795 752 110 474 463
17 1 2 3 4 737 545 77 648 727
19 1 2 3 4 788 862 266 464 851
20 5 6 7 8 218 561 943 572 54
18 5 6 7 8 741 621 610 214 536
16 5 6 7 8 579 248 374 693 761
14 5 6 7 8 866 415 198 528 657
12 5 6 7 8 905 947 500 50 387
10 5 6 7 8 492 860 948 299 220
8 5 6 7 8 861 328 727 40 327
6 5 6 7 8 435 534 707 769 777
4 5 6 7 8 587 68 45 184 614
2 5 6 7 8 189 24 289 121 772
The result is:
C1 C2 C3 C4 minC5 maxC6 maxC8 sumC7 sumC8 sumC9 MaxCol6Seq MaxCol8Seq
1 2 3 4 201 862 915 4826 4181 3832 19 9
5 6 7 8 189 947 769 5341 3470 5105 12 6
Hope this helps.
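The join-back step can be reproduced with the fixed sample run shown above (IDs 1 to 20), which makes the expected IDs checkable. This sketch uses Python's sqlite3 with a regular table named data in place of the SQL Server table variable (an assumption), and only aggregates the two columns the join needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE data (ID INTEGER PRIMARY KEY, Column1 INT, Column2 INT,
    Column3 INT, Column4 INT, Column5 INT, Column6 INT, Column7 INT,
    Column8 INT, Column9 INT)""")
# (ID, Column5..Column9) for group (1,2,3,4) and group (5,6,7,8), from the run above.
group_a = [(1, 201, 848, 993, 50, 304), (3, 497, 207, 644, 399, 104),
           (5, 445, 321, 822, 151, 185), (7, 611, 402, 620, 61, 543),
           (9, 460, 409, 182, 915, 211), (11, 886, 804, 180, 213, 282),
           (13, 614, 709, 932, 806, 162), (15, 795, 752, 110, 474, 463),
           (17, 737, 545, 77, 648, 727), (19, 788, 862, 266, 464, 851)]
group_b = [(20, 218, 561, 943, 572, 54), (18, 741, 621, 610, 214, 536),
           (16, 579, 248, 374, 693, 761), (14, 866, 415, 198, 528, 657),
           (12, 905, 947, 500, 50, 387), (10, 492, 860, 948, 299, 220),
           (8, 861, 328, 727, 40, 327), (6, 435, 534, 707, 769, 777),
           (4, 587, 68, 45, 184, 614), (2, 189, 24, 289, 121, 772)]
rows = ([(r[0], 1, 2, 3, 4, *r[1:]) for r in group_a]
        + [(r[0], 5, 6, 7, 8, *r[1:]) for r in group_b])
conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)", rows)

# Aggregate per group, then join back on the group key plus the max value
# to recover the ID of the row that holds each max.
query = """
WITH agg AS (
    SELECT Column1, Column2, Column3, Column4,
           MAX(Column6) AS maxColumn6, MAX(Column8) AS maxColumn8
    FROM data
    GROUP BY Column1, Column2, Column3, Column4
)
SELECT agg.Column1, agg.maxColumn6, agg.maxColumn8,
       m6.ID AS MaxCol6Seq, m8.ID AS MaxCol8Seq
FROM agg
JOIN data m6 ON m6.Column1 = agg.Column1 AND m6.Column2 = agg.Column2
            AND m6.Column3 = agg.Column3 AND m6.Column4 = agg.Column4
            AND m6.Column6 = agg.maxColumn6
JOIN data m8 ON m8.Column1 = agg.Column1 AND m8.Column2 = agg.Column2
            AND m8.Column3 = agg.Column3 AND m8.Column4 = agg.Column4
            AND m8.Column8 = agg.maxColumn8
ORDER BY agg.Column1
"""
result = conn.execute(query).fetchall()
print(result)
```

If two rows in a group tie for the max, this join returns one output row per tied row.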
If you just want a flag on each row specifying whether the value is the overall maximum or minimum, you can use window functions and CASE:
SELECT [Column1], [Column2], [Column3], [Column4],
MAX([Column5]) AS maxColumn5,
MIN([Column6]) AS minColumn6,
SUM([Column7]) AS sumColumn7,
SUM([Column8]) AS sumColumn8,
SUM([Column9]) AS sumColumn9,
(CASE WHEN MIN([Column6]) = MIN(MIN([Column6])) OVER () THEN 1 ELSE 0 END) as is_min_column6,
(CASE WHEN MAX([Column7]) = MAX(MAX([Column7])) OVER () THEN 1 ELSE 0 END) as is_max_column7
FROM [tableName]
GROUP BY [Column1], [Column2], [Column3], [Column4]
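A minimal sketch of the flag idea, using Python's sqlite3 and a small hypothetical table t with only Column1, Column6, and Column7. Instead of nesting the aggregate inside the window function as above, it computes the grouped aggregates in a derived table and applies MIN/MAX OVER () to that, which is an equivalent formulation chosen here for portability, not the answer's exact query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (Column1 INT, Column6 INT, Column7 INT)")
# Hypothetical sample rows, three groups keyed by Column1.
conn.executemany("INSERT INTO t VALUES (?, ?, ?)",
                 [(1, 10, 5), (1, 3, 8), (2, 7, 20), (2, 9, 1), (3, 4, 6)])

# Group first, then compare each group's aggregate with the overall
# MIN/MAX taken over all grouped rows (empty OVER () = whole result).
query = """
SELECT Column1, minColumn6, maxColumn7,
       CASE WHEN minColumn6 = MIN(minColumn6) OVER () THEN 1 ELSE 0 END AS is_min_column6,
       CASE WHEN maxColumn7 = MAX(maxColumn7) OVER () THEN 1 ELSE 0 END AS is_max_column7
FROM (
    SELECT Column1, MIN(Column6) AS minColumn6, MAX(Column7) AS maxColumn7
    FROM t
    GROUP BY Column1
) AS g
ORDER BY Column1
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```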

SQL JOIN with 2 aggregates returning incorrect results

I am trying to join 3 different tables to get how many home runs a player has hit in his career, along with how many awards he has received. However, I'm getting incorrect results:
Peoples (PlayerId)
Battings (PlayerId, HomeRuns)
AwardsPlayers (PlayerId, AwardName)
Current Attempt
SELECT TOP 25 Peoples.PlayerId, SUM(Battings.HomeRuns) as HomeRuns, COUNT(AwardsPlayers.PlayerId) as AwardCount
FROM Peoples
JOIN Battings ON Battings.PlayerId = Peoples.PlayerId
JOIN AwardsPlayers ON AwardsPlayers.PlayerId = Battings.PlayerId
GROUP BY Peoples.PlayerId
ORDER BY SUM(HomeRuns) desc
Result
PlayerID HomeRuns AwardCount
bondsba01 35814 1034
ruthba01 23562 726
rodrial01 21576 682
mayswi01 21120 736
willite01 20319 741
griffke02 18270 667
schmimi01 18084 594
musiast01 16150 748
pujolal01 14559 414
dimagjo01 12996 468
ripkeca01 12499 609
gehrilo01 12325 425
aaronha01 12080 368
foxxji01 11748 462
ramirma02 10545 399
benchjo01 10114 442
sosasa01 9744 304
ortizda01 9738 360
piazzmi01 9394 396
winfida01 9300 460
rodriiv01 9019 667
robinfr02 8790 330
dawsoan01 8760 420
robinbr01 8576 736
hornsro01 8127 648
I am pretty confident it's my second join. Do I need some sort of subquery, or should this work? Barry Bonds definitely does not have 35,814 home runs, nor does he have 1,034 awards.
If I just do a single join, I get the correct output:
SELECT TOP 25 Peoples.PlayerId, SUM(Battings.HomeRuns) as HomeRuns
FROM Peoples
JOIN Battings ON Battings.PlayerId = Peoples.PlayerId
GROUP BY Peoples.PlayerId
ORDER BY SUM(HomeRuns) desc
bondsba01 762
aaronha01 755
ruthba01 714
rodrial01 696
mayswi01 660
pujolal01 633
griffke02 630
thomeji01 612
sosasa01 609
robinfr02 586
mcgwima01 583
killeha01 573
palmera01 569
jacksre01 563
ramirma02 555
schmimi01 548
ortizda01 541
mantlmi01 536
foxxji01 534
mccovwi01 521
thomafr04 521
willite01 521
bankser01 512
matheed01 512
ottme01 511
What am I doing wrong? I'm sure it's how I'm joining my second table (AwardsPlayers)
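I think you have two independent dimensions. A player's batting rows and award rows are unrelated, so joining both to Peoples produces one row for every (batting row, award) pair, which multiplies both the SUM and the COUNT. For example, 35,814 = 762 × 47: Bonds's real home-run total times his award count. The best approach is to aggregate before joining: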
SELECT TOP 25 p.PlayerId, b.HomeRuns, ap.cnt
FROM Peoples p LEFT JOIN
(SELECT b.PlayerId, SUM(b.HomeRuns) as HomeRuns
FROM Battings b
GROUP BY b.PlayerId
) b
ON b.PlayerId = p.PlayerId LEFT JOIN
(SELECT ap.PlayerId, COUNT(*) as cnt
FROM AwardsPlayers ap
GROUP BY ap.PlayerId
) ap
ON ap.PlayerId = p.PlayerId
ORDER BY b.HomeRuns desc;
Result
bondsba01 762 47
aaronha01 755 16
ruthba01 714 33
rodrial01 696 31
mayswi01 660 32
pujolal01 633 23
griffke02 630 29
thomeji01 612 6
sosasa01 609 16
robinfr02 586 15
mcgwima01 583 9
killeha01 573 8
palmera01 569 8
jacksre01 563 13
ramirma02 555 19
schmimi01 548 33
ortizda01 541 18
mantlmi01 536 15
foxxji01 534 22
mccovwi01 521 10
thomafr04 521 10
willite01 521 39
bankser01 512 10
matheed01 512 4
ottme01 511 11
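The fan-out effect is easy to see on a tiny hypothetical dataset. In this sketch (Python's sqlite3, with LIMIT in place of TOP since SQLite has no TOP), player p1 has two batting rows and three awards, so the double join produces 2 × 3 = 6 rows and inflates both aggregates, while aggregating each table first gives the real totals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Peoples (PlayerId TEXT);
CREATE TABLE Battings (PlayerId TEXT, HomeRuns INT);
CREATE TABLE AwardsPlayers (PlayerId TEXT, AwardName TEXT);
-- Hypothetical data: p1 has 2 batting rows and 3 awards; p2 has no awards.
INSERT INTO Peoples VALUES ('p1'), ('p2');
INSERT INTO Battings VALUES ('p1', 30), ('p1', 40), ('p2', 10);
INSERT INTO AwardsPlayers VALUES ('p1', 'A'), ('p1', 'B'), ('p1', 'C');
""")

# Joining both detail tables first: every batting row pairs with every award.
naive = conn.execute("""
SELECT p.PlayerId, SUM(b.HomeRuns), COUNT(ap.PlayerId)
FROM Peoples p
JOIN Battings b ON b.PlayerId = p.PlayerId
JOIN AwardsPlayers ap ON ap.PlayerId = b.PlayerId
GROUP BY p.PlayerId
ORDER BY p.PlayerId
""").fetchall()

# Aggregating each table before joining keeps one row per player per side.
fixed = conn.execute("""
SELECT p.PlayerId, b.HomeRuns, ap.cnt
FROM Peoples p
LEFT JOIN (SELECT PlayerId, SUM(HomeRuns) AS HomeRuns
           FROM Battings GROUP BY PlayerId) b ON b.PlayerId = p.PlayerId
LEFT JOIN (SELECT PlayerId, COUNT(*) AS cnt
           FROM AwardsPlayers GROUP BY PlayerId) ap ON ap.PlayerId = p.PlayerId
ORDER BY b.HomeRuns DESC
LIMIT 25
""").fetchall()

print(naive)
print(fixed)
```

Note that the inner joins in the naive query also drop p2 entirely, because p2 has no award rows; the LEFT JOINs keep p2 with a NULL award count.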

How to sort a sql result based on values in previous row?

I'm trying to sort the result of a SQL query based on values in the columns of the result set itself. The data looks like:
(This data is not sorted correctly, just an example)
ID projectID testName objectBefore objectAfter
=======================================================================================
13147 280 CDM-710 Generic TP-0000120 TOC~~#~~ -1 13148
1145 280 3.2 Quadrature/Carrier Null 25 Deg C 4940 1146
1146 280 3.2 Quadrature/Carrier Null 0 Deg C 1145 1147
1147 280 3.3 External Frequency Reference 1146 1148
1148 280 3.4 Phase Noise 50 Deg C 1147 1149
1149 280 3.4 Phase Noise 25 Deg C 1148 1150
1150 280 3.4 Phase Noise 0 Deg C 1149 1151
1151 280 3.5 Output Spurious 50 Deg C 1150 1152
1152 280 3.5 Output Spurious 25 Deg C 1151 1153
1153 280 3.5 Output Spurious 0 Deg C 1152 1154
............
18196 280 IP Regression Suite 18195 -1
The order of the data is determined by the objectBefore and objectAfter columns. The first row is always the one where objectBefore = -1, and the last row is the one where objectAfter = -1. In the above example, the second row would be ID 13148, because that is row 1's objectAfter value. Is there any way to write a query that orders the data this way?
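This is actually sorting a linked list. A recursive CTE can walk it from the head (objectBefore = -1), recording each row's depth as Level, and then order by that depth: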
WITH SortedList (Id, objectBefore , projectID, testName, Level)
AS
(
SELECT Id, objectBefore , projectID, testName, 0 as Level
FROM YourTable
WHERE objectBefore = -1
UNION ALL
SELECT ll.Id, ll.objectBefore , ll.projectID, ll.testName, Level+1 as Level
FROM YourTable ll
INNER JOIN SortedList as s
ON ll.objectBefore = s.Id
)
SELECT Id, objectBefore , projectID, testName
FROM SortedList
ORDER BY Level
You can find more details in this post
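A small runnable check of the recursive walk, using Python's sqlite3 and a hypothetical four-row list stored out of order. SQLite spells the construct WITH RECURSIVE; the table name YourTable follows the answer, and only the columns the walk needs are kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (Id INT, objectBefore INT, objectAfter INT)")
# Hypothetical linked list stored out of order: 3 -> 9 -> 7 -> 5.
conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?)",
                 [(5, 7, -1), (3, -1, 9), (9, 3, 7), (7, 9, 5)])

# Anchor on the head (objectBefore = -1), then repeatedly join the row
# whose objectBefore points at the previous row, incrementing Level.
query = """
WITH RECURSIVE SortedList (Id, objectBefore, Level) AS (
    SELECT Id, objectBefore, 0 FROM YourTable WHERE objectBefore = -1
    UNION ALL
    SELECT ll.Id, ll.objectBefore, s.Level + 1
    FROM YourTable ll
    JOIN SortedList s ON ll.objectBefore = s.Id
)
SELECT Id FROM SortedList ORDER BY Level
"""
ordered = [row[0] for row in conn.execute(query)]
print(ordered)
```

If several projects share the table, the anchor and the recursive step would also need to match on projectID; that filter is omitted in this sketch.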