I have a table named covid that looks something like this:
location | date | new_cases | total_deaths | new_deaths
----------------------------------------------------------------
Afghanistan 2020-04-07 38 7 0
Afghanistan 2020-04-08 30 11 4
Afghanistan 2020-04-09 56 14 3
Afghanistan 2020-04-10 61 15 1
Afghanistan 2020-04-11 37 15 0
Afghanistan 2020-04-12 34 18 3
In this case, I want to get, for each location, the row with max(new_cases). This is my query:
select a.*
from covid a
join (
select location, max(new_cases) highest_case
from covid
group by location
) b
on a.location = b.location
and a.new_cases = b.highest_case
But when the maximum new_cases value occurs on more than one date, I get several rows for the same location. This is the result:
location | date | new_cases | total_deaths | new_deaths
----------------------------------------------------------------
Bhutan 2020-06-08 11 0 0
Bolivia 2020-07-28 2382 2647 64
Bonaire Sint 2020-04-02 2 0 0
Bonaire Sint 2020-07-15 2 0 0
Botswana 2020-07-24 164 1 0
Now, how can I keep only the row with min(date) for each location? Please advise on how to fix this. The output should look like this:
location | date | new_cases | total_deaths | new_deaths
----------------------------------------------------------------
Bhutan 2020-06-08 11 0 0
Bolivia 2020-07-28 2382 2647 64
Bonaire Sint 2020-04-02 2 0 0
Botswana 2020-07-24 164 1 0
Use distinct on:
select distinct on (location) c.*
from covid c
order by location, new_cases desc;
To break ties on new_cases by the minimum date, add date as a second sort key:
order by location, new_cases desc, date asc;
You can use the window function max() to get the highest new_cases per location, and then number the matching rows by date to fetch the earliest one:
select location, date, new_cases, total_deaths, new_deaths
from (
    -- number the max-case rows per location by date, earliest first
    select row_number() over (partition by location order by date) as n,
           date, location, new_cases, total_deaths, new_deaths
    from (
        -- get max_case per location
        select location, date,
               max(new_cases) over (partition by location) as max_case,
               new_cases, total_deaths, new_deaths
        from covid
    ) X
    where new_cases = max_case -- keep only the max-case rows
) Y
where n = 1
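Both answers come down to ranking the rows within each location. Below is a minimal sketch of that idea in a single pass, run through Python's built-in sqlite3 module so it is self-contained. SQLite has no DISTINCT ON, so row_number() stands in for it here (this needs SQLite 3.25 or newer), and the rows marked "hypothetical" are made up for illustration and are not from the question's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table covid (location text, date text, new_cases int,
                    total_deaths int, new_deaths int);
insert into covid values
  ('Bhutan',       '2020-06-08', 11, 0, 0),
  ('Bhutan',       '2020-06-09',  3, 0, 0),  -- hypothetical row
  ('Bonaire Sint', '2020-04-02',  2, 0, 0),
  ('Bonaire Sint', '2020-05-01',  1, 0, 0),  -- hypothetical row
  ('Bonaire Sint', '2020-07-15',  2, 0, 0);
""")

# Rank rows per location: highest new_cases first, earliest date breaks ties.
rows = conn.execute("""
select location, date, new_cases, total_deaths, new_deaths
from (
    select c.*,
           row_number() over (partition by location
                              order by new_cases desc, date asc) as rn
    from covid c
) ranked
where rn = 1
order by location
""").fetchall()

for row in rows:
    print(row)
```

With this data, Bonaire Sint keeps only its 2020-04-02 row even though 2020-07-15 has the same new_cases value.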
I'm trying to join data from three type 2 slowly changing dimensions (SCD2). When I query the result, the rows are not ordered by date across the dimensions as expected.
I have the slowly changing dimensions below:
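Table Subsidiaries
id | name | subsidiary | department | start_date_dep | end_date_dep | last_record_flg
----------------------------------------------------------------------------------
1 | John Doe | AL | Engineering | 2005-10-01 | 2013-01-01 | 0
1 | John Doe | AL | Sales | 2013-01-01 | 2014-05-01 | 0
1 | John Doe | NY | Sales | 2014-05-01 | NULL | 1
38 | Ivy Johnson | NY | Sales | 2020-06-01 | NULL | 1
Table Functions
id | function | start_date_fun | end_date_fun | last_record_flg
----------------------------------------------------------------
1 | operator | 2005-10-01 | 2009-08-01 | 0
1 | leader | 2009-08-01 | 2011-10-01 | 0
1 | manager | 2011-10-01 | 2017-07-01 | 0
1 | director | 2017-07-01 | NULL | 1
38 | operator | 2020-06-01 | NULL | 1
Table Graduations
id | university_graduation | conclusion_date | last_record_flg
----------------------------------------------------------------
1 | bachelor | 2005-12-15 | 0
1 | master | 2008-12-15 | 1
38 | bachelor | 2014-12-15 | 1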
The desired result is:
id | name | subsidiary | department | start_date_dep | end_date_dep | last_record_flg | function | start_date_fun | end_date_fun | last_record_flg | university_graduation | conclusion_date | last_record_flg | max_date | seq | start | end | last_record_flg
----------------------------------------------------------------
1 | John Doe | AL | Engineering | 2005-10-01 | 2013-01-01 | 0 | operator | 2005-10-01 | 2009-08-01 | 0 | bachelor | 2005-12-15 | 0 | 2005-12-15 | 1 | 2005-10-01 | 2008-12-15 | 0
1 | John Doe | AL | Engineering | 2005-10-01 | 2013-01-01 | 0 | operator | 2005-10-01 | 2009-08-01 | 0 | master | 2008-12-15 | 1 | 2008-12-15 | 1 | 2008-12-15 | 2009-08-01 | 0
1 | John Doe | AL | Engineering | 2005-10-01 | 2013-01-01 | 0 | leader | 2009-08-01 | 2011-10-01 | 0 | master | 2008-12-15 | 1 | 2009-08-01 | 1 | 2009-08-01 | 2011-10-01 | 0
1 | John Doe | AL | Engineering | 2005-10-01 | 2013-01-01 | 0 | manager | 2011-10-01 | 2017-07-01 | 0 | master | 2008-12-15 | 1 | 2011-10-01 | 1 | 2011-10-01 | 2013-01-01 | 0
1 | John Doe | AL | Sales | 2013-01-01 | 2014-05-01 | 0 | manager | 2011-10-01 | 2017-07-01 | 0 | master | 2008-12-15 | 1 | 2013-01-01 | 1 | 2013-01-01 | 2014-05-01 | 0
1 | John Doe | NY | Sales | 2014-05-01 | NULL | 1 | manager | 2011-10-01 | 2017-07-01 | 0 | master | 2008-12-15 | 1 | 2014-05-01 | 1 | 2014-05-01 | 2017-07-01 | 0
1 | John Doe | NY | Sales | 2014-05-01 | NULL | 1 | director | 2017-07-01 | NULL | 1 | master | 2008-12-15 | 1 | 2017-07-01 | 1 | 2017-07-01 | NULL | 1
38 | Ivy Johnson | NY | Sales | 2020-06-01 | NULL | 1 | operator | 2020-06-01 | NULL | 1 | bachelor | 2014-12-15 | 1 | 2020-06-01 | 1 | 2020-06-01 | NULL | 1
I tried CROSS APPLY, but it returns only one row for each id. I'm now trying CASE WHEN, but the query output does not exactly match the desired result. In my output, the columns 'FUNCTION' and 'START_DATE_FUN' do not follow the order shown in the desired result, and the same happens for the columns 'UNIVERSITY_GRADUATION' and 'CONCLUSION_DATE'.
The query:
select
*
from(
select
tb.*
,row_number() over(partition by tb.id,tb.max_date order by tb.max_date) as seq
,tb.max_date as [start]
,lead( tb.max_date ) over( partition by tb.id order by tb.max_date ) as [end]
,case when lead( tb.max_date ) over( partition by tb.id order by tb.max_date ) is null then 1 else 0 end as last_record_flg
from(
select
sb.id
,sb.[name]
,sb.subsidiary
,sb.department
,sb.start_date_dep
,sb.end_date_dep
,sb.last_record_flg as lr_sb
,fc.[function]
,fc.start_date_fun
,fc.end_date_fun
,fc.last_record_flg as lr_fc
,gd.university_graduation
,gd.end_date_grad
,gd.last_record_flg as lr_gd
,case
when sb.start_date_dep >= fc.start_date_fun and sb.start_date_dep >= gd.end_date_grad then sb.start_date_dep
when fc.start_date_fun >= sb.start_date_dep and fc.start_date_fun >= gd.end_date_grad then fc.start_date_fun
else gd.end_date_grad
end as max_date
from
#Subsidiaries as sb
left outer join #Functions as fc
on sb.id = fc.id
left outer join #Graduations as gd
on sb.id = gd.id
) as tb
) as tb2
where
tb2.seq = 1
Below the DDL:
create table #Subsidiaries (
id int
,[name] varchar(15)
,subsidiary varchar(2)
,department varchar(15)
,start_date_dep date
,end_date_dep date
,last_record_flg bit
)
go
insert into #Subsidiaries values
(1,'John Doe','AL','Engineering','2005-10-01','2013-01-01',0),
(1,'John Doe','AL','Sales','2013-01-01','2014-05-01',0),
(1,'John Doe','NY','Sales','2014-05-01',null,1),
(38,'Ivy Johnson','NY','Sales','2020-06-01',null,1)
go
create table #Functions (
id int
,[function] varchar(15)
,start_date_fun date
,end_date_fun date
,last_record_flg bit
)
go
insert into #Functions values
(1,'operator','2005-10-01','2009-08-01',0),
(1,'leader','2009-08-01','2011-10-01',0),
(1,'manager','2011-10-01','2017-07-01',0),
(1,'director','2017-07-01',null,1),
(38,'operator','2020-06-01',null,1)
go
create table #Graduations (
id int
,university_graduation varchar(15)
,end_date_grad date
,last_record_flg bit
)
go
insert into #Graduations values
(1,'bachelor','2005-12-15',0),
(1,'master','2008-12-15',1),
(38,'bachelor','2014-12-15',1)
go
In case someone else has the same difficulty joining two or more SCD type 2 tables: I found a reference at https://sqlsunday.com/2014/11/30/joining-two-scd2-tables/ (SQL Sunday) that helped me build the query. It uses the range intervals in the join condition to return the desired result.
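Using range intervals in the join condition can be sketched roughly as follows: two rows match when their validity ranges overlap, and the combined row is valid from the later start to the earlier end. This is only an illustration for two of the three tables, run through Python's sqlite3 as a stand-in for SQL Server. The simplified column list, the name func (instead of [function]), and the '9999-12-31' sentinel for open-ended rows are assumptions of this sketch, not part of the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table subsidiaries (id int, subsidiary text, department text,
                           start_date_dep text, end_date_dep text);
insert into subsidiaries values
  (1,  'AL', 'Engineering', '2005-10-01', '2013-01-01'),
  (1,  'AL', 'Sales',       '2013-01-01', '2014-05-01'),
  (1,  'NY', 'Sales',       '2014-05-01', null),
  (38, 'NY', 'Sales',       '2020-06-01', null);

create table functions (id int, func text,
                        start_date_fun text, end_date_fun text);
insert into functions values
  (1,  'operator', '2005-10-01', '2009-08-01'),
  (1,  'leader',   '2009-08-01', '2011-10-01'),
  (1,  'manager',  '2011-10-01', '2017-07-01'),
  (1,  'director', '2017-07-01', null),
  (38, 'operator', '2020-06-01', null);
""")

OPEN_END = "9999-12-31"  # assumed sentinel for "still valid" (NULL end date)

# Join on overlapping ranges; the combined range is
# [later of the two starts, earlier of the two ends).
rows = conn.execute("""
select s.id, s.department, f.func,
       max(s.start_date_dep, f.start_date_fun) as range_start,
       nullif(min(coalesce(s.end_date_dep, :open_end),
                  coalesce(f.end_date_fun, :open_end)), :open_end) as range_end
from subsidiaries s
join functions f
  on f.id = s.id
 and s.start_date_dep < coalesce(f.end_date_fun, :open_end)
 and f.start_date_fun < coalesce(s.end_date_dep, :open_end)
order by s.id, range_start
""", {"open_end": OPEN_END}).fetchall()

for row in rows:
    print(row)
```

For id 1 this yields six consecutive ranges whose start and end values line up with the start and end columns of the desired result from 2009-08-01 onward. The graduations table could presumably be folded in the same way after turning each conclusion_date into a range.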
I have some data like this ↓
T_ID T_PERIOD T_COUNT T_SUM T_UPDATE_COUNT
1 2013-2014 3436 20118043 0
2 2014-2015 4298 27101356 0
3 2015-2016 5577 38844640 0
4 2016-2017 5764 40701339 0
5 2017-2018 6997 54316874 0
6 2018-2019 13315 151012820 0
7 2019-2020 13933 162731044 0
8 2018-2019 13300 150000000 1
9 2013-2014 3600 21000000 1
10 2018-2019 13500 155000000 2
This table only ever receives inserts. On each insert, T_UPDATE_COUNT = max(T_UPDATE_COUNT) + 1.
I want the data to look like this ↓
T_ID T_PERIOD T_COUNT T_SUM T_UPDATE_COUNT
9 2013-2014 3600 21000000 1
2 2014-2015 4298 27101356 0
3 2015-2016 5577 38844640 0
4 2016-2017 5764 40701339 0
5 2017-2018 6997 54316874 0
10 2018-2019 13500 155000000 2
7 2019-2020 13933 162731044 0
How do I write the SQL statement?
Assuming that you need to get the rows with the maximum value of T_UPDATE_COUNT for each T_PERIOD, you may try:
select T_ID, T_PERIOD, T_COUNT, T_SUM, T_UPDATE_COUNT
from
(
select T_ID, T_PERIOD, T_COUNT, T_SUM, T_UPDATE_COUNT,
row_number() over (partition by T_PERIOD order by T_UPDATE_COUNT desc) as RN
from yourData x
)
where RN = 1
There are different ways to do this; I believe this one is clear enough: the inner query is used to compute the row number in the set of all the rows with the same value of T_PERIOD (partition by T_PERIOD) and name it RN.
The outer query simply filters this result to keep only the first row (RN = 1) of each group.
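As one of the other ways to get the same rows, a correlated subquery can compare each row's T_UPDATE_COUNT with the maximum for its T_PERIOD. The sketch below uses Python's built-in sqlite3 only so that it can be run as-is; yourData is the placeholder table name from the query above, and the data is the sample from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""create table yourData (T_ID int, T_PERIOD text, T_COUNT int,
                                       T_SUM int, T_UPDATE_COUNT int)""")
conn.executemany("insert into yourData values (?, ?, ?, ?, ?)", [
    (1, "2013-2014", 3436, 20118043, 0),
    (2, "2014-2015", 4298, 27101356, 0),
    (3, "2015-2016", 5577, 38844640, 0),
    (4, "2016-2017", 5764, 40701339, 0),
    (5, "2017-2018", 6997, 54316874, 0),
    (6, "2018-2019", 13315, 151012820, 0),
    (7, "2019-2020", 13933, 162731044, 0),
    (8, "2018-2019", 13300, 150000000, 1),
    (9, "2013-2014", 3600, 21000000, 1),
    (10, "2018-2019", 13500, 155000000, 2),
])

# Keep a row only if its update count is the highest within its period.
rows = conn.execute("""
select d.T_ID, d.T_PERIOD, d.T_COUNT, d.T_SUM, d.T_UPDATE_COUNT
from yourData d
where d.T_UPDATE_COUNT = (select max(m.T_UPDATE_COUNT)
                          from yourData m
                          where m.T_PERIOD = d.T_PERIOD)
order by d.T_PERIOD
""").fetchall()

for row in rows:
    print(row)
```

Unlike row_number(), this variant returns every row that ties for the maximum, which only matters if the same T_UPDATE_COUNT can occur twice within one period.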
Good day community.
I'm having a hard time figuring out how to get the results I'm after. As I'm not very skilled with SQL queries, I'm starting to lose my mind. What I'm trying to do is find the highest and lowest grade on a particular test, but I also want to get the ID or the row number (they match) of the rows where the MAX() and MIN() were found.
The table "Results" looks like this:
ResultID|Test_UK|Test_US|TestUK_Scr|TestUS_Scr|TestTakenOn
1 1 3 85 14 2018-11-22 00:00:00.000
2 3 1 41 94 2018-11-23 00:00:00.000
3 2 4 71 54 2018-11-24 00:00:00.000
4 4 2 51 52 2018-12-25 00:00:00.000
5 6 3 74 69 2018-12-01 00:00:00.000
6 3 6 83 57 2018-12-02 00:00:00.000
7 7 4 91 98 2018-12-03 00:00:00.000
8 4 7 88 22 2018-12-04 00:00:00.000
9 5 8 41 76 2018-12-08 00:00:00.000
10 8 5 37 64 2018-12-09 00:00:00.000
The results I get when I run my query...
TestID|TopScore|LowScore|LastDateTestTaken
1 94 85 2018-11-23 00:00:00.000
2 71 52 2018-11-25 00:00:00.000
3 83 14 2018-12-02 00:00:00.000
4 98 51 2018-12-04 00:00:00.000
5 64 41 2018-12-09 00:00:00.000
6 74 57 2018-12-02 00:00:00.000
7 91 22 2018-12-04 00:00:00.000
8 76 37 2018-12-09 00:00:00.000
These are the queries I'm working on.
This query returns the results shown above:
WITH
-- Combine the results of UK and US tests
Combined_Results_Both_Tests AS(
select ResultID as resultID, Test_UK as TestID, Test_UK_Scr as TestScore, TestTakenOn as TestDate from Results
union all
select ResultID as resultID, Test_US as TestID, Test_US_Scr as TestScore, TestTakenOn as TestDate from Results),
--Gets TOP and WORST results of the tests, LastDateTaken (Needs to add ResultID!)
Get_Best_and_Worst_Results_And_LastTestDate AS(
SELECT TestID ,max(TestScore) AS TopScore ,min(TestScore) AS LowScore ,max(TestDate) AS LastDateTestTaken
FROM Combined_Results_Both_Tests
GROUP BY TestID)
--Final query execution
SELECT * FROM Get_Best_and_Worst_Results_And_LastTestDate
I've tried to achieve my desired results with something like this, which doesn't work and is also very inefficient. By "doesn't work" I mean that the output is filled with duplicates whenever a match is found on both the US and UK tests.
--Gets ResultID of Min and Max values
Get_ResultID_Of_Results AS(
SELECT * FROM Get_Best_and_Worst_Results_And_LastTestDate A
CROSS APPLY
(SELECT ResultID FROM Results res
WHERE (A.TestID = res.Test_UK AND A.TopScore = res.Test_UK_Scr) OR
(A.TestID = res.Test_US AND A.TopScore = res.Test_UK_Scr) OR
(A.TestID = res.Test_UK AND A.LowScore = res.Test_UK_Scr) OR
(A.TestID = res.Test_US AND A.LowScore = res.Test_UK_Scr) OR
(A.TestID = res.Test_UK AND A.TopScore = res.Test_US_Scr) OR
(A.TestID = res.Test_US AND A.TopScore = res.Test_US_Scr) OR
(A.TestID = res.Test_UK AND A.LowScore = res.Test_US_Scr) OR
(A.TestID = res.Test_US AND A.LowScore = res.Test_US_Scr)) D)
SELECT * FROM Get_ResultID_Of_Results
These are the results I'm trying to achieve: extra columns stating the ResultID from the Results table where the max value and the min value were found. Also, the row numbers match the ResultIDs in the table.
TestID|TopScore|LowScore|LastDateTestTaken |MaxValueLocID|MinValueLocID|
1 94 85 2018-11-23 00:00:00.000 2 1
2 71 52 2018-11-25 00:00:00.000 3 4
3 83 14 2018-12-02 00:00:00.000 6 1
4 98 51 2018-12-04 00:00:00.000 7 4
5 64 41 2018-12-09 00:00:00.000 10 9
6 74 57 2018-12-02 00:00:00.000 5 6
7 91 22 2018-12-04 00:00:00.000 7 8
8 76 37 2018-12-09 00:00:00.000 9 10
Asking for any help with the solution, theoretical or even practical. Thank you!
If I follow correctly, you want to unpivot the data and aggregate:
select v.testid, max(v.score), min(v.score), max(v.TestTakenOn)
from results r cross apply
(values (Test_UK, TestUK_Scr, TestTakenOn),
(Test_US, TestUS_Scr, TestTakenOn)
) v(testid, score, TestTakenOn)
group by v.testid;
Then you can modify this using window functions:
select v.testid, max(v.score), min(v.score), max(v.TestTakenOn),
max(case when seqnum_desc = 1 then resultid end) as resultid_max,
max(case when seqnum_asc = 1 then resultid end) as resultid_min
from (select r.resultid, v.*,
row_number() over (partition by v.testid order by v.score asc) as seqnum_asc,
row_number() over (partition by v.testid order by v.score desc) as seqnum_desc
from results r cross apply
(values (Test_UK, TestUK_Scr, TestTakenOn),
(Test_US, TestUS_Scr, TestTakenOn)
) v(testid, score, TestTakenOn)
) v
group by v.testid;
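To check the window-function approach against the sample data, here is a small sketch run through Python's built-in sqlite3 module. SQLite has no CROSS APPLY, so UNION ALL does the unpivoting instead; TestTakenOn is left out for brevity, and the extra ResultID sort key is an assumption added only to make ties deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table results (ResultID int, Test_UK int, Test_US int,
                      TestUK_Scr int, TestUS_Scr int);
insert into results values
  (1, 1, 3, 85, 14), (2, 3, 1, 41, 94), (3, 2, 4, 71, 54),
  (4, 4, 2, 51, 52), (5, 6, 3, 74, 69), (6, 3, 6, 83, 57),
  (7, 7, 4, 91, 98), (8, 4, 7, 88, 22), (9, 5, 8, 41, 76),
  (10, 8, 5, 37, 64);
""")

rows = conn.execute("""
with v as (
    -- unpivot: one row per (test, score), remembering the source ResultID
    select ResultID, Test_UK as testid, TestUK_Scr as score from results
    union all
    select ResultID, Test_US, TestUS_Scr from results
),
ranked as (
    select v.*,
           row_number() over (partition by testid
                              order by score desc, ResultID) as seqnum_desc,
           row_number() over (partition by testid
                              order by score asc, ResultID) as seqnum_asc
    from v
)
select testid,
       max(score) as TopScore,
       min(score) as LowScore,
       max(case when seqnum_desc = 1 then ResultID end) as MaxValueLocID,
       max(case when seqnum_asc = 1 then ResultID end) as MinValueLocID
from ranked
group by testid
order by testid
""").fetchall()

for row in rows:
    print(row)
```

On this data the MaxValueLocID and MinValueLocID columns match the desired output listed in the question.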
with allScores (TestId, Score, TestTakenOn, valueLoc) as
(
select [Test_UK], [TestUK_Scr],[TestTakenOn], ResultId from scores
union all
select [Test_US], [TestUS_Scr],[TestTakenOn], ResultId from scores
),
maxMin (TestId, MaxScore, MinScore, LastTestDate) as (
select TestId, Max(score), Min(score), Max(TestTakenOn)
from allScores
group by TestId
)
select mm.*, a1.valueLoc as MaxValueLoc, a2.ValueLoc as MinValueLoc
from maxMin mm
inner join allScores a1
on mm.TestId = a1.TestId and mm.MaxScore = a1.score
inner join allScores a2
on mm.TestId = a2.TestId and mm.MinScore = a2.score;
DBFiddle demo
I have a table called BOOK (memberId, ISBN, dateBorrowed)
For example:
isbn          | memberId | borrowed
--------------+----------+------------
9998-01-101-9 |          |
9998-01-101-9 |          |
9998-01-101-9 |          |
9998-01-101-9 | 1000     | 2018-10-02
9998-01-101-9 | 1010     | 2018-09-04
9998-01-101-9 | 1021     | 2018-09-14
9998-01-101-9 |          |
9998-01-101-9 | 1001     | 2018-10-02
I have to SELECT all dates on which the total count of borrowed books is larger than the average count per day over all days. How can I do it?
I have selected each date and how many times it occurs with:
SELECT borrowed, COUNT(*) AS dates
FROM BOOK
WHERE borrowed IS NOT NULL
GROUP BY borrowed;
Another query I wrote computes the average:
SELECT SUM(dates)/COUNT(borrowed) AS average
FROM (
SELECT borrowed, COUNT(*) AS dates
FROM BOOK
WHERE borrowed IS NOT NULL GROUP BY borrowed
) AS average;
Now, how do I combine these two queries into one clear query?
Window functions can help a lot here: https://www.postgresql.org/docs/current/static/tutorial-window.html
demo: db<>fiddle
My test data:
isbn borrowed
9998-01-101-1 2018-08-01
9998-01-101-2 2018-08-01
9998-01-101-3 2018-08-01
9998-01-101-4 2018-08-01
9998-01-101-5 2018-08-01
9998-01-101-1 2018-08-02
9998-01-101-2 2018-08-02
9998-01-101-3 2018-08-02
9998-01-101-4 2018-08-03
9998-01-101-5 2018-08-03
9998-01-101-1 2018-08-04
9998-01-101-2 2018-08-04
9998-01-101-3 2018-08-04
9998-01-101-4 2018-08-04
9998-01-101-5 2018-08-05
9998-01-101-1 2018-08-05
The query:
SELECT
*
FROM (
SELECT
*,
borrowed_all_time::decimal / COUNT(*) OVER () as avg_borrows_per_day -- D
FROM (
SELECT DISTINCT -- C
borrowed,
COUNT(*) OVER (PARTITION BY borrowed) as borrowed_on_day, -- A
COUNT(*) OVER () as borrowed_all_time -- B
FROM book
)s
)s
WHERE borrowed_on_day > avg_borrows_per_day -- E
A: This window function counts the rows per borrowed date.
B: This window function counts all rows, which equals the number of borrows of all time.
The result so far looks like this:
borrowed borrowed_on_day borrowed_all_time
2018-08-01 5 16
2018-08-01 5 16
2018-08-01 5 16
2018-08-01 5 16
2018-08-01 5 16
2018-08-02 3 16
2018-08-02 3 16
2018-08-02 3 16
2018-08-03 2 16
2018-08-03 2 16
2018-08-04 4 16
2018-08-04 4 16
2018-08-04 4 16
2018-08-04 4 16
2018-08-05 2 16
2018-08-05 2 16
C: Because we need no duplicates, we eliminate them with DISTINCT.
D: Counting all rows after eliminating the duplicates gives the number of distinct days. Dividing the borrows of all time by this number gives the average borrows per day. The decimal cast is necessary: it turns the integer division (16 / 5 == 3) into a decimal division (16 / 5 == 3.2).
E: Now we can filter for days where the borrows on that day > the average borrows per day.
The result:
borrowed
2018-08-01
2018-08-04
This looks a bit like homework, so window functions might be out of bounds.
SELECT *
FROM (
SELECT BOOK.*,
CAST(
COUNT(1) OVER
( PARTITION BY borrowed
) AS FLOAT) cntThatDay,
CAST(
SUM(1) OVER() AS FLOAT)/ CAST(
(SELECT COUNT(DISTINCT borrowed)
FROM BOOK
) AS FLOAT) AS totalAverage
FROM BOOK
WHERE borrowed IS NOT NULL
) TMP
WHERE cntThatDay > totalAverage;
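The two queries from the question can also be combined without any window functions, by putting the average into a HAVING clause. A minimal sketch, run through Python's built-in sqlite3 module; the per-day counts reproduce the test data shown earlier (5, 3, 2, 4 and 2 borrows), while the single repeated ISBN and the three unborrowed rows are placeholders added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table book (isbn text, borrowed text)")

# Borrows per day as in the test data above; ISBN values are placeholders.
per_day = {"2018-08-01": 5, "2018-08-02": 3, "2018-08-03": 2,
           "2018-08-04": 4, "2018-08-05": 2}
data = [("9998-01-101-9", day) for day, n in per_day.items() for _ in range(n)]
data += [("9998-01-101-9", None)] * 3  # copies that were never borrowed
conn.executemany("insert into book values (?, ?)", data)

# Count per day, then keep only the days above the overall average
# (total borrows divided by the number of distinct days).
rows = conn.execute("""
select borrowed, count(*) as borrowed_on_day
from book
where borrowed is not null
group by borrowed
having count(*) > (select count(*) * 1.0 / count(distinct borrowed)
                   from book
                   where borrowed is not null)
order by borrowed
""").fetchall()

for row in rows:
    print(row)
```

The multiplication by 1.0 plays the same role as the decimal cast: without it, 16 / 5 would be evaluated as integer division.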
I have the following query which shows the first 3 columns:
select
'Position Date' = todaypositiondate,
'Realized LtD SEK' = round(sum(realizedccy * spotsek), 0),
'Delta Realized SEK' = round(sum(realizedccy * spotsek) -
(SELECT sum(realizedccy*spotsek)
FROM t1
WHERE todaypositiondate = a.todaypositiondate - 1
GROUP BY todaypositiondate), 0)
FROM
t1 AS a
GROUP BY
todaypositiondate
ORDER BY
todaypositiondate DESC
Table:
Date | Realized | Delta | 5 day avg delta
-------------------------------------------------------------------
2016-09-08 | 696 981 323 | 90 526 | 336 611
2016-09-07 | 696 890 797 | 833 731 | 335 232
2016-09-06 | 696 057 066 | 85 576 | 84 467
2016-09-05 | 695 971 490 | 86 390 | 83 086
2016-09-04 | 695 885 100 | 81 434 | 80 849
2016-09-03 | 695 803 666 | 81 434 | 78 806
2016-09-02 | 695 722 231 | 79 679 | 74 500
2016-09-01 | 695 642 553 | 75 305 |
2016-08-31 | 695 567 248 | 68 515 |
How do I create the 5d average of delta realized?
Based on delta I tried the following but it did not work:
select
todaypositiondate,
'30d avg delta' = (select sum(realizedccy * spotsek)
from T1
where todaypositiondate between a.todaypositiondate and a.todaypositiondate -5
group by todaypositiondate)
from
T1 as a
group by
todaypositiondate
order by
todaypositiondate desc
Do not use single quotes for column names. Only use single quotes for string and date literals.
I would write this as:
with t as (
      select todaypositiondate as PositionDate,
             round(sum(realizedccy * spotsek), 0) as RealizedSEK
      from t1 a
      group by todaypositiondate
     )
select a.*,
       (a.RealizedSEK - a_prev.RealizedSEK) as diff_1,
       (a.RealizedSEK - a_prev5.RealizedSEK) / 5 as avg_diff_5
from t a outer apply
     (select top 1 a_prev.*
      from t a_prev
      where a_prev.PositionDate = a.PositionDate - 1
     ) a_prev outer apply
     (select top 1 a_prev.*
      from t a_prev
      where a_prev.PositionDate = a.PositionDate - 5
     ) a_prev5;
Note that the 5-day average difference is the most recent value minus the value from 5 days earlier, divided by 5: the five daily deltas in between add up to exactly that difference.
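The same arithmetic can be sketched with lag() instead of OUTER APPLY. The example below runs through Python's built-in sqlite3 module (SQLite 3.25 or newer) on the Realized values from the question's table. The table name daily and its columns are hypothetical, and the sketch assumes the data is already aggregated to exactly one row per calendar day with no gaps, since lag() counts rows rather than days.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table daily (position_date text, realized int)")
conn.executemany("insert into daily values (?, ?)", [
    ("2016-08-31", 695567248),
    ("2016-09-01", 695642553),
    ("2016-09-02", 695722231),
    ("2016-09-03", 695803666),
    ("2016-09-04", 695885100),
    ("2016-09-05", 695971490),
    ("2016-09-06", 696057066),
    ("2016-09-07", 696890797),
    ("2016-09-08", 696981323),
])

# delta: change versus the previous row.
# avg_delta_5: change versus five rows back, divided by 5, which equals
# the average of the last five deltas.
rows = conn.execute("""
select position_date,
       realized - lag(realized) over (order by position_date) as delta,
       (realized - lag(realized, 5) over (order by position_date)) / 5.0
           as avg_delta_5
from daily
order by position_date desc
""").fetchall()

for row in rows:
    print(row)
```

The first five dates have no value five rows back, so their avg_delta_5 is NULL.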
I already have that kind of formula when I calculate the delta between 2 dates.
It's like this:
Select todaypositiondate,
'D_RealizedSEK' = round(sum(realizedccy*spotsek) -
(SELECT sum(realizedccy*spotsek)
FROM T1
WHERE todaypositiondate = a.todaypositiondate - 1
GROUP BY todaypositiondate),0)
FROM T1 AS a
group by todaypositiondate
Instead of adding 5 formulas and just replacing -1 with -2, -3, and so on, I would like to find a way to select the average of realizedccy over the previous 5 days, that is, add the values together and divide by 5.