Can't think of a better title for this, and I've had a lot of problems coming up with a solution in SQL.
Basically, I have a table of events with dates and an associated entity, like the following:
event                    entity  date
er_visit                 bob     2020-01-01
triage                   bob     2020-01-01
admitted_to_icu          bob     2020-01-01
inpatient_bed_rest       bob     2020-01-02
inpatient_bed_rest       bob     2020-01-03
physical_therapy         bob     2020-01-03
hospital_discharge       bob     2020-01-11
physical_therapy         bob     2020-01-12
physical_therapy         bob     2020-01-13
inpatient_followup       bob     2020-02-01
inpatient_followup       bob     2021-02-11
pregnancy_checkup        alice   2020-01-01
admitted_maternity_ward  alice   2020-02-01
inpatient_birth          alice   2020-02-02
bed_rest                 alice   2020-02-02
bed_rest                 alice   2020-02-03
bed_rest                 alice   2020-02-04
hospital_discharge       alice   2021-02-04
I need to turn this into a list of events per entity, with the gaps between dates encoded as discrete intervals (say 1 day, 1 week, 1 month, 1 year) and any other gap encoded as a combination of those intervals. For example, 14 months and 15 days would be 1 year + 1 month + 1 month + 1 week + 1 week + 1 day.
event                    entity  seq
er_visit                 bob     1
triage                   bob     2
admitted_to_icu          bob     3
nextday                  bob     4
inpatient_bed_rest       bob     5
nextday                  bob     6
inpatient_bed_rest       bob     7
physical_therapy         bob     8
nextweek                 bob     9
nextday                  bob     10
hospital_discharge       bob     11
nextday                  bob     12
physical_therapy         bob     13
nextday                  bob     14
physical_therapy         bob     15
nextweek                 bob     16
nextweek                 bob     17
nextday                  bob     18
nextday                  bob     19
nextday                  bob     20
nextday                  bob     21
nextday                  bob     22
inpatient_followup       bob     23
nextyear                 bob     24
nextweek                 bob     25
nextday                  bob     26
nextday                  bob     27
nextday                  bob     28
inpatient_followup       bob     29
pregnancy_checkup        alice   1
nextmonth                alice   2
admitted_maternity_ward  alice   3
nextday                  alice   4
inpatient_birth          alice   5
bed_rest                 alice   6
nextday                  alice   7
bed_rest                 alice   8
nextday                  alice   9
bed_rest                 alice   10
hospital_discharge       alice   11
Is there a reasonable solution for this in Athena (Presto) SQL?
Done in Trino, but these functions should work the same way in Athena.
I've output the intervals as a number of days. Converting from there to your unusually formatted desired output is possible, but far more complicated than it is worth.
WITH the_table AS (
    SELECT event, entity, date(date) AS date
    FROM (VALUES
        ('er_visit', 'bob', '2020-01-01'), ('triage', 'bob', '2020-01-01'),
        ('admitted_to_icu', 'bob', '2020-01-01'), ('inpatient_bed_rest', 'bob', '2020-01-02'),
        ('inpatient_bed_rest', 'bob', '2020-01-03'), ('physical_therapy', 'bob', '2020-01-03'),
        ('hospital_discharge', 'bob', '2020-01-11'), ('physical_therapy', 'bob', '2020-01-12'),
        ('physical_therapy', 'bob', '2020-01-13'), ('inpatient_followup', 'bob', '2020-02-01'),
        ('inpatient_followup', 'bob', '2021-02-11'), ('pregnancy_checkup', 'alice', '2020-01-01'),
        ('admitted_maternity_ward', 'alice', '2020-02-01'), ('inpatient_birth', 'alice', '2020-02-02'),
        ('bed_rest', 'alice', '2020-02-02'), ('bed_rest', 'alice', '2020-02-03'),
        ('bed_rest', 'alice', '2020-02-04'), ('hospital_discharge', 'alice', '2021-02-04'))
    AS data(event, entity, date)
)
SELECT event
     , entity
     , date_diff('day', lag(date) OVER (PARTITION BY entity ORDER BY date), date) AS days_since_last
FROM the_table
event                    entity  days_since_last
pregnancy_checkup        alice   NULL
admitted_maternity_ward  alice   31
inpatient_birth          alice   1
bed_rest                 alice   0
bed_rest                 alice   1
bed_rest                 alice   1
hospital_discharge       alice   366
er_visit                 bob     NULL
triage                   bob     0
admitted_to_icu          bob     0
inpatient_bed_rest       bob     1
inpatient_bed_rest       bob     1
physical_therapy         bob     0
hospital_discharge       bob     8
physical_therapy         bob     1
physical_therapy         bob     1
inpatient_followup       bob     19
inpatient_followup       bob     376
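For the last step, turning each gap into the nextyear / nextmonth / nextweek / nextday rows from the question, it may be easier to post-process outside SQL. Here is a minimal Python sketch of one way to do it. It is not part of the Trino query above. The helper names (`add_months`, `gap_tokens`, `encode`) are made up for illustration, and it assumes calendar-based steps taken largest unit first, which is what the question's sample output appears to use for bob's last gap (1 year + 1 week + 3 days).

```python
import calendar
from datetime import date, timedelta
from itertools import groupby


def add_months(d, months):
    """Shift d by whole calendar months, clamping the day to the month end."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    day = min(d.day, calendar.monthrange(year, month0 + 1)[1])
    return date(year, month0 + 1, day)


# Units are tried from largest to smallest (an assumption about the encoding).
STEPS = [
    ("nextyear", lambda d: add_months(d, 12)),
    ("nextmonth", lambda d: add_months(d, 1)),
    ("nextweek", lambda d: d + timedelta(days=7)),
    ("nextday", lambda d: d + timedelta(days=1)),
]


def gap_tokens(start, end):
    """Greedily walk from start to end, emitting one token per step taken."""
    tokens, cur = [], start
    for name, step in STEPS:
        while step(cur) <= end:
            cur = step(cur)
            tokens.append(name)
    return tokens


def encode(rows):
    """rows: (event, entity, date) tuples, grouped by entity and sorted by date.

    Returns (event, entity, seq) tuples with the gap tokens interleaved."""
    out = []
    for entity, group in groupby(rows, key=lambda r: r[1]):
        seq, prev = 0, None
        for event, _, day in group:
            gap = gap_tokens(prev, day) if prev is not None else []
            for name in gap + [event]:
                seq += 1
                out.append((name, entity, seq))
            prev = day
    return out


rows = [
    ("physical_therapy", "bob", date(2020, 1, 13)),
    ("inpatient_followup", "bob", date(2020, 2, 1)),
    ("inpatient_followup", "bob", date(2021, 2, 11)),
]
for row in encode(rows):
    print(row)
```

Events on the same date produce no gap rows, so they simply get consecutive seq values. If you would rather use fixed lengths (365 days per year, 30 per month), integer division on `days_since_last` would be enough, but the split would then differ from the sample output around leap years.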
How can I find the effective end date for the table below using only a SELECT statement?
This is the actual table:
EMID ENAME DEPT_NO EFDT
101 ANUJ 10 1/1/2018
101 ANUJ 11 1/1/2020
101 ANUJ 12 5/1/2020
102 KUNAL 12 1/1/2019
102 KUNAL 14 1/1/2020
102 KUNAL 15 5/1/2020
103 AJAY 11 1/1/2018
103 AJAY 12 1/1/2020
104 RAJAT 10 1/1/2018
104 RAJAT 12 1/1/2020
This is desired output:
EMID ENAME DEPTNO EFDT EF_ENDT
101 ANUJ 10 1/1/2018 12/31/2019
101 ANUJ 11 1/1/2020 4/30/2020
101 ANUJ 12 5/1/2020 NULL
102 KUNAL 12 1/1/2019 12/31/2019
102 KUNAL 14 1/1/2020 4/30/2020
102 KUNAL 15 5/1/2020 NULL
103 AJAY 11 1/1/2018 12/31/2019
103 AJAY 12 1/1/2020 NULL
104 RAJAT 10 1/1/2018 12/31/2019
104 RAJAT 12 1/1/2020 NULL
EF_ENDT needs to be populated using only a SELECT statement.
How can we do this?
Ideally the code should be generic enough to work on any database.
Basically, you want a lead and then to subtract one day. The standard SQL for this is:
select t.*,
       lead(efdt) over (partition by emid order by efdt) - interval '1 day' as ef_endt
from t;
Date/time functions vary significantly among databases. All provide some method for subtracting one day. You'll probably have to adapt this to your particular (unstated) database.
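As one illustration of that kind of adaptation, here is a small sketch that runs the same idea on SQLite through Python's built-in `sqlite3` module. It assumes SQLite 3.25 or later (needed for window functions) and that `efdt` is stored as an ISO `YYYY-MM-DD` string rather than the `M/D/YYYY` form shown in the question; only part of the sample data is loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (emid INTEGER, ename TEXT, dept_no INTEGER, efdt TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (101, "ANUJ", 10, "2018-01-01"),
        (101, "ANUJ", 11, "2020-01-01"),
        (101, "ANUJ", 12, "2020-05-01"),
        (103, "AJAY", 11, "2018-01-01"),
        (103, "AJAY", 12, "2020-01-01"),
    ],
)

# SQLite has no INTERVAL literal; date(x, '-1 day') plays that role here.
# date(NULL, ...) is NULL, so the last row per employee keeps a NULL end date.
query = """
    SELECT emid, ename, dept_no, efdt,
           date(LEAD(efdt) OVER (PARTITION BY emid ORDER BY efdt), '-1 day') AS ef_endt
    FROM t
    ORDER BY emid, efdt
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Only the date arithmetic changes between databases; the `lead(...) over (partition by ... order by ...)` part stays the same wherever window functions are supported.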
I would like to get the output described below.
I have the following data:
id start end count Time Train
001 Paris London 01 05:00 Yes
001 Paris London 01 05:00 Yes
002 Prague Vienna 15 15:00 No
003 Frankfurt London 01 17:00 Yes
015 Paris London 08 21:00 No
019 Barcelona Vienna 15 15:00 No
003 Frankfurt London 01 07:00 Yes
002 Prague Vienna 15 05:00 No
I would like to find the sum of count, grouped by id, ignoring rows that have the same id, start and end. The data is about 4 GB, and I would like to find the start and end cities with the top 5 counts. Thank you.
I would like to get output similar to this:
Prague -> Vienna Count : 15
Barcelona -> Vienna count : 15
Paris --> london Count : 09
Frankfurt -> London Count: 02
.....
You can use drop_duplicates + groupby with aggregating sum:
df['count'] = df['count'].astype(int)
df = df.drop_duplicates(['id','start','end'])
print (df)
id start end count Time Train
0 001 Paris London 1 05:00 Yes
2 002 Prague Vienna 15 15:00 No
3 003 Frankfurt London 1 07:00 Yes
4 015 Paris London 8 21:00 No
5 019 Barcelona Vienna 15 15:00 No
df1 = df.groupby('id', as_index=False)['count'].sum()
print (df1)
id count
0 001 1
1 002 15
2 003 1
3 015 8
4 019 15
df11 = df.groupby(['id', 'start', 'end'], as_index=False)['count'].sum()
print (df11)
id start end count
0 001 Paris London 1
1 002 Prague Vienna 15
2 003 Frankfurt London 1
3 015 Paris London 8
4 019 Barcelona Vienna 15
df12 = df.groupby(['start', 'end'], as_index=False)['count'].sum()
print (df12)
start end count
0 Barcelona Vienna 15
1 Frankfurt London 1
2 Paris London 9
3 Prague Vienna 15
For top values use nlargest:
df2 = df.nlargest(5, 'count')[['start','end']]
print (df2)
start end
2 Prague Vienna
5 Barcelona Vienna
4 Paris London
0 Paris London
3 Frankfurt London
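Since the question mentions about 4 GB of data and asks for the top start/end pairs, here is a minimal sketch of the same steps in plain Python that reads rows one at a time instead of loading everything into a DataFrame. The function name `top_routes` and the dict-per-row input shape are assumptions for illustration. Note that it ranks routes after summing per (start, end), whereas `df.nlargest` on the de-duplicated frame ranks individual rows, which is why Paris/London shows up twice in the output above.

```python
from collections import Counter


def top_routes(rows, n=5):
    """rows: iterable of dicts with 'id', 'start', 'end' and 'count' keys."""
    seen = set()        # (id, start, end) keys that were already counted
    totals = Counter()  # summed count per (start, end) route
    for row in rows:
        key = (row["id"], row["start"], row["end"])
        if key in seen:
            continue    # ignore repeats of the same id/start/end
        seen.add(key)
        totals[(row["start"], row["end"])] += int(row["count"])
    return totals.most_common(n)


sample = [
    {"id": "001", "start": "Paris", "end": "London", "count": "01"},
    {"id": "001", "start": "Paris", "end": "London", "count": "01"},
    {"id": "002", "start": "Prague", "end": "Vienna", "count": "15"},
    {"id": "003", "start": "Frankfurt", "end": "London", "count": "01"},
    {"id": "015", "start": "Paris", "end": "London", "count": "08"},
    {"id": "019", "start": "Barcelona", "end": "Vienna", "count": "15"},
    {"id": "003", "start": "Frankfurt", "end": "London", "count": "01"},
    {"id": "002", "start": "Prague", "end": "Vienna", "count": "15"},
]
for (start, end), total in top_routes(sample):
    print(f"{start} -> {end} count: {total}")
```

For a real file you could pass a `csv.DictReader` as `rows`, provided the column names match. Memory use then depends on the number of distinct (id, start, end) keys rather than on the file size.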
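-- MySQL syntax: collapse rows sharing id/start/end, then sum count per route
SELECT T.`start`, T.`end`, SUM(T.cnt) AS total FROM
(
SELECT id, `start`, `end`, MAX(`count`) AS cnt FROM TABLE1 GROUP BY id, `start`, `end`
) T
GROUP BY T.`start`, T.`end` ORDER BY total DESC LIMIT 0,5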
This is my source table
Reference ModifiedDate Teachers Students SchoolID ETC
-------------------------------------------------------------------------
1023175 2017-03-03 16:02:01.723 10 25 5
1023175 2017-03-07 07:59:49.283 15 50 15
1023175 2017-03-12 11:14:40.230 25 6 5
1023176 2017-03-04 16:02:01.723 11 35 8
1023176 2017-03-08 07:59:49.283 16 60 25
1023177 2017-03-15 11:14:40.230 15 7 2
I need the following output
Reference StartDate EndDate
---------------------------------------------
1023175 2017-03-03 16:02:01.723 2017-03-07 07:59:49.283
1023175 2017-03-07 07:59:49.283 2017-03-12 11:14:40.230
1023175 2017-03-12 11:14:40.230 9999-12-31 00:00:00.000
1023176 2017-03-04 16:02:01.723 2017-03-08 07:59:49.283
1023176 2017-03-08 07:59:49.283 9999-12-31 00:00:00.000
1023177 2017-03-15 11:14:40.230 9999-12-31 00:00:00.000 (last record should have this value)
Teachers Students SchoolID
10 25 5
15 50 15
25 6 5
11 35 8
16 60 25
15 7 2
All other columns (Teachers, Students, SchoolID, etc.) also have to appear in the output along with each record.
Any suggestions on how this can be achieved?
I am using SQL Server 2008.
Using outer apply():
select
    Reference
  , StartDate = t.ModifiedDate
  , EndDate = coalesce(x.ModifiedDate, convert(datetime,'9999-12-31 00:00:00.000'))
  , Teachers
  , Students
  , SchoolID
from t
outer apply (
    select top 1 i.ModifiedDate
    from t as i
    where i.Reference = t.Reference
      and i.ModifiedDate > t.ModifiedDate
    order by i.ModifiedDate asc
) x
rextester demo: http://rextester.com/RFTD32624
returns:
+-----------+-------------------------+-------------------------+----------+----------+----------+
| Reference | StartDate | EndDate | Teachers | Students | SchoolID |
+-----------+-------------------------+-------------------------+----------+----------+----------+
| 1023175 | 2017-03-03 16:02:01.723 | 2017-03-07 07:59:49.283 | 10 | 25 | 5 |
| 1023175 | 2017-03-07 07:59:49.283 | 2017-03-12 11:14:40.230 | 15 | 50 | 15 |
| 1023175 | 2017-03-12 11:14:40.230 | 9999-12-31 00:00:00.000 | 25 | 6 | 5 |
| 1023176 | 2017-03-04 16:02:01.723 | 2017-03-08 07:59:49.283 | 11 | 35 | 8 |
| 1023176 | 2017-03-08 07:59:49.283 | 9999-12-31 00:00:00.000 | 16 | 60 | 25 |
| 1023177 | 2017-03-15 11:14:40.230 | 9999-12-31 00:00:00.000 | 15 | 7 | 2 |
+-----------+-------------------------+-------------------------+----------+----------+----------+
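If `outer apply` is not available in your database, a correlated scalar subquery can express the same "next ModifiedDate for the same Reference" lookup. The sketch below swaps in that technique and runs it on SQLite through Python's `sqlite3` module purely so it is self-contained. It is not the query above, it loads only part of the sample data, and it assumes the timestamps are stored as text in the format shown, which sorts correctly as a string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (Reference INTEGER, ModifiedDate TEXT, "
    "Teachers INTEGER, Students INTEGER, SchoolID INTEGER)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        (1023175, "2017-03-03 16:02:01.723", 10, 25, 5),
        (1023175, "2017-03-07 07:59:49.283", 15, 50, 15),
        (1023175, "2017-03-12 11:14:40.230", 25, 6, 5),
        (1023177, "2017-03-15 11:14:40.230", 15, 7, 2),
    ],
)

# For each row, take the earliest later ModifiedDate with the same Reference;
# when there is none, fall back to the 9999-12-31 sentinel.
query = """
    SELECT t.Reference,
           t.ModifiedDate AS StartDate,
           COALESCE(
               (SELECT MIN(i.ModifiedDate)
                FROM t AS i
                WHERE i.Reference = t.Reference
                  AND i.ModifiedDate > t.ModifiedDate),
               '9999-12-31 00:00:00.000'
           ) AS EndDate,
           t.Teachers, t.Students, t.SchoolID
    FROM t
    ORDER BY t.Reference, t.ModifiedDate
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The subquery returns a single value per row, so it cannot bring back several columns of the next record the way `outer apply` can, but for an end date alone it is enough.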
I have three tables in my database Books, Borrowers and Movement:
Books
BookID Title Author Category Published
----------- ------------------------------ ------------------------- --------------- ----------
101 Ulysses James Joyce Fiction 1922-06-16
102 Huckleberry Finn Mark Twain Fiction 1884-03-24
103 The Great Gatsby F. Scott Fitzgerald Fiction 1925-06-17
104 1984 George Orwell Fiction 1949-04-19
105 War and Peace Leo Tolstoy Fiction 1869-08-01
106 Gullivers Travels Jonathan Swift Fiction 1726-07-01
107 Moby Dick Herman Melville Fiction 1851-08-01
108 Pride and Prejudice Jane Austen Fiction 1813-08-13
110 The Second World War Winston Churchill NonFiction 1953-09-01
111 Relativity Albert Einstein NonFiction 1917-01-09
112 The Right Stuff Tom Wolfe NonFiction 1979-09-07
121 Hitchhikers Guide to Galaxy Douglas Adams Humour 1975-10-27
122 Dad Is Fat Jim Gaffigan Humour 2013-03-01
131 Kick-Ass 2 Mark Millar Comic 2012-03-03
133 Beautiful Creatures: The Manga Kami Garcia Comic 2014-07-01
Borrowers
BorrowerID Name Birthday
----------- ------------------------- ----------
2 Bugs Bunny 1938-09-08
3 Homer Simpson 1992-09-09
5 Mickey Mouse 1928-02-08
7 Fred Flintstone 1960-06-09
11 Charlie Brown 1965-06-05
13 Popeye 1933-03-03
17 Donald Duck 1937-07-27
19 Mr. Magoo 1949-09-14
23 George Jetson 1948-04-08
29 SpongeBob SquarePants 1984-08-04
31 Stewie Griffin 1971-11-17
Movement
MoveID BookID BorrowerID DateOut DateIn ReturnCondition
----------- ----------- ----------- ---------- ---------- ---------------
1 131 31 2012-06-01 2013-05-24 good
2 101 23 2012-02-10 2012-03-24 good
3 102 29 2012-02-01 2012-04-01 good
4 105 7 2012-03-23 2012-05-11 good
5 103 7 2012-03-22 2012-04-22 good
6 108 7 2012-01-23 2012-02-12 good
7 112 19 2012-01-12 2012-02-10 good
8 122 11 2012-04-14 2013-05-01 poor
9 106 17 2013-01-24 2013-02-01 good
10 104 2 2013-02-24 2013-03-10 bitten
11 121 3 2013-03-01 2013-04-01 good
12 131 19 2013-04-11 2013-05-23 good
13 111 5 2013-05-22 2013-06-22 poor
14 131 2 2013-06-12 2013-07-23 bitten
15 122 23 2013-07-10 2013-08-12 good
16 107 29 2014-01-01 2014-02-14 good
17 110 7 2014-01-11 2014-02-01 good
18 105 2 2014-02-22 2014-03-02 bitten
What is a query I can use to find out which book was borrowed by the oldest borrower?
I am new to SQL and am using Microsoft SQL Server 2014.
Here are two different solutions:
First, using two subqueries and one equi-join:
select Title
from Books b , Movement m
where b.BookID = m.BookID and m.BorrowerID = (select BorrowerID
from Borrowers
where Birthday = (select MIN(Birthday)
from Borrowers))
Second, using two equi-joins and one subquery:
select Title
from Books b, Borrowers r, Movement m
where b.BookID = m.BookID
and m.BorrowerID = r.BorrowerID
and Birthday = (select MIN(Birthday) from Borrowers)
Both queries above give the following result:
Title
------------------------------
Relativity
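The same lookup can also be written with explicit `JOIN ... ON` clauses instead of comma joins, which some people find easier to read. Here is a small self-contained sketch of that variant, run on SQLite through Python's `sqlite3` module only so it can be tried without a server. The tables are cut down to a few rows and columns from the question, and `Name` is added to the output just to show who the borrower is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Books (BookID INTEGER, Title TEXT);
    CREATE TABLE Borrowers (BorrowerID INTEGER, Name TEXT, Birthday TEXT);
    CREATE TABLE Movement (MoveID INTEGER, BookID INTEGER, BorrowerID INTEGER);
    INSERT INTO Books VALUES (104, '1984'), (111, 'Relativity'), (131, 'Kick-Ass 2');
    INSERT INTO Borrowers VALUES
        (2, 'Bugs Bunny', '1938-09-08'),
        (5, 'Mickey Mouse', '1928-02-08'),
        (13, 'Popeye', '1933-03-03');
    INSERT INTO Movement VALUES (10, 104, 2), (13, 111, 5), (14, 131, 2);
""")

# The oldest borrower is the one with the earliest Birthday.
query = """
    SELECT b.Title, r.Name
    FROM Movement AS m
    JOIN Books AS b ON b.BookID = m.BookID
    JOIN Borrowers AS r ON r.BorrowerID = m.BorrowerID
    WHERE r.Birthday = (SELECT MIN(Birthday) FROM Borrowers)
"""
result = conn.execute(query).fetchall()
print(result)
```

Because the filter compares `Birthday` to the minimum rather than comparing `BorrowerID` to a single-row subquery, this form also keeps working if two borrowers happen to share the earliest birthday.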