SQL : GROUP and MAX multiple columns - sql

I am a SQL beginner. Can anyone please help me with a SQL query?
My table looks like this:
PatientID Date Time Temperature
1 1/10/2020 9:15 36.2
1 1/10/2020 20:00 36.5
1 2/10/2020 8:15 36.1
1 2/10/2020 18:20 36.3
2 1/10/2020 9:15 36.7
2 1/10/2020 20:00 37.5
2 2/10/2020 8:15 37.1
2 2/10/2020 18:20 37.6
3 1/10/2020 8:15 36.2
3 2/10/2020 18:20 36.3
How can I get each patient's maximum temperature for each day?
PatientID Date Temperature
1 1/10/2020 36.5
1 2/10/2020 36.3
2 1/10/2020 37.5
2 2/10/2020 37.6
Thanks in advance!

For this dataset, simple aggregation seems sufficient:
select patientid, date, max(temperature) temperature
from mytable
group by patientid, date
On the other hand, if there are other columns that you want to display on the row that has the maximum daily temperature, then it is different. You need some filtering; one option uses window functions:
select *
from (
    select t.*,
           rank() over(partition by patientid, date order by temperature desc) as rn
    from mytable t
) t
where rn = 1
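To check both queries against the sample rows, here is a minimal sketch that loads the question's data into an in-memory SQLite database from Python. SQLite and the Python harness are assumptions made for illustration (the question does not name a database); the table name mytable comes from the answer, and the rank column is aliased as rn so that the outer filter can reference it.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the asker's database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table mytable (patientid int, date text, time text, temperature real)"
)
conn.executemany(
    "insert into mytable values (?, ?, ?, ?)",
    [
        (1, "1/10/2020", "9:15", 36.2), (1, "1/10/2020", "20:00", 36.5),
        (1, "2/10/2020", "8:15", 36.1), (1, "2/10/2020", "18:20", 36.3),
        (2, "1/10/2020", "9:15", 36.7), (2, "1/10/2020", "20:00", 37.5),
        (2, "2/10/2020", "8:15", 37.1), (2, "2/10/2020", "18:20", 37.6),
        (3, "1/10/2020", "8:15", 36.2), (3, "2/10/2020", "18:20", 36.3),
    ],
)

# Plain aggregation: one row per patient and day, maximum only.
agg = conn.execute("""
    select patientid, date, max(temperature) as temperature
    from mytable
    group by patientid, date
    order by patientid, date
""").fetchall()

# Window-function variant: keeps the other columns (here, time) of the
# row that holds the daily maximum.
ranked = conn.execute("""
    select patientid, date, time, temperature
    from (
        select t.*,
               rank() over (partition by patientid, date
                            order by temperature desc) as rn
        from mytable t
    ) t
    where rn = 1
    order by patientid, date
""").fetchall()

for row in ranked:
    print(row)
```

Because rank() assigns 1 to every tied row, a patient with two equal maximum readings on the same day would come back twice; row_number() would return only one of them.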

Related

Calculate duration between two rows T-Sql

Good afternoon! Could anyone help me to solve the task? I have a table:
Id Date Reason
1 2020-01-01 10:00 Departure
1 2020-01-01 12:20 Arrival
1 2020-01-02 14:30 Departure
1 2020-01-02 19:20 Arrival
1 2020-01-03 15:40 Departure
1 2020-01-04 19:20 Arrival
2 2020-02-03 15:40 Departure
2 2020-02-04 19:20 Arrival
3 2020-03-05 15:40 Departure
3 2020-03-05 19:20 Arrival
3 2020-03-06 16:28 Departure
3 2020-03-06 21:00 Arrival
I need to estimate the average duration for each Id. As a first step, I want to get a table like this, for example for Id = 1:
Id Duration (minutes)
1 140
1 290
1 1660
How can I achieve that with a T-SQL query?
Assuming the rows are perfectly interleaved, you can use lead():
select t.*,
       datediff(minute, date, next_date) as diff_minutes
from (select t.*,
             lead(date) over (partition by id order by date) as next_date
      from t
     ) t
where reason = 'Departure';
If you want the results for only one id, you can filter in either the subquery or the outer query.
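As a rough check of the lead() approach on the sample rows, the sketch below runs the same query shape in an in-memory SQLite database from Python. This is an assumption for illustration only: SQLite has no datediff(), so the minutes are derived from julianday() differences instead, and the per-Id average is computed in Python.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; dates are ISO-formatted text.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (id int, date text, reason text)")
conn.executemany(
    "insert into t values (?, ?, ?)",
    [
        (1, "2020-01-01 10:00", "Departure"), (1, "2020-01-01 12:20", "Arrival"),
        (1, "2020-01-02 14:30", "Departure"), (1, "2020-01-02 19:20", "Arrival"),
        (1, "2020-01-03 15:40", "Departure"), (1, "2020-01-04 19:20", "Arrival"),
        (2, "2020-02-03 15:40", "Departure"), (2, "2020-02-04 19:20", "Arrival"),
    ],
)

# lead() pairs each row with the next row of the same id; keeping only the
# Departure rows leaves one duration per Departure/Arrival pair.
durations = conn.execute("""
    select id,
           cast(round((julianday(next_date) - julianday(date)) * 1440)
                as integer) as diff_minutes
    from (select t.*,
                 lead(date) over (partition by id order by date) as next_date
          from t
         ) t
    where reason = 'Departure'
    order by id, date
""").fetchall()

# Average duration per id, computed on the fetched rows.
by_id = {}
for id_, minutes in durations:
    by_id.setdefault(id_, []).append(minutes)
averages = {id_: sum(v) / len(v) for id_, v in by_id.items()}

print(durations)
print(averages)
```

In T-SQL itself, the average could be taken by wrapping the answer's query in an outer select with avg(diff_minutes) and group by id.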

Count median days per ID between one zero and the first transaction after the last zero in a running balance

I have a running balance sheet showing customer balances after inflows and (outflows) by date. It looks something like this:
ID DATE AMOUNT RUNNING AMOUNT
-- ---------------- ------- --------------
10 27/06/2019 14:30 100 100
10 29/06/2019 15:26 -100 0
10 03/07/2019 01:56 83 83
10 04/07/2019 17:53 15 98
10 05/07/2019 15:09 -98 0
10 05/07/2019 15:53 98.98 98.98
10 05/07/2019 19:54 -98.98 0
10 07/07/2019 01:36 90.97 90.97
10 07/07/2019 13:02 -90.97 0
10 07/07/2019 16:32 39.88 39.88
10 08/07/2019 13:41 50 89.88
20 08/01/2019 09:03 890.97 890.97
20 09/01/2019 14:47 -91.09 799.88
20 09/01/2019 14:53 100 899.88
20 09/01/2019 14:59 -399 500.88
20 09/01/2019 18:24 311 811.88
20 09/01/2019 23:25 50 861.88
20 10/01/2019 16:18 -861.88 0
20 12/01/2019 16:46 894.49 894.49
20 25/01/2019 05:40 -871.05 23.44
I have attempted using lag() but I seem not to understand how to use it yet.
SELECT ID, MEDIAN(DIFF) MEDIAN_AGE
FROM
(
SELECT *, DATEDIFF(day, Lag(DATE, 1) OVER(ORDER BY ID), DATE
)AS DIFF
FROM TABLE 1
WHERE RUNNING AMOUNT = 0
)
GROUP BY ID;
The expected result would be:
ID MEDIAN_AGE
-- ----------
10 1
20 2
Please help in writing out the query that gives the expected result.
As already pointed out, you are using syntax that isn't valid for Oracle, including functions that don't exist and column names that aren't allowed.
You seem to want to calculate the number of days between a zero running-amount and the following non-zero running-amount; lead() is probably easier than lag() here, and you can use a case expression to only calculate it when needed:
select id, date_, amount, running_amount,
       case when running_amount = 0 then
           lead(date_) over (partition by id order by date_) - date_
       end as diff
from your_table;
ID DATE_ AMOUNT RUNNING_AMOUNT DIFF
---------- -------------------- ---------- -------------- ----------
10 2019-06-27 14:30:00 100 100
10 2019-06-29 15:26:00 -100 0 3.4375
10 2019-07-03 01:56:00 83 83
10 2019-07-04 17:53:00 15 98
10 2019-07-05 15:09:00 -98 0 .0305555556
10 2019-07-05 15:53:00 98.98 98.98
10 2019-07-05 19:54:00 -98.98 0 1.2375
10 2019-07-07 01:36:00 90.97 90.97
10 2019-07-07 13:02:00 -90.97 0 .145833333
10 2019-07-07 16:32:00 39.88 39.88
10 2019-07-08 13:41:00 50 89.88
20 2019-01-08 09:03:00 890.97 890.97
20 2019-01-09 14:47:00 -91.09 799.88
20 2019-01-09 14:53:00 100 899.88
20 2019-01-09 14:59:00 -399 500.88
20 2019-01-09 18:24:00 311 811.88
20 2019-01-09 23:25:00 50 861.88
20 2019-01-10 16:18:00 -861.88 0 2.01944444
20 2019-01-12 16:46:00 894.49 894.49
20 2019-01-25 05:40:00 -871.05 23.44
Then use the median() function, rounding if desired to get your expected result:
select id, median(diff) as median_age, round(median(diff)) as median_age_rounded
from (
    select id, date_, amount, running_amount,
           case when running_amount = 0 then
               lead(date_) over (partition by id order by date_) - date_
           end as diff
    from your_table
)
group by id;
ID MEDIAN_AGE MEDIAN_AGE_ROUNDED
---------- ---------- ------------------
10 .691666667 1
20 2.01944444 2
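The same two steps can be reproduced outside Oracle as a sanity check. The sketch below is an assumption for illustration: it uses an in-memory SQLite database from Python, takes the date difference via julianday() (SQLite cannot subtract dates directly), and computes the median with Python's statistics.median because SQLite has no built-in median() aggregate.

```python
import sqlite3
from statistics import median

# Assumption: SQLite stands in for Oracle; dates are ISO-formatted text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table your_table (id int, date_ text, amount real, running_amount real)"
)
conn.executemany(
    "insert into your_table values (?, ?, ?, ?)",
    [
        (10, "2019-06-27 14:30", 100, 100), (10, "2019-06-29 15:26", -100, 0),
        (10, "2019-07-03 01:56", 83, 83), (10, "2019-07-04 17:53", 15, 98),
        (10, "2019-07-05 15:09", -98, 0), (10, "2019-07-05 15:53", 98.98, 98.98),
        (10, "2019-07-05 19:54", -98.98, 0), (10, "2019-07-07 01:36", 90.97, 90.97),
        (10, "2019-07-07 13:02", -90.97, 0), (10, "2019-07-07 16:32", 39.88, 39.88),
        (10, "2019-07-08 13:41", 50, 89.88),
        (20, "2019-01-08 09:03", 890.97, 890.97), (20, "2019-01-09 14:47", -91.09, 799.88),
        (20, "2019-01-09 14:53", 100, 899.88), (20, "2019-01-09 14:59", -399, 500.88),
        (20, "2019-01-09 18:24", 311, 811.88), (20, "2019-01-09 23:25", 50, 861.88),
        (20, "2019-01-10 16:18", -861.88, 0), (20, "2019-01-12 16:46", 894.49, 894.49),
        (20, "2019-01-25 05:40", -871.05, 23.44),
    ],
)

# Days from each zero balance to the next transaction of the same id.
rows = conn.execute("""
    select id, diff
    from (
        select id, running_amount,
               julianday(lead(date_) over (partition by id order by date_))
                   - julianday(date_) as diff
        from your_table
    )
    where running_amount = 0 and diff is not null
""").fetchall()

# Median per id, computed in Python.
by_id = {}
for id_, diff in rows:
    by_id.setdefault(id_, []).append(diff)
medians = {id_: median(diffs) for id_, diffs in by_id.items()}

for id_, m in sorted(medians.items()):
    print(id_, m, round(m))
```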

Summarizing the date, count to minimum and maximum date as per interval - SQL

I have the following table:
City date count
Seattle 2016-07-14 10
Seattle 2016-07-15 20
Seattle 2016-07-16 30
Seattle 2016-07-18 40
Seattle 2016-07-19 50
Seattle 2016-07-20 60
Seattle 2016-07-25 70
Seattle 2016-07-26 80
Bellevue 2016-07-21 90
Bellevue 2016-07-22 100
Bellevue 2016-07-23 110
Bellevue 2016-07-25 120
Bellevue 2016-07-26 130
Bellevue 2016-07-27 140
Bellevue 2016-08-10 150
Bellevue 2016-08-11 160
Bellevue 2016-08-12 170
I want to summarize this table into date intervals where every row will contain each interval of date. Whenever there is a break in the days, I want to create another row. My sample output should be as follows,
City min_date max_date sum_count
Seattle 2016-07-14 2016-07-16 60
Seattle 2016-07-18 2016-07-20 150
Seattle 2016-07-25 2016-07-26 150
Bellevue 2016-07-21 2016-07-23 300
Bellevue 2016-07-25 2016-07-27 390
Bellevue 2016-08-10 2016-08-12 480
As you can see, whenever there is a break in the dates, a new row is created and the count is summed over that interval.
I tried,
select city, min(date), max(date) , sum(count) from table
group by city
but that gives only two rows here.
Can anybody help me in doing this in Hive?
This is a "gaps-and-islands" problem. Subtracting a per-city row number from the date gives a value that stays constant within each run of consecutive dates, so you can group by it:
select city, min(date), max(date), sum(count)
from (select t.*,
             row_number() over (partition by city order by date) as seqnum
      from t
     ) t
group by city, date_sub(date, seqnum);
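To see why the grouping key works, here is a minimal sketch that runs the same query on the sample data in an in-memory SQLite database from Python. SQLite is an assumption for illustration, not Hive: date_sub(date, seqnum) is replaced by SQLite's date(date, '-N days') modifier, and the count column is quoted to keep it distinct from the count() function.

```python
import sqlite3

# Assumption: SQLite stands in for Hive.
conn = sqlite3.connect(":memory:")
conn.execute('create table t (city text, date text, "count" int)')
conn.executemany(
    "insert into t values (?, ?, ?)",
    [
        ("Seattle", "2016-07-14", 10), ("Seattle", "2016-07-15", 20),
        ("Seattle", "2016-07-16", 30), ("Seattle", "2016-07-18", 40),
        ("Seattle", "2016-07-19", 50), ("Seattle", "2016-07-20", 60),
        ("Seattle", "2016-07-25", 70), ("Seattle", "2016-07-26", 80),
        ("Bellevue", "2016-07-21", 90), ("Bellevue", "2016-07-22", 100),
        ("Bellevue", "2016-07-23", 110), ("Bellevue", "2016-07-25", 120),
        ("Bellevue", "2016-07-26", 130), ("Bellevue", "2016-07-27", 140),
        ("Bellevue", "2016-08-10", 150), ("Bellevue", "2016-08-11", 160),
        ("Bellevue", "2016-08-12", 170),
    ],
)

# Within a run of consecutive dates, date minus row number is constant:
# e.g. Seattle 07-14, 07-15, 07-16 with seqnum 1, 2, 3 all map to 07-13.
islands = conn.execute("""
    select city, min(date) as min_date, max(date) as max_date,
           sum("count") as sum_count
    from (select t.*,
                 row_number() over (partition by city order by date) as seqnum
          from t
         ) t
    group by city, date(date, '-' || seqnum || ' days')
    order by city, min_date
""").fetchall()

for row in islands:
    print(row)
```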

DENSE_RANK() OVER (Order by UniqueIdentifer) issue

I'm struggling trying to get DENSE_RANK to do what I want it to do.
It is basically to create a unique invoice number based on a unique identifier, but it needs to go up in order based on the date/time of the invoice.
For example I need:
InvoiceNo TxnId TxnDate
1 6C952E91-B888-4244-9079-14FBECAE0BA2 01/01/2014 00:01
1 6C952E91-B888-4244-9079-14FBECAE0BA2 01/01/2014 00:02
1 6C952E91-B888-4244-9079-14FBECAE0BA2 01/01/2014 00:03
1 6C952E91-B888-4244-9079-14FBECAE0BA2 01/01/2014 00:04
1 6C952E91-B888-4244-9079-14FBECAE0BA2 01/01/2014 00:05
1 6C952E91-B888-4244-9079-14FBECAE0BA2 01/01/2014 00:06
1 6C952E91-B888-4244-9079-14FBECAE0BA2 01/01/2014 00:07
1 6C952E91-B888-4244-9079-14FBECAE0BA2 02/01/2014 00:08
2 8A5BCC36-8A70-4BE1-9FAB-A33BDD5BB78F 02/02/2014 00:09
2 8A5BCC36-8A70-4BE1-9FAB-A33BDD5BB78F 02/02/2014 00:09
3 83168B53-1647-4EB9-AF17-0B285EAA69B4 03/03/2014 00:10
3 83168B53-1647-4EB9-AF17-0B285EAA69B4 03/03/2014 00:20
3 83168B53-1647-4EB9-AF17-0B285EAA69B4 03/03/2014 00:21
3 83168B53-1647-4EB9-AF17-0B285EAA69B4 03/03/2014 00:23
But what I get when using DENSE_RANK OVER (Order by TxnId) is:
InvoiceNo TxnId TxnDate
1 6C952E91-B888-4244-9079-14FBECAE0BA2 01/01/2014 00:02
1 6C952E91-B888-4244-9079-14FBECAE0BA2 01/01/2014 00:01
1 6C952E91-B888-4244-9079-14FBECAE0BA2 01/01/2014 00:03
1 6C952E91-B888-4244-9079-14FBECAE0BA2 01/01/2014 00:04
1 6C952E91-B888-4244-9079-14FBECAE0BA2 01/01/2014 00:06
1 6C952E91-B888-4244-9079-14FBECAE0BA2 01/01/2014 00:05
1 6C952E91-B888-4244-9079-14FBECAE0BA2 02/01/2014 00:08
1 6C952E91-B888-4244-9079-14FBECAE0BA2 01/01/2014 00:07
2 83168B53-1647-4EB9-AF17-0B285EAA69B4 03/03/2014 00:10
2 83168B53-1647-4EB9-AF17-0B285EAA69B4 03/03/2014 00:21
2 83168B53-1647-4EB9-AF17-0B285EAA69B4 03/03/2014 00:20
2 83168B53-1647-4EB9-AF17-0B285EAA69B4 03/03/2014 00:23
3 8A5BCC36-8A70-4BE1-9FAB-A33BDD5BB78F 02/02/2014 00:09
3 8A5BCC36-8A70-4BE1-9FAB-A33BDD5BB78F 02/02/2014 00:09
If I do DENSE_RANK OVER(TxnId,TxnDate), it is a complete mess and doesn't do what I want either.
Any ideas, guys? Am I even using the right function to do this? Any help appreciated :)
I think you want:
select dense_rank() over (order by txnid, txndate)
Everything with the same transaction id and date will have the same value.
EDIT:
If you need to extract the date, then that depends on the database. It would look something like this. For Oracle:
select dense_rank() over (order by txnid, trunc(txndate))
For Postgres:
select dense_rank() over (order by txnid, date_trunc('day', txndate))
For SQL Server:
select dense_rank() over (order by txnid, cast(txndate as date))
EDIT II:
You want the transactions ordered by the earliest date. Get the earliest date and then do the dense_rank():
select dense_rank() over (order by txnmindate, txnid)
from (select t.*, min(txndate) over (partition by txnid) as txnmindate
      from table t
     ) t
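The difference between ranking by TxnId alone and ranking by each transaction's earliest date can be seen on a subset of the sample rows. The sketch below is an assumption for illustration: it uses an in-memory SQLite database from Python, a hypothetical table name txn (table itself is a reserved word), and ISO-formatted dates.

```python
import sqlite3

# Assumption: SQLite stands in for the asker's database; txn is a
# hypothetical table name and only a few sample rows are loaded.
conn = sqlite3.connect(":memory:")
conn.execute("create table txn (txnid text, txndate text)")
conn.executemany(
    "insert into txn values (?, ?)",
    [
        ("6C952E91-B888-4244-9079-14FBECAE0BA2", "2014-01-01 00:01"),
        ("6C952E91-B888-4244-9079-14FBECAE0BA2", "2014-01-01 00:02"),
        ("8A5BCC36-8A70-4BE1-9FAB-A33BDD5BB78F", "2014-02-02 00:09"),
        ("8A5BCC36-8A70-4BE1-9FAB-A33BDD5BB78F", "2014-02-02 00:09"),
        ("83168B53-1647-4EB9-AF17-0B285EAA69B4", "2014-03-03 00:10"),
        ("83168B53-1647-4EB9-AF17-0B285EAA69B4", "2014-03-03 00:20"),
    ],
)

# Ranking by txnid alone follows the identifier's sort order, not time.
by_id = conn.execute("""
    select distinct txnid, dense_rank() over (order by txnid) as invoiceno
    from txn
    order by invoiceno
""").fetchall()

# Ranking by the earliest date per txnid numbers invoices chronologically.
by_first_date = conn.execute("""
    select distinct txnid,
           dense_rank() over (order by txnmindate, txnid) as invoiceno
    from (select t.*, min(txndate) over (partition by txnid) as txnmindate
          from txn t
         ) t
    order by invoiceno
""").fetchall()

print([txnid[:8] for txnid, _ in by_id])
print([txnid[:8] for txnid, _ in by_first_date])
```

Keeping txnid as the second sort key means two transactions that share the same earliest date still receive different invoice numbers.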

how to find the date difference in hours between two records with nearest datetime value and it must be compared in same group

How to find the date difference in hours between two records with nearest datetime value and it must be compared in same group?
Sample Data as follows:
Select * from tblGroup
Group FinishedDatetime
1 03-01-2009 00:00
1 13-01-2009 22:00
1 08-01-2009 03:00
2 01-01-2009 10:00
2 13-01-2009 20:00
2 10-01-2009 10:00
3 27-10-2008 00:00
3 29-10-2008 00:00
Expected Output :
Group FinishedDatetime Hours
1 03-01-2009 00:00 123
1 13-01-2009 22:00 139
1 08-01-2009 03:00 117
2 01-01-2009 10:00 216
2 13-01-2009 20:00 82
2 10-01-2009 10:00 82
3 27-10-2008 00:00 48
3 29-10-2008 00:00 48
Try this:
SELECT t1.[Group], DATEDIFF(HOUR, z.FinishedDatetime, t1.FinishedDatetime)
FROM tblGroup t1
OUTER APPLY (SELECT TOP 1 *
             FROM tblGroup t2
             WHERE t2.[Group] = t1.[Group] AND t2.FinishedDatetime < t1.FinishedDatetime
             ORDER BY FinishedDatetime DESC) z
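The query above measures the gap to the previous row only, so the earliest row in each group gets NULL. If "nearest" means the closer of the previous and next rows, one possible variant uses lag() and lead() together. The sketch below is an assumption for illustration: it runs in an in-memory SQLite database from Python, renames the Group column to grp (hypothetical, to avoid the reserved word), uses ISO-formatted dates, and picks the smaller gap in Python.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; grp replaces the [Group] column.
conn = sqlite3.connect(":memory:")
conn.execute("create table tblGroup (grp int, finished text)")
conn.executemany(
    "insert into tblGroup values (?, ?)",
    [
        (1, "2009-01-03 00:00"), (1, "2009-01-13 22:00"), (1, "2009-01-08 03:00"),
        (2, "2009-01-01 10:00"), (2, "2009-01-13 20:00"), (2, "2009-01-10 10:00"),
        (3, "2008-10-27 00:00"), (3, "2008-10-29 00:00"),
    ],
)

# Hours to the previous and to the next row within the same group;
# either gap is NULL at the edge of a group.
rows = conn.execute("""
    select grp, finished,
           (julianday(finished) - julianday(prev_dt)) * 24 as prev_gap,
           (julianday(next_dt) - julianday(finished)) * 24 as next_gap
    from (select grp, finished,
                 lag(finished) over (partition by grp order by finished) as prev_dt,
                 lead(finished) over (partition by grp order by finished) as next_dt
          from tblGroup
         )
    order by grp, finished
""").fetchall()

# Keep the smaller of the two gaps; None if the group has a single row.
nearest = []
for grp, finished, prev_gap, next_gap in rows:
    gaps = [g for g in (prev_gap, next_gap) if g is not None]
    nearest.append((grp, finished, round(min(gaps)) if gaps else None))

for row in nearest:
    print(row)
```

On the sample data this yields 123 hours for the 08-01-2009 03:00 row, since it is 123 hours after 03-01-2009 00:00 and 139 hours before 13-01-2009 22:00, rather than the 117 listed in the question; the other rows agree with the expected output.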