Counting readmissions in PostgreSQL

I have a table containing data for a prison facility, in the following format:
Prisoner_id  admission_date  discharge_date
--------------------------------------------
1325         06/13/2014      09/13/2014
1266         05/01/2014      07/02/2014
1325         02/21/2015      07/23/2015
1471         02/26/2014      04/20/2014
1266         10/19/2014      12/22/2014
1325         10/09/2015      11/10/2015
I need to count the number of readmissions of each prisoner. A readmission is a new admission that starts less than 60 days after the same prisoner's previous discharge, that is, the difference between one stay's admission date and the preceding stay's discharge date is under 60 days.
This means that if the same prisoner has been admitted 2 times, we count this as 1 readmission if the difference between his second admission date and his first discharge date is less than 60 days.
Moreover, if a prisoner has been admitted 3 times, we count this as 2 readmissions if the difference between his third admission date and his second discharge date AND the difference between his second admission date and his first discharge date are both less than 60 days. If only one of them is less than 60 days, count it as 1 readmission. If neither is less than 60 days, count it as zero readmissions.
How can I do this in SQL or PostgreSQL? Your help is really appreciated.

I think you just want lag() and some query logic:
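The following query returns the number of readmissions per prisoner. The window needs an ORDER BY so that lag() really returns the previous stay's discharge date, and coalesce() turns the NULL sum for prisoners with a single stay into 0:
select t.prisoner_id,
       coalesce(sum((prev_dd > admission_date - interval '60 day')::int), 0) as num_readmissions
from (select t.*,
             lag(discharge_date) over (partition by prisoner_id order by admission_date) as prev_dd
      from t
     ) t
group by prisoner_id;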
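To check the readmission rule outside the database, here is a minimal Python sketch of the same logic the query expresses. It sorts each prisoner's stays by admission date, which is what the window's ORDER BY does. It pairs each admission with the previous discharge, which is what lag() returns. It then counts gaps under 60 days. The function name `count_readmissions` and the tuple layout are illustrative assumptions, not part of the original answer.

```python
from collections import defaultdict
from datetime import date


def count_readmissions(stays, max_gap_days=60):
    """Count readmissions per prisoner.

    stays: iterable of (prisoner_id, admission_date, discharge_date) tuples,
    a hypothetical in-memory stand-in for the table in the question.
    A stay counts as a readmission when it starts fewer than max_gap_days
    days after the same prisoner's previous discharge.
    """
    by_prisoner = defaultdict(list)
    for prisoner_id, admitted, discharged in stays:
        by_prisoner[prisoner_id].append((admitted, discharged))

    result = {}
    for prisoner_id, rows in by_prisoner.items():
        rows.sort()  # order by admission date, like ORDER BY in the window
        readmissions = 0
        # Pair each stay with the one before it (the role lag() plays in SQL).
        for (_, prev_discharge), (admitted, _) in zip(rows, rows[1:]):
            if (admitted - prev_discharge).days < max_gap_days:
                readmissions += 1
        result[prisoner_id] = readmissions  # a single stay gives 0
    return result


# Sample rows from the question (MM/DD/YYYY converted to date objects).
stays = [
    (1325, date(2014, 6, 13), date(2014, 9, 13)),
    (1266, date(2014, 5, 1), date(2014, 7, 2)),
    (1325, date(2015, 2, 21), date(2015, 7, 23)),
    (1471, date(2014, 2, 26), date(2014, 4, 20)),
    (1266, date(2014, 10, 19), date(2014, 12, 22)),
    (1325, date(2015, 10, 9), date(2015, 11, 10)),
]
print(count_readmissions(stays))  # {1325: 0, 1266: 0, 1471: 0}
```

For the sample rows every gap is 60 days or more, so each prisoner gets 0. The shortest gap is 78 days, between 07/23/2015 and 10/09/2015. The strict `<` mirrors the "less than 60 days" wording, so a gap of exactly 60 days is not a readmission.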

Related

How to calculate the time difference in SQL with DATEDIFF?

I am using the DATEDIFF function to calculate the difference between my two timestamps.
payment_time = 2021-10-29 07:06:32.097332
trigger_time = 2021-10-10 14:11:13
What I have written is: date_diff('minute', payment_time, trigger_time) <= 15
I basically want the count of users who paid within 15 minutes of the trigger time,
so I have also added count(s.user_id) as count.
However, it returns a count of 1 even for the pair above. The minute parts are within 15 of each other, but the dates (10 October and 29 October) are 19 days apart, so this row should not be counted and the result should be 0.
How do I compare the dates in my both columns and then count users who have paid within 15 mins?
This also works to calculate the minutes between two timestamps. It first finds the interval (by subtraction), then converts it to seconds (by extracting EPOCH), and finally divides by 60:
extract(epoch from (payment_time-trigger_time))/60
In PostgreSQL, I prefer to subtract the two timestamps from each other and extract the epoch from the resulting interval, like here:
WITH
indata(payment_time,trigger_time) AS (
SELECT TIMESTAMP '2021-10-29 07:06:32.097332',TIMESTAMP '2021-10-10 14:11:13'
UNION ALL SELECT TIMESTAMP '2021-10-29 00:00:14' ,TIMESTAMP '2021-10-29 00:00:00'
)
SELECT
  EXTRACT(EPOCH FROM payment_time - trigger_time) AS epdiff
  -- the epoch difference is in seconds, so 15 minutes is 15 * 60 seconds
, (EXTRACT(EPOCH FROM payment_time - trigger_time) <= 15 * 60) AS filter_matches
FROM indata;
-- out epdiff | filter_matches
-- out ----------------+----------------
-- out 1616119.097332 | false
-- out 14.000000 | true
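The same check can be sketched in Python to show why the units and the sign matter. The difference is converted to seconds and compared against 15 * 60. It is also required to be non-negative, so that a bare `<= 15`-style test cannot pass a payment recorded before its trigger, or a difference computed with the arguments swapped. The helper name `paid_within` is hypothetical. `timedelta.total_seconds()` stands in for PostgreSQL's `EXTRACT(EPOCH FROM ...)`.

```python
from datetime import datetime


def paid_within(payment_time, trigger_time, minutes=15):
    """True if the payment happened at or after the trigger and at most
    `minutes` later (hypothetical helper, not from the original answers)."""
    # Subtracting datetimes gives a timedelta; total_seconds() converts it
    # to seconds, like EXTRACT(EPOCH FROM ...) does for an interval.
    delta_seconds = (payment_time - trigger_time).total_seconds()
    return 0 <= delta_seconds <= minutes * 60


pairs = [
    # The two rows used in the PostgreSQL example: (payment_time, trigger_time).
    (datetime(2021, 10, 29, 7, 6, 32, 97332), datetime(2021, 10, 10, 14, 11, 13)),
    (datetime(2021, 10, 29, 0, 0, 14), datetime(2021, 10, 29, 0, 0, 0)),
]
matches = [paid_within(paid, triggered) for paid, triggered in pairs]
print(matches)       # [False, True]
print(sum(matches))  # 1, i.e. the number of qualifying users
```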

GROUP BY different dates for max and min numbers

I am trying to query this data set of hourly price data. The dataset defines daily prices from 12 AM to 12 AM UTC, but I am trying to define the days from 4 PM to 4 PM UTC. Therefore I need to get the high and the low prices for each day between, for example, '2021-12-15 16:00:00' and '2021-12-16 15:00:00', as those would be the open and close of the trading day.
I have this right now:
SELECT convert(date,dateadd(S, TimeStamp/1000, '1970-01-01')) as 'date'
,symbol
,Max([high]) as 'Max'
,Min([low]) as 'Min'
FROM [Crypto].[tblMessariPriceHistory]
WHERE symbol = 'DOGE'
and dateadd(S, TimeStamp/1000, '1970-01-01') between '2021-12-15 16:00:00' and '2021-12-16 15:00:00'
Group By convert(date,dateadd(S, TimeStamp/1000, '1970-01-01')),symbol
But the result looks like this:
date        symbol  Max                Min
2021-12-15  DOGE    0.175059052503167  0.170510833636204
2021-12-16  DOGE    0.180266282681554  0.177596458601872
I could just group by Symbol but I want to be able to do this over multiple days, and that wouldn't work.
Any ideas on how to define a select date range as a group or table over multiple days?
If you think about it, subtracting 16 h from every timestamp slides it back to a time within the "starting day":
Monday 16:00 becomes midnight Monday
Monday 23:59 becomes 7:59 Monday
Tuesday 00:00 becomes 8:00 Monday
Tuesday 15:59 becomes 23:59 Monday
Tuesday 16:00 becomes midnight Tuesday
Once you've slid your time back 16 h, you can chop the time part off by integer-dividing the Unix timestamp (in milliseconds) by the number of milliseconds in a day (86400000). All the trades between Monday 16:00 and Tuesday 15:59:59.999 will then go down as "Monday". If it were a DateTime, we could cast it to a Date to achieve the same thing.
It's handy to treat datetimes as decimal numbers whose integral part is the date and whose fractional part is the time, because truncating to an integer discards the time and allows daily aggregation. For hourly aggregation, divide by 3600000 (the number of milliseconds in an hour) instead, so the number represents hours and fractions of an hour, and truncate it the same way.
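--take off 16 h (57600000 ms), then truncate to whole days since the epoch (86400000 ms per day)
--assumes timestamp is an integer (bigint) column of Unix milliseconds, so / is integer division
SELECT
  (timestamp - 57600000) / 86400000 as trading_day,
  symbol,
  min(low) as minlow,
  max(high) as maxhigh
FROM trades
GROUP BY (timestamp - 57600000) / 86400000, symbol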
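The shift-and-truncate idea can be checked with a small Python sketch. It assumes timestamps are Unix milliseconds in UTC, as in the question. Python's `//` floors while SQL integer division truncates toward zero, but the two agree for post-1970 timestamps. The function names `trading_day`, `daily_high_low` and `utc_ms` are hypothetical helpers for illustration.

```python
from datetime import date, datetime, timedelta, timezone

MS_PER_HOUR = 3_600_000
MS_PER_DAY = 86_400_000
SHIFT_MS = 16 * MS_PER_HOUR  # 57600000: moves 16:00 UTC back to midnight


def trading_day(ts_ms):
    """Map Unix milliseconds to the 16:00-16:00 UTC trading day it belongs
    to, labelled by the calendar date on which that trading day opens."""
    days_since_epoch = (ts_ms - SHIFT_MS) // MS_PER_DAY
    return date(1970, 1, 1) + timedelta(days=days_since_epoch)


def daily_high_low(rows):
    """rows: iterable of (ts_ms, symbol, high, low), a hypothetical stand-in
    for the price table. Returns {(trading_day, symbol): (max_high, min_low)}."""
    out = {}
    for ts_ms, symbol, high, low in rows:
        key = (trading_day(ts_ms), symbol)
        if key in out:
            prev_high, prev_low = out[key]
            out[key] = (max(prev_high, high), min(prev_low, low))
        else:
            out[key] = (high, low)
    return out


def utc_ms(*parts):
    """Example helper: UTC calendar time -> Unix milliseconds."""
    return int(datetime(*parts, tzinfo=timezone.utc).timestamp()) * 1000


print(trading_day(utc_ms(2021, 12, 15, 16)))  # 2021-12-15 (opens the day)
print(trading_day(utc_ms(2021, 12, 16, 15)))  # 2021-12-15 (still the same day)
print(trading_day(utc_ms(2021, 12, 16, 16)))  # 2021-12-16 (next trading day)
```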

SQLite: Get monthly, weekly or daily average of all entries

I have a SQLite database of records in this format:
date        location  temperature
1568463916  room 1    20.0
1568463916  room 2    25.0
1568463916  room 3    30.0
...
1568460316  room 1    15.5
1568460316  room 2    20.5
1568460316  room 3    21.3
Every hour three new records get inserted, one for every room.
For a monthly average this output is desired:
month  avg_temperature  location
01     21.333           room 1
01     24.5             room 2
01     19.0             room 3
...
12     20.4             room 1
12     31.31            room 2
12     13.37            room 3
The same query might be reused to get weekly averages (day of week 0-6) and daily averages (hour 00-23).
To get a monthly average, I'm assuming I will select:
All records with date between now and now - 1 year
Month of every record with strftime('%m', date, 'unixepoch') as month
For every location, then for every month get avg(temperature)
The result is rooms*12 rows of average temperature of each room for each month
When I'm using the GROUP BY statement however, I'm only getting the last row of every month. What's the correct way to construct this kind of query?
This is the query I've tried:
SELECT strftime("%m", date, "unixepoch") month,
avg(temperature) avg_temperature,
location
FROM table
WHERE date > date("now", "unixepoch", "-1 year")
AND date < date("now", "unixepoch")
GROUP BY location, month
ORDER BY month
This should do what you want:
select date(datetime(date, 'unixepoch'), 'start of month') as month,
       location,
       avg(temperature) as avg_temperature
from t
group by date(datetime(date, 'unixepoch'), 'start of month'),
         location
order by month, location;
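Here is a minimal runnable check of the corrected query using Python's built-in `sqlite3` module and the sample rows from the question. The table name `t` matches the answer. The weekday (`%w`) and hour-of-day (`%H`) bucket expressions are assumptions showing how the same query shape could be reused for the weekly and daily averages the question mentions. No "last year" filter is applied here. In practice a WHERE clause on `date` would be added.

```python
import sqlite3

# Sample rows from the question: (Unix seconds, location, temperature).
rows = [
    (1568463916, "room 1", 20.0),
    (1568463916, "room 2", 25.0),
    (1568463916, "room 3", 30.0),
    (1568460316, "room 1", 15.5),
    (1568460316, "room 2", 20.5),
    (1568460316, "room 3", 21.3),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (date INTEGER, location TEXT, temperature REAL)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

# Bucket expressions (trusted SQL text, never user input).
MONTHLY = "date(datetime(date, 'unixepoch'), 'start of month')"  # e.g. 2019-09-01
WEEKDAY = "strftime('%w', date, 'unixepoch')"  # '0' = Sunday ... '6' = Saturday
HOURLY = "strftime('%H', date, 'unixepoch')"   # '00' ... '23'


def average_by(bucket_sql):
    """Average temperature per (bucket, location), ordered by bucket."""
    sql = f"""
        SELECT {bucket_sql} AS bucket, location, avg(temperature) AS avg_temperature
        FROM t
        GROUP BY {bucket_sql}, location
        ORDER BY bucket, location
    """
    return conn.execute(sql).fetchall()


for row in average_by(MONTHLY):
    print(row)
```

Grouping on the full `start of month` date rather than `%m` alone keeps, for example, September 2019 and September 2020 in separate buckets.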

SQLite - Determine average sales made for each day of week

I am trying to produce a query in SQLite where I can determine the average sales made each weekday in the year.
For example, I'd like to be able to say:
"The average sales for Monday are $400.50 in 2017"
I have a sales table - each row represents a sale you made. You can have multiple sales for the same day. Columns that would be of interest here:
Id, SalesTotal, DayCreated, MonthCreated, YearCreated, CreationDate, PeriodOfTheDay
DayCreated/MonthCreated/YearCreated are integers that represent the day/month/year of the sale. CreationDate is a Unix timestamp that represents the date/time the sale was created (and is consistent with the day/month/year columns).
PeriodOfTheDay is 0, or 1 (day, or night). You can have multiple records for a given day (typically you can have at most 2 but some people like to add all of their sales in individually, so you could have 5 or more for a day).
Where I am stuck
Because you can have two records on the same day (i.e. a day sale and a night sale, or several of each), I can't just group by day of the week (i.e. group all records by Saturday).
This is because the number of sales you made does not equal the number of days you worked. For example, I could have worked 10 Saturdays but made 30 sales, so grouping by Saturday would count 30 records, since 30 records exist for Saturday (some of them just happen to share the same day).
Furthermore, if I group by DayCreated, MonthCreated, YearCreated, it works in the sense that it produces x rows (where x is the number of days you worked). However, that means I need to return this result set to the back end and do a row count. I'd rather do this in the query so I can take the sales total and divide it by the number of days you worked.
Would anyone be able to assist?
Thanks!
UPDATE
I think I got it - I would love someone to tell me if I'm right:
SELECT COUNT(DISTINCT CAST(( julianday((datetime(CreationDate / 1000, 'unixepoch', 'localtime'))) ) / 7 AS INT))
FROM Sales
WHERE strftime('%w', datetime(CreationDate / 1000, 'unixepoch'), 'localtime') = '6'
AND YearCreated = 2017
This would produce the number for saturday, and then I'd just put this in as an inner query, dividing the sale total by this number of days.
Buddy,
You can group your query by the day of week and the week number of the creation date.
In MSSQL
DATEPART(WEEK, '2017-08-14')     -- returns week 33
DATEPART(WEEKDAY, '2017-08-14')  -- returns day 2
In MySQL
WEEK('2017-08-14')       -- returns week 33
DAYOFWEEK('2017-08-14')  -- returns day 2
See these values:
Day of week
1 - Sunday, 2 - Monday, 3 - Tuesday, 4 - Wednesday, 5 - Thursday, 6 - Friday, 7 - Saturday (for MSSQL this assumes the default US English DATEFIRST setting)
Week number
About 53 weeks in a year (the exact range depends on the function and its mode)
Combining the week number with the day of week is the key. Each Saturday then gets its own group instead of all Saturdays being lumped together, so you can count the distinct days worked.
Hope this can help in building your query.
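Since the question is about SQLite, here is a minimal sketch of the "total sales divided by distinct days worked" idea as a single SQLite query, run through Python's built-in `sqlite3` module. Several things are assumptions and are labelled as such. The table keeps only the columns the query needs. `CreationDate` is treated as Unix milliseconds, as in the asker's own query. The rows are made-up illustration data. The `'localtime'` modifier is omitted so the result does not depend on the machine's time zone; it can be added after `'unixepoch'` if sales should be bucketed in local time. Counting `DISTINCT date(...)` plays the same role as the week-number-plus-weekday key in the answer above.

```python
import sqlite3
from datetime import datetime, timezone


def ms(*parts):
    """Hypothetical helper: UTC calendar time -> Unix milliseconds."""
    return int(datetime(*parts, tzinfo=timezone.utc).timestamp()) * 1000


conn = sqlite3.connect(":memory:")
# Only the columns this query needs; the other columns from the question are omitted.
conn.execute(
    "CREATE TABLE Sales (Id INTEGER, SalesTotal REAL, YearCreated INTEGER, CreationDate INTEGER)"
)
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?)",
    [  # made-up rows for illustration
        (1, 100.0, 2017, ms(2017, 8, 12, 10)),  # Saturday, day sale
        (2, 200.0, 2017, ms(2017, 8, 12, 20)),  # same Saturday, night sale
        (3, 300.0, 2017, ms(2017, 8, 19, 10)),  # the following Saturday
        (4, 400.5, 2017, ms(2017, 8, 14, 10)),  # a Monday
    ],
)

# Total sales per weekday divided by the number of distinct calendar days
# worked on that weekday.
QUERY = """
SELECT strftime('%w', CreationDate / 1000, 'unixepoch') AS weekday,  -- '0' = Sunday
       SUM(SalesTotal) * 1.0
         / COUNT(DISTINCT date(CreationDate / 1000, 'unixepoch')) AS avg_per_day
FROM Sales
WHERE YearCreated = 2017
GROUP BY weekday
ORDER BY weekday
"""
for weekday, avg_per_day in conn.execute(QUERY):
    print(weekday, avg_per_day)  # '1' 400.5, then '6' 300.0
```

The two sales on 2017-08-12 count as one worked Saturday, so the Saturday average is (100 + 200 + 300) / 2 rather than / 3.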

Number of specific one-hour periods between two date/times

I have a table of game records; call it "game".
It has an id and timestamp.
What I need to know is unrelated to the table specifically. In order to know the average number of games played per hour, I need to know:
Total games played for each hour over the date range
Number of hourly periods within the date range
Finding the first is a matter of extracting the hour from the timestamp and grouping by it.
For the second, if the date range was rounded to the nearest day, finding this value would be easy (totalgames/numdays).
Unfortunately I can't assume this. What I need help with is finding the number of specific hour periods existing within a time range.
Example:
If the range is 5 PM today to 8 PM tomorrow, there is one "00" hour (midnight to 1 AM), but two 17, 18, 19 hours (5-6, 6-7, 7-8)
Thanks for the help
Edit: for clarity, consider the following query:
I have table game:
id, daytime
select EXTRACT(hour from daytime) as hour_period, count (*)
from game
where daytime > dateFrom and daytime < dayTo
group by hour_period
This will give me the number of games played broken down into hourly chunks for the time period.
In order to find the average games played per hour, I need to know exactly how many specific hour durations are between two timestamps. Simply dividing by the number of days is not accurate.
Edit: The ideal output will look something like this:
00 275
01 300
02 255
...
Consider the following: How many times does midnight occur between date 1 and date 2 ? If you have 1.5 days, that doesn't guarantee that midnight will occur twice. 6 AM today to 6 PM tomorrow night, for example, has 1 midnight, but 9PM tonight to 9 AM two days from now has 2 midnights.
What I'm trying to find is how many of the EXACT HOUR occurs between two timestamps, so I can use it to average the number of games played at THAT HOUR over a time period.
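The quantity asked for here is how many times each clock hour occurs between two timestamps. It can be computed directly by walking the range one hour at a time. Below is a minimal Python sketch under stated assumptions. It uses naive datetimes with no DST changes. It counts only complete hour slots (e.g. 17:00-18:00) that fit inside the range. It is cleanest when the range boundaries fall on whole hours. The function names are hypothetical. The loop is linear in the number of hours, which is fine for ranges of weeks or months.

```python
from collections import Counter
from datetime import datetime, timedelta

ONE_HOUR = timedelta(hours=1)


def hour_occurrences(start, end):
    """Count how many complete slots of each clock hour (0-23) fit inside
    [start, end]. Naive datetimes, so DST jumps are ignored (an assumption)."""
    slot = start.replace(minute=0, second=0, microsecond=0)
    if slot < start:  # a partial first hour is not a complete slot
        slot += ONE_HOUR
    counts = Counter()
    while slot + ONE_HOUR <= end:
        counts[slot.hour] += 1
        slot += ONE_HOUR
    return counts


def average_games_per_hour(game_times, start, end):
    """Games per occurrence of each clock hour. Hours with no games still
    count in the denominator, unlike averaging only over hours that had games."""
    occurrences = hour_occurrences(start, end)
    games = Counter(t.hour for t in game_times if start <= t < end)
    return {hour: games[hour] / n for hour, n in sorted(occurrences.items())}


# The example from the question: 5 PM today to 8 PM tomorrow.
counts = hour_occurrences(datetime(2014, 5, 29, 17), datetime(2014, 5, 30, 20))
print(counts[0], counts[17], counts[18], counts[19])  # 1 2 2 2
```

Using the occurrence count as the denominator also covers the midnight examples. 6 AM today to 6 PM tomorrow yields one hour-0 slot. 9 PM tonight to 9 AM two days from now yields two.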
EDIT:
The following query gets the days, hours, and # of games, giving an output as below:
29 23 100
29 00 130
30 22 140
30 23 150
Then, the outer query adds up the number of games for each distinct hour and divides by the number of hours, as follows
22 140
23 125
00 130
The modified query is below:
SELECT
hour_period,
sum(hourly_no_of_games) / count(hour_period)
FROM
(
SELECT
EXTRACT(DAY from daytime) as day_period,
EXTRACT(HOUR from daytime) as hour_period,
count (*) hourly_no_of_games
from game
where daytime > dateFrom and daytime < dayTo
group by EXTRACT(DAY from daytime), EXTRACT(HOUR from daytime)
) hourly_data
GROUP BY hour_period
ORDER BY hour_period;
If you need something to GROUP BY, you can truncate the timestamp to the level of hour, as in the following:
DECLARE @Date DATETIME
SET @Date = GETDATE()
SELECT @Date, DATEADD(Hour, DATEDIFF(Hour, 0, @Date), 0) AS RoundedDate
If you just need to find the total hours, you can just select the DATEDIFF in hours, such as with
SELECT DATEDIFF(Hour, '5/29/2014 20:01:32.999', GETDATE())
Extract not only the hour of the day but the day of the year (1-366). Then group on those. If there is the possibility the interval could span a year, then add the year itself and group by all three.
year  dy   hr  games
2013  365  23  115
2014  1    00  103