Oracle - Count based on previous and next column - SQL

I've got a rather unusual question about a database query in Oracle.
I was asked whether it's possible to count the cases where a patient was readmitted to the same station they were discharged from within 48 / 72 hours.
Consider the following example:
Case  Station  From                 To
----  -------  -------------------  -------------------
1     Stat_1   2020-01-03 20:10:00  2020-01-04 17:40:00
1     Stat_2   2020-01-04 17:40:00  2020-01-05 09:35:00
1     Stat_1   2020-01-05 09:35:00  2020-01-10 12:33:00
In this example, I'd have to check the difference between the last discharge time from station 1 and the admission time when the patient is registered at station 1 again. This should then count as one readmission.
I've tried some things with LAG and LEAD, but you can't use them in the WHERE clause, so that's not too useful, I guess.
LAG (o.OEBENEID, 1, 0) OVER (ORDER BY vfs.GUELTIG_BIS) AS Prev_Stat,
LEAD (o.OEBENEID, 1, 0) OVER (ORDER BY vfs.GUELTIG_BIS) AS Next_Stat,
LAG (vfs.GUELTIG_BIS, 1) OVER (ORDER BY vfs.GUELTIG_BIS) AS End_Prev_Stat,
LEAD (vfs.GUELTIG_AB, 1) OVER (ORDER BY vfs.GUELTIG_AB) AS Begin_Next_Stat
I am able to get the old values, but I can't do something like calculate the difference between those two dates.
Is this even possible to achieve? I can't really wrap my head around how to do it with SQL.
Thanks in advance!

You need a partition by clause to retrieve the previous discharge date of the same user in the same station. Then, you can filter in an outer query:
select count(*) as cnt
from (
    select case_no, station, dt_from, dt_to,
        lag(dt_to) over(partition by case_no, station order by dt_from) as lag_dt_to
    from mytable t
) t
where dt_from < lag_dt_to + 2
This counts how many rows have a gap of less than 2 days with the previous discharge date of the same user in the same station.
This assumes that you are storing your dates as dates. If you have timestamps instead, you need interval arithmetic, so:
where dt_from < lag_dt_to + interval '2' day
Note that case, from and to are reserved words in Oracle; I used alternative names in the query.
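Since the question asks for both a 48- and a 72-hour window, a sketch using conditional aggregation on top of the same subquery (same assumed table and column names as above) can return both counts in one pass:
select count(case when dt_from < lag_dt_to + interval '48' hour then 1 end) as cnt_48h,
       count(case when dt_from < lag_dt_to + interval '72' hour then 1 end) as cnt_72h
from (
    -- previous discharge from the same station for the same case
    select case_no, station, dt_from, dt_to,
        lag(dt_to) over(partition by case_no, station order by dt_from) as lag_dt_to
    from mytable t
) t;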

Related

Calculate total working hours of employee based on swipe in/swipe out using oracle sql

I was recently given a task to calculate an employee's total office hours based on his card swipe in/swipe out. I have the following data:
id   gate_1  gate_2  gate_3  gate_4
---  ------  ------  ------  ------
100  null    null    null    9:00
100  null    13:30   null    null
100  null    null    16:00   null
100  null    null    18:00   null
Here, employee 100 comes in via gate_4 at 9:00 and takes a break at 13:30, going out using gate_2. Then he comes back at 16:00 using gate_3 and leaves the office at 18:00 using gate_3. So, how do I calculate the total in-office time using this data?
Thanks in advance.
As has been pointed out, your data model is denormalized to the point that it doesn't even satisfy first normal form. The first step is to correct that (doing so in a query). Then, there is no indication of whether a swipe is in or out, so it must be assumed that the first swipe time is always in and that ins and outs always alternate properly. Finally, there is no indication of multiple days being covered, so the assumption is just one period. That is a lot of assumptions.
Since the Oracle date data type contains the time as well as the date, and summing differences is much easier with dates than with timestamps, I convert timestamp to date in the first step of normalizing the data. Given all this, we arrive at: (See Demo)
with normal (emp_id, inout_tm) as
( select emp_id, cast(gate1 as date)
from emp_gate_time
where gate1 is not null
union all
select emp_id, cast(gate2 as date)
from emp_gate_time
where gate2 is not null
union all
select emp_id, cast(gate3 as date)
from emp_gate_time
where gate3 is not null
union all
select emp_id, cast(gate4 as date)
from emp_gate_time
where gate4 is not null
)
select emp_id, round(24.0*(sum(hours)),1) hours_in_office
from ( select emp_id,(time_out - time_in) hours
from ( select emp_id, inout_tm time_in, rn
, lead(inout_tm) over(partition by emp_id order by inout_tm) time_out
from ( select n.*
, row_number() over(partition by emp_id order by inout_tm) rn
from normal n
)
)
where mod(rn,2) = 1
)
group by emp_id;
Items of Interest:
Subquery Factoring (CTE)
Date Arithmetic - Difference Between Dates in Hours
Oracle Analytic Functions - Row_number, lead
Your database schema has a denormalized structure: you have fields such as gate_1, gate_2, etc. That's the wrong way. A better approach is the following: you should have a reference table of gates, for example like this
id|gate_name
--|---------
And your table with the employee data would look like this:
id_employee|id_gate|time
Then you can sort the data in this table and compute the period of time between two consecutive rows, as sketched below.
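A minimal sketch of that pairing step, assuming the normalized table is called swipe with columns id_employee, id_gate and swipe_tm (a DATE; "time" is a poor column name since it is too close to a reserved word), and that swipes strictly alternate in/out as in the first answer:
select id_employee,
       round(24 * sum(time_out - time_in), 1) as hours_in_office
from (
    select id_employee,
           swipe_tm as time_in,
           lead(swipe_tm) over (partition by id_employee order by swipe_tm) as time_out,
           row_number() over (partition by id_employee order by swipe_tm) as rn
    from swipe
)
-- odd rows are "in" swipes under the alternation assumption
where mod(rn, 2) = 1
group by id_employee;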

SQL query to select records with varying granularity

I have a table of news articles, articles, each of which have a date attribute. Most days have multiple articles recorded, some days have none at all.
I'd like to be able to get a selection of articles with varying granularity - for instance, one per day, one per month, etc. I've found questions that deal with daily and even monthly granularity, but since a user can select the granularity (for instance, one article per 3 days), having a separate query for each possible level of granularity isn't feasible.
Is this possible using SQL, or will every article need to be selected and then filtered using a different language?
Maybe granularity is the wrong word - here's an example of the table:
id  date        headline
--  ----------  ---------------------------
1   2020-01-01  This one weird trick...
2   2020-01-01  These two weird tricks...
3   2020-01-01  These fifty weird tricks...
4   2020-01-02  This one crazy trick...
5   2020-01-02  This one odd trick...
6   2020-01-03  These tricks...
7   2020-01-04  These tricks...
8   2020-01-05  These tricks...
With a granularity of one day, the query should return rows 1, 4, 6, 7, 8. With a granularity of 3 days, 1 and 7 will be picked, as 7 is the first record that's 3 days after the first.
You can use a recursive CTE that returns all the dates that you want to include in the results and join it to the table:
WITH cte(date) AS (
SELECT MIN(date) FROM articles
UNION ALL
SELECT date(date, '+3 days')
FROM cte
WHERE date(date, '+3 days') <= (SELECT MAX(date) FROM articles)
)
SELECT MIN(a.id) id, a.date, a.headline
FROM articles a INNER JOIN cte c
ON c.date = a.date
GROUP BY a.date
See the demo.
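Since the demo appears to use SQLite syntax (date(date, '+3 days')), another option in the same dialect, offered as an untested sketch, is to bucket rows by their day offset from the first date, which avoids generating the date list entirely:
SELECT MIN(id) AS id, date, headline
FROM articles
-- integer division puts each row into a fixed 3-day bucket
GROUP BY CAST((julianday(date) - (SELECT julianday(MIN(date)) FROM articles)) / 3 AS INTEGER);
Note the slightly different semantics: this picks the first article in each fixed 3-day window, rather than requiring an article to exist exactly on a 3-day boundary; for the sample data both approaches return rows 1 and 7.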

How to check if dates overlap on different rows in SQL Server?

I have a database with electricity meter readings. Sometimes people get a new meter; the original meter then gets an end date, the new meter gets a start date, and its end date remains NULL. This can happen multiple times in a year, and I want to know whether there are any gaps in measurement. In other words, I need to figure out if end date 1 is the same as start date 2, and so on.
Sample data:
cust_id meter_id start_date end_date
--------------------------------------------------
a 1 2017-01-01 2017-05-02
a 2 2017-05-02 Null
b 3 2017-01-01 2017-06-01
b 4 2017-06-05 Null
This is what the data looks like. The result I am looking for: for customer a, the end date of meter 1 is equal to the start date of meter 2. For customer b, however, there are 4 days between the end date of meter 3 and the start date of meter 4. That is something I want to flag.
I found customers for whom this can happen up to 8 times in the period I am researching. I tried something with nested queries and very complex CASE expressions, but I lost my way in it, so I was wondering if someone here has an idea of how to get to the answer a little more smartly.
You can get the offending rows using lag():
select r.*
from (select r.*,
             lag(end_date) over (partition by cust_id order by start_date) as prev_end_date,
             row_number() over (partition by cust_id order by start_date) as seqnum
      from readings r
     ) r
where prev_end_date <> start_date or (prev_end_date is null and seqnum > 1);
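With the sample data above this returns customer b's second row: the query partitions by cust_id alone so that consecutive meters of the same customer are compared, and meter 4 starts four days after meter 3 ended, while for customer a the dates line up exactly.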
There is probably a better way to pull this off now using LEAD and LAG, but I wrote an article for SQL 2008 R2 called T-SQL: Identify bad dates in a time series, where you can modify the big CTE in the middle of the article to handle your definition of a bad date.
Good luck. There's too much detail in the article to post in a single SO question, otherwise I'd do that here.
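For what it's worth, a minimal LEAD-based sketch of the same idea against this question's data (the readings table name is borrowed from the first answer; the exact flagging condition is an assumption):
select cust_id, meter_id, end_date, next_start_date,
       datediff(day, end_date, next_start_date) as gap_days
from (
    select r.*,
           lead(start_date) over (partition by cust_id order by start_date) as next_start_date
    from readings r
) r
where end_date is not null
  and next_start_date <> end_date;
For the sample data this flags customer b's meter 3, whose successor starts 4 days after it ended.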

Multiple aggregate sums from different conditions in one sql query

Whereas I believe this is a fairly general SQL question, I am working in PostgreSQL 9.4 without an option to use other database software, and thus request that any answer be compatible with its capabilities.
I need to be able to return multiple aggregate totals from one query, such that each sum is in a new row, and each grouping is determined by a unique span of time, e.g. WHERE time_stamp BETWEEN '2016-02-07' AND '2016-02-14'. The number of records that satisfy the WHERE clause is unknown and may be zero, in which case ideally the result is "0". This is what I have worked out so far:
(
SELECT SUM(minutes) AS min
FROM downtime
WHERE time_stamp BETWEEN '2016-02-07' AND '2016-02-14'
)
UNION ALL
(
SELECT SUM(minutes)
FROM downtime
WHERE time_stamp BETWEEN '2016-02-14' AND '2016-02-21'
)
UNION ALL
(
SELECT SUM(minutes)
FROM downtime
WHERE time_stamp BETWEEN '2016-02-28' AND '2016-03-06'
)
UNION ALL
(
SELECT SUM(minutes)
FROM downtime
WHERE time_stamp BETWEEN '2016-03-06' AND '2016-03-13'
)
UNION ALL
(
SELECT SUM(minutes)
FROM downtime
WHERE time_stamp BETWEEN '2016-03-13' AND '2016-03-20'
)
UNION ALL
(
SELECT SUM(minutes)
FROM downtime
WHERE time_stamp BETWEEN '2016-03-20' AND '2016-03-27'
)
Result:
   | min
---+-----
 1 | 119
 2 |   4
 3 |  30
 4 |
 5 |  62
 6 | 350
That query gets me almost the exact result that I want; certainly good enough in that I can do exactly what I need with the results. Time spans with no records are blank, but that was predictable, and while I would prefer "0", I can account for the blank rows in software.
But, while it isn't terrible for the 6 weeks that it represents, I want to be flexible and able to do the same thing for different time spans and a different number of data points, such as each day in a week, each week in 3 or 6 months, each month in 1 or 2 years, etc. As written above, it feels as if it is going to get tedious fast; for instance, 1-week spans over a 2-year period means 104 subqueries.
What I'm after is a more elegant way to get the same (or similar) result.
I also don't know if running 104 iterations of a query similar to the above (vs. the 6 it does now) is particularly efficient.
Ultimately I am going to write some code to help me build (and thus abstract away) the long, ugly query, but it would still be great to have a more concise and scalable query.
In Postgres, you can generate a series of times and then use these for the aggregation:
select g.dte, coalesce(sum(dt.minutes), 0) as minutes
from generate_series('2016-02-07'::timestamp, '2016-03-20'::timestamp, interval '7 day') g(dte)
left join downtime dt
    on dt.time_stamp >= g.dte and dt.time_stamp < g.dte + interval '7 day'
group by g.dte
order by g.dte;
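To cover the two-year, 104-week case from the question, only the series endpoints change and the query stays the same size (the dates here are illustrative):
from generate_series('2015-01-04'::timestamp, '2016-12-25'::timestamp, interval '7 day') g(dte)
Daily or monthly buckets work the same way by swapping interval '7 day' for interval '1 day' or interval '1 month'.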

oracle sql: efficient way to calculate business days in a month

I have a pretty huge table with columns date, account, amount, etc., e.g.
date account amount
4/1/2014 XXXXX1 80
4/1/2014 XXXXX1 20
4/2/2014 XXXXX1 840
4/3/2014 XXXXX1 120
4/1/2014 XXXXX2 130
4/3/2014 XXXXX2 300
...........
(I have 40 months' worth of daily data and multiple accounts.)
The final output I want is the average amount for each account each month. Since there may or may not be a record for any account on a given day, and I have a separate table of holidays from 2011-2014, I am summing up the amount for each account within a month and dividing it by the number of business days in that month. Note that there are very likely to be record(s) on weekends/holidays, so I need to exclude them from the calculation. Also, I want to have a record for each of the dates available in the original table, e.g.
date account amount
4/1/2014 XXXXX1 48 ((80+20+840+120)/22)
4/2/2014 XXXXX1 48
4/3/2014 XXXXX1 48
4/1/2014 XXXXX2 19 ((130+300)/22)
4/3/2014 XXXXX2 19
...........
(Suppose the above is the only data I have for Apr-2014.)
I am able to do this in a hacky and slow way, but as I need to join this process with other subqueries, I really need to optimize this query. My current code looks like:
select
    date,
    account,
    sum(amount / days_mon) over (partition by last_day(date))
from (
    select
        date,
        -- there are more calculations to get the account numbers,
        -- so this subquery is necessary
        account,
        amount,
        -- this is a list of month-end dates where the number of
        -- business days in that month is 19; similar below.
        case when last_day(date) in ('','',...,'') then 19
             when last_day(date) in ('','',...,'') then 20
             when last_day(date) in ('','',...,'') then 21
             when last_day(date) in ('','',...,'') then 22
             when last_day(date) in ('','',...,'') then 23
        end as days_mon
    from mytable tb
    inner join lookup_businessday_list busi
        on tb.date = busi.date)
So how can I perform the above purpose efficiently? Thank you!
This approach uses sub-query factoring - what other RDBMS flavours call common table expressions. The attraction here is that we can pass the output from one CTE as input to another.
The first CTE generates a list of dates in a given month (you can extend this over any range you like).
The second CTE uses an anti-join on the first to filter out dates which are holidays and also dates which aren't weekdays. Note that the day number varies according to the NLS_TERRITORY setting; in my realm the weekend is days 6 and 7, but SQL Fiddle is American, so there it is 1 and 7.
with dates as ( select date '2014-04-01' + ( level - 1) as d
from dual
connect by level <= 30 )
, bdays as ( select d
, count(d) over () tot_d
from dates
left join holidays
on dates.d = holidays.hol_date
where holidays.hol_date is null
and to_number(to_char(dates.d, 'D')) between 2 and 6
)
select yt.account
, yt.txn_date
, sum(yt.amount) over (partition by yt.account, trunc(yt.txn_date,'MM'))
/tot_d as avg_amt
from your_table yt
join bdays
on bdays.d = yt.txn_date
order by yt.account
, yt.txn_date
/
I haven't rounded the average amount.
You have 40 months of data, and this data should be very stable.
I will assume that you have a cold body (a big, stable, easily definable range of data) and a hot tail (a small, active part).
Next, I would like to define a minimal period: the smallest date range that is of interest to the business.
It might be a year, month, day, hour, etc. Do you expect to get questions like "what was the average for that account between 19:00 and 12 am yesterday?"
I will assume that the answer is DAY.
Then:
I will calculate sum(amount) and count(*) for every account for every DAY of the cold body.
I will not create dummy records if a particular account had no activity on some day,
and I will save the day, account, total amount, and count in a TABLE.
If there are later modifications to the cold body, you delete and reload the affected days in that table.
For the hot tail there are multiple possible strategies:
1. Do the same as above (same process, easy to support).
2. Always calculate on the fly.
3. Use a materialized view as a compromise between 1 and 2.
The cold-body totals table could also be implemented as a materialized view, but if the data never changes there is no need to rebuild it.
With this you go from (number of accounts) x (number of transactions per day) x (number of days) records to (number of accounts) x (number of active days) records.
That should speed up all subsequent calculations. A sketch of this setup follows below.
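A minimal sketch of this setup, with assumed names (daily_account_totals for the precomputed cold-body table, your_table and txn_date as in the first answer, and a helper bdays_per_month(mth, bus_days) holding the business-day count per month, which could be built like the bdays CTE above):
-- one-time (per reload) aggregation of the cold body
create table daily_account_totals as
select trunc(txn_date) as d,
       account,
       sum(amount) as total_amount,
       count(*) as txn_count
from your_table
group by trunc(txn_date), account;

-- monthly average per account over business days,
-- now reading the small totals table instead of the raw transactions
select t.account,
       trunc(t.d, 'MM') as mth,
       sum(t.total_amount) / max(b.bus_days) as avg_amount
from daily_account_totals t
join bdays_per_month b
  on b.mth = trunc(t.d, 'MM')
group by t.account, trunc(t.d, 'MM');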