Selecting only the very last set of rows matching the criteria - sql

Considering following data
test1=# select * from sample order by created_at DESC;
id | status | service | created_at
----+--------+---------+---------------------
8 | OK | 1 | 2015-09-16 11:54:00
7 | OK | 1 | 2015-09-16 11:53:00
6 | FAIL | 1 | 2015-09-16 11:52:00
5 | OK | 1 | 2015-09-16 11:51:00
How can I select only the rows with ID 7 and 8. Using window functions I can get row numbers partitioned over status, but so far did not figure out the way how to limit the results only to the last rows identifying 'successful period' for given service.

You need to find the time of the most recent status = 'FAIL' for each service, then select those records of the same service that are more recent:
SELECT *
FROM sample
LEFT JOIN (
SELECT service, max(created_at) AS last_fail
FROM sample
WHERE status = 'FAIL'
GROUP BY service) f USING (service)
WHERE created_at > last_fail
OR last_fail IS NULL; -- also show services without ever failing
This assumes there are only two status codes. If there are more, add a status = 'OK' filter to the WHERE clause.

The most simple approach would be this:
SELECT *
FROM sample AS s
LEFT JOIN (SELECT service, max(id)
FROM sample
WHERE status = 'FAIL'
GROUP BY service) AS q
ON s.id > q.id
AND s.service = q.service

Related

SQL: Calculate number of days since last success

Following table represents results of given test.
Every result for the same test is either pass ( error_id=0) or fail ( error_id <> 0)
I need help to write a query, that returns the number of runs since last good run ( error_id= 0) and the date.
| Date | test_id | error_id |
-----------------------------------
| 2019-12-20 | 123 | 23
| 2019-12-19 | 123 | 23
| 2019-12-17 | 123 | 22
| 2019-12-18 | 123 | 0
| 2019-12-16 | 123 | 11
| 2019-12-15 | 123 | 11
| 2019-12-13 | 123 | 11
| 2019-12-12 | 123 | 0
So the result for this example should be:
| 2019-12-18 | 123 | 4
as the test 123 was PASS on 2019-12-18 and this happened 4 runs ago.
I have a query to determine whether given run is error or not, but I have trouble applying appropriate window function to it to get the wanted result
select test_id, Date, error_id, (CASE WHEN error_id 0 THEN 1 ELSE 0 END) as is_error
from testresults
You can generate a row number, in reverse order from the sorting of the query itself:
SELECT test_date, test_id, error_code,
(row_number() OVER (ORDER BY test_date asc) - 1) as runs_since_last_pass
FROM tests
WHERE test_date >= (SELECT MAX(test_date) FROM tests WHERE error_code=0)
ORDER BY test_date DESC
LIMIT 1;
Note that this will run into issues if test_date is not unique. Better use a timestamp (precise to the millisecond) instead of a date.
Here's a DBFiddle: https://www.db-fiddle.com/f/8gSHVcXMztuRiFcL8zLeEx/0
If there's more than one test_id, you'll want to add a PARTITION BY clause to the row number function, and the subquery would become a bit more complex. It may be more efficient to come up with a way to do this by a JOIN instead of a subquery, but it would be more cognitively complex.
I think you just want aggregation and some filtering:
select id, count(*),
max(date) over (filter where error_id = 0) as last_success_date
from t
where date > (select max(t2.date) from t t2 where t2.error_id = 0);
group by id;
You have to use the Maximum date of the good runs for every test_id in your query. You can try this query:
select tr2.Date_error, tr.test_id, count(tr.error_id) from
testresults tr inner join (select max(Date_error), test_id
from testresult where error_id=0 group by test_id) tr2 on
tr.test_id=tr2.test_id and tr.date_error >=tr2.date_error
group by test_id
This should do the trick:
select count(*) from table t,
(select max(date) date from table where error_id = 0) good
where t.date >= good.date
Basically you are counting the rows that have a date >= the date of the last success.
Please note: If you need the number of days, it is a complete different query:
select now()::date - max(test_date) last_valid from tests
where error_code = 0;

Greatest N Per Group with JOIN and multiple order columns

I have two tables:
Table0:
| ID | TYPE | TIME | SITE |
|----|------|-------|------|
| aa | 1 | 12-18 | 100 |
| aa | 1 | 12-10 | 101 |
| bb | 2 | 12-10 | 102 |
| cc | 1 | 12-09 | 100 |
| cc | 2 | 12-12 | 103 |
| cc | 2 | 12-01 | 109 |
| cc | 1 | 12-07 | 101 |
| dd | 1 | 12-08 | 100 |
and
Table1:
| ID |
|----|
| aa |
| cc |
| cc |
| dd |
| dd |
I'm trying to output results where:
ID must exist in both tables.
TYPE must be the maximum for each ID.
TIME must be the minimum value for the maximum TYPE for each ID.
SITE should be the value from the same row as the minimum TIME value.
Given my sample data, my results should look like this:
| ID | TYPE | TIME | SITE |
|----|------|-------|------|
| aa | 1 | 12-10 | 101 |
| cc | 2 | 12-01 | 109 |
| dd | 1 | 12-08 | 100 |
I've tried these statements:
INSERT INTO "NuTable"
SELECT DISTINCT(QTS."ID"), "SITE",
CASE WHEN MAS.MAB=1 THEN 'B'
WHEN MAS.MAB=2 THEN 'F'
ELSE NULL END,
"TIME"
FROM (SELECT DISTINCT("ID") FROM TABLE1) AS QTS,
TABLE0 AS MA,
(SELECT "ID", MAX("TYPE") AS MASTY, MIN("TIME") AS MASTM
FROM TABLE0
GROUP BY "ID") AS MAS,
WHERE QTS."ID" = MA."ID"
AND QTS."ID" = MAS."ID"
AND MSD.MASTY =MA."TYPE"
...which generates a syntax error
INSERT INTO "NuTable"
SELECT DISTINCT(QTS."ID"), "SITE",
CASE WHEN MAS.MAB=1 THEN 'B'
WHEN MAS.MAB=2 THEN 'F'
ELSE NULL END,
"TIME"
FROM (SELECT DISTINCT("ID") FROM TABLE1) AS QTS,
TABLE0 AS MA,
(SELECT "ID", MAX("TYPE") AS MAB
FROM TABLE0
GROUP BY "ID") AS MAS,
((SELECT "ID", MIN("TIME") AS MACTM, MIN("TYPE") AS MACTY
FROM TABLE0
WHERE "TYPE" = 1
GROUP BY "ID")
UNION
(SELECT "ID", MIN("TIME"), MAX("TYPE")
FROM TABLE0
WHERE "TYPE" = 2
GROUP BY "ID")) AS MACU
WHERE QTS."ID" = MA."ID"
AND QTS."ID" = MAS."ID"
AND MACU."ID" = QTS."ID"
AND MA."TIME" = MACU.MACTM
AND MA."TYPE" = MACU.MACTB
... which is getting the wrong results.
Answering your direct question "how to avoid...":
You get this error when you specify a column in a SELECT area of a statement that isn't present in the GROUP BY section and isn't part of an aggregating function like MAX, MIN, AVG
in your data, I cannot say
SELECT
ID, site, min(time)
FROM
table
GROUP BY
id
I didn't say what to do with SITE; it's either a key of the group (in which case I'll get every unique combination of ID,site and the min time in each) or it should be aggregated (eg max site per ID)
These are ok:
SELECT
ID, max(site), min(time)
FROM
table
GROUP BY
id
SELECT
ID, site, min(time)
FROM
table
GROUP BY
id,site
I cannot simply not specify what to do with it- what should the database return in such a case? (If you're still struggling, tell me in the comments what you think the db should do, and I'll better understand your thinking so I can tell you why it can't do that ). The programmer of the database cannot make this decision for you; you must make it
Usually people ask this when they want to identify:
The min time per ID, and get all the other row data as well. eg "What is the full earliest record data for each id?"
In this case you have to write a query that identifies the min time per id and then join that subquery back to the main data table on id=id and time=mintime. The db runs the subquery, builds a list of min time per id, then that effectively becomes a filter of the main data table
SELECT * FROM
(
SELECT
ID, min(time) as mintime
FROM
table
GROUP BY
id
) findmin
INNER JOIN table t ON t.id = findmin.id and t.time = findmin.mintime
What you cannot do is start putting the other data you want into the query that does the grouping, because you either have to group by the columns you add in (makes the group more fine grained, not what you want) or you have to aggregate them (and then it doesn't necessarily come from the same row as other aggregated columns - min time is from row 1, min site is from row 3 - not what you want)
Looking at your actual problem:
The ID value must exist in two tables.
The Type value must be largest group by id.
The Time value must be smallest in the largest type group.
Leaving out a solution that involves having or analytics for now, so you can get to grips with the theory here:
You need to find the max type group by id, and then join it back to the table to get the other relevant data also (time is needed) for that id/maxtype and then on this new filtered data set you need the id and min time
SELECT t.id,min(t.time) FROM
(
SELECT
ID, max(type) as maxtype
FROM
table
GROUP BY
id
) findmax
INNER JOIN table t ON t.id = findmax.id and t.type = findmax.maxtype
GROUP BY t.id
If you can't see why, let me know
demo:db<>fiddle
SELECT DISTINCT ON (t0.id)
t0.id,
type,
time,
first_value(site) OVER (PARTITION BY t0.id ORDER BY time) as site
FROM table0 t0
JOIN table1 t1 ON t0.id = t1.id
ORDER BY t0.id, type DESC, time
ID must exist in both tables
This can be achieved by joining both tables against their ids. The result of inner joins are rows that exist in both tables.
SITE should be the value from the same row as the minimum TIME value.
This is the same as "Give me the first value of each group ofids ordered bytime". This can be done by using the first_value() window function. Window functions can group your data set (PARTITION BY). So you are getting groups of ids which can be ordered separately. first_value() gives the first value of these ordered groups.
TYPE must be the maximum for each ID.
To get the maximum type per id you'll first have to ORDER BY id, type DESC. You are getting the maximum type as first row per id...
TIME must be the minimum value for the maximum TYPE for each ID.
... Then you can order this result by time additionally to assure this condition.
Now you have an ordered data set: For each id, the row with the maximum type and its minimum time is the first one.
DISTINCT ON gives you exactly the first row of each group. In this case the group you defined is (id). The result is your expected one.
I would write this using distinct on and in/exists:
select distinct on (t0.id) t0.*
from table0 t0
where exists (select 1 from table1 t1 where t1.id = t0.id)
order by t0.id, type desc, time asc;

SQL query that finds dates between a range and takes values from another query & iterates range over them?

Sorry if the wording for this question is strange. Wasn't sure how to word it, but here's the context:
I'm working on an application that shows some data about the how often individual applications are being used when users make a request from my web server. The way we take data is by every time the start page loads, it increments a data table called WEB_TRACKING at the date of when it loaded. So there are a lot of holes in data, for example, an application might've been used heavily on September 1st but not at all September 2nd. What I want to do, is add those holes with a value on hits of 0. This is what I came up with.
Select HIT_DATA.DATE_ACCESSED, HIT_DATA.APP_ID, HIT_DATA.NAME, WORKDAYS.BENCH_DAYS, NVL(HIT_DATA.HITS, 0) from (
select DISTINCT( TO_CHAR(WEB.ACCESS_TIME, 'MM/DD/YYYY')) as BENCH_DAYS
FROM WEB_TRACKING WEB
) workDays
LEFT join (
SELECT TO_CHAR(WEB.ACCESS_TIME, 'MM/DD/YYYY') as DATE_ACCESSED, APP.APP_ID, APP.NAME,
COUNT(WEB.IP_ADDRESS) AS HITS
FROM WEB_TRACKING WEB
INNER JOIN WEB_APP APP ON WEB.APP_ID = APP.APP_ID
WHERE APP.IS_ENABLED = 1 AND (APP.APP_ID = 1 OR APP.APP_ID = 2)
AND (WEB.ACCESS_TIME > TO_DATE('08/04/2018', 'MM/DD/YYYY')
AND WEB.ACCESS_TIME < TO_DATE('09/04/2018', 'MM/DD/YYYY'))
GROUP BY TO_CHAR(WEB.ACCESS_TIME, 'MM/DD/YYYY'), APP.APP_ID, APP.NAME
ORDER BY TO_CHAR(WEB.ACCESS_TIME, 'MM/DD/YYYY'), app_id DESC
) HIT_DATA ON HIT_DATA.DATE_ACCESSED = WORKDAYS.BENCH_DAYS
ORDER BY WORKDAYS.BENCH_DAYS
It returns all the dates that between the date range and even converts null hits to 0. However, it returns null for app id and app name. Which makes sense, and I understand how to give a default value for 1 application. I was hoping someone could help me figure out how to do it for multiple applications.
Basically, I am getting this (in the case of using just one application):
| APP_ID | NAME | BENCH_DAYS | HITS |
| ------ | ---------- | ---------- | ---- |
| NULL | NULL | 08/04/2018 | 0 |
| 1 | test_app | 08/05/2018 | 1 |
| NULL | NULL | 08/06/2018 | 0 |
But I want this(with multiple applications):
| APP_ID | NAME | BENCH_DAYS | HITS |
| ------ | ---------- | ---------- | ---- |
| 1 | test_app | 08/04/2018 | 0 |<- these 0's are converted from null
| 1 | test_app | 08/05/2018 | 1 |
| 1 | test_app | 08/06/2018 | 0 | <- these 0's are converted from null
| 2 | prod_app | 08/04/2018 | 2 |
| 2 | prod_app | 08/05/2018 | 0 | <- these 0's are converted from null
So again to reiterate the question in this long post. How should I go about populating this query so that it fills up the holes in the dates but also reuses the application names and ids and populates that information as well?
You need a list of dates, that probably comes from a number generator rather than a table (if that table has holes, your report will too)
Example, every date for the past 30 days:
select trunc(sysdate-30) + level as bench_days from dual connect by level < 30
Use TRUNC instead of turning a date into a string in order to cut the time off
Now you have a list of dates, you want to add in repeating app id and name:
select * from
(select trunc(sysdate-30) + level as bench_days from dual connect by level < 30) dat
CROSS JOIN
(select app_id, name from WEB_APP WHERE APP.IS_ENABLED = 1 AND APP_ID in (1, 2) app
Now you have all your dates, crossed with all your apps. 2 apps and 30 days will make a 60 row resultset via a cross join. Left join your stat data onto it, and group/count/sum/aggregate ...
select app.app_id, app.name, dat.artificialday, COALESCE(stat.ct, 0) as hits from
(select trunc(sysdate-30) + level as artificialday from dual connect by level < 30) dat
CROSS JOIN
(select app_id, name from WEB_APP WHERE APP.IS_ENABLED = 1 AND APP_ID in (1, 2) app
LEFT JOIN
(SELECT app_id, trunc(access_time) accdate, count(ip_address) ct from web_tracking group by app_id, trunc(access_time)) stat
ON
stat.app_id = app.app_id AND
stat.accdate = dat.artificialday
You don't have to write the query this way/do your grouping as a subquery, I'm just representing it this way to lead you to thinking about your data in blocks, that you build in isolation and join together later, to build more comprehensive blocks

Reporting task complementation status with only create and operation_date params

I have two tables that the first one stores task data (task name, create date, assign_to etc) and the second table stores task history data e.g operation_date,task completed, task rejected etc. (Task and Task_history tables)
Company creates tasks and assign them to employees, then employees accepted tasks and complete them.
Task create_date column specify the sequence of the task to do, both operation_date and completed status columns specify the sequence of the task complementation.
I need a query for reporting in employee detail that Does An Employee complete the tasks in a sequence specified at the beginning ? How many tasks completed accordance with the given sequence ?
I tried a query for status completed tasks that order tables for task_creation and operation_date for an employee for a given day. Then, add the rownum for select queries then join two tables. If rownums are equals, employee completes the task for given sequence else not. But the query result was not like what I expected. Rownums displaying like that, r_h--> 1,2,3 ; r_t--> 1,15,17
SELECT *
FROM (SELECT W.id, w.create_date, ROWNUM as r_t
FROM wfm_task_1 W where W.task_status = 3
ORDER BY W.create_date ASC) TASK_SEQ LEFT OUTER JOIN
( SELECT H.wfm_task, H.record_date, ROWNUM as r_h
FROM wfm_task_history H
WHERE H.task_status = 3
AND H.record_date BETWEEN (TO_DATE ('12.07.2013',
'DD.MM.YYYY')
- 1)
AND (TO_DATE ('12.07.2013',
'DD.MM.YYYY')
+ 1)
ORDER BY H.record_date ASC) HISTORY_SEQ
ON TASK_SEQ.id = HISTORY_SEQ.wfm_task
Sample dataset
wfm_task (ID, CREATION_DATE, TASK_NAME)
49361 | 06.07.2013 11:50:00 | missionx
49404 | 10.07.2013 13:01:00 | missiony
49407 | 11.07.2013 11:02:00 | missiona
49108 | 01.07.2013 21:02:00 | missionb
task_history (ID,WFM_TASK,OP_DATE, STATUS)
98 | 49361 | 12.07.2013 15:19:19 | 3
92 | 49404 | 12.07.2013 11:10:50 | 3
90 | 49407 | 12.07.2013 11:06:58 | 3
78 | 49108 | 03.07.2013 11:02:00 | 1
result (WFM_TASK,RECORD_DATE,R_H,ID,CREATE_DATE,R_T)
49361 | 12.07.2013 15:19:19 | 3 | 49361 | 06.07.2013 11:50:00 | 15
49404 | 12.07.2013 11:10:50 | 2 | 49404 | 10.07.2013 13:01:00 | 17
49407 | 12.07.2013 11:06:58 | 1 | 49407 | 11.07.2013 11:02:00 | 1
Status 3 = completed. I want to find that are the tasks completed by an order. I check that task complete order is likely to task creation order.
You'll probably have to use ROW_NUMBER function instead of ROWNUM.
SELECT a.id, a.create_date,
row_number() over (order by a.create_date) r_t,
b.record_date,
row_number() over (order by b.record_date) r_h
from wfm_task a left outer join task_history b
on a.id = b.wfm_task
where b.status = 3
and b.record_date between date'2013-07-12' - 1 and date'2013-07-12' + 1
Demo here.

complex django or SQL query

A record can have status 'renewal_required'. If it enters this status, and the applicant indeed renews, a copy is generated, which enters status 'in_process' (But an application can have status 'in_process' for other reasons too).
Now I need to get all records that have renewal_required status, BUT, if a copy exists in status 'in_process' for a given applicant, I shall only show that one...the key is the applicant_id, being the same for copied records.
| id | status | applicant_id |
| 1 | renewal_required | 2 |
| 2 | in_process | 3 |
| 3 | renewal_required | 4 |
| 4 | in_process | 4 |
in the above example, records with id 1 and 4 would be returned...
Can this be done? Thanks for any suggestion (DB-redesign excluded, even if the design looks ridiculous - can't do anything about it right now)
Solution needs to be for django but if a SQL solution is being proposed I will happily accept it and adapt/execute directly
select a.applicant_id,COALESCE(b.status,a.status) status from
(select applicant_id,status from yourtable where status='renewal_required') a
left join
(select applicant_id,status from yourtable where status='in_process') b
on a.applicant_id = b.applicant_id;
check the DEMO
Here, a possible solution
SELECT MAX(t1.id) as max_id, t1.status, t1.applicant_id
FROM t1
JOIN (
SELECT MIN(status) as status, applicant_id
FROM t1
WHERE status in ('renewal_required', 'in_process')
GROUP by applicant_id ) tmp
ON t1.status = tmp.status
AND t1.applicant_id = tmp.applicant_id
GROUP BY t1.status, t1.applicant_id
SQL Fiddle
EDIT: Rethought it, now this one won't work if there are more statuses than just these two, because of SELECT MIN(status). Could you comment on that?
EDIT2: Might be like this it will. added WHERE status in ('renewal_required', 'in_process')