I have a somewhat complicated SQL query to perform, and I'm not sure what the right strategy is.
Consider the model:
event
foreignId Int
time UTCTime
success Bool
and suppose I have a predicate, which we can call trailingSuccess, that is True if the last n events were successful. I want to test for this predicate. That is, I want to run a query on event that returns a count of foreignId's for which the event was a success each of the last n times (or more) that the event was logged.
I am using Postgres, if it matters, but I'd rather stay in the ANSI fragment if possible.
What is a sensible strategy for computing this query?
So far, I have code like:
SELECT count (*)
FROM (SELECT e.foreignId
FROM event e
...
ORDER BY e.time ASC
LIMIT n)
Obviously, I didn't get very far. I'm not sure how to express a predicate that quantifies over multiple rows.
For hypothetical usage, n = 4 is fine.
Example data:
foreign_id time success
1 14:00 True
1 15:00 True
1 16:00 True
1 17:00 True
2 14:00 False
2 15:00 True
2 16:00 True
2 17:00 True
3 14:00 True
3 15:00 True
3 16:00 True
For the sample data, the query should return 1, because there are n = 4 successful events with foreign_id = 1. foreign_id 2 does not count because there is a False one in the last 4. foreign_id 3 does not count because there aren't enough events with foreign_id = 3.
Try finding the latest unsuccessful entry for each foreignID, using a simple GROUP BY clause. With this in a subquery, you can join it back to the table and count, for each foreignID, how many records have a newer time.
Something like:
SELECT lastn.foreignID, count(*)
FROM
    (SELECT foreignID, MAX(time) AS lasttime
     FROM event
     WHERE success = FALSE
     GROUP BY foreignID
    ) AS lastn
JOIN event AS e
    ON e.foreignID = lastn.foreignID
    AND e.time > lastn.lasttime
GROUP BY lastn.foreignID;
And you can experiment with left joins and the like to tweak it to your needs.
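One way the left-join tweak could look, as a minimal sketch rather than a definitive version: it runs against an in-memory SQLite database through Python's sqlite3 module, with the sample data from the question. The LEFT JOIN and the IS NULL branch are my additions so that ids with no failure at all (such as foreign_id 1) are still counted; storing success as 0/1 and the helper names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (foreignId INTEGER, time TEXT, success INTEGER)")
conn.executemany(
    "INSERT INTO event VALUES (?, ?, ?)",
    [
        (1, "14:00", 1), (1, "15:00", 1), (1, "16:00", 1), (1, "17:00", 1),
        (2, "14:00", 0), (2, "15:00", 1), (2, "16:00", 1), (2, "17:00", 1),
        (3, "14:00", 1), (3, "15:00", 1), (3, "16:00", 1),
    ],
)

# Count, per foreignId, the events logged after its latest failure.
# Ids that never failed have no row in lastn, so lasttime is NULL and
# every one of their events is counted.
TRAILING_SQL = """
SELECT e.foreignId, COUNT(*) AS trailing
FROM event AS e
LEFT JOIN (
    SELECT foreignId, MAX(time) AS lasttime
    FROM event
    WHERE success = 0
    GROUP BY foreignId
) AS lastn ON lastn.foreignId = e.foreignId
WHERE lastn.lasttime IS NULL OR e.time > lastn.lasttime
GROUP BY e.foreignId
ORDER BY e.foreignId
"""


def trailing_successes(connection):
    """Return (foreignId, number of trailing successes) pairs."""
    return connection.execute(TRAILING_SQL).fetchall()


def count_ids_with_trailing(connection, n):
    """Count the ids whose last n (or more) events all succeeded."""
    return sum(1 for _, trailing in trailing_successes(connection) if trailing >= n)


trailing = trailing_successes(conn)
print(trailing)
print(count_ids_with_trailing(conn, 4))
```

An id whose most recent event failed produces no row at all here, which is harmless for the count because its trailing run is zero.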
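select count(*)
from (
    select foreignId
    from (
        select
            foreignId,
            row_number() over(partition by foreignId order by "time" desc) as rn,
            success
        from event
    ) s
    where rn <= n
    group by foreignId
    -- keep only ids that have n recent events, all of them successful
    having bool_and(success) and count(*) = n
) t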
The first derived table selects all foreignIds that have at least n events. The subquery checks if the last n events for each foreignId were all successful.
SELECT COUNT(*)
FROM (
    SELECT foreignId
    FROM event
    GROUP BY foreignId
    HAVING COUNT(*) >= n
) t1
WHERE (
    -- take the last n rows first, then aggregate over just those rows
    SELECT COUNT(CASE WHEN NOT success THEN 1 END) = 0
    FROM (
        SELECT success
        FROM event
        WHERE foreignId = t1.foreignId
        ORDER BY time DESC
        LIMIT n
    ) last_n
)
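To check the "at least n events, and none of the last n failed" idea against the sample data, here is a small sketch using Python's sqlite3 module. SQLite stands in for Postgres, success is stored as 0/1, and the function name and the :n parameter are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (foreignId INTEGER, time TEXT, success INTEGER)")
conn.executemany(
    "INSERT INTO event VALUES (?, ?, ?)",
    [
        (1, "14:00", 1), (1, "15:00", 1), (1, "16:00", 1), (1, "17:00", 1),
        (2, "14:00", 0), (2, "15:00", 1), (2, "16:00", 1), (2, "17:00", 1),
        (3, "14:00", 1), (3, "15:00", 1), (3, "16:00", 1),
    ],
)

# Outer derived table: ids with at least n events.
# Inner subquery: pick the n most recent events of that id, then count failures.
TRAILING_SUCCESS_SQL = """
SELECT COUNT(*)
FROM (
    SELECT foreignId
    FROM event
    GROUP BY foreignId
    HAVING COUNT(*) >= :n
) AS t1
WHERE (
    SELECT COUNT(CASE WHEN success = 0 THEN 1 END) = 0
    FROM (
        SELECT success
        FROM event
        WHERE foreignId = t1.foreignId
        ORDER BY time DESC
        LIMIT :n
    ) AS last_n
)
"""


def count_trailing_success(connection, n):
    """Count ids whose n most recent events were all successful."""
    return connection.execute(TRAILING_SUCCESS_SQL, {"n": n}).fetchone()[0]


print(count_trailing_success(conn, 4))
```

The LIMIT has to sit in its own subquery because an aggregate is evaluated before ORDER BY and LIMIT of the same SELECT are applied.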
I ended up messing around on sqlfiddle for a while, until I arrived at this:
select count (*)
from (select count (last.foreignId) as cnt
      from (select foreignId
            from event
            where success = True
            order by time desc
           ) as last
      group by last.foreignId) as correct
where correct.cnt >= 4
I guess the insight I'm adding is that every layer of "selecting" can be thought of as a filter on the inner selections.
Related
I'm trying to query event data from firebase. The goal is to get the last event for users with an event sequence starting with event a. The events are ordered by time. I have tried some approaches with lead, join etc. couldn't produce the desired result.
Example data:
user_id event_name
1 a
1 b
1 c
2 b
2 a
3 a
4 a
4 b
The ideal output:
user_id event_name
1 c
3 a
4 b
The events are ordered by time.
So I assume you have a column named something like time.
Consider the approach below:
select user_id, event_name
from your_table
where true
qualify 1 = row_number() over(partition by user_id order by time desc)
and 'a' = first_value(event_name) over(partition by user_id order by time)
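If applied to the sample data in your question, and assuming the rows are listed in time order, the output matches your ideal output: (1, c), (3, a), (4, b).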
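For engines without QUALIFY, the same two window conditions can be moved into a derived table and filtered in an outer WHERE. The sketch below tries that on the sample data through Python's sqlite3 module (window functions need SQLite 3.25 or newer). The integer time column and its values are assumptions, since the question does not show one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (user_id INTEGER, event_name TEXT, time INTEGER)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [
        # time is a hypothetical ordering column following the listing order
        (1, "a", 1), (1, "b", 2), (1, "c", 3),
        (2, "b", 1), (2, "a", 2),
        (3, "a", 1),
        (4, "a", 1), (4, "b", 2),
    ],
)

# rn = 1 marks each user's latest event; first_event is the user's earliest event.
LAST_EVENT_SQL = """
SELECT user_id, event_name
FROM (
    SELECT user_id,
           event_name,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY time DESC) AS rn,
           FIRST_VALUE(event_name) OVER (PARTITION BY user_id ORDER BY time) AS first_event
    FROM your_table
) AS ranked
WHERE rn = 1 AND first_event = 'a'
ORDER BY user_id
"""

last_events = conn.execute(LAST_EVENT_SQL).fetchall()
print(last_events)
```

User 2 drops out because its first event is b, not a.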
I have a log table which logs a start row, and a finish row for a particular event.
Each event should have a start row, and if everything goes ok it should have an end row.
But if something goes wrong then the end row may not be created.
I want to SELECT everything in the table that has a start row but not an associated end row.
For example, consider the table like this:
id event_id event_status
1 123 1
2 123 2
3 234 1
4 234 2
5 456 1
6 678 1
7 678 2
Notice that event_id 456 (the row with id 5) has a start row but no end row. Start is an event_status of 1, end is an event_status of 2.
How can I pull back all the event_ids which have a start row but not an end row?
This is for mssql.
You could use a not exists subquery to demand that no other row exists that ends the event:
select *
from YourTable t1
where t1.event_status = 1
and not exists
(
    select *
    from YourTable t2
    where t2.event_id = t1.event_id
    and t2.event_status = 2
)
You can try with left self join as below:
select y1.event_id from #yourevents y1 left join #yourevents y2
on y1.event_id = y2.event_id
and y1.event_status = 1
and y2.event_status = 2
where y2.event_id is null
and y1.event_status = 1
In this particular case you could use one of 3 solutions:
Solution 1. The classic
Check if there is no end status
SELECT *
FROM myTable t1
WHERE NOT EXISTS (
    SELECT *
    FROM myTable t2
    WHERE t1.event_id = t2.event_id AND t2.event_status = 2
)
Solution 2. Make it pretty. Don't do subqueries with so many parentheses
The same check, but in a more concise and pretty manner
SELECT t1.*
FROM myTable t1
LEFT JOIN myTable t2 ON t1.event_id = t2.event_id AND t2.event_status = 2
-- Doesn't exist
WHERE t2.event_id IS NULL
Solution 3. Look for the last status for each event
More flexibility in case the status logic becomes more complicated
WITH last_status AS (
    SELECT
        id,
        event_id,
        event_status,
        -- The explicit frame is required: with ORDER BY, the default frame ends at the
        -- current row, so last_value would otherwise return the current row's own status.
        last_value(event_status) OVER (PARTITION BY event_id ORDER BY event_status ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_status
    FROM myTable
)
SELECT
    id,
    event_id,
    event_status
FROM last_status
WHERE last_status <> 2
There are more, with min/max queries and others. Pick what best suits your need for cleanliness, readability and versatility.
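A quick way to convince yourself that the NOT EXISTS and LEFT JOIN forms agree is to run both on the sample table. This is a small sketch using Python's sqlite3 module rather than mssql; the table name event_log is an assumption, and the column names follow the table in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event_log (id INTEGER, event_id INTEGER, event_status INTEGER)")
conn.executemany(
    "INSERT INTO event_log VALUES (?, ?, ?)",
    [(1, 123, 1), (2, 123, 2), (3, 234, 1), (4, 234, 2),
     (5, 456, 1), (6, 678, 1), (7, 678, 2)],
)

# Start rows (status 1) for which no end row (status 2) exists.
NOT_EXISTS_SQL = """
SELECT t1.event_id
FROM event_log AS t1
WHERE t1.event_status = 1
  AND NOT EXISTS (
      SELECT 1
      FROM event_log AS t2
      WHERE t2.event_id = t1.event_id
        AND t2.event_status = 2
  )
ORDER BY t1.event_id
"""

# Same check as an anti-join: the end row is optional, keep rows where it is missing.
LEFT_JOIN_SQL = """
SELECT t1.event_id
FROM event_log AS t1
LEFT JOIN event_log AS t2
       ON t2.event_id = t1.event_id
      AND t2.event_status = 2
WHERE t1.event_status = 1
  AND t2.event_id IS NULL
ORDER BY t1.event_id
"""

unfinished_not_exists = [row[0] for row in conn.execute(NOT_EXISTS_SQL)]
unfinished_left_join = [row[0] for row in conn.execute(LEFT_JOIN_SQL)]
print(unfinished_not_exists, unfinished_left_join)
```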
So I have a table of readings (heavily simplified version below). Sometimes there is a break in the reading history (see the records I have flagged as N). The 'From Read' should always match a previous 'To Read', or the 'To Read' should always match a later 'From Read', BUT I want to only select records as far back as the first 'break' in the reads.
How would I write a query in DB2 SQL to only return the rows flagged with a 'Y'?
EDIT: The contiguous flag is something I have added manually to represent the records I would like to select; it does not exist on the table.
ID From To Contiguous
ABC 01/01/2014 30/06/2014 Y
ABC 01/06/2013 01/01/2014 Y
ABC 01/05/2013 01/06/2013 Y
ABC 01/01/2013 01/02/2013 N
ABC 01/10/2012 01/01/2013 N
Thanks in advance!
J
You will need a recursive select, something like this:
WITH RECURSIVE
contiguous_intervals(start, end) AS (
select start, end
from intervals
where end = (select max(end) from intervals)
UNION ALL
select i.start, i.end
from contiguous_intervals m, intervals i
where i.end = m.start
)
select * from contiguous_intervals;
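Here is the recursive idea tried on the sample readings, as a sketch only. It uses Python's sqlite3 module instead of DB2, so the exact recursive-CTE syntax may need adjusting for your database. The table name readings, the column names from_date and to_date, and the ISO date format are assumptions chosen to avoid reserved words and to make the dates comparable as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id TEXT, from_date TEXT, to_date TEXT)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        ("ABC", "2014-01-01", "2014-06-30"),
        ("ABC", "2013-06-01", "2014-01-01"),
        ("ABC", "2013-05-01", "2013-06-01"),
        ("ABC", "2013-01-01", "2013-02-01"),  # break: ends before 2013-05-01
        ("ABC", "2012-10-01", "2013-01-01"),
    ],
)

# Anchor: the most recent reading per id.
# Recursive step: the reading whose to_date equals the from_date already reached.
# The walk stops at the first gap because no row matches any more.
CONTIGUOUS_SQL = """
WITH RECURSIVE chain(id, from_date, to_date) AS (
    SELECT r.id, r.from_date, r.to_date
    FROM readings AS r
    WHERE r.to_date = (SELECT MAX(to_date) FROM readings WHERE id = r.id)
    UNION ALL
    SELECT r.id, r.from_date, r.to_date
    FROM chain AS c
    JOIN readings AS r
      ON r.id = c.id
     AND r.to_date = c.from_date
)
SELECT id, from_date, to_date
FROM chain
ORDER BY from_date DESC
"""

contiguous = conn.execute(CONTIGUOUS_SQL).fetchall()
print(contiguous)
```

Only the three rows flagged Y in the question come back; the two older rows are never reached.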
You can do this with lead(), lag(). I'm not sure what the exact logic is for your case, but I think it is something like:
select r.*,
(case when (prev_to = from or prev_to is null) and
(next_from = to or next_from is null)
then 'Y'
else 'N'
end) as Contiguous
from (select r.*, lead(from) over (partition by id order by from) as next_from,
lag(to) over (partition by id order by to) as prev_to
from readings r
) r;
I have a table containing a set of tasks to perform:
Task
ID Name
1 Washing Up
2 Hoovering
3 Dusting
The user can add one or more Notes to a Note table. Each note is associated with a task:
Note
ID ID_Task Completed(%) Date
11 1 25 05/07/2013 14:00
12 1 50 05/07/2013 14:30
13 1 75 05/07/2013 15:00
14 3 20 05/07/2013 16:00
15 3 60 05/07/2013 17:30
I want a query that will select the Task ID, Name and its % complete, which should be zero if there aren't any notes for it. The query should return:
ID Name Completed (%)
1 Washing Up 75
2 Hoovering 0
3 Dusting 60
I've really been struggling with the query for this, which I've read is a "greatest n per group" type problem, of which there are many examples on SO, none of which I can apply to my case (or at least fully understand). My intuition was to start by finding the MAX(Date) for each task in the note table:
SELECT ID_Task,
MAX(Date) AS Date
FROM
Note
GROUP BY
ID_Task
Annoyingly, I can't just add "Complete %" to the above query unless it's contained in a GROUP clause. Argh! I'm not sure how to jump through this hoop in order to somehow get the task table rows with the column appended to it. Here is my pathetic attempt, which fails as it only returns tasks with notes and then duplicates task records at that (one for each note, so it's a complete fail).
SELECT Task.ID,
Task.Name,
Note.Complete
FROM
Task
JOIN
(SELECT ID_Task,
MAX(Date) AS Date
FROM
Note
GROUP BY
ID_Task) AS InnerNote
ON
Task.ID = InnerNote.ID_Task
JOIN
Note
ON
Task.ID = Note.ID_Task
Can anyone help me please?
If we assume that tasks only become more complete, you can do this with a left outer join and aggregation:
select t.ID, t.Name, coalesce(max(n.complete), 0)
from tasks t left outer join
notes n
on t.id = n.id_task
group by t.id, t.name
If tasks can become "less complete" then you want the one with the last date. For this, you can use row_number():
select t.ID, t.Name, coalesce(n.complete, 0)
from tasks t left outer join
(select n.*, row_number() over (partition by id_task order by date desc) as seqnum
from notes n
) n
on t.id = n.id_task and n.seqnum = 1;
In this case, you don't need a group by, because the seqnum = 1 performs the same role.
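The row_number() variant can be tried on the sample data like this. It is a sketch using Python's sqlite3 module (window functions need SQLite 3.25 or newer); the singular table names, lower-case column names, and ISO-formatted dates are assumptions made so the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE note (id INTEGER, id_task INTEGER, completed INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO task VALUES (?, ?)",
    [(1, "Washing Up"), (2, "Hoovering"), (3, "Dusting")],
)
conn.executemany(
    "INSERT INTO note VALUES (?, ?, ?, ?)",
    [
        (11, 1, 25, "2013-07-05 14:00"),
        (12, 1, 50, "2013-07-05 14:30"),
        (13, 1, 75, "2013-07-05 15:00"),
        (14, 3, 20, "2013-07-05 16:00"),
        (15, 3, 60, "2013-07-05 17:30"),
    ],
)

# seqnum = 1 marks the newest note of each task; tasks without notes
# survive the LEFT JOIN with NULL, which COALESCE turns into 0.
LATEST_NOTE_SQL = """
SELECT t.id, t.name, COALESCE(n.completed, 0) AS completed
FROM task AS t
LEFT JOIN (
    SELECT id_task,
           completed,
           ROW_NUMBER() OVER (PARTITION BY id_task ORDER BY date DESC) AS seqnum
    FROM note
) AS n
  ON n.id_task = t.id
 AND n.seqnum = 1
ORDER BY t.id
"""

progress = conn.execute(LATEST_NOTE_SQL).fetchall()
print(progress)
```

The seqnum = 1 condition belongs in the ON clause, not in WHERE; in WHERE it would filter out the tasks that have no notes.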
How about this: just take the max of completed and group by the task ID.
SELECT t.ID AS ID, t.`name`, MAX(n.completed) AS completed
FROM `note` n RIGHT JOIN `task` t ON ( n.ID_Task = t.ID )
GROUP BY t.ID, t.`name`
OR
SELECT t.ID AS ID, t.`name`,
(CASE WHEN MAX(n.completed) IS NULL THEN 0 ELSE MAX(n.completed) END) AS completed
FROM `note` n RIGHT JOIN `task` t ON ( n.ID_Task = t.ID )
GROUP BY t.ID, t.`name`
select a.ID,
a.Name,
isnull((select completed
from Note
where ID_Task = b.ID_Task
and Date = b.date),0)
from Task a
LEFT OUTER JOIN (select ID_Task,
max(date) date
from Note
group by ID_Task) b
ON a.ID = b.ID_Task;
See DEMO here
I want to make a specific counter which will raise by one after a specific record is found in a row.
time event revenue counter
13.37 START 20 1
13.38 action A 10 1
13.40 action B 5 1
13.42 end 1
14.15 START 20 2
14.16 action B 5 2
14.18 end 2
15.10 START 20 3
15.12 end 3
I need to find out the total revenue for every visit (the actions between START and end). I was thinking the best way would be to set a counter like the one in the last column above, so I could group events. But if you have a better solution, I would be grateful.
You can use a query similar to the following:
with StartTimes as
(
select time,
startRank = row_number() over (order by time)
from events
where event = 'START'
)
select e.*, counter = st.startRank
from events e
outer apply
(
select top 1 st.startRank
from StartTimes st
where e.time >= st.time
order by st.time desc
) st
SQL Fiddle with demo.
May need to be updated based on the particular characteristics of the actual data, things like duplicate times, missing events, etc. But it works for the sample data.
SQL Server 2012 supports an OVER clause for aggregates, so if you're up to date on version, this will give you the counter you want:
count(case when eventname='START' then 1 end) over (order by eventtime)
You could also use the latest START time instead of a counter to group by, like this:
with t as (
select
*,
max(case when eventname='START' then eventtime end)
over (order by eventtime) as timeStart
from YourTable
)
select
timeStart,
max(eventtime) as timeEnd,
sum(revenue) as totalRevenue
from t
group by timeStart;
Here's a SQL Fiddle demo using the schema Ian posted for his solution.
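The running-count idea can also be checked outside SQL Server. The sketch below uses Python's sqlite3 module (window functions need SQLite 3.25 or newer) with the sample rows from the question; the table name events and storing time as text are assumptions, and the end rows carry NULL revenue.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (time TEXT, event TEXT, revenue INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("13.37", "START", 20), ("13.38", "action A", 10),
        ("13.40", "action B", 5), ("13.42", "end", None),
        ("14.15", "START", 20), ("14.16", "action B", 5), ("14.18", "end", None),
        ("15.10", "START", 20), ("15.12", "end", None),
    ],
)

# The windowed COUNT is a running total of START rows seen so far,
# so every row of a visit gets the same counter value.
VISIT_REVENUE_SQL = """
WITH numbered AS (
    SELECT time,
           event,
           revenue,
           COUNT(CASE WHEN event = 'START' THEN 1 END) OVER (ORDER BY time) AS counter
    FROM events
)
SELECT counter,
       MIN(time) AS time_start,
       MAX(time) AS time_end,
       SUM(revenue) AS total_revenue
FROM numbered
GROUP BY counter
ORDER BY counter
"""

visits = conn.execute(VISIT_REVENUE_SQL).fetchall()
print(visits)
```

SUM skips the NULL revenue on the end rows, so no COALESCE is needed for the totals.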