Retrieve data if next line of data equals a particular value - sql

I am very new to SQL and need some help with a query.
I am writing a script that reviews a log file. The query retrieves the first time a particular status occurred, and this works as expected. I would now like to add a condition: only count that status if the log entry immediately after it is 'Accepted' or 'Attended'. How would I do this? I have pasted the current script below and marked in italics where I think this condition should go. Any help would be greatly appreciated!
WITH Test AS
(
Select j.jobcode, min(log.timestamp) as 'Time First Assigned'
from Job J
inner join JobLog Log
on J.JobID = Log.JobID
and log.JobStatusID = 'Assigned' *-- and record after this equals accepted or attended*
where j.CompletionDate >= #Start_date
and j.CompletionDate < #End_date
Group by j.jobcode
)

I recommend lead(), but using it in a subquery on one table:
with test as (
select j.jobcode, min(jl.timestamp) as time_first_assigned
from Job j join
(select jl.*,
lead(jl.JobStatusID) over (partition by jl.jobid order by jl.timestamp) as next_status
from JobLog jl
) jl
on j.JobID = jl.JobID
where jl.JobStatusID = 'Assigned' and
jl.next_status in ('accepted', 'attended') and
j.CompletionDate >= #Start_date and
j.CompletionDate < #End_date
group by j.jobcode
)
In particular, this enables the optimizer to use an index on JobLog(jobid, timestamp, JobStatusId) for the lead(). That said, this will not always improve performance, particularly if the filter on the CompletionDate filters out most rows.
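To see how the lead() filter behaves, here is a minimal sketch that runs the same idea against SQLite through Python's sqlite3 module (SQLite 3.25 or later is needed for window functions). The table layout mirrors the question, but every row is invented for illustration, the date filter on CompletionDate is left out, and SQLite compares strings case-sensitively, so the status values are written with the same capitalisation throughout.

```python
import sqlite3

# Hypothetical miniature of the Job/JobLog tables; all rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Job (JobID INTEGER PRIMARY KEY, JobCode TEXT);
CREATE TABLE JobLog (JobID INTEGER, JobStatusID TEXT, Timestamp TEXT);
INSERT INTO Job VALUES (1, 'J-001'), (2, 'J-002');
INSERT INTO JobLog VALUES
  (1, 'Assigned',  '2021-01-01 09:00'),
  (1, 'Rejected',  '2021-01-01 09:30'),
  (1, 'Assigned',  '2021-01-01 10:00'),
  (1, 'Accepted',  '2021-01-01 10:15'),
  (2, 'Assigned',  '2021-01-02 08:00'),
  (2, 'Cancelled', '2021-01-02 08:05');
""")

# LEAD() attaches the status of the following log row (per job, by time)
# to each row, so the outer WHERE can test "Assigned, then Accepted/Attended".
query = """
SELECT j.JobCode, MIN(jl.Timestamp) AS time_first_assigned
FROM Job j
JOIN (
    SELECT JobID, JobStatusID, Timestamp,
           LEAD(JobStatusID) OVER (PARTITION BY JobID ORDER BY Timestamp) AS next_status
    FROM JobLog
) jl ON jl.JobID = j.JobID
WHERE jl.JobStatusID = 'Assigned'
  AND jl.next_status IN ('Accepted', 'Attended')
GROUP BY j.JobCode
"""
rows = conn.execute(query).fetchall()
print(rows)
```

With these rows, job J-001 is reported at 10:00 rather than 09:00, because the 09:00 assignment was followed by 'Rejected'. Job J-002 drops out entirely, since its only assignment was followed by 'Cancelled'.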

You can use the LEAD window function as follows:
Select jobcode, min(ts) as 'Time First Assigned' from
(select j.jobcode, log.timestamp as ts, log.JobStatusID,
lead(log.JobStatusID)
over (partition by Log.JobID order by Log.timestamp) as lead_statusid
from Job J
inner join JobLog Log on J.JobID = Log.JobID
where j.CompletionDate >= #Start_date and j.CompletionDate < #End_date
) t
where JobStatusID = 'Assigned' and lead_statusid in ('accepted', 'attended')
Group by jobcode

Thank you very much.
I used Gordon's suggested code and once I changed the values to the names I used in my code I can confirm that it works.
I did look at the Lead function however I didn't know how to apply it.
Again thanks to everyone for helping with my query.

Related

SQL Rowwise comparison between groups

Question
The following is a snippet of my data:
Create Table Emps(person VARCHAR(50), started DATE, stopped DATE);
Insert Into Emps Values
('p1','2015-10-10','2016-10-10'),
('p1','2016-10-11','2017-10-11'),
('p1','2017-10-12','2018-10-13'),
('p2','2019-11-13','2019-11-13'),
('p2','2019-11-14','2020-10-14'),
('p3','2020-07-15','2021-08-15'),
('p3','2021-08-16','2022-08-16');
I want to use T-SQL to count how many persons fulfil the following criteria at least once (a person who matches several times still counts as one):
For a person:
One of the dates in 'started' (say s1) is later than at least one of the dates in 'stopped' (say e1)
s1 and e1 fall in the same year, which is set manually, e.g. from '2021-01-01' until '2022-01-01'
Example expected response
If I put the date range '2016-01-01' until '2017-01-01' somewhere in a WHERE / HAVING clause, the output should be 1 as only p1 has both a start date and an end date that fall in 2016 where the start date is larger than the end date:
s1 = '2016-10-11', and e1 = '2016-10-10'.
Why can't I do this myself
The reason I'm stuck is that I don't know how to do this rowwise comparison between groups. The question requires comparing values across columns (start with end) across rows, within a person ID.
Use conditional aggregation to get the maximum start date and the minimum stop date in the given range.
select person
from emps
group by person
having max(case when started >= '2016-01-01' and started < '2017-01-01'
then started end) >
min(case when stopped >= '2016-01-01' and stopped < '2017-01-01'
then stopped end);
Demo: https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=45adb153fcac9ce72708f1283cac7833
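As a quick check of the conditional aggregation, the sketch below loads the sample rows from the question into SQLite via Python's sqlite3 module and runs the same HAVING clause with the range passed as parameters. SQLite stands in for SQL Server here, and the helper name persons_in_range is made up for this example. ISO-formatted date strings compare correctly as text, which is what the sketch relies on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Emps (person TEXT, started TEXT, stopped TEXT);
INSERT INTO Emps VALUES
  ('p1','2015-10-10','2016-10-10'),
  ('p1','2016-10-11','2017-10-11'),
  ('p1','2017-10-12','2018-10-13'),
  ('p2','2019-11-13','2019-11-13'),
  ('p2','2019-11-14','2020-10-14'),
  ('p3','2020-07-15','2021-08-15'),
  ('p3','2021-08-16','2022-08-16');
""")

def persons_in_range(lo, hi):
    """Persons whose latest start in [lo, hi) is after their earliest stop in [lo, hi)."""
    sql = """
    SELECT person
    FROM Emps
    GROUP BY person
    HAVING MAX(CASE WHEN started >= :lo AND started < :hi THEN started END)
         > MIN(CASE WHEN stopped >= :lo AND stopped < :hi THEN stopped END)
    ORDER BY person
    """
    return [row[0] for row in conn.execute(sql, {"lo": lo, "hi": hi})]

matches_2016 = persons_in_range('2016-01-01', '2017-01-01')
matches_2021 = persons_in_range('2021-01-01', '2022-01-01')
print(len(matches_2016), matches_2016)
print(len(matches_2021), matches_2021)
```

A person with no start or no stop in the range yields NULL on one side of the comparison, so the HAVING condition is not true and that person is left out. The count asked for in the question is just the length of the returned list.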
I would use a self-join written as a correlated exists; it should be pretty much the most performant option, all things being equal.
select count(distinct e.person)
from emps e
where e.started >= '20160101' and e.started < '20170101'
and exists (
select * from emps e2
where e2.person = e.person
and e2.stopped >= '20160101' and e2.stopped < '20170101'
and e2.stopped < e.started -- a start date later than a stop date
);
You said you plan to set the dates manually, so this version filters the start dates to the chosen range in one CTE and the stop dates in another. It then takes the max/min of each and compares them in the WHERE clause of the final query.
with min_max_start as (
select person,
min(started) as min_start, --obsolete
max(started) as max_start
from emps
where started >= '2016-01-01' and started < '2017-01-01'
group by person
),
min_max_end as (
select person,
min(stopped) as min_stop,
max(stopped) as max_stop --obsolete
from emps
where stopped >= '2016-01-01' and stopped < '2017-01-01'
group by person
)
select count(distinct e.person)
from emps e
join min_max_start mms
on e.person = mms.person
join min_max_end mme
on e.person = mme.person
where mms.max_start> mme.min_stop
Output: 1
Try the following:
With CTE as
(
Select D.person, D.started, T.stopped,
case
when Year(D.started) = Year(T.stopped) and D.started > T.stopped
then 1
else 0
end as chk
From
(Select person, started From Emps Where started >= '2016-01-01') D
Join
(Select person, stopped From Emps Where stopped < '2017-01-01') T
On D.person = T.person
)
Select Count(Distinct person) as CNT
From CTE
Where chk = 1;
To get the list of persons who meet the criteria, run the following against the CTE instead of the Select Count... query above:
Select person, started, stopped
From CTE
Where chk = 1;

Is this simple SQL query correct?

The query below is pretty self-explanatory, and although I'm not good at SQL, I can't find anything wrong with it. However, the number it yields is not in accordance with my gut feeling, and I would like it double-checked, if this is appropriate for StackOverflow.
I'm simply trying to get the number of users that joined my website in 2020, and also made a payment in 2020. I'm trying to figure out "new revenue".
This is the query:
SELECT Count(DISTINCT( auth_user.id )) AS "2020"
FROM auth_user
JOIN subscription_transaction
ON ( subscription_transaction.event = 'one-time payment'
AND subscription_transaction.user_id = auth_user.id
AND subscription_transaction.timestamp >= '2020-01-01'
AND subscription_transaction.timestamp <= '2020-12-31' )
WHERE auth_user.date_joined >= '2020-01-01'
AND auth_user.date_joined <= '2020-12-31';
I use PostgreSQL 10.
Thanks in advance!
I would write the query using EXISTS to get rid of the COUNT(DISTINCT):
SELECT count(*) AS "2020"
FROM auth_user au
WHERE au.date_joined >= '2020-01-01' AND
au.date_joined < '2021-01-01' AND
EXISTS (SELECT 1
FROM subscription_transaction st
WHERE st.event = 'one-time payment' AND
st.user_id = au.id AND
st.timestamp >= '2020-01-01' AND
st.timestamp < '2021-01-01'
) ;
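This should be faster than your version. The results should be the same except at the very end of the year: timestamp < '2021-01-01' also covers rows from later in the day on 2020-12-31, which timestamp <= '2020-12-31' excludes when the column stores a time of day.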
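One thing worth checking on your own data is the upper bound. In PostgreSQL, '2020-12-31' compared against a timestamp column is read as midnight at the start of that day, so <= '2020-12-31' leaves out anything later on 31 December. The sketch below reproduces that effect in SQLite through Python's sqlite3 module. The rows are invented, and SQLite compares the timestamps as text rather than as a timestamp type, which gives the same ordering for these ISO-formatted values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE auth_user (id INTEGER PRIMARY KEY, date_joined TEXT);
CREATE TABLE subscription_transaction (user_id INTEGER, event TEXT, timestamp TEXT);
-- Invented rows: users 1 and 2 joined in 2020, user 3 joined in 2019.
INSERT INTO auth_user VALUES
  (1, '2020-03-01 12:00:00'),
  (2, '2020-06-15 08:30:00'),
  (3, '2019-11-20 17:45:00');
-- User 1 paid twice; user 2 paid on the afternoon of 2020-12-31.
INSERT INTO subscription_transaction VALUES
  (1, 'one-time payment', '2020-04-01 10:00:00'),
  (1, 'one-time payment', '2020-05-01 10:00:00'),
  (2, 'one-time payment', '2020-12-31 18:00:00'),
  (3, 'one-time payment', '2020-02-02 09:00:00');
""")

# JOIN + COUNT(DISTINCT); the upper-bound comparison is filled in below.
JOIN_SQL = """
SELECT COUNT(DISTINCT au.id)
FROM auth_user au
JOIN subscription_transaction st
  ON st.event = 'one-time payment'
 AND st.user_id = au.id
 AND st.timestamp >= '2020-01-01'
 AND st.timestamp {upper}
WHERE au.date_joined >= '2020-01-01'
  AND au.date_joined {upper}
"""

# EXISTS version with half-open ranges.
EXISTS_SQL = """
SELECT COUNT(*)
FROM auth_user au
WHERE au.date_joined >= '2020-01-01'
  AND au.date_joined < '2021-01-01'
  AND EXISTS (SELECT 1
              FROM subscription_transaction st
              WHERE st.event = 'one-time payment'
                AND st.user_id = au.id
                AND st.timestamp >= '2020-01-01'
                AND st.timestamp < '2021-01-01')
"""

def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]

join_closed = scalar(JOIN_SQL.format(upper="<= '2020-12-31'"))
join_half_open = scalar(JOIN_SQL.format(upper="< '2021-01-01'"))
exists_half_open = scalar(EXISTS_SQL)
print(join_closed, join_half_open, exists_half_open)
```

This prints 1 2 2. The JOIN and EXISTS forms agree once they use the same bounds, and the only difference comes from user 2's payment on the afternoon of 31 December.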

SQL Query to show order of work orders

First off, sorry for the poor subject line.
EDIT: The query here duplicates OrderNumbers; I need the query NOT to duplicate OrderNumbers.
EDIT: Shortened the question and made it much cleaner.
I have a table that records all of the work orders that have been performed. There are two types of orders: Installs and Trouble Calls. My query should find all of the trouble calls that took place within 30 days of an install and match each trouble call (TC) to the proper install (IN). So the trouble call date has to fall after the install, but no more than 30 days after. Additionally, if there are two installs and two trouble calls for the same account, all within 30 days and in order, the results have to reflect that. The problem I am having is that one install order matches two different trouble calls (TC), and one trouble call (TC) matches two different installs (IN).
In the example on SQL Fiddle, pay close attention to install order number 1234567810 and trouble call order number 1234567890, and you will see the issue I am having.
http://sqlfiddle.com/#!3/811df/8
select b.accountnumber,
MAX(b.scheduleddate) as OriginalDate,
b.workordernumber as OriginalOrder,
b.jobtype as OriginalType,
MIN(a.scheduleddate) as NewDate,
a.workordernumber as NewOrder,
a.jobtype as NewType
from (
select workordernumber,accountnumber,jobtype,scheduleddate
from workorders
where jobtype = 'TC'
) a join
(
select workordernumber,accountnumber,jobtype,scheduleddate
from workorders
where jobtype = 'IN'
) b
on a.accountnumber = b.accountnumber
group by b.accountnumber,
b.scheduleddate,
b.workordernumber,
b.jobtype,
a.accountnumber,
a.scheduleddate,
a.workordernumber,
a.jobtype
having MIN(a.scheduleddate) > MAX(b.scheduleddate) and
DATEDIFF(day,MAX(b.scheduleddate),MIN(a.scheduleddate)) < 31
Example of what I am looking for the results to look like.
Thank you for any assistance you can provide in setting me on the correct path.
You were actually very close. I realized that what you really want is the MIN() TC date that is greater than each install date for that account number so long as they are 30 days or less apart.
So really you need to group by the install dates from your result set excluding WorkOrderNumbers still. Something like:
SELECT a.AccountNumber, MIN(a.scheduleddate) TCDate, b.scheduleddate INDate
FROM
(
SELECT WorkOrderNumber, ScheduledDate, JobType, AccountNumber
FROM workorders
WHERE JobType = 'TC'
) a
INNER JOIN
(
SELECT WorkOrderNumber, ScheduledDate, JobType, AccountNumber
FROM workorders
WHERE JobType = 'IN'
) b
ON a.AccountNumber = b.AccountNumber
WHERE b.ScheduledDate < a.ScheduledDate
AND DATEDIFF(DAY, b.ScheduledDate, a.ScheduledDate) <= 30
GROUP BY a.AccountNumber, b.AccountNumber, b.ScheduledDate
This takes care of the dates and AccountNumbers, but you still need the WorkOrderNumbers, so I joined the workorders table back twice, once for each type.
NOTE: I assume that each workorder has a unique date for each account number. So, if you have workorder 1 ('TC') for account 1 done on '1/1/2015' and you also have workorder 2 ('TC') for account 1 done on '1/1/2015' then I can't guarantee that you will have the correct WorkOrderNumber in your result set.
My final query looked like this:
SELECT
aggdata.AccountNumber, inst.workordernumber OriginalWorkOrderNumber, inst.JobType OriginalJobType, inst.ScheduledDate OriginalScheduledDate,
tc.WorkOrderNumber NewWorkOrderNumber, tc.JobType NewJobType, tc.ScheduledDate NewScheduledDate
FROM (
SELECT a.AccountNumber, MIN(a.scheduleddate) TCDate, b.scheduleddate INDate
FROM
(
SELECT WorkOrderNumber, ScheduledDate, JobType, AccountNumber
FROM workorders
WHERE JobType = 'TC'
) a
INNER JOIN
(
SELECT WorkOrderNumber, ScheduledDate, JobType, AccountNumber
FROM workorders
WHERE JobType = 'IN'
) b
ON a.AccountNumber = b.AccountNumber
WHERE b.ScheduledDate < a.ScheduledDate
AND DATEDIFF(DAY, b.ScheduledDate, a.ScheduledDate) <= 30
GROUP BY a.AccountNumber, b.AccountNumber, b.ScheduledDate
) aggdata
LEFT OUTER JOIN workorders tc
ON aggdata.TCDate = tc.ScheduledDate
AND aggdata.AccountNumber = tc.AccountNumber
AND tc.JobType = 'TC'
LEFT OUTER JOIN workorders inst
ON aggdata.INDate = inst.ScheduledDate
AND aggdata.AccountNumber = inst.AccountNumber
AND inst.JobType = 'IN'
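For anyone who wants to try the aggregate-then-join-back approach without a SQL Server instance, here is a rough sketch in SQLite via Python's sqlite3 module. It is an adaptation, not the exact query above: the work orders are invented, julianday() subtraction replaces DATEDIFF (equivalent here only because the dates carry no time part), the derived tables are folded into plain WHERE filters, and inner joins are used for the join back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE workorders (
    WorkOrderNumber INTEGER PRIMARY KEY,
    AccountNumber   INTEGER,
    JobType         TEXT,
    ScheduledDate   TEXT
);
-- Invented sample: account 1 has two installs, account 2 has a late trouble call.
INSERT INTO workorders VALUES
  (101, 1, 'IN', '2015-01-01'),
  (102, 1, 'TC', '2015-01-10'),
  (103, 1, 'TC', '2015-01-20'),
  (104, 1, 'IN', '2015-03-01'),
  (105, 1, 'TC', '2015-03-05'),
  (106, 2, 'IN', '2015-02-01'),
  (107, 2, 'TC', '2015-04-01');
""")

# Step 1 (inner query): per install date, the earliest trouble call within 30 days.
# Step 2 (outer joins back): recover the work order numbers for both dates.
query = """
SELECT inst.WorkOrderNumber, tc.WorkOrderNumber
FROM (
    SELECT a.AccountNumber,
           MIN(a.ScheduledDate) AS TCDate,
           b.ScheduledDate      AS INDate
    FROM workorders a
    JOIN workorders b ON a.AccountNumber = b.AccountNumber
    WHERE a.JobType = 'TC'
      AND b.JobType = 'IN'
      AND b.ScheduledDate < a.ScheduledDate
      AND julianday(a.ScheduledDate) - julianday(b.ScheduledDate) <= 30
    GROUP BY a.AccountNumber, b.ScheduledDate
) agg
JOIN workorders tc
  ON tc.ScheduledDate = agg.TCDate
 AND tc.AccountNumber = agg.AccountNumber
 AND tc.JobType = 'TC'
JOIN workorders inst
  ON inst.ScheduledDate = agg.INDate
 AND inst.AccountNumber = agg.AccountNumber
 AND inst.JobType = 'IN'
ORDER BY inst.WorkOrderNumber
"""
pairs = conn.execute(query).fetchall()
print(pairs)
```

Install 101 pairs only with its first trouble call (102), install 104 pairs with 105, and account 2 produces nothing because its trouble call came 59 days after the install. Note that grouping by install date does not stop a single trouble call from pairing with two installs if both fall within the 30 days before it.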
select in1.accountnumber,
in1.scheduleddate as OriginalDate,
in1.workordernumber as OriginalOrder,
'IN' as OriginalType,
tc.scheduleddate as NewDate,
tc.workordernumber as NewOrder,
'TC' as NewType
from
workorders in1
outer apply (
select min(in2.scheduleddate) as scheduleddate
from workorders in2
where in2.jobtype = 'IN'
and in1.accountnumber = in2.accountnumber
and in2.scheduleddate > in1.scheduleddate
) ins
join workorders tc
on tc.jobtype = 'TC'
and tc.accountnumber = in1.accountnumber
and tc.scheduleddate > in1.scheduleddate
and (ins.scheduleddate is null or tc.scheduleddate < ins.scheduleddate)
and DATEDIFF(day, in1.scheduleddate, tc.scheduleddate) < 31
Where in1.jobtype = 'IN'

Having difficulty writing sub-query

I am a beginner with HiveQL. I am trying to write a faster, more efficient query but am having trouble with it. Can someone help me rewrite this query? Any tips for improving my queries would be appreciated as well.
select "AUDIOONLYtopctrbyweek37Q32015", weekofyear(day),op.order_id,oppty_amount, mv.order_start_date, mv.order_end_date, count(distinct rdz.listener_id) as listeners, sum(impressions) , sum(clicks), (sum(clicks)/sum(impressions)) as ctr, sum(oline_net_amount)
from ROLLUP_PST rdz
join dfp2ss mv on (rdz.order_id = mv.dfp_order_id)
join oppty_order_oline op on (mv.order_id = op.order_id)
where day >= '2015-09-07'
and day <= '2015-09-13'
and creative_size in ('2000x132','134x1285','2000x114')
group by "AUDIOONLYtopctrbyweek37Q32015", weekofyear(day),op.order_id,oppty_amount, mv.order_start_date, mv.order_end_date
order by ctr desc
limit 150;
Please try the modified query below; it should work for you.
select "AUDIOONLYtopctrbyweek37Q32015", week_of_year, order_id, oppty_amount, order_start_date, order_end_date,
count(distinct listener_id) over (partition by "AUDIOONLYtopctrbyweek37Q32015", week_of_year, order_id, oppty_amount, order_start_date, order_end_date)
from (
select "AUDIOONLYtopctrbyweek37Q32015", weekofyear(day) as week_of_year, op.order_id as order_id,
oppty_amount, mv.order_start_date as order_start_date, mv.order_end_date as order_end_date, rdz.listener_id as listener_id
from ROLLUP_PST rdz,
dfp2ss mv,
oppty_order_oline op
where rdz.order_id = mv.dfp_order_id
and mv.order_id = op.order_id
and day >= '2015-09-07'
and day <= '2015-09-13'
and creative_size in ('2000x132','134x1285','2000x114')
) z

SQL Query to Return Count Based on Time

I have an interesting problem and am unsure how to write a query to solve it. Say I have a table named "Cars". It has two columns, CarId (int PK), and ArrivalTime (datetime). As cars enter a space, the arrival time is entered.
What I need to know is this: for each car, how many cars entered the space in the 24-hour period prior to its arrival.
I'd like to write this without using a cursor, but don't know how I can do it. Any SQL gurus out there with an idea?
Oh - I should mention that the SQL Server version being used is 2005.
You could use a query with either a correlated subquery or a join.
Here's an example of a query using a join operation:
SELECT n.CarId
, n.ArrivalTime
, COUNT(p.CarId) AS cnt_previous_arrivals
FROM cars n
LEFT
JOIN cars p
-- no equality on CarId: it is the primary key, so each row is a different car
ON p.ArrivalTime >= DATEADD(HOUR,-24,n.ArrivalTime)
AND p.ArrivalTime < n.ArrivalTime
GROUP
BY n.CarId
, n.ArrivalTime
To get an equivalent result with a correlated subquery, one option:
SELECT n.CarId
, n.ArrivalTime
, ( SELECT COUNT(*)
FROM cars p
WHERE p.ArrivalTime >= DATEADD(HOUR,-24,n.ArrivalTime)
AND p.ArrivalTime < n.ArrivalTime
) AS cnt_previous_arrivals
FROM cars n
ORDER
BY n.CarId
, n.ArrivalTime
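The 24-hour look-back can be tried out quickly with the sketch below, which uses SQLite through Python's sqlite3 module as a stand-in for SQL Server 2005. The four cars are invented, and SQLite's datetime(x, '-24 hours') takes the place of DATEADD. Because CarId is the primary key, each row is a different car, so the sketch puts no equality on CarId between the two aliases; the strict less-than on the arrival time already keeps a car from counting itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Cars (CarId INTEGER PRIMARY KEY, ArrivalTime TEXT);
-- Invented arrivals, stored in the same text format that datetime() returns.
INSERT INTO Cars VALUES
  (1, '2024-01-01 08:00:00'),
  (2, '2024-01-01 20:00:00'),
  (3, '2024-01-02 07:00:00'),
  (4, '2024-01-03 12:00:00');
""")

# For each car n, count the cars p that arrived in the 24 hours before n.
# LEFT JOIN plus COUNT(p.CarId) gives 0 (not a missing row) when nothing matches.
query = """
SELECT n.CarId, COUNT(p.CarId) AS cnt_previous_arrivals
FROM Cars n
LEFT JOIN Cars p
  ON p.ArrivalTime >= datetime(n.ArrivalTime, '-24 hours')
 AND p.ArrivalTime <  n.ArrivalTime
GROUP BY n.CarId
ORDER BY n.CarId
"""
counts = conn.execute(query).fetchall()
print(counts)
```

Car 3 sees both earlier cars, car 2 sees only car 1, and cars 1 and 4 have nobody in their window, so the output is [(1, 0), (2, 1), (3, 2), (4, 0)].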
select el1.car_id,
( select count(*)
from entry_log el2
where el2.datetime between DATEADD(day, -1, el1.datetime)
and el1.datetime
and el2.car_id != el1.car_id
)
from entry_log el1
where el1.car_id = :my_car_id