Count how many first and last entries in given period of time are equal - sql

Given a table structured like this:
id | news_id(fkey)| status | date
1 10 PUBLISHED 2016-01-10
2 20 UNPUBLISHED 2016-01-10
3 10 UNPUBLISHED 2016-01-12
4 10 PUBLISHED 2016-01-15
5 10 UNPUBLISHED 2016-01-16
6 20 PUBLISHED 2016-01-18
7 10 PUBLISHED 2016-01-18
8 20 UNPUBLISHED 2016-01-20
9 30 PUBLISHED 2016-01-20
10 30 UNPUBLISHED 2016-01-21
I'd like to count the distinct news items whose first and last status within a given period are equal (and also equal to the status given in the query).
So, for this table, a query from 2016-01-01 to 2016-02-01 would return:
1 (with WHERE status = 'PUBLISHED'), because news_id 10 had PUBLISHED in both its first (2016-01-10) and last (2016-01-18) rows
1 (with WHERE status = 'UNPUBLISHED'), because news_id 20 had UNPUBLISHED in both its first and last rows
Notice that news_id = 30 does not appear in the results, because its first and last statuses differ.
I have done that using the following query:
SELECT count(*) FROM
(
SELECT DISTINCT ON (news_id)
news_id, status as first_status
FROM news_events
where date >= '2015-11-12 15:01:56.195'
ORDER BY news_id, date
) first
JOIN (
SELECT DISTINCT ON (news_id)
news_id, status as last_status
FROM news_events
where date >= '2015-11-12 15:01:56.195'
ORDER BY news_id, date DESC
) last
using (news_id)
where first_status = last_status
and first_status = 'PUBLISHED'
Now I have to translate this query into the SQL dialect of our internal Java framework. Unfortunately, it does not support subqueries, except in EXISTS or NOT EXISTS. I was told to rewrite the query using an EXISTS clause (if that is possible) or to find another solution. I am, however, clueless. Could anyone help me do that?
Edit: as I am being told right now, the problem lies not with our framework but with Hibernate: if I understood correctly, "you cannot join an inner select in HQL" (?)

I'm not sure this addresses your problem correctly, since it is more of a workaround. But consider the following:
News items need to be published before they can be "unpublished". So if you add 1 for each "published" and subtract 1 for each "unpublished", the balance will be positive (1, to be exact) if both the first and the last status are "published". It will be 0 if there are as many unpublished as published events, and negative if there are more unpublished than published events (which logically cannot happen overall, but can arise here because the date threshold in the query may cut off an earlier "published" event).
You might use this query to find out:
SELECT news_id, SUM(CASE status WHEN 'PUBLISHED' THEN 1 ELSE -1 END) AS publishbalance
FROM news_events
WHERE date >= '2015-11-12 15:01:56.195'
GROUP BY news_id
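To see what the balance looks like on the question's sample rows, here is a minimal sketch that runs the same aggregate through Python's built-in `sqlite3` module. SQLite is only a stand-in for the real database, and the helper name `publish_balances` and the half-open date range are assumptions made for this illustration.

```python
import sqlite3

# Sample rows from the question: (id, news_id, status, date)
ROWS = [
    (1, 10, "PUBLISHED", "2016-01-10"),
    (2, 20, "UNPUBLISHED", "2016-01-10"),
    (3, 10, "UNPUBLISHED", "2016-01-12"),
    (4, 10, "PUBLISHED", "2016-01-15"),
    (5, 10, "UNPUBLISHED", "2016-01-16"),
    (6, 20, "PUBLISHED", "2016-01-18"),
    (7, 10, "PUBLISHED", "2016-01-18"),
    (8, 20, "UNPUBLISHED", "2016-01-20"),
    (9, 30, "PUBLISHED", "2016-01-20"),
    (10, 30, "UNPUBLISHED", "2016-01-21"),
]


def publish_balances(start, end):
    """Return {news_id: balance} for events with start <= date < end."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE news_events (id INTEGER, news_id INTEGER, status TEXT, date TEXT)"
    )
    conn.executemany("INSERT INTO news_events VALUES (?, ?, ?, ?)", ROWS)
    sql = """
        SELECT news_id,
               SUM(CASE status WHEN 'PUBLISHED' THEN 1 ELSE -1 END) AS publishbalance
        FROM news_events
        WHERE date >= ? AND date < ?
        GROUP BY news_id
    """
    balances = dict(conn.execute(sql, (start, end)).fetchall())
    conn.close()
    return balances


print(publish_balances("2016-01-01", "2016-02-01"))  # {10: 1, 20: -1, 30: 0}
```

A balance of 1 matches "first and last are PUBLISHED" and -1 matches "first and last are UNPUBLISHED", but only as long as the statuses strictly alternate for each news item.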

First of all, subqueries are a substantial part of SQL. A framework forbidding their use is a bad framework.
However, "first" and "last" can be expressed with NOT EXISTS: where not exists an earlier or later entry for the same news_id and date range.
select count(*)
from mytable first
join mytable last on last.news_id = first.news_id
where first.date between #from and #to
and last.date between #from and #to
and not exists
(
select *
from mytable before_first
where before_first.news_id = first.news_id
and before_first.date < first.date
and before_first.date >= #from
)
and not exists
(
select *
from mytable after_last
where after_last.news_id = last.news_id
and after_last.date > last.date
and after_last.date <= #to
)
and first.status = #status
and last.status = #status;
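As a quick check of the NOT EXISTS idea against the question's sample rows, the sketch below runs an equivalent query through Python's built-in `sqlite3` module. SQLite stands in for the real database here; the aliases `f`, `l`, `a`, `b`, the parameter names `:lo`, `:hi`, `:status`, and the helper `count_same_first_last` are all assumptions made for this illustration.

```python
import sqlite3

# Sample rows from the question: (id, news_id, status, date)
ROWS = [
    (1, 10, "PUBLISHED", "2016-01-10"),
    (2, 20, "UNPUBLISHED", "2016-01-10"),
    (3, 10, "UNPUBLISHED", "2016-01-12"),
    (4, 10, "PUBLISHED", "2016-01-15"),
    (5, 10, "UNPUBLISHED", "2016-01-16"),
    (6, 20, "PUBLISHED", "2016-01-18"),
    (7, 10, "PUBLISHED", "2016-01-18"),
    (8, 20, "UNPUBLISHED", "2016-01-20"),
    (9, 30, "PUBLISHED", "2016-01-20"),
    (10, 30, "UNPUBLISHED", "2016-01-21"),
]

QUERY = """
SELECT COUNT(*)
FROM news_events f
JOIN news_events l ON l.news_id = f.news_id
WHERE f.date BETWEEN :lo AND :hi
  AND l.date BETWEEN :lo AND :hi
  AND f.status = :status
  AND l.status = :status
  AND NOT EXISTS (   -- no earlier event in the range, so f is the first one
        SELECT 1 FROM news_events b
        WHERE b.news_id = f.news_id AND b.date < f.date AND b.date >= :lo)
  AND NOT EXISTS (   -- no later event in the range, so l is the last one
        SELECT 1 FROM news_events a
        WHERE a.news_id = l.news_id AND a.date > l.date AND a.date <= :hi)
"""


def count_same_first_last(status, lo="2016-01-01", hi="2016-02-01"):
    """Count news items whose first and last status in [lo, hi] both equal `status`."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE news_events (id INTEGER, news_id INTEGER, status TEXT, date TEXT)"
    )
    conn.executemany("INSERT INTO news_events VALUES (?, ?, ?, ?)", ROWS)
    (count,) = conn.execute(QUERY, {"lo": lo, "hi": hi, "status": status}).fetchone()
    conn.close()
    return count


print(count_same_first_last("PUBLISHED"))    # 1 (news_id 10)
print(count_same_first_last("UNPUBLISHED"))  # 1 (news_id 20)
```

If several events of one news item can share the same date, a news item may produce more than one (first, last) pair; counting distinct news_id values instead of rows would guard against that.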

NOT EXISTS to the rescue:
SELECT ff.id ,ff.news_id ,ff.status , ff.zdate AS startdate
, ll.zdate AS enddate
FROM newsflash ff
JOIN newsflash ll
ON ff.news_id = ll.news_id
AND ff.status = ll.status
AND ff.zdate < ll.zdate
AND NOT EXISTS (
SELECT * FROM newsflash nx
WHERE nx.news_id = ff.news_id
AND nx.zdate >= '2016-01-01' AND nx.zdate < '2016-02-01'
AND (nx.zdate < ff.zdate OR nx.zdate > ll.zdate)
)
ORDER BY ff.id
;

Related

Counting ED (emergency department) visits but only one per 8 day period

I am working in MS SQL Server 2017 on counting member ED visits where a member may go months between ER visits or may have multiple consecutive days each with a visit. The rule that I am trying to calculate is this:
If a member has more than one ED visit in an 8-day period, include
only the first eligible ED visit. For example, if a member has an
eligible ED visit on January 1, include the January 1 visit and do not
include ED visits that occur on or between January 2 and January 8.
Then, if applicable, include the next eligible ED visit that occurs on
or after January 9. Identify visits chronologically, including only
one visit per 8-day period.
If the days since the last visit is NULL or >= 8 then that is always counted and I have that. The issue I am having is how to look at the running total of Days_Since_Last_Visit to find the next valid visit when there are multiple in a given 8 day period.
In the example below the rows marked in green are flagged for inclusion because the Days_Since_Last_Visit is NULL or >= 8.
The rows highlighted yellow are the ones that should be counted as the first valid visit in the next 8 day period. The bold outline shows the days that are adding up to reach the threshold of 8.
[Image: example data with highlighted entries that should be counted]
I have prepared the SQL for the example in the image hoping someone can help me get unstuck.
IF OBJECT_ID('tempdb..#Example') IS NOT NULL
DROP TABLE #Example
CREATE TABLE
#Example (
Subscriber_ID VARCHAR(16),
Member_Seq VARCHAR(2),
Measurement_Year INT,
Visit_Date DATETIME,
)
INSERT INTO #Example (Subscriber_ID,Member_Seq,Measurement_Year,Visit_Date) VALUES
('788768646','02','2019','2019-07-09'),
('788768646','02','2019','2019-08-05'),
('788768646','02','2019','2019-08-18'),
('788768646','02','2019','2019-09-13'),
('788768646','02','2019','2019-09-15'),
('788768646','02','2019','2019-09-19'),
('788768646','02','2019','2019-09-25'),
('788768646','02','2019','2019-10-14'),
('788768646','02','2019','2019-10-21'),
('788768646','02','2019','2019-10-24'),
('788768646','02','2019','2019-10-27'),
('788768646','02','2019','2019-10-28'),
('788768646','02','2019','2019-11-03'),
('788768646','02','2019','2019-11-06'),
('788768646','02','2019','2019-11-18'),
('788768646','02','2019','2019-12-11')
SELECT y.Subscriber_ID,
y.Member_Seq,
y.Measurement_Year,
y.Visit_Date,
y.Prior_Visit_Date,
y.Days_Since_Last_Visit,
CASE
WHEN Days_Since_Last_Visit >= 8 OR Days_Since_Last_Visit IS NULL THEN
'Y'
ELSE
NULL
END Include_Visit,
CASE
WHEN Days_Since_Last_Visit >= 8 OR Days_Since_Last_Visit IS NULL THEN
NULL
ELSE
SUM (CASE
WHEN Days_Since_Last_Visit >= 8 OR Days_Since_Last_Visit IS NULL THEN
NULL
ELSE
y.Days_Since_Last_Visit
END
) OVER (PARTITION BY y.Subscriber_ID, y.Member_Seq,y.Measurement_Year ORDER BY y.Visit_Date)
END Running_Total
FROM (
SELECT x.Subscriber_ID,
x.Member_Seq,
x.Measurement_Year,
x.Visit_Date,
LAG(Visit_Date) OVER (
PARTITION BY x.Subscriber_ID, x.Member_Seq, x.Measurement_Year
ORDER BY x.Visit_Date) Prior_Visit_Date,
DATEDIFF(DAY,
LAG(Visit_Date) OVER (
PARTITION BY x.Subscriber_ID, x.Member_Seq, x.Measurement_Year
ORDER BY x.Visit_Date),
x.Visit_Date) Days_Since_Last_Visit
FROM #Example x
) y
This was challenging. I added some additional test records to make sure it handled a long string of visits close together over several periods.
WITH Example as (
SELECT Subscriber_ID, Member_Seq, Measurement_Year, CAST(Visit_Date as datetime) as [Visit_Date]
FROM (
VALUES
('788768646','02','2019','2019-06-01'),
('788768646','02','2019','2019-06-09'),
('788768646','02','2019','2019-07-09'),
('788768646','02','2019','2019-08-05'),
('788768646','02','2019','2019-08-18'),
('788768646','02','2019','2019-09-13'),
('788768646','02','2019','2019-09-15'),
('788768646','02','2019','2019-09-19'),
('788768646','02','2019','2019-09-25'),
('788768646','02','2019','2019-10-14'),
('788768646','02','2019','2019-10-21'),
('788768646','02','2019','2019-10-24'),
('788768646','02','2019','2019-10-27'),
('788768646','02','2019','2019-10-28'),
('788768646','02','2019','2019-11-03'),
('788768646','02','2019','2019-11-06'),
('788768646','02','2019','2019-11-18'),
('788768646','02','2019','2019-12-11'),
('788768646','02','2020','2020-01-01'),
('788768646','02','2020','2020-01-08'),
('788768646','02','2020','2020-01-09'),
('788768646','02','2020','2020-01-16'),
('788768646','02','2020','2020-01-17'),
('788768646','02','2020','2020-01-24'),
('788768646','02','2020','2020-01-25'),
('788768699','02','2019','2019-06-06'),
('788768699','02','2019','2019-06-07'),
('788768699','02','2019','2019-07-17'),
('788768699','02','2019','2019-08-23')
) t (Subscriber_ID, Member_Seq, Measurement_Year, Visit_Date)
), AllVisits as (
SELECT *,
ROW_NUMBER() OVER(PARTITION BY Subscriber_ID ORDER BY Visit_Date) as [VisitSeq]
FROM Example
), Report as (
SELECT *, Visit_Date as [PeriodStart]
FROM AllVisits
WHERE VisitSeq = 1
UNION ALL
SELECT a.*,
-- if not in the prior period, start a new 8 day period
CASE WHEN a.Visit_Date > r.PeriodStart + 7 THEN a.Visit_Date ELSE r.PeriodStart END
FROM AllVisits a
INNER JOIN Report r
ON r.Subscriber_ID = a.Subscriber_ID AND r.[VisitSeq] + 1 = a.[VisitSeq]
)
select *,
PeriodStart + 7 AS [PeriodEnd],
CASE WHEN Visit_Date = PeriodStart THEN 1 ELSE 0 END as [IsFirstDayOfPeriod]
from Report
where Visit_Date = PeriodStart
order by Subscriber_ID, Visit_Date
The query below gets most of the desired records, but not all of them. (This is where it misses a long string of close visits.) I wanted to start here, but I could not use a subquery or a GROUP BY in the recursive part of the CTE. I would have to do something like the above, but the anchor would need both the sequence start and finish. Then the recursion for each anchor record would stay within its range. I might try it someday.
SELECT *
FROM Example e
WHERE NOT EXISTS( -- those with no prior within 8 days
SELECT *
FROM Example x
WHERE x.Subscriber_ID = e.Subscriber_ID
AND x.Visit_Date < e.Visit_Date -- prior
AND x.Visit_Date > e.Visit_Date - 8 -- within 8 days
)
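To cross-check the output of either query, the rule itself can be sketched procedurally. The following is a minimal Python sketch for a single member's visits, not a replacement for the SQL; the function name `eligible_visits` and its `window_days` parameter are assumptions made for this illustration, and the dates are the 16 visits from the question's example.

```python
from datetime import date, timedelta


def eligible_visits(visit_dates, window_days=8):
    """Keep the first visit of each window and skip later visits inside it."""
    kept = []
    period_start = None
    for d in sorted(visit_dates):
        # A visit opens a new window once it is at least `window_days`
        # after the start of the current window (Jan 1 -> next on/after Jan 9).
        if period_start is None or d >= period_start + timedelta(days=window_days):
            period_start = d
            kept.append(d)
    return kept


EXAMPLE = [
    date(2019, 7, 9), date(2019, 8, 5), date(2019, 8, 18), date(2019, 9, 13),
    date(2019, 9, 15), date(2019, 9, 19), date(2019, 9, 25), date(2019, 10, 14),
    date(2019, 10, 21), date(2019, 10, 24), date(2019, 10, 27), date(2019, 10, 28),
    date(2019, 11, 3), date(2019, 11, 6), date(2019, 11, 18), date(2019, 12, 11),
]

kept = eligible_visits(EXAMPLE)
print(len(kept))  # 10
```

The visit on 2019-09-25 shows why the NOT EXISTS query falls short: it has a prior visit within 8 days (2019-09-19), yet it still counts, because that prior visit was itself skipped as part of the window that started on 2019-09-13.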

See the distribution of secondary requests grouped by time interval in sql

I have the following table:
RequestId,Type, Date, ParentRequestId
1 1 2020-10-15 null
2 2 2020-10-19 1
3 1 2020-10-20 null
4 2 2020-11-15 3
For this example I am only interested in request types 1 and 2, to keep things simple. My task is to query a big database and see the distribution of the secondary requests based on the difference in dates from the parent request. So the result would look like:
Interval,Percentage
0-7 days,50 %
8-15 days,0 %
16-50 days, 50 %
So the first line of the expected result covers the request with id 2, and the third line covers the request with id 4, because their date differences fall into those intervals.
How to achieve this?
I'm using sql server 2014.
We like to see your attempts, but by the looks of it, it seems like you're going to need to treat this table as 2 tables and do a basic GROUP BY, but make it fancy by grouping on a CASE statement.
WITH dateDiffs as (
/* perform our date calculations first, to get that out of the way */
SELECT
DATEDIFF(Day, parent.[Date], child.[Date]) as daysDiff,
1 as rowsFound
FROM (SELECT RequestID, [Date] FROM myTable WHERE Type = 1) parent
INNER JOIN (SELECT ParentRequestID, [Date] FROM myTable WHERE Type = 2) child
ON parent.requestID = child.parentRequestID
)
/* Now group and aggregate and enjoy your maths! */
SELECT
case when daysDiff between 0 and 7 then '0-7'
when daysDiff between 8 and 15 then '8-15'
when daysDiff between 16 and 50 THEN '16-50'
else '50+'
end as myInterval,
sum(rowsFound) as totalFound,
(select sum(rowsFound) from dateDiffs) as totalRows,
1.0 * sum(rowsFound) / (select sum(rowsFound) from dateDiffs) * 100.00 as percentFound
FROM dateDiffs
GROUP BY
case when daysDiff between 0 and 7 then '0-7'
when daysDiff between 8 and 15 then '8-15'
when daysDiff between 16 and 50 THEN '16-50'
else '50+'
end;
This seems like basically a join and group by query:
with dates as (
select 0 as lo, 7 as hi, '0-7 days' as grp union all
select 8 as lo, 15 as hi, '8-15 days' union all
select 16 as lo, 50 as hi, '16-50 days'
)
select d.grp,
count(t.RequestId) as cnt,
count(t.RequestId) * 1.0 / sum(count(t.RequestId)) over () as ratio
from dates d left join
(t join
t tp
on tp.RequestId = t.ParentRequestId
)
on datediff(day, tp.date, t.date) between d.lo and d.hi
group by d.grp
order by min(d.lo);
The only trick is generating all the date groups, so you have rows with zero values.
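For readers who want to try the bucket idea on the question's four sample rows, here is a minimal sketch using Python's built-in `sqlite3` module. SQLite stands in for SQL Server, so `julianday()` replaces `DATEDIFF`; the table name `requests`, the column name `ReqDate`, and the helper `interval_distribution` are assumptions made for this illustration.

```python
import sqlite3

# Sample rows from the question: (RequestId, Type, ReqDate, ParentRequestId)
ROWS = [
    (1, 1, "2020-10-15", None),
    (2, 2, "2020-10-19", 1),
    (3, 1, "2020-10-20", None),
    (4, 2, "2020-11-15", 3),
]

QUERY = """
WITH buckets(lo, hi, label) AS (
    VALUES (0, 7, '0-7 days'), (8, 15, '8-15 days'), (16, 50, '16-50 days')
),
diffs AS (
    -- whole days between each child request and its parent
    SELECT CAST(ROUND(julianday(c.ReqDate) - julianday(p.ReqDate)) AS INTEGER) AS days
    FROM requests c
    JOIN requests p ON p.RequestId = c.ParentRequestId
)
SELECT b.label,
       COUNT(d.days) AS cnt,
       100.0 * COUNT(d.days) / (SELECT COUNT(*) FROM diffs) AS pct
FROM buckets b
LEFT JOIN diffs d ON d.days BETWEEN b.lo AND b.hi
GROUP BY b.label, b.lo
ORDER BY b.lo
"""


def interval_distribution(rows):
    """Return (label, count, percentage) per bucket, including empty buckets."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE requests (RequestId INTEGER, Type INTEGER, ReqDate TEXT, ParentRequestId INTEGER)"
    )
    conn.executemany("INSERT INTO requests VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for label, cnt, pct in interval_distribution(ROWS):
    print(label, cnt, pct)
```

Counting `d.days` rather than `*` is what keeps an empty bucket at 0: the LEFT JOIN still produces one row for it, but that row carries a NULL.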

SQL Query: Calculating the deltas in a time series

For a development aid project I am helping a small town in Nicaragua improve their water-network administration.
There are about 150 households, and every month a person reads the meter and charges the household according to the water consumed (this month's reading minus last month's). Today everything is done on paper, and I would like to digitize the administration to avoid calculation errors.
I have an MS Access Table in mind - e.g.:
*HousholdID* *Date* *Meter*
0 1/1/2013 100
1 1/1/2013 130
0 1/2/2013 120
1 1/2/2013 140
...
From this data I would like to create a query that calculates the consumed water (the meter-difference of one household between two months)
*HouseholdID* *Date* *Consumption*
0 1/2/2013 20
1 1/2/2013 10
...
Please, how would I approach this problem?
This query returns every date with previous date, even if there are missing months:
SELECT TabPrev.*, Tab.Meter as PrevMeter, TabPrev.Meter-Tab.Meter as Diff
FROM (
SELECT
Tab.HousholdID,
Tab.Data,
Max(Tab_1.Data) AS PrevData,
Tab.Meter
FROM
Tab INNER JOIN Tab AS Tab_1 ON Tab.HousholdID = Tab_1.HousholdID
AND Tab.Data > Tab_1.Data
GROUP BY Tab.HousholdID, Tab.Data, Tab.Meter) As TabPrev
INNER JOIN Tab
ON TabPrev.HousholdID = Tab.HousholdID
AND TabPrev.PrevData=Tab.Data
Here's the result:
HousholdID Data PrevData Meter PrevMeter Diff
----------------------------------------------------------
0 01/02/2013 01/01/2013 120 100 20
1 01/02/2013 01/01/2013 140 130 10
The query above will return every delta, for every household, for every month (or for every interval). If you are only interested in the last delta, you could use this query:
SELECT
MaxTab.*,
TabCurr.Meter as CurrMeter,
TabPrev.Meter as PrevMeter,
TabCurr.Meter-TabPrev.Meter as Diff
FROM ((
SELECT
Tab.HousholdID,
Max(Tab.Data) AS CurrData,
Max(Tab_1.Data) AS PrevData
FROM
Tab INNER JOIN Tab AS Tab_1
ON Tab.HousholdID = Tab_1.HousholdID
AND Tab.Data > Tab_1.Data
GROUP BY Tab.HousholdID) As MaxTab
INNER JOIN Tab TabPrev
ON TabPrev.HousholdID = MaxTab.HousholdID
AND TabPrev.Data=MaxTab.PrevData)
INNER JOIN Tab TabCurr
ON TabCurr.HousholdID = MaxTab.HousholdID
AND TabCurr.Data=MaxTab.CurrData
and (depending on what you are after) you could only filter current month:
WHERE
DateSerial(Year(CurrData), Month(CurrData), 1)=
DateSerial(Year(DATE()), Month(DATE()), 1)
this way if you miss a check for a particular household, it won't show.
Or you might be interested in showing last month present in the table (which can be different than current month):
WHERE
DateSerial(Year(CurrData), Month(CurrData), 1)=
(SELECT MAX(DateSerial(Year(Data), Month(Data), 1))
FROM Tab)
(here I am taking in consideration the fact that checks might be on different days)
I think the best approach is to use a correlated subquery to get the previous date and join back to the original table. This ensures that you get the previous record, even if there is more or less than a 1 month lag.
So the right query looks like:
select t.*, tprev.meter as prevmeter
from (select t.*,
(select top 1 t2.[date] from t t2 where t2.HousholdID = t.HousholdID and t2.[date] < t.[date] order by t2.[date] desc
) as prevDate
from t
) t inner join
t tprev
on tprev.HousholdID = t.HousholdID and tprev.[date] = t.prevDate
In an environment such as the one you describe, it is very important not to make assumptions about the frequency of reading the meter. Although they may be read on average once per month, there will always be exceptions.
Testing with the following data:
HousholdID Date Meter
0 01/12/2012 100
1 01/12/2012 130
0 01/01/2013 120
1 01/01/2013 140
0 01/02/2013 120
1 01/02/2013 140
The following query:
SELECT a.housholdid,
a.date,
b.date,
a.meter,
b.meter,
a.meter - b.meter AS Consumption
FROM (SELECT *
FROM water
WHERE Month([date]) = Month(Date())
AND Year([date])=year(Date())) a
LEFT JOIN (SELECT *
FROM water
WHERE DateSerial(Year([date]),Month([date]),Day([date]))
=DateSerial(Year(Date()),Month(Date())-1,Day([date])) ) b
ON a.housholdid = b.housholdid
The above query selects the records for this month (Month([date]) = Month(Date())) and compares them to the records for last month (Month([date]) = Month(Date()) - 1).
Please do not use Date as a field name.
Returns the following result.
housholdid a.date b.date a.meter b.meter Consumption
0 01/02/2013 01/01/2013 120 100 20
1 01/02/2013 01/01/2013 140 130 10
Try
select t.householdID
, max(s.theDate) as billingMonth
, max(s.meter)-max(t.meter) as waterUsed
from myTbl t join (
select householdID, max(theDate) as theDate, max(meter) as meter
from myTbl
group by householdID ) s
on t.householdID = s.householdID and t.theDate <> s.theDate
group by t.householdID
This works in standard SQL; I'm not sure about Access.
You can use the LAG() function in certain SQL dialects. I found this to be much faster and easier to read than joins.
Source: http://blog.jooq.org/2015/05/12/use-this-neat-window-function-trick-to-calculate-time-differences-in-a-time-series/
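A minimal sketch of the LAG() approach on the question's sample readings, run through Python's built-in `sqlite3` module. This assumes a SQLite build with window-function support (3.25 or newer); as far as I know, MS Access SQL does not offer LAG(), so this only applies if the data lives in an engine that does. The table and column names (`readings`, `household_id`, `reading_date`) and the helper `consumption_per_reading` are made up for this illustration.

```python
import sqlite3

# Sample readings from the question, dates rewritten as ISO strings
READINGS = [
    (0, "2013-01-01", 100),
    (1, "2013-01-01", 130),
    (0, "2013-02-01", 120),
    (1, "2013-02-01", 140),
]

QUERY = """
SELECT household_id, reading_date, consumption
FROM (
    SELECT household_id,
           reading_date,
           -- current reading minus the previous reading of the same household
           meter - LAG(meter) OVER (
               PARTITION BY household_id ORDER BY reading_date
           ) AS consumption
    FROM readings
)
WHERE consumption IS NOT NULL      -- the first reading has no predecessor
ORDER BY household_id, reading_date
"""


def consumption_per_reading(readings):
    """Return (household_id, reading_date, consumption) for every non-first reading."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE readings (household_id INTEGER, reading_date TEXT, meter INTEGER)"
    )
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", readings)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


print(consumption_per_reading(READINGS))
# [(0, '2013-02-01', 20), (1, '2013-02-01', 10)]
```

Because LAG() simply takes the previous row per household, it keeps working when a month is skipped or a meter is read twice in one month.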

Query Based on Current Status of a Sub Item

Trying to write a query to display the current status of services in a company department. What I have written is just not working. I feel like I am going in the wrong direction.
SELECT MAX(v_StatusEvents.EventTimeStamp) as EventTimeStamp
, MAX(v_StatusEvents.StatusTypeID) as StatusTypeID
, v_StatusEvents.ServiceID
, v_StatusEvents.StatusTypeDescription
, v_StatusEvents.ServiceName
, v_StatusEvents.CategoryName
FROM v_StatusEvents
WHERE v_StatusEvents.CategoryID = 100
AND YEAR(v_StatusEvents.EventTimeStamp) = YEAR(getdate())
AND MONTH(v_StatusEvents.EventTimeStamp) = MONTH(getdate())
AND DAY(v_StatusEvents.EventTimeStamp) = DAY(getdate())
GROUP BY v_StatusEvents.ServiceID
, v_StatusEvents.StatusTypeDescription
, v_StatusEvents.ServiceName
, v_StatusEvents.CategoryName
I have three CATEGORIES: (100 - Internet, 101 - Applications, and 102 - Network).
Each CATEGORY contains SERVICES.
As an example, I have three SERVICES that belong to the CATEGORY Internet: (50 - Internal, 51 - External, 52 - Development).
Each SERVICE will always have at least one status record for the current date.
The CURRENT STATUS will be set to one of three different STATUS TYPES values: 1 = no issue, 2 = disruption, 3 = critical.
I want to show the highest STATUS TYPE for each category for today.
Here is a sample record set for today's date.
SeID CatID EventTimeStamp SvcID StatTypeID
201 100 11/11/2012 12:01am 52 1
202 100 11/11/2012 12:01am 51 1
203 100 11/11/2012 12:01am 50 1
204 100 11/11/2012 08:00am 51 3
205 100 11/11/2012 10:50am 50 2
206 100 11/11/2012 11:00am 50 1
207 100 11/11/2012 11:25am 52 2
As you can see, there was a disruption with the Internal web site at 10:50am, but it was resolved at 11:00am.
There is an ongoing critical issue with the External web site that has not yet been resolved. I would like the query to return the value 3, because this is the highest CURRENT STATUS among the services whose issues have not been resolved.
(If all services had "no issue", I would expect the query to return the value 1)
Thanks,
crjunk
This is where I find CTEs (Common Table Expressions) useful. They allow you to break the problem apart into steps that you can easily solve. Let's apply that here.
First, get the max status for each service/day:
SELECT CatID, SvcID, MAX(StatTypeID) As MaxStatus
FROM v_StatusEvents
WHERE EventTimeStamp >= cast(cast(current_timestamp as Date) as datetime)
GROUP BY CatID, SvcID
Now that we have this information, we can find the most recent time today that each of these events occurred:
WITH StatusInfo As
(
SELECT CatID, SvcID, MAX(StatTypeID) As MaxStatus
FROM v_StatusEvents
WHERE EventTimeStamp >= cast(cast(current_timestamp as Date) as datetime)
GROUP BY CatID, SvcID
)
SELECT se.CatID, se.SvcID, se.StatTypeID, MAX(EventTimeStamp) As EventTimeStamp
FROM v_StatusEvents se
INNER JOIN StatusInfo si ON se.CatID = si.CatID AND se.SvcID = si.SvcID AND se.StatTypeID = si.MaxStatus
WHERE se.EventTimeStamp >= cast(cast(current_timestamp as Date) as datetime)
GROUP BY se.CatID, se.SvcID, se.StatTypeID
You might choose instead to use the sequence ID to narrow it down here, in case you could have two events with the same timestamp for a service. Now that we have this information, we can go back to the table one more time to pick up any other fields we might want (in this case, sequence ID):
WITH StatusInfo As
(
SELECT CatID, SvcID, MAX(StatTypeID) As MaxStatus
FROM v_StatusEvents
WHERE EventTimeStamp >= cast(cast(current_timestamp as Date) as datetime)
GROUP BY CatID, SvcID
), StatusAndTimeInfo As
(
SELECT se.CatID, se.SvcID, se.StatTypeID, MAX(EventTimeStamp) As EventTimeStamp
FROM v_StatusEvents se
INNER JOIN StatusInfo si ON se.CatID = si.CatID AND se.SvcID = si.SvcID AND se.StatTypeID = si.MaxStatus
WHERE se.EventTimeStamp >= cast(cast(current_timestamp as Date) as datetime)
GROUP BY se.CatID, se.SvcID, se.StatTypeID
)
SELECT se.*
FROM v_StatusEvents se
INNER JOIN StatusAndTimeInfo sati ON se.CatID = sati.CatID AND se.SvcID = sati.SvcID AND se.StatTypeID = sati.StatTypeID AND se.EventTimeStamp = sati.EventTimeStamp
WHERE se.EventTimeStamp >= cast(cast(current_timestamp as Date) as datetime)
Note again that you might prefer to use the SeID (which I presume is a sequence ID) on this last iteration rather than the timestamp. Note also that this is NOT the only way to solve this problem, or even likely the fastest. In fact, it would be possible to rewrite this using only subqueries or joins. But this is an easy method you can use to get something that works and can be easily understood later.
I ended up with the following solution:
SELECT (IsNull(MAX(tblStatusTypes.StatusTypeImgURL), (SELECT tblStatusTypes.StatusTypeImgURL FROM tblStatusTypes WHERE tblStatusTypes.StatusTypeID = 1)))
FROM tblStatusTypes
WHERE tblStatusTypes.StatusTypeID in (
SELECT MAX(StatusTypeID)
FROM
( SELECT vse.StatusTypeID, vse.ServiceID, vse.EventTimeStamp
FROM v_StatusEvents vse,
(SELECT MAX(EventTimeStamp)AS MaxDate
,ServiceID
FROM v_StatusEvents
WHERE EventTimeStamp >= Cast(Cast(CURRENT_TIMESTAMP AS DATE) AS DATETIME)
AND CategoryID = @CategoryID
GROUP BY ServiceID) MaxResults
WHERE vse.ServiceID = MaxResults.ServiceID
AND vse.EventTimeStamp = MaxResults.MaxDate
) MaxStatusType )
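The "latest event per service, then the highest status among those" idea can be tried on the question's sample rows with the sketch below, which uses Python's built-in `sqlite3` module. SQLite is only a stand-in for SQL Server, the "today" filter is left out for brevity, and the table name `status_events` and helper `highest_current_status` are assumptions made for this illustration.

```python
import sqlite3

# Sample rows from the question: (SeID, CatID, EventTimeStamp, SvcID, StatTypeID)
EVENTS = [
    (201, 100, "2012-11-11 00:01", 52, 1),
    (202, 100, "2012-11-11 00:01", 51, 1),
    (203, 100, "2012-11-11 00:01", 50, 1),
    (204, 100, "2012-11-11 08:00", 51, 3),
    (205, 100, "2012-11-11 10:50", 50, 2),
    (206, 100, "2012-11-11 11:00", 50, 1),
    (207, 100, "2012-11-11 11:25", 52, 2),
]

QUERY = """
SELECT MAX(e.StatTypeID)
FROM status_events e
WHERE e.CatID = ?
  AND NOT EXISTS (   -- keep only the most recent event of each service
        SELECT 1 FROM status_events later
        WHERE later.CatID = e.CatID
          AND later.SvcID = e.SvcID
          AND later.EventTimeStamp > e.EventTimeStamp)
"""


def highest_current_status(events, category_id):
    """Return the highest status among each service's latest event."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE status_events "
        "(SeID INTEGER, CatID INTEGER, EventTimeStamp TEXT, SvcID INTEGER, StatTypeID INTEGER)"
    )
    conn.executemany("INSERT INTO status_events VALUES (?, ?, ?, ?, ?)", events)
    (status,) = conn.execute(QUERY, (category_id,)).fetchone()
    conn.close()
    return status


print(highest_current_status(EVENTS, 100))  # 3 (External is still critical)

# Hypothetical extra row: the External issue is resolved at noon.
resolved = EVENTS + [(208, 100, "2012-11-11 12:00", 51, 1)]
print(highest_current_status(resolved, 100))  # 2 (Development disruption remains)
```

Taking the latest event first matters: a plain MAX(StatTypeID) over the whole day would still report 2 for the Internal service, even though its disruption was resolved at 11:00am.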

Count records with a criteria like "within days"

I have a table as below on sql.
OrderID Account OrderMethod OrderDate DispatchDate DispatchMethod
2145 qaz 14 20/3/2011 23/3/2011 2
4156 aby 12 15/6/2011 25/6/2011 1
I want to count all records that have reordered 'within 30 days' of dispatch date where Dispatch Method is '2' and OrderMethod is '12' and it has come from the same Account.
I want to ask whether this can all be achieved with one query, or whether I need to create different tables and do it in stages, as I think I will have to do now. Could someone please help with a query?
Many thanks
T
Try the following, replacing [tablename] with the name of your table.
SELECT Count(OriginalOrders.OrderID) AS [Total_Orders]
FROM [tablename] AS OriginalOrders
INNER JOIN [tablename] AS Reorders
ON OriginalOrders.Account = Reorders.Account
AND OriginalOrders.OrderDate < Reorders.OrderDate
AND DATEDIFF(day, OriginalOrders.DispatchDate, Reorders.OrderDate) <= 30
AND Reorders.DispatchMethod = '2'
AND Reorders.OrderMethod = '12';
By using an inner join you'll be sure to only grab orders that meet all the criteria.
By linking the two tables (which are essentially the same table with itself using aliases) you make sure only orders under the same account are counted.
The results from the join are further filtered based on the criteria you mentioned requiring only orders that have been placed within 30 days of the dispatch date of a previous order.
Totally possible with one query, though my SQL is a little stale..
select count(*) from table
where DispatchMethod = 2
AND OrderMethod = 12
AND DATEDIFF(day, OrderDate, DispatchDate) <= 30;
(Untested, but it's something similar)
One query can do it.
SELECT COUNT(*) FROM myTable reOrder
INNER JOIN myTable originalOrder
ON reOrder.Account = originalOrder.Account
AND reOrder.OrderID <> originalOrder.OrderID
-- all re-orders that are within 30 days of the
-- original order's dispatch date
AND DATEDIFF(d, originalOrder.DispatchDate, reOrder.OrderDate) <= 30
WHERE reOrder.DispatchMethod = 2
AND reOrder.OrderMethod = 12
You need a self-join.
The query below assumes that a given account will have either 1 or 2 records in the table - 2 if they've reordered, else 1.
If 3 records exist for a given account, 2 orders + 1 reorder then this won't work - but we'd then need more information on how to distinguish between an order and a reorder.
SELECT COUNT(*) FROM myTable new, myTable prev
WHERE new.DispatchMethod = 2
AND new.OrderMethod = 12
AND DATEDIFF(day, prev.DispatchDate, new.OrderDate) <=30
AND prev.Account = new.Account
AND prev.OrderDate < new.OrderDate
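The self-join can be tried out with the sketch below, which uses Python's built-in `sqlite3` module. SQLite stands in for the real database, so `julianday()` replaces `DATEDIFF`. Rows 2145 and 4156 come from the question (dates rewritten as ISO strings); the other two rows, the table name `orders`, and the helper `count_reorders` are hypothetical. The sketch also assumes that the method filters apply to the reorder, which the question leaves open.

```python
import sqlite3

# (OrderID, Account, OrderMethod, OrderDate, DispatchDate, DispatchMethod)
ORDERS = [
    (2145, "qaz", 14, "2011-03-20", "2011-03-23", 2),  # from the question
    (2200, "qaz", 12, "2011-04-10", "2011-04-12", 2),  # hypothetical reorder, 18 days after dispatch
    (4156, "aby", 12, "2011-06-15", "2011-06-25", 1),  # from the question
    (4300, "aby", 12, "2011-09-01", "2011-09-03", 2),  # hypothetical order, 68 days after dispatch
]

QUERY = """
SELECT COUNT(DISTINCT r.OrderID)
FROM orders r                       -- candidate reorder
JOIN orders o                       -- earlier order of the same account
  ON o.Account = r.Account
 AND o.OrderID <> r.OrderID
 AND julianday(r.OrderDate) - julianday(o.DispatchDate) BETWEEN 0 AND 30
WHERE r.DispatchMethod = 2
  AND r.OrderMethod = 12
"""


def count_reorders(orders):
    """Count orders placed within 30 days after another order's dispatch date."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (OrderID INTEGER, Account TEXT, OrderMethod INTEGER, "
        "OrderDate TEXT, DispatchDate TEXT, DispatchMethod INTEGER)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?, ?)", orders)
    (count,) = conn.execute(QUERY).fetchone()
    conn.close()
    return count


print(count_reorders(ORDERS))  # 1 (only order 2200 qualifies)
```

COUNT(DISTINCT r.OrderID) keeps a reorder from being counted twice when the account has several earlier orders dispatched within the previous 30 days.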
Can we use GROUP BY in this case, such as the following?
SELECT COUNT(Account)
FROM myTable
WHERE DispatchMethod = 2 AND OrderMethod = 12
AND DATEDIFF(d, DispatchDate, OrderDate) <=30
GROUP BY Account
Will the above work or am I missing something here?