How to find N consecutive records in a table using SQL (SQL Server 2005)

I have the following table definition with sample data. In this table, Customer, Product, and Date are key fields.
Table One
Customer Product Date SALE
X A 01/01/2010 YES
X A 02/01/2010 YES
X A 03/01/2010 NO
X A 04/01/2010 NO
X A 05/01/2010 YES
X A 06/01/2010 NO
X A 07/01/2010 NO
X A 08/01/2010 NO
X A 09/01/2010 YES
X A 10/01/2010 YES
X A 11/01/2010 NO
X A 12/01/2010 YES
In the table above, I need to find runs of N or more consecutive records where there was no sale (SALE = 'NO').
For example, if N is 2, the result set would be:
Customer Product Date SALE
X A 03/01/2010 NO
X A 04/01/2010 NO
X A 06/01/2010 NO
X A 07/01/2010 NO
X A 08/01/2010 NO
Can someone help me with a SQL query to get the desired results? I am using SQL Server 2005. I started experimenting with ROW_NUMBER() and PARTITION BY clauses, but had no luck.
Thanks for any help

You need to match your table against itself, as if there were two tables. So you use two aliases, o1 and o2, to refer to your table:
SELECT DISTINCT o1.customer, o1.product, o1.datum, o1.sale
FROM one o1
JOIN one o2
ON o2.customer = o1.customer
AND o2.product = o1.product
AND (o1.datum = o2.datum - 1 OR o1.datum = o2.datum + 1)
WHERE o1.sale = 'NO'
AND o2.sale = 'NO';
customer | product | datum | sale
----------+---------+------------+------
X | A | 2010-01-03 | NO
X | A | 2010-01-04 | NO
X | A | 2010-01-06 | NO
X | A | 2010-01-07 | NO
X | A | 2010-01-08 | NO
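Note that I ran the query on a PostgreSQL database, so the syntax may differ on SQL Server. Table aliases work the same way there (with or without AS), but if adding or subtracting an integer from a date is not accepted, DATEADD(day, 1, datum) does the same thing. Also note that the self-join only checks the immediate neighbours of each row, so it answers the N = 2 case.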

A different approach, inspired by munch's last line.
For a given date, get the first date with YES after it and the last date with YES before it. These form the boundaries that our dates have to fit between.
SELECT (o1.datum),
MAX (o3.datum) - MIN (o2.datum) AS diff
FROM one o1, one o2, one o3
WHERE o1.sale = 'NO'
AND o3.datum <
(SELECT MIN (datum)
FROM one
WHERE datum >= o1.datum
AND SALE = 'YES')
AND o2.datum >
(SELECT MAX (datum)
FROM one
WHERE datum <= o1.datum
AND SALE = 'YES')
GROUP BY o1.datum
HAVING MAX (o3.datum) - MIN (o2.datum) >= 2
ORDER BY o1.datum;
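It may need some optimization, because table one appears five times in the query. :)
Note that diff is the distance between the first and the last NO date of a run, so a run of N days has diff = N - 1; HAVING ... >= 2 therefore returns runs of three or more days.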

Thanks to everyone for posting your solutions. I thought I would also share mine. Just as an FYI, I received this solution from another SQL Server Central forum member; I am definitely not taking credit for it.
DECLARE @CNT INT
SELECT @CNT = 3 -- minimum run length (N)
SELECT * FROM
(
SELECT
[Customer], [Product], [Date], [Sale], groupID,
-- number of rows in each run of identical Sale values
COUNT(*) OVER (PARTITION BY [Customer], [Product], [Sale], groupID) AS groupCnt
FROM
(
SELECT
[Customer], [Product], [Date], [Sale],
-- the difference of the two row numbers stays constant within a run of identical Sale values
ROW_NUMBER() OVER (PARTITION BY [Customer], [Product] ORDER BY [Date])
- ROW_NUMBER() OVER (PARTITION BY [Customer], [Product], [Sale] ORDER BY [Date]) AS groupID
FROM
[TableSales]
) T1
) T2
WHERE
T2.[Sale] = 'NO' AND T2.[groupCnt] >= @CNT
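The row-number difference (often called the "gaps and islands" technique) can be checked on the question's sample data. The sketch below runs the same query against an in-memory SQLite database through Python's standard sqlite3 module, assuming SQLite 3.25 or later for window functions. The table name TableSales comes from the answer; reading the question's dates as month/day/year, storing them as ISO strings, renaming Date to SaleDate, and passing the threshold as :n instead of @CNT are illustrative assumptions.

```python
import sqlite3

# Sample rows from the question, reading 01/01/2010 etc. as month/day/year
# and storing the dates as ISO strings so they sort correctly.
ROWS = [
    ("X", "A", "2010-01-01", "YES"), ("X", "A", "2010-02-01", "YES"),
    ("X", "A", "2010-03-01", "NO"), ("X", "A", "2010-04-01", "NO"),
    ("X", "A", "2010-05-01", "YES"), ("X", "A", "2010-06-01", "NO"),
    ("X", "A", "2010-07-01", "NO"), ("X", "A", "2010-08-01", "NO"),
    ("X", "A", "2010-09-01", "YES"), ("X", "A", "2010-10-01", "YES"),
    ("X", "A", "2010-11-01", "NO"), ("X", "A", "2010-12-01", "YES"),
]

QUERY = """
SELECT Customer, Product, SaleDate, Sale
FROM (
    SELECT Customer, Product, SaleDate, Sale,
           -- size of each run of identical Sale values
           COUNT(*) OVER (PARTITION BY Customer, Product, Sale, groupID) AS groupCnt
    FROM (
        SELECT Customer, Product, SaleDate, Sale,
               -- constant within a run of identical Sale values
               ROW_NUMBER() OVER (PARTITION BY Customer, Product ORDER BY SaleDate)
             - ROW_NUMBER() OVER (PARTITION BY Customer, Product, Sale ORDER BY SaleDate) AS groupID
        FROM TableSales
    ) AS T1
) AS T2
WHERE Sale = 'NO' AND groupCnt >= :n
ORDER BY Customer, Product, SaleDate
"""

def runs_of_no_sales(n):
    """Return the rows that belong to a run of at least n consecutive 'NO' sales."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TableSales (Customer TEXT, Product TEXT, SaleDate TEXT, Sale TEXT)")
    conn.executemany("INSERT INTO TableSales VALUES (?, ?, ?, ?)", ROWS)
    result = conn.execute(QUERY, {"n": n}).fetchall()
    conn.close()
    return result

for row in runs_of_no_sales(2):
    print(row)
```

With n = 2 it returns the March/April and June to August rows, matching the result set in the question; raising n to 3 leaves only the June to August run.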

OK, we need an answer that works for any N. We search for a date with sale = 'NO' that is followed by N dates which also have sale = 'NO'.
SELECT d1.datum
FROM one d1, one d2, i
WHERE d1.sale = 'NO' AND d2.sale = 'NO'
AND d1.datum = (d2.datum - i)
AND i > 0 AND i < 4
GROUP BY d1.datum
HAVING COUNT (*) = 3;
This gives us the start date, which we then use in a subquery.
Notes:
I used datum instead of date because date is a keyword in PostgreSQL.
In Oracle you can generate a sequence from the one-row DUAL table, for example SELECT LEVEL FROM dual CONNECT BY LEVEL <= 3 returns 1, 2, 3. Depending on the vendor, there might be other tricks to get a sequence from 1 to N. I created a table i with a column i and filled it with the values 1 to 100, assuming N will not exceed 100. PostgreSQL also provides the function generate_series(start, stop), which would solve the problem too, and your database may have something similar. A plain table i, however, should work with any vendor.
If N is 17, you have to change 3 to 17 in three places.
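As a vendor-neutral alternative to the helper table i, a recursive common table expression can generate 1 to N on the fly: PostgreSQL and SQLite write it as WITH RECURSIVE, while SQL Server 2005 and later accept it as a plain WITH (with a default recursion limit of 100, which OPTION (MAXRECURSION ...) can raise). A minimal sketch, run against SQLite from Python; the CTE name i mirrors the helper table, and the column n and the :upper parameter are assumptions.

```python
import sqlite3

SEQUENCE_SQL = """
WITH RECURSIVE i(n) AS (
    SELECT 1              -- anchor member: start at 1
    UNION ALL
    SELECT n + 1 FROM i   -- recursive member: add one per step
    WHERE n < :upper      -- stop once N is reached
)
SELECT n FROM i
"""

def sequence(upper):
    """Return [1, 2, ..., upper] generated by a recursive CTE."""
    conn = sqlite3.connect(":memory:")
    result = [n for (n,) in conn.execute(SEQUENCE_SQL, {"upper": upper})]
    conn.close()
    return result

print(sequence(5))  # [1, 2, 3, 4, 5]
```

The generated rows can then be joined exactly like table i, so changing N means changing one parameter instead of editing the query.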
The final query will be:
SELECT DISTINCT o4.*
FROM one o3, one o4
WHERE o3.datum IN ( -- IN rather than =: several start dates may qualify
SELECT d1.datum
FROM one d1, one d2, i
WHERE d1.sale = 'NO' AND d2.sale = 'NO'
AND d1.datum = (d2.datum - i)
AND i > 0 AND i <= 3
GROUP BY d1.datum
HAVING COUNT (*) = 3)
AND o4.datum <= o3.datum + 3
AND o4.datum >= o3.datum;
customer | product | datum | sale
----------+---------+------------+------
X | A | 2010-02-06 | NO
X | A | 2010-02-07 | NO
X | A | 2010-02-08 | NO
X | A | 2010-02-09 | NO

Related

Vertica SQL for running count distinct and running conditional count

I'm trying to build a department-level score table based on a more granular, URL-level score table.
Dates are not consecutive.
Not all URLs get score updates on the same day (they are independent of each other).
dist urls should be a running count distinct (cumulative count distinct).
dist urls and urls score >=30 are both distinct counts.
What I have now is:
Date url Store Dept Page Score
10/1 a US A X 10
10/1 b US A X 30
10/1 c US A X 60
10/4 a US A X 20
10/4 d US A X 60
10/6 b US A X 22
10/9 a US A X 40
10/9 e US A X 10
What I want:
Date Store Dept Page dist urls urls score >=30
10/1 US A X 3 2
10/4 US A X 4 3
10/6 US A X 4 2
10/9 US A X 5 2
I think dist urls can be computed with a window function; I'm just not sure how to write the query.
My current query is below, but it's wrong because it doesn't compute a cumulative distinct count:
SELECT
bm.AnalysisDate,
su.SoID AS Store,
su.DptCaID AS DTID,
su.PageTypeID AS PTID,
COUNT(DISTINCT bm.SeoURLID) AS NumURLsWithDupScore,
SUM(CASE WHEN bm.DuplicationScore > 30 THEN 1 ELSE 0 END) AS Over30Count
FROM csn_seo.tblBotifyMetrics bm
INNER JOIN csn_seo.tblSEOURLs su
ON bm.SeoURLID = su.ID
WHERE su.DptCaID IS NOT NULL
AND su.DptCaID <> 0
AND su.PageTypeID IS NOT NULL
AND su.PageTypeID <> -1
AND bm.iscompliant = 1
GROUP BY bm.AnalysisDate, su.SoID, su.DptCaID, su.PageTypeID;
Please let me know if anyone has any idea.
Based on your question, you seem to want two levels of logic:
select date, store, dept, page,
sum(sum(start)) over (partition by store, dept, page order by date) as distinct_urls,
sum(sum(start_30)) over (partition by store, dept, page order by date) as distinct_urls_30
from ((select store, dept, page, url, min(date) as date, 1 as start, 0 as start_30
from t
group by store, dept, page, url
) union all
(select store, dept, page, url, min(date) as date, 0, 1
from t
where score >= 30
group by store, dept, page, url
)
) t
group by date, store, dept, page;
I don't understand how your query is related to your question.
Try as I might, I don't get your output either:
But I think you can avoid the UNION SELECTs. Does this do what you expect?
NULLs are ignored by COUNT(DISTINCT ...), and here you can nest an aggregate expression inside an OLAP (window) one.
Vertica also supports named windows (the WINDOW clause), which improve readability.
WITH
input(Date,url,Store,Dept,Page,Score) AS (
SELECT DATE '2019-10-01','a','US','A','X',10
UNION ALL SELECT DATE '2019-10-01','b','US','A','X',30
UNION ALL SELECT DATE '2019-10-01','c','US','A','X',60
UNION ALL SELECT DATE '2019-10-04','a','US','A','X',20
UNION ALL SELECT DATE '2019-10-04','d','US','A','X',60
UNION ALL SELECT DATE '2019-10-06','b','US','A','X',22
UNION ALL SELECT DATE '2019-10-09','a','US','A','X',40
UNION ALL SELECT DATE '2019-10-09','e','US','A','X',10
)
SELECT
date
, store
, dept
, page
, SUM(COUNT(DISTINCT url) ) OVER(w) AS dist_urls
, SUM(COUNT(DISTINCT CASE WHEN score >=30 THEN url END)) OVER(w) AS dist_urls_gt_30
FROM input
GROUP BY
date
, store
, dept
, page
WINDOW w AS (PARTITION BY store,dept,page ORDER BY date)
;
-- out date | store | dept | page | dist_urls | dist_urls_gt_30
-- out ------------+-------+------+------+-----------+-----------------
-- out 2019-10-01 | US | A | X | 3 | 2
-- out 2019-10-04 | US | A | X | 5 | 3
-- out 2019-10-06 | US | A | X | 6 | 3
-- out 2019-10-09 | US | A | X | 8 | 4
-- out (4 rows)
-- out
-- out Time: First fetch (4 rows): 45.321 ms. All rows formatted: 45.364 ms
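Neither query above yields a true running distinct count on every date. The first-appearance query only produces rows for dates on which some URL appears for the first time (or first reaches 30), so 10/6 drops out; SUM(COUNT(DISTINCT url)) OVER (...) adds up each day's distinct count, so URL a is counted three times (hence 3, 5, 6, 8). One way to combine the first-appearance idea with a list of all dates is sketched below in SQLite via Python's sqlite3; the table name t, the date column d, and the 2019 dates are assumptions. It gives 3, 4, 4, 5 distinct URLs, matching the question, and 2, 3, 3, 4 URLs that have ever scored 30 or more, which differs from the asker's expected 2, 3, 2, 2.

```python
import sqlite3

# Sample rows from the question (year 2019 assumed, dates stored as ISO strings)
ROWS = [
    ("2019-10-01", "a", "US", "A", "X", 10), ("2019-10-01", "b", "US", "A", "X", 30),
    ("2019-10-01", "c", "US", "A", "X", 60), ("2019-10-04", "a", "US", "A", "X", 20),
    ("2019-10-04", "d", "US", "A", "X", 60), ("2019-10-06", "b", "US", "A", "X", 22),
    ("2019-10-09", "a", "US", "A", "X", 40), ("2019-10-09", "e", "US", "A", "X", 10),
]

RUNNING_DISTINCT_SQL = """
WITH firsts AS (
    -- first date on which each URL appears at all (is_new) ...
    SELECT store, dept, page, url, MIN(d) AS d, 1 AS is_new, 0 AS is_new_30
    FROM t GROUP BY store, dept, page, url
    UNION ALL
    -- ... and first date on which it scores 30 or more (is_new_30)
    SELECT store, dept, page, url, MIN(d), 0, 1
    FROM t WHERE score >= 30 GROUP BY store, dept, page, url
),
per_day AS (
    -- keep every date, even those on which no URL is new
    SELECT k.d AS d, k.store AS store, k.dept AS dept, k.page AS page,
           COALESCE(SUM(f.is_new), 0) AS new_urls,
           COALESCE(SUM(f.is_new_30), 0) AS new_urls_30
    FROM (SELECT DISTINCT d, store, dept, page FROM t) AS k
    LEFT JOIN firsts AS f
      ON f.d = k.d AND f.store = k.store AND f.dept = k.dept AND f.page = k.page
    GROUP BY k.d, k.store, k.dept, k.page
)
SELECT d, store, dept, page,
       SUM(new_urls) OVER (PARTITION BY store, dept, page ORDER BY d) AS dist_urls,
       SUM(new_urls_30) OVER (PARTITION BY store, dept, page ORDER BY d) AS dist_urls_30
FROM per_day
ORDER BY store, dept, page, d
"""

def running_distinct():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (d TEXT, url TEXT, store TEXT, dept TEXT, page TEXT, score INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    result = conn.execute(RUNNING_DISTINCT_SQL).fetchall()
    conn.close()
    return result

for row in running_distinct():
    print(row)
```

The LEFT JOIN against every (date, store, dept, page) combination is what keeps dates such as 10/6 in the output with zero new URLs, so the running SUM carries the previous total forward.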

Combining Two Tables & Summing REV amts by Mth

Below are my two tables of data:
Acct BillingDate REV
101 01/05/2018 5
101 01/30/2018 4
102 01/15/2018 2
103 01/4/2018 3
103 02/05/2018 2
106 03/06/2018 5
Acct BillingDate Lease_Rev
101 01/15/2018 2
102 01/16/2018 1
103 01/19/2018 2
104 02/05/2018 3
105 04/02/2018 1
Desired output:
Acct | Jan | Feb | Mar | Apr
101 | 11 | | |
102 | 3 | | |
103 | 5 | 2 | |
104 | | 3 | |
105 | | | | 1
106 | | | 5 |
My SQL script is below:
SELECT [NewSalesHistory].[Region]
,[NewSalesHistory].[Account]
,SUM(case when [NewSalesHistory].[billingdate] between '6/1/2016' and '6/30/2016' then REV else 0 end ) + [X].[Jun-16] AS 'Jun-16'
FROM [NewSalesHistory]
FULL join (SELECT [Account]
,SUM(case when [BWLease].[billingdate] between '6/1/2016' and '6/30/2016' then Lease_REV else 0 end ) as 'Jun-16'
FROM [AirgasPricing].[dbo].[BWLease]
GROUP BY [Account]) X ON [NewSalesHistory].[Account] = [X].[Account]
GROUP BY [NewSalesHistory].[Region]
,[NewSalesHistory].[Account]
,[X].[Jun-16]
I am having trouble combining these tables. If an account has both a REV amount and a lease REV amount, the query sums them for that account. But if there is no lease REV amount (which is the case most of the time), it returns NULL for the REV amounts of all the other accounts in Table 1. Table 1 can contain the same account several times with different REV values, while Table 2 contains each account only once, with its lease REV. The output above is how I would like to see the data.
What am I missing here? Thanks!
I would suggest union all and group by:
select acct,
sum(case when billingdate >= '2016-01-01' and billingdate < '2016-02-01' then rev end) as rev_201601,
sum(case when billingdate >= '2016-02-01' and billingdate < '2016-03-01' then rev end) as rev_201602,
. . .
from ((select nsh.acct, nsh.billingdate, nsh.rev
from NewSalesHistory nsh
) union all
(select bl.acct, bl.billingdate, bl.lease_rev as rev
from AirgasPricing..BWLease bl
)
) x
group by acct;
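To check that UNION ALL plus conditional aggregation reproduces the desired output, here is a sketch against SQLite from Python using the sample rows from the question. The table names sales_history and lease, the ISO dates, and the 2018 month boundaries are assumptions; the lease column is renamed to rev in the second arm so both arms line up.

```python
import sqlite3

# Sample data from the question, dates converted to ISO format
REV_ROWS = [(101, "2018-01-05", 5), (101, "2018-01-30", 4), (102, "2018-01-15", 2),
            (103, "2018-01-04", 3), (103, "2018-02-05", 2), (106, "2018-03-06", 5)]
LEASE_ROWS = [(101, "2018-01-15", 2), (102, "2018-01-16", 1), (103, "2018-01-19", 2),
              (104, "2018-02-05", 3), (105, "2018-04-02", 1)]

PIVOT_SQL = """
SELECT acct,
       SUM(CASE WHEN billingdate >= '2018-01-01' AND billingdate < '2018-02-01' THEN rev END) AS jan,
       SUM(CASE WHEN billingdate >= '2018-02-01' AND billingdate < '2018-03-01' THEN rev END) AS feb,
       SUM(CASE WHEN billingdate >= '2018-03-01' AND billingdate < '2018-04-01' THEN rev END) AS mar,
       SUM(CASE WHEN billingdate >= '2018-04-01' AND billingdate < '2018-05-01' THEN rev END) AS apr
FROM (
    SELECT acct, billingdate, rev FROM sales_history
    UNION ALL
    SELECT acct, billingdate, lease_rev AS rev FROM lease  -- rename so both arms line up
) AS x
GROUP BY acct
ORDER BY acct
"""

def monthly_totals():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales_history (acct INTEGER, billingdate TEXT, rev INTEGER)")
    conn.execute("CREATE TABLE lease (acct INTEGER, billingdate TEXT, lease_rev INTEGER)")
    conn.executemany("INSERT INTO sales_history VALUES (?, ?, ?)", REV_ROWS)
    conn.executemany("INSERT INTO lease VALUES (?, ?, ?)", LEASE_ROWS)
    result = conn.execute(PIVOT_SQL).fetchall()
    conn.close()
    return result

for row in monthly_totals():
    print(row)  # None marks a month with no revenue
```

Because each account's rows are stacked before aggregating, an account with no lease revenue simply contributes fewer rows; nothing is added to NULL, which is what produced the NULLs in the FULL JOIN version.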
Okay, so there are a few things going on here:
1) As Gordon Linoff mentioned, you can perform a UNION ALL on the two tables. Be sure to limit the columns you select and give them consistent names:
select
x as consistentname1,
y as consistentname2,
z as consistentname3
from [NewSalesHistory]
union all
select
a as consistentname1,
b as consistentname2,
c as consistentname3
from [BWLease]
2) Your desired result has one column per month (a pivot). On the result of the union in step 1, generate a column with the granularity you want, e.g. months:
concat(datepart(yy, Date_),'-',datename(mm,Date_)) as yyyyM
Then perform aggregation using a group by:
select sum(...) as desiredcolumnname
...
group by PK1, PK2, yyyyM
Finally, PIVOT to obtain your result: https://learn.microsoft.com/en-us/sql/t-sql/queries/from-using-pivot-and-unpivot?view=sql-server-2017
3) If you have other fields/columns that you wish to present, you first need to determine whether they are measures (can be aggregated) or dimensions. That may be best addressed in a follow-up question once you've achieved what you set out to do here.
Hope it helps
As an aside, it seems like you are preparing data for reporting. These transformations can be made easier with a GUI tool such as MS Power Query; as long as your end goal is not manipulating data in the database itself, you do not need to resort to raw SQL.

SQL Server: Finding date given EndDate and # Days, excluding days from specific date ranges

I have a TableA in a database similar to the following:
Id | Status | Start | End
1 | Illness | 2013-04-02 | 2013-04-23
2 | Illness | 2013-05-05 | 2014-01-01
3 | Vacation | 2014-02-01 | 2014-03-01
4 | Illness | 2014-03-08 | 2014-03-09
5 | Vacation | 2014-05-05 | NULL
Imagine it's keeping track of a specific user's "Away" days. Given the following inputs:
SomeEndDate (Date),
NumDays (Integer)
I want to find the SomeStartDate (Date) that lies NumDays non-illness days before SomeEndDate. In other words, say I am given a SomeEndDate value of '2014-03-10' and a NumDays value of 60; the matching SomeStartDate would be:
2014-03-10 to 2014-03-09 = 1
2014-03-08 to 2014-01-01 = 57
2013-05-05 to 2013-05-03 = 2
So, at 60 non-illness days, we get a SomeStartDate of '2013-05-03'. Is there an easy way to accomplish this in SQL? I imagine I could loop over each day, check whether or not it falls into one of the illness ranges, and increment a counter if not (exiting the loop once the counter reaches NumDays)... but that seems wildly inefficient. I'd appreciate any help.
Make a Calendar table that has a list of all the dates you will ever care about.
SELECT MIN([Date])
FROM (
SELECT TOP(@NumDays) c.[Date]
FROM Calendar c
WHERE c.[Date] < @SomeEndDate
AND NOT EXISTS (
SELECT 1
FROM TableA a
WHERE c.[Date] BETWEEN a.[Start] AND a.[End]
AND a.[Status] = 'Illness'
)
ORDER BY c.[Date] DESC -- keep the @NumDays non-illness dates closest to @SomeEndDate
) t
The Calendar table method lets you also easily exclude holidays, weekends, etc.
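A sketch of the Calendar-table approach against SQLite from Python, assuming a hypothetical calendar table (column cal_date) filled with every date of 2013 and 2014 as ISO strings, and the question's rows in a table named absences. SQLite uses LIMIT where SQL Server uses TOP. Because BETWEEN is inclusive, the first and last day of each illness count as illness days here.

```python
import sqlite3
from datetime import date, timedelta

ABSENCES = [
    (1, "Illness", "2013-04-02", "2013-04-23"),
    (2, "Illness", "2013-05-05", "2014-01-01"),
    (3, "Vacation", "2014-02-01", "2014-03-01"),
    (4, "Illness", "2014-03-08", "2014-03-09"),
    (5, "Vacation", "2014-05-05", None),
]

START_DATE_SQL = """
SELECT MIN(cal_date) FROM (
    SELECT c.cal_date
    FROM calendar AS c
    WHERE c.cal_date < :end_date
      AND NOT EXISTS (
          SELECT 1 FROM absences AS a
          WHERE a.status = 'Illness'
            AND c.cal_date BETWEEN a."start" AND a."end")
    ORDER BY c.cal_date DESC   -- newest dates first
    LIMIT :num_days            -- keep the num_days dates closest to end_date
) AS t
"""

def start_date(end_date, num_days):
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE absences (id INTEGER, status TEXT, "start" TEXT, "end" TEXT)')
    conn.executemany("INSERT INTO absences VALUES (?, ?, ?, ?)", ABSENCES)
    # Hypothetical calendar table: one ISO date per row, 2013-01-01 through 2014-12-31
    conn.execute("CREATE TABLE calendar (cal_date TEXT PRIMARY KEY)")
    first = date(2013, 1, 1)
    conn.executemany("INSERT INTO calendar VALUES (?)",
                     [((first + timedelta(days=k)).isoformat(),) for k in range(730)])
    (result,) = conn.execute(START_DATE_SQL, {"end_date": end_date, "num_days": num_days}).fetchone()
    conn.close()
    return result

print(start_date("2014-03-10", 70))
```

With NumDays = 70 this returns 2013-04-30 rather than the 2013-05-03 of the LEAD-based answer below, because that answer counts the boundary days of an illness range differently; it is worth deciding which convention you need before comparing results.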
SQL Server 2012:
Try this solution:
DECLARE @NumDays INT = 70, @SomeEndDate DATE = '2014-03-10';
SELECT
[RangeStop],
CASE
WHEN RunningTotal_NumOfDays <= @NumDays THEN [RangeStart]
WHEN RunningTotal_NumOfDays - Current_NumOfDays <= @NumDays THEN DATEADD(DAY, -(@NumDays - (RunningTotal_NumOfDays - Current_NumOfDays))+1, [RangeStop])
END AS [RangeStart]
FROM (
SELECT
y.*,
DATEDIFF(DAY, y.RangeStart, y.RangeStop) AS Current_NumOfDays,
SUM( DATEDIFF(DAY, y.RangeStart, y.RangeStop) ) OVER(ORDER BY y.RangeStart DESC) AS RunningTotal_NumOfDays
FROM (
SELECT LEAD(x.[End]) OVER(ORDER BY x.[End] DESC) AS RangeStart, -- It's previous date because of "ORDER BY x.[End] DESC"
x.[Start] AS RangeStop
FROM (
SELECT @SomeEndDate AS [Start], '9999-12-31' AS [End]
UNION ALL
SELECT x.[Start], x.[End]
FROM #MyTable AS x
WHERE x.[Status] = 'Illness'
AND x.[End] <= @SomeEndDate
) x
) y
) z
WHERE RunningTotal_NumOfDays - Current_NumOfDays <= @NumDays;
/*
Output:
RangeStop RangeStart
---------- ----------
2014-03-10 2014-03-09
2014-03-08 2014-01-01
2013-05-05 2013-05-03
*/
Note #1: LEAD([End]) returns the previous End date (previous because of ORDER BY [End] DESC).
Note #2: DATEDIFF(DAY, RangeStart, RangeStop) computes the number of days between the current start (alias y.RangeStop) and the "previous" end (alias y.RangeStart) => Current_NumOfDays.
Note #3: SUM(Current_NumOfDays) computes a running total: 1 + 66 + 3 = 70, where only 3 of the 12 days in the last range are needed.
Note #4: I've used @NumDays = 70 (not 60).

Compare 2 subsets of data from table?

I'm not sure if this is possible - I'm having real trouble getting my head around it.
This is for a product schedule, showing how much we expect to deliver on a given date. Data is imported into this schedule weekly, and each import creates a new entry.
For example, if the schedule for the day currently totals 10, and you import 15, a new row is inserted with Qty 5, bringing the sum to 15.
The data I have looks like this:
Product | Delivery Required Date | Qty
Prod1 | 1/1/13 | 10
Prod1 | 1/1/13 | -10
Prod1 | 1/1/13 | 10
Prod1 | 1/1/13 | -10
Prod1 | 1/1/13 | 25
I want to design a query that shows the variance between the previous schedule and the current schedule.
For example, the query would sum the Qty of all rows except the last entry and compare that total to the last entry. In the data above, the variance is 25 (the existing total was 0 and the latest entry is 25, so 0 + 25 = 25).
Is this possible?
Thanks
I suspect there's a better answer using Common Table Expressions, but a quick & ugly solution might be the following (assuming an EntryNo column that records the order of the entries):
select sum(Qty) as 'sumLessLast'
from MyTable
where EntryNo <> (select max(EntryNo) from MyTable)
If MyTable has a million rows in it you'll want a better solution.
SQL Server 2005 and 2008:
;with r1 as (
select DeliveryReqDate, sum(Qty) as TotalQty
from TableName
group by DeliveryReqDate)
, r2 as (
select DeliveryReqDate, Qty
, row_number() over (partition by DeliveryReqDate order by EntryNo desc) rn
from TableName)
select r1.DeliveryReqDate, r1.TotalQty, r2.Qty as LastQty
, r1.TotalQty - r2.Qty as TotalButLastQty
from r1
join r2 on r2.DeliveryReqDate = r1.DeliveryReqDate and r2.rn = 1
SQL Server 2012:
;with r1 as (
select DeliveryReqDate, Qty
, sum(Qty) over (partition by DeliveryReqDate) as TotalQty
, row_number() over (partition by DeliveryReqDate order by EntryNo desc) rn
from TableName)
select DeliveryReqDate, TotalQty, Qty as LastQty
, TotalQty - Qty as TotalButLastQty
from r1
where rn = 1
I'm not sure I completely understand the logic regarding how product and date are accounted for, but I hope you can adapt the above queries to your needs.
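A quick check of the SQL Server 2012-style query on the question's sample rows, run against SQLite from Python. The question's data has no column that records the order of the imports, so EntryNo here is a hypothetical sequence column (as the answers assume), the date column is named DeliveryReqDate as in the answer, and the partition additionally includes Product, which the answer leaves out.

```python
import sqlite3

# Sample rows from the question plus a hypothetical EntryNo recording import order
ROWS = [(1, "Prod1", "2013-01-01", 10), (2, "Prod1", "2013-01-01", -10),
        (3, "Prod1", "2013-01-01", 10), (4, "Prod1", "2013-01-01", -10),
        (5, "Prod1", "2013-01-01", 25)]

VARIANCE_SQL = """
WITH r1 AS (
    SELECT Product, DeliveryReqDate, Qty,
           SUM(Qty) OVER (PARTITION BY Product, DeliveryReqDate) AS TotalQty,
           ROW_NUMBER() OVER (PARTITION BY Product, DeliveryReqDate
                              ORDER BY EntryNo DESC) AS rn   -- rn = 1 is the latest entry
    FROM TableName
)
SELECT Product, DeliveryReqDate, TotalQty, Qty AS LastQty, TotalQty - Qty AS TotalButLastQty
FROM r1
WHERE rn = 1
"""

def variance():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TableName (EntryNo INTEGER, Product TEXT, DeliveryReqDate TEXT, Qty INTEGER)")
    conn.executemany("INSERT INTO TableName VALUES (?, ?, ?, ?)", ROWS)
    result = conn.execute(VARIANCE_SQL).fetchall()
    conn.close()
    return result

print(variance())
```

For the sample, the total is 25, the last entry is 25, and everything before it sums to 0, which is the "existing total was 0, latest entry is 25" reading from the question.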

Count two Columns with two Where Clauses

I know it's just late in the day and my brain is just fried....
Using Teradata, I need to COUNT DISTINCT MEMBERS that haven't had a TRANS in the past six months and also COUNT the number of TRANS they had historically (prior to the six months). We can just assume the cutoff date to be 01/01/2012. All the data is contained in a single table.
For example:
Member | Tran Date
123 | 01/01/2011
789 | 06/01/2011
123 | 10/31/2011
678 | 04/03/2011
789 | 06/01/2012
So 2 members had a total of 3 transactions dated prior to 1/1/2012 with no transactions later than 1/1/2012.
In this example, my result would be:
MEMBERS | TRANS
2 | 3
Try this solution:
SELECT
COUNT(DISTINCT member_id) AS MEMBERS,
COUNT(*) AS TRANS
FROM
tbl
WHERE
member_id NOT IN
(
SELECT DISTINCT member_id
FROM tbl
WHERE trans_date > '2012-01-01'
)
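The NOT IN query can be checked against the question's sample rows with SQLite from Python. The table tbl and the columns member_id and trans_date follow the answer; the dates are stored as ISO strings so that string comparison matches date order, and the cutoff is passed as a parameter. If member_id could ever be NULL in the subquery, NOT IN would return no rows at all, so NOT EXISTS is the safer form in that case.

```python
import sqlite3

# Sample rows from the question, dates converted to ISO format
TRANSACTIONS = [(123, "2011-01-01"), (789, "2011-06-01"), (123, "2011-10-31"),
                (678, "2011-04-03"), (789, "2012-06-01")]

COUNT_SQL = """
SELECT COUNT(DISTINCT member_id) AS members, COUNT(*) AS trans
FROM tbl
WHERE member_id NOT IN (
    SELECT member_id FROM tbl WHERE trans_date > :cutoff  -- members active after the cutoff
)
"""

def inactive_member_stats(cutoff):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (member_id INTEGER, trans_date TEXT)")
    conn.executemany("INSERT INTO tbl VALUES (?, ?)", TRANSACTIONS)
    result = conn.execute(COUNT_SQL, {"cutoff": cutoff}).fetchone()
    conn.close()
    return result

print(inactive_member_stats("2012-01-01"))  # (2, 3)
```

Members 123 and 678 qualify with three transactions between them, while 789 is excluded by its 2012 transaction, matching the expected result in the question.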
You can't do it without subqueries. This is T-SQL, because I am unfamiliar with Teradata.
DECLARE @CUTOFF DATETIME = DATEADD(MONTH, -6, GETDATE()) --6 MONTHS AGO
SELECT COUNT(MEMBERID) AS MEMBERS, SUM(TRANSCOUNT) AS TRANS FROM (
SELECT DISTINCT
M.MEMBERID,
(SELECT COUNT(*) FROM TRANSDATA T WHERE T.MEMBERID = M.MEMBERID) AS TRANSCOUNT
FROM MEMBER M WHERE NOT EXISTS
(SELECT * FROM TRANSDATA T2 WHERE
T2.MEMBERID = M.MEMBERID
AND T2.TRANDATE > @CUTOFF)
) X