How do I identify events that occurred in consecutive years? - sql

I am trying to figure out if an event occurred in the three consecutive previous years by month. For example:
Item Type Month Year
Hat S May 2015
Shirt P June 2015
Hat S June 2015
Hat S May 2016
Shirt P May 2016
Hat S May 2017
I am interested in seeing which item was purchased/sold in the same month for three consecutive years. Hat was sold in May in 2015, 2016, and 2017; therefore, I would like to identify that. Shirt was purchased in June 2015 and May 2016. Since these are different months in consecutive years, it does not qualify.
Essentially, I want it to be able to look back 3 years and identify those purchases/sales that reoccurred in the same month each year, preferably with an indicator variable.
I tried the following code:
select distinct a.*
from dataset as a inner join dataset as b
on a.type = b.type
and a.month = b.month
and a.item = b.item
and a.year = b.year-1
and a.year = b.year-2;
I want to get:
Item Type Month Year
Hat S May 2015
Hat S May 2016
Hat S May 2017
I should add that my data covers more than 2015-2017. It spans 10 years, and I want to see whether there are any 3 (or more) consecutive years within that 10-year span.

There are many ways to do this. One way in SQL, given that rows can be grouped by Item and Month, is to restrict Year to the three years from 2015 to 2017 and require that the count of distinct Year values within each group equals 3. Counting distinct years handles data with repetition, such as a group with 3 S-type Hats and 3 P-type Hats. Note that the query selects columns that are not in the GROUP BY; SAS PROC SQL permits this by remerging the summary back onto the detail rows, whereas other SQL dialects generally either reject it or return one arbitrary row per group.
select item, type, month, year
from have
where year between 2015 and 2017
group by item, month
having count(distinct year) = 3
order by item, type, month, year
For the more general problem of identifying runs within a group, the SAS DATA step is well suited. The serial DOW loop technique first loops over a range of rows delimited by some condition while computing a group metric, in this case the length of the run of consecutive years. A second loop then reads the same rows again and applies the group metric to each one.
Consider this example, in which the rungroup is computed from the year adjacency within each item/month. Once the rungroups are established, the double DOW technique is applied.
data have;
do comboid = 1 to 1000;
itemid = ceil(10 * ranuni(123));
typeid = ceil(2* ranuni(123));
month = ceil(12 * ranuni(123));
year = 2009 + floor (10 * ranuni(123));
output;
end;
run;
proc sort data=have;
by itemid month year;
run;
data have_rungrouped;
set have;
by itemid month year;
rungroup + (first.month or not first.month and year - lag(year) > 1);
run;
data want;
do index = 1 by 1 until (last.rungroup);
set have_rungrouped;
by rungroup;
* distinct number of years in rungroup;
years_runlength = sum (years_runlength, first.rungroup or year ne lag(year));
end;
do index = 1 to index;
set have_rungrouped;
if years_runlength >= 3 then output;
end;
run;
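The same run-detection idea can also be sketched in plain SQL with window functions. The example below is only an illustration under a few assumptions: it uses Python's built-in sqlite3 module (SQLite 3.25 or later for window functions) rather than SAS, it groups by item, type and month as in the question's join, and the function name consecutive_runs is made up for this sketch. Subtracting ROW_NUMBER() from the year gives a value that stays constant within a run of adjacent years, so counting rows per such value gives the run length.

```python
import sqlite3

# Sample rows from the question: (item, type, month, year).
ROWS = [
    ("Hat", "S", "May", 2015),
    ("Shirt", "P", "June", 2015),
    ("Hat", "S", "June", 2015),
    ("Hat", "S", "May", 2016),
    ("Shirt", "P", "May", 2016),
    ("Hat", "S", "May", 2017),
]

RUNS_SQL = """
WITH yrs AS (
    -- collapse repeated events to one row per item/type/month/year
    SELECT DISTINCT item, type, month, year FROM have
),
grp AS (
    -- year minus its rank is constant across adjacent years
    SELECT *,
           year - ROW_NUMBER() OVER (
               PARTITION BY item, type, month ORDER BY year
           ) AS run_id
    FROM yrs
),
flagged AS (
    -- number of distinct consecutive years in each run
    SELECT *,
           COUNT(*) OVER (
               PARTITION BY item, type, month, run_id
           ) AS run_length
    FROM grp
)
SELECT item, type, month, year
FROM flagged
WHERE run_length >= ?
ORDER BY item, type, month, year
"""


def consecutive_runs(rows, min_years=3):
    """Return the distinct rows that belong to a run of at least min_years."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE have (item TEXT, type TEXT, month TEXT, year INTEGER)"
    )
    con.executemany("INSERT INTO have VALUES (?, ?, ?, ?)", rows)
    result = con.execute(RUNS_SQL, (min_years,)).fetchall()
    con.close()
    return result


for row in consecutive_runs(ROWS):
    print(row)
```

Because the run length is computed per run rather than for a fixed range such as 2015 to 2017, the same query should work unchanged on a 10-year span. Keeping run_length in the final SELECT would give the indicator variable the question asks for.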

Here is an example that checks whether any item occurred in consecutive years and lists every row from the original table that qualifies for at least two consecutive years:
CREATE TABLE #table
(
Item NVARCHAR(MAX),
Type CHAR,
Month NVARCHAR(MAX),
Year INT
)
INSERT INTO #table VALUES
('Hat','S','May','2015'),
('Shirt','P','June','2015'),
('Hat','S','June','2015'),
('Hat','S','May','2016'),
('Shirt','P','May','2016'),
('Hat','S','May','2017')
SELECT * FROM #table
WHERE CONCAT(Item,Month) IN
(
SELECT CONCAT(group1.Item, group1.Month) FROM
(
SELECT Item,Year,Month FROM #table
GROUP BY Year, Item, Month
) group1
FULL OUTER JOIN
(
SELECT Item,Year,Month FROM #table
GROUP BY Year, Item, Month
) group2
ON group1.Year = group2.Year + 1 AND group1.Item = group2.Item AND group1.Month = group2.Month
WHERE group1.Item IS NOT NULL AND group2.Item IS NOT NULL
)
ORDER BY Item,Month,Year
As you can see, I found all items that matched year + 1 in the same month.
OUTPUT:
Hat S May 2015
Hat S May 2016
Hat S May 2017

Related

Teradata loop for dates, column adding within loop

I have a table where every row is transaction and there are few columns: clients IDs and dates for every transaction.
I am trying to write a query that produces a table in which column N shows the number of clients whose first transaction happened in month N and who made transactions in months N, N+1, N+2, ...
For example (desired table for 3 months data):
1 2 3
100 90 78
80 80
60
The first row of column 1 shows the number of clients whose first transaction happened in month 1, the second row shows how many of these clients stayed after one month, the third row after two months, etc.
My current query (Year is a column with the year of the date, like 2017; month is the number of the month, like 1 for January):
WITH not_in AS(
SELECT ID, Year, month
FROM table
WHERE trans_date<date "2017-01-01"),
ID_in AS(
SELECT ID, Year, month
FROM table
WHERE trans_date BETWEEN date "2017-01-01" AND date "2017-01-31"
),
from_this AS(
SELECT ID, Year, month
FROM table
)
SELECT Year, Month, count(distinct ID)
FROM from_this
WHERE ID IN (select ID from ID_in)
AND
ID NOT IN (select ID from not_in)
GROUP BY 1,2
ORDER BY 1,2
But this gives only one column (for January 2017) of the desired table. I have to change the dates manually for the other months in 2017, 2018 and so on.
How can I avoid this?
I guess it should be looped somehow. I think I should create a volatile table, add columns to it within a loop, and then select * from it.
Also, I cannot find instructions for declaring variables and writing while loops in Teradata; any clarifications are appreciated.

Get consecutive months and days difference from date range?

So let's say I have a table like this:
subscriber_id  package_id  package_start_date  package_end_date  package_price_per_day
1081           231         2014-01-13          2014-12-31        $3
1084           231         2014-03-21          2014-06-05        $3
1086           235         2014-06-21          2014-09-09        $4
Now I want the top 3 packages by total revenue for each month of 2014.
Note: for example, for package 231 the revenue should be calculated as 18 days of Jan * $3 + 28 days of Feb * $3 + ... and so on.
For the second row the calculation would be the same as for the first row (9 days of March * $3 + 30 days of April * $3 ...).
The result should be grouped by month and show each package's rank by total revenue.
Sample result:
Month  Package_id  Revenue  Rank
Jan    231         69499    1
Jan    235         34345    2
Jan    238         23455    3
Feb    231         89274    1
I wrote a query to filter the dates so that I get the subscribers active during 2014 (initially there were values from other years), which produces the first table in the question, but I am not sure how to break the ranges into months and days afterwards.
select subscriber_id, package_id, package_start_date, package_end_date
from (
select subscriber_id, package_id
, case when year(package_start_date) < '2014' then package_start_date = '01-Jan-2014' else package_start_date end as package_start_date
, case when year(package_start_date) > '2014' then package_end_date = '31-Dec-2014' else package_start_date end as package_end_date
, price_per_day
from subscription
) a
where year(package_start_date) = '2014' and year(package_end_date) = '2014'
Please do not focus on the syntax; I am just trying to understand the logical approach in SQL.
Suppose you have a table called d that holds a list of unique dates in a column also called d.
It is then relatively trivial to do
SELECT *
FROM t
INNER JOIN d on d.d >= t.package_start_date AND d.d < t.package_end_date
This assumes you count a start date of Jan 1 and an end date of Jan 2 as 1 day. If you count that as two days, use <=
This causes each package row to repeat once per day, so a start date of Jan 1 and an end date of Jan 11 means the row repeats 10 times. The d.d date is different on every row, so you can extract the month from d.d and group on it to get totals for each month per package.
Suppose you've wrapped the query above in a CTE named x; it's like
SELECT DATEPART(month, x.dd), --the d.d date
package_id,
SUM(revenue)
FROM x
GROUP BY DATEPART(month, x.dd), package_id
Because the rows from t are repeated by the Cartesian explosion when joined to d, you can safely group or aggregate them to get back to a single value per month per package. If you have packages that stay with you for more than a year, you should also group on the DATEPART year to avoid mixing up the months of packages that run from e.g. Jan 2020 to Feb 2021 (they span two Januaries and two Februaries).
Then all you need to do is add the ranking of the revenue, which looks like it would go in at the first step with something like
RANK() OVER(PARTITION BY package_id ORDER BY DATEDIFF(DAY, start, end)*revenue DESC)
I think I understand correctly that you rank packages on total revenue over the entire period rather than per month. Look up the difference between RANK and DENSE_RANK too, as you may want DENSE_RANK instead.
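If the ranking is wanted per month instead, as the question's sample result suggests, the date-table join can be combined with a window function. The following is a rough sketch under stated assumptions, not the exact query for any particular database: it uses Python's built-in sqlite3 module (SQLite 3.25 or later), builds the date table with a recursive CTE named calendar, keeps the half-open d.d < package_end_date convention from above, and stores the price as a plain integer. The name monthly_top_packages is made up for the illustration.

```python
import sqlite3

# The three sample subscriptions from the question
# (subscriber_id, package_id, start, end, price per day in dollars).
SUBSCRIPTIONS = [
    (1081, 231, "2014-01-13", "2014-12-31", 3),
    (1084, 231, "2014-03-21", "2014-06-05", 3),
    (1086, 235, "2014-06-21", "2014-09-09", 4),
]

TOP_SQL = """
WITH RECURSIVE calendar(day) AS (
    -- one row per day of 2014; stands in for the date table d
    SELECT '2014-01-01'
    UNION ALL
    SELECT date(day, '+1 day') FROM calendar WHERE day < '2014-12-31'
),
x AS (
    -- each subscription row repeats once per active day
    SELECT CAST(strftime('%m', c.day) AS INTEGER) AS month,
           t.package_id,
           t.package_price_per_day AS revenue
    FROM subscription AS t
    JOIN calendar AS c
      ON c.day >= t.package_start_date AND c.day < t.package_end_date
),
monthly AS (
    SELECT month, package_id, SUM(revenue) AS revenue
    FROM x
    GROUP BY month, package_id
),
ranked AS (
    SELECT month, package_id, revenue,
           RANK() OVER (PARTITION BY month ORDER BY revenue DESC) AS rnk
    FROM monthly
)
SELECT month, package_id, revenue, rnk
FROM ranked
WHERE rnk <= ?
ORDER BY month, rnk, package_id
"""


def monthly_top_packages(subscriptions, top_n=3):
    """Return (month, package_id, revenue, rank) for the top packages per month."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE subscription (subscriber_id INTEGER, package_id INTEGER, "
        "package_start_date TEXT, package_end_date TEXT, "
        "package_price_per_day INTEGER)"
    )
    con.executemany("INSERT INTO subscription VALUES (?, ?, ?, ?, ?)", subscriptions)
    result = con.execute(TOP_SQL, (top_n,)).fetchall()
    con.close()
    return result


for row in monthly_top_packages(SUBSCRIPTIONS):
    print(row)
```

With the sample rows, July comes out as package 235 first (31 days * $4) and package 231 second (31 days * $3). Dates are stored as ISO strings here so that plain string comparison orders them correctly; a real date column would not need that.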

PROC SQL loop: looping over months in a year?

I am having trouble creating an efficient way to use PROC SQL in SAS to gather monthly data for 4 years (2017, 2018, 2019, through now of 2020).
My current (shortened) code:
PROC SQL;
select
count(VAL1) as name1, sum(VAL2) as name2
from table tbl
WHERE tbl.dte >= '20170101' and tbl.dte < '20170201'
);
I am currently just copying and pasting the query over and over, but I would need to do this over a hundred times for each of four tables (about 500 times in total).
Is there a more efficient way to do this?
How about aggregation?
select year(tbl.dte), month(tbl.dte), count(VAL1) as name1, sum(VAL2) as name2
from table tbl
where tbl.dte >= '20170101'
group by year(tbl.dte), month(tbl.dte)
Aggregate computations can be performed using SQL or a statistics procedure such as Proc MEANS.
Consider a data set containing individual location reports of daily sales for 50 locations. The aggregation would be total sales (over all locations) at the monthly level. Management also wants to know the number of sales reports in the month.
Example:
data raw(label='Some US Census sales data as basis of simulation' keep=_q _sales);
* some data copied from https://www.census.gov/retail;
format _q yyq.; input @1 q 1. @13 y 4. _sales: comma8.;
_q = yyq(y,q);
datalines;
4th quarter 2019 1,464,339
3rd quarter 2019 1,381,537
2nd quarter 2019 1,377,288
1st quarter 2019 1,241,540
4th quarter 2018 1,407,934
3rd quarter 2018 1,323,360
2nd quarter 2018 1,332,848
1st quarter 2018 1,219,133
4th quarter 2017 1,361,001
3rd quarter 2017 1,262,868
2nd quarter 2017 1,266,215
1st quarter 2017 1,156,810
4th quarter 2016 1,294,590
3rd quarter 2016 1,217,376
2nd quarter 2016 1,218,921
1st quarter 2016 1,120,887
4th quarter 2015 1,253,997
3rd quarter 2015 1,193,142
2nd quarter 2015 1,194,480
1st quarter 2015 1,084,374
4th quarter 2014 1,231,471
3rd quarter 2014 1,170,225
2nd quarter 2014 1,177,252
1st quarter 2014 1,060,492
run;
proc sort data=raw; by _q;
run;
data have(label='Simulate some activity to be summarized');
set raw;
days = intnx('quarter',_q,0,'E') - _q;
_x = _sales / 50 / days;
do date = _q to _q + days;
datestring = put (date, yymmddn8.);
do storeid = 1 to 50;
reportid + 1;
sales = round(_x - 25 + rand('uniform', 50));
output;
end;
end;
keep datestring storeid reportid sales;
run;
* Compute monthly aggregates - SQL way;
proc sql;
create table want as
select
intnx('month', date, 0) as month format=yymon7.
, count(reportid) as report_count format=comma7.
, sum(sales) as month_sales format=dollar12.
from
(
select
input(datestring,yymmdd8.) as date,
have.*
from have
) have /* this is now an alias for outer scope */
group by calculated month
;
* convert datestring to date value;
data have_v / view=have_v;
set have;
date = input (datestring,yymmdd8.); format date yymmddn8.;
run;
* Compute monthly aggregates - MEANS way;
* Grouping occurs at the formatted values of the BY variables(s);
* The date format yymon7. is used to force aggregation by month;
proc means noprint data=have_v;
by date;
var sales reportid;
format date yymon7.;
output out=monthly_summary n(reportid)=count sum(sales)=sales;
format sales dollar12. count comma7.;
run;
Since your strings are in YYYYMMDD order just take the first 6 characters to get a distinct value for each month.
select substr(dte,1,6) as month
, count(VAL1) as name1
, sum(VAL2) as name2
from have
group by month
;

Customers that stopped ordering monthly-SQL

I am trying to write an SQL query that shows STORES that stopped ordering in a month, that is, STORES that have orders in the month before but no orders in that month. For example, STORES that have orders in January but do not have orders in February (these would be the STORES that stopped ordering in February). I want to do this for every month (grouped) in a given date range - #datefrom-#dateto
I have one table with an INVOICE#,STORE# and a DATE column
I guess distinct STORE would be in there somewhere.
You can try something like this: break them into two select statements and left outer join them on the store, keeping only the rows with no match.
select distinct table1.store from (select * from table where date = 'January') as table1
left outer join (select * from table where date = 'February') as table2
on table1.store = table2.store
where table2.store is null
This returns the stores with orders in January that have no matching order in February.
PS: that is not an exact SQL statement, just an idea.
I have an example that might be close to what you desire. You may have to tweak it to your convenience and desired performance - http://sqlfiddle.com/#!3/231c4/15
create table test (
invoice int identity,
store int,
dt date
);
-- let's add some data to show that
-- store 1 ordered in Jan, Feb and Mar
-- store 2 ordered in Jan (missed Feb and Mar)
-- store 3 ordered in Jan and Mar (missed Feb)
insert into test (store, dt) values
(1, '2015-01-01'),(1, '2015-02-01'),(1, '2015-03-01'),
(2, '2015-01-01'),
(3, '2015-01-01'), (3, '2015-03-01');
Query
-----
with
months as (select distinct year(dt) as yr, month(dt) as mth from test),
stores as (select distinct store from test),
months_stores as (select * from months cross join stores)
select *
from months_stores ms
left join test t
on t.store = ms.store
and year(t.dt) = ms.yr
and month(t.dt) = ms.mth
where
(ms.yr = 2015 and ms.mth between 1 and 3)
and t.invoice is null
Result:
yr mth store ...other columns
2015 2 2
2015 2 3
2015 3 2
The results show us that store 2 missed orders in months Feb and Mar
and store 3 missed an order in Feb
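Note that the query above lists every month in which a store has no order, so store 2 also shows up for March. If "stopped ordering" should strictly mean "ordered in the previous month but not in this one", a self-join on a month index is one way to express it. The sketch below is an illustration only, run through Python's built-in sqlite3 module with the same sample data; the function name stopped_ordering, the month index ym and the (year, month) range parameters are assumptions made for this example.

```python
import sqlite3

# Same sample data as above: (store, order date).
ORDERS = [
    (1, "2015-01-01"), (1, "2015-02-01"), (1, "2015-03-01"),
    (2, "2015-01-01"),
    (3, "2015-01-01"), (3, "2015-03-01"),
]

STOPPED_SQL = """
WITH active AS (
    -- one row per store and month; ym counts months since year 0
    SELECT DISTINCT store,
           CAST(strftime('%Y', dt) AS INTEGER) * 12
             + CAST(strftime('%m', dt) AS INTEGER) - 1 AS ym
    FROM test
)
SELECT (prev.ym + 1) / 12     AS yr,
       (prev.ym + 1) % 12 + 1 AS mth,
       prev.store
FROM active AS prev
LEFT JOIN active AS cur
       ON cur.store = prev.store AND cur.ym = prev.ym + 1
WHERE cur.store IS NULL                        -- no order in the following month
  AND prev.ym + 1 BETWEEN :from_ym AND :to_ym  -- month in which the store stopped
ORDER BY yr, mth, prev.store
"""


def stopped_ordering(orders, first_month, last_month):
    """Return (year, month, store) for stores that ordered the month before
    but not in that month; first_month and last_month are (year, month)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE test (store INTEGER, dt TEXT)")
    con.executemany("INSERT INTO test VALUES (?, ?)", orders)
    params = {
        "from_ym": first_month[0] * 12 + first_month[1] - 1,
        "to_ym": last_month[0] * 12 + last_month[1] - 1,
    }
    result = con.execute(STOPPED_SQL, params).fetchall()
    con.close()
    return result


print(stopped_ordering(ORDERS, (2015, 2), (2015, 3)))
```

For the sample data and the range February to March 2015 this reports stores 2 and 3 for February and nothing for March, since neither of them ordered in February.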

SQL Query Problem

When I query the database, the months that are present in the database are returned, and the values for each month are summed in the Amount column.
Suppose this is a table
Month Category Amount Year
January Rent 12 2011
March Food 13 2011
January Gas 14 2011
May Enter 15 2011
March General 16 2011
So I wrote this query to sum all the values for each month:
"SELECT Month, SUM(Amount) AS OrderTotal FROM budget1 WHERE year="2011" GROUP BY month "
and got this result:
Month Amount
January 26
March 29
May 15
But what I want is for it to show all the months from January to December, with a value of 0 for the months that are not in the database, like this for the example above:
Month Amount
January 26
February 0
March 29
April 0
May 15
June 0
July 0
August 0
September 0
October 0
November 0
December 0
Any help will be appreciated..!!
Create a table with all months Jan-Dec, call it Months. Just a single column with the names or add an extra integer for sort order (I usually call this the ordinal column), as follows:
create table months (
month varchar(20),
ordinal int
);
insert into months values ('January', 1);
insert into months values ('February', 2);
insert into months values ('March', 3);
...
insert into months values ('December', 12);
The specific syntax may depend upon your database platform. Then, depending upon your database:
SELECT months.month, SUM(Amount) AS OrderTotal
FROM months
left join budget1
on months.month = budget1.Month
and budget1.year = 2011
GROUP BY months.month, months.ordinal
ORDER by months.ordinal
You'll need to convert SUM(Amount) to 0 when null. The specific function or approach to do this depends upon your database platform, or you can just do it in the code that is interpreting the results.
Build a month table, with your months and the sort order. Then left join your month column to the month column in your data table. That will get you the zeros.
So your table will look like
Month Sort
======================
January 1
February 2
March 3
etc.
You can create the table by using Create Table, following by Insert Scripts
CREATE TABLE #months (month VARCHAR(50), sort INT);
INSERT INTO #months VALUES ('January', 1);
INSERT INTO #months VALUES ('February', 2);
etc.
Then
SELECT m.Month, COALESCE(SUM(budget1.Amount), 0) AS OrderTotal
FROM #months m LEFT OUTER JOIN budget1 on m.Month = budget1.Month
AND budget1.year = 2011
GROUP BY m.Month, m.Sort
ORDER BY m.Sort
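To see the month-table approach end to end, here is a small runnable sketch using Python's built-in sqlite3 module. It is an illustration under assumptions rather than code for a specific platform: the table and column names follow the question, the function name monthly_totals is invented, and one extra 2010 row is added purely as hypothetical data to show why the year filter belongs in the join condition rather than in a WHERE clause.

```python
import sqlite3

MONTH_NAMES = [
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
]

# Rows from the question, plus one hypothetical 2010 row (not in the
# original data) to show that other years do not leak into the totals.
BUDGET = [
    ("January", "Rent", 12, 2011),
    ("March", "Food", 13, 2011),
    ("January", "Gas", 14, 2011),
    ("May", "Enter", 15, 2011),
    ("March", "General", 16, 2011),
    ("February", "Rent", 99, 2010),  # hypothetical extra row
]

TOTALS_SQL = """
SELECT m.month, COALESCE(SUM(b.amount), 0) AS order_total
FROM months AS m
LEFT JOIN budget1 AS b
       ON b.month = m.month
      AND b.year = ?          -- filter in ON so unmatched months survive
GROUP BY m.month, m.sort
ORDER BY m.sort
"""


def monthly_totals(budget, year):
    """Return (month, total) for all twelve months, with 0 for empty months."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE months (month TEXT, sort INTEGER)")
    con.executemany(
        "INSERT INTO months VALUES (?, ?)",
        [(name, i) for i, name in enumerate(MONTH_NAMES, start=1)],
    )
    con.execute(
        "CREATE TABLE budget1 (month TEXT, category TEXT, amount INTEGER, year INTEGER)"
    )
    con.executemany("INSERT INTO budget1 VALUES (?, ?, ?, ?)", budget)
    result = con.execute(TOTALS_SQL, (year,)).fetchall()
    con.close()
    return result


for month, total in monthly_totals(BUDGET, 2011):
    print(month, total)
```

A WHERE year = 2011 clause after the join would discard the rows for months without data, because their year is NULL, which is why the condition sits in the ON clause here.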